 View Question
Q: Calculate the median using only counters ( Answered ,   0 Comments ) Question
 Subject: Calculate the median using only counters Category: Computers > Algorithms Asked by: jh963-ga List Price: \$20.00 Posted: 06 Oct 2006 13:20 PDT Expires: 05 Nov 2006 12:20 PST Question ID: 771372
 ```Construct an algorithm to determine the median of a set of integers. You are given: The number of integers in the set (which may be even or odd). The range of the integers (from 0 to 2^12 - 1) Counter 1, which counts all values <= an integer limit value you set Counter 2, which counts all values >= an integer limit value you set (The limit values can be different on the 2 counters.) You can stream the entire set of integers past the counters any number of times (although, obviously, fewer times is better). After each streaming, you can read the counter values and adjust the limit values. One streaming will increment the values in both counters. You cannot: * Change any values in the stream (e.g., to sort the numbers) * Stream only part of the set past the counters. * Determine the index in the stream of any particular value. * Read the values off the stream into memory and manipulate them there. Each time you stream the integers past the counters takes 5 milliseconds. An excellent solution will take less than 100 milliseconds to compute. An acceptable solution can take up to 200 milliseconds. For your testing, consider the following abbreviated data sets with a sample size of 8 and range of values from 0 to 7: Sample 1 6 2 2 7 5 1 6 5 Sample 2 4 7 3 3 2 1 6 1 Sample 3 5 2 7 2 4 1 7 2 Sample 4 1 6 6 5 6 3 2 7 Sample 5 7 2 7 6 6 6 2 5 Sample 6 1 0 2 6 5 4 3 7 Sample 7 0 0 0 0 0 0 0 0 Sample 8 1 1 1 1 1 1 1 1 Sample 9 4 4 4 4 4 4 4 4 Sample 10 2 7 7 2 2 7 7 2 Your algorithm must correctly calculate the median value. If the sample size is odd, the median is the (N/2 + 1)'th element (if the sequence were sorted). If the sample size is even, the median is defined as the AVERAGE of the N/2 and N/2 + 1 element. No, this isn't a homework problem. We have a custom piece of hardware that can stream a data set past the 2 counters. We need to be able to calculate the median of this data set. J.``` Request for Question Clarification by maniac-ga on 07 Oct 2006 09:52 PDT ```Hello Jh963, Are there any additional constraints that should be accounted for? For example, can you [or NOT] remember the results of the counters between streams? [this would make the algorithm REALLY hard] Do you know the total number of integers that will be streamed before the first run? [if false, use the first stream with C1<=max and/or C2>=0 to determine the total count - adding one to the number of streams required] Also - you don't indicate how you want the algorithm represented - would something like the explanation below be acceptable or do you need a specific format for the algorithm? Let me outline a solution and see if it meets your needs. The constraints listed include: - an "excellent solution" will have at most 19 cycles of the counters - an "acceptable" solution will have at most 40 cycles of the counters - the solution must accomodate both odd & even values - the values are seen only by the counters (and not available for other processing) In addition, I believe we should also assume that - the data can be sparse (one or more values can be omitted) For the odd case, a relatively simple approach is to perform a binary search on the 2^12 possible values to locate the median value. That require the data to be streamed across the counters a dozen times or less. You could set both counter 1 and 2 to the same limit value & stop when the counters of both exceed (N/2)+1. For the even case, its necessary to find both the N/2 and (N/2)+1 values. We can apply a method similar to the odd case, to find one (say the N/2 value), but then we need to find the (N/2)+1 value within 8 streams which will work in most cases - but not all. As an alternative, do a pair of parallel binary searches to locate the N/2 and (N/2)+1 values at the same time. For example, use C1 & search for the minimum value which has the counter >= N/2 C2 & search for the maximum value which has the counter >= N/2 If the counters reach the same value - that's the median. If not, use the average calculation to determine the median. This latter solution should find the solution within the same dozen streams as the odd solution. Please reply with clarifications to assumptions I've referred to and if you need more detail in the algorithm to complete the answer. --Maniac``` Clarification of Question by jh963-ga on 07 Oct 2006 11:45 PDT ```I'll try to answer your questions: Yes, you know beforehand how many integers are in the stream. Consider it an argument passed in to the function. Yes, you can remember the counter results between streaming the integers. I'd like the algorithm represented as pseudo code or C code. Something like: counter1_limit = 0 counter2_limit = 0xffff stream_integers(); if counter1_value > X ... You get the idea. Yes, as my sample data sets show, the data can be "sparse". In particular, see samples 7, 8, 9, and 10. The difficulty in this problem is in the actual design of the binary search. I'd like to see the pseudo code solution, and you should have run all of the sample sets through the algorithm to ensure your algorithm works. J.``` Answer
 Subject: Re: Calculate the median using only counters Answered By: maniac-ga on 07 Oct 2006 16:37 PDT Rated: ```Hello Jh963, I pasted your data into an Excel spreadsheet, generated a pair of functions & compared the result of those functions with the built in MEDIAN function. These pass all your sample data without problem for the even cases & when removing the last value from each sample does the same for odd cases. The functions are included at the end of the answer for reference / as "pseudo code" for the solution. Every place that INT is used in that code - be sure to use integer arithmetic (or truncate floating point values) to get the right result. Let's explain a little bit what the functions do and how they do it. The CountRange function acts as your "counter". The first parameter is the set of values, the second is the limit set, and the third should be True for counter C1, and False for C2. The return value is the number of values that match the counter condition / limit. The DoMedian function has two main cases as described before, the "odd" and the "even" case. The odd case is much simpler - let's describe it first. The loop is initialized to start at the middle of the range of possible values. The loop then repeats until the values of the counters are both greater or equal to LCount ((Count+1) / 2). The top of the loop passes the data pass both counters (calling CountRange for each), setting C1 and C2 respectively. The IF statement that follows allows an early exit if we stumble onto the result "early". The sequence of IF statements that follow have three conditions: - IF LDelta is more than one, divide the range by 2 & move up/down based on the relative values of C1 and LCount. - the second condition is used is LDelta is one & this is the second time we've gotten to this point. If that's the case, return the current index (CurL). [note - my code did not trigger this condition when I set a breakpoint there, but I needed this for a limit on the even side] - the third condition means that LDelta is one & this is the first time we've gotten here. If that's the case, we try "one more time" and move the index in the proper direction. This is necessary for a corner condition at the extreme ranges of possible values (e.g., if the median is 7). Now, for the even case, the coding is similar but handles the index for C1 and C2 separately. It also has to remember the highest / lowest match values to determine if an index is acceptable. I believe I misstated the conditions previously - we need to maximize the index for C1 and minimize the index for C2. So, after setting up the loop (with a few additional variables), The loop again streams the data against the counters C1 and C2. The first IF statement checks C1 against the limit & based on that computes the new index for C1, remembering a "match" index if this is the maximum seen so far. The second IF statement does the same for C2, remembering the match to minimize seen so far. The termination condition is similar to the end case for the odd code - adjusting the index delta to 1/2 each time until one is reached. When one is reached the first time, go ahead and do that. The second time the delta is one, we now know we've generated a solution - and the average of the two indices is returned as the function value. That's it. As you noted, there are some issues with the careful design and implementation of a binary search. It is sometimes difficult to handle the edge cases (e.g., zero and your limit) - the code has to be carefully constructed to start at the right location in your data. If you have any difficulty understanding the answer or if the answer is incomplete, please make a clarification request and I would be glad to provide further information. Good luck with your work. --Maniac Function DoMedian(X As Range, Limit) As Double ' ' Function DoMedian determines the median of a set of values based ' using a pair of "counters" that count values within X as being above ' or below some limit. The limit to this function is the maximum value ' of the values expected (zero is the minimum). ' Created by maniac on October 7, 2006 ' DoMedian = 0 XCount = X.Count LDelta = Int((Limit + 1) / 2) SeenOne = False ' Two different algorithms, simple for odd number of samples, more complex if even If XCount = (Int(XCount / 2) * 2) Then ' Even case ' Remember how many samples for the counters LCount = LDelta CurL1 = LDelta CurL2 = LDelta MatchL1 = -1 MatchL2 = Limit + 1 LDelta = Int((LDelta + 1) / 2) Do ' We're trying to maximize CurL1 and minimize CurL2 ' where they satisfy C1 (or C2) >= LCount C1 = CountRange(X, CurL1, True) C2 = CountRange(X, CurL2, False) ' Have we found the "right" values? If (C1 >= LCount) Then ' OK, we have a possible match to maximize If (CurL1 > MatchL1) Then MatchL1 = CurL1 End If CurL1 = CurL1 + LDelta Else CurL1 = CurL1 - LDelta End If If (C2 >= LCount) Then ' OK, we have a possible match to minimize If (CurL2 < MatchL2) Then MatchL2 = CurL2 End If CurL2 = CurL2 - LDelta Else CurL2 = CurL2 + LDelta End If If (LDelta > 1) Then LDelta = Int((1 + LDelta) / 2) ElseIf SeenOne Then ' All done, return the result DoMedian = (MatchL1 + MatchL2) / 2 Exit Do Else SeenOne = True End If Loop Else ' Odd case ' remember how many samples for the counters LCount = LDelta CurL = LDelta - 1 Do C1 = CountRange(X, CurL, True) C2 = CountRange(X, CurL, False) ' Have we found the "right" value? If ((C1 >= LCount) And (C2 >= LCount)) Then DoMedian = CurL Exit Do End If If LDelta > 1 Then LDelta = Int((1 + LDelta) / 2) If (C1 < LCount) Then ' Need to go down CurL = CurL - LDelta Else ' Need to go up CurL = CurL + LDelta End If ElseIf SeenOne Then DoMedian = CurL Exit Do Else SeenOne = True If (C1 < LCount) Then ' Need to go down CurL = CurL - LDelta Else ' Need to go up CurL = CurL + LDelta End If End If Loop End If End Function Function CountRange(X As Range, Limit, GE As Boolean) As Integer Count = 0 If GE Then For Each C In X If C.Value >= Limit Then Count = Count + 1 End If Next C Else For Each C In X If C.Value <= Limit Then Count = Count + 1 End If Next C End If CountRange = Count End Function``` Request for Answer Clarification by jh963-ga on 09 Oct 2006 10:59 PDT ```I converted your code to C code and ran my 10 samples: they worked. I then tried this data set: 2061 2106 1537 2230 1962 1024 2075 616 2522 487 2630 3205 3299 2795 1568 3209 2404 2559 3166 2976 488 3409 261 524 I pass in this set of 24 integers and pass in a limit of 4095. I think the median should be 2168. The answer I get from the algorithm is 2047.5. It's possible that I made a mistake in my translation to C. When I stepped through your code, however, I saw CurL1 and CurL2 DIVERGING. One getting closer to 0 and 1 getting closer to 4096. There may be a problem with your code when the limit size is not equal to the sample size. Could you try this data set in your test code and see what result you get? Thanks, J.``` Clarification of Answer by maniac-ga on 10 Oct 2006 08:29 PDT ```Hello Jh963, Hmm. Not enough diversity on the test cases. Please change the value of LCount to LCount = Int((XCount + 1) / 2) near the top of each case (even & odd) [note: the +1 is not needed on the even case]. Its coincidental that I used 7 or 8 samples (with a range of 0-7) for all my testing so the value set to LCount was always correct. As you noted - the big difference in the number of samples and range triggered the bug. As a side note - you may get some diversion between CurL1 and CurL2 in the even case as they try to find the N/2 and N+1/2 samples. They should go to the limits only if the data drives them there (e.g., alternating 0, 4095). Oops - I found another bug. Add SeenTwo = False near the top & revise the ElseIF SeenOne case to read ElseIf SeenTwo Then ' All done, return the result DoMedian = (MatchL1 + MatchL2) / 2 Exit Do ElseIf SeenOne Then ' In even case, make sure we get to the minimum value (0) SeenTwo = True The even case was stopping one "too soon" in my sample testing, this tries "one more time" to force CurL1 and CurL2 to the limits if needed. The odd case doesn't need this extra pass (checked that too - can return both 0 & 4095). Please let me know if you have any further problems - I am glad to help. --Maniac``` Request for Answer Clarification by jh963-ga on 10 Oct 2006 10:13 PDT ```Ok, it's better but I'm still getting some failures. I created a test loop that generates a random number of sample elements (between 1 and 128). I then ran this loop 64 times and got errors on all of the odd sample size cases. Below are a few representative examples of the failures (I chose the smaller sample sizes for brevity here). "Median = XXX" is result from the DoMedian routine. I then calculate the median myself by sorting the sample. BTW "s/b" == "should be". J. Sample (size 7): 709 1097 1142 1447 1977 3389 4016 Median = 1535.000000 ******** Error: median s/b 1447.000000 but is 1535.000000. Sample (size 21): 29 331 598 707 746 995 1264 1514 1774 1785 2029 2035 2791 2916 2997 3261 3523 3548 3567 3817 3916 Median = 1791.000000 ******** Error: median s/b 2029.000000 but is 1791.000000. Sample (size 9): 886 1000 1005 1334 2260 2792 3775 3780 4087 Median = 2047.00 0000 ******** Error: median s/b 2260.000000 but is 2047.000000. Sample (size 17): 138 322 806 1011 1080 1427 1567 1595 1756 1882 1910 2614 2867 3188 3353 3621 3818 Median = 1791.000000 ******** Error: median s/b 1756.000000 but is 1791.000000.``` Request for Answer Clarification by jh963-ga on 10 Oct 2006 10:16 PDT ```OOOPPPPPSSSS! That was my mistake, not yours. I misread your last message and thought it said the "+ 1" was not needed on the ODD case, not the even case. I fixed that and the algorithm seems to be working now. J.``` Clarification of Answer by maniac-ga on 10 Oct 2006 11:16 PDT ```Hello Jh963, Thank you for the kind words and I am glad everthing has worked out for your problem. --Maniac```
 jh963-ga rated this answer: and gave an additional tip of: \$20.00 `Great extra effort on this harder-than-it-looks problem.` Comments
 There are no comments at this time.
 Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service. If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for Google Home - Answers FAQ - Terms of Service - Privacy Policy