Hello Jh963,
I pasted your data into an Excel spreadsheet, generated a pair of
functions & compared the result of those functions with the built in
MEDIAN function. These pass all your sample data without problem for
the even cases & when removing the last value from each sample does
the same for odd cases. The functions are included at the end of the
answer for reference / as "pseudo code" for the solution. Every place
that INT is used in that code - be sure to use integer arithmetic (or
truncate floating point values) to get the right result.
Let's explain a little bit what the functions do and how they do it.
The CountRange function acts as your "counter". The first parameter is
the set of values, the second is the limit set, and the third should
be True for counter C1, and False for C2. The return value is the
number of values that match the counter condition / limit.
The DoMedian function has two main cases as described before, the
"odd" and the "even" case. The odd case is much simpler - let's
describe it first.
The loop is initialized to start at the middle of the range of
possible values. The loop then repeats until the values of the
counters are both greater or equal to LCount ((Count+1) / 2). The top
of the loop passes the data pass both counters (calling CountRange for
each), setting C1 and C2 respectively. The IF statement that follows
allows an early exit if we stumble onto the result "early". The
sequence of IF statements that follow have three conditions:
- IF LDelta is more than one, divide the range by 2 & move up/down
based on the relative values of C1 and LCount.
- the second condition is used is LDelta is one & this is the second
time we've gotten to this point. If that's the case, return the
current index (CurL). [note - my code did not trigger this condition
when I set a breakpoint there, but I needed this for a limit on the
even side]
- the third condition means that LDelta is one & this is the first
time we've gotten here. If that's the case, we try "one more time" and
move the index in the proper direction. This is necessary for a corner
condition at the extreme ranges of possible values (e.g., if the
median is 7).
Now, for the even case, the coding is similar but handles the index
for C1 and C2 separately. It also has to remember the highest / lowest
match values to determine if an index is acceptable. I believe I
misstated the conditions previously - we need to maximize the index
for C1 and minimize the index for C2. So, after setting up the loop
(with a few additional variables), The loop again streams the data
against the counters C1 and C2. The first IF statement checks C1
against the limit & based on that computes the new index for C1,
remembering a "match" index if this is the maximum seen so far. The
second IF statement does the same for C2, remembering the match to
minimize seen so far. The termination condition is similar to the end
case for the odd code - adjusting the index delta to 1/2 each time
until one is reached. When one is reached the first time, go ahead and
do that. The second time the delta is one, we now know we've generated
a solution - and the average of the two indices is returned as the
function value.
That's it. As you noted, there are some issues with the careful design
and implementation of a binary search. It is sometimes difficult to
handle the edge cases (e.g., zero and your limit) - the code has to be
carefully constructed to start at the right location in your data.
If you have any difficulty understanding the answer or if the answer
is incomplete, please make a clarification request and I would be glad
to provide further information.
Good luck with your work.
--Maniac
Function DoMedian(X As Range, Limit) As Double
'
' Function DoMedian determines the median of a set of values based
' using a pair of "counters" that count values within X as being above
' or below some limit. The limit to this function is the maximum value
' of the values expected (zero is the minimum).
' Created by maniac on October 7, 2006
'
DoMedian = 0
XCount = X.Count
LDelta = Int((Limit + 1) / 2)
SeenOne = False
' Two different algorithms, simple for odd number of samples,
more complex if even
If XCount = (Int(XCount / 2) * 2) Then
' Even case
' Remember how many samples for the counters
LCount = LDelta
CurL1 = LDelta
CurL2 = LDelta
MatchL1 = -1
MatchL2 = Limit + 1
LDelta = Int((LDelta + 1) / 2)
Do
' We're trying to maximize CurL1 and minimize CurL2
' where they satisfy C1 (or C2) >= LCount
C1 = CountRange(X, CurL1, True)
C2 = CountRange(X, CurL2, False)
' Have we found the "right" values?
If (C1 >= LCount) Then
' OK, we have a possible match to maximize
If (CurL1 > MatchL1) Then
MatchL1 = CurL1
End If
CurL1 = CurL1 + LDelta
Else
CurL1 = CurL1 - LDelta
End If
If (C2 >= LCount) Then
' OK, we have a possible match to minimize
If (CurL2 < MatchL2) Then
MatchL2 = CurL2
End If
CurL2 = CurL2 - LDelta
Else
CurL2 = CurL2 + LDelta
End If
If (LDelta > 1) Then
LDelta = Int((1 + LDelta) / 2)
ElseIf SeenOne Then
' All done, return the result
DoMedian = (MatchL1 + MatchL2) / 2
Exit Do
Else
SeenOne = True
End If
Loop
Else
' Odd case
' remember how many samples for the counters
LCount = LDelta
CurL = LDelta - 1
Do
C1 = CountRange(X, CurL, True)
C2 = CountRange(X, CurL, False)
' Have we found the "right" value?
If ((C1 >= LCount) And (C2 >= LCount)) Then
DoMedian = CurL
Exit Do
End If
If LDelta > 1 Then
LDelta = Int((1 + LDelta) / 2)
If (C1 < LCount) Then
' Need to go down
CurL = CurL - LDelta
Else
' Need to go up
CurL = CurL + LDelta
End If
ElseIf SeenOne Then
DoMedian = CurL
Exit Do
Else
SeenOne = True
If (C1 < LCount) Then
' Need to go down
CurL = CurL - LDelta
Else
' Need to go up
CurL = CurL + LDelta
End If
End If
Loop
End If
End Function
Function CountRange(X As Range, Limit, GE As Boolean) As Integer
Count = 0
If GE Then
For Each C In X
If C.Value >= Limit Then
Count = Count + 1
End If
Next C
Else
For Each C In X
If C.Value <= Limit Then
Count = Count + 1
End If
Next C
End If
CountRange = Count
End Function |