With the values you provided and following the excellent guide at
the Huffman coding I came up with is:
I built up the tree, starting with the least frequent values, as follows:
Note: (A, B) refers to the zero / one bit values for the left / right
branches of the tree.
(X6, X7), total frequency 0.020
(X5, (X6 , X7)), total frequency 0.050
(X4, (X5, (X6, X7))), total frequency 0.150
(X3, (X4, (X5, (X6, X7)))), total frequency 0.350
At this point, there are three "trees" remaining in the forest: the one
above at frequency 0.350, X1 at 0.350, and X2 at 0.300. I chose to merge
the big tree with X2, but merging X2 with X1 works as well. With my
choice, I get
(X2, (X3, (X4, (X5, (X6, X7))))), total frequency 0.650
((X2, (X3, (X4, (X5, (X6, X7))))), X1), total frequency 1.000
Walking the tree (left is zero, right is one) provides the encodings listed above.
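As a cross-check, here is a minimal sketch of that same merge-the-two-least-frequent procedure in Python, assuming the probabilities implied by the running subtree totals above (X1 = 0.35, X2 = 0.30, X3 = 0.20, X4 = 0.10, X5 = 0.03, X6 = 0.015, X7 = 0.005). Note the heap may break the 0.350 tie by merging X2 with X1 (the alternative mentioned above) rather than with the big tree, so the individual code words can come out differently from mine, but the average code length is the same either way.

```python
import heapq

# Probabilities inferred from the running subtree totals above (an assumption):
probs = {"X1": 0.35, "X2": 0.30, "X3": 0.20, "X4": 0.10,
         "X5": 0.03, "X6": 0.015, "X7": 0.005}

def huffman_codes(probs):
    """Repeatedly merge the two least frequent trees; return {symbol: code}."""
    # Heap entries: (probability, tiebreak counter, tree). A tree is either
    # a symbol name or a (left, right) pair of subtrees.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # least frequent tree
        p2, _, t2 = heapq.heappop(heap)   # next least frequent tree
        heapq.heappush(heap, (p1 + p2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node
            walk(tree[0], prefix + "0")   # left branch is zero
            walk(tree[1], prefix + "1")   # right branch is one
        else:
            codes[tree] = prefix          # leaf: record the symbol's code
    walk(heap[0][2], "")
    return codes

codes = huffman_codes(probs)
avg = sum(probs[s] * len(c) for s, c in codes.items())  # same for any optimal tree
```

Any tree this produces is still an optimal prefix code, which is why the tie at 0.350 is harmless.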
According to the source code referred to at
or more directly at
the calculation of entropy is
sum (-p(i) * log2 (p(i)))
where p(i) is the probability for each symbol (1 through 7 in this
case). Using that formula I get
entropy = 2.128638214
which is close to the computed "average number of bits"
sum ( p(i) * length of encoded symbol i)
average bits = 2.22
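To make that arithmetic reproducible, here is a short calculation, again assuming the probabilities implied by the subtree totals above (0.35, 0.30, 0.20, 0.10, 0.03, 0.015, 0.005) and the code lengths read off my tree (1 through 6 bits for X1 through X7, with X6 and X7 both at 6):

```python
import math

# Assumed probabilities, inferred from the running subtree totals above.
probs = [0.35, 0.30, 0.20, 0.10, 0.03, 0.015, 0.005]
# Code lengths read off my tree: X1 = 1 bit, ..., X6 and X7 = 6 bits each.
lengths = [1, 2, 3, 4, 5, 6, 6]

entropy = sum(-p * math.log2(p) for p in probs)        # ~2.1286 bits/symbol
avg_bits = sum(p * n for p, n in zip(probs, lengths))  # 2.22 bits/symbol
```

The entropy is a lower bound on the average code length, so the 2.22-bit average landing just above 2.1286 is exactly what Huffman coding should give.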
I found this information using the search phrase
and following the most promising references.
If this is a homework assignment, I strongly suggest you check the
result I provided carefully. The algorithm I referred to may not be
exactly the same as the one your professor expects, and I assumed the
term "upper branch" refers to the branch with higher probability and
"lower branch" to the branch with lower probability.
If you need some additional information or if you find the answer
unclear, please make a clarification request prior to your deadline so
I can provide some further explanation.