Google Answers Logo
View Question
 
Q: Least number of combinations with the least redundency ( Answered 5 out of 5 stars,   2 Comments )
Question  
Subject: Least number of combinations with the least redundency
Category: Science > Math
Asked by: digitalperk-ga
List Price: $25.00
Posted: 28 Jun 2004 08:47 PDT
Expires: 28 Jul 2004 08:47 PDT
Question ID: 367273
This is similar to the social golfers problem. I have N nodes that
need to be assigned into groups. There can't be more then 20 nodes to
a group. A node can be assigned to an unlimited number of groups and
there can be an unlimited number of groups. The constraint is that
every node has to be in some group with every other node so that all
nodes have at least one link with every other node. A node has a link
with another node if they are in the same group. The ideal solution is
where a node is only in one group with any other node so there is only
one link between them. This is like the golfer will only play another
golfer only once constraint. I want to achieve full "socialization" of
the nodes so that a group will introduce maximum number of new links
that were not established in some other group. N will be around 60 to
80. It will be great if there is some algorithm that will give me a
good solution for different values of N and the max number of nodes in
a group.

Please help!!

Request for Question Clarification by mathtalk-ga on 28 Jun 2004 15:56 PDT
Hi, digitalperk-ga:

As with many combinatorial problems, there may be constructions to be
found here, but the optimal solution may be difficult to find.

But let's try to be a little clearer about what makes a solution
optimal.  The rules would allow for every pair of nodes to have their
own group, which satisfies this goal:

"The ideal solution is where a node is only in one group with any
other node so there is only one link between them."

Also, each group would "introduce" the one and only link "not
established in some other group," so in one sense it would satisfy
your "maximum number" condition (you may mean maximum in some other
context, of course).

It seems to me a natural "metric" for the solutions would be the
number of groups used and a goal of minimizing the number of groups.  
Given two solutions with an equal number of groups, perhaps one should
prefer that which has the smallest maximum group size (tending to
force groups to have equal sizes as much as possible).

What about applying these sorts of "objective functions" to your
design problem?  One can show some upper bounds on the number of
groups needed (by construction of solutions) and some lower bounds (by
combinatorial argument) which vary with N = number of nodes and m =
maximum group size.

regards, mathtalk-ga

Clarification of Question by digitalperk-ga on 28 Jun 2004 20:32 PDT
Thank you for your reply. 

Let me give some examples. Let N = 10 and m = 5. The maximum number of
groups needed to satisfy the "all nodes have at least one link with
every other node" constraint is just 10 choose 5 using the combination
operation. This gives 252 combinations. Of course, not all 252
combinations are needed to guarantee the "all nodes have at least one
link with every other node" constraint since many links are repeated.
For example, two combinations are:

[1,2,3,4,5]
[1,2,3,4,6]

The second combination repeated the links between 1-4 but introduced 4
more new links between 6 and 1-4. A better arrangement is:

[1,2,3,4,5]
[6,7,8,9,10]

This arrangement does not have any redundant links between the two
combinations and introduced the maximum number of links. However, the
nodes 1-5 do not have a link to nodes 6-10 so a complete configuration
that achieves the goal of "at least one link" constraint is below:

[1, 2,3,4,5]
[6,7,8,9,10]
[1,6,7,8,9]
[2,6,7,8,9]
[3,6,7,8,9]
[4,6,7,8,9]
[5,6,7,8,9]
[1,2,3,4,10]
[5,10]

This configuration of 9 groups consists of many redundant links of
nodes 6-9 but satisfies the goal. What I seek to achieve is satisfying
the goal with the least number of redundant links and the least number
of groups. The metric of the solution could only be the number of
groups since it seems like with more nodes in a group, the number of
groups will decrease.

This problem is very similar to another question on Google Answers
titled: "Unique combinations of 4 numbers between 1 to " N""
http://answers.google.com/answers/threadview?id=274891

I think this problem also has the notion of "nice solutions" presented
in the above link. I tried to use a matrix to solve the problem where
N = 16 and m = 4. I think the solution you posted on the above link
about the matrix is the solution for this configuration and results in
20 groups. However, this solution doesn't scale well to where N is in
the 60's range.

I have devised a way to calculate the min number of groups needed to
solve the problem by calculating the number of links needed to assure
all nodes are connected to all other nodes. This seems to be (N-1) +
(N-2) + (N-3) + ... + 1. Then divide the total number of links by the
maximum number of links a group can represent (i.e., a group of 3
nodes can represent 2+1 links). In the case of N = 16 and m = 4, this
formula gave me the correct answer of 20 groups. However, I think this
formula works only with "nice solutions" since it assumes no
repetition of links.

The bottom line is, given an N and m, I want to find the solution with
the least number of groups and satisfying constraint of at least one
link between every pair of nodes. The configuration then needs to be
enumerated so I can configure the nodes accordingly.

I hope that helped!!

Request for Question Clarification by mathtalk-ga on 30 Jun 2004 15:11 PDT
Let N be the number of nodes, m the maximum size of a group.

As you have argued:

There are C(N,2) pairs to be linked and at most C(m,2) pairs linked by
a group, so at least C(N,2)/C(m,2) = N(N-1)/(m(m-1)) groups are
required.

That is one sort of lower bound on the number of groups needed, and
the lower bound can only be met, as you suspect, in special cases.

Particularly when m (the block size, in design theory terminology) is
large, e.g. already when m > 5, not a lot is known about the existence
of optimal designs.  If you like, I can suggest some solutions with m
= 20 and N between 60 and 80, but it's doubtful that I'll be able to
prove such are optimal (in the terms we've now set, of using the
minimum number of groups).

One way to judge the quality of such solutions is by comparison with
lower bounds on the number of groups, as inferred by that argument you
gave and some others.  For example, so long as m >= N, we can put all
the nodes in one group and that one group is enough.  But when N =
m+1, the minimum number of groups required is 3 (provided m > 1).  In
other words, the minimum number of groups has jumped "suddenly" from 1
to 3.

regards, mathtalk-ga

Request for Question Clarification by mathtalk-ga on 01 Jul 2004 13:46 PDT
Hi, digitalperk-ga:

It can be shown that the minimal number of groups for N = 10 nodes and
maximum group size of m = 5 required is 6.

I suspect that for the cases you are interested in, where N is
substantially greater than m, the maximum size of a group, you would
find the constructions of sub-optimal designs that I could describe to
be satisfactory.  E.g. the number of groups used would be comfortably
within a factor of two of a known lower bound on the number of groups
required.

regards, mathtalk-ga

Clarification of Question by digitalperk-ga on 01 Jul 2004 19:43 PDT
Thank you for your help. I coded up a program that will find these
sub-optimal solutions for me. For N = 64 and M = 20, the lowest group
count so far is 25. This is a very satisfactory number and if the
solution is correct, I can just go ahead and use that solution. Unless
there are no more ways to further optimize the solution, I guess this
will have to do.

Thanks

Request for Question Clarification by mathtalk-ga on 02 Jul 2004 03:33 PDT
For the case N = 64 and group size M = 20, there exists a solution
with 15 groups and a lower bound of 13 groups.  So the optimal number
must be between 13 & 15.

regards, mathtalk-ga

Request for Question Clarification by mathtalk-ga on 07 Jul 2004 08:39 PDT
Hi, digitalperk-ga:

If you no longer want an Answer to this Question, you can Close
(expire) it with the button at upper right.  If you are still
interested, I can post the methods that were used to produce near
optimal groupings for N = 60 to 80 and M = 20, along with some
references (mostly on the Web) to bounds and algorithms for this
problem.

regards, mathtalk-ga
Answer  
Subject: Re: Least number of combinations with the least redundency
Answered By: mathtalk-ga on 11 Jul 2004 10:39 PDT
Rated:5 out of 5 stars
 
Hi, digitalperk-ga:

Let's begin with a collection of 13 groups of size 20 in which all
possible pairs of nodes 1 to 65 are found:

[ 1, 2, 3, 4, 5, 6, 7, 8, 9,10,16,17,18,19,20,46,47,48,49,50]
[ 1, 2, 3, 4, 5,11,12,13,14,15,41,42,43,44,45,61,62,63,64,65]
[ 1, 2, 3, 4, 5,21,22,23,24,25,26,27,28,29,30,36,37,38,39,40]
[ 1, 2, 3, 4, 5,31,32,33,34,35,51,52,53,54,55,56,57,58,59,60]
[ 6, 7, 8, 9,10,11,12,13,14,15,21,22,23,24,25,51,52,53,54,55]
[ 6, 7, 8, 9,10,26,27,28,29,30,31,32,33,34,35,41,42,43,44,45]
[ 6, 7, 8, 9,10,36,37,38,39,40,56,57,58,59,60,61,62,63,64,65]
[11,12,13,14,15,16,17,18,19,20,26,27,28,29,30,56,57,58,59,60]
[11,12,13,14,15,31,32,33,34,35,36,37,38,39,40,46,47,48,49,50]
[16,17,18,19,20,21,22,23,24,25,31,32,33,34,35,61,62,63,64,65]
[16,17,18,19,20,36,37,38,39,40,41,42,43,44,45,51,52,53,54,55]
[21,22,23,24,25,41,42,43,44,45,46,47,48,49,50,56,57,58,59,60]
[26,27,28,29,30,46,47,48,49,50,51,52,53,54,55,61,62,63,64,65]

Omitting node 65 in the four places it occurs (replacing it, if
desired, with a suitable lower alternative), you get a solution as
requested for N = 64, M = 20.  Below we will use this example to
illustrate the theory and construction which shows its optimality.

The kind of problem you are dealing with here is combinatorial. 
However there are several points of view and corresponding variations
in terminology that can be applied.  Often the current literature
seems focused on problems of greater generality, which also
complicates ones search for relevant results.


Definitions
===========

Let's adopt the terminology of "covering designs" for the sake of this
discussion.  Your setup is a "universal set" of points (nodes in your
terms) of size v = N, for which there exists a collection of "blocks"
(groups) of size k = M, such that any subset of size t = 2 (pair) is
contained in at least one block.

The minimal number b of blocks required for such a covering design is
referred to as the "covering number" C(v,k,t).  For smallish values of
those parameters Dan Gordon and collaborators have established a
useful repository here:

[La Jolla Covering Repository - Dan Gordon et al]
http://www.ccrwest.org/cover.html

We already have in the parameter t one direction in which your problem
can be generalized, ie. to cover unordered t-tuples of points for
values t > 2.  Other kinds of generalizations will be mentioned toward
the end of this note, but the presence of parameter t in the
literature is so pervasive that its mention throughout seems
unavoidable.  Indeed we would like to abbreviate C(v,k,2) as C(v,k),
but this risks some confusion with the notation for "combinations of v
things taken k at a time".  So I've decided to show the parameter t =
2 explicitly throughout, and for clarity will count ordinary
combinations of v things taken t at a time as "v choose t".

To illustrate these definitions we have in the covering design above:

  v = 65
  k = 20
  t =  2
  b = 13

The argument "by omission" of node 65 says that C(64,20,2) <= C(65,20,2).


Theoretical Lower Bounds
========================

You have already noticed that one way to get a lower bound on the
number of blocks required is to divide the number of unordered pairs
of nodes by the number of unordered pairs within a block, ie. the
combinations of v = N things taken 2 at a time divided by combinations
of k = M things taken 2 at a time:

 C(v,k,2)  >=  (v choose 2)/(k choose 2)

Because the number of blocks is a whole number, we can "sharpen" this
estimate by taking the "ceiling" (least integer above) for the ratio:

 C(v,k,2)  >=  ceil( (v*(v-1)) / (k*(k-1)) )

In our given example:

C(65,20,2) >=  ceil( 4160 / 380 )  =  11

But a different way of counting leads to a lower bound that is subtly better:

 C(v,k,2)  >=  ceil(v * ceil((v-1)/(k-1)) / k)

and which turns out to be tight in a surprising number of cases.  For example:

C(65,20,2) >=  ceil( 65 * ceil(64/19) / 20)  =  65*4/20 = 13

One way to establish this improvement is to consider the number of
blocks that a point must appear in.  That is, each specific point
needs to "pair" with v-1 other points, but each block that contains
that point will only provide k-1 such pairings.  Therefore at least
ceil((v-1)/(k-1)) appearances of each point are required.  As there
are v points and k points per block, the revised estimate above
follows.

This form of lower bound estimate is referred to as the Schonheim bound:

J. Schonheim, On Coverings, Pacific Journal of Mathematics 14 (1964), pp. 1405-1411

where a recursive application to larger values t > 2 is presented.


Constructive Upper Bounds
=========================

There's considerable literature on methods for finding efficient
coverings.  A selection of references available on-line will be
presented at bottom.

The limiting situation when v = N is arbitrarily large relative to k =
M has been favorably understood for some time:

R. M. Wilson, Decomposition of complete graphs into subgraphs
isomorphic to a given graph,
Congressus Numerantium XV (1975), 647-659.

Wilson shows that for fixed block size k, there exists a function n(k)
such that for all v >= n(k), one can find a "nice" minimal covering in
which every pair is included once and only once subject only to the
"necessary conditions":

 i)  (v-1)/(k-1) is a whole number
 
ii)  v(v-1)/(k(k-1)) is a whole number

More recent work by Caro and Yuster has added to the picture for large v = N:

[Efficient covering designs of the complete graph - Caro and Yuster]
http://www.combinatorics.org/Volume_4/PDF/v4i1r10.pdf

but in a practical sense the required sizes are too large to be
helpful here.  The function n(k) would be at least quadratic in k.

Our approach is simple both to understand and to implement but
requires a base of previously solved cases.  As the authors here (the
same folks who have made the La Jolla repository of covering designs
available) advise:

[Asymptotically optimal covering designs - Gordon et al]
http://arxiv.org/abs/math.CO/9511224

"The best way to use the induced covering algorithm in practice is to
first find or make a large table of good coverings with small
parameters using many different methods..."

Suppose that v,k have a proper divisor in common, eg. m.  Then a
covering design (v,k,t) can be induced from one for (v/m,k/m,t) by
replacing all points in each block for the latter by m points from the
original set of v points such that these v/m subsets form a disjoint
partition.

To take our favorite example, we can begin with a covering design from
the La Jolla repository:

[La Jolla Covering Repository Tables]
http://www.ccrwest.org/cover/HIGH.html

which establishes that C(13,4,2) <= 13:

[ 1,  2,  4, 10]
[ 1,  3,  9, 13]
[ 1,  5,  6,  8]
[ 1,  7, 11, 12]
[ 2,  3,  5, 11]
[ 2,  6,  7,  9]
[ 2,  8, 12, 13]
[ 3,  4,  6, 12]
[ 3,  7,  8, 10]
[ 4,  5,  7, 13]
[ 4,  8,  9, 11]
[ 5,  9, 10, 12]
[ 6, 10, 11, 13]

Now 65 = 13*5, 20 = 4*5, and we replace each entry in the blocks of
this design by the corresponding points in this partition:

 1 |--> { 1, 2, 3, 4, 5}
 2 |--> { 6, 7, 8, 9,10}
 3 |--> {11,12,13,14,15}
 4 |--> {16,17,18,19,20}
 5 |--> {21,22,23,24,25}
 6 |--> {26,27,28,29,30}
 7 |--> {31,32,33,34,35}
 8 |--> {36,37,38,39,40}
 9 |--> {41,42,43,44,45}
10 |--> {46,47,48,49,50}
11 |--> {51,52,53,54,55}
12 |--> {56,57,58,59,60}
13 |--> {61,62,63,64,65}

The result is the 13 block covering design we opened with.

All the values posted in my earlier Comment (see below) were obtained
smaller designs taken from the repository for either k = 4 or 5.  It
seems likely that, if the repository table were extended beyond its
current limit, that some additional improvements could be inferred
from cases with k = 10.


Additional References
=====================


Methods for finding covering designs:
-------------------------------------

[Approximation algorithms for the k-clique covering problem - Hochbaum et al]
http://riot.ieor.berkeley.edu/~dorit/pub/k-clique.html

[A Parallel Implementation of an Algorithm for Computing Covering
Numbers - Li and Russell]
http://www.cs.umanitoba.ca/~lipakc/parallel_cover.pdf


Generalizations:
----------------

An additional parameter m can be introduced, with the modified
requirement that any m-subset of the universal set must intersect some
block in at least t points.  The earlier cases would then be those
with m = t.

[Covering Designs]
http://perso.wanadoo.fr/colin.barker/cover/Cover.htm

This generalization is especially interesting to lotto players, for
whom it imputes a number of tickets that must be purchased in order to
guarantee matching at least t numbers, given a selection of k numbers
per ticket and a drawing of m numbers randomly.

[Lotto Designs - Bate and van Rees]
http://www.cs.umanitoba.ca/~vanrees/Lotto.pdf

For a collection of related links, many of which are now broken:

[Lotto Designs - Nick Koutras]
http://lottodesigns.50megs.com/

His terminology is at some variance, however, with the preceding
because an objective for the gambler may be to sacrifice a guaranteed
match at a given size (referred to as a wheel) for better odds of
winning (relative to the number of tickets purchased).

Other sorts of generalizations look at coverings that are constrained
in some manner, possibly as in these papers by the cardinality of
differences between blocks or between a block and a selected target:

[Some New Bounds on Single-Change Covering Designs - Zhang (abstract)]
http://epubs.siam.org/sam-bin/dbq/article/22470

[On Asymmetric Coverings and Covering Numbers - Applegate, Rains, and Sloane]
http://www.research.att.com/~njas/doc/covcod.pdf

The use of the word "designs" here connects this subject with that of
balanced incomplete block designs (BIBD), where the requirement
becomes _exactly_ one block per t-subset.  The material in another
Google Answers thread:

[Q: Math Question for Golf Teaming]
http://answers.google.com/answers/threadview?id=327232

is a sort of "dual" to the covering problem in which we require _at
most_ one block per t-subset, and these are thus termed "packing
designs" because of their avoidance of overlaps.  The Schonheim bound
can be established for packing designs as an upper bound on the number
of blocks possible:

J. Schonheim, On maximal systems of k-tuples, Studia Sci. Math.
Hungar. 1 (1966), 363-368

Although our terminology is based on "designs" (and the underlying
incidence structure of blocks and points), many of the references use
graph theory formulations interchangeably.  A complete graph K_n is
one in which n nodes are all connected to one another by edges. 
Generalizations to "hypergraphs" in which "edges" are replaced by node
sets larger than 2 are then equivalent to the consideration of
covering designs with t > 2.  Another point of view might be the
selection of (k-1)-faces from a (v-1)-simplex (because an n-simplex is
essentially a complete graph on n+1 nodes) that covers all the edges
(1-simplices) or some higher dimensional faces.


Applications of covering designs:
---------------------------------

In addition to lotto games there are applications of covering designs
to software testing and other statistical topics:

[Factor-covering designs for testing software - Dalal and Mallows]
http://aetgweb.argreenhouse.com/papers/1998-technometrics.pdf

[An application of covering designs: determining the maximum
consistent set of shares in a threshold scheme - Rees, Stinson, and
Wei]
http://www.cs.umanitoba.ca/~vanrees/wei.pdf



regards, mathtalk-ga

Clarification of Answer by mathtalk-ga on 13 Jul 2004 18:24 PDT
A few more remarks.  We can point out the six specific designs from
Gordan's repository that suffice to establish the upper bounds cited
in the Comment below.   Combining these we can give this elaboration:

   Estimated numbers C(N,20,2)
   ===========================

         theor.   const.  induce
   N     lower    upper    from
         bound    bound   design
 -----  -------  ------- --------
  60      12       12    C(12,4,2)
  61      13       13    C(13,4,2)
  62      13       13    C(13,4,2)
  63      13       13    C(13,4,2)
  64      13       13    C(13,4,2)
  65      13       13    C(13,4,2)
  66      14       16    C(17,5,2)
  67      14       16    C(17,5,2)
  68      14       16    C(17,5,2)
  69      14       18    C(18,5,2)
  70      14       18    C(18,5,2)
  71      15       18    C(18,5,2)
  72      15       18    C(18,5,2)
  73      15       19    C(19,5,2)
  74      15       19    C(19,5,2)
  75      15       19    C(19,5,2)
  76      16       19    C(19,5,2)
  77      16       20    C(16,4,2)
  78      20       20    C(16,4,2)
  79      20       20    C(16,4,2)
  80      20       20    C(16,4,2)
 
This paper shows some ways to generalize the "induced covering" we
presented above, e.g. points can be replaced by varying numbers of
points:

[Covering Designs on 13 Blocks Revisited - Grieg et al]
http://www.cs.umanitoba.ca/~vanrees/todorov2.pdf

Since the covering designs above include some with an optimal number
of 13 blocks, it is interesting to know all parameters v,k for which
C(v,k,2) = 13.  This result was originally proved by Todorov (with two
possible exceptions), but the full details were not published in an
accessible form.  So the three authors above have undertaken in this
recent but lengthy note to fill in the steps for us and resolve the
two possible exceptions (in the negative).

One of the authors of the above paper has done his disseration on
finding so-called lotto designs, the generalization of covering
designs mentioned before:

[Some Results on Lotto Designs - PC Li]
(link about a quarter of the way down the page, postcript format)
http://www.cs.umanitoba.ca/~lipakc/

Be forewarned... this paper is 300+ pages.  It recounts some history
along with programming stuff.

regards, mathtalk-ga
digitalperk-ga rated this answer:5 out of 5 stars
THANK YOU! You went above and beyond my basic question! Your answers
were clear and complete. Thanks!

Comments  
Subject: Re: Least number of combinations with the least redundency
From: mathtalk-ga on 10 Jul 2004 12:23 PDT
 
For the number of nodes N ranging from 60 to 80 and a group size not
to exceed 20, I found theoretical lower bounds (on the number of
groups needed) together with constructed upper bounds (ie. taken from
explict groupings):

         theor.   const.
   N     lower    upper
         bound    bound
 -----  -------  -------
  60      12       12
  61      13       13
  62      13       13
  63      13       13
  64      13       13
  65      13       13
  66      14       16
  67      14       16
  68      14       16
  69      14       18
  70      14       18
  71      15       18
  72      15       18
  73      15       19
  74      15       19
  75      15       19
  76      16       19
  77      16       20
  78      20       20
  79      20       20
  80      20       20

Note that my methods proved to be "tight" at the extremes of this
range and somewhat loose in the middle.  In relative terms the gap is
largest for N = 69,70 (18 is roughly 30% more than 14).  It would be
interesting to know if the gaps in this part of the range can be
resolved and the optimal number of groups determined.

regards, mathtalk-ga
Subject: Re: Least number of combinations with the least redundency
From: digitalperk-ga on 10 Jul 2004 19:09 PDT
 
This is a very good algorithm as it provides very tight bounds with
the min. Please let me know how you did this! It would be great to
reduce the groups to 15. Can you also get me a listing of all the
groups for  N = 64 and M = 20?

Thanks!

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy