Hi, digitalperk-ga:
Let's begin with a collection of 13 groups of size 20 in which all
possible pairs of nodes 1 to 65 are found:
[ 1, 2, 3, 4, 5, 6, 7, 8, 9,10,16,17,18,19,20,46,47,48,49,50]
[ 1, 2, 3, 4, 5,11,12,13,14,15,41,42,43,44,45,61,62,63,64,65]
[ 1, 2, 3, 4, 5,21,22,23,24,25,26,27,28,29,30,36,37,38,39,40]
[ 1, 2, 3, 4, 5,31,32,33,34,35,51,52,53,54,55,56,57,58,59,60]
[ 6, 7, 8, 9,10,11,12,13,14,15,21,22,23,24,25,51,52,53,54,55]
[ 6, 7, 8, 9,10,26,27,28,29,30,31,32,33,34,35,41,42,43,44,45]
[ 6, 7, 8, 9,10,36,37,38,39,40,56,57,58,59,60,61,62,63,64,65]
[11,12,13,14,15,16,17,18,19,20,26,27,28,29,30,56,57,58,59,60]
[11,12,13,14,15,31,32,33,34,35,36,37,38,39,40,46,47,48,49,50]
[16,17,18,19,20,21,22,23,24,25,31,32,33,34,35,61,62,63,64,65]
[16,17,18,19,20,36,37,38,39,40,41,42,43,44,45,51,52,53,54,55]
[21,22,23,24,25,41,42,43,44,45,46,47,48,49,50,56,57,58,59,60]
[26,27,28,29,30,46,47,48,49,50,51,52,53,54,55,61,62,63,64,65]
Omitting node 65 in the four places it occurs (replacing it, if
desired, with a suitable lower alternative), you get a solution as
requested for N = 64, M = 20. Below we will use this example to
illustrate the theory and construction which shows its optimality.
The kind of problem you are dealing with here is combinatorial.
However there are several points of view and corresponding variations
in terminology that can be applied. Often the current literature
seems focused on problems of greater generality, which also
complicates ones search for relevant results.
Definitions
===========
Let's adopt the terminology of "covering designs" for the sake of this
discussion. Your setup is a "universal set" of points (nodes in your
terms) of size v = N, for which there exists a collection of "blocks"
(groups) of size k = M, such that any subset of size t = 2 (pair) is
contained in at least one block.
The minimal number b of blocks required for such a covering design is
referred to as the "covering number" C(v,k,t). For smallish values of
those parameters Dan Gordon and collaborators have established a
useful repository here:
[La Jolla Covering Repository - Dan Gordon et al]
http://www.ccrwest.org/cover.html
We already have in the parameter t one direction in which your problem
can be generalized, ie. to cover unordered t-tuples of points for
values t > 2. Other kinds of generalizations will be mentioned toward
the end of this note, but the presence of parameter t in the
literature is so pervasive that its mention throughout seems
unavoidable. Indeed we would like to abbreviate C(v,k,2) as C(v,k),
but this risks some confusion with the notation for "combinations of v
things taken k at a time". So I've decided to show the parameter t =
2 explicitly throughout, and for clarity will count ordinary
combinations of v things taken t at a time as "v choose t".
To illustrate these definitions we have in the covering design above:
v = 65
k = 20
t = 2
b = 13
The argument "by omission" of node 65 says that C(64,20,2) <= C(65,20,2).
Theoretical Lower Bounds
========================
You have already noticed that one way to get a lower bound on the
number of blocks required is to divide the number of unordered pairs
of nodes by the number of unordered pairs within a block, ie. the
combinations of v = N things taken 2 at a time divided by combinations
of k = M things taken 2 at a time:
C(v,k,2) >= (v choose 2)/(k choose 2)
Because the number of blocks is a whole number, we can "sharpen" this
estimate by taking the "ceiling" (least integer above) for the ratio:
C(v,k,2) >= ceil( (v*(v-1)) / (k*(k-1)) )
In our given example:
C(65,20,2) >= ceil( 4160 / 380 ) = 11
But a different way of counting leads to a lower bound that is subtly better:
C(v,k,2) >= ceil(v * ceil((v-1)/(k-1)) / k)
and which turns out to be tight in a surprising number of cases. For example:
C(65,20,2) >= ceil( 65 * ceil(64/19) / 20) = 65*4/20 = 13
One way to establish this improvement is to consider the number of
blocks that a point must appear in. That is, each specific point
needs to "pair" with v-1 other points, but each block that contains
that point will only provide k-1 such pairings. Therefore at least
ceil((v-1)/(k-1)) appearances of each point are required. As there
are v points and k points per block, the revised estimate above
follows.
This form of lower bound estimate is referred to as the Schonheim bound:
J. Schonheim, On Coverings, Pacific Journal of Mathematics 14 (1964), pp. 1405-1411
where a recursive application to larger values t > 2 is presented.
Constructive Upper Bounds
=========================
There's considerable literature on methods for finding efficient
coverings. A selection of references available on-line will be
presented at bottom.
The limiting situation when v = N is arbitrarily large relative to k =
M has been favorably understood for some time:
R. M. Wilson, Decomposition of complete graphs into subgraphs
isomorphic to a given graph,
Congressus Numerantium XV (1975), 647-659.
Wilson shows that for fixed block size k, there exists a function n(k)
such that for all v >= n(k), one can find a "nice" minimal covering in
which every pair is included once and only once subject only to the
"necessary conditions":
i) (v-1)/(k-1) is a whole number
ii) v(v-1)/(k(k-1)) is a whole number
More recent work by Caro and Yuster has added to the picture for large v = N:
[Efficient covering designs of the complete graph - Caro and Yuster]
http://www.combinatorics.org/Volume_4/PDF/v4i1r10.pdf
but in a practical sense the required sizes are too large to be
helpful here. The function n(k) would be at least quadratic in k.
Our approach is simple both to understand and to implement but
requires a base of previously solved cases. As the authors here (the
same folks who have made the La Jolla repository of covering designs
available) advise:
[Asymptotically optimal covering designs - Gordon et al]
http://arxiv.org/abs/math.CO/9511224
"The best way to use the induced covering algorithm in practice is to
first find or make a large table of good coverings with small
parameters using many different methods..."
Suppose that v,k have a proper divisor in common, eg. m. Then a
covering design (v,k,t) can be induced from one for (v/m,k/m,t) by
replacing all points in each block for the latter by m points from the
original set of v points such that these v/m subsets form a disjoint
partition.
To take our favorite example, we can begin with a covering design from
the La Jolla repository:
[La Jolla Covering Repository Tables]
http://www.ccrwest.org/cover/HIGH.html
which establishes that C(13,4,2) <= 13:
[ 1, 2, 4, 10]
[ 1, 3, 9, 13]
[ 1, 5, 6, 8]
[ 1, 7, 11, 12]
[ 2, 3, 5, 11]
[ 2, 6, 7, 9]
[ 2, 8, 12, 13]
[ 3, 4, 6, 12]
[ 3, 7, 8, 10]
[ 4, 5, 7, 13]
[ 4, 8, 9, 11]
[ 5, 9, 10, 12]
[ 6, 10, 11, 13]
Now 65 = 13*5, 20 = 4*5, and we replace each entry in the blocks of
this design by the corresponding points in this partition:
1 |--> { 1, 2, 3, 4, 5}
2 |--> { 6, 7, 8, 9,10}
3 |--> {11,12,13,14,15}
4 |--> {16,17,18,19,20}
5 |--> {21,22,23,24,25}
6 |--> {26,27,28,29,30}
7 |--> {31,32,33,34,35}
8 |--> {36,37,38,39,40}
9 |--> {41,42,43,44,45}
10 |--> {46,47,48,49,50}
11 |--> {51,52,53,54,55}
12 |--> {56,57,58,59,60}
13 |--> {61,62,63,64,65}
The result is the 13 block covering design we opened with.
All the values posted in my earlier Comment (see below) were obtained
smaller designs taken from the repository for either k = 4 or 5. It
seems likely that, if the repository table were extended beyond its
current limit, that some additional improvements could be inferred
from cases with k = 10.
Additional References
=====================
Methods for finding covering designs:
-------------------------------------
[Approximation algorithms for the k-clique covering problem - Hochbaum et al]
http://riot.ieor.berkeley.edu/~dorit/pub/k-clique.html
[A Parallel Implementation of an Algorithm for Computing Covering
Numbers - Li and Russell]
http://www.cs.umanitoba.ca/~lipakc/parallel_cover.pdf
Generalizations:
----------------
An additional parameter m can be introduced, with the modified
requirement that any m-subset of the universal set must intersect some
block in at least t points. The earlier cases would then be those
with m = t.
[Covering Designs]
http://perso.wanadoo.fr/colin.barker/cover/Cover.htm
This generalization is especially interesting to lotto players, for
whom it imputes a number of tickets that must be purchased in order to
guarantee matching at least t numbers, given a selection of k numbers
per ticket and a drawing of m numbers randomly.
[Lotto Designs - Bate and van Rees]
http://www.cs.umanitoba.ca/~vanrees/Lotto.pdf
For a collection of related links, many of which are now broken:
[Lotto Designs - Nick Koutras]
http://lottodesigns.50megs.com/
His terminology is at some variance, however, with the preceding
because an objective for the gambler may be to sacrifice a guaranteed
match at a given size (referred to as a wheel) for better odds of
winning (relative to the number of tickets purchased).
Other sorts of generalizations look at coverings that are constrained
in some manner, possibly as in these papers by the cardinality of
differences between blocks or between a block and a selected target:
[Some New Bounds on Single-Change Covering Designs - Zhang (abstract)]
http://epubs.siam.org/sam-bin/dbq/article/22470
[On Asymmetric Coverings and Covering Numbers - Applegate, Rains, and Sloane]
http://www.research.att.com/~njas/doc/covcod.pdf
The use of the word "designs" here connects this subject with that of
balanced incomplete block designs (BIBD), where the requirement
becomes _exactly_ one block per t-subset. The material in another
Google Answers thread:
[Q: Math Question for Golf Teaming]
http://answers.google.com/answers/threadview?id=327232
is a sort of "dual" to the covering problem in which we require _at
most_ one block per t-subset, and these are thus termed "packing
designs" because of their avoidance of overlaps. The Schonheim bound
can be established for packing designs as an upper bound on the number
of blocks possible:
J. Schonheim, On maximal systems of k-tuples, Studia Sci. Math.
Hungar. 1 (1966), 363-368
Although our terminology is based on "designs" (and the underlying
incidence structure of blocks and points), many of the references use
graph theory formulations interchangeably. A complete graph K_n is
one in which n nodes are all connected to one another by edges.
Generalizations to "hypergraphs" in which "edges" are replaced by node
sets larger than 2 are then equivalent to the consideration of
covering designs with t > 2. Another point of view might be the
selection of (k-1)-faces from a (v-1)-simplex (because an n-simplex is
essentially a complete graph on n+1 nodes) that covers all the edges
(1-simplices) or some higher dimensional faces.
Applications of covering designs:
---------------------------------
In addition to lotto games there are applications of covering designs
to software testing and other statistical topics:
[Factor-covering designs for testing software - Dalal and Mallows]
http://aetgweb.argreenhouse.com/papers/1998-technometrics.pdf
[An application of covering designs: determining the maximum
consistent set of shares in a threshold scheme - Rees, Stinson, and
Wei]
http://www.cs.umanitoba.ca/~vanrees/wei.pdf
regards, mathtalk-ga |