Dear gotquestion,
I'll begin with the problem of enumerating all combinations of M
elements drawn from a set of size N. Let us number the elements n_0,
n_1, ..., n_N. Observe that if M = 1, we can run through the elements
one by one, and each will constitute a combination on its own.
If M > 1, then let's first enumerate all combinations that include the
element n_0. To do so, we remove n_0 from the set, and now we
enumerate all combinations of M-1 elements drawn from the remaining
N-1 elements. To make a full combination of length M, we concatenate
n_0 with the combination of length M-1.
The subproblem of enumerating all combinations of M-1 elements from a
set of size of N-1 is solved similarly to the original problem. We
remove an element and then enumerate all combinations of M-2 elements
from the remaining N-2. Eventually, we will end up with the subproblem
of enumerating combinations of 1 element.
Once we've run through all combinations for the original problem that
include the element n_0, we can proceed to enumerate all remaining
combinations that include n_1. To do so, we take out n_1 in addition
to the already removed n_0, so that N-2 elements remain in the set. We
are now considering combinations that begin with the element n_1, so
we enumerate all combinations of M-1 elements from the remaining set
of size N-2. When that is done, we take out n_2 and enumerate all
combinations of M-1 elements from a set of size N-3, and so on. Once
we've reached the element n_M, we are done.
I'll summarize this algorithm with the following pseudocode. The input
parameter P is the currently known prefix of the combination. K is the
number of elements that remain to be drawn from the set S. If we
originally call ENUMERATE with empty P and with a value K = M, then it
is necessarily the case that when K reaches 0, P contains M elements
and is therefore a full combination.
ENUMERATE (input: a list P; an integer K; a set S):
if K is equal to 0:
print P
else:
for each element s_i of S:
let P' be P with s_i added to the end
remove s_i from S
ENUMERATE(P', K-1, S)
An implementation in C follows. You'll notice that my function doesn't
take as a parameter the set itself, because in this program, S is a
global variable containing all elements. Instead, I pass the least
index number such that all lower-numbered elements of S have already
been considered. The effect of using this bound is to simulate the
removal of elements from S.
/* begin combs.c */
#include<stdlib.h>
#include<stdio.h>
char *S = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
int N, count;
char P[100];
void enumerate(char *P, int K, int n_i) {
int i;
if (K == 0)
printf("%d. %s\n", ++count, P);
else
for (i = n_i; i < N; i++) {
P[strlen(P)+1] = '\0';
P[strlen(P)] = S[i];
enumerate(P, K-1, i+1);
P[strlen(P)-1] = '\0';
}
}
int main(int argc, char **argv) {
int M, max = strlen(S);
if (argc != 3) {
printf("usage: combs <int M> <int N>\n");
return 0;
}
M = atoi(argv[1]);
if (M < 1 || M > max) {
printf("error: M must lie between 1 and %d, inclusive\n", max);
return 0;
}
N = atoi(argv[2]);
if (N < M || N > max) {
printf("error: N must lie between %d and %d, inclusive\n", M, max);
return 0;
}
enumerate(P, M, 0);
return 0;
}
/* end combs.c */
I'm not sure what platform you're using, but I'm on a UNIX-type
system. To compile the program, I execute
gcc combs.c -o combs
and then to run it, I pass a pair of integer arguments, as in the
following examples.
./combs 2 3
./combs 3 5
./combs 5 10
There's some error checking built into the main function, so it should
be fairly robust. Let me know if you have trouble compiling or running
my program, and I'll try to provide further assistance.
Observe that in the act of enumerating all combinations, we have also
counted them. It is possible, however, to compute the number of
combinations without enumeration. In fact, this is often preferred
because the combinations can easily get so numerous that printing them
all would take an impractical amount of time.
Consider how we choose the elements that make up any single
combination. The number of choices for the first element of the
combination is N, the size of the available set. When that element is
removed, we have N-1 choices for the second element of the
combination. Next, we have N-2 choices for the third element, and so
on, until we have built up the entire combination. Notice that in
making these M choices, we may have selected exactly the same elements
as in another combination, but in a different order. Thus, the number
of ways to choose M elements among N is not simply
N * (N-1) * (N-2) * ... * (N-(M-1)).
This is the number of ways to order M elements among N. To find the
number of combinations, we must divide by the number of ways to order
M elements alone. The number of ways to do so is, by similar reasoning
to the above,
M * (M-1) * (M-2) * ... * 1.
In conclusion, the number of combinations of M elements chosen among a
set of size N is
N * (N-1) * (N-2) * ... * (N-(M-1)) / (M * (M-1) * (M-2) * ... * 1).
Here is some pseudocode to compute this number.
COMBINATIONS (input: integer M; integer N):
let T be 1
for each integer i between N and N-(M-1), inclusive:
let T be i*T
for each integer i between M and 1, inclusive:
let T be T/i
return T
A C function follows.
int combinations(int M, int N) {
long T = 1;
int i;
for (i = N; i >= N-(M-1); i--)
T *= (long) i;
for (i = M; i >= 1; i--)
T /= (long) i;
return (int) T;
}
To try out this code, you can insert it before the main function I
gave earlier, then replace the line
enumerate(P, M, 0);
with
printf("There are %d combinations.\n", combinations(M, N));
and compile the whole program before executing it. See how much faster
it runs without enumeration?
The algorithms I have given above apply only to the problem of drawing
elements from a set, where each member is unique. It is a more
complicated business to enumerate combinations made from a bag, also
known as a multiset, which may contain duplicates. Because you didn't
use the word bag or multiset in your question, I assumed that you were
dealing just with sets.
If you should find my answer incomplete or inaccurate in any way,
please post a clarification so that I have a chance to meet your needs
before you assign a rating.
Cheers,
leapinglizard |