Subject: All possible Combinations
Category: Computers > Algorithms
Asked by: prashant_jain-ga
List Price: $15.00
Posted: 09 May 2004 16:28 PDT
Expires: 08 Jun 2004 16:28 PDT
Question ID: 343709
```
I am trying to find an efficient way to find all possible combinations of N things taken K at a time. Now I know this is possible to do when N and K are small, but my problem is somewhat harder: my N can be anything in the order of 1000 to 5000, and K can be anything from 5 to 10. If I try to write any simple looping code, it takes forever. I am looking for some way to do this in a decent time (less than 10 hrs). I would be obliged if someone could help me out with this one.
```

Request for Question Clarification by mathtalk-ga on 09 May 2004 20:13 PDT

```
Hi, prashant_jain-ga:

It is possible that the simple looping code to which you refer is efficient (though I cannot be sure of this without seeing it). Even using the lowest range of your numbers, 1000 things taken 5 at a time, there would be 8,250,291,250,200 possible combinations. If you could process 10 million combinations per second, it would take about ten days of running time to exhaust all of them.

So while I can indeed provide quite efficient methods for generating all the possible combinations described, doing the larger solution sets mentioned in less than 10 hours seems unrealistic.

If despite this you are still interested in efficient algorithms for producing all combinations of N things taken K at a time, please post a Clarification.

regards, mathtalk-ga
```

Clarification of Question by prashant_jain-ga on 10 May 2004 09:15 PDT

```
Hi,

I am still interested in getting an efficient algorithm for doing this. Also, could you tell me any way I could solve this problem in, say, 12 hours? What I have to do is find the best weighted subset of k things out of n things. All I could think of is trying the brute-force method of getting all possible combinations of size k taken out of n. I would be glad if you could help me out.

Thanks, Prashant
```

Request for Question Clarification by mathtalk-ga on 10 May 2004 09:38 PDT

```
Hi, prashant_jain-ga:

I'm not sure what "best weighted subset" means in your circumstance, but it does suggest that there may be much more efficient algorithms than the "brute force" approach of trying all k-subsets.

When the "solution space" to be searched has the form of a tree, one heuristic that is often applied to reduce the amount of searching is called alpha-beta pruning. The intuitive idea is to eliminate branches of the tree which either seem unpromising or conclusively cannot lead to an improvement over tentative solutions already identified.

I would guess that you have an "objective function" which expresses the criterion for judging which weighted k-subset is best. Please explain this notion in greater detail so that I can tailor my suggestions more closely to your actual problem.

regards, mathtalk-ga
```

Subject: Re: All possible Combinations
Answered By: mathtalk-ga on 13 May 2004 18:39 PDT
```
Hi, prashant_jain-ga:

There are several ways to efficiently enumerate all the k-subsets of n items. The three methods that I'm about to discuss are all based on a common ordering of the solution search space, so the same methods might be used to divide up the search space between multiple computers. If you have a hard requirement to perform such a long computation in a fixed time period, then ultimately "parallel processing" may be your best solution, as more efficient implementations can only push the performance speed to a certain limit.

Most likely the approach you'll want to code is the first method that I'll outline. It is easy to understand and program and very efficient; its one drawback is having a "hard coded" dependence on the size k of subsets. Two more methods are then outlined for the sake of completeness which overcome that shortcoming.

We will assume that the selectable items themselves can be indexed, e.g. through an array of "objects", so that our problem can without much indirection be reduced to one of selecting k-subsets of {1,...,n} (or for the C purists, of {0,...,n-1}).

The first method that I'd like to discuss may be the closest to what you are already doing, since it involves a nested set of loops. Because it involves a fixed number of loops corresponding to the cardinality k of the subsets, this method has a disadvantage in terms of generality. However, the range of values for k which you cited is narrow (k = 5 to 10), so this disadvantage may be a bit academic. One can easily write this sort of code six times for the given values of k and be done.

I will try to express the algorithms in a suitably intuitive "pseudocode", but if a construction is unclear, please let me know and I'll be happy to provide further clarification.

Suppose that we have an objective function F(..) that applies to any k-subset we select, and we want to evaluate this for every k-subset in order to determine the largest value.
Since this first method is "k specific", we will assume that F(..) takes k arguments, although the value returned is independent of the order of these arguments (as a set is independent of any order of its elements). [If your notion of "best weighted subset" is a minimum rather than a maximum, then this change in perspective is easily accommodated, e.g. by taking the minimum over the values found rather than the maximum as here, or by exchanging -F for F in the code.]

So "nest" k for-loops in this manner (ik' denotes the predecessor variable to ik):

  0.   Let max := F(1,2,...,k).
  1.   for i1 := 1 to n-k+1
  2.     for i2 := i1+1 to n-k+2
         . . .
  k.       for ik := ik'+1 to n
  k+1.       Let new := F(i1,i2,...,ik',ik).
  k+2.       if new > max then max := new
           next ik
         . . .
       next i2
     next i1

What this approach does is enumerate the k-subsets of {1,...,n} in so-called lexicographic ordering (ascending) on the sorted representations i1,i2,...,ik of these subsets. The arrangement of nested index ranges guarantees that these arguments will always be generated in ascending order and that every possible combination is generated.

Aside from the necessity of committing to a specific value of k, this method is attractive because it is simple to code and efficient. If the looping is done in C (or even better, in hand-coded assembly!), then it achieves the purpose of enumeration with quite low overhead. In fact, for the given counts 5 <= k <= 10, the overhead of the looping is comparable to the overhead of the inner function call (meaning just the stack operations, not the execution thereof), so your CPU cycles will be concentrated on the evaluation of the objective function.

The second method that I will outline uses function recursion to enumerate the k-subsets. Here depth of recursion takes the place of loop nesting to order the distinct items selected.
It is less efficient in most architectures to do a function call than to increment a loop variable, as the stack frame must be pushed and popped, but this may be negligible in comparison to the evaluation of your objective function. The nice thing is that the size k of subsets need not be "hard coded", because it can be represented instead by a function argument. It can be a little difficult to grasp how recursion takes the place of looping, though. Many such implementations are possible, but let me throw out some ideas.

At the deepest level we will need to access all the items chosen at outer levels of recursion (so that the objective function can be evaluated there). This can be accomplished either by a "static" array in the given function (or a "global" array, visible there and elsewhere in the program) or by passing an array of selections along as a function argument. [I'm describing the structure as an array for simplicity, but it could be any of various "collection class" datatypes, e.g. lists or queues.]

In this approach we might have a top-level call to function pick(..) with one argument that specifies the number of items to choose and another that specifies the items chosen. For notational convenience I'll use a Prolog-like syntax for lists to represent the array of items chosen. Thus, at the top level we would invoke:

  pick(k,[])

The function pick(..) will then (repeatedly) call itself, decrementing the value of k by 1 and adjoining a selected element (greater than any chosen so far by "outer" calls), until by recursion one reaches the inner call:

  pick(0,[i1,i2,...,ik])

At that point we are ready to call the objective function, written to accept the array or list as an argument:

  F([i1,i2,...,ik])

The inner invocation of pick(..)
must also keep track of the maximum value returned by an objective function call, and to do this a global variable "max" probably makes the most sense (though it is possible to do it with "output" arguments for the function pick as well). The function pick(..) itself does not return (from a given level) until it has "tried" all the possible items to adjoin to the list/array passed in to it.

This approach does avoid the need to hard code the size k of the subsets, but at a price of some loss in efficiency and (arguably) in legibility of the code, since the logic for enumerating the subsets is now "spread around" among coordinated calls to function pick(..).

The third method that I'll propose is an attempt to have the best of both approaches by using a data structure iteration. We can avoid the overhead of function recursion on the one hand and avoid having to hard code the size k of subsets on the other. The characteristic of this data structure is that it represents any of the lexicographically ordered k-subsets, and indeed has an associated "next" operation that advances from one such subset to its successor in this ordering. At a high level the algorithm can be written this way:

  Let s := {k,...,2,1}, max := F(s).
  while s < {n,...,n-k+1}
  begin
     s := next(s)
     max := greaterof(max, F(s))
  end

I've written the elements of subset s "backwards" to suggest the implementation for the operation "next", which for efficiency works mostly on the "tail" of the selections. We can in fact implement the data structure for s as an array of length k, so this can be done in almost any programming language that supports arrays. Conceptually, though, the individual array needs to be taken in context with the values n (defining the range of possible selections) and k (the size of the subset, or "array length" essentially).
To implement the next(s) operation one starts at the end of an array representing s:

  [i1,i2,...,ik]

and locates the "farthest" index j between 1 and k such that there is room to increment the selection in that slot. To describe this a bit more clearly: once ik reaches n, it no longer has "room to increment", because n is the largest possible value. So if we want to move up in the ordering of subsets, we have to move "backwards" in the index of items to ik' and ask ourselves whether ik' has reached n-1. If it has, then there's no room to increment there, and we must continue backing through the array's entries until we find a place that can be incremented. Ultimately we reach the beginning of the array, and if i1 = n-k+1, then we are done; we've gone from subset [1,...,k] all the way to subset [n-k+1,...,n], and there are no more subsets to check.

In the meantime, when we do locate a position in the array where an index can be incremented, we do so and reset all the remaining indexes beyond that point to their successive values. An illustration may help. Suppose that we are trying to enumerate the 3-subsets of 9 items. If the current s is represented by [1,5,9], then when we check the final index we find it cannot be incremented further. Backing up one entry, we see that 5 < 8, so we still have room there. Hence:

  next([1,5,9]) = [1,6,7]

The following iteration will discover that there is new room in the last entry to increment, and thus the next operation will produce [1,6,8]. And so on.

A skillful implementation of the data structure iteration can be more efficient in a high-level language than the nested for-loops approach, though this requires careful attention to the details. The "high level" code sketched above would be only a crude road map to such an implementation.
The efficiency comes from needing "most of the time" only to check the final entry in the array/subset and increment that one, so that as a fraction of operations it is like spending most of the time in the innermost nested loop. However, to make the implementation competitive, one must combine the "test" for s < {n,...,n-k+1} with the "work" of discovering the furthest index that has room to increment. When one digs into coding this, one discovers that combining them is a natural consequence of computing such an index j: while we have not yet reached the end of the enumeration, j will lie between k and 1, but when we finish, this manifests itself in "computing" the index j = 0.

If you have any need to clarify the algorithmic ideas here, I would be happy to provide that as a Clarification to this Question.

regards, mathtalk-ga
```

```
Hi prashant,

I think you have a situation which needs choosing the best combination out of all possible combinations. For example: suppose you have some stamps of certain denominations, say 0.5, 7, and 10 bucks, and you want to buy stamps totalling 22 bucks such that the number of stamps is minimum. In that situation the best combination will be 3 stamps of 7 bucks and 2 stamps of 0.5 bucks, and not 2 stamps of 10 bucks and 4 stamps of 0.5 bucks. Am I correct, prashant?

Thanks, Samrat.A.Barve
```
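As a quick check of the scale mathtalk-ga estimated at the outset, the count C(1000, 5) and the ten-day figure can be reproduced directly (a minimal Python sketch; `math.comb` requires Python 3.8+):

```python
import math

# Number of 5-subsets of 1000 items: C(1000, 5)
count = math.comb(1000, 5)
print(count)  # 8250291250200

# At 10 million combinations per second, time to exhaust them all:
seconds = count / 10_000_000
print(seconds / 86400)  # roughly 9.5 days
```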
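To make the answer's first method (nested loops with k hard-coded) concrete, here is a minimal sketch for the case k = 3. The objective function `F` is a stand-in for whatever "best weighted subset" criterion applies; any function of the chosen indices that is independent of their order would do:

```python
def enumerate_3_subsets(n, F):
    """Enumerate all 3-subsets of {1,...,n} in lexicographic order
    and return the maximum of F over them (method 1, k = 3 hard-coded)."""
    best = F(1, 2, 3)                        # lexicographically first subset
    for i1 in range(1, n - 1):               # i1 = 1 .. n-2
        for i2 in range(i1 + 1, n):          # i2 = i1+1 .. n-1
            for i3 in range(i2 + 1, n + 1):  # i3 = i2+1 .. n
                new = F(i1, i2, i3)
                if new > best:
                    best = new
    return best

# Example: maximize the sum of the three chosen indices out of {1,...,9}
print(enumerate_3_subsets(9, lambda a, b, c: a + b + c))  # 24 (= 7+8+9)
```

The loop bounds mirror the pseudocode: each index runs up to n minus the number of slots still to be filled, which guarantees ascending order and full coverage.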
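The second (recursive) method might be sketched like this, with k passed as an argument rather than hard-coded. Instead of the global `max` variable the answer describes, this version carries the running maximum in an enclosing-scope variable, one of the variations mentioned:

```python
def best_k_subset(n, k, F):
    """Recursively enumerate all k-subsets of {1,...,n} (method 2) and
    return the maximum of F over them. F takes a list of indices."""
    best = None

    def pick(remaining, chosen):
        nonlocal best
        if remaining == 0:              # reached pick(0, [i1,...,ik])
            value = F(chosen)
            if best is None or value > best:
                best = value
            return
        start = chosen[-1] + 1 if chosen else 1
        # leave room for the items still to be chosen
        for i in range(start, n - remaining + 2):
            pick(remaining - 1, chosen + [i])

    pick(k, [])                         # the top-level call pick(k, [])
    return best

# Example: the 3-subset of {1,...,9} with the largest product is {7,8,9}
print(best_k_subset(9, 3, lambda s: s[0] * s[1] * s[2]))  # 504
```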
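And the third method, iterating a next(s) operation over an array of length k, might look like the following. Note how the backward scan for an index with "room to increment" doubles as the termination test, exactly as the answer describes: the scan running off the front of the array plays the role of "computing" j = 0:

```python
def best_k_subset_iter(n, k, F):
    """Enumerate k-subsets of {1,...,n} in lexicographic order using an
    explicit array and a next(s) step (method 3); return the max of F."""
    s = list(range(1, k + 1))   # first subset [1, 2, ..., k]
    best = F(s)
    while True:
        # scan backwards for the furthest slot j with room to increment;
        # slot j (0-based) is "full" once it reaches n-k+1+j
        j = k - 1
        while j >= 0 and s[j] == n - k + 1 + j:
            j -= 1
        if j < 0:               # the answer's "j = 0": s is [n-k+1,...,n]
            return best
        s[j] += 1
        for m in range(j + 1, k):   # reset the tail to successive values
            s[m] = s[m - 1] + 1
        value = F(s)
        if value > best:
            best = value

# Example: with F = sum over 3-subsets of {1,...,9}, the best is 7+8+9
print(best_k_subset_iter(9, 3, sum))  # 24
```

On the [1,5,9] example from the answer (n = 9, k = 3), the scan stops at the middle slot (5 < 8), increments it, and resets the tail, producing [1,6,7] as claimed.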
```
Is the answer what you were seeking? I mean, in any case you will need to make n! computations, using recursion or iteration. I am unable to understand how the answer given above reduces the computations, and hence the time taken to solve your problem.
```
```
It is possible to use a construction that enumerates the combinations in a fixed order to parcel out the possibilities to several processors/computers.

Also, it is hinted in the Question that some objective function (a "weighting" of the combinations) is to be optimized. Combining this "search" with the above split/branching of the search space leads to the idea of alpha/beta pruning for parallel optimization.

regards, mathtalk-ga
```
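The pruning mathtalk-ga alludes to only pays off when the objective has exploitable structure; the actual F was never specified in this thread. As a purely illustrative sketch, suppose the "weight" of a subset is simply the sum of per-item weights (an assumption, not something stated by the asker). Then a branch-and-bound search can discard any partial selection whose best possible completion cannot beat the best subset found so far:

```python
def best_weighted_k_subset(weights, k):
    """Branch-and-bound search for the k-subset with the largest total
    weight. Illustrative only: assumes the objective is additive, so an
    upper bound on any completion is cheap to compute."""
    w = sorted(weights, reverse=True)  # descending, for tight bounds
    n = len(w)
    prefix = [0]                       # prefix[i] = sum of the i largest
    for x in w:
        prefix.append(prefix[-1] + x)

    best = float("-inf")

    def search(start, chosen, total):
        nonlocal best
        if chosen == k:
            best = max(best, total)
            return
        remaining = k - chosen
        for i in range(start, n - remaining + 1):
            # bound: current total plus the 'remaining' largest weights
            # still available from position i onward
            bound = total + prefix[i + remaining] - prefix[i]
            if bound <= best:
                break          # prune: no completion can improve on best
            search(i + 1, chosen + 1, total + w[i])

    search(0, 0, 0)
    return best

# Example: out of weights [5, 1, 9, 3, 7], the best 2-subset sums to 16
print(best_weighted_k_subset([5, 1, 9, 3, 7], 2))  # 16
```

For a genuinely additive objective this is overkill (the top k weights win outright), but the same skeleton applies whenever a cheap upper bound on the completions of a partial selection is available, which is the condition under which pruning beats brute-force enumeration.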