Subject: All possible Combinations
Category: Computers > Algorithms
Asked by: prashant_jain-ga
List Price: $15.00
Posted: 09 May 2004 16:28 PDT
Expires: 08 Jun 2004 16:28 PDT
Question ID: 343709
```
I am trying to find an efficient way to find all possible combinations of N things taken K at a time. Now I know this is possible to do when N and K are small, but my problem is somewhat harder: my N can be anything in the order of 1000 to 5000, and K can be anything from 5 to 10. If I try to write any simple looping code, it takes forever. I am looking for some way to do this in a decent time (less than 10 hrs). I would be obliged if someone could help me out with this one.
```

Request for Question Clarification by mathtalk-ga on 09 May 2004 20:13 PDT

```
Hi, prashant_jain-ga:

It is possible that the simple looping code to which you refer is efficient (though I cannot be sure of this without seeing it). Even using the lowest range of your numbers, 1000 things taken 5 at a time, there would be 8,250,291,250,200 possible combinations. If you could process 10 million combinations per second, it would take about ten days of running time to exhaust all of them.

So while I can indeed provide quite efficient methods for generating all the possible combinations described, doing the larger solution sets mentioned in less than 10 hours seems unrealistic.

If despite this you are still interested in efficient algorithms for producing all combinations of N things taken K at a time, please post a Clarification.

regards, mathtalk-ga
```

Clarification of Question by prashant_jain-ga on 10 May 2004 09:15 PDT

```
Hi,

I am still interested in getting an efficient algorithm for doing this. Also, could you tell me any way I could solve this problem in, say, 12 hours? What I have to do is find the best weighted subset of k things out of n things. All I could think of is trying the brute-force method of getting all possible combinations of size k taken out of n. I would be glad if you could help me out.

Thanks, Prashant
```

Request for Question Clarification by mathtalk-ga on 10 May 2004 09:38 PDT

```
Hi, prashant_jain-ga:

I'm not sure what "best weighted subset" means in your circumstance, but it does suggest that there may be much more efficient algorithms than the "brute force" approach of trying all k-subsets.

When the "solution space" to be searched has the form of a tree, one heuristic that is often applied to reduce the amount of searching is called alpha-beta pruning. The intuitive idea is to eliminate branches of the tree which either seem unpromising or conclusively cannot lead to an improvement over tentative solutions already identified.

I would guess that you have an "objective function" which expresses the criterion for judging which weighted k-subset is best. Please explain this notion in greater detail so that I can tailor my suggestions more closely to your actual problem.

regards, mathtalk-ga
```

Subject: Re: All possible Combinations
Answered By: mathtalk-ga on 13 May 2004 18:39 PDT
```
Hi, prashant_jain-ga:

There are several ways to efficiently enumerate all the k-subsets of n items. The three methods that I'm about to discuss are all based on a common ordering of the solution search space, so the same methods might be used to divide up the search space between multiple computers. If you have a hard requirement to perform such a long computation in a fixed time period, then ultimately "parallel processing" may be your best solution, as more efficient implementations can only push the performance speed to a certain limit.

Most likely the approach you'll want to code is the first method that I'll outline. It is easy to understand and program and very efficient; its one drawback is having a "hard coded" dependence on the size k of subsets. Two more methods are then outlined for the sake of completeness which overcome that shortcoming.

We will assume that the selectable items themselves can be indexed, e.g. through an array of "objects", so that our problem can without much indirection be reduced to one of selecting k-subsets of {1,...,n} (or for the C purists, of {0,...,n-1}).

The first method that I'd like to discuss may be the closest to what you are already doing, since it involves a nested set of loops. Because it involves a fixed number of loops corresponding to the cardinality k of the subsets, this method has a disadvantage in terms of generality. However, the range of values for k which you cited is narrow (k = 5 to 10), so this disadvantage may be a bit academic. One can easily write this sort of code six times for the given values of k and be done.

I will try to express the algorithms in a suitably intuitive "pseudocode", but if a construction is unclear, please let me know and I'll be happy to provide further clarification.

Suppose that we have an objective function F(..) that applies to any k-subset we select, and we want to evaluate this for every k-subset in order to determine the largest value.
Since this first method is "k specific", we will assume that F(..) takes k arguments, although the value returned is independent of the order of these arguments (as a set is independent of any order of its elements). [If your notion of "best weighted subset" is a minimum rather than a maximum, then this change in perspective is easily accommodated, e.g. by taking the minimum over the values found rather than the maximum as here, or by exchanging -F for F in the code.]

So "nest" k for-loops in this manner (ik' denotes the predecessor variable to ik):

  0.   Let max := F(1,2,...,k).
  1.   for i1 := 1 to n-k+1
  2.     for i2 := i1+1 to n-k+2
         . . .
  k.       for ik := ik'+1 to n
  k+1.       Let new := F(i1,i2,...,ik',ik).
  k+2.       if new > max then max := new
           next ik
         . . .
       next i2
     next i1

What this approach does is enumerate the k-subsets of {1,...,n} in so-called lexicographic ordering (ascending) on the sorted representations i1,i2,...,ik of these subsets. The arrangement of nested index ranges guarantees that these arguments will always be generated in ascending order and that every possible combination is generated.

Aside from the necessity of committing to a specific value of k, this method is attractive because it is simple to code and efficient. If the looping is done in C (or even better, in hand-coded assembly!), then it achieves the purpose of enumeration with quite low overhead. In fact, for the given counts 5 <= k <= 10, the overhead of the looping is comparable to the overhead of the inner function call (meaning just the stack operations, not the execution thereof), so your CPU cycles will be concentrated on the evaluation of the objective function.

The second method that I will outline uses function recursion to enumerate the k-subsets. Here depth of recursion takes the place of loop nesting to order the distinct items selected.
It is less efficient in most architectures to do a function call than to increment a loop variable, as the stack frame must be pushed and popped, but this may be negligible in comparison to the evaluation of your objective function. The nice thing is that the size k of subsets need not be "hard coded", because it can be represented instead by a function argument. It can be a little difficult to grasp how recursion takes the place of looping, though. Many such implementations are possible, but let me throw out some ideas.

At the deepest level we will need to access all the items chosen at outer levels of recursion (so that the objective function can be evaluated there). This can be accomplished either by a "static" array in the given function (or a "global" array, visible there and elsewhere in the program) or by passing an array of selections along as a function argument. [I'm describing the structure as an array for simplicity, but it could be any of various "collection class" datatypes, e.g. lists or queues.]

In this approach we might have a top-level call to function pick(..) with one argument that specifies the number of items to choose and another that specifies the items chosen. For notational convenience I'll use a Prolog-like syntax for lists to represent the array of items chosen. Thus, at the top level we would invoke:

  pick(k,[])

The function pick(..) will then (repeatedly) call itself, decrementing the value of k by 1 and adjoining a selected element (greater than any chosen so far by "outer" calls), until by recursion one reaches the inner call:

  pick(0,[i1,i2,...,ik])

At that point we are ready to call the objective function, written to accept the array or list as an argument:

  F([i1,i2,...,ik])

The inner invocation of pick(..)
must also keep track of the maximum value returned by an objective function call, and to do this a global variable "max" probably makes the most sense (though it is possible to do it with "output" arguments for the function pick as well). The function pick(..) itself does not return (from a given level) until it has "tried" all the possible items to adjoin to the list/array passed in to it.

This approach does avoid the need to hard code the size k of the subsets, but at a price of some loss in efficiency and (arguably) in legibility of the code, since the logic for enumerating the subsets is now "spread around" among coordinated calls to function pick(..).

The third method that I'll propose is an attempt to have the best of both approaches by using a data structure iteration. We can avoid the overhead of function recursion on the one hand and avoid having to hard code the size k of subsets on the other. The characteristic of this data structure is that it represents any of the lexicographically ordered k-subsets, and indeed has an associated "next" operation that advances from one such subset to its successor in this ordering. At a high level the algorithm can be written this way:

  Let s := {k,...,2,1}, max := F(s).
  while s < {n,...,n-k+1}
  begin
     s := next(s)
     max := greaterof(max, F(s))
  end

I've written the elements of subset s "backwards" to suggest the implementation for the operation "next", which for efficiency works mostly on the "tail" of the selections. We can in fact implement the data structure for s as an array of length k, so this can be done in almost any programming language that supports arrays. Conceptually, though, the individual array needs to be taken in context with the values n (defining the range of possible selections) and k (the size of the subset, or "array length" essentially).
To implement the next(s) operation one starts at the end of an array representing s:

  [i1,i2,...,ik]

and locates the "farthest" index j between 1 and k such that there is room to increment the selection in that slot. To describe this a bit more clearly: once ik reaches n, it no longer has "room to increment", because n is the largest possible value. So if we want to move up in the ordering of subsets, we have to move "backwards" in the index of items to ik' and ask ourselves whether ik' has reached n-1. If it has, then there's no room to increment there, and we must continue backing through the array's entries until we find a place that can be incremented. Ultimately we reach the beginning of the array, and if i1 = n-k+1, then we are done; we've gone from subset [1,...,k] all the way to subset [n-k+1,...,n], and there are no more subsets to check.

In the meantime, when we do locate a position in the array where an index can be incremented, we do so and reset all the remaining indexes beyond that point to their successive values. An illustration may help. Suppose that we are trying to enumerate the 3-subsets of 9 items. If the current s is represented by [1,5,9], then when we check the final index we find it cannot be incremented further. Backing up one entry, we see that 5 < 8, so we still have room there. Hence:

  next([1,5,9]) = [1,6,7]

The following iteration will discover that there is new room in the last entry to increment, and thus the next operation will produce [1,6,8]. And so on.

A skillful implementation of the data structure iteration can be more efficient in a high-level language than the nested for-loops approach, though this requires careful attention to the details. The "high level" code sketched above would be only a crude road map to such an implementation.
The efficiency comes from needing "most of the time" only to check the final entry in the array/subset and increment that one, so that as a fraction of operations it is like spending most of the time in the innermost nested loop. However, to make the implementation competitive, one must combine the "test" for s < {n,...,n-k+1} with the "work" of discovering the furthest index that has room to increment. When one digs into coding this, one discovers that combining them is a natural consequence of computing such an index j: while we have not yet reached the end of the enumeration, j will lie between k and 1, but when we finish, this manifests itself in "computing" the index j = 0.

If you have any need to clarify the algorithmic ideas here, I would be happy to provide that as a Clarification to this Question.

regards, mathtalk-ga
```

```
Hi prashant,

I think you have a situation which needs choosing the best combination out of all possible combinations. For example: suppose you have some stamps of certain denominations, say 0.5, 7, and 10 bucks, and you want to buy stamps totalling 22 bucks such that the number of stamps is minimum. In that situation the best combination will be 3 stamps of 7 bucks and 2 stamps of 0.5 bucks, and not 2 stamps of 10 bucks and 4 stamps of 0.5 bucks. Am I correct, prashant?

Thanks, Samrat.A.Barve
```
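As a quick check of the scale mathtalk-ga estimated at the outset, the count C(1000, 5) and the ten-day figure can be reproduced directly (a minimal Python sketch; `math.comb` requires Python 3.8+):

```python
import math

# Number of 5-subsets of 1000 items: C(1000, 5)
count = math.comb(1000, 5)
print(count)  # 8250291250200

# At 10 million combinations per second, time to exhaust them all:
seconds = count / 10_000_000
print(seconds / 86400)  # roughly 9.5 days
```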
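To make the answer's first method (nested loops with k hard-coded) concrete, here is a minimal sketch for the case k = 3. The objective function `F` is a stand-in for whatever "best weighted subset" criterion applies; any function of the chosen indices that is independent of their order would do:

```python
def enumerate_3_subsets(n, F):
    """Enumerate all 3-subsets of {1,...,n} in lexicographic order
    and return the maximum of F over them (method 1, k = 3 hard-coded)."""
    best = F(1, 2, 3)                        # lexicographically first subset
    for i1 in range(1, n - 1):               # i1 = 1 .. n-2
        for i2 in range(i1 + 1, n):          # i2 = i1+1 .. n-1
            for i3 in range(i2 + 1, n + 1):  # i3 = i2+1 .. n
                new = F(i1, i2, i3)
                if new > best:
                    best = new
    return best

# Example: maximize the sum of the three chosen indices out of {1,...,9}
print(enumerate_3_subsets(9, lambda a, b, c: a + b + c))  # 24 (= 7+8+9)
```

The loop bounds mirror the pseudocode: each index runs up to n minus the number of slots still to be filled, which guarantees ascending order and full coverage.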
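The second (recursive) method might be sketched like this, with k passed as an argument rather than hard-coded. Instead of the global `max` variable the answer describes, this version carries the running maximum in an enclosing-scope variable, one of the variations mentioned:

```python
def best_k_subset(n, k, F):
    """Recursively enumerate all k-subsets of {1,...,n} (method 2) and
    return the maximum of F over them. F takes a list of indices."""
    best = None

    def pick(remaining, chosen):
        nonlocal best
        if remaining == 0:              # reached pick(0, [i1,...,ik])
            value = F(chosen)
            if best is None or value > best:
                best = value
            return
        start = chosen[-1] + 1 if chosen else 1
        # leave room for the items still to be chosen
        for i in range(start, n - remaining + 2):
            pick(remaining - 1, chosen + [i])

    pick(k, [])                         # the top-level call pick(k, [])
    return best

# Example: the 3-subset of {1,...,9} with the largest product is {7,8,9}
print(best_k_subset(9, 3, lambda s: s[0] * s[1] * s[2]))  # 504
```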
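And the third method, iterating a next(s) operation over an array of length k, might look like the following. Note how the backward scan for an index with "room to increment" doubles as the termination test, exactly as the answer describes: the scan running off the front of the array plays the role of "computing" j = 0:

```python
def best_k_subset_iter(n, k, F):
    """Enumerate k-subsets of {1,...,n} in lexicographic order using an
    explicit array and a next(s) step (method 3); return the max of F."""
    s = list(range(1, k + 1))   # first subset [1, 2, ..., k]
    best = F(s)
    while True:
        # scan backwards for the furthest slot j with room to increment;
        # slot j (0-based) is "full" once it reaches n-k+1+j
        j = k - 1
        while j >= 0 and s[j] == n - k + 1 + j:
            j -= 1
        if j < 0:               # the answer's "j = 0": s is [n-k+1,...,n]
            return best
        s[j] += 1
        for m in range(j + 1, k):   # reset the tail to successive values
            s[m] = s[m - 1] + 1
        value = F(s)
        if value > best:
            best = value

# Example: with F = sum over 3-subsets of {1,...,9}, the best is 7+8+9
print(best_k_subset_iter(9, 3, sum))  # 24
```

On the [1,5,9] example from the answer (n = 9, k = 3), the scan stops at the middle slot (5 < 8), increments it, and resets the tail, producing [1,6,7] as claimed.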
```
Is the answer what you were seeking? I mean, in any case you will need to make n! computations, using recursion or iteration. I am unable to understand how the answer given above reduces the computations, and hence the time taken to solve your problem.
```
```
It is possible to use a construction that enumerates the combinations in a fixed order to parcel out the possibilities to several processors/computers.

Also, it is hinted in the Question that some objective function (a "weighting" of the combinations) is to be optimized. Combining this "search" with the above split/branching of the search space leads to the idea of alpha/beta pruning for parallel optimization.

regards, mathtalk-ga
```
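The pruning mathtalk-ga alludes to only pays off when the objective has exploitable structure; the actual F was never specified in this thread. As a purely illustrative sketch, suppose the "weight" of a subset is simply the sum of per-item weights (an assumption, not something stated by the asker). Then a branch-and-bound search can discard any partial selection whose best possible completion cannot beat the best subset found so far:

```python
def best_weighted_k_subset(weights, k):
    """Branch-and-bound search for the k-subset with the largest total
    weight. Illustrative only: assumes the objective is additive, so an
    upper bound on any completion is cheap to compute."""
    w = sorted(weights, reverse=True)  # descending, for tight bounds
    n = len(w)
    prefix = [0]                       # prefix[i] = sum of the i largest
    for x in w:
        prefix.append(prefix[-1] + x)

    best = float("-inf")

    def search(start, chosen, total):
        nonlocal best
        if chosen == k:
            best = max(best, total)
            return
        remaining = k - chosen
        for i in range(start, n - remaining + 1):
            # bound: current total plus the 'remaining' largest weights
            # still available from position i onward
            bound = total + prefix[i + remaining] - prefix[i]
            if bound <= best:
                break          # prune: no completion can improve on best
            search(i + 1, chosen + 1, total + w[i])

    search(0, 0, 0)
    return best

# Example: out of weights [5, 1, 9, 3, 7], the best 2-subset sums to 16
print(best_weighted_k_subset([5, 1, 9, 3, 7], 2))  # 16
```

For a genuinely additive objective this is overkill (the top k weights win outright), but the same skeleton applies whenever a cheap upper bound on the completions of a partial selection is available, which is the condition under which pruning beats brute-force enumeration.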