|
|
Subject:
Number Packing
Category: Science > Math Asked by: shopsinc-ga List Price: $6.00 |
Posted:
04 Jun 2002 17:20 PDT
Expires: 11 Jun 2002 17:20 PDT Question ID: 21050 |
Some friends and I were discussing a website in which they had something like a billion places of pi in a database, and a visitor could search for any given number to see if and where it appeared in pi. This brought me to my question: What is the minimum number of digits necessary to create a number that contains all numbers of n length. (This could be just as simply applied to strings I believe) For example, if you wanted to solve for a 1 digit number it would be easy, all you would need is: 0123456789 but for two it gets more complicated because you can have portions of one number represented inside another. For example if your first number is 11 your second number could be 12 then you have only used 3 digits 112 to represent 2 numbers. This effect becomes amplified when you use larger numbers. When we thought of this, my friends and I were using phone numbers. So the question would be, what is the minimum length a number would need to be to contain every 10 digit number. Additionally, what is the formula used to come up with this answer? Thanks, Charlie |
|
Subject:
Re: Number Packing
Answered By: blubs_bstr-ga on 04 Jun 2002 20:45 PDT |
Hello Charlie, Your question is an instance of the "Shortest Superstring Problem": 'The shortest-superstring problem is to find a shortest possible string that contains every string in a given set as substrings.' [http://epubs.siam.org/sam-bin/dbq/article/28612] This problem is NP-Hard. That's a technical term meaning the problem belongs to a set of problems which are computationally very difficult to solve, i.e. taking an exponential amount of time. You have listed two questions. The first: What is the minimum number of digits necessary to create a number that contains all numbers of n length? We can quite simply develop rough upper and lower bounds. There are exactly 10^n numbers of length 10, or 90% of these if you wish to exclude leading 0's. The smallest possible string containing all these numbers must contain at least L = 10^n + n - 1 digits That's because such a string contains exactly 10^n substrings of length n (one starting at each of the first 10^n positions). If each one of these substrings were unique, then we'd have the string we're looking for, however this is hard to guarantee. And an upper bound would be M = n*10^n digits which is arrived at simply by listing the 10^n numbers, each of length n. Notice the upper and lower bound are only different by a factor of approximately n, so we have an approximate answer. Your second question was: ...what is the minimum length a number would need to be to contain every 10 digit number? If we use the formulas above, and n=10, we see that the shortest string must be somewhere between 10 billion and 100 billion digits long. To find the length of the shortest string requires finding the shortest string with the desired property somewhere in this range, and is decidedly very difficult. "Informally, the key to any algorithm for the shortest superstring problem is to identify sets of strings with large amounts of similarity, or overlap." [http://www.cs.dartmouth.edu/reports/abstracts/TR94-214/] There has been a good amount of interest in this problem since it can be applied to finding sequences DNA in the field of molecular biology. An approximate solution to the problem is a string that is guaranteed to be not too much longer than the shortest string itself. For example, the algorithm described in this paper [http://epubs.siam.org/sam-bin/dbq/article/28612], finds a string that is 2.89 times longer than the shortest string. Here's another paper on an approximation algorithm for the shorest superstring: http://citeseer.nj.nec.com/cachedpage/1581/1 And the keywords I used: shortest superstring problem ://www.google.com/search?q=shortest+superstring+problem shortest superstring problem approximation ://www.google.com/search?q=shortest+superstring+problem+approximation shortest length string containing all substrings ://www.google.com/search?q=shortest+length+string+containing+all+substrings I suggest you follow those search links if you are interested in a detailed mathematical discussion of the ultimate answer to your question. Hope that helps! |
|
Subject:
Re: Number Packing
From: rbnn-ga on 04 Jun 2002 23:52 PDT |
Just another addendum: the shortest sequence of containing all length-n digit strings as continuous substrings is, therefore, 10^n+9 . |
Subject:
Re: Number Packing
From: rbnn-ga on 05 Jun 2002 11:53 PDT |
A de Bruijn sequence for length-n words over an alphabet of size m is a minimal length finite sequence of characters from the alphabet such that every word of length n formed from the alphabet occurs as a contiguous subsequence of the sequence WITH WRAPAROUND (so the de Bruijn sequence is a circular sequence). It is a fact that there are length-n deBruijn sequences over the alphabet of size 10 with minimal length of 10^n (i.e., 10 billion). Such a de Bruijn sequence, however, does not quite solve your problem, because in your case you do not want to allow wraparound in the sequence. However, you can reach a minimal linear sequence (without wraparound) from a linear sequence (with wraparound) simply by appending the first 9 characters of the de Bruijn sequence to the end of the sequence. Therefore, the answer to your question of the minimal string of digits that contains as a contiguous substring each possible 10-digit string is 10^10+9, or 10,000,000,009 Links: For an example of computing de Bruijn sequences in Mathematica, see: http://www.mathematica.co.kr/mathcomm/05_discretemath/debruijn/01.html For a good reference for de Bruijn sequences (and other mathematical topics), see Eric Weisstein's note at: http://mathworld.wolfram.com/deBruijnSequence.html . This explains how to get a deBruijn sequence. You start with a de Bruijn graph ( http://mathworld.wolfram.com/deBruijnGraph.html ) and from there generate an Eulerian cycle in that graph ( http://mathworld.wolfram.com/EulerianCircuit.html ) . Mathematica also has a way to get a deBruijn graph. Another elegant way to look at de Bruijn sequences involves the theory of finite fields, specifically coefficients of polynomials over finite fields. There are references to this method on Usenet, e.g., http://groups.google.com/groups?hl=en&lr=&ie=UTF8&oe=UTF8&threadm=875262742.22893%40dejanews.com&rnum=1&prev=/groups%3Fq%3Dde%2Bbruijn%2Bsequence%26hl%3Den%26lr%3D%26ie%3DUTF8%26oe%3DUTF8%26selm%3D875262742.22893%2540dejanews.com%26rnum%3D1 and http://groups.google.com/groups?q=de+bruijn+sequence&hl=en&lr=&ie=UTF8&oe=UTF8&selm=jacob-0812952034170001%40eriks.matematik.su.se&rnum=4 . A claim of a simple algorithm for this, which I have not verified, is http://groups.google.com/groups?q=de+bruijn+sequence&hl=en&lr=&ie=UTF8&oe=UTF8&selm=6pkf4a%24il9%241%40pale-rider.INS.CWRU.Edu&rnum=7 . I would like to give an explicit example for you, but since you are seeking 10-digit length 10 sequences, the answer would be about 10 gigabytes so I am going to pass on that. However, if you like I can probably send you some smaller sequences satisfying your condition. Search Strategy: This was a most interesting question. For some reason I started out by searching for Euwe sequences, which are similar to Thue sequences, and are sequences that do not contain certain types of overlaps or squares. It turns out that I was confusing the two Dutch names de Bruijn and Euwe (who, by the way, was a famous chess champion). Once I realized the de Bruijn sequence was what I was looking for, I had to verify that the condition could be modified to solve your problem, since in your problem you do not allow wraparound in the sequence, and de Bruijn does. It is important to search under both deBruijn and "de Bruijn" as both spellings occur. I hope that this is helpful. Thanks for reminding me of some nice math. ------ |
Subject:
Re: Number Packing
From: rjyanco-ga on 14 Jun 2002 01:57 PDT |
"And an upper bound would be M = n*10^n digits which is arrived at simply by listing the 10^n numbers, each of length n." This is obviously incorrect; you can always make a string that doesn't work, so there is no upper bound. (Of course, this doesn't work for the lead-in pi question, since the digit 7 isn't one of the first 1*10^1=10 digits in the decimal expansion.) The shortest superstring problem may be NP-hard, but it's rather easy to see that creating a 10, 101, 1002, 10003, etc., digit string containing all 1, 2, 3, 4, etc., respectively, digit strings is not hard at all. You simply think of the problem in graph-theoretic terms. Each desired string (for three digits, 000, 001, 002, ... 998, 999) is a vertex. From each vertex, draw an arrow to each vertex corresponding to a possible next string. So from 123, you have arrows to 230, 231, ..., 239. Yes, this is a lot of arrows. Now you notice that there are always 10 arrows into, and 10 arrows out of, each vertex (since there are 10 digits in base ten). That means the graph is traversable: start at a vertex and go anywhere you want, deleting all other "out" arrows when you use one. You're guaranteed to end up where you started without getting stuck. You may not have hit all the vertices, but what remains has the same property, and when you're done it's not hard to splice your loops of numbers together. There's an interesting article on de Bruijn sequences in Ian Stewart's _Game, Set, and Math_ (out of print, but it's easy to find a used copy). I haven't read it in a long time, though. The article began with a discussion of snakes eating their own tails. Hope this helps. |
Subject:
Re: Number Packing
From: blubs_bstr-ga on 14 Jun 2002 03:17 PDT |
Hello rjyanco, In response to: ------ "And an upper bound would be M = n*10^n digits which is arrived at simply by listing the 10^n numbers, each of length n." This is obviously incorrect; you can always make a string that doesn't work, so there is no upper bound. ------ It is clear to the questioner that you can generate an arbitrarly long string without the desired properties. I was using 'upper bound' in the sense that the shortest string need never be longer than M, and in that sense it is an upper bound. Although I mentioned that the shortest possible string would have to be of length at least 10^10 + 9, I neglected to mention that the de Bruijn sequence is such a string. I apologize for this large oversight on my part. |
If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you. |
Search Google Answers for |
Google Home - Answers FAQ - Terms of Service - Privacy Policy |