Google Answers Logo
View Question
 
Q: Number Packing ( Answered,   4 Comments )
Question  
Subject: Number Packing
Category: Science > Math
Asked by: shopsinc-ga
List Price: $6.00
Posted: 04 Jun 2002 17:20 PDT
Expires: 11 Jun 2002 17:20 PDT
Question ID: 21050
Some friends and I were discussing a website in which they had
something like a billion places of pi in a database, and a visitor
could search for any given number to see if and where it appeared in
pi. This brought me to my question:

What is the minimum number of digits necessary to create a number that
contains all numbers of n length. (This could be just as simply
applied to strings I believe)

For example, if you wanted to solve for a 1 digit number it would be
easy, all you would need is:

0123456789

but for two it gets more complicated because you can have portions of
one number represented inside another. For example if your first
number is

11

your second number could be

12

then you have only used 3 digits 112 to represent 2 numbers. This
effect
becomes amplified when you use larger numbers.

When we thought of this, my friends and I were using phone numbers. So
the
question would be, what is the minimum length a number would need to
be to
contain every 10 digit number. Additionally, what is the formula used
to come up with this answer?

Thanks,
Charlie
Answer  
Subject: Re: Number Packing
Answered By: blubs_bstr-ga on 04 Jun 2002 20:45 PDT
 
Hello Charlie,

Your question is an instance of the "Shortest Superstring Problem":

'The shortest-superstring problem is to find a shortest possible
string that contains every string in a given set as substrings.'
[http://epubs.siam.org/sam-bin/dbq/article/28612]

This problem is NP-Hard. That's a technical term meaning the problem
belongs to a set of problems which are computationally very difficult
to solve, i.e. taking an exponential amount of time.

You have listed two questions.

The first:
What is the minimum number of digits necessary to create a number that
contains all numbers of n length?

We can quite simply develop rough upper and lower bounds. There are
exactly 10^n numbers of length 10, or 90% of these if you wish to
exclude leading 0's. The smallest possible string containing all these
numbers must contain at least

L = 10^n + n - 1  digits

That's because such a string contains exactly 10^n substrings of
length n (one starting at each of the first 10^n positions).

If each one of these substrings were unique, then we'd have the string
we're looking for, however this is hard to guarantee.

And an upper bound would be

M = n*10^n digits

which is arrived at simply by listing the 10^n numbers, each of length
n.

Notice the upper and lower bound are only different by a factor of
approximately n, so we have an approximate answer.

Your second question was:
...what is the minimum length a number would need to be to contain
every 10 digit number?

If we use the formulas above, and n=10, we see that the shortest
string must be somewhere between 10 billion and 100 billion digits
long. To find the length of the shortest string requires finding the
shortest string with the desired property somewhere in this range, and
is decidedly very difficult.

"Informally, the key to any algorithm for the shortest superstring
problem is to identify sets of strings with large amounts of
similarity, or overlap."
[http://www.cs.dartmouth.edu/reports/abstracts/TR94-214/]

There has been a good amount of interest in this problem since it can
be applied to finding sequences DNA in the field of molecular biology.
An approximate solution to the problem is a string that is guaranteed
to be not too much longer than the shortest string itself. For
example, the algorithm described in this paper
[http://epubs.siam.org/sam-bin/dbq/article/28612], finds a string that
is 2.89 times longer than the shortest string.

Here's another paper on an approximation algorithm for the shorest
superstring:
http://citeseer.nj.nec.com/cachedpage/1581/1

And the keywords I used:

shortest superstring problem
://www.google.com/search?q=shortest+superstring+problem

shortest superstring problem approximation
://www.google.com/search?q=shortest+superstring+problem+approximation

shortest length string containing all substrings
://www.google.com/search?q=shortest+length+string+containing+all+substrings

I suggest you follow those search links if you are interested in a
detailed mathematical discussion of the ultimate answer to your
question.

Hope that helps!
Comments  
Subject: Re: Number Packing
From: rbnn-ga on 04 Jun 2002 23:52 PDT
 
Just another addendum: the shortest sequence of containing all
length-n digit strings as continuous substrings is, therefore, 10^n+9
.
Subject: Re: Number Packing
From: rbnn-ga on 05 Jun 2002 11:53 PDT
 
A de Bruijn sequence for length-n words over an alphabet of size m is
a minimal length finite sequence of characters from the alphabet such
that every word of length n formed from the alphabet occurs as a
contiguous subsequence of the sequence WITH WRAPAROUND (so the de
Bruijn sequence is a circular sequence).

It is a fact that there are length-n deBruijn sequences over the
alphabet of size 10 with minimal length of 10^n (i.e., 10 billion).
Such a de Bruijn sequence, however, does not quite solve your problem,
because in your case you do not want to allow wraparound in the
sequence. However, you can reach a minimal linear sequence (without
wraparound) from a linear sequence (with wraparound) simply by
appending the first 9 characters of the de Bruijn sequence to the end
of the sequence.

Therefore, the answer to your question of the minimal string of digits
that contains as a contiguous substring each possible 10-digit string
is 10^10+9, or
  10,000,000,009

Links:
For an example of computing de Bruijn sequences in Mathematica, see:
http://www.mathematica.co.kr/mathcomm/05_discretemath/debruijn/01.html

For a good reference for de Bruijn sequences (and other mathematical
topics), see Eric Weisstein's note at:
http://mathworld.wolfram.com/deBruijnSequence.html .

This explains how to get a deBruijn sequence. You start with a de
Bruijn graph ( http://mathworld.wolfram.com/deBruijnGraph.html ) and
from there generate an Eulerian cycle in that graph (
http://mathworld.wolfram.com/EulerianCircuit.html ) .
Mathematica also has a way to get a deBruijn graph.

Another elegant way to look at de Bruijn sequences involves the theory
of finite fields, specifically coefficients of polynomials over finite
fields. There are references to this method on Usenet, e.g.,
http://groups.google.com/groups?hl=en&lr=&ie=UTF8&oe=UTF8&threadm=875262742.22893%40dejanews.com&rnum=1&prev=/groups%3Fq%3Dde%2Bbruijn%2Bsequence%26hl%3Den%26lr%3D%26ie%3DUTF8%26oe%3DUTF8%26selm%3D875262742.22893%2540dejanews.com%26rnum%3D1
and
http://groups.google.com/groups?q=de+bruijn+sequence&hl=en&lr=&ie=UTF8&oe=UTF8&selm=jacob-0812952034170001%40eriks.matematik.su.se&rnum=4
.

A claim of a simple algorithm for this, which I have not verified, is
http://groups.google.com/groups?q=de+bruijn+sequence&hl=en&lr=&ie=UTF8&oe=UTF8&selm=6pkf4a%24il9%241%40pale-rider.INS.CWRU.Edu&rnum=7
.

I would like to give an explicit example for you, but since you are
seeking 10-digit length 10 sequences, the answer would be about 10
gigabytes so I am going to pass on that. However, if you like I can
probably send you some smaller sequences satisfying your condition.

Search Strategy:

This was a most interesting question. For some reason I started out by
searching for Euwe sequences, which are similar to Thue sequences, and
are sequences that do not contain certain types of overlaps or
squares. It turns out that I was confusing the two Dutch names de
Bruijn and Euwe (who, by the way, was a famous chess champion). Once I
realized the de Bruijn sequence was what I was looking for, I had to
verify that the condition could be modified to solve your problem,
since in your problem you do not allow wraparound in the sequence, and
de Bruijn does.

It is important to search under both deBruijn and "de Bruijn" as both
spellings occur.

I hope that this is helpful. Thanks for reminding me of some nice
math.
------
Subject: Re: Number Packing
From: rjyanco-ga on 14 Jun 2002 01:57 PDT
 
"And an upper bound would be M = n*10^n digits which is arrived at
simply by listing the 10^n numbers, each of length n."

This is obviously incorrect; you can always make a string that doesn't
work, so there is no upper bound.  (Of course, this doesn't work for
the lead-in pi question, since the digit 7 isn't one of the first
1*10^1=10 digits in the decimal expansion.)

The shortest superstring problem may be NP-hard, but it's rather easy
to see that creating a 10, 101, 1002, 10003, etc., digit string
containing all 1, 2, 3, 4, etc., respectively, digit strings is not
hard at all.  You simply think of the problem in graph-theoretic
terms.  Each desired string (for three digits, 000, 001, 002, ... 998,
999) is a vertex.  From each vertex, draw an arrow to each vertex
corresponding to a possible next string.  So from 123, you have arrows
to 230, 231, ..., 239.  Yes, this is a lot of arrows.  Now you notice
that there are always 10 arrows into, and 10 arrows out of, each
vertex (since there are 10 digits in base ten).  That means the graph
is traversable:  start at a vertex and go anywhere you want, deleting
all other "out" arrows when you use one.  You're guaranteed to end up
where you started without getting stuck.  You may not have hit all the
vertices, but what remains has the same property, and when you're done
it's not hard to splice your loops of numbers together.

There's an interesting article on de Bruijn sequences in Ian Stewart's
_Game, Set, and Math_ (out of print, but it's easy to find a used
copy).  I haven't read it in a long time, though.  The article began
with a discussion of snakes eating their own tails.

Hope this helps.
Subject: Re: Number Packing
From: blubs_bstr-ga on 14 Jun 2002 03:17 PDT
 
Hello rjyanco,

In response to:
------
"And an upper bound would be M = n*10^n digits which is arrived at
simply by listing the 10^n numbers, each of length n."
 
This is obviously incorrect; you can always make a string that doesn't
work, so there is no upper bound. 
------

It is clear to the questioner that you can generate an arbitrarly long
string
without the desired properties. I was using 'upper bound' in the sense
that the
shortest string need never be longer than M, and in that sense it is
an upper bound.

Although I mentioned that the shortest possible string would have to
be of length at least 10^10 + 9, I neglected to mention that the de
Bruijn sequence is such a string. I apologize for this large oversight
on my part.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy