Q: Algorithms (No Answer, 7 Comments)
Question  
Subject: Algorithms
Category: Computers > Algorithms
Asked by: ace24-ga
List Price: $4.00
Posted: 26 Mar 2003 21:35 PST
Expires: 25 Apr 2003 22:35 PDT
Question ID: 181583
What techniques can be used to convert a unique 20-character number
into a unique code of, say, 8 characters (using alphabetic and numeric
characters if required)?

Request for Question Clarification by efn-ga on 26 Mar 2003 23:37 PST
Could you specify the question a bit more precisely?

Is the input range any decimal number up to 20 decimal digits long?

How many possible values can be used in each of the output characters?
 Is there a specific set of characters that can be used in the output?

Is the output limited to 8 characters, or do you just want it as short
as you can get it?

Request for Question Clarification by maniac-ga on 27 Mar 2003 05:12 PST
Hello Ace24,

Generating a radix-36 (or any other radix) conversion is pretty
straightforward. For example:
 - repeatedly use integer division by the radix (to get the quotient & remainder)
 - use each remainder to look up the code for that digit in an array
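The two steps above can be sketched as follows. This is a minimal Python illustration, assuming arbitrary-precision integers (which Python provides) and an alphabet of the digits plus upper-case letters; the names `to_radix` and `from_radix` are made up for this example, and passing a different alphabet string gives a different radix.

```python
# Digit value -> character; the length of the alphabet is the radix.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_radix(n: int, alphabet: str = ALPHABET) -> str:
    """Convert a non-negative integer to radix len(alphabet)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    chars = []
    while True:
        n, rem = divmod(n, base)     # integer division: quotient and remainder
        chars.append(alphabet[rem])  # look up the code for this digit
        if n == 0:
            break
    return "".join(reversed(chars))  # remainders arrive least significant first


def from_radix(code: str, alphabet: str = ALPHABET) -> int:
    """Inverse of to_radix: multiply-and-add, most significant digit first."""
    base = len(alphabet)
    n = 0
    for ch in code:
        n = n * base + alphabet.index(ch)
    return n


print(to_radix(99999999999999999999))  # L3R41IFS0Q5TR
```

Because the arithmetic is exact, `from_radix(to_radix(n)) == n` for any non-negative `n`.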
However, as a comment noted somewhat tersely, you can't encode a
number up to
  - 10^20 (or 100000000000000000000)
using 8 characters of up to 256 possible values each, because those
can represent at most
  - 256^8 (or 18446744073709551616)
different values. It requires 13 radix-36 digits:
  - 36^13 (or 170581728179578208256)

Is there some variation on the question you want answered instead
[such as a way to determine the minimum number of digits (in any base)
required to represent numbers up to 10^20]?

  --Maniac

Clarification of Question by ace24-ga on 27 Mar 2003 15:13 PST
Clarification to efn-ga's request
-  The input consists only of numeric characters
-  I was hoping to restrict the output to a combination of numeric
characters and alphabetic characters
-  The output is not limited to 8 characters; I just want it to be as
short as I can get.  However, the input is also not restricted to 20
characters - it can go up to 30 characters

Thanks

Request for Question Clarification by efn-ga on 27 Mar 2003 20:32 PST
Hi ace24,

Thanks for the clarification.

The radix conversion approach described by Maniac is mathematically
optimal.  For a given input length and output character set, it will
give you the shortest possible output length.

A variant technique might give you either better performance or easier
implementation, at the cost of longer output.  The details would
depend on the specifics of the target platform, the exact
requirements, and what you were trying to optimize.
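One such variant, sketched here under stated assumptions, splits the decimal input into 10-digit chunks, each of which fits in a 64-bit integer, and encodes every chunk as exactly 7 base-36 characters (36^7 = 78,364,164,096 is the first power of 36 at or above 10^10). For a 30-digit input that gives 21 characters instead of the optimal 20. The chunk sizes and function names are illustrative choices, not part of the thread; Python is used for readability even though Python itself does not need the chunking.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHUNK_DIGITS = 10  # 10**10 - 1 fits comfortably in a signed 64-bit integer
CHUNK_CHARS = 7    # 36**7 >= 10**10 > 36**6, so 7 base-36 characters per chunk


def encode_chunk(value: int) -> str:
    """Encode one chunk as exactly CHUNK_CHARS base-36 characters (zero-padded)."""
    chars = []
    for _ in range(CHUNK_CHARS):
        value, rem = divmod(value, 36)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def encode_chunked(digits: str) -> str:
    """Left-pad the decimal string to whole chunks, then encode each chunk."""
    width = -(-len(digits) // CHUNK_DIGITS) * CHUNK_DIGITS  # round up
    padded = digits.zfill(width)
    return "".join(encode_chunk(int(padded[i:i + CHUNK_DIGITS]))
                   for i in range(0, width, CHUNK_DIGITS))


def decode_chunked(code: str, num_digits: int) -> str:
    """Reverse encode_chunked; num_digits restores the original length."""
    parts = [str(int(code[i:i + CHUNK_CHARS], 36)).zfill(CHUNK_DIGITS)
             for i in range(0, len(code), CHUNK_CHARS)]
    return "".join(parts)[-num_digits:]


code = encode_chunked("9" * 30)
print(len(code))  # 21
```

The extra character buys arithmetic that never leaves 64-bit range, which is the kind of trade-off described above.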

So possible answers to your question might include:

a.  A detailed, general, abstract explanation of radix conversion
(more verbose than the one in Maniac's request for clarification), for
any input length and output character set.

b.  A detailed, specific, theoretical explanation of radix conversion
for an input length of 30 and an output character set consisting of
digits 0-9 and upper-case letters A-Z (or some other specific
character set you specify).  ("Theoretical" means we don't worry about
implementation details like how you divide a 30-digit number on a real
computer.)

c.  A general discussion of ways radix conversion might be tweaked for
improved performance or ease of implementation.

d.  A specific discussion of how radix conversion might be tweaked for
improved performance or ease of implementation on some specified real
computer platform.
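For the implementation detail mentioned in (b), dividing a 30-digit number on a real computer, one common approach (sketched here as an assumption, not something the thread specifies) is schoolbook long division on the decimal digit string: each pass divides by 36 using only small integers, and the remainders are the base-36 digits. The function names are hypothetical.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def divmod_decimal_string(digits: str, divisor: int):
    """Schoolbook long division of a decimal digit string by a small divisor.

    Returns (quotient as a decimal string, remainder as an int). The running
    remainder never exceeds 10 * divisor - 1, so no big integers are needed.
    """
    quotient = []
    remainder = 0
    for d in digits:
        remainder = remainder * 10 + int(d)
        quotient.append(str(remainder // divisor))
        remainder %= divisor
    return "".join(quotient).lstrip("0") or "0", remainder


def decimal_string_to_base36(digits: str) -> str:
    """Repeatedly divide by 36; the remainders are the base-36 digits."""
    digits = digits.lstrip("0") or "0"
    out = []
    while True:
        digits, rem = divmod_decimal_string(digits, 36)
        out.append(ALPHABET[rem])
        if digits == "0":
            break
    return "".join(reversed(out))


print(decimal_string_to_base36("99999999999999999999"))  # L3R41IFS0Q5TR
```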

Which of these, if any, would you consider a satisfactory answer?

--efn

Clarification of Question by ace24-ga on 29 Mar 2003 23:36 PST
Clarification to denco-ga comment of 27 March

- Is it possible for you to elaborate on the two unorthodox techniques
you have come up with?
- Yes, the primary purpose is to code the longer input (of numeric
characters) into a shorter code (it can go a little bit outside of just
numbers and letters).
- Can you also explain the base-n approach?

Thanks for the comments from all the researchers.  I know each one of
them is valuable in one way or the other.

Thanks - ace24-ga
Answer  
There is no answer at this time.

Comments  
Subject: Re: Algorithms
From: xarqi-ga on 26 Mar 2003 21:56 PST
 
None.
Subject: Re: Algorithms
From: denco-ga on 26 Mar 2003 22:51 PST
 
If one is allowed to use an extended character set (extended ASCII
characters, such as é), then maybe you could use a modified Base-N
scheme (where N would be some "large" number, such as 36 or larger) to
shorten the number to the 8-character limit.

99999999999999999999 = L3R41IFS0Q5TR in Base-36

If you then replaced every "L3" (for instance) in the resulting
Base-36 "numbers" with, say, !, it might work.  I don't know if there
are enough extended characters for it to work.  For numbers of up to
about 12 digits, Base-36 by itself would suffice, and for numbers of up
to about 16 digits you could use an extended character set scheme.

I don't know though, Base-N work makes my head hurt.

denco-ga
Subject: Re: Algorithms
From: carnegie-ga on 27 Mar 2003 08:38 PST
 
Dear Ace24,

May I expand a little on Xarqi's accurate comment?

As Maniac wrote, representing the required number of entities in a
sufficiently short code requires a sufficiently large character set. 
Assuming your numbers are indeed of 20 _decimal_ digits, there are 10
to the power 20 of them to consider.  In order to represent all these
by an 8-character code, you need an n-character set, where n is given
by:

  n^8 = 10^20

Taking logarithms of both sides:

  ln(n^8) = ln(10^20)
  8 ln(n) = ln(10^20)
  ln(n) = ln(10^20)/8
  n = exp(ln(10^20)/8)

The solution to this is a little over 316, so you need a set of 317
characters to achieve your requirement.  If you remember that an 8-bit
byte can carry 256 states, you will realise that this is quite large. 
But it is feasible as long as you can select 317 characters that your
readers will recognise and easily differentiate.
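The same calculation can be done with exact integer arithmetic, which removes any doubt about rounding in the logarithms. A minimal Python sketch (the function name is made up for this example) that uses 10^(digits/length), here 10^2.5, only as a first guess and then corrects it:

```python
def min_alphabet_size(num_digits: int, code_length: int) -> int:
    """Smallest n with n**code_length >= 10**num_digits, checked exactly."""
    target = 10 ** num_digits
    n = max(2, round(10 ** (num_digits / code_length)))  # floating-point first guess
    while n > 2 and (n - 1) ** code_length >= target:   # guess too big: step down
        n -= 1
    while n ** code_length < target:                     # guess too small: step up
        n += 1
    return n


print(min_alphabet_size(20, 8))  # 317
```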

If your 20-digit number cannot take all possible values, you could get
away with a smaller character set at the expense of a more complicated
conversion algorithm.

I trust this helps.

Carnegie
Subject: Re: Algorithms
From: denco-ga on 27 Mar 2003 09:11 PST
 
If "all" you are trying to do is to have some kind of "coding" system
wherein a 20-digit sequence can be reduced to an 8-character sequence
(one that might go outside the sphere of "just" numbers/letters), I've
come up with 2 unorthodox schemes to reduce it to 7 characters, yet
still be readable.  I am sure there are more ways possible as well.
Subject: Re: Algorithms
From: xarqi-ga on 27 Mar 2003 20:23 PST
 
At the risk of being cryptic, rather than terse:
To map a sparse address space into a denser one, use a hash function.
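One way to read this suggestion, sketched under assumptions: hash the number, keep only as many base-36 characters as you want, and record every issued code so that a collision is detected rather than silently producing a duplicate. SHA-256 (from Python's `hashlib`) and the 8-character length are arbitrary choices for illustration, and `hash_code` / `HashCodeRegistry` are hypothetical names. With 8 base-36 characters there are 36^8 (about 2.8 × 10^12) possible codes, so by the birthday bound collisions become likely once the number of codes issued approaches its square root, roughly 1.7 million.

```python
import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def hash_code(number: str, length: int = 8) -> str:
    """Take `length` base-36 characters from the SHA-256 hash of the number."""
    value = int.from_bytes(hashlib.sha256(number.encode("ascii")).digest(), "big")
    chars = []
    for _ in range(length):
        value, rem = divmod(value, 36)
        chars.append(ALPHABET[rem])
    return "".join(chars)


class HashCodeRegistry:
    """Records every issued code so a collision is detected, never reused."""

    def __init__(self, length: int = 8):
        self.length = length
        self.number_for_code = {}  # code -> number, also needed to decode

    def assign(self, number: str) -> str:
        code = hash_code(number, self.length)
        owner = self.number_for_code.setdefault(code, number)
        if owner != number:
            raise ValueError(f"collision: {number} and {owner} both map to {code}")
        return code


registry = HashCodeRegistry()
print(registry.assign("12345678901234567890"))
```

A hash cannot be reversed, so the registry (or an equivalent database table) is also what maps a code back to its number.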
Subject: Re: Algorithms
From: carnegie-ga on 28 Mar 2003 04:11 PST
 
Dear Ace24,

You now say that your input can be up to 30 digits long and that you
wish to restrict the output code to numeric and alphabetic characters.
 There is a little danger here, of course, as if the codes are to be
read by humans there will be confusion between 1 and I and between 0
and O.  You do not say whether you wish to use, say, all capitals or a
mixture of capital and lower case.  Using lower case, of course, would
add the confusion between 1, I, and l!

Just to give you some idea of the length, similar calculations to my
earlier one tell us that reducing a general 30-digit decimal number to
a combination of digits and capital letters (a 36-character set) would
need a 20-character code, and reducing it to a combination of digits,
capital letters, and lower case letters (a 62-character set) would
require a 17-character code.
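A small Python sketch of this kind of calculation, using only integer arithmetic (the function name is made up for this example); it finds the smallest code length k with alphabet_size^k at least 10^digits:

```python
def min_code_length(num_digits: int, alphabet_size: int) -> int:
    """Smallest k with alphabet_size**k >= 10**num_digits (exact integers)."""
    target = 10 ** num_digits
    k, capacity = 0, 1
    while capacity < target:
        capacity *= alphabet_size
        k += 1
    return k


for size in (10, 36, 62):
    print(size, min_code_length(30, size))  # prints: 10 30 / 36 20 / 62 17
```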

One thing you should look at carefully is whether the input number can
take all possible 30-digit values - all 10^30 (a million million million
million million) of them.  If not, it will be possible - by using a more
sophisticated algorithm - to shorten the resulting code.
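As a sketch of one such approach (an assumption, not something from the thread): if the codes only need to cover numbers that are actually issued, you can hand out codes sequentially and keep a lookup table. A million issued numbers then need at most 4 base-36 characters, since 36^4 = 1,679,616. The price is that the table must be stored and consulted to decode. `SequentialCodes` and `to_base36` are hypothetical names.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n: int) -> str:
    chars = []
    while True:
        n, rem = divmod(n, 36)
        chars.append(ALPHABET[rem])
        if n == 0:
            break
    return "".join(reversed(chars))


class SequentialCodes:
    """Hypothetical issuer: the k-th distinct number registered gets to_base36(k)."""

    def __init__(self):
        self.code_of = {}    # number -> code
        self.number_of = {}  # code -> number, needed to decode

    def code_for(self, number: str) -> str:
        if number not in self.code_of:
            code = to_base36(len(self.code_of))
            self.code_of[number] = code
            self.number_of[code] = number
        return self.code_of[number]

    def lookup(self, code: str) -> str:
        return self.number_of[code]


codes = SequentialCodes()
print(codes.code_for("123456789012345678901234567890"))  # 0
```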

I trust this helps. 
 
Carnegie
Subject: Re: Algorithms
From: denco-ga on 31 Mar 2003 18:49 PST
 
Howdy ace24!

- Is it possible for you to elaborate on the two unorthodox techniques
you have come up with?

Well, if the intent of the coding is (for instance) to present to
a viewer a way of discerning a 20 character number:

12345678901234567890

from (in this scheme) a 7-character "object" wherein the actual
20-digit number would not be readily discernible to someone who was
not aware that a coding system was in place, then:

You create a custom set of new objects; these objects have the
visual aspect of numbers, but with additional information coded
in a visual manner as part of the objects.  This encoding could
be of a format that Optical Character Recognition (OCR) would be
able to read, as well as (with minimal training) human beings.

The first scheme (and there are probably more ways to do this)
would be in the form of (as an example) a black box with a white
number in the center, with white "notches" along the left, right
and maybe the top and bottom sides of the boxes.  So, to represent
the number 999 you would have a black box with a white 9 in the
center, and 9 white notches arranged across the top and left sides,
and nine white notches along the right and bottom sides.

You then break the 20 character number into 6 sets of 3 characters
and 1 set of 2 characters and encode each set with the above style
of characters.
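The grouping step can be sketched in Python. Which digit of a group maps to the top/left notches, which to the right/bottom notches, and which to the center is not pinned down above, so the order used here (first, second, third) is an assumption, as is left-padding the final 2-digit group with a zero; a decoder that knows the input has 20 digits can drop that zero again. The function names are hypothetical.

```python
def split_groups(digits: str, size: int = 3) -> list:
    """Break the digit string into groups of `size`; the last may be shorter."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def notch_glyph(group: str) -> dict:
    """Describe one boxed glyph (digit-to-position order is an assumption)."""
    padded = group.zfill(3)  # assumption: a short final group is left-padded with 0
    return {
        "top_left_notches": int(padded[0]),
        "bottom_right_notches": int(padded[1]),
        "center_numeral": padded[2],
    }


groups = split_groups("12345678901234567890")
print(groups)  # ['123', '456', '789', '012', '345', '678', '90']
```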

The second scheme is similar (though not as "elegant"), except that it
uses color and font differences for the encoding.  One could borrow
the color code used on electronic resistors, wherein brown is a 1, red
is a 2, etc.  The larger the font, the larger the number, so that 222
might be coded as a red number 2 in a font size of 12.

You could do a hybrid of both of the above; a red 7 in a box with
3 notches would represent 327.
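A sketch of the hybrid glyph, assuming the reading suggested by the 327 example (notch count = first digit, numeral color = second digit, center numeral = third digit) and the standard resistor color code (0 black through 9 white); the names are hypothetical.

```python
RESISTOR_COLORS = ["black", "brown", "red", "orange", "yellow",
                   "green", "blue", "violet", "grey", "white"]  # 0 through 9


def hybrid_glyph(group: str) -> tuple:
    """Three digits -> (notch count, numeral color, center numeral)."""
    notches, color_digit, center = group  # expects exactly 3 digits
    return int(notches), RESISTOR_COLORS[int(color_digit)], center


def decode_hybrid(glyph: tuple) -> str:
    notches, color, center = glyph
    return f"{notches}{RESISTOR_COLORS.index(color)}{center}"


print(hybrid_glyph("327"))  # (3, 'red', '7')
```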

- Can you also explain the base-n approach?

Here goes: as a first step, you can reduce your 20-digit number in
size by writing it in another base, such as Base-36 (it starts out
written in Base-10), so:

99999999999999999999 = L3R41IFS0Q5TR in Base-36

This gets us down to (as maniac points out) 13 digits.

To shorten it further: since this way of presenting digits does not
use lower-case letters or an extended character set (such as ñ or Ñ),
perhaps you could take pieces of the end product (L3R41IFS0Q5TR
above) and further encode them.  So 3R would be depicted by (for
instance) an ampersand (&), with an end result of a 7-character
encoding (the example is purely illustrative):

99999999999999999999 = L3R41IFS0Q5TR = L&e*SQd

The problem with the Modified Base-N scheme (as carnegie points out
in different ways) is that for the second step of encoding you would
need a unique "other" character for every pair of Base-36 digits:
36 times 36, or 1296 of them.  A standard 8-bit extended character set
does not have nearly that many, and even if you find enough symbols
you end up with something like my first oddball method, except tougher
to encode and decode.
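Replacing every pair of base-36 digits with a single symbol amounts to writing the number in base 36 × 36 = 1296, which is why the symbol table gets so large. This Python sketch, with hypothetical function names, returns for each pair an index from 0 to 1295 that would select a symbol from such a table; for a 20-digit number the 13 base-36 digits are padded to 14 and packed into 7 symbols.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n: int) -> str:
    chars = []
    while True:
        n, rem = divmod(n, 36)
        chars.append(ALPHABET[rem])
        if n == 0:
            break
    return "".join(reversed(chars))


def pack_pairs(n: int) -> list:
    """One index in 0..1295 per pair of base-36 digits (i.e. base-1296 digits)."""
    s = to_base36(n)
    if len(s) % 2:
        s = "0" + s  # pad to an even number of base-36 digits
    return [ALPHABET.index(s[i]) * 36 + ALPHABET.index(s[i + 1])
            for i in range(0, len(s), 2)]


def unpack_pairs(indices: list) -> int:
    """Rebuild the number from base-1296 digits, most significant first."""
    n = 0
    for idx in indices:
        n = n * 1296 + idx
    return n


packed = pack_pairs(99999999999999999999)
print(len(packed))  # 7
```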

Don't know if this helps or just makes things worse.

Looking Forward, denco
