|
|
Subject:
Compressing short text strings
Category: Computers > Programming Asked by: tseven-ga List Price: $15.00 |
Posted:
21 Mar 2005 09:23 PST
Expires: 20 Apr 2005 10:23 PDT Question ID: 498051 |
My goal: To compress a fairly short text string into an even shorter string. For example: String to compress: 813459546754565ROBERT.JOHNSON Possible compressed string outcome: A$21#JSK I will be integrating this function into an Excel Spreadsheet using Visual Basic. I am looking for at least an algorithm or example function written in any language (preferably VB) so I may adapt it to my program. I could settle for a free self-contained binary (exe) which will allow me to pass the string to and from a standard text file. Reason: I need to take a rather large string and shorten it so that it may be read by a bar scanner. There are some characters which the bar scanner will not recognize. I have seen Software Key generators take a string such as "John Smith" and translate it to a 5 digit number. This is what I would like to do. Links to the history of text compression will not help me. I need source code examples or at minimum an working algorithm which I can attempt to implement in Visual Basic. Thanks, Steve | |
|
|
There is no answer at this time. |
|
Subject:
Re: Compressing short text strings
From: frde-ga on 21 Mar 2005 11:42 PST |
I think that you have seen CRC16 algorithms They take a string of bytes and produce a number between 0 and 65535 (note the 5 digits) The numbers are not guaranteed to be unique to that string - but the chances are high, and if you control 'both ends' you can guarantee it Since converting '48637' back to '813459546754565ROBERT.JOHNSON' is totally impossible without some sort of lookup table - you might as well trash the CRC16 and use a lookup table of your own devising. You can make a hamburger out of a cow - but making a cow out of (partial) hamburgers is not easy |
Subject:
Re: Compressing short text strings
From: philnj-ga on 21 Mar 2005 13:48 PST |
Do you need a deterministic compression method? I mean, if you print a bar code, must you be able to determine what the original text was simply from the information contained in the bar code? Or can the bar code writer and the bar code reader share a database? If so, then you can generate a key for each entry, and bar code the key, the reader would read the key and do a database search to retrieve the info. Another possibility is using Huffman Compression. I seem to remember that it uses a statistical approach to substitute common characters for shorter bit sequences. The length of the resulting code is not guaranteed to be a ceratin size given a fix input size. A long time ago, there was a system that I think was called radix-56 (or "radix-something") used by DEC to store strings. Maybe there are some old PDP-11 guys (or gals) out there who remember this system of compression. There is a practical limit to the amount of compression you can accomplish and still retain a usable amount of information. Sorry I can't be of more help. |
Subject:
Re: Compressing short text strings
From: tseven-ga on 21 Mar 2005 21:09 PST |
To frde: You are probably correct in the crc16 algorithms. I thought it was more advanced than that. To philnj: Yes it must be deterministic, as I cannot predict the strings contents nor exact length. A common database would not be possible. Thanks for the thoughts, it has given me new avenues to look down. |
Subject:
Re: Compressing short text strings
From: bbcbasic-ga on 25 Mar 2005 14:57 PST |
> A long time ago, there was a system that I think was called radix-56 > (or "radix-something") used by DEC to store strings It was called RAD50 (where the '50' is octal for 40). The principle behind it was that 40-cubed is 64000, so will fit in a single 16-bit word. It used a limited character set of 40 characters (letters A-Z, numbers 0-9 and four others) and could store three such characters in a 16-bit word. With RAD50 the string "813459546754565ROBERT.JOHNSON" would compress to 10 words (20 bytes). Encoding and decoding RAD50 is very straightforward. If you want some BASIC code to do it let me know. |
Subject:
Re: Compressing short text strings
From: 0x426f62-ga on 28 Mar 2005 10:35 PST |
The constraints of your request as I understand it (that is be deterministic, have no constraints as to length or character set, and reversable without a data base) render the solution basically impossible. Each 8-bit byte of data represents one of 256 possible pieces of information (characters). If you allow the constraint that your character set is limited to the 40 characters suggested earlier, that takes 5.5 bits to encode. The most you save is about 2.5 bits per character. The most you can compress your data without using pattern or predictive techniques will be 24 bits down to 16. If your input character set has to be larger than 40, then your initial compression will be even less. But this isn't the worst news. You say your bar code can't carry every possible 8-bit combination. But you haven't said exactly how large the character set is that you can use to do the encoding. Assuming it's 7-bits (just a guess), then your best compression is going to save you 1.5 bits per 8-bit character. Your best reduction will be 18%. If you allow linear techniques, then your compression can and probably will be greater most of the time, but no matter what technique you choose, there will be a "worst case" out there that ends up occupying the full 24 characters for your 30 character string. You might want to re-think your whole problem, or post the underlying question. |
Subject:
Re: Compressing short text strings
From: divineflight-ga on 30 Mar 2005 16:53 PST |
I'd hate to say this but compressing '813459546754565ROBERT.JOHNSON' to something as short as 'A$21#JSK' is nearly impossible. I say nearly because it could happen if significant portion of the original string is repeated. Lookup tables won?t help either and will probably result tension. |
If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you. |
Search Google Answers for |
Google Home - Answers FAQ - Terms of Service - Privacy Policy |