Subject:
Exact CSV file format specification (as exported from Microsoft Office 2002)
Category: Computers > Programming Asked by: tomazos-ga List Price: $5.00 |
Posted:
26 Jul 2004 23:50 PDT
Expires: 25 Aug 2004 23:50 PDT Question ID: 379536 |
I'm looking for the technical file format specification for the "CSV format" as referred to in Microsoft Excel and Microsoft Outlook as:
1. "Comma Separated Values (DOS)": Outlook 2002 on Win XP, File menu > Import and Export > Export to file
2. "Comma Separated Values (Windows)": Outlook 2002 on Win XP, File menu > Import and Export > Export to file
3. "CSV (Comma delimited)": Microsoft Excel 2002 on Win XP, File menu > Save as
4. "CSV (MS-DOS)": Microsoft Excel 2002 on Win XP, File menu > Save as
5. "CSV (Macintosh)": Microsoft Excel 2002 on Win XP, File menu > Save as
What are the precise character set, format, and escape characters that each of these five file formats uses? |
|
There is no answer at this time. |
|
Subject:
Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: dreamboat-ga on 27 Jul 2004 08:30 PDT |
Perhaps I misunderstand the question because I'm not sure what "format" and "escape" characters are. However, to my knowledge, CSV files are nothing more than ASCII text files, which means they carry no font or formatting information of their own (a viewer will typically just display them in a monospaced font such as 12 pt Courier). If necessary, I am happy to create a sample file, except from a Mac of course. |
Subject:
Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: crythias-ga on 27 Jul 2004 09:58 PDT |
The precise character set is most likely plain 7-bit ASCII, primarily the normal printable characters on a US keyboard. The differences between the Windows, DOS, and Mac variants mostly concern the termination (end-of-record) characters. In each case of CSV export, you are given options for field termination (tab, comma, fixed width, etc.) and for the character that designates text fields (usually quotation marks). By default, a text field has " on either side of the entry, though it is possible not to use quotes. The quotes are helpful when you are using a comma delimiter and have a comma in a text field, which you don't want to delimit (break apart into multiple fields). By default, Windows CSV adds both a carriage return (CR, chr$(13), ^M, Ctrl-M) and a line feed (LF, chr$(10), ^J, Ctrl-J) to the end of a record. The Macintosh format uses (IIRC) just a carriage return for end of line/end of record. DOS text files use the same CR LF pair as Windows; a lone line feed is the Unix convention. In general, this matters little for import/export. However, if you've used text editors on Linux/Unix, you may see ^M's (carriage returns) everywhere when opening a Windows text file, whereas Windows text editors sometimes show unwrapped lines when opening a Unix file. |
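The three record terminators mentioned above (CR LF, CR alone, LF alone) are easy to tell apart by looking at the raw bytes. The following is only a minimal sketch, not anything taken from Microsoft's exporters: the function names `detect_newline` and `split_records` are made up for illustration, and the splitting helper assumes that no quoted field contains an embedded line break.

```python
def detect_newline(data):
    """Name the first line terminator found in raw bytes: 'CRLF', 'CR', 'LF' or 'none'."""
    for i, byte in enumerate(data):
        if byte == 0x0D:  # carriage return, chr(13)
            # A CR immediately followed by LF is the Windows/DOS pair.
            return "CRLF" if data[i + 1:i + 2] == b"\n" else "CR"
        if byte == 0x0A:  # line feed, chr(10)
            return "LF"
    return "none"


def split_records(data):
    """Split raw bytes into records on CR LF, CR or LF.

    Assumption: no field contains an embedded line break, so every
    terminator really does end a record.
    """
    return data.splitlines()
```

For example, `detect_newline(b"a,b\rc,d\r")` reports `'CR'`, the classic Macintosh convention, while a file saved on Windows or DOS would report `'CRLF'`.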
Subject:
Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: crythias-ga on 27 Jul 2004 10:00 PDT |
http://www.websiterepairguy.com/articles/os/crlf.html |
Subject:
Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: duoas-ga on 04 Aug 2004 20:52 PDT |
You have asked two questions.

1. "Format", "escape", and "control" are all words used to describe characters that have special meaning. An example is the line-feed character (ASCII 10), which instructs the line device (printer, terminal, etc.) to advance the print head (or cursor, or whatever) to the next line. ASCII 13 (carriage-return) moves the print head back to the left side of the carriage (or CRT, ...). The 7-bit ASCII control characters are in the range 0..31 and 127. All other characters are printable, such as a space and the letter 'A'. Google ASCII for a chart.

2. CSV (Comma-Separated) files follow a very simple syntax.

Notation

FS (Field Separator) is almost always a comma or a tab.
FD (Field Delimiter) is almost always a double-quote.
NL (New Line) is almost always the ASCII CR LF combination. Unix systems tend to use LF only. Macintoshes tend to use CR only.

Description

The file is ASCII encoded.

Each line represents one record. Lines are terminated by any valid NL.

Each record contains multiple fields, separated by the FS character. Again, this is normally a comma or a tab, but it could be anything. Whitespace surrounding the FS character is ignored (the field begins and ends with the first and last non-space characters). Neither FS nor FD can be considered whitespace, even if it would be under normal circumstances. Thus three tabs delimit four fields, not one.

A field is a string of text characters, which *may* be delimited by the FD character. Again, this is usually the double-quote ("), but it can be anything convenient. I have seen pipe (|) and single-quote (') used.

The FS character may appear in a FD-delimited field; in this case it is not treated as the field separator.

The FD character may appear in a FD-delimited field; simply double it.

I do not know whether an FD character can appear in the middle of an *un-delimited* field.
- Is it treated as a normal character?
- Does it delimit text to be concatenated with surrounding text?
I have never come across an instance where this is a problem. If someone knows the answer, it would be nice to see it posted.

The end of the file should contain a single blank line (no terminating NL). This, however, is not guaranteed. The last record just might not end with a NL, or there might be two, or more... In any case, your algorithm should be able to discard blank lines.

Example

FS = ,
FD = "

one,two, , "three,four " ,"five ""six"" seven", " eight,""", ",nine"

indicates the following seven fields

one                   3 characters
two                   3
                      0
three,four            11
five "six" seven      16
 eight,"              8
,nine                 5

Hope this helps.
Duoas |
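The rules in the description above can be turned into a small parser. What follows is a minimal sketch under stated assumptions, not a definitive reader for any Microsoft export: `parse_record` is a hypothetical name, an FD character in the middle of an un-delimited field is simply treated as a normal character (the post leaves this open), and anything between a closing FD and the next FS is dropped.

```python
def parse_record(line, fs=",", fd='"'):
    """Split one record (without its line terminator) into a list of fields."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Skip whitespace before the field; FS and FD never count as whitespace.
        while i < n and line[i].isspace() and line[i] not in (fs, fd):
            i += 1
        if i < n and line[i] == fd:
            # Delimited field: keep everything up to the closing FD verbatim.
            i += 1
            buf = []
            while i < n:
                if line[i] == fd:
                    if i + 1 < n and line[i + 1] == fd:
                        buf.append(fd)  # a doubled FD stands for one literal FD
                        i += 2
                        continue
                    i += 1  # closing FD
                    break
                buf.append(line[i])
                i += 1
            # Assumption: anything between the closing FD and the next FS is dropped.
            while i < n and line[i] != fs:
                i += 1
            fields.append("".join(buf))
        else:
            # Un-delimited field: runs to the next FS, trailing whitespace trimmed.
            # Assumption: an FD in the middle is treated as a normal character.
            start = i
            while i < n and line[i] != fs:
                i += 1
            end = i
            while end > start and line[end - 1].isspace() and line[end - 1] != fd:
                end -= 1
            fields.append(line[start:end])
        if i >= n:
            return fields
        i += 1  # step over the FS


record = 'one,two, , "three,four " ,"five ""six"" seven", " eight,""", ",nine"'
lengths = [len(field) for field in parse_record(record)]
```

Run on the example record, `lengths` comes out as `[3, 3, 0, 11, 16, 8, 5]`, matching the seven fields listed in the post. A blank line parses to a single empty field, so a caller that wants to discard blank lines should check for them before calling the function.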
Subject:
Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: creativist-ga on 17 Aug 2004 06:55 PDT |
tomazos, I've spent quite a bit of time on this issue. Like you, I have never found an official document, so have worked out the details using experimentation and input from others. These efforts are documented here: http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm The paper, including the format description has had a few additions and corrections over the years. There's always the possibility that there is still another gotcha hiding in there, but it's pretty clean to my knowledge (I haven't had any bug reports for a while). Hope this helps. -cvst |