Google Answers
 
Q: Exact CSV file format specification (as exported from Microsoft Office 2002) ( No Answer,   5 Comments )
Question  
Subject: Exact CSV file format specification (as exported from Microsoft Office 2002)
Category: Computers > Programming
Asked by: tomazos-ga
List Price: $5.00
Posted: 26 Jul 2004 23:50 PDT
Expires: 25 Aug 2004 23:50 PDT
Question ID: 379536
I'm looking for the technical file format specification for the "CSV
format" as referred to in Microsoft Excel and Microsoft Outlook as:

    1. "Comma Separated Values (DOS)"
       Outlook 2002 on Win XP
       File menu > Import and Export > Export to file

    2. "Comma Separated Values (Windows)"
       Outlook 2002 on Win XP
       File menu > Import and Export > Export to file
       
    3. "CSV (Comma delimited)"
       Microsoft Excel 2002 on Win XP
       File menu > Save as

    4. "CSV (MS-DOS)"
       Microsoft Excel 2002 on Win XP
       File menu > Save as

    5. "CSV (Macintosh)"
       Microsoft Excel 2002 on Win XP
       File menu > Save as

What is the precise character set, format and escape characters that
each of these five file formats uses?
Answer  
There is no answer at this time.

Comments  
Subject: Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: dreamboat-ga on 27 Jul 2004 08:30 PDT
 
Perhaps I misunderstand the question because I'm not sure what
"format" and "escape" characters are.

However, to my knowledge, CSV files are nothing more than ASCII text
files, so they carry no font information of their own; whatever program
opens them displays them in its own font (for example Courier, 12 pt).

If necessary, I am happy to create a sample file, though not from a Mac, of course.
Subject: Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: crythias-ga on 27 Jul 2004 09:58 PDT
 
The precise character set is most likely plain 7-bit ASCII for all the
normal printable characters on a US keyboard; anything beyond that is
most likely written in the system's 8-bit "ANSI" code page. (UTF-7 is
something else: a Unicode encoding.) The differences between Windows,
DOS, and Mac are mostly in the termination (End of Record) characters.
In each case of CSV export, you are given the options of field
termination (tab, comma, fixed width, etc.) and a character that
designates text fields (usually quotation marks). By default, a text
field has " on either side of the entry, though it is possible to not
use quotes. The quotes are helpful when you are using a comma delimiter
and have a comma in a text field that you don't want to delimit (break
apart into multiple fields).

By default, Windows CSV adds both a carriage return {(CR), chr$(13),
^M, (ctrl-M)} and a line feed {(LF), chr$(10), ^J, (ctrl-J)} to the
end of a record. A Macintosh format uses (IIRC) just a carriage return
for end of line/end of record. A DOS format also uses CR LF (the MS-DOS
convention), so it most likely differs from the Windows format in its
character set (the older DOS/OEM code page) rather than its line ending.
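A minimal Python sketch of the end-of-record difference, using the standard `csv` module's `lineterminator` parameter. The platform-to-terminator mapping is an assumption taken from this thread (Windows CR LF, Macintosh CR, Unix LF), and `write_record` is a hypothetical helper name. The default quoting (`csv.QUOTE_MINIMAL`) wraps a field in quotation marks only when it needs them, e.g. when it contains a comma.

```python
import csv
import io

# Assumed end-of-record terminators per platform, as discussed in this thread.
TERMINATORS = {
    "windows": "\r\n",  # CR LF
    "mac": "\r",        # CR only
    "unix": "\n",       # LF only
}


def write_record(fields, platform):
    """Serialise one record as CSV text with the given platform's terminator."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator=TERMINATORS[platform])
    writer.writerow(fields)  # QUOTE_MINIMAL: quote only fields that need it
    return buf.getvalue()


print(repr(write_record(["Smith, John", "42"], "windows")))
print(repr(write_record(["Smith, John", "42"], "mac")))
```

The first call returns `'"Smith, John",42\r\n'`: the comma inside the name forces quotes, while `42` is written bare.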

In general, this does not matter much for import/export. However, if
you open a Windows text file in a Linux/Unix text editor, you may see ^M
(carriage return) at the end of every line, whereas a Unix file opened
in a Windows text editor may show up as one long, unbroken line.
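To check which convention a given file uses, one option is to look at its raw bytes. The sketch below is a heuristic under stated assumptions: `detect_newline` and `to_lf` are hypothetical names, and a file mixing conventions is simply reported by the first rule that matches.

```python
def detect_newline(data):
    """Guess a file's end-of-record convention from its raw bytes (heuristic)."""
    if b"\r\n" in data:
        return "CR LF"  # Windows style
    if b"\r" in data:
        return "CR"     # Macintosh style
    if b"\n" in data:
        return "LF"     # Unix style
    return None         # single line, no terminator found


def to_lf(data):
    """Normalise CR LF and lone CR to LF, as a Unix editor expects."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(detect_newline(b"one,two\r\nthree,four\r\n"))
print(to_lf(b"a\r\nb\rc\n"))
```

Reading the file in binary mode (`open(path, "rb")`) matters here: text mode would translate the line endings before you could inspect them.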
Subject: Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: crythias-ga on 27 Jul 2004 10:00 PDT
 
http://www.websiterepairguy.com/articles/os/crlf.html
Subject: Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: duoas-ga on 04 Aug 2004 20:52 PDT
 
You have asked two questions.

1. "format", "escape", and "control" are all words used to describe
characters that have special meaning. An example is the line-feed
character (ASCII 10) which instructs the line device (printer,
terminal, etc.) to advance the print head (or cursor, or whatever) to
the next line. ASCII 13 (carriage return) moves the print head back to
the left side of the carriage (or the cursor to the left edge of a CRT, ...).

In 7-bit ASCII, the control characters are codes 0..31 and 127. All
other characters (32..126) are printable, such as the space and the
letter 'A'. Search for "ASCII table" for a chart.
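The range rule above fits in a one-line test. A small Python sketch (`is_control` is a hypothetical name) that classifies the characters mentioned in this comment:

```python
def is_control(ch):
    """True if ch is a 7-bit ASCII control character (codes 0..31 or 127)."""
    code = ord(ch)
    return code <= 31 or code == 127


# LF (10) and CR (13) are control characters; space (32) and 'A' (65) are printable.
for ch in ["\n", "\r", " ", "A"]:
    print(ord(ch), is_control(ch))
```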

2. CSV (Comma-Separated) files follow a very simple syntax.

Notation
  FS (Field Separator) is almost always a comma or a tab.
  FD (Field Delimiter) is almost always a double-quote.
  NL (New Line) is almost always the ASCII CR LF combination.
     Unix systems tend to use LF only.
     Macintoshes tend to use CR only.

Description
  The file is ASCII encoded.

  Each line represents one record. Lines are terminated by any valid NL.

  Each record contains multiple fields, separated by the FS character.
  Again, this is normally a comma or a tab, but it could be anything.

  Whitespace surrounding the FS character is ignored (the field begins and
  ends with the first and last non-space characters). Neither FS nor FD can
  be considered whitespace, even if it would be under normal circumstances.
  Thus three tabs delimit four fields, not one.
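For records with no FD-delimited fields, the whitespace rule can be sketched in Python as "split on FS first, then trim". `split_unquoted` is a hypothetical helper; because splitting consumes every FS before any trimming happens, trimming can never swallow a separator, even when FS is a tab.

```python
def split_unquoted(record, fs=","):
    """Split a record that has no FD-delimited fields, trimming whitespace.

    str.split removes every FS first, so stripping spaces and tabs from
    each piece afterwards can never remove a separator. With fs set to a
    tab, three tabs in a row still yield four (empty) fields.
    """
    return [field.strip(" \t") for field in record.split(fs)]


print(split_unquoted(" one , two ,three"))
print(split_unquoted("\t\t\t", "\t"))
```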

  A field is a string of text characters, which *may* be delimited by the
  FD character. Again, this is usually the double-quote ("), but it can
  be anything convenient. I have seen pipe (|) and single-quote (') used.

  The FS character may appear in an FD-delimited field; in this case it is not
  treated as the field separator.

  The FD character may also appear in an FD-delimited field; simply double it.
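On the writing side, the doubling rule is a single string replacement. A minimal Python sketch with hypothetical helper names; `unquote_field` assumes its input is one well-formed FD-delimited field and does not check that every inner FD is properly doubled.

```python
def quote_field(text, fd='"'):
    """Wrap text in FD, doubling any FD it contains."""
    return fd + text.replace(fd, fd * 2) + fd


def unquote_field(token, fd='"'):
    """Reverse quote_field: strip the outer FDs and collapse doubled FDs."""
    if len(token) < 2 or token[0] != fd or token[-1] != fd:
        raise ValueError("not an FD-delimited field: %r" % token)
    return token[1:-1].replace(fd * 2, fd)


print(quote_field('five "six" seven'))  # "five ""six"" seven"
print(quote_field("three,four"))        # "three,four" (the comma needs no escaping)
```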

  I do not know whether an FD character can appear in the middle of an *un-
  delimited* field.
    - Is it treated as a normal character?
    - Does it delimit text to be concatenated with surrounding text?
  I have never come across an instance where this is a problem.
  If someone knows the answer, it would be nice to see it posted.

  The last record should end with a single NL, so the file ends with one
  empty line that has no NL of its own. This, however, is not guaranteed.
  The last record just might not end with an NL, or there might be two, or
  more... In any case, your algorithm should be able to discard blank lines.
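A Python sketch of the line-level rules: accept any of the three NL conventions and drop blank lines, including trailing ones. `records` is a hypothetical name, and treating a line of only spaces or tabs as blank is an assumption. Like the one-record-per-line model described here, it does not handle an NL inside an FD-delimited field.

```python
import re

# One NL: CR LF, a lone LF, or a lone CR. CR LF is listed first so that it
# is consumed as a single terminator rather than as CR followed by LF.
NL = re.compile(r"\r\n|\n|\r")


def records(text):
    """Split text into records on any NL, discarding blank lines."""
    return [line for line in NL.split(text) if line.strip(" \t") != ""]


print(records("a,b\r\nc,d\r\n\r\n"))  # trailing blank lines dropped
print(records("a,b\rc,d"))            # Macintosh CR, no final NL
```

An explicit pattern is used instead of `str.splitlines()` because the latter also splits on other characters such as form feed and Unicode line separators.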

Example
  FS = ,
  FD = "

  one,two, , "three,four " ,"five ""six"" seven", " eight,""", ",nine"

  indicates the following seven fields (brackets mark where each field
  begins and ends; they are not part of the data)
    [one]                            3 characters
    [two]                            3
    []                               0
    [three,four ]                   11
    [five "six" seven]              16
    [ eight,"]                       8
    [,nine]                          5
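A sketch of a single-record parser in Python that follows these rules and reproduces the example. `parse_record` is a hypothetical name. Two behaviours are assumptions, since the thread leaves them open: text between a closing FD and the next FS is ignored, and an FD in the middle of an undelimited field is kept as an ordinary character. The parser is hand-rolled because Python's standard `csv` module can skip whitespace after a delimiter (`skipinitialspace`) but has no option for trimming whitespace before one.

```python
def parse_record(line, fs=",", fd='"'):
    """Split one CSV record into fields, following the rules described above.

    Assumptions (not settled in this thread): text between a closing FD and
    the next FS is ignored, and an FD in the middle of an undelimited field
    is kept as an ordinary character.
    """
    ws = " \t".replace(fs, "")  # FS never counts as whitespace
    fields = []
    i, n = 0, len(line)
    while True:
        while i < n and line[i] in ws:  # skip whitespace before the field
            i += 1
        if i < n and line[i] == fd:
            # FD-delimited field: read up to the first FD that is not doubled.
            i += 1
            chars = []
            while i < n:
                if line[i] == fd:
                    if i + 1 < n and line[i + 1] == fd:
                        chars.append(fd)  # doubled FD -> one literal FD
                        i += 2
                        continue
                    i += 1  # closing FD
                    break
                chars.append(line[i])
                i += 1
            field = "".join(chars)
            while i < n and line[i] != fs:  # skip to the next FS
                i += 1
        else:
            # Undelimited field: everything up to the next FS, trimmed.
            j = line.find(fs, i)
            if j == -1:
                j = n
            field = line[i:j].strip(ws)
            i = j
        fields.append(field)
        if i >= n:
            return fields
        i += 1  # step over the FS


example = 'one,two, , "three,four " ,"five ""six"" seven", " eight,""", ",nine"'
for f in parse_record(example):
    print("[%s] %d" % (f, len(f)))
```

Run on the example line, this prints the same seven fields and character counts as the table.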

Hope this helps.

Duoas
Subject: Re: Exact CSV file format specification (as exported from Microsoft Office 2002)
From: creativist-ga on 17 Aug 2004 06:55 PDT
 
tomazos,

I've spent quite a bit of time on this issue.  Like you, I have never
found an official document, so have worked out the details using
experimentation and input from others.  These efforts are documented
here:

  http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

The paper, including the format description, has had a few additions
and corrections over the years. There's always the possibility that
another gotcha is still hiding in there, but it's pretty clean to my
knowledge (I haven't had any bug reports for a while).

Hope this helps.

 -cvst
