Hi, viseu-ga:
It's a generally useful tactic, when developing a piece of VBA
code, to use Record Macro to get a code snippet that correctly does
at least something close to what is wanted.
First I used TextPad 4.6 to create a sample "ANSI" text document with
some special (upper ASCII) characters, taken, as it happens, from a
Google Answers thread (answered by Scriptor-GA) here:
[Translate Song into German]
http://answers.google.com/answers/main?cmd=threadview&id=173434
Krieg! Ha! Paßt auf!
Was hat er Gutes?
Absolut rein gar nichts! Hört mir zu.
Ah, ich hasse den Krieg,
Weil ganz alleine der Tod nur siegt.
Krieg heißt Tränen, und er trifft die Mütter hart,
Denn ihre Söhne, die sind tot, vergessen und verscharrt!
Then I recorded this macro, which correctly opens the file (macro
slightly edited for formatting purposes):
Sub myOpen()
'
' myOpen Macro for Word 2002
' Macro recorded 3/18/2003 by mathtalk-ga
'
Documents.Open FileName:="WordASCII.txt", _
ConfirmConversions:=False, ReadOnly:=False, _
AddToRecentFiles:=False, PasswordDocument:="", _
PasswordTemplate:="", Revert:=False, _
WritePasswordDocument:="", WritePasswordTemplate:="", _
Format:=wdOpenFormatAuto, Encoding:=1252
End Sub
That final "Encoding" parameter, which is supported in Word 2000 and
2002 but not in Word 97, works in combination with the "Format"
parameter to control how text files are converted:
[Word 2002 Documents.Open]
(click on bolded "Documents" to reveal the syntax)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vbawd10/html/womthOpen.asp
[Word 2000 Documents.Open]
(click on bolded "Documents" to reveal the syntax)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/off2000/html/womthopen.asp
[Word 97 Documents.Open]
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/office97/html/output/F1/D4/S5ABE9.asp?frame=true
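One quick way to confirm that the Encoding argument did its job is to look at what actually landed in the document. The macro below is a hypothetical helper (the name ListNonAsciiChars is not from any Word documentation). It is a minimal sketch, assuming the text file is already open as the active document, and it lists each distinct character above code 127 together with its Unicode code point.

```vba
Sub ListNonAsciiChars()
    ' Hypothetical helper: list each distinct character in the active
    ' document whose code is above 127, with its Unicode code point.
    Dim docText As String, ch As String, report As String
    Dim i As Long, code As Long
    docText = ActiveDocument.Content.Text
    For i = 1 To Len(docText)
        ch = Mid$(docText, i, 1)
        ' AscW returns a signed Integer; mask it to get 0..65535
        code = AscW(ch) And &HFFFF&
        ' Record each character only the first time it is seen
        If code > 127 And InStr(1, report, ch, vbBinaryCompare) = 0 Then
            report = report & ch & "  U+" & Right$("000" & Hex$(code), 4) & vbCr
        End If
    Next i
    If Len(report) = 0 Then report = "No characters above 127 found."
    MsgBox report, vbInformation, "Upper-range characters"
End Sub
```

For the German sample above, opened with Encoding:=1252, the list should be ß, ö, ä and ü (U+00DF, U+00F6, U+00E4, U+00FC). If the file had been decoded with the wrong code page, other characters would presumably show up in their place.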
The mystery value 1252 shown above has a "coder friendly" equivalent,
the constant msoEncodingWestern. The particular value was apparently
chosen to match the Windows standard code page, ANSI 1252 (see History
below for more on the "code page" concept). This was Microsoft's
"improvement" on the ISO Western Latin-1 extension of ASCII known as
ISO-8859-1: Windows-1252 assigns printable characters (such as curly
quotes and the euro sign) to most of the positions 0x80 to 0x9F that
ISO-8859-1 reserves for control codes. For details of their minor
differences, see this comparison by George Hernandez:
[ANSI]
http://www.georgehernandez.com/xComputers/CharacterSets/ANSI.htm
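With the named constant in hand, the recorded macro can be trimmed considerably. The version below is a sketch, not a second recording. It keeps only the arguments most relevant here, lets omitted arguments take Word's defaults, and uses msoEncodingWestern instead of the bare number. Like the original, it requires Word 2000 or later because of the Encoding argument.

```vba
Sub myOpenTrimmed()
    ' Same effect as the recorded myOpen, with the named MsoEncoding
    ' constant (msoEncodingWestern = 1252) instead of the literal 1252.
    ' Arguments not listed here fall back to Word's defaults.
    Documents.Open FileName:="WordASCII.txt", _
        ConfirmConversions:=False, _
        AddToRecentFiles:=False, _
        Format:=wdOpenFormatAuto, _
        Encoding:=msoEncodingWestern
    ' To read the same bytes as strict ISO-8859-1 instead, one could pass
    ' Encoding:=msoEncodingISO88591Latin1 (= 28591).
End Sub
```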
For a list of all the MsoEncoding values in Office VBA, see here:
[Encoding Property]
(click on bolded "MsoEncoding" to reveal the list)
http://msdn.microsoft.com/library/en-us/vbawd10/html/woproEncoding.asp
These same enumeration constants are used in other related contexts.
For example:
[ReloadAs Method]
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vbawd10/html/womthReloadAs.asp
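ReloadAs is handy when a text file has already been opened and the characters look wrong. The following is a minimal sketch, assuming the active document is the misread text file. It re-reads the document from disk using the same MsoEncoding constants that Documents.Open accepts, here ANSI 1252.

```vba
Sub ReloadAsWestern()
    ' Sketch: reload the active document from disk, this time decoding
    ' its bytes with the Windows ANSI 1252 code page.
    ActiveDocument.ReloadAs Encoding:=msoEncodingWestern
End Sub
```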
Strangely, the "Encoding" parameter was not symmetrically added to the
Save method, as discussed here:
[Ask Dr.International #5: Word Macro Recording Misses Encoding]
(first Q&A item listed)
http://www.microsoft.com/globaldev/DrIntl/columns/005/default.mspx
Instead, the way to control how Word encodes text documents during
saves is to set the SaveEncoding document property:
[SaveEncoding]
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vbawd10/html/woproSaveEncoding.asp
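In practice that amounts to two steps: set the property, then save in the "encoded text" file format so the property is honored. The sketch below assumes Word 2002. The output name WordUTF8.txt is hypothetical, and UTF-8 is chosen only as an illustration.

```vba
Sub SaveAsUtf8Text()
    ' Sketch (Word 2002): choose the output encoding via SaveEncoding,
    ' then save as encoded text so Word writes the file in that encoding.
    ' "WordUTF8.txt" is a hypothetical file name.
    With ActiveDocument
        .SaveEncoding = msoEncodingUTF8
        .SaveAs FileName:="WordUTF8.txt", FileFormat:=wdFormatEncodedText
    End With
End Sub
```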
For the sake of completeness here's the list of possible values for
the "Format" parameter:
wdOpenFormatAllWord
wdOpenFormatAuto [Default]
wdOpenFormatDocument
wdOpenFormatEncodedText
wdOpenFormatRTF
wdOpenFormatTemplate
wdOpenFormatText
wdOpenFormatUnicodeText
wdOpenFormatWebPages
[Word 2002 Documents.Open]
(click on bolded "Documents" to reveal the syntax)
(click on bolded "WdOpenFormat" to reveal the list)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vbawd10/html/womthOpen.asp
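Rather than leaving the choice to wdOpenFormatAuto, the Format value can be named explicitly when the kind of text file is known in advance. OpenTextAs below is a hypothetical helper, not part of Word's object model. It is a sketch that assumes the caller knows whether the file is "Unicode text" (UTF-16, in Windows terminology) or 8-bit ANSI 1252 text.

```vba
Function OpenTextAs(ByVal path As String, ByVal isUtf16 As Boolean) As Document
    ' Hypothetical helper: name the WdOpenFormat explicitly instead of
    ' relying on wdOpenFormatAuto to guess what kind of file this is.
    If isUtf16 Then
        ' "Unicode text" in Word's terminology, i.e. UTF-16
        Set OpenTextAs = Documents.Open(FileName:=path, _
            ConfirmConversions:=False, Format:=wdOpenFormatUnicodeText)
    Else
        ' 8-bit text, decoded with the Windows ANSI 1252 code page
        Set OpenTextAs = Documents.Open(FileName:=path, _
            ConfirmConversions:=False, Format:=wdOpenFormatEncodedText, _
            Encoding:=msoEncodingWestern)
    End If
End Function
```

Usage: `Dim doc As Document` followed by `Set doc = OpenTextAs("WordASCII.txt", False)`.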
History
=======
Bearing in mind your desire for a conceptual understanding, let's stop
and ask exactly what it means for a text document to be in "ANSI"
format. Historically, ASCII (American Standard Code for
Information Interchange) addressed only a set of 7-bit signals between
computers and "teletype" terminals (even if they were video display
terminals or "glass TTYs" that emulated the original "hardcopy"
teletypes). As dial-up modems became the norm for terminal-computer
communications, rather than hardwiring these connections, the 7-bit
character signals were "embedded" in 8-bit groups. The eighth bit was
then available for additional information, such as "error detection"
(e.g. requiring even or odd parity for each 8-bit group).
By the time that "personal" computers were blessed by IBM's entry into
the marketplace, there were two sorts of uses for what had come to be
called the "upper ASCII" characters, treated as individual values on
independent footing from their original "lower ASCII" 7-bit
correspondences. One of these uses was as graphical characters,
exemplified in the IBM "PC DOS" operating system as a set of primarily
line-drawing symbols (vertical, horizontal, corners, double lines,
etc.).
The other use was for displaying "foreign" (from an English alphabetic
perspective) characters. The PC-DOS character set includes, for
example, a certain number of vowels with diacritical marks and a
handful of Greek alphabet and mathematical symbols, though hardly
sufficient for serious applications.
The ASCII character set was eventually incorporated into an
"international" (ISO) standard as ISO-646-US-ASCII:
http://www.ascii-table.org/
In order to support "localization" of IBM PCs into a number of
European countries, IBM developed what were termed "country code
pages". What this involved, in its primitive formulation, was loading
of customized fonts (from disk) at "boot time" based on settings in
the ubiquitous CONFIG.SYS file. Applications (such as word
processors), however, would need to be written to take cognizance of
these "code page" settings, and packages such as WordPerfect did this
with greater or lesser fidelity.
But now we had a classic "tower of Babel" situation, in which simple
text files would display differently depending on settings external to
the text files themselves. Several approaches were proposed to remedy
this, eventually converging on the Unicode Standard (kept in step with
the ISO 10646 Universal Character Set, UCS):
http://www.unicode.org/
which aims to simultaneously represent all character sets, even
"large" ones like Chinese characters. In order to do this the 256
possibilities allowed by 8 bits are obviously insufficient. Hence one
often sees the phrase "wide character" in connection with Unicode
implementations, although these are not synonyms.
A key to understanding the Unicode standard is to appreciate the
difference between the abstract assignment of characters to numeric
"code points" (the first 65,536 of which make up the BMP, or Basic
Multilingual Plane) and the "encodings" of those code points in
"storage" formats like UTF-7, UTF-8, UTF-16, and UTF-32. These
designations in essence describe the number of bits in the code units
used to store characters. UTF-8 leaves 7-bit ASCII text byte-for-byte
unchanged (and UTF-7 keeps most of it readable), which gives those two
substantial backward compatibility with older ASCII files, though not
with the upper half of ANSI 1252.
The Unicode Standard continues to evolve and to incorporate new
"alphabets".
Other Links of Interest
=======================
For a good discussion on Microsoft's compatibility aims with Word and
Unicode:
[Taking Advantage of Unicode Support]
http://www.microsoft.com/office/ork/xp/three/intd02.htm
A little-known quirk of how VBA handles passing strings into DLLs is
"implicit" conversion from Unicode to ANSI:
[Anatomy of a Declare Statement]
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odeopg/html/deovranatomyofdeclarestatement.asp
Sometimes, of course, one wishes to pass Unicode strings and needs to
bypass this conversion:
[Working Around VBA String Conversion from Unicode to ANSI for DLLs]
http://www.mvps.org/vb/index2.html?tips/varptr.htm
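The usual workaround is to declare the "W" (wide) version of a Windows API function with Long parameters, then pass StrPtr(...) so the DLL receives VBA's own UTF-16 string data. The sketch below applies this to the user32 MessageBoxW function. It assumes the 32-bit VBA of Office 2000/2002, and StrPtr, though it exists in VBA, is not officially documented. With a conventional ByVal ... As String declaration of MessageBoxA, the Greek alpha would not survive the implicit conversion to ANSI.

```vba
' Module-level declaration (place above any procedures). Sketch for the
' 32-bit VBA of Office 2000/2002; 64-bit Office (VBA7) would need
' PtrSafe and LongPtr for the handle and pointer arguments.
Private Declare Function MessageBoxW Lib "user32" _
    (ByVal hWnd As Long, ByVal lpText As Long, _
     ByVal lpCaption As Long, ByVal uType As Long) As Long

Sub ShowUnicodeMessage()
    Dim msg As String, title As String
    ' Greek small alpha (U+03B1) has no slot in ANSI 1252
    msg = "alpha = " & ChrW(&H3B1)
    title = "Unicode test"
    ' StrPtr hands the DLL the address of VBA's own UTF-16 string data,
    ' so VBA performs no implicit Unicode-to-ANSI conversion here.
    MessageBoxW 0, StrPtr(msg), StrPtr(title), 0
End Sub
```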
Search Strategies
=================
recording a macro in Word 2002
consulting Office/Word VBA help files
searching MSDN Library (online and offline)
Keywords:
MsoEncoding WdOpenFormat
Unicode
ASCII
ANSI 1252