Tuesday, February 19, 2013

ANSI vs. Unicode and the Alias Clause

ANSI

ANSI has long been the most common character standard on personal computers. Because the ANSI standard uses only a single byte to represent each character, it is limited to a maximum of 256 codes for letters, digits, and punctuation. Although this is adequate for English, it cannot fully support many other languages.
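The one-byte limit can be seen directly. The sketch below assumes that "ANSI" means the Western European Windows code page, which Python exposes as the "cp1252" codec; the helper name to_ansi is made up for illustration.

```python
# Assumption: "ANSI" here means Windows-1252 (Python codec "cp1252").
ANSI_CODEC = "cp1252"


def to_ansi(text):
    """Return the ANSI bytes for text, or None if a character has no ANSI code."""
    try:
        return text.encode(ANSI_CODEC)
    except UnicodeEncodeError:
        # The character is not among the 256 codes of this code page.
        return None


english = to_ansi("Hello")    # five characters, five bytes
french = to_ansi("café")      # the accented letter still fits in one byte
japanese = to_ansi("日本語")   # no ANSI codes exist for these characters
```

Each character that encodes successfully takes exactly one byte, and text outside the code page cannot be represented at all.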

Unicode

Unicode, in the form Windows uses it, is a character-encoding scheme that uses 2 bytes for every character. The International Organization for Standardization (ISO) defines a number in the range of 0 to 65,535 (2^16 - 1) for just about every character and symbol in every language (plus some empty spaces for future growth). On all 32-bit versions of Windows, Unicode is used by the Component Object Model (COM), the basis for OLE and ActiveX technologies. Unicode is fully supported by Windows NT. Although both Unicode and DBCS (double-byte character sets) have double-byte characters, the encoding schemes are completely different.
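As a rough illustration of the 2-byte layout, the sketch below encodes a few strings with Python's "utf-16-le" codec, which is assumed here to match the byte layout Windows uses for Unicode strings. The helper names are hypothetical.

```python
# Largest number that fits in 2 bytes: 2^16 - 1.
MAX_CODE = 2**16 - 1


def utf16_bytes(text):
    """Encode text as little-endian UTF-16 with no byte-order mark."""
    return text.encode("utf-16-le")


def code_numbers(text):
    """Return the number assigned to each character in text."""
    return [ord(ch) for ch in text]


latin = utf16_bytes("A")          # one character, two bytes
japanese = utf16_bytes("日本語")   # three characters, six bytes
numbers = code_numbers("Aé")      # the numbers behind "A" and "é"
```

Unlike the ANSI case, the English letter and the Japanese characters are stored the same way, 2 bytes each.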

If you are using Windows XP or later, use Unicode encoding instead of ANSI whenever the application supports it. ANSI is a legacy encoding, provided only for backward compatibility with older applications.

All Windows API functions that have textual parameters come in two flavors: those that operate on ANSI strings have an A suffix, whereas those that operate on Unicode strings have a W suffix. For example, although the documentation and searches on MSDN talk about FindWindow, the Windows DLLs do not actually contain a function of that name. Instead, they contain two functions called FindWindowA and FindWindowW. We use the Alias clause of the Declare statement to provide the actual name (case sensitive) of the function contained in the DLL.
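The same naming rule applies from any language that loads the DLL directly. The sketch below uses Python's ctypes module as a stand-in; it is not the Alias clause itself, only an illustration that the caller must ask for the suffixed, case-sensitive name. The helpers exported_name and find_window are hypothetical, and find_window only works on Windows.

```python
import ctypes
import sys


def exported_name(base, wide=True):
    """Return the name the DLL actually exports: base plus 'W' or 'A'."""
    return base + ("W" if wide else "A")


def find_window(title, wide=True):
    """Return the handle of the top-level window with this title, or None.

    Windows only: user32.dll does not exist on other platforms.
    """
    if sys.platform != "win32":
        raise OSError("user32.dll is only available on Windows")
    user32 = ctypes.WinDLL("user32")
    # Look the function up by its real exported name, e.g. "FindWindowW".
    func = getattr(user32, exported_name("FindWindow", wide))
    func.restype = ctypes.c_void_p  # a NULL handle comes back as None
    if wide:
        func.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p]
        return func(None, title)
    # The A version expects bytes in the current ANSI code page.
    func.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    return func(None, title.encode("mbcs"))
```

Asking user32.dll for plain "FindWindow" would fail, which is the situation the Alias clause resolves by mapping the friendly name onto the exported one.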