Unicode Programming in VC++
I. What Is Unicode
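Let us start with ASCII, an encoding standard for English characters. Each ASCII character occupies one byte, so a single byte can represent at most 256 distinct values (00H to FFH). English does not need that many: normally only the first 128 (00H to 7FH, high bit 0) are used, covering control characters, digits, upper- and lowercase letters, and a few other symbols. The other 128 values, with the high bit set to 1 (80H to FFH), are known as "extended ASCII" and are typically used for box-drawing characters, some phonetic symbols, and other miscellaneous symbols.
This encoding works well for English. For complex scripts such as Chinese or Arabic, however, 256 values are clearly not enough.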
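So individual countries defined their own encoding standards. The Chinese one is called "GB2312-80". It is compatible with ASCII and exploits the fact that extended ASCII was never truly standardized: each Chinese character is represented by two extended-ASCII bytes, which keeps it distinct from the ASCII range.
This approach has problems, the biggest being that the Chinese encoding overlaps with extended ASCII. Many programs draw tables with the extended-ASCII box-drawing characters; when such programs run on a Chinese system, the table borders are misread as Chinese characters and show up as garbage.
In addition, every country and region has its own encoding rules, and these conflict with one another, which makes exchanging information between countries and regions very troublesome.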
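A real solution cannot be built on extended ASCII. It needs a completely new encoding system that considers Chinese, French, German, and every other script together and assigns each character its own code.
And so Unicode was born.
Unicode is also a character encoding. As originally designed, it uses two bytes per character (0000H to FFFFH), giving 65536 code values, enough for the characters of the major languages in everyday use. (Later versions of the standard extended the code space beyond FFFFH; in the 16-bit form used by Windows, such characters take a pair of 16-bit units.)
In Unicode all characters are treated alike. A Chinese character is no longer "two extended-ASCII bytes" but "one Unicode character"; in other words, every character is handled as a single character and has its own unique Unicode code.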
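II. Benefits of Using Unicode
Using Unicode lets your project support several languages at once and makes it ready for international use.
In addition, Windows NT was developed with Unicode, and the whole system is based on it. If you call an API function and pass it an ANSI string (the ASCII character set and the compatible character sets derived from it, such as GB2312, are commonly called ANSI character sets), the system must first convert the string to Unicode and then pass the Unicode string to the operating system. If the function is expected to return an ANSI string, the system first converts the Unicode string to ANSI and then returns the result to your application. These conversions cost time and memory. Developing the application in Unicode lets it run more efficiently.
The following table lists the codes of a few characters as a simple illustration of the difference between ANSI and Unicode: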
Character | A | N | 和 |
ANSI code (GB2312) | 41H | 4EH | BACDH (bytes BA CD) |
Unicode code | 0041H | 004EH | 548CH |
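III. Unicode Programming in C++
Wide-character support is actually part of the ANSI C standard, where it exists to let a character be represented by more than one byte. Wide characters and Unicode are not quite the same thing: Unicode is just one encoding that wide characters can hold.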
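1. Definition of wide characters
In ANSI, a character (char) is one byte long. With Unicode, a character occupies one word (two bytes). The wchar.h header shipped with VC++ 6.0 defines the basic wide-character type wchar_t as:
typedef unsigned short wchar_t;
So in this environment a wide character is simply an unsigned short integer. (Standard C++ treats wchar_t as a built-in type whose size is implementation-defined: 2 bytes with Microsoft compilers, commonly 4 bytes elsewhere.)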
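2. Wide string constants
C++ programmers build string constants all the time. How do you build a wide string constant? Simply put an uppercase L in front of the literal, for example:
const wchar_t *str1 = L"Hello";
The L is essential: only with it does the compiler know that each character should be stored as one word. Note also that there must be no space between the L and the string.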
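As a small, hedged illustration of the L prefix, the sketch below compares a narrow and a wide literal holding the same text. The helper name element_count and the two array names are made up for this example. Both arrays hold six elements (five letters plus the terminator); only the size of each element differs.

```cpp
#include <cstddef>

// Number of elements (including the terminating null) in an array.
template <typename T, std::size_t N>
std::size_t element_count(const T (&)[N]) { return N; }

const char    narrow_hello[] = "Hello";   // one char per character
const wchar_t wide_hello[]   = L"Hello";  // one wchar_t per character
```

sizeof(wide_hello) is therefore 6 * sizeof(wchar_t), which works out to 12 bytes with Microsoft compilers.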
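3. Wide string library functions
To work with wide strings, the C/C++ runtime library defines a dedicated set of functions. For example, the function that returns the length of a wide string is
size_t __cdecl wcslen(const wchar_t *);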
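Why are dedicated functions needed? The basic reason is that ANSI strings are terminated by '\0' (a Unicode string ends with a 16-bit zero, that is, the two bytes "\0\0"), and many string functions depend on that convention. In a wide string each character occupies one word in memory, so functions written for ANSI characters cannot process it correctly. Take the string "Hello" as an example. As wide characters, its five characters are:
0x0048 0x0065 0x006c 0x006c 0x006f
In memory (little-endian, as on x86) the bytes are actually laid out as:
48 00 65 00 6c 00 6c 00 6f 00
An ANSI string function such as strlen therefore sees the 00 right after the first 48 and decides the string has ended. For any wide string whose first character has a zero high byte, strlen returns 1!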
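The effect can be reproduced with a minimal sketch like the one below. The function names wide_length and byte_view_length are invented for this illustration, and the result of 1 assumes a little-endian machine such as x86.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Length in characters, as the wide-string function reports it.
std::size_t wide_length(const wchar_t *s) {
    return std::wcslen(s);
}

// What a byte-oriented function such as strlen sees when it is handed the
// same memory: it stops at the first zero byte.
std::size_t byte_view_length(const wchar_t *s) {
    return std::strlen(reinterpret_cast<const char *>(s));
}
```

For L"Hello", wide_length returns 5, while byte_view_length stops after the single byte 48 on a little-endian machine.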
4. Using macros to write code that works for both ANSI and Unicode
As shown above, C++ has a complete set of data types and functions for Unicode, so you can do Unicode programming entirely in C++.
Suppose we want two versions of a program: an ANSI version and a Unicode version. Writing two separate code bases would work, but maintaining both is very tedious. To lighten the load, the Microsoft C runtime defines a series of macros that let the same source serve both ANSI and Unicode.
In essence, these macros expand to ANSI or Unicode characters (and strings) depending on whether "_UNICODE" (note the leading underscore) is defined.
The following is an excerpt from the tchar.h header:
#ifdef _UNICODE
typedef wchar_t TCHAR;
#define __T(x) L##x
#else
typedef char TCHAR;
#define __T(x) x
#endif
#define _T(x) __T(x)
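As you can see, these macros expand to ANSI or Unicode characters depending on whether "_UNICODE" is defined. The macros defined in tchar.h fall into two groups:
A. Macros for defining characters and string constants
Only the two most common ones are listed here:
Macro | _UNICODE not defined (ANSI) | _UNICODE defined (Unicode) |
TCHAR | char | wchar_t |
_T(x) | x | L##x |
Note:
"##" is preprocessor syntax from the ANSI C standard, known as the "token-pasting operator"; here it attaches the preceding L to the macro argument. In other words, _T("Hello") expands to L"Hello".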
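The excerpt defines _T in terms of __T rather than pasting directly. A plausible reason, sketched below with made-up macro names (WIDEN, WIDEN_PASTE, GREETING), is that the extra level lets a macro argument be expanded before the L is pasted on. This is an illustration of standard preprocessor behaviour, not a copy of tchar.h.

```cpp
#include <cwchar>

#define WIDEN_PASTE(x) L##x            // pastes L onto the token exactly as given
#define WIDEN(x)       WIDEN_PASTE(x)  // expands x first, then pastes

#define GREETING "Hello"

// WIDEN(GREETING) first replaces GREETING with "Hello" and then yields L"Hello".
// WIDEN_PASTE(GREETING) would instead yield the single token LGREETING,
// which is an undeclared identifier.
const wchar_t *wide_greeting = WIDEN(GREETING);
```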
B. Macros for calling string functions
The runtime also defines a series of macros for the string functions. Again, only a few common ones are listed:
Macro | _UNICODE not defined (ANSI) | _UNICODE defined (Unicode) |
_tcschr | strchr | wcschr |
_tcscmp | strcmp | wcscmp |
_tcslen | strlen | wcslen |
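The following is a portable sketch of the same idea, written so that it compiles without tchar.h. Every name starting with DEMO_ or demo_ is hypothetical and merely stands in for _UNICODE, TCHAR, _T and _tcslen.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

#define DEMO_UNICODE 1  // stand-in for _UNICODE; remove to get the narrow build

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T_PASTE(x) L##x
#define demo_tcslen std::wcslen
#else
typedef char demo_tchar;
#define DEMO_T_PASTE(x) x
#define demo_tcslen std::strlen
#endif
#define DEMO_T(x) DEMO_T_PASTE(x)

// The same source lines compile for either character type.
std::size_t greeting_length() {
    const demo_tchar *text = DEMO_T("Hello");
    return demo_tcslen(text);
}
```

Either way greeting_length() returns 5; only the character type and the function actually called change.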
IV. Unicode Programming with the Win32 API
The Win32 API defines some character data types of its own, in the winnt.h header. For example:
typedef char CHAR;
typedef unsigned short WCHAR; // wc, 16-bit UNICODE character
typedef CONST CHAR *LPCSTR, *PCSTR;
In winnt.h the Win32 API also defines macros for characters and string constants that support generic ANSI/Unicode programming. Again, only the most common are listed:
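As the header shows, winnt.h compiles conditionally depending on whether UNICODE (no underscore) is defined.
The Win32 API also defines its own set of string functions, which expand to ANSI or Unicode versions depending on whether "UNICODE" is defined; lstrlen is one example. The API string functions and the C++ (C runtime) string functions do the same job, so where you have the choice, prefer the C++ ones; there is little point in spending effort on learning the API variants as well.
You may never have noticed, but the Win32 API actually comes in two versions: one accepts MBCS strings and the other accepts Unicode strings. For example, there is no API function actually named SetWindowText(); instead there are SetWindowTextA() and SetWindowTextW(). The A suffix marks the MBCS function and the W suffix marks the Unicode one. These functions are declared in winuser.h; here is the part of winuser.h that deals with SetWindowText():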
#ifdef UNICODE
#define SetWindowText SetWindowTextW
#else
#define SetWindowText SetWindowTextA
#endif // !UNICODE
So whether an API function name refers to the Unicode version or the MBCS version depends on whether UNICODE is defined.
Careful readers may have noticed the difference between UNICODE and _UNICODE. The former has no underscore and is used by the Windows headers; the latter has a leading underscore and is used by the C runtime headers. In other words, the C runtime macros expand to Unicode or ANSI depending on _UNICODE (with underscore), and the Windows macros expand to Unicode or ANSI depending on UNICODE (without underscore).
As we will see later, in practice we do not distinguish strictly between them: we define both _UNICODE and UNICODE to get a Unicode build.
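The A/W naming scheme can be imitated in portable code. In the sketch below, SetTitleA, SetTitleW, SetTitle, DEMO_TEXT and DEMO_API_UNICODE are all hypothetical stand-ins invented for this illustration; they are not Win32 functions or macros.

```cpp
#include <string>

// Hypothetical A/W pair; each reports which version was called.
std::string SetTitleA(const char *)    { return "A version"; }
std::string SetTitleW(const wchar_t *) { return "W version"; }

#define DEMO_API_UNICODE 1  // plays the role of UNICODE in this sketch

#ifdef DEMO_API_UNICODE
#define SetTitle SetTitleW
#define DEMO_TEXT(x) L##x
#else
#define SetTitle SetTitleA
#define DEMO_TEXT(x) x
#endif

// Callers write the generic name; the preprocessor picks the real function.
std::string call_generic() {
    return SetTitle(DEMO_TEXT("Hello"));
}
```

Because the function macro and the literal macro are switched by the same symbol, the argument type always matches the function that is finally called. Defining only one of two separate switches would break that match, which is presumably why UNICODE and _UNICODE are defined together in practice.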
V. Writing Unicode Applications in VC++ 6.0
VC++ 6.0 supports Unicode programming, but the default is ANSI, so developers only need to adjust their coding habits slightly to write applications that support Unicode.
Unicode programming in VC++ 6.0 mainly involves the following tasks:
1. Add the UNICODE and _UNICODE preprocessor definitions to the project.
Steps: open the [Project] -> [Settings...] dialog (Figure 1). On the C/C++ tab, in "Preprocessor definitions", remove _MBCS and add _UNICODE,UNICODE (separated by commas). The result is shown in Figure 2:
Figure 1
Figure 2
When UNICODE and _UNICODE are not defined, all functions and types default to their ANSI versions; once they are defined, all MFC classes and Windows API calls switch to their wide-character versions.
2. Set the program entry point
MFC applications have a dedicated entry point for Unicode builds, so the entry point must be set; otherwise the link step fails.
To set it, open the [Project] -> [Settings...] dialog and, on the Link tab under the Output category, enter wWinMainCRTStartup in the Entry Point box.
Figure 3
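3. Use the generic ANSI/Unicode data types
Microsoft provides a number of generic types and macros that work for both ANSI and Unicode; the most commonly used are _T, TCHAR, LPTSTR and LPCTSTR.
Incidentally, LPCTSTR is exactly the same as const TCHAR*. The L stands for "long pointer", a leftover kept for compatibility with 16-bit systems such as Windows 3.1; in Win32 and other 32-bit operating systems, long pointers, near pointers and the far modifier exist only for compatibility and have no practical meaning. P (pointer) means this is a pointer; C (const) means it is constant; T (as in the _T macro) means it is generic across ANSI and Unicode; STR (string) means the variable is a string. Putting these together, LPCTSTR is a pointer to a constant string whose character type depends on the macro definitions. For example:
const TCHAR* pszText = _T("Hello!");
TCHAR szText[] = _T("I Love You");
LPCTSTR lpszText = _T("大家好!");
Arguments passed to functions should be written the same way, for example:
MessageBox(_T("你好"));
In an ANSI build this statement compiles with or without the _T macro. In a Unicode build, however, the parameter is a wide-character pointer, and a plain narrow literal is not converted automatically; the compiler rejects it. So always use the _T macro and keep Unicode in mind as you write code.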
4. Fix string size calculations
Some string functions need the number of characters in a buffer (sizeof(szBuffer)/sizeof(TCHAR)), while others need the number of bytes (sizeof(szBuffer)). Watch for this and check each string function you call to make sure it receives the right value.
ANSI functions start with str, such as strcpy(), strcat(), strlen();
Unicode functions start with wcs, such as wcscpy(), wcscat(), wcslen();
Generic ANSI/Unicode functions from the C runtime library start with _tcs, such as _tcscpy();
Generic ANSI/Unicode functions from Windows start with lstr, such as lstrcpy();
For code that must work with both ANSI and Unicode, use the generic string functions that start with _tcs or lstr.
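To make the characters-versus-bytes distinction concrete, here is a hedged sketch of a bounded copy into a fixed-size wide buffer. The function name copy_truncated is invented for this example, and the code uses wchar_t directly so that it compiles without tchar.h. The limit is the buffer capacity in characters (N); passing sizeof(dst), which is a byte count, to a function that expects characters would overstate the capacity.

```cpp
#include <cstddef>
#include <cwchar>

// Copies src into a fixed-size wide buffer, truncating if necessary.
// N is the buffer capacity in characters, i.e. sizeof(dst) / sizeof(wchar_t).
// Returns the number of characters copied, not counting the terminator.
template <std::size_t N>
std::size_t copy_truncated(wchar_t (&dst)[N], const wchar_t *src) {
    std::size_t n = std::wcslen(src);
    if (n > N - 1) {
        n = N - 1;             // leave room for the terminating null
    }
    std::wmemcpy(dst, src, n); // wmemcpy counts in wide characters
    dst[n] = L'\0';
    return n;
}
```

With a four-character buffer, copying L"Hello" stores L"Hel" and returns 3.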
VI. A Unicode Programming Example
Step 1:
Open VC++ 6.0 and create a dialog-based project named Unicode. Add a button control to the main dialog IDD_UNICODE_DIALOG, double-click it, and add the following handler for it:
void CUnicodeDlg::OnButton1()
{
    const TCHAR* str1 = _T("ANSI和UNICODE編碼試驗");
    m_disp = str1;
    UpdateData(FALSE);
}
Add a static text control IDC_DISP and use ClassWizard to attach a CString member variable m_disp to it. Build the project with the default ANSI settings to produce Unicode.exe.
Step 2:
Open Control Panel, click "Date, Time, Language, and Regional Options", and in that window click "Regional and Language Options" to open the dialog of the same name. On the "Advanced" tab, change "Language for non-Unicode programs" to "Japanese" and click "Apply", as shown in Figure 4:
Figure 4
Click "Yes" in the dialog that appears and restart the computer so the setting takes effect.
Run Unicode.exe and click "Button1": the static text control shows garbage characters.
Step 3:
Rebuild the project with the Unicode settings to produce Unicode.exe again. Run it and click "Button1": this time the text is displayed correctly. That is the advantage of Unicode.
That is all for now. Good luck.
Unicode, MBCS and Generic text mappings
Introduction
In order to allow your programs to be used in international markets it is worth making your application Unicode or MBCS aware. The Unicode character set is a "wide character" (2 bytes per character) set that contains every character available in every language, including all technical symbols and special publishing characters. Multibyte character set (MBCS) uses either 1 or 2 bytes per character and is used for character sets that contain large numbers of different characters (eg Asian language character sets).
Which character set you use depends on the language and the operating system. Unicode requires more space than MBCS since each character is 2 bytes. It is also faster than MBCS and is used by Windows NT as standard, so non-Unicode strings passed to and from the operating system must be translated, incurring overhead. However, Unicode is not supported on Win95 and so MBCS may be a better choice in this situation. Note that if you wish to develop applications in the Windows CE environment then all applications must be compiled in Unicode.
Using MBCS or Unicode
The best way to use Unicode or MBCS - or indeed even ASCII - in your programs is to use the generic text mapping macros provided by Visual C++. That way you can simply use a single define to swap between Unicode, MBCS and ASCII without having to do any recoding.
To use MBCS or Unicode you need only define either _MBCS or _UNICODE in your project. For Unicode you will also need to specify the entry point symbol in your Project settings as wWinMainCRTStartup. Please note that if both _MBCS and _UNICODE are defined then the result will be unpredictable.
Generic Text mappings and portable functions
The generic text mappings replace the standard char or LPSTR types with generic TCHAR or LPTSTR macros. These macros will map to different types and functions depending on whether you have compiled with Unicode or MBCS (or neither) defined. The simplest way to use the TCHAR type is to use the CString class - it is extremely flexible and does most of the work for you.
In conjunction with the generic character type, there is a set of generic string manipulation functions prefixed by _tcs. For instance, instead of using the strrev function in your code, you should use the _tcsrev function which will map to the correct function depending on which character set you have compiled for. The table below demonstrates:
#define | Compiled Version | Example |
_UNICODE | Unicode (wide-character) | _tcsrev maps to _wcsrev |
_MBCS | Multibyte-character | _tcsrev maps to _mbsrev |
None (the default: neither _UNICODE nor _MBCS defined) | SBCS (ASCII) | _tcsrev maps to strrev |
Each str* function has a corresponding _tcs* function that should be used instead. See the TCHAR.H file for all the mappings and macros that are available. Just look up the online help for the string function in question in order to find the equivalent portable function.
Note: Do not use the str* family of functions with Unicode strings, since Unicode strings are likely to contain embedded null bytes.
The next important point is that each literal string should be enclosed by the TEXT() (or _T()) macro. This macro prepends an "L" to literal strings if the project is being compiled in Unicode, or does nothing if MBCS or ASCII is being used. For instance, the string _T("Hello") will be interpreted as "Hello" in MBCS or ASCII, and L"Hello" in Unicode. If you are working in Unicode and do not use the _T() macro, you may get compiler warnings or errors.
Note that you can use ASCII and Unicode within the same program, but not within the same string.
All MFC functions except for database class member functions are Unicode aware. This is because many database drivers themselves do not handle Unicode, and so there was no point in writing Unicode aware MFC classes to wrap these drivers.
Converting between Generic types and ASCII
ATL provides a bunch of very useful macros for converting between different character formats. The basic form of these macros is X2Y(), where X is the source format and Y is the destination format. Possible conversion formats are shown in the following table.
String Type | Abbreviation |
ASCII (LPSTR) | A |
WIDE (LPWSTR) | W |
OLE (LPOLESTR) | OLE |
Generic (LPTSTR) | T |
Const | C |
Thus, A2W converts an LPSTR to an LPWSTR, OLE2T converts an LPOLESTR to an LPTSTR, and so on.
There are also const forms (denoted by a C) that convert to a const string. For instance, A2CT converts from LPSTR to LPCTSTR.
When using the string conversion macros you need to include the USES_CONVERSION macro at the beginning of your function:
void foo(LPSTR lpsz)
{
    USES_CONVERSION;
    ...
    LPTSTR szGeneric = A2T(lpsz);
    // Do something with szGeneric
    ...
}
Two caveats on using the conversion macros:
Never use the conversion macros inside a tight loop. Each conversion that actually changes the character type allocates a new buffer on the stack, and that memory is not released until the enclosing function returns, so a loop can consume a lot of memory and will result in slow code. Better to perform the conversion outside the loop and pass the converted value into the loop.
Never return the result of the macros directly from a function, unless the return value implies making a copy of the data before returning, because the converted buffer is no longer valid once the function has returned. For instance, if you have a function that returns an LPTSTR, then do not do the following:
LPTSTR BadReturn(LPSTR lpsz)
{
    USES_CONVERSION;
    // do something
    return A2T(lpsz);
}
Instead, you should return the value as a CString, which would imply a copy of the string would be made before the function returns:
CString GoodReturn(LPSTR lpsz)
{
    USES_CONVERSION;
    // do something
    return A2T(lpsz);
}
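The same "return a copy, not a pointer to a temporary buffer" idea can be sketched without ATL or MFC. The example below is not the ATL mechanism: it swaps in the standard library function std::mbsrtowcs and returns a std::wstring, and the function name widen_copy is made up for this illustration. The conversion follows the current C locale, so in the default "C" locale it should only be relied on for plain ASCII input.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Converts a narrow (multibyte) string to a wide string and returns it by
// value, so the caller owns an independent copy.
std::wstring widen_copy(const char *narrow) {
    // First pass: with a null destination, mbsrtowcs only reports the
    // number of wide characters required (terminator excluded).
    std::mbstate_t state = std::mbstate_t();
    const char *src = narrow;
    std::size_t needed = std::mbsrtowcs(NULL, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence
    }

    // Second pass: convert into a buffer large enough for the terminator.
    std::vector<wchar_t> buffer(needed + 1);
    state = std::mbstate_t();
    src = narrow;
    std::mbsrtowcs(&buffer[0], &src, buffer.size(), &state);

    return std::wstring(&buffer[0], needed);
}
```

Returning the std::wstring by value plays the role that returning a CString plays in GoodReturn above: the data is copied out before any temporary storage disappears.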
Tips and Traps
The TRACE statement
The TRACE macros have a few cousins - namely the TRACE0, TRACE1, TRACE2 and TRACE3 macros. These macros allow you to specify a format string (as in the normal TRACE macro), and either 0, 1, 2 or 3 parameters, without the need to enclose your literal format string in the _T() macro. For instance,
TRACE(_T("This is trace statement number %d\n"), 1);
can be written
TRACE1("This is trace statement number %d\n", 1);
Viewing Unicode strings in the debugger
If you are using Unicode in your application and wish to view Unicode strings in the debugger, then you will need to go to Tools | Options | Debug and click on "Display Unicode Strings".
The Length of strings
Be careful when performing operations that depend on the size or length of a string. For instance, CString::GetLength returns the number of characters in a string, NOT the size in bytes. If you were to write the string to a CArchive object, then you would need to multiply the length of the string by the size of each character in the string to get the number of bytes to write:
CString str = _T("Hello, World");
archive.Write( str, str.GetLength( ) * sizeof( TCHAR ) );
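The arithmetic is the same for any string class. The sketch below uses std::basic_string instead of CString so that it compiles without MFC; the function name payload_bytes is invented for this example.

```cpp
#include <cstddef>
#include <string>

// Bytes occupied by the characters of a string (terminator excluded):
// the character count multiplied by the size of one character.
template <typename CharT>
std::size_t payload_bytes(const std::basic_string<CharT> &s) {
    return s.length() * sizeof(CharT);
}
```

For "Hello, World" the length is 12 characters in both cases, but the byte count is 12 for a narrow string and 12 * sizeof(wchar_t) for a wide one.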
Reading and Writing ASCII text files
If you are using Unicode or MBCS then you need to be careful when writing ASCII files. The safest and easiest way to write text files is to use the CStdioFile class provided with MFC. Just use the CString class and the ReadString and WriteString member functions and nothing should go wrong. However, if you need to use the CFile class and its associated Read and Write functions, then if you use the following code:
CFile file(...);
CString str = _T("This is some text");
file.Write( str, (str.GetLength()+1) * sizeof( TCHAR ) );
instead of
CStdioFile file(...);
CString str = _T("This is some text");
file.WriteString(str);
then the results will be significantly different. The two lines of text below are from a file created using the first and second code snippets respectively:
(This text was viewed using WordPad)
Not all structures use the generic text mappings
For instance, the CHARFORMAT structure, if the RichEditControl version is less than 2.0, uses a char [] for the szFaceName field, instead of a TCHAR [] as would be expected. You must be careful not to blindly change "..." to _T("...") without first checking. In this case, you would probably need to convert from TCHAR to char before copying any data to the szFaceName field.
Copying text to the Clipboard
This is one area where you may need to use ASCII and Unicode in the same program, since the CF_TEXT format for the clipboard uses ASCII only. NT systems have the option of the CF_UNICODETEXT format if you wish to use Unicode on the clipboard.
Installing the Unicode MFC libraries
The Unicode versions of the MFC libraries are not copied to your hard drive unless you select them during a Custom installation. They are not copied during other types of installation. If you attempt to build or run an MFC Unicode application without the MFC Unicode files, you may get errors.
(From the online docs) To copy the files to your hard drive, rerun Setup, choose Custom installation, clear all other components except "Microsoft Foundation Class Libraries," click the Details button, and select both "Static Library for Unicode" and "Shared Library for Unicode."
About the Author
Chris Maunder: Chris is the Co-founder, Administrator, Architect, Chief Editor and Shameless Hack who wrote and runs The Code Project. He's been programming since 1988 while pretending to be, in various guises, an astrophysicist, mathematician, physicist, hydrologist, geomorphologist, defence intelligence researcher and then, when all that got a bit rough on the nerves, a web developer. He is a Microsoft Visual C++ MVP both globally and for Canada locally.
His programming experience includes C/C++, C#, SQL, MFC, ASP, ASP.NET, and far, far too much FORTRAN. He has worked on PocketPCs, AIX mainframes, Sun workstations, and a CRAY YMP C90 behemoth but finds notebooks take up less desk space. He dodges, he weaves, and he never gets enough sleep. He is kind to small animals. Chris was born and bred in Australia but splits his time between Toronto and Melbourne, depending on the weather. For relaxation he is into road cycling, snowboarding, rock climbing, and storm chasing.