Unicode encoding problem?

I need help resolving a problem. I have a table in an MS SQL database that stores XML documents. The XML documents have content (in a CDATA section) in plain text with HTML formatting, and the content can contain text in foreign languages (Polish, Russian, French). Saving XML documents works very well, but when, for example, I open an existing record with Russian text, add some text, HTML, etc., and then save it, all the Russian (Cyrillic) characters are replaced by the character '?'. This happens only with Russian; the other languages are fine. I don't know why :(

The field that stores the XML is ntext (Unicode), and the stored procedures for creating and updating take the parameter @xmlContent ntext.

I think this is a problem in MS SQL, because before executing the create/update stored procedures I saved the XML content to a file on the hard disk, and it was well formed.

Do you have any ideas?

Luke (lukasz.golo***@gmail.com) writes:
> I need help to resolve some problem.

You talk about editing the documents, but you don't say where you edit them. Do you edit them on disk, or the data that is stored in SQL Server? If the latter, what application do you use to edit them with?

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Hello,
I have written a web application in C# (Visual Studio). The application has an HTML WYSIWYG editor where you can edit the HTML content. When saving, the HTML content is packed into XML and saved to MS SQL Server. The application uses UTF-8 encoding. I save the XML both to a file on the hard disk and to the database, because that way I can check that my code is OK and has no encoding conversion bugs or the like.

By the way, on MSDN I found this:

"Any time Unicode data must be inserted into these columns, the columns will be internally converted from Unicode using the WideCharToMultiByte API and the code page associated with the collation. Any time a character cannot be represented on the given code page, it will be replaced by a question mark (?); this makes random question marks a good indication of data that has been corrupted due to this conversion. It also is a good indication that you really needed a Unicode data type. If you use a string literal of a non-Unicode type, it will be converted first using the database's default code page (derived from its collation)."

But how do I resolve my problem? :( There is no answer, only the suggestion "that you really needed a Unicode data type" :( My database uses a Polish collation, but sometimes it saves correctly and sometimes not, and that I don't understand. Sorry for my English ;)

What is the datatype of the column where you store the XML? Text, NText, nvarchar, varchar...?

MC

SQL Server uses UTF-16, and so should your application. At least make sure the string is UTF-16 before you try to insert it.

ML
---
http://milambda.blogspot.com/

Hmm, this is a possible solution, thanks.
The strange thing is that sometimes it inserts correctly. Do you think this is the only way to resolve it?

I'm pretty sure about the SQL part. :)
Perhaps you should ask in a .NET newsgroup as well.

ML
---
http://milambda.blogspot.com/

OK.
The last question: if I convert UTF-8 to Unicode, do you think it will help?

I'm pretty certain it will. Now you'll test it, and then we will all know for sure. :)

ML
---
http://milambda.blogspot.com/

I tested.
I added code to the application that converts the XML (UTF-8) document to Unicode before inserting it into the database, and I get the same result :(. In .NET I don't have any more options for converting :(.

Luke wrote:
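> Ok.
> The last question. If I convert utf-8 to Unicode you think should help ?

Yes or no! For example, if you convert the Latin 'A' from UTF-8 to Unicode, it is simple: 0x41 becomes 0x0041, and WideCharToMultiByte will do it. But for Cyrillic it is different: the Cyrillic 'А' is 0xC0 in Win1251 and 0x410 in Unicode. A procedure for converting Win1251 to Unicode in C looks like this:

    #include <stdint.h>

    typedef uint8_t  BYTE;   /* same meaning as the <windows.h> types */
    typedef uint16_t WORD;

    /* Map one Windows-1251 byte to its Unicode (UTF-16) value.
       Covers ASCII, Ё/ё and the Russian alphabet А..я (0xC0..0xFF).
       Other bytes in 0x80..0xBF are passed through unchanged, which is
       NOT correct for the rest of the code page (punctuation, Ukrainian
       and Serbian letters, etc.). */
    WORD ConvertWin1251ToUnicode(BYTE byCode)
    {
        WORD wCode;
        if (byCode < 0xC0) {
            switch (byCode) {
            case 0xA8: wCode = 0x401; break;   /* Ё */
            case 0xB8: wCode = 0x451; break;   /* ё */
            default:   wCode = byCode; break;  /* ASCII (< 0x80) maps 1:1 */
            }
        } else {
            wCode = 0x410 + (WORD)(byCode - 0xC0);   /* А (0xC0) .. я (0xFF) */
        }
        return wCode;
    }

Some more about this, including converters, at http://russiantext.ircdb.org

Luke (lukasz.golo***@gmail.com) writes: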
> I tested.

It's still very unclear to me where you get these question marks, how you insert the text, etc. If you look at the data from Query Analyzer, what do you see? There is obviously some conversion problem going on, but without knowledge of your code, it's difficult to say.

--
Erland Sommarskog, SQL Server MVP

Hi,
When I tried to save mails from Microsoft Outlook to a MySQL database using VBA, the same thing happened. But when I saved them using an HTML form, they were stored correctly (database field type: UTF-8 General). This means it is not a problem with the data field but with transferring the data from VB. I haven't found a solution for this; still, I am posting it here to point out that the problem lies in the VB code.

Thanks

Try this:
http://www.geniusconnect.com/geniusconnect.asp

--
RadekM
"Luke" wrote:
> Do you have any ideas ?

You must precede all Unicode strings with the prefix N when you deal with Unicode string constants.

When dealing with Unicode string constants in SQL Server you must precede all Unicode strings with a capital letter N, as documented in the SQL Server Books Online topic "Using Unicode Data". The "N" prefix stands for National Language in the SQL-92 standard, and must be uppercase. If you do not prefix a Unicode string constant with N, SQL Server will convert it to the non-Unicode code page of the current database before it uses the string.

http://support.microsoft.com/?kbid=239530