
Unicode encoding problem?

Author
19 Jan 2006 8:25 AM
Luke
Hello!
I need help solving a problem.
The problem is as follows:
I have a table in an MS SQL database that stores XML documents. The XML documents
have content (in a CDATA section) in plain text with HTML formatting.
The content can contain text in foreign languages (Polish, Russian, French).
Saving XML documents works very well, but, for example, when I open an
existing record with Russian text, add some text, HTML, etc.,
and then save it, all the Russian (Cyrillic) characters are replaced by '?'.
It happens only with Russian; the other languages are fine.
I don't know why :(

The field that stores the XML is ntext (Unicode), and the stored procedures
that create and update records take a parameter @xmlContent ntext.

I think this is a problem in MS SQL, because before executing the create/update
stored procedures I saved the XML content to a file on the hard disk, and it was
well formed.

Do you have any ideas?

Author
19 Jan 2006 9:16 AM
Erland Sommarskog

You talk about editing the documents, but you don't say where you edit
them. Do you edit them on disk? Or the data that is stored in SQL Server?
If the latter, what application do you use to edit them?


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
19 Jan 2006 9:35 AM
Luke
Hello,

I have written a web application in C# (Visual Studio). The application has an
HTML WYSIWYG editor where you can edit the HTML content. When saving,
the HTML content is packed into XML and saved to MS SQL Server.
The application uses UTF-8 encoding. I save the XML both to a file on disk and to the
database, because that way I can check that my code is OK and has no
encoding conversion bugs.
By the way, on MSDN I found this:

"Any time Unicode data must be inserted into these columns, the columns
will be internally converted from Unicode using the WideCharToMultiByte
API and the code page associated with the collation. Any time a
character cannot be represented on the given code page, it will be
replaced by a question mark (?); this makes random question marks a
good indication of data that has been corrupted due to this conversion.
It also is a good indication that you really needed a Unicode data
type. If you use a string literal of a non-Unicode type, it will be
converted first using the database's default code page (derived from
its collation)."
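The conversion described in that quote can be imitated in a few lines. The sketch below is a toy model, not SQL Server's code: it converts UTF-16 code units to a single-byte code page using a deliberately tiny, partial Windows-1250 table (Windows-1250 is the code page behind Polish collations), and every character the table cannot represent becomes '?'. Polish letters survive and Cyrillic does not, which matches the symptom in this thread. The function name and the table subset are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Partial Windows-1250 table: a few lowercase Polish letters only. */
static const struct { uint16_t unicode; unsigned char byte; } cp1250_subset[] = {
    { 0x0105, 0xB9 }, /* a with ogonek */
    { 0x0119, 0xEA }, /* e with ogonek */
    { 0x0142, 0xB3 }, /* l with stroke */
    { 0x00F3, 0xF3 }, /* o acute */
    { 0x015B, 0x9C }, /* s acute */
    { 0x017C, 0xBF }, /* z dot above */
};

/* Toy model of a Unicode -> code page conversion: ASCII is copied,
 * characters found in the table are mapped, anything else becomes '?'.
 * Writes len bytes to out and returns the number of '?' substitutions. */
size_t to_cp1250_lossy(const uint16_t *in, size_t len, unsigned char *out)
{
    size_t lost = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t c = in[i];
        if (c < 0x80) {
            out[i] = (unsigned char)c;
            continue;
        }
        out[i] = '?';
        for (size_t k = 0; k < sizeof cp1250_subset / sizeof cp1250_subset[0]; k++) {
            if (cp1250_subset[k].unicode == c) {
                out[i] = cp1250_subset[k].byte;
                break;
            }
        }
        if (out[i] == '?')
            lost++; /* character not representable in this code page */
    }
    return lost;
}
```

For example, the input U+0142, U+0061, U+0410 (Polish l with stroke, Latin a, Cyrillic A) comes out as 0xB3, 'a', '?', with one substitution.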

But how do I solve my problem? :( There is no answer, only the suggestion
"that you really needed a Unicode data type" :(
My database uses a Polish collation, but sometimes it saves correctly and
sometimes not, and that is what I don't understand.
Sorry for my English ;)
Author
19 Jan 2006 10:45 AM
MC
What is the datatype of the column where you store the xml? Text, NText,
nvarchar, varchar...?

MC

Author
19 Jan 2006 10:47 AM
Luke
The field is NText
Author
19 Jan 2006 10:57 AM
ML
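SQL Server stores Unicode data (nchar, nvarchar, ntext) as UTF-16 (strictly
UCS-2 in SQL Server 2000/2005), so your application should use it too. At
least make sure the string is UTF-16 before you try to insert it.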
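For reference, "making sure the string is UTF-16" means decoding the UTF-8 bytes into 16-bit code units. In .NET the framework normally does this itself (a .NET string is already UTF-16), so the C sketch below, with a hypothetical name (utf8_to_utf16), only illustrates what the conversion involves: multi-byte sequences are combined into code points, code points above U+FFFF become surrogate pairs, and malformed input becomes U+FFFD. It is simplified and does not reject overlong forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-8 into UTF-16 code units. Writes at most cap units to out
 * and returns the number of units produced (or needed, if out is too
 * small). Malformed sequences become U+FFFD. */
size_t utf8_to_utf16(const unsigned char *in, size_t len,
                     uint16_t *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t cp;
        size_t extra;
        unsigned char b = in[i++];
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; } /* bad lead byte */

        /* Consume continuation bytes (10xxxxxx); on error, stop without
         * consuming the offending byte so it is re-read as a lead byte. */
        for (size_t k = 0; k < extra; k++, i++) {
            if (i >= len || (in[i] & 0xC0) != 0x80) {
                cp = 0xFFFD;
                break;
            }
            cp = (cp << 6) | (in[i] & 0x3F);
        }
        if (cp > 0x10FFFF)
            cp = 0xFFFD;

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            cp -= 0x10000;
            if (n + 1 < cap) {
                out[n]     = (uint16_t)(0xD800 + (cp >> 10));
                out[n + 1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
            }
            n += 2;
        } else {
            if (n < cap)
                out[n] = (uint16_t)cp;
            n += 1;
        }
    }
    return n;
}
```

For example, the UTF-8 bytes 41 D0 90 decode to the two code units 0x0041 and 0x0410 (Latin A, Cyrillic A).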


ML

---
http://milambda.blogspot.com/
Author
19 Jan 2006 11:18 AM
Luke
Hmm, that is one possible solution, thanks.
The strange thing is that sometimes the insert works correctly.
Do you think this is the only way to solve it?
Author
19 Jan 2006 11:28 AM
ML
I'm pretty sure about the SQL part. :)
Perhaps you should ask in a .Net newsgroup as well.


ML

Author
19 Jan 2006 11:57 AM
Luke
OK.
One last question: do you think converting from UTF-8 to Unicode should help?
Author
19 Jan 2006 12:06 PM
ML
I'm pretty certain it will. Now you'll test it and then we all will know for
sure. :)


ML

Author
19 Jan 2006 12:23 PM
Luke
I tested it.
I added code to the application that converts the XML (UTF-8) document to
Unicode before inserting it into the database, and I get the same result :(.
In .NET I don't have any other conversion options :(.
Author
19 Jan 2006 1:30 PM
ML
As I said, better ask in a .NET newsgroup.


ML

Author
19 Jan 2006 8:01 PM
Smike
Luke wrote:
> Ok.
> The last question. If I convert utf-8 to Unicode you think should help ?

Yes and no!
For example, converting Latin 'A' from UTF-8 to Unicode is simple: 0x41
becomes 0x0041, and MultiByteToWideChar will do it.
But for Cyrillic it is different: Cyrillic 'A' is 0xC0 in Windows-1251 but
0x410 in Unicode.
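To make the comparison concrete: the same letter has different byte values in each encoding. Cyrillic 'A' is 0xC0 in Windows-1251, 0x0410 in UTF-16, and the two bytes D0 90 in UTF-8, while Latin 'A' is the single byte 0x41 in Windows-1251 and UTF-8 and 0x0041 in UTF-16. The small helper below (hypothetical name, Basic Multilingual Plane only, surrogates not handled) encodes one code point as UTF-8 so those byte values can be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one BMP code point (not a surrogate) as UTF-8.
 * Writes 1 to 3 bytes to out and returns the number written. */
size_t bmp_to_utf8(uint16_t cp, unsigned char out[3])
{
    if (cp < 0x80) {                 /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Calling bmp_to_utf8(0x0410, buf) writes D0 90 and returns 2; bmp_to_utf8(0x0041, buf) writes 41 and returns 1.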
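A Windows-1251 to Unicode conversion routine in C looks like this:

#include <stdint.h>

/* Windows-style type names (normally from <windows.h>),
 * defined here so the code compiles on its own. */
typedef uint8_t  BYTE;
typedef uint16_t WORD;

/* Map one Windows-1251 byte to a UTF-16 code unit.
 * Handles ASCII (0x00-0x7F), Cyrillic Yo/yo (0xA8/0xB8) and
 * the main alphabet (0xC0-0xFF). Note: other bytes in 0x80-0xBF are
 * passed through unchanged, which is wrong for most of them
 * (e.g. 0x80 is U+0402); a full table is needed for those. */
WORD ConvertWin1251ToUnicode(BYTE byCode)
{
   WORD wCode;
   if (byCode < 0xC0)
   {
      switch (byCode)
      {
      case 0xA8:            /* Cyrillic capital Yo */
         wCode = 0x0401;
         break;
      case 0xB8:            /* Cyrillic small yo */
         wCode = 0x0451;
         break;
      default:              /* ASCII and the rest of 0x80-0xBF */
         wCode = byCode;
         break;
      }
   } else {
      /* 0xC0..0xFF map linearly onto U+0410..U+044F */
      wCode = (WORD)(0x0410 + (byCode - 0xC0));
   }
   return wCode;
}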

More about this, including converters, at
http://russiantext.ircdb.org
Author
22 Jan 2006 10:39 PM
Erland Sommarskog
Luke (lukasz.golo***@gmail.com) writes:
> I tested.
> In application added code converting xml (utf-8) document to unicode
> before inserting to database and I have this same result. :(. In .NET I
> haven't any more options to convert :(.

It's still very unclear to me where you get these question marks, how you
insert the text, etc. If you look at the data in Query Analyzer, what do you
see?

There is obviously some conversion problem going on, but without knowledge
of your code, it's difficult to say.



Author
31 Jan 2006 11:29 AM
priya nair
Hi,

When I tried to save mails from Microsoft Outlook to a MySQL database using
VBA, the same thing happened. But when I saved them through an HTML form,
they were stored correctly (database field type UTF-8 General). This means
the problem is not the database field but the data transfer from VB.

I haven't found a solution for this yet; I'm posting it here anyway to point
out that the problem lies in the VB code.

Thanks




Author
1 Mar 2006 10:40 PM
RadekM
Try this: http://www.geniusconnect.com/geniusconnect.asp
Author
22 May 2006 12:08 AM
RaptorCX

When dealing with Unicode string constants in SQL Server, you must precede
all Unicode strings with a capital letter N, as documented in the SQL Server
Books Online topic "Using Unicode Data". The "N" prefix stands for National
Language in the SQL-92 standard and must be uppercase. If you do not prefix
a Unicode string constant with N, SQL Server will convert it to the
non-Unicode code page of the current database before it uses the string.

http://support.microsoft.com/?kbid=239530
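To illustrate that rule: a Unicode string constant is written as N'...', with any single quote inside it doubled. The hypothetical C helper below (make_nliteral) builds such a literal from a UTF-8 string and is only meant to show the syntax. When the text is passed as a Unicode (nvarchar/ntext) parameter instead of being spliced into the SQL text, no literal is involved at all, which is generally the safer route.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wrap text in a T-SQL Unicode literal N'...',
 * doubling embedded single quotes. Returns a malloc'd string that the
 * caller must free, or NULL if allocation fails. */
char *make_nliteral(const char *text)
{
    size_t quotes = 0;
    for (const char *p = text; *p; p++)
        if (*p == '\'')
            quotes++;

    /* 'N' + opening quote + text + extra quotes + closing quote + NUL */
    size_t size = strlen(text) + quotes + 4;
    char *out = malloc(size);
    if (out == NULL)
        return NULL;

    char *q = out;
    *q++ = 'N';                /* must be uppercase */
    *q++ = '\'';
    for (const char *p = text; *p; p++) {
        if (*p == '\'')
            *q++ = '\'';       /* escape ' as '' */
        *q++ = *p;
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}
```

For example, make_nliteral("it's") returns the string N'it''s'.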
