
Full-Text search indexing HTML content

Author
23 Nov 2005 10:45 PM
bacusgod
I've been busy these past few days looking into the best way to create a full-text search catalog for a field composed entirely of HTML content. I found that the best way to handle it, so that HTML tags are ignored when the search is performed, is to make the column an image-type field and store the associated file type in a separate column.

I developed a small application that reads each of the text field entries, converts it to a byte[] variable using the UnicodeEncoding class (and its GetString and GetBytes methods), and saves the resulting byte array as binary data in the new image field. I even tested how the field data would fare as a physical file, by using a FileStream to write the byte[] data for several different records.
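For readers following along, here is a minimal sketch of the conversion step described above, written in Python rather than .NET. It assumes the default behaviour of .NET's UnicodeEncoding, which is UTF-16 little-endian, and that GetBytes returns the encoded text without a byte-order mark. Whether a missing byte-order mark has anything to do with the empty query results is only a guess and is not confirmed anywhere in this thread. The column names `DocContent` and `DocType` and the `.htm` value are hypothetical placeholders, not taken from the original post.

```python
import codecs


def to_utf16_bytes(html: str, with_bom: bool = False) -> bytes:
    """Encode text as UTF-16 little-endian, optionally prefixed with a BOM."""
    payload = html.encode("utf-16-le")  # this codec never emits a byte-order mark
    if with_bom:
        # Prepend the two-byte UTF-16LE byte-order mark (FF FE).
        return codecs.BOM_UTF16_LE + payload
    return payload


def build_row(html: str, with_bom: bool = False) -> dict:
    """Pair the binary content with a file-type value, one entry per column.

    Both column names and the ".htm" value are hypothetical.
    """
    return {"DocContent": to_utf16_bytes(html, with_bom), "DocType": ".htm"}


sample = "<p>full-text <b>search</b></p>"
raw = to_utf16_bytes(sample)                    # bytes without a BOM
marked = to_utf16_bytes(sample, with_bom=True)  # same bytes with a BOM prefix

print(raw[:4].hex())     # 3c007000
print(marked[:4].hex())  # fffe3c00
```

Comparing the first few bytes of a stored value against these two patterns is one quick way to see exactly what the conversion wrote into the image field.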

So far, so good. It all seemed to be working. The problem is that after I build the full-text catalog (using the wizard, and specifying the file type field related to the "image" field where the HTML file is stored), I get no resulting records whenever I perform a query using either CONTAINS or FREETEXT. Such queries worked just fine when the field was a text field instead of an image field, so I'm guessing something went wrong either in the data conversion or in the catalog building process.

Does anyone have any ideas?

-- bacusgod
Posted via http://www.codecomments.com

Author
24 Nov 2005 9:56 PM
Hugo Kornelis
On Wed, 23 Nov 2005 16:45:54 -0600, bacusgod wrote:

(snip)
>Anyone has any ideas?

Hi bacusgod,

You might want to try reposting this in the group where the full text
experts hang out: microsoft.public.sqlserver.fulltext

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
