
full text indexing on binary fields

Author
1 Jul 2005 6:24 PM
Brian Henry
I want to store PDF files in a database and still have them full-text
searchable by the database. I have the Adobe PDF IFilter installed and
registered on the SQL Server box, which is supposed to allow full-text
searching of PDF files stored in a database. My question is: how do I do a
full-text search on an image field? Is there anything special I need to take
into consideration, or any special setup in the full-text catalogs? Thanks!

Author
1 Jul 2005 6:33 PM
Tom Moreau
You have to designate another column in the table to act as a discriminator,
i.e. it tells SQL Server what type of document the binary in the indexed
column is, so that the matching IFilter can be used. In your case, the value
'PDF' would be stored in the discriminator column.

--
   Tom

----------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON   Canada
www.pinpub.com
Author
1 Jul 2005 6:40 PM
Christian Donner
"Brian Henry" wrote:
> i want to store PDF files in a database, but yet still have them full text
> searchable by the database... I have the adobe PDF IFilter interface
> installed and registered on the SQL Server box which is to allow for full
> text searching on PDF files in a database... Now, my question is how do i do
> a full text search on a image field? is there any thing special I need to
> take into consideration or do? or any special setup in the full text
> catalogue's? thanks!

I don't think it's possible to search for images - they are serendipitous ...
If you really need this functionality, you'll have to use OCR software and
read the textual content of the PDF files into text fields (which you can
index and search).
Author
1 Jul 2005 6:59 PM
Tom Moreau
SQL Server 2000 can full-text index on binaries, provided that the
appropriate IFilter has been installed and that you have a discriminator
column in the table.

--
   Tom

----------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON   Canada
www.pinpub.com
Author
3 Jul 2005 4:39 PM
John Kane
Brian,
Tom is correct. SQL Server 2000 can full-text index and full-text search
binary files, such as MS Word, MS Excel, and HTML files. Specifically,
"Microsoft® SQL Server™ 2000 includes filters for these file extensions:
.doc, .xls, .ppt, .txt, and .htm." - quoted from the SQL Server 2000 Books
Online (BOL) topic "Filtering Supported File Types". Note that this is a new
feature in SQL Server 2000; it is not present in SQL Server 7.0.

I've attached a SQL script file (Import_FTS_Images.sql) to this reply that
demonstrates the "discriminator column in the table", i.e., the column named
"ExtCol". Note that this column MUST be defined as char(3), varchar(4) or
sysname for this function to work correctly. You then populate this column
with the appropriate value for the document stored in the respective row,
for example "pdf" or ".pdf" or "doc" or ".doc". Once you have uploaded your
PDF files, run a Full Population and confirm a successful FT Population in
the Application Event log. After that, you can use the standard SQL FTS query
predicates CONTAINS, FREETEXT, CONTAINSTABLE or FREETEXTTABLE to query the
contents of these PDF files.
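
For readers without the attachment, the steps above can be sketched roughly
as follows. This is a minimal illustration for SQL Server 2000, not the
contents of Import_FTS_Images.sql. The table, key, catalog, and column names
other than "ExtCol" (dbo.Documents, PK_Documents, DocCatalog, DocContent,
FileName) and the search term are assumptions made up for the example. It
also assumes the PDF IFilter is already installed and registered on the
server.

```sql
-- Hypothetical table; only the ExtCol name comes from the post above.
CREATE TABLE dbo.Documents (
    DocId      int IDENTITY(1,1) NOT NULL
               CONSTRAINT PK_Documents PRIMARY KEY, -- unique single-column key used by full-text
    FileName   nvarchar(260) NOT NULL,
    ExtCol     char(3) NOT NULL,  -- discriminator column: 'pdf', 'doc', ...
    DocContent image NULL         -- the binary document itself
)
GO

-- Enable full-text indexing for the current database and create a catalog.
-- Caution: run 'enable' once on a database without existing catalogs;
-- check Books Online before running it on a database that already has some.
EXEC sp_fulltext_database 'enable'
EXEC sp_fulltext_catalog 'DocCatalog', 'create'

-- Register the table, identifying its unique key index.
EXEC sp_fulltext_table 'dbo.Documents', 'create', 'DocCatalog', 'PK_Documents'

-- Add the image column and name ExtCol as its type (discriminator) column.
EXEC sp_fulltext_column
     @tabname      = 'dbo.Documents',
     @colname      = 'DocContent',
     @action       = 'add',
     @type_colname = 'ExtCol'

EXEC sp_fulltext_table 'dbo.Documents', 'activate'
GO

-- Load the PDF rows from client code, setting ExtCol = 'pdf' on each row,
-- then start a full population. Population runs asynchronously.
EXEC sp_fulltext_catalog 'DocCatalog', 'start_full'
GO

-- After the population has finished, query the binary column directly.
SELECT DocId, FileName
FROM   dbo.Documents
WHERE  CONTAINS(DocContent, '"invoice"')
```

A query issued before the population completes may return no rows, which is
why checking the Application Event log first is worthwhile.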

Thanks,
John
--
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/


[attached file: Import_FTS_Images.sql]
