How to do field content search in SQL

I have a silly question about SQL; I wonder whether it is possible to accomplish this with existing techniques. I have a binary field (e.g. Image) in SQL, and I need to store an image file (scanned from the original document) in that field. I don't think it is possible, but my boss wants me to give him at least an alternative solution: can I search the text in this field?

To make it clear: I scan a paper document and get a JPEG file. The original document contains text objects and handwriting objects. There is a word "Bush" in the document. I store this JPEG file in a field (of type Image) in a SQL database. Now I want to search for "Bush" in that field in the database. Can I do that?

I told him it is impossible, but he is looking for some kind of alternative solution, and he doesn't care about the cost. I told him I can search text: if you put a description alongside that image field, then I can search those descriptions. But he wants to search the whole document. Can I do some sort of recognition to extract the content from the scanned JPEG, and then store the extracted content in SQL so that I can search it?

Thanks.

You can't do that out-of-the-box with SQL Server. However, there are
plenty of document imaging systems that will scan and index documents into a database for you. You should probably take a look at some of those packages.

--
David Portas
SQL Server MVP
--

Raymond,
David, this can actually be done with a slight and very easy modification to Raymond's procedures...

Raymond, when you scan a paper document, can your scanner or scanner software save the file as a Tagged Image File Format (TIFF) file? Most scanners can store files as TIFF as well as do OCR...

If you can, then you can use the TIFF IFilter (mspfilt.dll), as it filters files with the TIFF extension. This filter gets installed by MS Office 2003 and MS Office XP. If you cannot, you could possibly use one of several JPEG IFilters. For example, the XMP IFilter lets you index JPEG, GIF, TIFF, PNG, PS, EPS, PSD, AI and SVG files (see http://www.ifiltershop.com/xmpfilter.html), and there is also JPEG IFilter - JPEG Content Filter for Microsoft Indexing Service (see http://www.aimingtech.com/jpeg_ifilter/). I've not tested these JPEG IFilters with SQL Server 2000 FTS, but as they say, an IFilter is an IFilter is an IFilter...

These IFilters work the same way and use the same technology as Adobe's PDF IFilter, which can be used with SQL Server 2000 Full-text Search (FTS). You store the binary file in a column defined with the IMAGE datatype, then alter your table to add a new column for "file extension", define this column as char(3), varchar(4) or sysname, and populate it with "pdf" or ".pdf" respectively.

Below are several KB articles related to the TIFF IFilter:

Q321820 FIX: Non-OCR/Non-Display TIFF Data Indexed by SQL Server Full-Text
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q321820

Q283950 SPS: Some Character Sets Are Not Supported for OCR by the TIFF Index Filter
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q283950

Q291835 SPS: TIFF Filter Stops Working After You Start the Windows Components Wizard
http://support.microsoft.com/default.aspx?scid=kb;en-us;q291835

Q294303 SPS: TIFF Filter Does Not Perform OCR on Indexed Files
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q294303

Q.
Can I do sort of recognition and extract content from the scanned jpeg, and then store these extracted content in sql so that I can do some search?

A. Yes, if your scanner and/or scanner software supports OCR (Optical Character Recognition), then you will be able to extract the content.

Hope that helps!
John

Hi John and David,
Thank you for your reply.

John, I tried the script on your blog to set up a full-text search, but I still couldn't get any matching file when I search. For example, I inserted a txt file containing "John" into the image column of table FTSTable, and then searched for it in that table: no luck.

Did I miss anything? I followed every step of the script on your blog.

Thanks.

*** Sent via Developersdex http://www.developersdex.com ***

Raymond,
Could you review your server's Application event log for any "Microsoft Search" or MssCi source events? Specifically, look for MssCi events (informational, warnings and errors), as these may indicate why the Full Population did not succeed for the file type. Can I assume you are using English as the "Language for Word Breaker" and that English is the language in the docs? Additionally, could you confirm that you have properly associated the correct file type, MS Word, with "doc" in the file extension column defined as char(3)?

Feel free to email me directly, at jt-kane at comcast dot net

Thanks,
John
--
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

Raymond,
A follow-up... Did you install Adobe's PDF IFilter? If not, you can download the PDF IFilter from the following blog entry: "IFilters or Indexing Filters used with SQL FTS..." at:
http://spaces.msn.com/members/jtkane/Blog/cns!1pWDBCiDX1uvH5ATJmNCVLPQ!374.entry

Click on PDF - Adobe and then save or install the ifilter60.exe file on the server where SQL Server 2000 is installed. Then re-run a Full Population and check the server's application event log for a successful Full Population!

Thanks,
John
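
For reference, the alternative Raymond asks about (run OCR on the scan, then store the extracted text next to the image and search that text) can be sketched roughly as below. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database purely to stay self-contained, the table name `documents` and the helper functions are hypothetical, and the OCR step is assumed to be done by the scanner software, so the recognized text is simply passed in as a string. On SQL Server the text column would more likely be covered by a full-text index rather than searched with LIKE.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one row per scanned document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " file_name TEXT NOT NULL,"
        " image BLOB NOT NULL,"      # the scanned file itself
        " ocr_text TEXT NOT NULL)"   # text produced by an external OCR step
    )
    return conn


def store_document(conn, file_name, image_bytes, ocr_text):
    """Store the scan together with the text that OCR extracted from it."""
    cur = conn.execute(
        "INSERT INTO documents (file_name, image, ocr_text) VALUES (?, ?, ?)",
        (file_name, image_bytes, ocr_text),
    )
    return cur.lastrowid


def search(conn, word):
    """Return file names whose extracted text contains the given word.

    Note: '%' and '_' inside `word` act as LIKE wildcards in this sketch.
    """
    rows = conn.execute(
        "SELECT file_name FROM documents WHERE ocr_text LIKE ? ORDER BY id",
        (f"%{word}%",),
    )
    return [row[0] for row in rows]


conn = open_store()
# The byte strings are placeholders, not real JPEG data.
store_document(conn, "memo.jpg", b"\xff\xd8 placeholder", "Meeting notes mentioning Bush")
store_document(conn, "invoice.jpg", b"\xff\xd8 placeholder", "Invoice 17 for office chairs")
print(search(conn, "Bush"))
```

The search never touches the image column; it only works as well as the OCR text stored beside it, which is why handwriting that the OCR software cannot read would remain unsearchable.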