I am undertaking a project which requires the ability to perform phonetic searching across business and personal names and addresses. This data is relatively uncleansed to date, and based on our existing data analysis, the amount of work needed to cleanse it is tremendous. My thinking is that if we can find the right tools, we should be able to perform matching and searching across the data without heavy-duty cleansing before the search. When I talk about uncleansed data in terms of names, I mean things like the following:

Acme, Inc
Acme, Incorp.
Acme Incorporated

**Note the variations in abbreviations of Inc. This also occurs with other words such as Company, Association, Limited, DBA and other unknowns.

Smith Plumbing
Smyth Plumbing
Kool Runnings
Cool Runnings

**Note the variations in the phonetic spelling of words.

Ultimately, what I am wondering is what success people have had with fuzzy searches like this using solely Microsoft's Full-Text Indexing. I would also be interested to know whether SQL Server 2005 improves on the capabilities of fuzzy matching. In addition, I found some information regarding SQL Turbo (http://www.quest.com/sql_turbo/), which seems to be an extender or alternative implementation of FTS that does support phonetic matching, and I would be interested to know if anyone has used this tool as well. Any thoughts on how to approach this matter would be appreciated.
You need to build a thesaurus with all the alternative spellings to implement this in SQL FTS. Alternatively, you could try to cleanse your data before indexing it, either with fuzzy grouping or by rolling your own Levenshtein edit distance. SQL FTS fuzzy search (aka FREETEXT search) will stem your search phrase for verb forms, but it will not do the kind of fuzzy searches you are looking for.

There was a time when SQL Turbo was actually faster than SQL 2000 FTS. SQL 2005 FTS is now faster, but I have not yet compared it against the new version of SQL Turbo.

-- 
Hilary Cotter
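The cleanse-before-indexing route suggested above can be sketched in two steps: collapse known abbreviation variants to one canonical word (the same idea as a thesaurus entry), then compare the normalized names with Levenshtein edit distance. This is a minimal sketch, not a SQL Server feature: the SUFFIX_VARIANTS table, the normalize and is_probable_match helpers, and the max_distance=2 threshold are all hypothetical and would need tuning against real data. Note that edit distance costs O(len(a) × len(b)) per pair, so it suits a one-off grouping pass better than a per-query scan of a large table.

```python
import re

# Hypothetical substitution table; a real one would be built from the data.
SUFFIX_VARIANTS = {
    "inc": "incorporated",
    "incorp": "incorporated",
    "co": "company",
    "assn": "association",
    "assoc": "association",
    "ltd": "limited",
}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(SUFFIX_VARIANTS.get(t, t) for t in tokens)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def is_probable_match(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical matching rule: normalize both names, then threshold."""
    return levenshtein(normalize(a), normalize(b)) <= max_distance


if __name__ == "__main__":
    print(normalize("Acme, Incorp."))  # acme incorporated
    print(is_probable_match("Smith Plumbing", "Smyth Plumbing"))  # True
```

The abbreviation table handles variants that are too far apart in spelling for edit distance alone ("Inc" vs "Incorporated"), while the distance check catches small spelling slips such as "Smith" vs "Smyth" or "Kool" vs "Cool".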
Have you looked at http://www.codeproject.com/string/dmetaphone6.asp? The Double Metaphone algorithm is available there and has been implemented as an extended stored procedure. It is pretty good for people's names, but I don't know whether it would work well for business names.

FWIW - RLF
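The phonetic approach above works by precomputing a key for each name and matching on the key instead of the raw spelling. Double Metaphone is too long to reproduce here, so this sketch swaps in the much simpler American Soundex algorithm as a stand-in (T-SQL also has a built-in SOUNDEX() function). It is not the CodeProject implementation; the soundex and name_key helpers and the choice to key each word separately are illustrative assumptions.

```python
# American Soundex digit classes; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for one word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h/w do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets prev, so repeats count again
        if len(result) == 4:
            break
    return (result + "000")[:4]          # pad short codes with zeros


def name_key(name: str) -> str:
    """Hypothetical per-word key for a multi-word business name."""
    return " ".join(soundex(w) for w in name.split())


if __name__ == "__main__":
    print(name_key("Smith Plumbing"), name_key("Smyth Plumbing"))
```

In practice the key would be computed once when a row is written and stored in an indexed column, so a search only has to compute the key of the search term. Soundex also shows why a stronger algorithm is worth the effort: it keeps the first letter verbatim, so "Kool" (K400) and "Cool" (C400) still get different keys, whereas Double Metaphone codes a hard C and a K the same way.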