Saturday, June 4, 2011

SQL Server 2008 (T-SQL) Creating FullText Index

this example work on adventureworks database.

for working with fulltext index, we need to create index on table . the following code create table with DocID . this column datatype is int with primary kry. note also the column DocContent with varbinary(max) and column DocSummary with nvarchar(max) . the last two column is our target of creating fulltext index.

CREATE TABLE _My_ProductDocs (
DocID INT NOT NULL IDENTITY,
DocTitle NVARCHAR(50) NOT NULL,
DocFilename NVARCHAR(400) NOT NULL,
FileExtension NVARCHAR(8) NOT NULL,
DocSummary NVARCHAR(MAX) NULL,
DocContent VARBINARY(MAX) NULL,
CONSTRAINT [PK_ProductDocs_DocID] PRIMARY KEY CLUSTERED (DocID ASC)
)
GO

--------------------------------------------------------------------------------

Now we need to filling data from Production.Document (adventureworks DB) to our created table.

INSERT INTO _My_ProductDocs
(DocTitle, DocFilename, FileExtension, DocSummary, DocContent)
SELECT Title, FileName, FileExtension, DocumentSummary, Document
FROM Production.Document
GO

--------------------------------------------------------------------------------

Showing data inserted in the our table

select * from _My_ProductDocs

--------------------------------------------------------------------------------

First step  now is creating FULLTEXT Catalog.

--Creating the Full-Text Catalog
USE AdventureWorks
GO
CREATE FULLTEXT CATALOG My_ProductFTS
WITH ACCENT_SENSITIVITY = OFF

--------------------------------------------------------------------------------

To know defalut database collation, you can select from system view sys.databases the column name collation_name will tell about the collation of the selected database.

SELECT name, collation_name FROM sys.databases
WHERE name = 'AdventureWorks'

--------------------------------------------------------------------------------

Verify that the catalog has been created

SELECT fulltext_catalog_id, name FROM sys.fulltext_catalogs

--------------------------------------------------------------------------------

define a full-text index . it's need in this step to specify your columns that you need to make your fulltext index. in the following code fulltext index will create on our table _My_ProductDocs and use the columns (DocSummary - DocContent) . because of the DocContent has varbinary(max) datatype you can specify TYPE COLUMN keyword, it's help SQL Server to determine the type file for each cell binary in the DocContent column . also you need to specify the type of your language you will use by your fulltext index it's make by keyword LANGUAGE with the number of your language.

at general, you can't create fulltext index without a primarykey in your table. the creation of your fulltext index will use the primarykey constraint of your table PK_ProductDocs_DocID . fulltext index need also to associate himself with the name of specific fulltext catalog  My_ProductFTS .

finally, you can attach s stoplist items for you fulltext index. stoplist item will help SQL Server to detect noisy words that you don't like to be included in your fulltext index.
the line of code WITH STOPLIST = SYSTEM mean that your fulltext index will be use the system STOPLIST the is found in SQL Server. you can make you owen list and configure that .

CREATE FULLTEXT INDEX ON _My_ProductDocs
(DocSummary, DocContent TYPE COLUMN FileExtension LANGUAGE 1033)
KEY INDEX PK_ProductDocs_DocID
ON My_ProductFTS
WITH STOPLIST = SYSTEM

--------------------------------------------------------------------------------

Create a stoplist

CREATE FULLTEXT STOPLIST ProductSL
FROM SYSTEM STOPLIST;


Verify that the stoplist has been created

SELECT stoplist_id, name FROM sys.fulltext_stoplists

---------------------------------------------------------------------------------

You can view stoplist contents

SELECT stopword FROM sys.fulltext_stopwords
WHERE stoplist_id = 5 AND language_id = 1033

---------------------------------------------------------------------------------

Add words to stoplist

ALTER FULLTEXT STOPLIST ProductSL
ADD 'nuts' LANGUAGE 1033;

---------------------------------------------------------------------------------

Test how SQL Server will tokenize a string based on a specific language and stoplist

SELECT special_term, display_term
FROM sys.dm_fts_parser
(' "testing for fruit and nuts, any type of nut" ', 1033, 5, 0)

---------------------------------------------------------------------------------

Drop “nuts” from the stoplist

ALTER FULLTEXT STOPLIST ProductSL
DROP 'nuts' LANGUAGE 1033;
-----------------------------------------------------------------------------------

Modify FULTEXT INDEX with new STOPLIST

ALTER FULLTEXT INDEX ON _My_ProductDocs
SET STOPLIST ProductSL

--------------------------------------------------------------------------------
Return a list of the document types supported by full-text search

SELECT document_type, version, manufacturer
FROM sys.fulltext_document_types

--------------------------------------------------------------------------------

View a list of the identifiers and their associated languages

SELECT * FROM sys.fulltext_languages
ORDER BY lcid

--------------------------------------------------------------------------------

Verify the index has been created and is associated with the correct catalog

SELECT t.name AS TableName, c.name AS FTCatalogName
FROM sys.tables t JOIN sys.fulltext_indexes i
ON t.object_id = i.object_id
JOIN sys.fulltext_catalogs c
ON i.fulltext_catalog_id = c.fulltext_catalog_id

---------------------------------------------------------------------------------

Lists the contents of the index

SELECT display_term, column_id, document_count
FROM sys.dm_fts_index_keywords
(DB_ID('AdventureWorks'), OBJECT_ID('_My_ProductDocs'))

---------------------------------------------------------------------------------

Modifying the Full-Text Thesaurus

EXEC sys.sp_fulltext_load_thesaurus_file 1033

No comments: