How to: Load Licensed Third-Party Word Breakers

SQL Server 2008 R2 includes licensed third-party word breakers for the following languages:

  • Danish

  • Polish

  • Turkish

These word breakers are available but are not installed by default, and must be manually registered and then added to the list of LCIDs that are supported for full-text indexing and querying. These word breakers are not enabled by default because they are owned by third parties who have not yet provided the level of testing, security, and robustness that is required for them to be enabled by default.

Prerequisite Information

Before you can load a word breaker, you need the following information:

  • Instance names, and the corresponding instance IDs, for each instance of SQL Server on which you want to register the word breakers.

  • The FTData path for each instance.

    After obtaining the instance IDs, retrieve the instance-specific path to the FTData folder. You will use this path when adding configuration values that specify the lexicon and thesaurus files for a language.

To Obtain the Instance Name for Each Instance of SQL Server

  1. Click Start, and click Run.

  2. In the Run dialog box, in the Open box, type Regedit.

  3. Click OK. This opens the Registry Editor.

  4. In the Registry Editor, select the following registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL

    The right pane lists each instance name and its corresponding instance ID. For example, the instance ID of the first instance of SQL Server 2008 R2 is MSSQL10_50.MSSQLSERVER.

Important

The procedures in this topic use MSSQL10_50.MSSQLSERVER, the instance ID of the first instance, in every registry path. To configure another server instance, use its instance ID in the registry path instead of MSSQL10_50.MSSQLSERVER.

To Obtain the FTData Path for Each Instance

  1. Click Start, and click Run.

  2. In the Run dialog box, in the Open box, type Regedit.

  3. Click OK.

  4. In the Registry Editor, select the following registry key for an instance of SQL Server: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\instance_ID\Setup, where instance_ID is the instance ID of that instance (MSSQL10_50.MSSQLSERVER for the first instance). For the first instance, the registry key is:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\Setup

    The right pane displays the FullTextDefaultPath value, which contains the instance-specific path to the FTData folder. For example, the default path for the first instance of SQL Server 2008 R2 is typically:

    C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\FTData
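Every registry path used in the remaining procedures is built from the same instance root, so it can help to derive them in one place before you start editing. The following Python sketch assumes only the path layout described in this topic; the name instance_keys is hypothetical, and the code does not read or write the registry.

```python
# Registry root shared by every SQL Server instance, as used in this topic.
SQL_SERVER_ROOT = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server"


def instance_keys(instance_id: str) -> dict[str, str]:
    """Return the per-instance registry keys used by the procedures in this topic.

    instance_id is the ID obtained in the preceding procedures,
    for example "MSSQL10_50.MSSQLSERVER" for the first instance.
    """
    root = f"{SQL_SERVER_ROOT}\\{instance_id}"
    return {
        "setup": f"{root}\\Setup",                  # holds FullTextDefaultPath
        "clsid": f"{root}\\MSSearch\\CLSID",        # target of Stage 1
        "language": f"{root}\\MSSearch\\Language",  # target of Stages 2 and 3
    }


if __name__ == "__main__":
    for purpose, key in instance_keys("MSSQL10_50.MSSQLSERVER").items():
        print(f"{purpose:>8}: {key}")
```

To work on another instance, pass its instance ID instead of MSSQL10_50.MSSQLSERVER.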

The installation procedure for third-party word breakers licensed by Microsoft consists of three stages. The following list summarizes these stages, whose steps are described later in this section.

  1. For the language being registered, add the COM class IDs (CLSIDs) of the word breaker and stemmer interfaces as keys under the <InstanceRoot>\MSSearch\CLSID node of the registry.

  2. Add a key to the <InstanceRoot>\MSSearch\Language node for the language.

  3. Add configuration values that specify the location of the lexicon and thesaurus files for the language.

Note

The Danish word breaker is used as an example in this section. The values required for installing word breakers for each of the languages are provided in the tables later in this topic.

Stage 1: Add the COM ClassID(s) for the Word Breaker and Stemmer Interfaces for the Language Being Registered

Warning

Incorrectly editing the registry can severely damage your system. Before making changes to the registry, you should back up any valued data on the computer.

To add COM class IDs for these components for the Danish language:

  1. Open the Registry Editor:

    1. Click Start, and then click Run.

    2. In the Run dialog box, in the Open box, type Regedit, and then click OK.

  2. In Registry Editor, select the following registry key for the first instance of SQL Server: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSearch\CLSID

  3. On the menu bar, click Edit, click New, and click Key.

  4. Type {16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}.

  5. Press ENTER.

  6. In the right pane, right-click the Default registry value, and then click Modify.

  7. In the Edit String dialog box, in the Value data box, type danlr.dll, and then click OK.

  8. Repeat steps 2 through 7, replacing the value in step 4 with {83BC7EF7-D27B-4950-A743-0F8E5CA928F8}. Step 2 reselects the CLSID key so that the second key is not created under the first one.

For another language, follow the same steps, replacing the key values in steps 4 and 8 and the .dll name in step 7 with the values for that language from the following table.

Language   Key value for step 4                     .dll name for step 7   Key value for step 8
Danish     {16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}   danlr.dll              {83BC7EF7-D27B-4950-A743-0F8E5CA928F8}
Polish     {B8713269-2D9D-4BF5-BF40-2615D75723D8}   lrpolish.dll           {CA665B09-4642-4C84-A9B7-9B8F3CD7C3F6}
Turkish    {23A9C1C3-3C7A-4D2C-B894-4F286459DAD6}   trklr.dll              {8DF412D1-62C7-4667-BBEC-38756576C21B}
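As an alternative to typing the GUIDs by hand (this is not part of the documented procedure), the Stage 1 keys can be written as .reg text and reviewed before being imported with Registry Editor (File, Import). The Python sketch below only builds the text from the preceding table; the names STAGE1 and clsid_reg are hypothetical, and importing the result still modifies the registry, so back it up first.

```python
# Stage 1 values from the table above: (key for step 4, DLL for step 7, key for step 8).
STAGE1 = {
    "Danish": ("{16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}", "danlr.dll",
               "{83BC7EF7-D27B-4950-A743-0F8E5CA928F8}"),
    "Polish": ("{B8713269-2D9D-4BF5-BF40-2615D75723D8}", "lrpolish.dll",
               "{CA665B09-4642-4C84-A9B7-9B8F3CD7C3F6}"),
    "Turkish": ("{23A9C1C3-3C7A-4D2C-B894-4F286459DAD6}", "trklr.dll",
                "{8DF412D1-62C7-4667-BBEC-38756576C21B}"),
}

ROOT = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server"


def clsid_reg(language: str, instance_id: str = "MSSQL10_50.MSSQLSERVER") -> str:
    """Build .reg text that creates both CLSID keys for one language."""
    first_key, dll, second_key = STAGE1[language]
    clsid_node = f"{ROOT}\\{instance_id}\\MSSearch\\CLSID"
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in (first_key, second_key):
        # Each CLSID key gets the DLL name as its (Default) value, as in step 7.
        lines += [f"[{clsid_node}\\{key}]", f'@="{dll}"', ""]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(clsid_reg("Danish"))
```

In .reg syntax, @ denotes the (Default) value of a key, which is the value edited in steps 6 and 7.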

Stage 2: Add a Key to the <InstanceRoot>\MSSearch\Language Node for the Language

To add a key to this node for the Danish language:

  1. Select the following registry key for the first instance of SQL Server: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSearch\Language

  2. Repeat steps 3 through 5 in the preceding procedure, replacing the key name in step 4 with dan.

For another language, follow the preceding steps, replacing the key name in step 4 with the value for that language from the following table.

Language   Key name for step 4
Danish     dan
Polish     plk
Turkish    trk
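In .reg terms, Stage 2 amounts to one key per language under the Language node; a key header with no values beneath it creates an empty key. The short Python sketch below maps each language to its key name from the preceding table; STAGE2_KEYS and language_key are hypothetical names.

```python
ROOT = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server"

# Key names for step 4 of Stage 2, from the table above.
STAGE2_KEYS = {"Danish": "dan", "Polish": "plk", "Turkish": "trk"}


def language_key(language: str, instance_id: str = "MSSQL10_50.MSSQLSERVER") -> str:
    """Return the full path of the per-language key created in Stage 2."""
    return f"{ROOT}\\{instance_id}\\MSSearch\\Language\\{STAGE2_KEYS[language]}"


if __name__ == "__main__":
    # A bare key header in a .reg file creates the key without any values.
    print("Windows Registry Editor Version 5.00")
    print()
    print(f"[{language_key('Polish')}]")
```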

Stage 3: Add Configuration Values That Give the Location of Each Linguistic Component for a Language

To add configuration values for these components for the Danish language:

  1. Select the registry key you created in Stage 2. For the first instance of SQL Server, this is: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSearch\Language\dan

  2. On the menu bar, click Edit, click New, and click String Value.

  3. Type TsaurusFile.

  4. Press ENTER.

  5. Right-click the TsaurusFile registry value you just added, and then click Modify.

  6. In the Edit String dialog box, in the Value data box, type tsdan.xml.

  7. Click OK.

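Repeat steps 2 through 7 for the remaining components of the language: the locale, the word breaker, and the stemmer. For the Locale value, create a DWORD value instead of a string value in step 2, and make sure that the hexadecimal base is selected when you type the value data. The values that register these components for the Danish, Polish, and Turkish languages are provided below.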

Values for Danish

Repeat steps 2 through 7 to add each set of values listed below, replacing the language-specific value type (step 2), value name (steps 3 and 5), and value data (step 6) for each value.

Value type for step 2   Value names for steps 3 and 5   Value data for step 6
String value            TsaurusFile                     tsdan.xml
DWORD value             Locale                          00000406
String value            WBreakerClass                   {16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}
String value            StemmerClass                    {83BC7EF7-D27B-4950-A743-0F8E5CA928F8}

Values for Polish

For the Polish language, follow the steps outlined above, using the values listed below. Select the registry key you created for Polish in Stage 2. For the first instance of SQL Server, this is: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSearch\Language\plk

Complete steps 2 through 7 to add each set of values listed below, replacing the language-specific value type (step 2), value name (steps 3 and 5), and value data (step 6) for each value.

Value type for step 2   Value names for steps 3 and 5   Value data for step 6
String value            TsaurusFile                     tsplk.xml
DWORD value             Locale                          00000415
String value            WBreakerClass                   {CA665B09-4642-4C84-A9B7-9B8F3CD7C3F6}
String value            StemmerClass                    {B8713269-2D9D-4BF5-BF40-2615D75723D8}

Values for Turkish

For the Turkish language, follow the steps outlined above, using the values listed below. Select the registry key you entered for Turkish in Stage 2 above. For the first instance of SQL Server, this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSearch\Language\trk

Complete steps 2 through 7 to add each set of values listed below, replacing the language-specific value type (step 2), value name (steps 3 and 5), and value data (step 6) for each value.

Value type for step 2   Value names for steps 3 and 5   Value data for step 6
String value            TsaurusFile                     tstrk.xml
DWORD value             Locale                          0000041f
String value            WBreakerClass                   {8DF412D1-62C7-4667-BBEC-38756576C21B}
String value            StemmerClass                    {23A9C1C3-3C7A-4D2C-B894-4F286459DAD6}
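The Stage 3 values for all three languages can likewise be rendered as .reg text, which makes the string-versus-DWORD distinction explicit: string values are quoted, and the Locale value uses the dword: prefix followed by eight hexadecimal digits. This Python sketch copies the values from the three preceding tables and assumes the first instance by default; STAGE3 and language_values_reg are hypothetical names.

```python
ROOT = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server"

# Per-language Stage 3 values, copied from the tables above.
STAGE3 = {
    "dan": {"TsaurusFile": "tsdan.xml", "Locale": 0x00000406,
            "WBreakerClass": "{16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}",
            "StemmerClass": "{83BC7EF7-D27B-4950-A743-0F8E5CA928F8}"},
    "plk": {"TsaurusFile": "tsplk.xml", "Locale": 0x00000415,
            "WBreakerClass": "{CA665B09-4642-4C84-A9B7-9B8F3CD7C3F6}",
            "StemmerClass": "{B8713269-2D9D-4BF5-BF40-2615D75723D8}"},
    "trk": {"TsaurusFile": "tstrk.xml", "Locale": 0x0000041F,
            "WBreakerClass": "{8DF412D1-62C7-4667-BBEC-38756576C21B}",
            "StemmerClass": "{23A9C1C3-3C7A-4D2C-B894-4F286459DAD6}"},
}


def language_values_reg(key_name: str, instance_id: str = "MSSQL10_50.MSSQLSERVER") -> str:
    """Build .reg text that sets the Stage 3 values under one language key."""
    lines = ["Windows Registry Editor Version 5.00", "",
             f"[{ROOT}\\{instance_id}\\MSSearch\\Language\\{key_name}]"]
    for name, data in STAGE3[key_name].items():
        if isinstance(data, int):
            # DWORD values are written as eight lowercase hexadecimal digits.
            lines.append(f'"{name}"=dword:{data:08x}')
        else:
            lines.append(f'"{name}"="{data}"')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(language_values_reg("dan"))
```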

After you register the third-party word breakers, you need to refresh the list of LCIDs that are supported for full-text indexing and querying. To refresh this list, use the sp_fulltext_service system stored procedure to perform the following steps:

  1. Load newly installed word breakers and filters in the server instance, as follows:

    EXEC sp_fulltext_service @action='load_os_resources', @value=1;
    
  2. Update the list of languages, as follows:

    EXEC sp_fulltext_service @action='update_languages';
    

The languages of the newly loaded word breakers will now be listed by the sys.fulltext_languages catalog view.
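The Locale values entered in Stage 3 are hexadecimal, whereas the lcid column of sys.fulltext_languages is an integer, so converting them makes the result easier to confirm. The Python sketch below performs the conversion and builds a verification query to run in SQL Server Management Studio or sqlcmd after the two sp_fulltext_service calls; the names LOCALES_HEX, to_lcid, and verification_query are hypothetical.

```python
# Locale DWORDs from Stage 3, as entered in the registry (hexadecimal).
LOCALES_HEX = {"Danish": "00000406", "Polish": "00000415", "Turkish": "0000041f"}


def to_lcid(locale_hex: str) -> int:
    """Convert a registry Locale value to the decimal LCID reported by SQL Server."""
    return int(locale_hex, 16)


def verification_query() -> str:
    """Build a T-SQL query that lists the newly registered languages, if loaded."""
    lcids = ", ".join(str(to_lcid(h)) for h in LOCALES_HEX.values())
    return f"SELECT lcid, name FROM sys.fulltext_languages WHERE lcid IN ({lcids});"


if __name__ == "__main__":
    for language, locale_hex in LOCALES_HEX.items():
        print(f"{language}: 0x{locale_hex} = {to_lcid(locale_hex)}")
    print(verification_query())
```

Running the script prints the LCIDs 1030 (Danish), 1045 (Polish), and 1055 (Turkish), followed by SELECT lcid, name FROM sys.fulltext_languages WHERE lcid IN (1030, 1045, 1055);. If a language is missing from the query result, recheck the registry values for that language.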