Data Points

Updating Data in Linked Servers, Information Schema Views, and More

John Papa

Code download available at: DataPoints0411.exe (115 KB)

Every day a developer somewhere needs to write code to iterate through SQL Server™ system objects, query and update tables in linked servers, handle optimistic concurrency, and retrieve column and stored procedure metadata. In this month's column, I will address these and other T-SQL development scenarios based on some of the questions I most frequently receive from readers.

Q The app I am working on requires that any time a value is modified in our sales table, a copy of the old row is inserted into a history table. I created an AFTER UPDATE trigger to handle this, but how do I get it to work if the history table is on another machine?

A A trigger is a great tool for tracking changes made to rows in a table, but it needs a little help to communicate with another instance of SQL Server. When a trigger tries to modify a history table on another machine running SQL Server, it is necessary to link the originating server to the remote server using the system stored procedure sp_addlinkedserver. SQL Server handles security at the server level, which means that the security context of a transaction in the originating server ends at that server. Thus, when the trigger in the originating server tries to insert a record into a table on the remote server, the security context must either already exist on the remote server or rules must be established ahead of time that define how the transaction should log into the remote (linked) server.

A linked server is an agreement between an originating SQL Server and a remote SQL Server made so that the originating SQL Server can talk to the remote one. This agreement can be set up using the SQL Enterprise Manager interface or by using the sp_addlinkedserver stored proc. Once a linked server has been set up, it can be removed through Enterprise Manager or by calling the sp_dropserver stored proc. For example, to add a linked server called SQLRemote to an originating SQL Server instance called SQLOrigin, this statement could be executed from a connection to SQLOrigin:

sp_addlinkedserver 'SQLRemote'
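
If the link later needs to be removed through code rather than Enterprise Manager, sp_dropserver can be called from a connection to SQLOrigin; the optional 'droplogins' argument also removes any login mappings defined for the linked server:

-- Drop the linked server and any login mappings associated with it
EXEC sp_dropserver 'SQLRemote', 'droplogins'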

Once the link has been established, it is important to make sure that the local SQL Server has proper security permissions to communicate with the remote instance. This can be done using the Enterprise Manager interface or through the sp_addlinkedsrvlogin stored procedure. This stored procedure can be used to map the SQL login from the originating server to a SQL login on the remote server:

sp_addlinkedsrvlogin 'SQLRemote', 'false', NULL, 'SQLTestUser', 'MyPassword'

This code snippet maps all users from the originating server to the remote server SQLRemote's SQL Server login called SQLTestUser. The first parameter indicates the remote server's name. Setting the second parameter to false indicates that the remote server login parameters should be used for the specified local login. The third parameter is used to indicate what local login to map from. If you want all local logins to map to a single remote login, then you should pass a NULL value. This means that no matter who logs in on the local server, they will be mapped to the remote login specified with the remote user and remote password parameters. The last two parameters specify the remote SQL login and password that should be used.

To map a specific local login to the remote login called SQLTestUser, this statement could be modified by changing the third parameter from NULL to a specific login name such as JoeSomebody. You can also have logins connect with their own credentials by setting the second parameter to true. This tells SQL Server that authenticated logins should use their own credentials to communicate with the remote SQL Server. Many organizations either create specific logins for server mappings or use the authenticated login to map to the remote server. As you can see, there are several possible approaches, so it is important to create a mapping that follows the security patterns of your organization.
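
As a quick sketch of those two variations (the login and password values here are just placeholders), the first statement maps the local login JoeSomebody to the remote SQLTestUser login, while the second tells SQL Server to pass each authenticated login's own credentials through to the linked server:

-- Map one specific local login to a specific remote SQL Server login
EXEC sp_addlinkedsrvlogin 'SQLRemote', 'false', 'JoeSomebody', 'SQLTestUser', 'MyPassword'

-- Let authenticated logins connect to SQLRemote with their own credentials
EXEC sp_addlinkedsrvlogin 'SQLRemote', 'true'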

The interface in SQL Enterprise Manager is pretty straightforward and can be used in place of executing sp_addlinkedsrvlogin. Figure 1 shows that if the local SQL Server login is Test2, then it will map to the remote SQL Server instance's SQLTestUser2 login. Otherwise, any other local login will map to the remote login called SQLTestUser. One word of caution: if you create a mapping using the sp_addlinkedsrvlogin stored procedure and then go to SQL Enterprise Manager to look at the mapping, be sure to refresh the linked server node. SQL Enterprise Manager caches its settings, so it is essential to refresh the node to be sure you are seeing the most accurate and up-to-date information.

Figure 1 Mapping Security for a Linked Server

Once the mapping is established, the problem can be solved: a trigger can be created that inserts a record into a table on the remote server. All you have to do to reference the remote server is prefix the fully qualified table name with the linked server's name (sqlserverinstance.databasename.owner.tablename). The following trigger inserts all of the updated rows from the Customers table in the local server's Northwind database into the remote server's Northwind.dbo.Customers table:

CREATE TRIGGER trCustomers_UpdateToOtherServer ON Customers
AFTER UPDATE
AS
INSERT SQLRemote.Northwind.dbo.Customers
    SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address,
        City, Region, PostalCode, Country, Phone, Fax
    FROM inserted
GO

Note that because the trigger copies rows from the inserted pseudo-table, it stores the new values; to capture the pre-update values instead, select from the deleted pseudo-table. Either way, it will not track deleted rows.

It should be noted that when a trigger performs a linked server query, if the remote server is unavailable, any statement that fires the trigger will fail on the local server. This introduces an additional point of failure into the system. You may want to consider modifying this solution slightly to use a local staging table (populated by the trigger) and a job that periodically moves the data from the staging table to the remote server.
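
Here is a minimal sketch of that staging approach; the staging table, the trigger name, and the batch that a SQL Server Agent job would run on a schedule are illustrative assumptions rather than part of the original solution:

-- Local staging table that mirrors the columns being tracked (illustrative)
CREATE TABLE CustomersStaging (
    CustomerID NCHAR(5), CompanyName NVARCHAR(40), ContactName NVARCHAR(30),
    ContactTitle NVARCHAR(30), Address NVARCHAR(60), City NVARCHAR(15),
    Region NVARCHAR(15), PostalCode NVARCHAR(10), Country NVARCHAR(15),
    Phone NVARCHAR(24), Fax NVARCHAR(24))
GO

-- The trigger writes only to the local staging table, so it no longer
-- depends on the remote server being available
CREATE TRIGGER trCustomers_UpdateToStaging ON Customers
AFTER UPDATE
AS
INSERT CustomersStaging
    SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address,
        City, Region, PostalCode, Country, Phone, Fax
    FROM inserted
GO

-- A scheduled SQL Server Agent job could then run a batch like this to push
-- the staged rows to the linked server and clear the staging table
INSERT SQLRemote.Northwind.dbo.Customers
    SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address,
        City, Region, PostalCode, Country, Phone, Fax
    FROM CustomersStaging
DELETE CustomersStaging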

Q I want to implement optimistic concurrency in my database queries by including a versioning column in each table that will signify when a row was last modified. Can you tell me about the various advantages and disadvantages to using a column of type TIMESTAMP or ROWVERSION versus a type DATETIME as the versioning column?

A Both of these approaches involve adding a column to a table. If the tracking column I add is called LastUpdate, then I must first add it to the table with an ALTER TABLE data definition language (DDL) statement, using one of the following two statements:

-- Add a DATETIME tracking column
ALTER TABLE Customers ADD LastUpdate DATETIME

-- Add a ROWVERSION (TIMESTAMP) tracking column
ALTER TABLE Customers ADD LastUpdate ROWVERSION

If the LastUpdate column is created as a ROWVERSION column (ROWVERSION is a synonym for the TIMESTAMP data type), then no additional code is required to keep its value updated. Columns defined as TIMESTAMP or ROWVERSION are built to automatically update their values every time the row they are associated with is updated.

In addition, the initial values of ROWVERSION columns are also set automatically. However, if the LastUpdate column is created as a DATETIME column, then some additional code is required to initialize its values; otherwise, they will be NULL. It is important that the tracking column not be NULL so that its non-NULL value can be retrieved and then compared in an UPDATE statement's WHERE clause. The following code could be executed immediately after the DATETIME column is created to initialize the tracking column's value for all existing rows:

UPDATE Customers SET LastUpdate = GETDATE()
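
To keep rows inserted later from starting out with a NULL LastUpdate as well, a DEFAULT constraint could be added to the DATETIME column; the constraint name used here is just a suggestion:

-- Initialize LastUpdate automatically for newly inserted rows
ALTER TABLE Customers
    ADD CONSTRAINT DF_Customers_LastUpdate DEFAULT (GETDATE()) FOR LastUpdate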

The main advantage to using a column of type DATETIME to track the version of a row is that the value is easily readable because it is a date and time value. If you want to display this value in an application, it makes much more sense to show a user a date and time than the binary value stored in a ROWVERSION column. However, creating a versioning column as a DATETIME data type not only requires additional code to initialize its values, but it also requires additional code to update its value whenever the row is modified. The value does not get updated automatically, so a trigger could be written to keep it current. For example, a trigger called trCustomerTest could update the column to the current date and time whenever a row is updated or inserted into the Customers table, using the built-in SQL Server GETDATE function as shown in the following code:

CREATE TRIGGER trCustomerTest ON Customers
AFTER UPDATE, INSERT
AS
-- Stamp the modified rows with the current date and time
UPDATE c
SET LastUpdate = GETDATE()
FROM Customers c
    INNER JOIN inserted i ON i.CustomerID = c.CustomerID
GO

This technique requires more code and thus more maintenance. If the trigger were to be dropped, then any application expecting the value of the LastUpdate column to be a versioning column would subsequently be in trouble. The LastUpdate column would not be updated and all UPDATE statements would simply overwrite the last person's changes in a situation where two users grabbed and updated the same row. However, if this trigger is maintained well, then this technique allows an easily readable and usable value to display the date and time the row was modified.

Since the SQL Server ROWVERSION data type automatically updates its column with a new value every time a value in its row is modified, using it to help implement optimistic concurrency is very simple. It requires no additional code since it automatically initializes its value and regenerates it each time its row is modified. Furthermore, the DATETIME data type is only accurate to approximately 3.33 milliseconds, so multiple updates in fast succession could receive the same value. Thus, I prefer the ROWVERSION (TIMESTAMP) column since it doesn't involve any additional code and has fewer potential points of failure.
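
Whichever data type you choose, the concurrency check itself looks the same: include the version value that was originally read in the UPDATE statement's WHERE clause and then check whether any row was affected. The following stored procedure is only a sketch of that pattern; the procedure name and parameters are assumptions, and the original ROWVERSION value is passed as BINARY(8):

-- Sketch of an optimistic concurrency check using the LastUpdate column
CREATE PROCEDURE UpdateCustomerCompanyName
    @CustomerID NCHAR(5),
    @NewCompanyName NVARCHAR(40),
    @OriginalLastUpdate BINARY(8)  -- version value read when the row was fetched
AS
-- Only update the row if its version still matches what the caller read
UPDATE Customers
SET CompanyName = @NewCompanyName
WHERE CustomerID = @CustomerID
    AND LastUpdate = @OriginalLastUpdate

-- If no rows were affected, another user has already changed the row
IF @@ROWCOUNT = 0
    RAISERROR ('The row has been modified by another user.', 16, 1)
GO

The caller simply passes back the LastUpdate value it retrieved when it originally selected the row.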

Q I am attempting to write a query that will retrieve a list of columns (including their data types) for a given table. How do I query the SQL Server system tables for such column info?

A Both the system tables and the information schema views can be queried to retrieve schema information directly through T-SQL. Additionally, if you are looking for specific information you can retrieve system information through some of the system stored procedures. While you can query the system tables (such as sysobjects) directly, Microsoft discourages this, offering the more flexible and documented information schema views. Using the information schema views offers a layer of protection in case system tables are changed in future releases of SQL Server.

The information schema views are contained in each database within the INFORMATION_SCHEMA schema. The INFORMATION_SCHEMA.COLUMNS view exposes the columns from all objects (such as tables and views) within the current database. One of the main advantages of using the information schema views is that the column names are easily readable and mostly self-documenting, which is not the case when you query against the system tables directly.

Retrieving information from the information schema views is often easier than querying the system tables. Sometimes a single view can be queried instead of having to join multiple system tables on columns with very nebulous names. The following SQL statement retrieves a list of all columns and their data types from the [Order Details] table in the Northwind database using an information schema view:

-- Get all Columns
SELECT c.TABLE_NAME, c.COLUMN_NAME, c.DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_NAME = 'Order Details'
ORDER BY c.ORDINAL_POSITION

Not only are the columns easy to identify in these views, but they can also be sorted by ordinal position, and the data types can be retrieved without joining to another view or table. The same query could be written against the system tables using the code shown in the following sample:

SELECT so.name AS TableName, sc.name AS ColumnName, st.name AS DataType
FROM sysobjects so
    INNER JOIN syscolumns sc ON so.id = sc.id
    INNER JOIN systypes st ON st.xusertype = sc.xusertype
WHERE so.name = 'Order Details'
ORDER BY sc.colid

This code queries the sysobjects table, which contains all system objects in the current database such as tables, views, and triggers. It then joins this table to the syscolumns table, which contains a list of all columns in the database. These tables are joined on the id field, which in the context of these two tables represents the link between the object and the column. Then these tables are joined to the systypes table using the xusertype column from the syscolumns table. This can take a little digging, as the columns that the tables need to be joined on are not always obvious. For example, it is easy to think that the systypes table should be joined on the syscolumns.type column or even syscolumns.xtype. Once you experiment with both the system tables and the information schema views, it is easy to see that the views are often the quicker to develop and more maintainable (or self-documenting) solution.

The information schema views can also help identify the primary key fields in a given table. To achieve this, you can join the INFORMATION_SCHEMA.COLUMNS view to the INFORMATION_SCHEMA.KEY_COLUMN_USAGE view and then on to the INFORMATION_SCHEMA.TABLE_CONSTRAINTS view. The KEY_COLUMN_USAGE view exposes the list of columns for each table that contains keys. For example, the Order Details table has four rows in this view: two for the primary key fields OrderID and ProductID, and two more for the foreign key fields OrderID and ProductID. This makes the view ideal for use as a many-to-many link between the COLUMNS view and the TABLE_CONSTRAINTS view when showing the primary key (or foreign key) columns for a table. The TABLE_CONSTRAINTS view shows information about the constraints, such as the type of constraint. When joined together, these three views can display a list of primary key fields for a given table using the query shown in Figure 2.

An alternative way to retrieve the list of keys for a table is to use the system stored procedure sp_MStablekeys, shown in the following line of code:

sp_MStablekeys 'Order Details'

However, I prefer to use the information schema views over this stored procedure to retrieve column information since I can more easily specify which fields to return without running through the myriad of options that the system stored procedure offers.

Figure 2 Querying Primary Keys

-- Get all PK columns
SELECT c.TABLE_NAME, c.COLUMN_NAME, c.DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS c
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
        ON c.TABLE_NAME = kcu.TABLE_NAME
        AND c.COLUMN_NAME = kcu.COLUMN_NAME
    INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
        ON tc.TABLE_NAME = kcu.TABLE_NAME
        AND tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
        AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE c.TABLE_NAME = 'Order Details'
ORDER BY c.ORDINAL_POSITION

There are several other information schema views that can be used to access the metadata for a database schema, as well. Some of my favorites are shown in Figure 3.

Figure 3 Schema Views

Information Schema View Description
INFORMATION_SCHEMA.TABLES All tables in the database
INFORMATION_SCHEMA.TABLE_PRIVILEGES Users and the permissions to each table
INFORMATION_SCHEMA.COLUMNS All columns in the database
INFORMATION_SCHEMA.KEY_COLUMN_USAGE Key columns, their tables, and their constraint name
INFORMATION_SCHEMA.TABLE_CONSTRAINTS Information about constraints in the tables
INFORMATION_SCHEMA.ROUTINES Functions and stored procedures
INFORMATION_SCHEMA.PARAMETERS Parameters associated with functions and stored procedures

The following example returns the list of stored procedures and functions along with information about any of their parameters. The PARAMETER_MODE indicates the direction of the parameter, such as IN or OUT, while the ORDINAL_POSITION field represents the order in which the parameters exist in the object:

SELECT r.SPECIFIC_NAME, p.PARAMETER_NAME, p.PARAMETER_MODE,
    p.DATA_TYPE, p.CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.ROUTINES r
    INNER JOIN INFORMATION_SCHEMA.PARAMETERS p
        ON r.SPECIFIC_NAME = p.SPECIFIC_NAME
WHERE r.SPECIFIC_NAME NOT LIKE 'dt%'
ORDER BY r.SPECIFIC_NAME, p.ORDINAL_POSITION

Information schema views are very powerful tools that expose the information contained within the system tables without the complexity. The views also provide a layer of abstraction in case future releases of SQL Server change the schema of any of the system tables. If you've been developing with SQL Server for a long time and are comfortable with the system tables but have not yet explored the information schema views, you may be pleasantly surprised at how easy they are to use. Further, there are several additional information schema views available in SQL Server that expose details about domain constraints, referential constraints, and more.
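
For instance, here is a sketch along the same lines that joins INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS to INFORMATION_SCHEMA.KEY_COLUMN_USAGE to list the foreign key constraints on the Order Details table, along with their columns and update and delete rules:

-- List the foreign key constraints and their columns for a given table
SELECT rc.CONSTRAINT_NAME, kcu.COLUMN_NAME,
    rc.UNIQUE_CONSTRAINT_NAME, rc.UPDATE_RULE, rc.DELETE_RULE
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
        ON rc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
WHERE kcu.TABLE_NAME = 'Order Details'
ORDER BY rc.CONSTRAINT_NAME, kcu.ORDINAL_POSITION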

Send your questions and comments for John to mmdata@microsoft.com.

John Papa is a baseball fanatic who spends most of his summer nights rooting for the Yankees with his two little girls, wife, and faithful dog, Kadi. He has authored several books on ADO, XML, and SQL Server and can often be found speaking at industry conferences such as VSLive.