Buffer Management

The primary purpose of a SQL Server database is to store and retrieve data, so intensive disk I/O is a core characteristic of the Database Engine. And because disk I/O operations can consume many resources and take a relatively long time to finish, SQL Server focuses on making I/O highly efficient. Buffer management is a key component in achieving this efficiency. The buffer management component consists of two mechanisms: the buffer manager to access and update database pages, and the buffer cache (also called the buffer pool), to reduce database file I/O.

How Buffer Management Works

A buffer is an 8-KB page in memory, the same size as a data or index page. Thus, the buffer cache is divided into 8-KB pages. The buffer manager manages the functions for reading data or index pages from the database disk files into the buffer cache and writing modified pages back to disk. A page remains in the buffer cache until the buffer manager needs the buffer area to read in more data. Data is written back to disk only if it is modified. Data in the buffer cache can be modified multiple times before being written back to disk. For more information, see Reading Pages and Writing Pages.

When SQL Server starts, it computes the size of virtual address space for the buffer cache based on a number of parameters such as the amount of physical memory on the system, the configured number of maximum server threads, and various startup parameters. SQL Server reserves this computed amount of its process virtual address space (called the memory target) for the buffer cache, but it acquires (commits) only the required amount of physical memory for the current load. You can query the bpool_commit_target and bpool_committed columns in the sys.dm_os_sys_info catalog view to return the number of pages reserved as the memory target and the number of pages currently committed in the buffer cache, respectively.

The interval between SQL Server startup and when the buffer cache obtains its memory target is called ramp-up. During this time, read requests fill the buffers as needed. For example, a single-page read request fills a single buffer page. This means the ramp-up depends on the number and type of client requests. Ramp-up is expedited by transforming single-page read requests into aligned eight-page requests. This allows the ramp-up to finish much faster, especially on machines with a lot of memory.

Because the buffer manager uses most of the memory in the SQL Server process, it cooperates with the memory manager to allow other components to use its buffers. The buffer manager interacts primarily with the following components:

  • Resource manager to control overall memory usage and, in 32-bit platforms, to control address space usage.

  • Database manager and the SQL Server Operating System (SQLOS) for low-level file I/O operations.

  • Log manager for write-ahead logging.

Supported Features

The buffer manager supports the following features:

  • The buffer manager is non-uniform memory access (NUMA) aware. Buffer cache pages are distributed across hardware NUMA nodes, which allows a thread to access a buffer page that is allocated on the local NUMA node rather than from foreign memory. For more information, see How SQL Server Supports NUMA. To understand how pages of memory from the buffer cache are assigned when using NUMA, see Growing and Shrinking the Buffer Pool Under NUMA.

  • The buffer manager supports Hot Add Memory, which allows users to add physical memory without restarting the server. For more information, see Hot Add Memory.

  • The buffer manager supports dynamic memory allocation on Microsoft Windows XP 32-bit and Windows 2003 32-bit platforms when AWE is enabled. Dynamic memory allocation allows the Database Engine to efficiently acquire and release memory in the buffer cache to support the current workload. For more information see, Dynamic Memory Management.

  • The buffer manager supports large pages on 64-bit platforms. The page size is specific to the version of Windows. For more information, see the Windows documentation.

  • The buffer manager provides additional diagnostics that are exposed through dynamic management views. You can use these views to monitor a variety of operating system resources that are specific to SQL Server. For example, you can use the sys.dm_os_buffer_descriptors view to monitor the pages in the buffer cache. For more information, see SQL Server Operating System Related Dynamic Management Views (Transact-SQL).

Disk I/O

The buffer manager only performs reads and writes to the database. Other file and database operations such as open, close, extend, and shrink are performed by the database manager and file manager components.

Disk I/O operations by the buffer manager have the following characteristics:

  • All I/Os are performed asynchronously, which allows the calling thread to continue processing while the I/O operation takes place in the background.

  • All I/Os are issued in the calling threads unless the affinity I/O option is in use. The affinity I/O mask option binds SQL Server disk I/O to a specified subset of CPUs. In high-end SQL Server online transactional processing (OLTP) environments, this extension can enhance the performance of SQL Server threads issuing I/Os.

  • Multiple page I/Os are accomplished with scatter-gather I/O, which allows data to be transferred into or out of noncontiguous areas of memory. This means that SQL Server can quickly fill or flush the buffer cache while avoiding multiple physical I/O requests. 

Long I/O Requests

The buffer manager reports on any I/O request that has been outstanding for at least 15 seconds. This helps the system administrator distinguish between SQL Server problems and I/O subsystem problems. Error message 833 is reported and appears in the SQL Server error log as follows:

SQL Server has encountered %d occurrence(s) of I/O requests taking longer than %d seconds to complete on file [%ls] in database [%ls] (%d). The OS file handle is 0x%p. The offset of the latest long I/O is: %#016I64x.

A long I/O may be either a read or a write; it is not currently indicated in the message. Long-I/O messages are warnings, not errors. They do not indicate problems with SQL Server. The messages are reported to help the system administrator find the cause of poor SQL Server response times more quickly, and to distinguish problems that are outside the control of SQL Server. As such, they do not require any action, but the system administrator should investigate why the I/O request took so long, and whether the time is justifiable.

Causes of Long-I/O Requests

A long-I/O message may indicate that an I/O is permanently blocked and will never complete (known as lost I/O), or merely that it just has not completed yet. It is not possible to tell from the message which scenario is the case, although a lost I/O will often lead to a latch time-out.

Long I/Os often indicate a SQL Server workload that is too intense for the disk subsystem. An inadequate disk subsystem may be indicated when:

  • Multiple long I/O messages appear in the error log during a heavy SQL Server workload.

  • Perfmon counters show long disk latencies, long disk queues, or no disk idle time.

Long I/Os may also be caused by a component in the I/O path (for example, a driver, controller, or firmware) continually postponing servicing an old I/O request in favor of servicing newer requests that are closer to the current position of the disk head. The common technique of processing requests in priority based upon which ones are closest to the current position of the read/write head is known as "elevator seeking." This may be difficult to corroborate with the Windows System Monitor (PERFMON.EXE) tool because most I/Os are being serviced promptly. Long I/O requests can be aggravated by workloads that perform large amounts of sequential I/O, such as backup and restore, table scans, sorting, creating indexes, bulk loads, and zeroing out files.

Isolated long I/Os that do not appear related to any of the previous conditions may be caused by a hardware or driver problem. The system event log may contain a related event that helps to diagnose the problem.

Error Detection

Database pages can use one of two optional mechanisms that help insure the integrity of the page from the time it is written to disk until it is read again: torn page protection and checksum protection. These mechanisms allow an independent method of verifying the correctness of not only the data storage, but hardware components such as controllers, drivers, cables, and even the operating system. The protection is added to the page just before writing it to disk, and verified after it is read from disk.

Torn Page Protection

Torn page protection, introduced in SQL Server 2000, is primarily a way of detecting page corruptions due to power failures. For example, an unexpected power failure may leave only part of a page written to disk. When torn page protection is used, a 2-bit signature is placed at the end of each 512-byte sector in the page (after having copied the original two bits into the page header). The signature alternates between binary 01 and 10 with every write, so it is always possible to tell when only a portion of the sectors made it to disk: if a bit is in the wrong state when the page is later read, the page was written incorrectly and a torn page is detected. Torn page detection uses minimal resources; however, it does not detect all errors caused by disk hardware failures.

Checksum Protection

Checksum protection, introduced in SQL Server 2005, provides stronger data integrity checking. A checksum is calculated for the data in each page that is written, and stored in the page header. Whenever a page with a stored checksum is read from disk, the database engine recalculates the checksum for the data in the page and raises error 824 if the new checksum is different from the stored checksum. Checksum protection can catch more errors than torn page protection because it is affected by every byte of the page, however, it is moderately resource intensive. When checksum is enabled, errors caused by power failures and flawed hardware or firmware can be detected any time the buffer manager reads a page from disk.

The kind of page protection used is an attribute of the database containing the page. Checksum protection is the default protection for databases created in SQL Server 2005 and later. The page protection mechanism is specified at database creation time, and may be altered by using ALTER DATABASE. You can determine the current page protection setting by querying the page_verify_option column in the sys.databases catalog view or the IsTornPageDetectionEnabled property of the DATABASEPROPERTYEX function. If the page protection setting is changed, the new setting does not immediately affect the entire database. Instead, pages adopt the current protection level of the database whenever they are written next. This means that the database may be composed of pages with different kinds of protection.