Common Driver Reliability Issues

08/30/2006

Microsoft Corporation

June 2004

The current version of this information is maintained at https://www.microsoft.com/whdc/driver/security/drvqa.mspx.

Applies to:
   Microsoft Windows 98 / Windows Me
   Microsoft Windows 2000
   Microsoft Windows XP
   Microsoft Windows Server 2003
   Microsoft Windows codenamed "Longhorn"

Summary: This paper provides information about writing drivers for the Microsoft Windows family of operating systems. It describes a number of common errors and suggests how driver writers can find, correct, and prevent such errors. (37 printed pages)

Introduction
User-Mode Addresses in Kernel-Mode Code
Driver I/O Methods and Their Tradeoffs
Failing to Check Buffer Sizes in Buffered IOCTLs and FSCTLs
Returning Data in Uninitialized Bytes
Failing to Validate Variable-Length Buffers
Device State Validation
Cleanup and Close Routines
Device Control Routines
Synchronization
Shared Access
Locks and Disabling APCs
Handle Validation
Requests to Create and Open Files and Devices
Driver Unload Routines
Pageable Drivers and DPCs
User-Mode APIs
StartIo Recursion
Passing and Completing IRPs
Odd-length Unicode Buffers
Pool Allocation in Low Memory
Call to Action and Resources

Introduction

Drivers occupy a significant portion of the total code base executed in kernel mode. Consequently, efforts to improve the reliability and security of the system must address this large code base.

This paper describes a variety of problems commonly seen in drivers, often with code that shows typical errors, and how to fix them. The code has been edited for brevity.

This paper is for developers who are writing kernel-mode drivers. The information in this paper applies to the Microsoft Windows family of operating systems.

User-Mode Addresses in Kernel-Mode Code

When providing services to user-mode code, drivers and other kernel-mode components usually receive and return data in buffers. To avoid corruption of data, disclosure of sensitive or security-critical data, or exceptions that cannot be handled by the try/except mechanism, kernel components must ensure that each data pointer they receive from user mode is a valid user-mode pointer. This operation is called probing.

Drivers must obey the following rules when handling pointers obtained from user mode:

Probe all user-mode pointers before referencing them.

To probe a pointer, use the macro ProbeForRead or ProbeForWrite, or the memory management routine MmProbeAndLockPages.
Enclose all references to user-mode pointers in try/except blocks. The mapping of user-mode memory can change at any instant for various reasons, such as address space deletion, protection change, or decommit. Therefore, any reference to a user-mode pointer could raise an exception.
Assume that user-mode pointers can be aligned on any boundary.
Be prepared for changes to the contents of user-mode memory at any time; another user-mode thread in the same process might change it. Drivers must not use user-mode buffers as temporary storage, or expect the results of double fetches to yield the same results the second time.
Validate all data received from user-mode code.
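
The rule about double fetches can be illustrated outside the kernel. The following is a minimal user-mode sketch, not driver code: REQUEST_HEADER, MAX_PAYLOAD, and CapturePayload are hypothetical names introduced for illustration, and a real driver would also probe the pointer and enclose the accesses in a try/except block. The sketch reads the length field from the shared buffer once, validates the local copy, and uses only that copy afterward.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAYLOAD 64   /* hypothetical limit for this sketch */

/* Hypothetical request layout shared with an untrusted caller. */
typedef struct {
    volatile unsigned long Length;      /* may change at any time */
    unsigned char Payload[MAX_PAYLOAD];
} REQUEST_HEADER;

/*
 * Copies the payload into a private buffer. Returns the number of
 * bytes captured, or -1 if the request is malformed.
 */
long CapturePayload(const REQUEST_HEADER *Shared,
                    unsigned char *Private, size_t PrivateSize)
{
    /* Fetch the length exactly once; never re-read Shared->Length. */
    unsigned long length = Shared->Length;

    if (length > MAX_PAYLOAD || length > PrivateSize) {
        return -1;
    }

    /* Use only the validated local copy from here on. */
    memcpy(Private, Shared->Payload, length);
    return (long)length;
}
```

If the code checked Shared->Length and then read it again for the copy, another thread could enlarge the value between the two reads and defeat the check.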

Handling user-mode pointers incorrectly can result in the following:

Crashes caused by references to portions of the kernel address space that the Memory Manager considers reserved. It is a serious error for any driver to reference such address space.
Crashes caused by references to input/output (I/O) space, if the architecture uses memory-mapped device registers. Such references (reads and writes) can also have negative effects on the device itself.
Disclosure of sensitive data if the caller passes a pointer to an area that is unreadable by user mode, then observes the driver's responses or return values to deduce the contents of the protected location.
Corruption of kernel data structures by writing to arbitrary kernel addresses, which can cause crashes or compromise security.

Probing

To understand when probing is necessary, consider the following sample routines, SetUserData and GetUserData. These samples represent fictitious system service routines, but could also be driver routines keyed on input/output control (IOCTL) or file system control (FSCTL) values; the only difference is that the driver code is more complicated. These routines show a situation in which probing is necessary. To simplify the example, the sample routines do not include locks to prevent race conditions and similar details that normally should be present in such code.

Example Routines That Do Not Use Probing

SetUserData receives a buffer from user mode and copies it to a global location. This routine represents any kernel component that receives data from user mode.

VOID
SetUserData (
    IN PWCHAR DataPtr,    // Pointer from user mode
    IN ULONG DataLength
    )
{
      //
      // Truncate data if it's too big.
      //
      if (DataLength > MAX_DATA_LENGTH) {
          DataLength = MAX_DATA_LENGTH;
      }

      //
      // Copy user buffer to global location.
      //
      memcpy (InternalStructure->UserData, DataPtr, DataLength);
      InternalStructure->UserDataLength = DataLength;
}

GetUserData returns to the caller the data that was previously set with SetUserData:

ULONG
GetUserData (
    IN PWCHAR DataPtr, // Pointer from user mode
    IN ULONG DataLength
    )
{
      //
      // Truncate data if it's too big.
      //
      if (DataLength > InternalStructure->UserDataLength) {
           DataLength = InternalStructure->UserDataLength;
      }   

      memcpy (DataPtr, InternalStructure->UserData, DataLength);

      return DataLength;
}

Problems Caused by Failing to Probe

In the examples in the previous section, both SetUserData and GetUserData fail to validate DataPtr. If the pointer is invalid, the caller could cause a system crash, thus compromising operating system integrity. If the pointer specifies a memory address that the caller does not have the right to read, the caller might also be able to deduce the contents of that address. Because the operating system maintains data for all processes in global pool addresses, a caller could pass an invalid pointer and then inspect the returned data for passwords or program output text strings generated by operating system users.

Routines that have pointer validation problems like these could easily be used to compromise system security. A hostile program could repeatedly call SetUserData with kernel addresses, followed by calls to GetUserData to retrieve the contents of the kernel address space. The program could then look for interesting data that is private to other users of the system, such as cached file data for files to which the caller has no access. In this situation, the kernel returns data that the caller has no permission to see.

In addition, reading certain kernel addresses can cause unwanted side effects. For example, some addresses are pageable but should be paged only within certain process contexts, such as thread stacks; in other contexts, a bug check can occur. Also, certain device registers may be mapped into virtual memory. Reading from memory locations that are mapped this way directly affects the hardware. For example, reading from a control register of a programmed I/O device might cause the device to lose incoming data, or might start or stop the device.

Example Routines That Use Probing

Both SetUserData and GetUserData must validate every user-mode pointer. The following shows correct code for SetUserData, which probes user-mode pointers before accessing them.

VOID
SetUserData (
    IN PWCHAR DataPtr, // Pointer from user mode
    IN ULONG DataLength
    )
{
  //
  // Truncate data if it's too big.
  //
  if (DataLength > MAX_DATA_LENGTH) {
     DataLength = MAX_DATA_LENGTH;
  }

  //
  // Copy user buffer to global location.
  //
  try { 
       ProbeForRead( DataPtr,
                      DataLength,
                      TYPE_ALIGNMENT( WCHAR ));
       memcpy (InternalStructure->UserData, 
               DataPtr, DataLength);
       InternalStructure->UserDataLength = DataLength;
  } except( EXCEPTION_EXECUTE_HANDLER ) {
  // Use GetExceptionCode() to return an error to the     
  // caller.
 }
}

The correct code validates the pointer at DataPtr by calling the macro ProbeForRead in a try/except block.

The following shows the corrected code for GetUserData.

ULONG
GetUserData (
    IN PWCHAR DataPtr, // Pointer from user mode
    IN ULONG DataLength
    )
{
  //
  // Truncate data if it's too big.
  //
  if (DataLength > InternalStructure->UserDataLength) {
       DataLength = InternalStructure->UserDataLength;
  }

  //
  // Copy global data to the user buffer.
  //
  try {
       ProbeForWrite( DataPtr,
                      DataLength,
                      TYPE_ALIGNMENT( WCHAR ));
       memcpy (DataPtr, InternalStructure->UserData,
               DataLength);
  } except( EXCEPTION_EXECUTE_HANDLER ) {
       //
       // Use GetExceptionCode() to return an error to the
       // caller, and report that no data was returned.
       //
       DataLength = 0;
  }
  return DataLength;
}

The correct code validates the pointer at DataPtr by calling the macro ProbeForWrite in a try/except block.

Addresses Passed in METHOD_NEITHER IOCTLs and FSCTLs

The I/O Manager does not validate user-mode addresses passed in METHOD_NEITHER IOCTLs and FSCTLs. To ensure that such addresses are valid, the driver must use the ProbeForRead and ProbeForWrite macros, enclosing all buffer references in try/except blocks.

In the following example, the driver does not validate the address passed in the Type3InputBuffer.

case IOCTL_GET_HANDLER: {
      PULONG EntryPoint;

      EntryPoint =
         IrpSp->Parameters.DeviceIoControl.Type3InputBuffer;
      *EntryPoint = (ULONG) DriverEntryPoint;

The following code correctly validates the address and avoids this problem.

case IOCTL_GET_HANDLER: {
      PULONG_PTR EntryPoint;

      EntryPoint =
         IrpSp->Parameters.DeviceIoControl.Type3InputBuffer;
                 
      try {
           if (Irp->RequestorMode != KernelMode) {
                ProbeForWrite(EntryPoint,
                              sizeof( ULONG_PTR ),
                              TYPE_ALIGNMENT( ULONG_PTR ));
           }
           *EntryPoint = (ULONG_PTR)DriverEntryPoint;

      } except( EXCEPTION_EXECUTE_HANDLER ) {
...

Note also that the correct code casts DriverEntryPoint to a ULONG_PTR, instead of a ULONG. This change allows for use of this code in a 64-bit Windows environment.

Pointers Embedded in Buffered I/O Requests

Drivers must similarly validate pointers that are embedded in buffered I/O requests. In the following example, the structure member at arg is an embedded pointer.

struct ret_buf {
   void   *arg; // Pointer embedded in request
   int     rval;
   };

pBuf = Irp->AssociatedIrp.SystemBuffer;
   ...
arg = pBuf->arg; // Fetch the embedded pointer
   ...
// If the pointer is invalid, 
// this statement can corrupt the system.
RtlMoveMemory(arg, &info, sizeof(info));

In this example, the driver should validate the embedded pointer by using the ProbeXxx macros enclosed in a try/except block, in the same way as for the METHOD_NEITHER IOCTLs described in the preceding section. Although embedding a pointer allows a driver to return extra information, a driver can more efficiently achieve the same result by using a relative offset or a variable length buffer.
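
The relative-offset alternative mentioned above might look like the following user-mode sketch. The layout RET_BUF_HEADER and the helper LocateTrailingData are assumptions made for illustration and do not appear in the original example. The caller supplies an offset and a length measured from the start of the buffer, and the helper accepts them only if the described range lies entirely inside the buffer and after the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request layout: the caller supplies an offset and a
 * length relative to the start of the buffer instead of a raw pointer. */
typedef struct {
    uint32_t DataOffset;   /* offset of the data from the buffer start */
    uint32_t DataLength;   /* size of the data in bytes */
    int32_t  rval;
} RET_BUF_HEADER;

/*
 * Returns a pointer to the data inside Buffer, or NULL if the
 * offset/length pair does not lie entirely within the buffer.
 */
void *LocateTrailingData(void *Buffer, size_t BufferLength)
{
    RET_BUF_HEADER *header = (RET_BUF_HEADER *)Buffer;
    size_t offset, length;

    if (Buffer == NULL || BufferLength < sizeof(RET_BUF_HEADER)) {
        return NULL;
    }

    /* Capture the fields once, then validate the local copies. */
    offset = header->DataOffset;
    length = header->DataLength;

    /* The data must not overlap the header. */
    if (offset < sizeof(RET_BUF_HEADER) || offset > BufferLength) {
        return NULL;
    }

    /* Subtract rather than add so the check cannot overflow. */
    if (length > BufferLength - offset) {
        return NULL;
    }

    return (unsigned char *)Buffer + offset;
}
```

Because the result always points into the buffer that the I/O Manager allocated, no probing is needed and the driver never dereferences a caller-supplied address.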

Using Handles in User Context

Drivers often manipulate objects using handles, which can come from user mode or kernel mode. If the driver is running in system context, it can safely create and use handles because all threads within the system process are trusted. When running in user context, however, a driver must use handles with care.

Drivers should not create or pass handles to ZwXxx routines. These functions translate to calls to user-mode system services. Another thread in the process can change such handles at any instant. Using or creating handles within a user's process makes the driver vulnerable to problems, as the following example shows.

status = IoCreateFile(&handle,
                      DesiredAccess,
                      &objectAttributes,
                      &ioStatusBlock,
                      NULL,
                      FILE_ATTRIBUTE_NORMAL,
                      FILE_SHARE_READ,
                      FILE_OPEN,
                      0,
                      NULL,
                      0,
                      CreateFileTypeNone,
                      NULL,
                      IO_NO_PARAMETER_CHECKING);

if ( NT_SUCCESS(status) ) {
   status = ObReferenceObjectByHandle(handle, 
                      0,
                      NULL,
                      KernelMode,
                      &ccb->FileObject,
                      &handleInformation);

By the time ObReferenceObjectByHandle is called, the value of handle might have changed if:

Another thread closed and reopened the handle.
Another thread suspended the first thread and then successively created objects until it received the same handle value back again.

Similarly, handles received from user mode in other ways—for example, in a buffered I/O request—should not be passed to ZwXxx routines. Doing so makes a second transition into the kernel. When the ZwXxx routine runs, the previous processor mode is kernel; all access checks (even those against granted access masks of handles) are disabled. If a caller passes in a read-only handle to a file it lacks permission to write, and the driver then calls ZwWriteFile with the handle, the write will succeed. Similarly, calls to ZwCreateFile or ZwOpenFile with file names provided to the driver will successfully create or open files that should be denied to the caller.

Drivers can use the OBJ_FORCE_ACCESS_CHECK and OBJ_KERNEL_HANDLE flags in the OBJECT_ATTRIBUTES structure to safely use handles to manipulate objects. To set these flags, a driver passes them in the Attributes parameter of InitializeObjectAttributes before it creates or opens the object.

The OBJ_FORCE_ACCESS_CHECK flag causes the system to perform all access checks on the object being opened. Handles created with OBJ_KERNEL_HANDLE can be accessed only in kernel mode. Drivers should use kernel-mode handles only when necessary, however; use of such handles can affect system performance, because Object Manager calls that use kernel handles attach to the system process. In addition, quota charges are made against the system process, and not against the original caller.

Driver I/O Methods and Their Tradeoffs

Drivers can use the following I/O methods:

Buffered I/O
Direct I/O
Neither buffered nor direct I/O (METHOD_NEITHER I/O)

In general, performance improves by moving from buffered I/O to direct I/O and from direct I/O to METHOD_NEITHER I/O, because the I/O Manager does less for the driver. The driver must do more work to validate requests, however, as the higher performing methods often require significantly more validation code to ensure that the driver is robust.

Buffered I/O

Buffered I/O requests are typically used by interfaces that require small transfer sizes or are called infrequently.

To handle a buffered I/O request, the I/O Manager:

Validates the user buffer pointers passed to it.
Allocates new buffers from non-paged pool for the input data.
Copies the user data to these newly allocated buffers.

The driver operates only on the buffers allocated by the I/O Manager, and not on the buffers allocated by the caller. The driver is therefore not required to validate the buffer pointers or handle exceptions if the caller's address space becomes invalid.

Buffers allocated by the I/O Manager have the same alignment as allocated pool (8-byte alignment on 32-bit systems). Consequently, the driver is not required to check for valid buffer alignment. The driver must validate the size and contents of the data, however. The data cannot change asynchronously because user-mode processes do not have access to the buffers.

After the driver completes a buffered I/O request, the I/O Manager executes an asynchronous procedure call (APC) to return to the original process context. The I/O Manager then copies data from the buffers written by the driver to the caller's user-space output buffer.

Failing to check the size of buffers is perhaps the most common driver error in buffered I/O. This error can occur in many contexts, but is particularly troublesome in the following cases:

Failing to check buffer sizes in buffered IOCTLs and FSCTLs.
Returning data in uninitialized bytes.
Failing to validate variable-length buffers.

These cases are discussed in more detail in the sections that follow.

Failing to Check Buffer Sizes in Buffered IOCTLs and FSCTLs

When handling buffered IOCTLs and FSCTLs, a driver should always check the sizes of the input and output buffers to ensure that the buffers can hold all the requested data. If the RequiredAccess bits in the request specify FILE_ANY_ACCESS, as most driver IOCTLs and FSCTLs do, any caller that has a handle to the device also has access to buffered IOCTL or FSCTL requests for that device, and could read or write data beyond the end of the buffer.

For example, assume that the following code appears in a routine that is called from a dispatch routine, and that the driver has not validated the input buffer sizes passed in the IRP.

switch (ControlCode)
   ...
   ...
   case IOCTL_NEW_ADDRESS:{
      tNEW_ADDRESS *pNewAddress = 
        pIrp->AssociatedIrp.SystemBuffer;

        pDeviceContext->Addr = ntohl (pNewAddress->Address);
        ...

The example does not check the buffer sizes before assigning a new value to pDeviceContext->Addr. As a result, the reference to pNewAddress->Address can cause a fault if the input buffer is not big enough to contain a tNEW_ADDRESS structure.

The following code checks the buffer sizes, avoiding the potential problem.

case IOCTL_NEW_ADDRESS: {
   tNEW_ADDRESS *pNewAddress =
     pIrp->AssociatedIrp.SystemBuffer;

  if (pIrpSp->Parameters.DeviceIoControl.InputBufferLength >=
       sizeof(tNEW_ADDRESS)){
         pDeviceContext->Addr = ntohl (pNewAddress->Address);
...

Code that handles other buffered I/O, such as WMI requests that use variable size buffers, can have similar errors.

Output buffer problems are similar to input buffer problems. They can easily corrupt the memory pool, and user-mode callers might be unaware that any error has occurred.

In the following example, the driver fails to check the size of the output buffer at SystemBuffer.

case IOCTL_GET_INFO: {

    Info = Irp->AssociatedIrp.SystemBuffer;

    Info->NumIF = NumIF;
    ...
    ...
    Irp->IoStatus.Information =
         NumIF*sizeof(GET_INFO_ITEM)+sizeof(ULONG);
    Irp->IoStatus.Status = ntStatus;
   }

Assuming that the NumIF field of the system buffer specifies the number of input items, this example can set IoStatus.Information to a value larger than the output buffer and thus return too much information to the user-mode caller. The preceding code could corrupt the memory pool by writing beyond the end of the system buffer.

Keep in mind that the I/O Manager does not validate the value in the Information field. The driver must check the output buffer size. If a caller passes a valid kernel-mode address for the output buffer and a buffer size of zero bytes, serious errors can occur.
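
One way to perform the missing output-size check is sketched below in portable C. The layout of GET_INFO_ITEM and the helper CheckGetInfoOutputSize are assumptions made for illustration; the original example does not define them. The helper computes the size that the driver intends to return and accepts it only if it fits in the caller's output buffer.

```c
#include <stdint.h>

/* Hypothetical item layout; the original does not define GET_INFO_ITEM. */
typedef struct {
    uint32_t Index;
    uint32_t Flags;
} GET_INFO_ITEM;

/*
 * Computes the number of bytes needed to return NumIF items preceded by
 * a 32-bit count. Returns 1 and stores the size in *Required if the
 * result fits in OutputBufferLength, otherwise returns 0.
 */
int CheckGetInfoOutputSize(uint32_t NumIF,
                           uint32_t OutputBufferLength,
                           uint32_t *Required)
{
    uint32_t available;

    /* The buffer must at least hold the leading count. */
    if (OutputBufferLength < sizeof(uint32_t)) {
        return 0;
    }

    /* Divide instead of multiplying so NumIF cannot overflow the size. */
    available = OutputBufferLength - (uint32_t)sizeof(uint32_t);
    if (NumIF > available / sizeof(GET_INFO_ITEM)) {
        return 0;
    }

    *Required = NumIF * (uint32_t)sizeof(GET_INFO_ITEM)
              + (uint32_t)sizeof(uint32_t);
    return 1;
}
```

In a driver, a failed check would typically complete the request with an error such as STATUS_BUFFER_TOO_SMALL before anything is written to the system buffer; on success, the value in Required is what the driver would store in IoStatus.Information.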

Returning Data in Uninitialized Bytes

Drivers should initialize all output buffers with zeros before returning them to the caller. Failing to initialize a buffer can result in the inadvertent return of kernel-mode data in uninitialized bytes.

In the following example, a driver fails to initialize the buffer and thus unintentionally returns data in uninitialized bytes.

case IOCTL_GET_NAME: {
   ...
   ...
   outputBufferLength = 
      ioStack->Parameters.DeviceIoControl.OutputBufferLength;
   outputBuffer = 
      (PGET_NAME) Irp->AssociatedIrp.SystemBuffer;
  
   if (outputBufferLength >= sizeof(GET_NAME)) {
      length = outputBufferLength - sizeof(GET_NAME);
      ntStatus = IoGetDeviceProperty(
                  DeviceExtension->PhysicalDeviceObject,
                  DevicePropertyDriverKeyName,
                  length,
                  outputBuffer->DriverKeyName,
                  &length);

      outputBuffer->ActualLength = length + sizeof(GET_NAME);
      Irp->IoStatus.Information = outputBufferLength; 
   } else {
     ntStatus = STATUS_BUFFER_TOO_SMALL;
   }

Setting IoStatus.Information to the output buffer size causes the whole output buffer to be returned to the caller. The I/O Manager does not initialize the data beyond the size of the input buffer—the input and output buffers overlap for a buffered request. Because the system support routine IoGetDeviceProperty does not write the entire buffer, this IOCTL returns uninitialized data to the caller.
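
The following user-mode sketch shows one way to avoid the leak. GET_NAME_MODEL and FillNameOutput are hypothetical names that only model the GET_NAME example. The sketch clears the output buffer before filling it, and it reports only the number of bytes that were actually written rather than the full buffer length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output layout modeled on the GET_NAME example. */
typedef struct {
    unsigned long ActualLength;
    char DriverKeyName[1];     /* variable-length data follows */
} GET_NAME_MODEL;

/*
 * Fills Output with Name and returns the number of bytes the caller
 * should receive (the value a driver would store in
 * IoStatus.Information). Returns 0 if the buffer is too small.
 */
size_t FillNameOutput(void *Output, size_t OutputLength,
                      const char *Name, size_t NameLength)
{
    GET_NAME_MODEL *out = (GET_NAME_MODEL *)Output;
    size_t header = offsetof(GET_NAME_MODEL, DriverKeyName);

    if (OutputLength < header || NameLength > OutputLength - header) {
        return 0;
    }

    /* Clear the whole buffer so no stale bytes can leak. */
    memset(Output, 0, OutputLength);

    /* Write the name immediately after the fixed header. */
    memcpy((unsigned char *)Output + header, Name, NameLength);
    out->ActualLength = (unsigned long)(header + NameLength);

    /* Report only the bytes that were actually written. */
    return header + NameLength;
}
```

Because the input and output data share one system buffer in a buffered request, a driver that still needs its input would capture it before clearing the buffer.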

Some drivers use the Information field to return codes that provide extra details about I/O requests. Before doing so, such drivers should check the IRP flags to ensure that IRP_INPUT_OPERATION is not set. When this flag is not set, the IOCTL or FSCTL does not have an output buffer, so the Information field does not supply a buffer size. In this case, the driver can safely use the Information field to return its own code.

Failing to Validate Variable-Length Buffers

Drivers should always validate variable-length buffers. Failure to do so can cause integer underflows and overflows.

Drivers often use input buffers with fixed headers and trailing variable-length data, as in the following example.

typedef struct _WAIT_FOR_BUFFER {
   LARGE_INTEGER Timeout;
   ULONG NameLength;
   BOOLEAN TimeoutSpecified;
   WCHAR Name[1];
   } WAIT_FOR_BUFFER, *PWAIT_FOR_BUFFER;

if (InputBufferLength < sizeof(WAIT_FOR_BUFFER)) {
    Irp->IoStatus.Status = STATUS_INVALID_PARAMETER;
    IoCompleteRequest( Irp, IO_NO_INCREMENT );
    return( STATUS_INVALID_PARAMETER );
   }

WaitBuffer = Irp->AssociatedIrp.SystemBuffer;

// This addition can overflow if NameLength is large.
if (FIELD_OFFSET(WAIT_FOR_BUFFER, Name[0]) +
       WaitBuffer->NameLength > InputBufferLength) {
         Irp->IoStatus.Status = STATUS_INVALID_PARAMETER;
         IoCompleteRequest( Irp, IO_NO_INCREMENT );
         return( STATUS_INVALID_PARAMETER );
   }

Adding WaitBuffer->NameLength (a ULONG) to the offset (a LONG) can cause an integer overflow if the ULONG value is large. Instead, the driver should subtract the offset from InputBufferLength, and compare the result with WaitBuffer->NameLength, as in the following example.

if (InputBufferLength < sizeof(WAIT_FOR_BUFFER)) {
    Irp->IoStatus.Status = STATUS_INVALID_PARAMETER;
    IoCompleteRequest( Irp, IO_NO_INCREMENT );
    return( STATUS_INVALID_PARAMETER );
   }

WaitBuffer = Irp->AssociatedIrp.SystemBuffer;

//
// Reject the request if the name does not fit in the
// space that follows the fixed header.
//
if ((InputBufferLength -
     FIELD_OFFSET(WAIT_FOR_BUFFER, Name[0])) <
       WaitBuffer->NameLength) {
    Irp->IoStatus.Status = STATUS_INVALID_PARAMETER;
    IoCompleteRequest( Irp, IO_NO_INCREMENT );
    return( STATUS_INVALID_PARAMETER );
   }

The subtraction shown in the preceding example cannot cause an integer underflow, because the first if statement ensures that the input buffer length is greater than or equal to the size of WAIT_FOR_BUFFER, which is at least as large as the offset of the Name member.

The following example shows a more complicated overflow problem.

case IOCTL_SET_VALUE:
      dwSize = sizeof(SET_VALUE);

    if(inputBufferLength < dwSize) {
       ntStatus = STATUS_BUFFER_TOO_SMALL;
       break;
    }

    dwSize = FIELD_OFFSET(SET_VALUE, pInfo[0]) +
            pSetValue->NumEntries * sizeof(SET_VALUE_INFO);

    if(inputBufferLength < dwSize) {
       ntStatus = STATUS_BUFFER_TOO_SMALL;
       break;
    }

In this example, an integer overflow can occur during multiplication. If the size of the SET_VALUE_INFO structure is a multiple of two, a NumEntries value such as 0x80000000 results in an overflow when the high-order bits are shifted out during multiplication. The buffer size passes the validation test, however, because the overflow causes dwSize to contain a small number. To avoid this problem, subtract the buffer lengths as shown in the previous example, then divide by sizeof(SET_VALUE_INFO) and compare the result with NumEntries to ensure that the buffer is the correct size.
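
The subtract-and-divide check described above can be sketched in portable C as follows. The structure layouts and the helper IsSetValueBufferValid are assumptions made for illustration, because the original example does not define SET_VALUE or SET_VALUE_INFO.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the original does not define these structures. */
typedef struct {
    uint32_t Id;
    uint32_t Value;
} SET_VALUE_INFO;

typedef struct {
    uint32_t NumEntries;
    SET_VALUE_INFO pInfo[1];   /* NumEntries items follow */
} SET_VALUE;

/*
 * Returns 1 if an input buffer of InputBufferLength bytes can hold the
 * fixed header plus NumEntries items, otherwise 0.
 */
int IsSetValueBufferValid(const SET_VALUE *SetValue,
                          uint32_t InputBufferLength)
{
    uint32_t numEntries, remaining;

    if (InputBufferLength < sizeof(SET_VALUE)) {
        return 0;
    }

    /* Capture the count once before validating it. */
    numEntries = SetValue->NumEntries;

    /* Cannot underflow: the length covers at least sizeof(SET_VALUE). */
    remaining = InputBufferLength - (uint32_t)offsetof(SET_VALUE, pInfo);

    /* Divide rather than multiply, so a huge count cannot wrap. */
    return numEntries <= remaining / sizeof(SET_VALUE_INFO);
}
```

With these layouts, a count of 0x20000000 would wrap to zero if it were multiplied by the 8-byte item size in 32-bit arithmetic; the division form rejects it without ever computing that product.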

Direct I/O

Drivers for devices that can transfer large amounts of data at a time, such as mass storage devices, typically use direct I/O. To handle a direct I/O request, the I/O Manager allocates the input buffer from non-paged pool and, if the length of the buffer is nonzero, creates a memory descriptor list (MDL) to map the output buffer. For an input request, the I/O Manager checks the output buffer for read access; for an output request, it checks the buffer for write access.

Drivers access the output buffer by calling the MmGetSystemAddressForMdlSafe macro to map the MDL into a system address range. This system address range contains the same physical pages as the original user buffer, but is unaffected by virtual address changes in the calling application. Drivers can therefore rely on the address to remain valid.

Because the user's address space is doubly mapped to the system address range, two different virtual addresses have the same physical address. The following consequences of double mapping can sometimes cause problems for drivers:

The offset into the virtual page of the user's address becomes the offset into the system page.

Access beyond the end of the system buffer might go unnoticed for long periods of time depending on the page granularity of the mapping. Unless a caller's buffer is allocated near the end of a page, data written beyond the end of the buffer will nevertheless appear in the buffer, and the caller will be unaware that any error has occurred. If the end of the buffer coincides with the end of a page, the system virtual addresses beyond the end could point to anything, or could be invalid. Such problems can be extremely difficult to find.
If the calling process has another thread that modifies the user's mapping of the memory, the contents of the system buffer will change when the user's memory mapping changes.

In this situation, using the system buffer to store scratch data can cause problems. Two fetches from the same memory location might yield different values.

In addition, during read requests, drivers must not write to mapped areas that they have locked for read access. Inadvertently writing to an area that is locked for read access could allow a user-mode application to corrupt the global system state.

The most common direct I/O problem is incorrectly handling zero-length buffers. Because the I/O Manager does not create MDLs for zero-length transfers, a zero-length buffer results in a NULL value at Irp->MdlAddress. If a driver passes a NULL MdlAddress to MmGetSystemAddressForMdlSafe, mapping fails and the macro returns NULL. Drivers should always check for a NULL return value before attempting to use the returned address.

The following code snippet shows one possible error in direct I/O. The example receives a string in a direct I/O request, and then tries to convert that string to uppercase characters.

PWCHAR  PortName = NULL;

PortName = (PWCHAR)MmGetSystemAddressForMdlSafe 
                   (irp->MdlAddress, NormalPagePriority);

//
// Null-terminate the PortName so that RtlInitUnicodeString
// will not be invalid.
//
PortName[Size / sizeof(WCHAR) - 1] = UNICODE_NULL;

RtlInitUnicodeString(&AdapterName, PortName);

Because the buffer might not be correctly formed, the code attempts to force a Unicode NULL as the last buffer character. If the underlying physical memory is doubly mapped to both a user-mode and a kernel-mode address, another thread in the process can overwrite the buffer as soon as this write operation completes. If the UNICODE NULL character is not present, however, the call to RtlInitUnicodeString can exceed the range of the buffer and, if it falls outside the system mapping, possibly cause a bug check.
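
One way to sidestep this race is to copy the string out of the shared mapping into memory that only the driver can touch, and to terminate and measure the private copy. The following user-mode sketch models that idea. CapturePortName and MAX_PORT_NAME_CHARS are hypothetical names, and uint16_t stands in for the 16-bit WCHAR type; a real driver would also check the mapped address for NULL first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PORT_NAME_CHARS 32   /* hypothetical limit for this sketch */

/*
 * Copies a possibly unterminated 16-bit string of SizeInBytes bytes
 * from a shared mapping into a private buffer and terminates the copy.
 * Returns the string length in characters, or -1 on bad input.
 */
long CapturePortName(const uint16_t *Shared, size_t SizeInBytes,
                     uint16_t Private[MAX_PORT_NAME_CHARS])
{
    size_t chars, i;

    /* Reject empty and odd-length buffers. */
    if (Shared == NULL || SizeInBytes == 0 ||
        SizeInBytes % sizeof(uint16_t) != 0) {
        return -1;
    }

    /* Leave room for the terminator in the private buffer. */
    chars = SizeInBytes / sizeof(uint16_t);
    if (chars >= MAX_PORT_NAME_CHARS) {
        return -1;
    }

    /* Snapshot the data; later changes to Shared no longer matter. */
    memcpy(Private, Shared, SizeInBytes);
    Private[chars] = 0;

    /* Bounded scan of the private copy only. */
    for (i = 0; i < chars && Private[i] != 0; i++) {
    }
    return (long)i;
}
```

The terminator is written to the private copy, so another thread in the calling process cannot remove it, and the scan never runs past the captured length.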

If a driver creates and maps its own MDL, it must access the MDL only with the method for which it has probed. When the driver calls MmProbeAndLockPages, it specifies an access method (IoReadAccess, IoWriteAccess, or IoModifyAccess). If the driver specifies IoReadAccess, it must not attempt to write to the system buffer made available by MmGetSystemAddressForMdlSafe.

Further problems can occur in direct I/O paths when resources are unavailable. If insufficient system page table entries (PTEs) are available, MmGetSystemAddressForMdlSafe fails and returns NULL.

Note Microsoft Windows 98 does not support MmGetSystemAddressForMdlSafe. In a WDM driver that must run on Windows 98, call MmGetSystemAddressForMdl, setting the MDL_MAPPING_CAN_FAIL MDL flag in the MdlFlags member of the MDL structure. MmGetSystemAddressForMdl is obsolete on Windows Me, Windows 2000, and all later releases.

Neither Buffered nor Direct I/O (METHOD_NEITHER)

When handling a METHOD_NEITHER I/O request, the I/O Manager does not validate the supplied buffer pointers and lengths. Drivers must validate the pointers, lengths, and alignment by probing. Drivers must also use try/except blocks around each access to the user buffer to handle any exceptions that might occur.

The driver must also manage the buffers and memory operations by itself. When possible, the driver should perform all operations on the buffer directly within the context of the calling thread. When running outside this context, the driver must use MmProbeAndLockPages to double-map and lock down the buffer, thus preventing asynchronous changes to the data.

Some file-system drivers and network transport drivers define IOCTLs for fast I/O. Fast I/O, which uses METHOD_NEITHER, involves transferring data directly between user buffers and the system cache. Because the data in the user buffers can change asynchronously, fast I/O dispatch routines can be difficult to code. All references to user buffers must be enclosed in try/except blocks, and all METHOD_NEITHER buffers must be probed.

If a driver allocates resources in a fast I/O path, the driver must subsequently release those resources if an exception occurs while referencing user-mode memory. Failing to release resources in such situations is a common driver error.

For most fast I/O paths, the I/O Manager calls the fast I/O dispatch routine from within a try/except block. A driver that allocates resources in a fast I/O path must include an exception handler in its fast I/O dispatch routine. A driver that performs fast I/O and accesses user-mode memory, but does not allocate resources in the fast I/O path, should include an exception handler in its fast I/O dispatch routine, although it is not required to do so.

Device State Validation

In addition to validating pointers, drivers should validate device state in both the checked and free builds.

In the following example, the driver uses the ASSERT macro to check for the correct device state in the checked build, but does not check the device state in the free build.

case IOCTL_WAIT_FOR_EVENT:

     ASSERT((!Extension->WaitEventIrp));
     Extension->WaitEventIrp = Irp;
     IoMarkIrpPending(Irp);
     status = STATUS_PENDING;

In the checked build, if the driver already holds a pending IRP, the system will assert. In the free build, however, the driver does not check for this condition. A second call to the same IOCTL overwrites the saved pointer and causes the driver to lose track of the first IRP.

On a multiprocessor system, this code fragment might cause additional problems. Assume that on entry, the routine that includes this code has ownership of (the right to manipulate) the IRP. When the routine saves the Irp pointer in the global structure at Extension->WaitEventIrp, another thread can read the IRP address from that global structure and perform operations on the IRP. To prevent this problem, the driver should mark the IRP pending before it saves the IRP, and should include both the call to IoMarkIrpPending and the assignment in an interlocked sequence. A Cancel routine for the IRP might also be necessary.
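
The interlocked sequence can be modeled in portable C11, as in the following sketch. IRP_MODEL, EXTENSION_MODEL, and TryQueueWaitIrp are hypothetical stand-ins, and atomic_compare_exchange_strong plays the role that a routine such as InterlockedCompareExchangePointer would play in a driver. The sketch marks the request pending first, then publishes the pointer only if no other request is already waiting, and the check runs in every build.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's IRP and device extension. */
typedef struct { int Pending; } IRP_MODEL;

typedef struct {
    _Atomic(IRP_MODEL *) WaitEventIrp;   /* NULL when no request waits */
} EXTENSION_MODEL;

/*
 * Tries to record Irp as the single outstanding wait request.
 * Returns 1 on success, or 0 if another request is already pending.
 * Unlike ASSERT, this check is present in every build.
 */
int TryQueueWaitIrp(EXTENSION_MODEL *Extension, IRP_MODEL *Irp)
{
    IRP_MODEL *expected = NULL;

    /* Mark the request pending before it becomes visible to others. */
    Irp->Pending = 1;

    /* Store Irp only if the slot is still empty, as one atomic step. */
    if (!atomic_compare_exchange_strong(&Extension->WaitEventIrp,
                                        &expected, Irp)) {
        /*
         * The slot is occupied. The request stays marked pending, so a
         * real dispatch routine would complete it with an error and
         * still return STATUS_PENDING.
         */
        return 0;
    }
    return 1;
}
```

This sketch does not model cancellation; as noted above, a Cancel routine for the saved IRP might also be necessary.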

Cleanup and Close Routines

Driver writers must not confuse the tasks required in DispatchCleanup and DispatchClose routines.

The I/O Manager calls a driver's DispatchCleanup routine when the last handle to a file object is closed. A cleanup request indicates that an application is being terminated, or has closed a file handle for the file object that represents the driver's device object. The I/O Manager still holds a reference to the file object, however. The I/O Manager calls the DispatchClose routine when the last reference is released from the file object.

The DispatchCleanup routine should cancel any IRPs that are currently queued to the target device for the file object, but must not free resources that are attached to the file object or that might be used by other Dispatch routines. Because the I/O Manager holds a reference to the file object, a driver can receive I/O requests for a file object after its DispatchCleanup routine has been called, but before its DispatchClose routine is called.

For example, a user-mode caller might close the file handle while an I/O Manager request is in progress from another thread. If the driver deletes or frees necessary resources before the I/O Manager calls its DispatchClose routine, invalid pointer references and other problems could occur.

Device Control Routines

The following errors are common in DispatchDeviceControl routines, which handle IOCTLs:

Breaking apart IOCTL and FSCTL values.
Converging code paths for public and private IOCTLs.
Checking only the requestor mode to validate IOCTL or FSCTL IRPs.

Breaking Apart IOCTL and FSCTL Values

A driver must use the full value of the IOCTL control code, and not a subset of the bits, in its dispatch routine. Access checks and the IOCTL method are encoded into the control code. Ignoring the values of these bit fields could make the driver vulnerable to other unvalidated IOCTL routes. The following example shows this error.

IoControlCode =
      pIrpStack->Parameters.DeviceIoControl.IoControlCode;
ControlCode   = (IoControlCode >> 2) & 0x00000FFF;

pCmd = pIrp -> AssociatedIrp.SystemBuffer;

switch (ControlCode) {
      case IOCTL_SET_TIMEOUT:
           pTimeOut = pIrp -> AssociatedIrp.SystemBuffer;
           *pTimeOut = InterlockedExchange(
                           &pde->TimeOutValue,
                           *pTimeOut);

This code masks both the calling method and access bits before the switch statement. In this example, if the intended IOCTL required write access to the device to issue this request, a caller could execute the switch statement with a different IOCTL value that did not require write access, but matched the extracted bits. Even if this code checks the input buffer length, it cannot tell which fields of the IRP contain the input buffer unless it consults the method bits of the IOCTL.

The I/O Manager macro IoGetFunctionCodeFromCtlCode has the same problem as the preceding example. Drivers should not use this macro.

An alternative method that avoids these problems is to build an array of structures indexed by the IOCTL function code. One field of the structure might contain the dispatch routine and another field might contain the complete IOCTL or FSCTL control code to compare against the input. Using such a structure, a driver can check both the calling method and the access control bits in one compare operation.
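
The table-driven approach can be sketched as follows. This is a user-mode model under stated assumptions: MODEL_CTL_CODE reproduces the bit layout of the CTL_CODE macro (device type, access, function, method), and the two control codes, the handlers, and the function-code base of 0x800 are hypothetical examples rather than codes from any real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Same bit layout as CTL_CODE: type << 16 | access << 14 | function << 2 | method.
#define MODEL_CTL_CODE(type, fn, method, access) \
    (((uint32_t)(type) << 16) | ((uint32_t)(access) << 14) | \
     ((uint32_t)(fn) << 2) | (uint32_t)(method))

#define MODEL_FUNCTION_BASE 0x800u   // hypothetical first function code

// Hypothetical control codes: buffered method (0); access 0 = any, 2 = write.
#define MODEL_IOCTL_GET_TIMEOUT MODEL_CTL_CODE(0x22, 0x800, 0, 0)
#define MODEL_IOCTL_SET_TIMEOUT MODEL_CTL_CODE(0x22, 0x801, 0, 2)

typedef int (*ioctl_handler)(void);

typedef struct ioctl_entry {
    uint32_t full_code;      // complete control code, including method and access
    ioctl_handler handler;   // routine to call when the code matches exactly
} ioctl_entry;

static int handle_get_timeout(void) { return 100; }
static int handle_set_timeout(void) { return 200; }

// Indexed by (function code - MODEL_FUNCTION_BASE).
static const ioctl_entry ioctl_table[] = {
    { MODEL_IOCTL_GET_TIMEOUT, handle_get_timeout },
    { MODEL_IOCTL_SET_TIMEOUT, handle_set_timeout },
};

// Returns the handler result, or -1 if the control code is not recognized.
static int dispatch_ioctl(uint32_t code)
{
    uint32_t fn = (code >> 2) & 0xFFFu;
    size_t count = sizeof(ioctl_table) / sizeof(ioctl_table[0]);

    if (fn < MODEL_FUNCTION_BASE || fn - MODEL_FUNCTION_BASE >= count) {
        return -1;
    }

    const ioctl_entry *entry = &ioctl_table[fn - MODEL_FUNCTION_BASE];
    assert(entry->handler != NULL);

    // One compare covers device type, access, function, and method bits.
    if (entry->full_code != code) {
        return -1;
    }
    return entry->handler();
}
```

A control code that has the same function bits as MODEL_IOCTL_SET_TIMEOUT but different access or method bits is rejected, because the extracted function code is used only as an index and never as the value in the comparison.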

Converging Code Paths for Public IOCTLs and Private IOCTLs

As a general rule, drivers should not contain converging execution paths for private (internal) and public IOCTLs or FSCTLs. A driver that creates private IOCTLs or FSCTLs should handle such requests separately from any public IOCTLs or FSCTLs that it also supports.

A driver cannot determine whether an IOCTL or FSCTL originated in kernel mode or user mode merely from checking the control code. Consequently, handling both along the same code path (or performing minimal validation and then calling the same routines) can open a driver to security breaches. If a private IOCTL or FSCTL is privileged, unprivileged users who know the control codes might be able to gain access to it.

Checking Only the Requestor Mode to Validate IOCTL or FSCTL IRPs

Drivers should not validate IOCTL and FSCTL requests in IRPs by checking the value of Irp->RequestorMode only. IRPs that arrive from the network and the Server service (SRVSVC) have a requestor mode of kernel, regardless of the origin of the request. A driver that relies on the previous processor mode for the thread could unintentionally use an invalid user-mode pointer without probing, or perform an operation for which the original requestor does not have the required permissions.

Instead, drivers should use the appropriate access control checks, such as FILE_READ_DATA, FILE_WRITE_DATA, and so forth.

Synchronization

On the Microsoft Windows NT, Microsoft Windows 2000, and Windows XP operating systems, drivers are multithreaded; they can receive multiple I/O requests from different threads at the same time. In designing a driver, you must assume that it will be run on a symmetric multiprocessor (SMP) system and take the appropriate measures to ensure data integrity.

Specifically, whenever a driver changes global or file object data, it must use a lock or an interlocked sequence to prevent race conditions.

In the following example, a race condition could occur when the driver accesses the global data at Data.LpcInfo.

PLPC_INFO pLpcInfo = &Data.LpcInfo; //Pointer to global data
   ...
   ...
// This saved pointer may be overwritten by another thread.
pLpcInfo->LpcPortName.Buffer = ExAllocatePool(
                                     PagedPool,
                                     arg->PortName.Length);

Multiple threads entering this code as a result of an IOCTL call could cause a memory leak when the pointer is overwritten. To avoid this problem, the driver should use the ExInterlockedXxx routines or some type of lock when it changes the global data. The driver's requirements determine the acceptable types of locks.

The following example attempts to reallocate a file-specific buffer (Endpoint->LocalAddress) to hold the endpoint address.

Endpoint = FileObject->FsContext;

if (Endpoint->LocalAddress != NULL &&
    Endpoint->LocalAddressLength <
       ListenEndpoint->LocalAddressLength ) {

      FREE_POOL (Endpoint->LocalAddress,
                 LOCAL_ADDRESS_POOL_TAG );
      Endpoint->LocalAddress  = NULL;
   }

if ( Endpoint->LocalAddress == NULL ) {
      Endpoint->LocalAddress =
            ALLOCATE_POOL (NonPagedPool,
                     ListenEndpoint->LocalAddressLength,
                     LOCAL_ADDRESS_POOL_TAG);
   }

In this example, a race condition could occur when the file object is accessed. Because the driver does not hold any locks, two requests for the same file object could enter this function. The result might be references to freed memory, multiple attempts to free the same memory, or memory leaks. To avoid these errors, the two if statements should be performed while the driver holds a spin lock.
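
A locked version of the two if statements might look like the following sketch. It is a user-mode model, not driver code: endpoint_model is a hypothetical stand-in for the endpoint structure, the atomic_flag loop stands in for a kernel spin lock, and malloc and free stand in for the pool routines. Unlike the fragment above, the model also records the new length after it reallocates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-in for the per-file endpoint structure.
typedef struct endpoint_model {
    atomic_flag lock;               // models a spin lock that guards the fields below
    void *local_address;
    size_t local_address_length;
} endpoint_model;

static void endpoint_init(endpoint_model *ep)
{
    atomic_flag_clear(&ep->lock);
    ep->local_address = NULL;
    ep->local_address_length = 0;
}

// Returns 1 if the endpoint holds a buffer of at least 'required' bytes.
static int ensure_local_address(endpoint_model *ep, size_t required)
{
    int ok;

    assert(ep != NULL && required > 0);

    while (atomic_flag_test_and_set(&ep->lock)) {
        // Spin until the lock is free.
    }

    // Both tests and both updates happen under the same lock.
    if (ep->local_address != NULL && ep->local_address_length < required) {
        free(ep->local_address);
        ep->local_address = NULL;
        ep->local_address_length = 0;
    }

    if (ep->local_address == NULL) {
        ep->local_address = malloc(required);
        if (ep->local_address != NULL) {
            ep->local_address_length = required;
        }
    }

    ok = (ep->local_address != NULL);
    atomic_flag_clear(&ep->lock);
    return ok;
}
```

Because the free, the test for NULL, and the allocation all run under one lock, a second request for the same endpoint cannot observe the pointer between the free and the reallocation.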

Shared Access

File system drivers (FSDs) and other highest-level drivers must perform access checks against an object's security descriptor before using IoXxxShareAccess routines to check, set, remove, or update shared access to the object.

To handle shared access, drivers should:

Obtain the requested access from the incoming IRP.
If the IRP major function code is IRP_MJ_CREATE, determine the effective mode of the request. If the value of the Irp->RequestorMode field is KernelMode, check whether the SL_FORCE_ACCESS_CHECK flag is set in the IrpSp->Flags field. If this flag is set, access checks must specify that the request originated in user mode.
Check the requested access against the object's security descriptor. Pass the access requested in the IRP as the DesiredAccess parameter to SeAccessCheck.
Compare the GrantedAccess returned by SeAccessCheck with the access requested in the IRP. If the GrantedAccess is more restrictive than the access requested in the IRP, complete the IRP with STATUS_ACCESS_DENIED. If the GrantedAccess matches the access requested in the IRP, proceed.
Check the permitted shared access. Use the ACCESS_MASK value returned in the GrantedAccess parameter of SeAccessCheck as the DesiredAccess input parameter to IoCheckShareAccess.

SeAccessCheck sets only those bits in the returned GrantedAccess value that indicate the access actually granted to the user; the MAXIMUM_ALLOWED bit is always cleared in the returned value. To handle shared access correctly, drivers should follow these guidelines:

Drivers should inspect the access requested in the IRP before comparing it with the GrantedAccess value returned by SeAccessCheck. If the IRP requests MAXIMUM_ALLOWED, the driver must check the individual bits in the GrantedAccess value to determine whether sufficient access has been granted.
Drivers must pass the GrantedAccess value returned by SeAccessCheck as the DesiredAccess input parameter when calling IoXxxShareAccess.

For similar reasons, drivers should not attempt optimizations or partial access control by checking desired access for other bits, such as FILE_WRITE_DATA.
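
One possible reading of the comparison between the requested and granted access is sketched below. This is a user-mode model: the MODEL_xxx constants mirror the values of FILE_READ_DATA, FILE_WRITE_DATA, and MAXIMUM_ALLOWED, while the function name and its needed parameter (the access bits the driver requires for the operation) are hypothetical. The model does not call SeAccessCheck; it only shows how the returned mask might be examined.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_FILE_READ_DATA   0x00000001u
#define MODEL_FILE_WRITE_DATA  0x00000002u
#define MODEL_MAXIMUM_ALLOWED  0x02000000u

// Returns 1 if the request may proceed, 0 if it should be denied.
//   requested: access requested in the IRP
//   granted:   mask returned by the access check (never contains MAXIMUM_ALLOWED)
//   needed:    bits the driver requires when the caller asked for MAXIMUM_ALLOWED
static int granted_access_is_sufficient(uint32_t requested,
                                        uint32_t granted,
                                        uint32_t needed)
{
    uint32_t explicit_bits = requested & ~MODEL_MAXIMUM_ALLOWED;

    assert((granted & MODEL_MAXIMUM_ALLOWED) == 0);

    // Every explicitly requested right must have been granted.
    if ((granted & explicit_bits) != explicit_bits) {
        return 0;
    }

    // For MAXIMUM_ALLOWED, inspect the individual granted bits.
    if ((requested & MODEL_MAXIMUM_ALLOWED) != 0 &&
        (granted & needed) != needed) {
        return 0;
    }

    return 1;
}
```

In a real driver, the granted mask (not the requested mask) would then be passed on as the DesiredAccess input to the share-access routines, as described above.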

Note This section describes the correct approach for NTFS and other file systems that use the access control lists (ACLs) supported by the SeXxx routines. An installable file system that uses a different type of ACLs should perform the equivalent access checks with its own rights-granting mechanism.

Locks and Disabling APCs

Certain locking primitives, user-supplied locks, and the unconventional use of events or other objects as locks have the potential to deadlock the system. Kernel-mode drivers that use such locking mechanisms should disable asynchronous procedure calls (APCs), unless the driver runs in a trusted environment (a worker thread). To disable and subsequently re-enable APCs, a device driver calls the KeEnterCriticalRegion and KeLeaveCriticalRegion routines, and a file-system driver calls the FsRtlEnterFileSystem and FsRtlExitFileSystem macros. These routines disable the delivery of normal kernel APCs. Special kernel APCs, which run at IRQL APC_LEVEL, are not affected by these routines.

Disabling APCs prevents the thread that currently holds the lock from being suspended by user-mode calls to SuspendThread (which delivers a kernel APC). Typically, such calls occur during debugging, but direct calls to this API are possible from user mode. If APCs are not disabled, the thread that holds the lock never has a chance to release the lock. As a result, other threads in the system are blocked while waiting for it.

Drivers must disable APCs when calling the following system routines:

Any of the ExXxxResourceXxx routines. These routines do not disable APCs. Drivers must enclose code that acquires and uses such resources within KeEnterCriticalRegion and KeLeaveCriticalRegion, or FsRtlEnterFileSystem and FsRtlExitFileSystem.
ExAcquireFastMutexUnsafe.
KeWaitForSingleObject for a non-mutex object.

Drivers are not required to disable APCs when calling the following system routines:

KeWaitForMutexObject or KeWaitForSingleObject for a mutex object. In this situation, KeWaitForSingleObject and KeWaitForMutexObject automatically disable APCs by the equivalent of KeEnterCriticalRegion.
ExAcquireFastMutex. This routine returns to the caller at IRQL APC_LEVEL and therefore blocks all APCs.

The situation is more complicated when driver code in a thread must run in order to release another thread. For example, consider a driver that acts as a communication mechanism between a client and a server thread. When the server thread posts a read, a read IRP enters the driver. Because no data is waiting for the driver, it pends the IRP and sets an appropriate cancel routine. If a client thread then sends a message with a write request, a write IRP enters the driver. Because the pending read IRP is already queued, however, the driver does not handle the write IRP; instead, the driver removes the read IRP from the queue and removes its cancel routine.

Now, assume that the queues that hold the pended IRPs are protected with locks. To improve performance, the driver writer has moved IRP completion outside the locks. This strategy has two advantages:

The lock region is smaller, thus improving driver scalability on large multiprocessor hardware.
Context swaps are minimized. Threads that are awakened by IRP completion do not immediately block on a lock that is still owned by the current thread.

Moving completion outside the locks has the following problems, however:

After the IRP has been removed from the queue, no cancellation routine is in place and APCs might be enabled.
If the client thread is suspended after it releases the lock but before it completes the IRP from the server thread, the server thread will be blocked by the suspended client thread.

To avoid these problems, such drivers should leave APCs disabled until the IRPs have been completed. For example, the following code handles a write request in the named-pipe file system.

FsRtlEnterFileSystem();

NpAcquireSharedVcb();

Status =  NpCommonWrite( IrpSp->FileObject,
          Irp->UserBuffer,
          IrpSp->Parameters.Write.Length,
          Irp->Tail.Overlay.Thread,
          Irp,
          &DeferredList ); // List of IRPs to be 
                           //completed after lock release

NpReleaseVcb();

//
// At this point we have released the locks but still 
// have kernel APCs disabled.
// We need to prevent this thread from being suspended until 
// after we release the server threads.
//

//
// Complete any deferred IRPs after dropping the locks.
//
NpCompleteDeferredIrps (&DeferredList);

//
// Reenable APCs after completing any server IRPs.
// Suspension before completing this thread's IRP doesn't 
// matter because it would just block
// this thread anyway and it's suspended.
//
FsRtlExitFileSystem();

if (Status != STATUS_PENDING) {
    NpCompleteRequest (Irp, Status);
    }

For additional information about when waiting threads receive alerts and APCs, see the Design Guide in the Kernel-Mode Driver Architecture section of the Windows DDK.

Handle Validation

Some drivers must manipulate objects passed to them by callers, or must process two file objects at the same time. For example, a modem driver might receive a handle to an event object, or a network driver might receive handles to two different file objects. The driver must validate these handles. Because they are passed by a caller, and not through the I/O Manager, the I/O Manager cannot perform any validation checks.

In the following example, the driver has been passed the handle AscInfo->AddressHandle, but has not validated it before calling ObReferenceObjectByHandle.

// This handle is embedded in a buffered request.
//
status = ObReferenceObjectByHandle(
                  AscInfo->AddressHandle,
                  0,
                  NULL,
                  KernelMode,
                  &fileObject,
                  NULL);

if (NT_SUCCESS(status)) {
   if ( (fileObject->DeviceObject == DeviceObject) &&
        (fileObject->FsContext2 == TRANSPORT_SOCK) ) {
   ...

The call to ObReferenceObjectByHandle succeeds, but the code fails to ensure that the returned pointer references a file object; it trusts the caller to pass in the correct information. To correct this problem, the driver should pass explicit values for the DesiredAccess and ObjectType parameters.

Even if all the parameters for the call to ObReferenceObjectByHandle are correct, and the call succeeds, a driver can still get unexpected results if the file object is not intended for it. In the following example, the driver fails to ascertain that the call returns a pointer to the file object it expected.

status = ObReferenceObjectByHandle (
                          AcpInfo->Handle,
                          DesiredAccess,
                          *IoFileObjectType,
                          Irp->RequestorMode,
                          (PVOID *)&AcpEndpointFileObject,
                          NULL);

if ( !NT_SUCCESS(status) ) {
   goto complete;
}
AcpEndpoint = AcpEndpointFileObject->FsContext;

if ( AcpEndpoint->Type != BlockTypeEndpoint ) {
...

Although ObReferenceObjectByHandle returns a pointer to a file object, the driver has no guarantee that the pointer references the file object it expected. In this case, the driver should validate the pointer before accessing the driver-specific data at AcpEndpointFileObject->FsContext.

Drivers should validate handles as follows:

Check the object type to make sure it is what the driver expects.
Ensure that the requested access is appropriate for the object type and the required tasks. If the driver performs a fast file copy, for instance, it must make sure the handle has read access.
Specify the correct access mode (UserMode or KernelMode) and verify that the access mode is compatible with the access requested.
Validate the handle against the device object or driver if the driver expects a handle to a file object that the driver itself created. Do not break filters that send I/O requests for unexpected devices, however.
If the driver supports multiple kinds of file objects, it must be able to differentiate them. For example, TDI drivers use file objects to represent control channels, address objects, and connections. File-system drivers use file objects to represent volumes, directories, and files. Such drivers must determine which type of file object each handle represents.
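
The checks in the preceding list can be sketched as a single validation routine. This is a user-mode model: model_object, its fields, and the context kinds are hypothetical stand-ins for a referenced object, its granted access, its device object, and the driver-specific context that identifies the kind of file object.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODEL_OBJECT_FILE, MODEL_OBJECT_EVENT } model_object_type;

typedef enum {
    MODEL_CONTEXT_CONTROL,
    MODEL_CONTEXT_ADDRESS,
    MODEL_CONTEXT_CONNECTION
} model_context_kind;

// Hypothetical stand-in for an object referenced from a caller-supplied handle.
typedef struct model_object {
    model_object_type type;
    const void *device_object;        // device on which the file object was opened
    unsigned granted_access;          // access granted when the handle was opened
    model_context_kind context_kind;  // which kind of file object this is
} model_object;

// Returns 1 only if the object is a connection file object that belongs to
// 'my_device' and was opened with at least 'required_access'.
static int validate_connection_handle(const model_object *obj,
                                      const void *my_device,
                                      unsigned required_access)
{
    assert(my_device != NULL);

    if (obj == NULL) {
        return 0;
    }
    if (obj->type != MODEL_OBJECT_FILE) {
        return 0;   // wrong object type, for example an event
    }
    if ((obj->granted_access & required_access) != required_access) {
        return 0;   // handle lacks the access needed for the task
    }
    if (obj->device_object != my_device) {
        return 0;   // file object was not created on this driver's device
    }
    if (obj->context_kind != MODEL_CONTEXT_CONNECTION) {
        return 0;   // right driver, wrong kind of file object
    }
    return 1;
}
```

Only after all of these checks succeed would a driver use the driver-specific data attached to the file object.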

Requests to Create and Open Files and Devices

Drivers can be vulnerable to problems when requests to create and open files or devices involve the following:

Opening files in the device namespace
Long file names
Unexpected I/O requests
Relative open requests for direct device open handles
Extended attributes

These issues are described in the following sections.

Opening Files in the Device Namespace

Drivers should set the FILE_DEVICE_SECURE_OPEN device characteristic when they call IoCreateDevice or IoCreateDeviceSecure to create a device object. The FILE_DEVICE_SECURE_OPEN characteristic directs the I/O Manager to apply the security descriptor of the device object to all open requests, including file open requests into the device's namespace. Setting this characteristic prevents the potential security problems described in this section. For Plug-and-Play drivers, this characteristic is set in the INF file.

Drivers that support exclusive opens are the only exception to this rule. Such drivers should instead fail any IRP_MJ_CREATE requests that specify an IrpSp->FileObject->FileName parameter with a nonzero length.

The I/O Manager does not perform access checks based on the device object for open requests into the device namespace unless FILE_DEVICE_SECURE_OPEN is set. For a device named "\Device\DeviceName," the namespace consists of any name of the form "\Device\DeviceName\FileName."

Omitting access checks can open security holes in drivers that have privileged IOCTL or FSCTL interfaces. The privileged interfaces require write access to the device that is denied to unprivileged users. Unprivileged users can bypass security, however, and obtain handles with read and write access by opening a file in the device's namespace. To prevent a user from bypassing security, a driver's DispatchCreate routines must properly handle such create requests.

For example, an unprivileged user who attempts to open \Device\Transport will not be able to create a handle with read or write access to the device. The transport driver has protected IOCTLs, however, that allow administrators to configure the transport (for example, by changing the address). These IOCTLs require write access to the device. (Read and write access requirements are encoded in the IOCTL or FSCTL value.) Unless the transport driver sets the FILE_DEVICE_SECURE_OPEN characteristic or has other code to handle the situation, a caller could open \Device\Transport\xyz, and thus gain full access to the file object that is created. An unprivileged caller could also use a normally opened handle to the transport to request another relative open (with or without a file name) and achieve the same result.

As an alternative to setting FILE_DEVICE_SECURE_OPEN, a driver can perform its own access checks, or it can reject such I/O requests outright. The following shows some sample rejection code.

if ( irpStack->FileObject->RelatedFileObject ||
   irpStack->FileObject->FileName.Length ) {
   Irp->IoStatus.Status = STATUS_ACCESS_DENIED;
   IoCompleteRequest(Irp, IO_NO_INCREMENT);
   return STATUS_ACCESS_DENIED;
}

Long File Names

Long file names in the create path can cause memory leaks and memory pool corruption in some drivers.

The Object Manager limits object paths to 32 K Unicode characters. The file name length, in bytes, including a trailing Unicode NULL, must be an even number that is less than 64 KB. This limit applies to the whole object path (for example, \Device\Volume1\xxxxxx). The portion presented to the I/O Manager has the leading path to the device object removed, making it somewhat shorter than 64 KB.

A driver is unlikely to encounter long file names through standard file open requests. When a caller requests a relative file name open at the native API level, however, the Object Manager and therefore the I/O Manager can present file names that are only a few bytes short of 64 KB.

When handling a relative open request, drivers often try to reconstruct the full path of the file to open. Typically, the driver concatenates the file name of the base file (the file to which the supplied name is relative) with a separator character and the file name of the relative portion. The length of the complete string can easily exceed 64 KB, and therefore will not fit in the 16-bit length fields of the UNICODE_STRING structures that represent the file names in the file objects. As a result, the driver can either corrupt pool or leak memory.

Pool corruption is caused by allocating a buffer that is too short for the target file name, as shown in the following example.

FullNameLengthTemp = RelatedCcb->FullFileName.Length +
                     AddSeparator + FileObjectName->Length;
FullFileName->MaximumLength =
       FullFileName->Length = (USHORT) FullNameLengthTemp;

FullFileName->Buffer = FsRtlAllocatePoolWithTag(
                                        PagedPool,
                                        FullFileName->Length,
                                        MODULE_POOL_TAG);

RtlCopyMemory(FullFileName->Buffer,
              RelatedCcb->FullFileName.Buffer,
              RelatedCcb->FullFileName.Length );

CurrentPosition = Add2Ptr(FullFileName->Buffer,
                          RelatedCcb->FullFileName.Length );

RtlCopyMemory( CurrentPosition,
               FileObjectName->Buffer,
               FileObjectName->Length );

If the file name length calculation exceeds 64 KB, the USHORT cast truncates the length. As a result, the allocated buffer is too small and one or both of the calls to RtlCopyMemory corrupt pool.

The memory leak is a subtler problem, which occurs when the file name length is used without truncation to allocate the pool buffer. Because the buffer is large enough, this error does not corrupt pool. The file name length stored in the file object is truncated to 16 bits, however. If the truncation results in a zero length, the I/O Manager never frees the file name buffer, and a memory leak occurs. A leak can also occur if a driver changes the file name by removing excess backslash characters and these changes make the file name length field zero.
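
Both problems can be avoided by checking the combined length before it is narrowed to 16 bits. The following sketch shows one way to do so in a user-mode model; the function name, its parameters, and MODEL_MAX_NAME_BYTES are hypothetical, and the limit of 0xFFFE assumes that a name length in bytes must be an even value that fits in 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Largest even byte count that a 16-bit length field can hold.
#define MODEL_MAX_NAME_BYTES 0xFFFEu

// Computes the byte length of base name + optional separator + relative name.
// Returns 1 and stores the length if it fits in 16 bits; returns 0 otherwise
// and leaves *out_bytes unchanged.
static int combined_name_length(size_t base_bytes,
                                int add_separator,
                                size_t relative_bytes,
                                uint16_t *out_bytes)
{
    size_t total;

    assert(out_bytes != NULL);

    // Reject inputs that are already out of range so the sum cannot wrap.
    if (base_bytes > MODEL_MAX_NAME_BYTES ||
        relative_bytes > MODEL_MAX_NAME_BYTES) {
        return 0;
    }

    total = base_bytes + relative_bytes +
            (add_separator ? sizeof(uint16_t) : 0);

    // Fail the request instead of truncating the length.
    if (total > MODEL_MAX_NAME_BYTES) {
        return 0;
    }

    *out_bytes = (uint16_t)total;
    return 1;
}
```

A driver that follows this pattern would fail the create request when the function returns 0, and would use the validated 16-bit value both to allocate the buffer and to set the length fields, so that the two can never disagree.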

Unexpected I/O Requests

Drivers that create more than one kind of device object must be able to handle I/O requests on every such device object.

Many drivers create more than one kind of device object by calling IoCreateDevice. Some drivers create control device objects in their DriverEntry routines to allow applications to communicate with the driver, even before the driver creates an FDO. For example, before a file system driver calls IoRegisterFileSystem to register itself as a file system, it must create a control device object to handle file system notifications.

A driver should be ready for create requests on any device object it creates. After completing the create request with a success status, the driver should expect to receive any user-accessible I/O requests on the created file object. Consequently, any driver that creates more than one device object must check which device object each I/O request specifies.

For example, a driver might expect that an I/O request specifies an FDO for a specific device, when in fact the request specifies its control device object. If the driver has not initialized the same fields in the device extension of the control device object as in the other device objects, the driver could crash when trying to use device extension information from the control device object.
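
One common way to make this check is to store a type tag in every device extension and test it before using fields that only some device objects initialize. The following user-mode sketch illustrates the idea; the extension layout, the hardware_registers field, and the status values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODEL_DEVICE_CONTROL, MODEL_DEVICE_FDO } model_device_kind;

// Hypothetical device extension shared by all device objects of the driver.
typedef struct model_device_extension {
    model_device_kind kind;      // set for every device object the driver creates
    int *hardware_registers;     // initialized only for FDOs in this model
} model_device_extension;

enum {
    MODEL_STATUS_SUCCESS = 0,
    MODEL_STATUS_INVALID_DEVICE_REQUEST = -1
};

// Models a read dispatch routine that is valid only on an FDO.
static int model_dispatch_read(const model_device_extension *ext, int *value_out)
{
    assert(ext != NULL && value_out != NULL);

    // Check which device object the request targets before touching
    // fields that the control device object never initializes.
    if (ext->kind != MODEL_DEVICE_FDO || ext->hardware_registers == NULL) {
        return MODEL_STATUS_INVALID_DEVICE_REQUEST;
    }

    *value_out = ext->hardware_registers[0];
    return MODEL_STATUS_SUCCESS;
}
```

A read request sent to the control device object fails cleanly in this model instead of dereferencing an uninitialized field.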

Relative Open Requests for Direct Device Open Handles

The I/O Manager performs a direct device open in response to create or open requests that meet all of the following criteria:

The volume name has no trailing characters. For example, G: is valid, but G:\ and G:\a\b are not.
The create request is not relative to another file handle.
The requested access includes one or more of the following, and no other access types: SYNCHRONIZE, FILE_READ_ATTRIBUTES, READ_CONTROL, ACCESS_SYSTEM_SECURITY, WRITE_OWNER, or WRITE_DAC.

For a normal create or open request on a storage volume, the I/O Manager typically attempts to mount a file system, if none is already mounted. When performing a direct device open, however, the I/O Manager does not mount or send requests through a file system. Instead, it sends the IRP_MJ_CREATE request directly to the storage stack, bypassing any file system that has been mounted for the volume. Requests for further operations (such as read, write, or DeviceIoControl) on the file handle are sent to the topmost device object in the storage stack for the volume.

The I/O Manager performs a direct device open only when the caller requests limited access to the device, such as the access required to read device attributes. This type of open operation occurs rarely, but is useful when an application wants to query certain attributes of a storage volume without forcing a file system to be mounted.

If an application later sends an open request that is relative to a handle on which the I/O Manager performed a direct device open, the file system stack receives a file object in which the RelatedFileObject field points to an object that the file system has not previously seen. To determine whether the I/O Manager performed a direct device open on a file object, a file system driver can test the FO_DIRECT_DEVICE_OPEN flag in the Flags field of the file object.

On Microsoft Windows NT 4.0 and earlier versions of Windows NT, relative open requests for direct device open handles failed. This problem has been corrected in Microsoft Windows 2000 and later releases.

Extended Attributes

Drivers must validate the size and contents of extended attributes (EAs). EAs are used primarily by TDI drivers during open operations. The redirector (RDR) also uses them to hold user names and passwords for accessing network shares.

The I/O Manager copies and parses EAs to make sure they have the correct format: a keyword (a NULL-terminated, variable-length character string), followed by its value (0 to 65535 bytes). Drivers should not assume, however, that if the keyword is correct the value block contains exactly the data they expect. Even if the keyword is correct, the data size might be too small, thus causing the expected data structure to extend beyond the allocated end of buffer, or to contain garbage.

For example, the following code does not properly validate that the size of the value block is sizeof(PVOID).

ea = (PFILE_FULL_EA_INFORMATION)
      Irp->AssociatedIrp.SystemBuffer;

RtlCopyMemory (
           &connection->Context,
           &ea->EaName[ea->EaNameLength+1],
           sizeof (PVOID));
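
The size check that is missing from the preceding fragment might look like the following sketch. It is a user-mode model that treats the EA as a byte buffer; the offsets mirror the layout of FILE_FULL_EA_INFORMATION (NextEntryOffset, Flags, EaNameLength, EaValueLength, EaName), and the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Byte offsets that mirror the FILE_FULL_EA_INFORMATION layout:
// NextEntryOffset (4 bytes), Flags (1), EaNameLength (1), EaValueLength (2), EaName.
enum {
    EA_NAME_LENGTH_OFFSET  = 5,
    EA_VALUE_LENGTH_OFFSET = 6,
    EA_NAME_OFFSET         = 8
};

// Copies a pointer-sized value out of an EA, but only if the value block
// is exactly sizeof(void *) bytes and lies entirely inside the buffer.
// Returns 1 on success, 0 if the EA is malformed.
static int read_ea_context(const uint8_t *ea, size_t ea_bytes, void **context_out)
{
    uint8_t name_length;
    uint16_t value_length;
    size_t value_offset;

    assert(ea != NULL && context_out != NULL);

    if (ea_bytes < EA_NAME_OFFSET) {
        return 0;   // too small to hold the fixed header
    }

    name_length = ea[EA_NAME_LENGTH_OFFSET];
    memcpy(&value_length, ea + EA_VALUE_LENGTH_OFFSET, sizeof(value_length));

    // The value follows the name and the name's terminating NULL.
    value_offset = (size_t)EA_NAME_OFFSET + name_length + 1;

    if (value_length != sizeof(void *)) {
        return 0;   // value is not the size the driver expects
    }
    if (value_offset > ea_bytes || ea_bytes - value_offset < value_length) {
        return 0;   // value would extend past the end of the buffer
    }

    memcpy(context_out, ea + value_offset, sizeof(void *));
    return 1;
}
```

The copy is performed only after both the declared value length and the space remaining in the buffer have been checked.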

Drivers also must validate the data within EAs. The following code fails to perform this validation.

ea = OPEN_REQUEST_EA_INFORMATION(Request);
if (ea == NULL) {
    return STATUS_NONEXISTENT_EA_ENTRY;
   }

name = (PTRANSPORT_ADDRESS)&ea->EaName[ea->EaNameLength+1];
AddressName = (PTA_ADDRESS)&name->Address[0];

for (i=0;i<name->TAAddressCount;i++) 
...

If the address count is large, the for loop could run beyond the end of the allocated buffer. The driver should check the minimum size of the value, and check each individual address to make sure it is within the buffer.

During internal review, Microsoft found the following error in several drivers that process EAs.

FILE_FULL_EA_INFORMATION UNALIGNED *
FindEA(
    PFILE_FULL_EA_INFORMATION    pStartEA,
    CHAR                        *pTargetName,
    USHORT                       TargetNameLength)
{
    FILE_FULL_EA_INFORMATION UNALIGNED *pCurrentEA;

    do
    {
        Found = TRUE;
        pCurrentEA = pStartEA;
        pStartEA  += pCurrentEA->NextEntryOffset;
...

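Because pStartEA is a pointer to a FILE_FULL_EA_INFORMATION structure, the addition advances it by NextEntryOffset multiples of sizeof (FILE_FULL_EA_INFORMATION). NextEntryOffset is a byte count, however, so the code should cast pStartEA to a PUCHAR before adding the offset.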
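
A byte-wise walk of an EA list can be sketched as follows. This is a user-mode model that treats the list as a byte buffer, with NextEntryOffset stored in the first four bytes of each entry; the function name, the 8-byte header size used for bounds checks, and the decision to stop at the first malformed offset are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Fixed part of each entry: NextEntryOffset (4), Flags (1),
// EaNameLength (1), EaValueLength (2).
#define EA_HEADER_BYTES 8u

// Counts the entries in an EA list, stepping by bytes and never
// reading or stepping past the end of the buffer.
static size_t count_ea_entries(const uint8_t *list, size_t list_bytes)
{
    size_t offset = 0;
    size_t count = 0;

    assert(list != NULL);

    while (list_bytes - offset >= EA_HEADER_BYTES) {
        uint32_t next;

        // NextEntryOffset of the current entry, read at a byte offset.
        memcpy(&next, list + offset, sizeof(next));
        count++;

        if (next == 0) {
            break;      // last entry in the list
        }
        if (next < EA_HEADER_BYTES || next > list_bytes - offset) {
            break;      // malformed offset: stop instead of leaving the buffer
        }

        offset += next; // advance by a byte count, not by whole structures
    }

    return count;
}
```

Because the offset is added to a byte pointer, an entry that starts 12 bytes after the previous one is found at byte 12, not at 12 times the size of the structure.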

Driver Unload Routines

Before unloading, drivers must release all driver-allocated resources, cancel all timers, ensure that no deferred procedure calls (DPCs) are queued, and ensure that all driver-created threads have terminated. The operating system frees a driver's address space soon after unloading the driver. Thereafter, attempting to execute any driver code, for example, in a DPC or driver-created thread, can result in a system crash.

This section outlines the steps that drivers should take to prevent such errors when using the following:

Work items
Driver-created threads
Timers
Queued DPCs
IoCompletion routines

Work Items

Drivers that use work items should call the IoAllocateWorkItem, IoQueueWorkItem, and IoFreeWorkItem routines instead of the obsolete ExQueueWorkItem and related routines. The newer IoXxxWorkItem routines include unload protection that the obsolete routines did not have.

The IoXxxWorkItem routines ensure that the device object associated with the work item remains available until the callback routine returns. Work item callback routines can set an event immediately before exiting, without risk that the driver will be unloaded before the callback routine returns. After the event is signaled, the driver can call IoFreeWorkItem and free any resources shared with the work item.

The obsolete ExQueueWorkItem and related routines did not have this protection mechanism.

Note The number of threads in which to run work items is limited. Drivers should allocate work items only when needed, and free them as soon as they are no longer required. A driver should not wait until it is unloaded to free work items that are no longer in use.

Driver-Created Threads

Many drivers have separate threads of execution that are created outside the control of the worker thread manager. These threads execute code within a loaded driver. Because a driver's address space is freed soon after its Unload routine returns, every driver must carefully synchronize the termination of these driver threads. Attempting to execute instructions in a driver thread after the driver is unloaded can cause a system crash.

In the following example, the driver waits on an event that another driver thread will set just before exiting.

KeWaitForSingleObject(
                &Device->UnloadEvent,
                Executive,
                KernelMode,
                FALSE,
                (PLARGE_INTEGER)NULL
                );
return;

The following code sets the event.

KeSetEvent(&Device->UnloadEvent,
           IO_NETWORK_INCREMENT,
           FALSE);
return;

If the driver unloads before the final few instructions execute, a fault may occur. In this example, the system could crash if the driver has already been unloaded when the return statement following the call to KeSetEvent is executed.

To prevent this error, drivers that create separate threads should wait on the thread object itself, instead of waiting on an event set by the thread. For example, if a driver calls PsCreateSystemThread to create a thread, the driver can call ObReferenceObjectByHandle to convert the returned thread handle to a pointer to the thread object, and then call KeWaitForSingleObject, passing that pointer as the object on which to wait. When the thread calls PsTerminateSystemThread, or returns from its thread routine back to the system, the wait is satisfied. The driver can then release its reference by calling ObDereferenceObject and safely unload because the thread has exited.

Timers

Drivers that use timers must also unload carefully. Drivers must cancel any timers that are queued, wait for any CustomTimerDpc routines that are running, and synchronize access to driver structures from DPC routines.

A driver can cancel a one-shot timer in its Unload routine. To cancel a one-shot timer, the driver calls KeCancelTimer. If KeCancelTimer returns TRUE, the timer was still queued and has been canceled, so its DPC will not run. If KeCancelTimer returns FALSE, the timer has already expired and its DPC might still be queued or running; the driver must not free any driver-allocated resources until after the DPC has finished running.

The operating system forces any DPCs that are already running to run to completion, even after the driver Unload routine returns (but before deleting the driver's address space). A driver can therefore wait on an event signaled by the DPC. The DPC should signal the event after it has finished accessing any resources, typically immediately before returning. When the event wait is satisfied, the driver can safely free those resources and unload.

Drivers that use periodic timers must take an additional step. The driver first calls KeCancelTimer to disable the periodic timer. KeCancelTimer always returns TRUE for such timers, however, because as soon as a periodic timer expires, the operating system queues another such timer; consequently, periodic timers always appear to be queued.

To make sure that any DPCs for a periodic timer have completed, a driver must also call KeFlushQueuedDpcs. KeFlushQueuedDpcs returns after all queued DPCs on all processors have run. Although this routine is expensive in terms of performance, a driver must call it in this situation.

Queued DPCs

Before unloading a driver, the operating system flushes driver-queued DPCs other than those for periodic timers, as described in the preceding section. Therefore, drivers that queue DPCs are not required to call KeFlushQueuedDpcs before unloading; however, such drivers must synchronize access to ensure that the DPC routine has finished using resources before the driver frees them. A driver can use the same kind of event wait mechanism described for one-shot timers.

IoCompletion Routines

In rare cases, an IoCompletion routine can run in parallel with a driver's Unload routine. If the Unload routine waits for an event set by the IoCompletion routine, the wait could be satisfied and the driver unloaded before the IoCompletion routine runs to completion. This is a problem only for drivers that do not use Plug and Play.

To avoid this problem, drivers for Windows XP and later can use the IoSetCompletionRoutineEx routine to set the IoCompletion routine. IoSetCompletionRoutineEx protects the IoCompletion routine from driver unload.

Pageable Drivers and DPCs

Drivers that queue DPCs and make themselves pageable are not required to flush DPCs before calling MmPageEntireDriver. The operating system flushes DPCs before paging the driver, but the driver must ensure that neither it nor another thread queues any additional DPCs until the driver is once again locked in memory.

User-Mode APIs

This section describes errors that can occur when drivers are called by the following user-mode APIs.

NtReadFile and NtWriteFile
TransmitFile

NtReadFile and NtWriteFile

Drivers that read and write data in response to the user-mode APIs NtReadFile and NtWriteFile must be able to handle the negative file offsets that can be passed with these APIs. The I/O Manager performs limited checks on these offsets.

NtWriteFile accepts two negative LARGE_INTEGER values: one signifies a write to the end of the file, and the other a write at the current position. NtReadFile accepts one negative offset, which signifies a read at the current position. No other negative offsets are accepted.

The I/O Manager does not reject transfers in which the offset plus the transfer length causes the offset of the buffer end to wrap from positive to negative. Drivers must check for this overflow themselves.
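The offset checks described in this section can be sketched in portable C as follows. This is a minimal user-mode illustration, not driver code: the function, the enumeration, and the macro names are hypothetical, and the values -1 and -2 are assumed to match the DDK constants FILE_WRITE_TO_END_OF_FILE and FILE_USE_FILE_POINTER_POSITION when the offset is read as a 64-bit quantity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. The values are assumptions about the two special
 * offsets that the read and write APIs accept. */
#define OFFSET_WRITE_TO_EOF     ((int64_t)-1)
#define OFFSET_USE_CURRENT_POS  ((int64_t)-2)

typedef enum {
    XFER_OK,          /* ordinary transfer at a nonnegative offset */
    XFER_AT_EOF,      /* write to end of file */
    XFER_AT_CURRENT,  /* read or write at the current position */
    XFER_INVALID      /* reject the request */
} xfer_check;

/* Classifies a transfer by its starting offset and length in bytes. */
xfer_check check_transfer(int64_t offset, uint32_t length, bool is_write)
{
    if (offset < 0) {
        if (offset == OFFSET_USE_CURRENT_POS) {
            return XFER_AT_CURRENT;
        }
        if (is_write && offset == OFFSET_WRITE_TO_EOF) {
            return XFER_AT_EOF;
        }
        return XFER_INVALID;    /* no other negative offset is valid */
    }

    assert(offset >= 0);

    /* Reject the transfer if offset + length would exceed INT64_MAX,
     * that is, if the end offset would wrap from positive to negative.
     * The comparison is arranged so that it cannot itself overflow. */
    if ((int64_t)length > INT64_MAX - offset) {
        return XFER_INVALID;
    }
    return XFER_OK;
}
```

A dispatch routine would typically fail the request with an error status when such a check reports an invalid transfer, before it uses the offset in any arithmetic.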

TransmitFile

The Win32 TransmitFile API issues an IOCTL to the system afd.sys driver (AFD) to do fast file copies over the network. AFD provides support for the Windows Sockets API to communicate with underlying transports. During internal testing, Microsoft found several drivers that encountered problems when their handles were passed to the TransmitFile API. Some looped, completing read requests with a success status but with zero bytes read; others had cancellation problems.

The Device Path Exerciser, DevCtl, includes the /w option to test a driver by using TransmitFile. Microsoft recommends testing drivers for these problems.

StartIo Recursion

If many device requests are outstanding, calls to IoStartNextPacket or IoStartNextPacketByKey from a driver's StartIo routine can result in recursive calls back to the StartIo routine without unwinding the stack.

Drivers that call these routines from the StartIo routine should first call the IoSetStartIoAttributes routine, with the DeferredStartIo parameter set to TRUE. Doing so causes the I/O Manager to keep track of the nesting level of the calls, and dispatch to the StartIo routine only after the current StartIo call has returned.
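The effect of deferring the dispatch can be illustrated with a small user-mode model. The following sketch is not the I/O Manager's implementation; every name in it is hypothetical. It assumes a backlog of requests that each complete immediately, so that the model's start_io routine always asks for the next packet, and it records how deeply the calls nest with and without deferral.

```c
#include <assert.h>
#include <stdbool.h>

#define PENDING_REQUESTS 1000   /* hypothetical backlog size */

static int  pending;         /* requests still waiting in the queue */
static int  depth;           /* current nesting of start_io calls */
static int  max_depth;       /* deepest nesting observed */
static bool deferred;        /* models DeferredStartIo set to TRUE */
static bool dispatching;     /* a dispatch loop is already active */
static bool next_requested;  /* next packet was requested during start_io */

static void start_next_packet(void);

/* Models a StartIo routine whose request completes immediately. */
static void start_io(void)
{
    depth++;
    if (depth > max_depth) {
        max_depth = depth;
    }
    start_next_packet();
    depth--;
}

/* Models the dispatch of the next queued request. */
static void start_next_packet(void)
{
    if (deferred && dispatching) {
        /* Record the request and return, so that the current
         * start_io call unwinds before the next one begins. */
        next_requested = true;
        return;
    }
    dispatching = true;
    do {
        next_requested = false;
        if (pending == 0) {
            break;
        }
        pending--;
        start_io();
    } while (next_requested);
    dispatching = false;
}

/* Runs the model and returns the deepest nesting of start_io calls. */
int run_model(bool use_deferred)
{
    pending = PENDING_REQUESTS;
    depth = 0;
    max_depth = 0;
    deferred = use_deferred;
    dispatching = false;
    next_requested = false;
    start_next_packet();
    assert(pending == 0);   /* every request was dispatched */
    return max_depth;
}
```

In this model, the nesting depth without deferral grows with the number of outstanding requests, whereas with deferral each start_io call returns before the next one is dispatched.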

Passing and Completing IRPs

Drivers commonly have the following problems in passing and completing IRPs:

Copying stack locations incorrectly.
Returning incorrect status for an IRP that the driver does not handle.
Losing IRPs or completing them more than once.
Returning incorrect status for an IRP that the driver issues.

Copying Stack Locations Incorrectly

When passing an IRP down the stack, drivers should always use the standard functions IoSkipCurrentIrpStackLocation and IoCopyCurrentIrpStackLocationToNext. Do not write driver-specific code to copy the stack location. Using the standard routines ensures that the driver does not duplicate the IoCompletion routine of a driver layered above it.

For example, the following code can duplicate an IoCompletion routine and cause problems.

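// Incorrect: copies the entire stack location, including the
// IoCompletion routine and context set by the driver above.
currentStack = IoGetCurrentIrpStackLocation(Irp);
nextStack = IoGetNextIrpStackLocation(Irp);

RtlMoveMemory(nextStack,
              currentStack,
              sizeof(IO_STACK_LOCATION));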

Returning Incorrect Status for an IRP That the Driver Does Not Handle

A driver must not return STATUS_SUCCESS for an IRP that it does not handle.

For example, some drivers incorrectly return STATUS_SUCCESS for query IRPs, even though they do not support the required functionality. Doing so can easily crash or corrupt the system, particularly during operations like file name look-ups, if the I/O Manager or another component attempts to use data that was left uninitialized by the Dispatch routine.

Unless otherwise noted in the documentation for a specific IRP, a driver should return STATUS_NOT_SUPPORTED for any IRP it does not handle. Plug and Play drivers might also return STATUS_INVALID_DEVICE_REQUEST to indicate that the IRP is inappropriate for the device.

Losing IRPs or Completing Them More Than Once

IRPs that are lost or completed more than once, along with missing calls to I/O Manager routines such as IoStartNextPacket, often occur in error-handling paths. A "lost" IRP is one that the device has finished, but that the driver never completed by calling IoCompleteRequest and never passed to another driver.

Quick reviews of code paths can often find such problems. In addition, the DC2 and DevCtl tools can assist in finding lost IRPs. Both tools are provided in the Tools directory of the Windows DDK.

Returning Incorrect Status for an IRP That the Driver Issues

Unlike drivers to which an IRP is forwarded, the driver that issues an IRP must not propagate the SL_PENDING_RETURNED bit in its IoCompletion routine for that IRP. Doing so corrupts the memory pool following the IRP.

When a driver receives an IRP from another driver, its IoCompletion routine must propagate the SL_PENDING_RETURNED bit unless the routine returns STATUS_MORE_PROCESSING_REQUIRED for the IRP. Therefore, IoCompletion routines for IRPs that are forwarded from another driver typically include the following code.

if (Irp->PendingReturned) {
    IoMarkIrpPending(Irp);
}

The driver that issued the IRP, however, must not include this statement. The issuing driver is the final recipient of the IRP; further processing is not required. When the issuing driver's IoCompletion routine is called, the DeviceObject parameter is NULL and the I/O stack location points to the location immediately following the end of the IRP, causing corruption of the pool header for the next memory allocation.

Odd-length Unicode Buffers

Some I/O Manager APIs accept Unicode input buffers that contain an odd number of bytes. The optional file name in NtQueryDirectoryFile, and many queries using NtQueryInformationFile (such as FileNameInformation), are examples. Drivers should validate the lengths of these buffers on input, and must not assume that a buffer length is a multiple of the size of a Unicode character.
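A length test of this kind might look like the following portable C sketch. The function names and the max_bytes limit are hypothetical; a real driver would apply the same arithmetic to the length field of the buffer it receives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when byte_len describes a whole number of 16-bit Unicode
 * characters and does not exceed max_bytes (a hypothetical driver limit). */
bool unicode_length_is_valid(size_t byte_len, size_t max_bytes)
{
    if (byte_len % sizeof(uint16_t) != 0) {
        return false;   /* odd length: the buffer ends with a stray byte */
    }
    return byte_len <= max_bytes;
}

/* Converts a validated byte length to a count of 16-bit characters. */
size_t unicode_char_count(size_t byte_len)
{
    assert(byte_len % sizeof(uint16_t) == 0);   /* caller validates first */
    return byte_len / sizeof(uint16_t);
}
```

Validating the byte length before dividing by the character size avoids silently dropping the trailing byte, which could otherwise lead to an inconsistent length being used later.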

Pool Allocation in Low Memory

When the system is low on pool memory, calling ExAllocatePool with the pool type NonPagedPoolMustSucceed causes the system to crash. This can occur, for example, on a web server that sees frequent, short spikes in client load, each of which uses a great deal of pool memory and can temporarily fragment memory.

Drivers should not use this pool type. Instead, drivers should allocate nonpaged memory with the NonPagedPool or NonPagedPoolCacheAligned pool type and, if ExAllocatePool returns NULL, return the status STATUS_INSUFFICIENT_RESOURCES.

In addition, Microsoft Windows XP and Windows 2000 drivers must use MmGetSystemAddressForMdlSafe instead of MmGetSystemAddressForMdl. WDM drivers must use MmGetSystemAddressForMdl with the MDL_MAPPING_CAN_FAIL MDL flag, because MmGetSystemAddressForMdlSafe is not supported on Windows 98 and Windows Me.

For more information on pool allocation failures, see Low Pool Memory and Windows XP, available on the Microsoft website.

Call to Action and Resources

Call to Action:

Find and correct errors in existing drivers. Use the Driver Verifier, DC2, and DevCtl utilities in the Windows DDK.
Analyze code paths, particularly those involving locks, to uncover any problems described in this paper.
Always validate pointers obtained from user-mode callers.
Always check buffer sizes to prevent buffer overruns and underruns.

Resources:

Low Pool Memory and Windows XP
Driver Reliability and Security topics at Windows Hardware and Driver Central
