Timers

Implement a Continuously Updating, High-Resolution Time Provider for Windows

Johan Nilsson

Code download available at: HighResolutionTimer.exe (404 KB)

This article assumes you're familiar with C++ and the Win32 API

SUMMARY

The timestamps that you can obtain from Windows NT are limited to a maximum resolution of 10 or 15 milliseconds, depending on the underlying hardware. At times, such as when you need to timestamp frequent events, it is desirable to achieve a higher resolution. For example, what if you want to wake a thread or perform some other task at intervals more frequent than 10 milliseconds? Suggested ways of achieving better resolution include using performance counters in conjunction with the system time to calculate smaller time increments. But using performance counters is a technique that presents its own problems. This article shows one possible way of overcoming limitations inherent in this approach.

Contents

Who Needs This Stuff, Anyway?
An Initial Attempt
A Reliable Method for Synchronization
The Issue of Frequency
Protecting Against System Time Changes
Time Adjustments
The Time Provider
Performance Factors
Future Directions
Conclusion

Why would you be interested in obtaining the system time with a resolution below one millisecond? In the course of my work, I found it necessary to be able to determine the sequence of events originating from different threads of execution in my process. I also needed to relate these events to absolute time, but noticed that the actual resolution of the system time was no more granular than 10 milliseconds.

In the following article, I'll explain this system time resolution limit, the steps needed to overcome it, and some of the common pitfalls. A sample implementation is available for download at the link at the top of this article. The source code for these files was written and tested using Visual C++® 7.1 and Windows® XP Professional. During the course of this article, though, I will frequently refer to the Windows NT® family of operating systems (Windows NT 4.0, Windows 2000, or Windows XP) rather than to a specific version. For the parameter types and usage of the Win32® APIs referenced in the article, refer to the MSDN Library/Platform SDK documentation.

Who Needs This Stuff, Anyway?

When I recently did an Internet search for "Windows NT millisecond time resolution," I got well over 400 hits. The majority discussed either how to obtain the system time with a higher resolution than 10 milliseconds or how to cause a thread to sleep for less than 10 milliseconds. I will concentrate here on why it's difficult to obtain the system time with a resolution better than 10 milliseconds. You might think this would be easy with the GetSystemTime API, which returns a SYSTEMTIME structure containing a wMilliseconds field documented in MSDN as storing the current millisecond. But it isn't as simple as it seems. And what about GetSystemTimeAsFileTime with its 100-nanosecond resolution? Let's start out by performing a little experiment: try repeatedly retrieving the system time, formatting it, and outputting it to the screen (see Figure 1).

Figure 1 Retrieving and Outputting System Time

#include <windows.h>
#include <iostream>
#include <iomanip>

int main(int argc, char* argv[])
{
    SYSTEMTIME st;
    while (true)
    {
        ::GetSystemTime(&st);
        std::cout << std::setw(2) << st.wHour << ':' << std::setw(2)
                  << st.wMinute << ':' << std::setw(2) << st.wSecond << '.'
                  << std::setw(3) << st.wMilliseconds << '\n';
    }
    return 0;
}


I wasn't aiming for nanoseconds, but for the single-millisecond resolution, which should be possible judging from the SYSTEMTIME structure. Let's take a look at the output:

20:12:23.479 
20:12:23.479 
20:12:23.494 
20:12:23.494 
[...lots of these removed...] 
20:12:23.494 
20:12:23.509 
20:12:23.509 
20:12:23.509 
•••

As you can see, the best I could achieve was 15-millisecond resolution, which is the length of the Windows NT clock interval. At each elapsed clock interval, Windows NT will update the system time. The Windows NT scheduler will also kick in and possibly select a new thread for execution. For more information on this, take a look at Inside Windows 2000, Third Edition by David Solomon and Mark Russinovich (Microsoft Press®, 2000).

If you ran the code I just showed, you might have seen the time updating approximately every 10 milliseconds instead. If that's the case, it probably means that you are running Windows NT on a single-processor machine, where the clock interval is usually equivalent to 10 milliseconds. As you can see, the system time isn't updated frequently enough for this to be a useful technique. Now, let's try to find a solution.

An Initial Attempt

When you ask how to obtain the system time with a better resolution than 10 ms you'll probably get an answer like: use performance counters and synchronize the performance counter value to the instant the system time changes. Use these values in combination to calculate the current time with an extremely high resolution. Figure 2 shows an implementation of this.

Figure 2 An Initial Attempt

#include <windows.h>
#include <iostream>
#include <iomanip>

struct reference_point
{
    FILETIME file_time;
    LARGE_INTEGER counter;
};

void simplistic_synchronize(reference_point& ref_point)
{
    FILETIME ft0 = { 0, 0 }, ft1 = { 0, 0 };
    LARGE_INTEGER li;

    //
    // Spin waiting for a change in system time. Get the matching
    // performance counter value for that time.
    //
    ::GetSystemTimeAsFileTime(&ft0);
    do
    {
        ::GetSystemTimeAsFileTime(&ft1);
        ::QueryPerformanceCounter(&li);
    }
    while ((ft0.dwHighDateTime == ft1.dwHighDateTime) &&
           (ft0.dwLowDateTime == ft1.dwLowDateTime));

    ref_point.file_time = ft1;
    ref_point.counter = li;
}

void get_time(LARGE_INTEGER frequency, const reference_point& reference,
              FILETIME& current_time)
{
    LARGE_INTEGER li;
    ::QueryPerformanceCounter(&li);

    //
    // Calculate performance counter ticks elapsed
    //
    LARGE_INTEGER ticks_elapsed;
    ticks_elapsed.QuadPart = li.QuadPart - reference.counter.QuadPart;

    //
    // Translate to 100-nanosecond intervals (FILETIME resolution) and
    // add to the reference FILETIME to get the current FILETIME.
    //
    ULARGE_INTEGER filetime_ticks, filetime_ref_as_ul;
    filetime_ticks.QuadPart =
        (ULONGLONG)((((double)ticks_elapsed.QuadPart /
                      (double)frequency.QuadPart) * 10000000.0) + 0.5);
    filetime_ref_as_ul.HighPart = reference.file_time.dwHighDateTime;
    filetime_ref_as_ul.LowPart = reference.file_time.dwLowDateTime;
    filetime_ref_as_ul.QuadPart += filetime_ticks.QuadPart;

    //
    // Copy to result
    //
    current_time.dwHighDateTime = filetime_ref_as_ul.HighPart;
    current_time.dwLowDateTime = filetime_ref_as_ul.LowPart;
}

int main(int argc, char* argv[])
{
    reference_point ref_point;
    LARGE_INTEGER frequency;
    FILETIME file_time;
    SYSTEMTIME system_time;

    ::QueryPerformanceFrequency(&frequency);
    simplistic_synchronize(ref_point);

    while (true)
    {
        get_time(frequency, ref_point, file_time);
        ::FileTimeToSystemTime(&file_time, &system_time);
        std::cout << std::setw(2) << system_time.wHour << ':'
                  << std::setw(2) << system_time.wMinute << ':'
                  << std::setw(2) << system_time.wSecond << '.'
                  << std::setw(3) << system_time.wMilliseconds << '\n';
    }
    return 0;
}

The performance counter is a high-resolution hardware counter which can be used for measuring brief periods of time with high precision and low overhead. I synchronized the performance counter value to the corresponding system time by repeatedly spinning inside a tight loop, waiting for the system time to change. When the system time changed, I stored the counter value together with the system time.

Using these two values as a reference, it is possible to calculate the current system time with a high resolution (see get_time in Figure 2 for details). Take a look at the results:

••• 
21:23:22.296 
21:23:22.297 
21:23:22.297 
21:23:22.298 
21:23:22.298 
21:23:22.299 
21:23:22.300 
21:23:22.300 
21:23:22.301 
21:23:22.301 
21:23:22.302 
21:23:22.302 
21:23:22.303 
•••

Although it looks as though this was successful, there are several problems with this implementation: the synchronization implementation (there's a good reason the function is named "simplistic_synchronize"), the frequency reported by QueryPerformanceFrequency, and the lack of protection for changes in system time. Let's look at some possible improvements for all of these problems in the following sections.

A Reliable Method for Synchronization

The problem with the synchronization implementation is that it didn't take into account the preemptiveness of the Windows NT scheduler. There is no guarantee, for example, that a thread context switch won't take place between the following two lines of code, resulting in a delay of an unknown period of time:

::GetSystemTimeAsFileTime(&ft1); 
::QueryPerformanceCounter(&li);

The simplistic synchronization function will still succeed most of the time when executing under the following circumstances:

  • The current thread is not preempted by a higher priority thread getting into the ready state
  • The current thread's quantum does not reach its end
  • There are few hardware interrupts (other than the clock interrupt itself)

The most straightforward solution for this is to raise the process priority class to REALTIME_PRIORITY_CLASS and the thread priority to THREAD_PRIORITY_TIME_CRITICAL to prevent the thread from being preempted during synchronization. Unfortunately, there isn't much you can do about the hardware interrupts, but well-behaving drivers should handle their interrupts, queue a deferred procedure call (DPC) and even process the DPC within an order of microseconds. The problem is that you can't be sure that all drivers in your system are actually well behaved. In fact, even if you only have conforming drivers in your system, you might still have a lot of interrupts.
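One way to apply the priority boost only for the duration of the synchronization is a small save-and-restore guard. The following is a sketch; boost_guard and synchronized_section are illustrative names and are not part of the article's download:

```cpp
#include <cassert>
#include <functional>
#ifdef _WIN32
#include <windows.h>
#endif

// RAII guard: read the current value, set the boosted one, restore on exit.
template <typename T>
class boost_guard
{
public:
    boost_guard(std::function<T()> get, std::function<void(T)> set, T boosted)
        : set_(set), saved_(get())
    {
        set_(boosted);
    }
    ~boost_guard() { set_(saved_); }   // always restore the saved value
    T saved() const { return saved_; }
private:
    std::function<void(T)> set_;
    T saved_;
};

#ifdef _WIN32
// Boost process and thread priority only while synchronizing.
void synchronized_section()
{
    boost_guard<DWORD> process_boost(
        [] { return ::GetPriorityClass(::GetCurrentProcess()); },
        [](DWORD c) { ::SetPriorityClass(::GetCurrentProcess(), c); },
        (DWORD)REALTIME_PRIORITY_CLASS);
    boost_guard<int> thread_boost(
        [] { return ::GetThreadPriority(::GetCurrentThread()); },
        [](int p) { ::SetThreadPriority(::GetCurrentThread(), p); },
        THREAD_PRIORITY_TIME_CRITICAL);
    // ... run the tight synchronization loop here ...
}   // priorities restored here, even if the loop exits early
#endif
```

The guard makes it hard to forget the restore step, which matters with REALTIME_PRIORITY_CLASS: leaving a spinning thread at that priority can starve the rest of the system.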

Nevertheless, there is a reliable way of synchronizing without necessarily raising the process and thread priorities. Figure 3 shows a basic flowchart of the steps involved.

Figure 3 Reliable Synchronization


You need to repeatedly check to see if the system time has changed, just like the simplistic_synchronize implementation shown in Figure 2. The big difference from the previous implementation is that you now also use the performance counter itself to verify that you're keeping within the desired accuracy level. This sounds simple enough, but a closer examination of Figure 3 will show you that it isn't quite as simple as it seems. Some further explanation is needed, such as why you store the difference between the latest performance counter values in the variable prev_diff. The reason for this is that from the point the system time is stored in t1 to the point the counter value is stored in p1, the system time could potentially change without being detected until the next run through the inner loop.

Following that, you could erroneously assume that the time changed somewhere in between the latest two counter values, when in fact it did not. To safeguard against this, you should assume that the system time changed either between the latest or the previous two counter values (except for the unlikely event that it should change the very first time through the inner loop). At the end of the synchronization, an adjustment of the counter value is performed; this is to make sure that the returned value is within the desired accuracy. Figure 4 shows this process.

Figure 4 Counting


This method of synchronization needs quite a few iterations to complete, but in practice that hasn't proven to be a problem. For further information on synchronizations and their accuracy, you should take a look at the sidebar "Synchronization: How Good Can it Get?"
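The loop in Figure 3 can be approximated as follows. This is a simplified reading of the flowchart, not the download's actual implementation; within_limit and synchronize are illustrative names, and the bracketing logic conservatively assumes the time change fell in either of the last two counter spans:

```cpp
#include <cassert>

// True when the worst-case bracket around the observed time change maps
// to no more than limit_micros microseconds of counter time.
bool within_limit(long long prev_diff, long long diff,
                  long long frequency, long long limit_micros)
{
    // The change may have happened in either of the last two spans, so
    // the uncertainty is conservatively their sum.
    long long worst_ticks = prev_diff + diff;
    return worst_ticks * 1000000 <= limit_micros * frequency;
}

#ifdef _WIN32
#include <windows.h>
// Spin until the system time changes and the change is bracketed tightly
// enough by successive performance counter samples.
bool synchronize(FILETIME& ref_time, LARGE_INTEGER& ref_counter,
                 long long limit_micros)
{
    LARGE_INTEGER frequency, p0, p1;
    ::QueryPerformanceFrequency(&frequency);
    FILETIME t0, t1;
    ::GetSystemTimeAsFileTime(&t0);
    ::QueryPerformanceCounter(&p0);
    long long prev_diff = 0;
    for (long i = 0; i < 10000000; ++i)   // bounded spin
    {
        ::GetSystemTimeAsFileTime(&t1);
        ::QueryPerformanceCounter(&p1);
        long long diff = p1.QuadPart - p0.QuadPart;
        bool changed = t1.dwLowDateTime != t0.dwLowDateTime ||
                       t1.dwHighDateTime != t0.dwHighDateTime;
        if (changed &&
            within_limit(prev_diff, diff, frequency.QuadPart, limit_micros))
        {
            ref_time = t1;
            // Pull the counter back to the middle of the uncertainty span.
            ref_counter.QuadPart = p1.QuadPart - (prev_diff + diff) / 2;
            return true;
        }
        t0 = t1;
        p0 = p1;
        prev_diff = diff;
    }
    return false;   // requested accuracy not reached
}
#endif
```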

The Issue of Frequency

Although I've managed to get a good starting point, there are still some problems to be addressed. Suppose you perform this synchronization at some distinct point in time and, from then on, call get_time whenever you need a high-resolution time. As shown earlier, the frequency reported by QueryPerformanceFrequency is used for calculating the current system time with a high resolution. It's bound to happen that the times reported by get_time will deviate from the actual clock time by far more than the level of accuracy you've just obtained. This is because the performance counter is not intended to be used for measuring extended periods of time.

I ran a small test to find out how much of an effect this would have, using two microseconds as the acceptable synchronization limit. (I chose two microseconds because it's the best my dual PII 400MHz machine can get.) The results are shown in Figure 5.

Figure 5 Synchronization Test


As it turns out, my high-resolution clock had deviated from the real system time by a millisecond after only 110 seconds. A quick calculation indicates that this corresponds to an error in the reported performance counter frequency of roughly 9 parts per million (PPM). An error of 0.000009 might sound minor, but if you want to report sub-millisecond times this really has an impact. Initially, I considered two options. First, the user is responsible for periodical resynchronization, and thus must also determine how often this must be done. Second, the synchronization is performed by a background thread every nth second.
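Spelled out, the calculation behind that figure is just the accumulated drift divided by the elapsed time (one millisecond over 110 seconds):

```cpp
#include <cassert>
#include <cmath>

// Frequency error in parts per million: drift accumulated over elapsed time.
double frequency_error_ppm(double drift_seconds, double elapsed_seconds)
{
    return drift_seconds / elapsed_seconds * 1e6;
}
```

frequency_error_ppm(0.001, 110.0) evaluates to roughly 9.1 PPM, matching the figure quoted above.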

Even before testing the first option, I decided against it. The second option seemed more viable and the only problem I could foresee was that the required synchronization between the clients and the synchronization thread would incur some overhead. The ease of use still outweighs the increased complexity and overhead.

The sample implementation provided in the download uses a background thread to synchronize the performance counter to the system time. An illustration of how this implementation manages to keep itself close to the real system time is shown in Figure 6 (note that the vertical scale is now set to +/-100 microseconds).

Synchronization: How Good Can It Get?

Using the method of synchronization I describe in this article, you can specify the accuracy of the results you want. In reality, however, there is a platform-dependent (hardware and software) limit on the quality of the results you can achieve. The clock interrupt handler in Windows NT takes some time to execute, effectively limiting your precision to no better than the clock interrupt handler's execution time, plus the time spent in thread context switching, plus the execution times for the function calls needed to check when the time has changed. If you are running on a symmetric multiprocessing (SMP) machine, you can avoid clock interrupt problems by running the synchronization thread on another CPU.

Prohibiting the synchronization thread from running on the CPU that handles the clock interrupt can make an almost tenfold difference in the synchronization accuracy on an SMP machine. The only problem is that you first need to know which CPU actually handles the clock interrupt. I can only tell from my limited experience that it seems to be handled by CPU#0 (which makes some kind of weird sense, I suppose). Assuming that this is true, you can simply use the SetThreadAffinityMask API to remove CPU#0 from the thread's list of allowed processors. You should just make sure that the process is allowed to run on the other processors by checking the results from calling GetProcessAffinityMask beforehand.
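A hedged sketch of that affinity change, assuming the clock interrupt really is serviced by CPU#0 (mask_without_cpu and avoid_cpu0_for_current_thread are illustrative names):

```cpp
#include <cassert>
#include <cstdint>

// Clear one CPU from an affinity mask; a zero result means the process
// was only allowed to run on that CPU, so the change must not be applied.
std::uint64_t mask_without_cpu(std::uint64_t process_mask, unsigned cpu)
{
    return process_mask & ~(std::uint64_t(1) << cpu);
}

#ifdef _WIN32
#include <windows.h>
// Keep the current thread off CPU#0, on the assumption that CPU#0 is
// where the clock interrupt is handled.
bool avoid_cpu0_for_current_thread()
{
    DWORD_PTR process_mask, system_mask;
    if (!::GetProcessAffinityMask(::GetCurrentProcess(),
                                  &process_mask, &system_mask))
        return false;
    DWORD_PTR new_mask = (DWORD_PTR)mask_without_cpu(process_mask, 0);
    if (new_mask == 0)
        return false;   // CPU#0 is the only processor we may use
    return ::SetThreadAffinityMask(::GetCurrentThread(), new_mask) != 0;
}
#endif
```

Checking the process mask first is exactly the GetProcessAffinityMask precaution mentioned above: on a uniprocessor machine, or a process bound to CPU#0 alone, the change would otherwise leave the thread with no CPU to run on.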

Figure 6 Sample Synchronization


The graph in Figure 6 displays the high-resolution time deviation from system time over a period of some 13 minutes. The blue line shows the periodic resynchronizations being applied before the deviation reaches the allowed deviation from system time (in this case, 50 microseconds). It also shows that the time between the synchronizations increases after each execution. This is because the current implementation of the time provider adapts by measuring the error of the performance counter's reported frequency and continuously applies this to the internal calculation of the high-resolution times.

While the blue line shows data with a smoothing filter applied, the yellow line shows the raw measured deviation from the system time. The filtering is done in real time and this is the data actually used for determining the performance counter's real frequency and the high-resolution time deviation from the system time. For more details, see the source code in the download.

Protecting Against System Time Changes

There is also the problem of system time changes. Whenever one occurs, you must resynchronize immediately to make sure that the calculated times are correct. In Windows 2000 and Windows XP this isn't difficult since the system always broadcasts a WM_TIMECHANGE message to all top-level windows when the system time is set. Unfortunately, in Windows NT 4.0 and earlier this is not enforced, although the Platform SDK documentation does state the following: "An application should send this message to all top-level windows after changing the system time." Note that this sentence was qualified with "should," so you can't depend on everyone to do it.

To get some perspective on this problem, I should say that changing the system time isn't something being done ad hoc by just any application. To modify the system's time or related configuration, the privilege SE_SYSTEMTIME_NAME needs to be enabled. If the user does not have this privilege enabled, you can run the application under an administrative account, ask an administrator to install the application as a Windows NT service, or ask an administrator to grant the required privilege to the account used for running the app. For Windows NT 4.0, the best you can hope for is that a system administrator will not install or allow installation of an ill-behaved application (one that changes the system time without telling other applications about it).

So how do you actually handle the WM_TIMECHANGE message? Since you already have a thread that's being used for the periodic synchronization, the only thing you need to do is let the thread create an invisible top-level window and run a message loop in addition to the periodic synchronization.
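A minimal sketch of such a handler might look like this. kWmTimeChange mirrors the WM_TIMECHANGE value from the Windows headers, and g_resync_event is an illustrative global, not something from the download:

```cpp
#include <cassert>

// WM_TIMECHANGE as defined in the Windows headers.
const unsigned int kWmTimeChange = 0x001E;

bool triggers_resync(unsigned int message)
{
    return message == kWmTimeChange;
}

#ifdef _WIN32
#include <windows.h>
static HANDLE g_resync_event;   // created elsewhere with ::CreateEvent

// Window procedure for the invisible top-level window owned by the
// synchronization thread. Note that a message-only window would not do
// here, since it does not receive broadcast messages such as
// WM_TIMECHANGE.
LRESULT CALLBACK TimeWndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    if (triggers_resync(msg))
    {
        ::SetEvent(g_resync_event);   // wake the thread to resynchronize
        return 0;
    }
    return ::DefWindowProc(hwnd, msg, wp, lp);
}
#endif
```

The synchronization thread then waits on g_resync_event with a timeout equal to its periodic interval, so both the periodic and the event-driven resynchronization share one code path.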

Time Adjustments

There is also a problem related to how Windows NT maintains the system time. To help software such as Network Time Protocol (NTP) clients keep the time synchronized to an external source, Windows NT exposes the SetSystemTimeAdjustment API. This API takes two parameters: the time adjustment itself, in 100-nanosecond units, and a Boolean parameter that enables or disables time adjustment. When time adjustment is enabled, the system adds the value of the specified time adjustment at each clock interrupt. When disabled, the system adds the default time increment instead (which is the same as the clock interval discussed earlier in this article). For more details, see the Platform SDK documentation.

There are a couple of problems with this, though. The first is that enabling (and changing) the time adjustment alters your reference frequency—the flow of time. The second, which is a bigger problem, is that there is no notification sent by the system when the time adjustment is changed, enabled, or disabled. Changing the time adjustment even by the minimum possible increment (one 100-nanosecond unit) on a system with a default time increment of 156250 units causes a change in the reference frequency of 6.4 PPM (1/156250). Once again, this might not sound like much, but considering that you might want to keep within 50 microseconds from the system time, it could mean that you exceed that limit after a few seconds without resynchronization.

To lessen the impact of such adjustments, the time provider has to monitor the current time adjustment settings. With no help from the operating system itself, this is implemented by calling the SetSystemTimeAdjustment companion API GetSystemTimeAdjustment. By performing this check repeatedly at short enough intervals and adjusting the internal frequency as needed, you can avoid drifting too far from the system time.
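The frequency shift caused by an adjustment can be computed directly, and the current settings read back with GetSystemTimeAdjustment; read_adjustment is an illustrative wrapper, not from the download:

```cpp
#include <cassert>
#include <cmath>

// Reference-frequency shift, in PPM, caused by running with a time
// adjustment of `adjustment` against a default increment of `increment`
// (both in 100-nanosecond units).
double adjustment_shift_ppm(double adjustment, double increment)
{
    return (adjustment - increment) / increment * 1e6;
}

#ifdef _WIN32
#include <windows.h>
// Poll the current adjustment settings; any change since the last poll
// means the provider's internal frequency must be corrected.
bool read_adjustment(DWORD& adjustment, DWORD& increment, BOOL& disabled)
{
    return ::GetSystemTimeAdjustment(&adjustment, &increment,
                                     &disabled) != 0;
}
#endif
```

With the article's numbers, adjustment_shift_ppm(156251, 156250) gives the quoted 6.4 PPM.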

The Time Provider

Now that you have gained a better understanding of the problem domain, I'll briefly introduce the time_provider class available in the download. The time provider is implemented as a parameterized singleton, supplying clients with high-resolution, continuously updating time, as shown here:

template< typename counter_type,
          int KEEP_WITHIN_MICROS = 100,
          int SYNCHRONIZE_THREAD_PRIORITY = THREAD_PRIORITY_BELOW_NORMAL,
          int TUNING_LIMIT_PARTSPERBILLION = 100,
          int MAX_WAIT_MILLIS = 10000,
          int MIN_WAIT_MILLIS = 100 >
class time_provider

Using the provider to obtain the current time is similar to using the Windows API:

typedef hrt::time_provider<hrt::performance_counter> time_provider_t;
time_provider_t& provider = time_provider_t::instance();
SYSTEMTIME st;
provider.systemtime(&st);

Figure 7 provides explanations of the available template parameters, typedefs, and member functions of the time_provider class. You may find it strange that the different tuning parameters are specified as template parameters. From my point of view all of them are design parameters and can be determined from your application's requirements at compile time.

Figure 7 Time_provider Parameters and Members

Template Parameters

counter_type: Represents a high-resolution, high-frequency counter. It must provide the static members value and frequency, as well as the typedef value_type.
KEEP_WITHIN_MICROS: Defines the maximum number of microseconds that the time provider is allowed to drift away from the actual system time. It also affects how often the resynchronization thread will synchronize.
SYNCHRONIZE_THREAD_PRIORITY: Defines the priority to which the synchronization thread sets itself when executing the synchronization. This should not need to be altered unless your application executes continuously at a high priority. The default is THREAD_PRIORITY_BELOW_NORMAL, which doesn't disturb the standard execution of normal-to-high priority threads.
TUNING_LIMIT_PARTSPERBILLION: The current implementation of the time provider performs continuous measurements of the counter frequency. This frequency is maintained internally and allows for less frequent resynchronizations and more accurate timing. When the accuracy of the measured frequency reaches this threshold, further tuning is no longer performed (but periodic resynchronization remains active). The unit of this limit is the error rate of the calculated frequency; the default corresponds to 100 parts per billion.
MAX_WAIT_MILLIS: Defines the maximum allowed tuning interval in milliseconds, that is, the time to wait between checks of how far the high-precision time has drifted from the system time. The tuning interval is autonomously adjusted, but only up to this limit. This parameter should normally not be altered.
MIN_WAIT_MILLIS: Defines the minimum allowed tuning interval in milliseconds. See MAX_WAIT_MILLIS for more details.

Typedefs

raw_value_type: A type capable of storing a "raw" time stamp.

Member Functions

instance: Returns a reference to the only instance of this class.
systemtime: Returns the current system time as a SYSTEMTIME structure.
filetime: Returns the current system time as a FILETIME structure.
rawtime: Returns the current system time as a "raw" time stamp with minimum overhead. To transform this into an absolute time, use filetime_from_rawtime or systemtime_from_rawtime.
systemtime_from_rawtime: Transforms a "raw" timestamp into an absolute time, represented as a SYSTEMTIME structure.
filetime_from_rawtime: Transforms a "raw" timestamp into an absolute time, represented as a FILETIME structure.

The code in Figure 8 demonstrates using the time_provider class to collect raw time samples in a tight loop for later transformation and output. Another example of this can be found in the threads sample in the download (demonstrating the same idea in a multithreaded environment).

Figure 8 Using the time_provider Class

#include <hrt/performance_counter.hpp>
#include <hrt/time_provider.hpp>
#include <hrt/system_time.hpp>
#include <vector>
#include <iostream>
#include <iomanip>

using namespace hrt;

typedef time_provider<performance_counter> time_provider_type;
typedef time_provider_type::raw_value_type raw_time_type;
typedef std::vector<raw_time_type> raw_vector;

const int NUMBER_OF_SAMPLES = 1000;

int main(int argc, char* argv[])
{
    raw_vector samples;
    time_provider_type& provider = time_provider_type::instance();

    samples.reserve(NUMBER_OF_SAMPLES);
    for (int i = 0; i < NUMBER_OF_SAMPLES; ++i)
    {
        samples.push_back(provider.rawtime());
    }

    system_time st;
    for (raw_vector::iterator iter = samples.begin();
         iter != samples.end(); ++iter)
    {
        provider.systemtime_from_rawtime(*iter, st.pointer());
        std::cout << std::setfill('0') << std::setw(2) << st.hour() << ':'
                  << std::setw(2) << st.minute() << ':'
                  << std::setw(2) << st.second() << '.'
                  << std::setw(3) << st.millis() << '\n';
    }
    return 0;
}

Performance Factors

So how much overhead is involved in using the time_provider class for obtaining the system time? A certain amount of extra work can't be avoided when you have to calculate the time rather than just retrieving it. If you are really concerned about performance within certain critical sections of the code, use the technique shown in Figure 8 involving raw counter values. Using raw values allows you to postpone the transformation into system time such that no immediate additional overhead is imposed (besides the call to collect the counter value itself, which is, of course, hard to avoid).

The table in Figure 9 shows this more clearly, giving an estimate of the Win32 API performance relative to the time_provider counterparts. The numbers in the table are expressed as percentages relative to the execution time for GetSystemTime on a Windows XP SMP system (corresponding uniprocessor numbers are shown within parentheses).

Figure 9 Win32 Time Functions and Performance

Win32 API                  Execution Time   time_provider   Execution Time
GetSystemTimeAsFileTime    1.9% (~0%)       filetime        135% (900%)
GetSystemTime              100% (100%)      systemtime      234% (1001%)
QueryPerformanceCounter    55% (400%)       rawtime         57% (400%)

As I mentioned earlier in this article, the cost of calling QueryPerformanceCounter is not negligible, and this holds especially true for uniprocessor systems. The execution times for the calls using the performance counter APIs are generally much faster on a symmetric multiprocessing (SMP) system. This is because most SMP systems implement the performance counter using the Pentium time stamp counter (TSC) and have a relatively low call overhead compared to the uniprocessor implementation.

I was somewhat disappointed with the performance, even though no effort went into optimizing the calculations. To obtain better performance with the cost of dropping portability, you could try using another counter. The time_provider class is parameterized on the counter type to allow for the use of other high-resolution counters. There is an experimental tsc_counter class available in the download, enabling direct usage of the Pentium TSC. Initial testing of this class indicates much better performance than going through the performance counter API, even on an SMP machine. When running the same tests as in Figure 9, the tsc_counter edition of the time provider clocks in at 33 percent (filetime), 133 percent (systemtime) and 5.9 percent (rawtime).

Future Directions

There are a number of potential problems with the current implementation—not surprising considering the complexity of the problem. The code is not intended to be used as-is on any available system as problems might arise due to hardware compatibility issues such as power-saving, CPU speed stepping, and noncontinuous counters. If you find some way to make the provider more reliable even under those conditions, please let me know. You should know your hardware platform before you decide to use the code.

Wrappers for .NET and COM, allowing usage of the time provider from languages other than C++, are also a definite possibility. In fact, I've already got one implementation of the time provider running as a COM in-proc server.

Conclusion

If you now think you can obtain the system time with almost arbitrary precision, a slight warning: don't forget the preemptive nature of a multitasking system such as Windows NT. In the best case, the time stamp you get is off by only the time it takes to read the performance counter and transform that reading into an absolute time. In the worst case, the elapsed time could easily be on the order of tens of milliseconds.

Although this might indicate you went through all of this for nothing, rest assured that you didn't. Even executing the call to the Win32 API GetSystemTimeAsFileTime (or gettimeofday under Unix) is subject to the same conditions, so you are actually doing no worse than that. In a majority of the cases, you will have good results. Just don't perform anything requiring real-time predictability on the basis of time stamps in Windows NT.

For background information see:
Time Functions
Inside Windows 2000, Third Edition by David Solomon & Mark Russinovich (Microsoft Press, 2000)
Performance Counter Value May Unexpectedly Leap Forward

Johan Nilsson is a system engineer for the Swedish Space Corporation at Esrange, located above the Arctic Circle. He has been developing software using C++ for Windows NT since the release of Windows NT 4.0 and programming for Windows/DOS since Windows 3.1. He can be reached at johan.nilsson@esrange.ssc.se.