Best Practices: API Testing

Josh Poley

Microsoft Corporation

February 2008

Summary: In this article, the author describes proven testing practices that you might find useful in your own tests. (5 printed pages)

Contents

Introduction

Planning

Organization

Priorities

Development

Configuration

Validation

From Here On Out

Introduction

If you were to ask several testers how to write good application-programming interface (API) tests, you would get several different answers. Not only are people's styles and approaches different, but API sets often lend themselves to different methodologies.

That being said, my goal here is to describe some of the practices that have worked well for me while testing the Microsoft Xbox operating system. I hope that the information detailed here will prove useful in your own tests.

Planning

Before you jump into writing test code, you will want to spend some time up front to think about what you will test and how you will be performing the tests.

Depending on the complexity of the task, you might want to create a test-specification or design document and have it reviewed by your peers. This document will serve as a road map to your testing and, thus, make your tests easier to write while providing better test coverage.

Organization

How you organize your tests will have an influence on the productivity, effectiveness, and (ultimately) maintenance of your tests. It pays to perform some initial planning and spend the up-front cost to ensure that you have a clean foundation upon which to build. Use the tips that follow to fuel your thinking in this area.

All of the tests for a given API should be in a single file, and it makes sense to name the file after the API under test. This makes it easier for others to locate all of the test cases for a specific function. Some methods should be tested in pairs or groups, so there will be times when you cover multiple APIs in a single file.

In a comment at the top of the file, you should include the declaration of the APIs. This will allow you to reference the parameters and return type quickly. If there are nuances of the API (undocumented or otherwise), it is a good idea also to note them here.
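
For example, a test file for a Win32 API such as CreateFile might begin with a header comment along these lines (a sketch only; the exact layout is whatever your team's convention dictates):

// File: CreateFile.cpp
// Tests for the CreateFile API.
//
// HANDLE CreateFile(
//     LPCTSTR               lpFileName,
//     DWORD                 dwDesiredAccess,
//     DWORD                 dwShareMode,
//     LPSECURITY_ATTRIBUTES lpSecurityAttributes,
//     DWORD                 dwCreationDisposition,
//     DWORD                 dwFlagsAndAttributes,
//     HANDLE                hTemplateFile);
//
// Note: returns INVALID_HANDLE_VALUE (not NULL) on failure; call
// GetLastError() for the reason.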

Each test case should be as self-contained and isolated from dependencies as possible. Make it as easy as possible for a developer to copy out an individual test case and insert it into a stand-alone application or into the developer's personal unit-test framework. This also implies that test cases should be as simple as possible: break individual cases down to the lowest level, and don't write a bunch of custom framework code to wrap or group the APIs that are under test.

Test cases should be grouped by the test category (straight-line cases, boundary cases, null inputs, and so forth). This makes it easier for readers and maintainers to scan for a specific test case. Also, it provides for a logical and understandable means to segment the tests. If you have a large collection of tests, it can be confusing to have all of the tests listed one right after the other. It is beneficial to group them visually in the code by using comment blocks to break them apart and signify the current test category.
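
A minimal sketch of this kind of visual grouping (the category names and test-case names are only illustrations):

//=====================================================================
// Straight-line cases
//=====================================================================

void Test_CreateFile_BasicReadWrite();
void Test_CreateFile_OpenExisting();

//=====================================================================
// Boundary cases
//=====================================================================

void Test_CreateFile_MaxPathLength();

//=====================================================================
// Null and invalid inputs
//=====================================================================

void Test_CreateFile_NullFileName();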

The order of the test cases should match the test plan (test specification), too. This makes it easier to ensure that all of the cases that are documented in the specification are, in fact, represented in the test code. Alternately, it allows the tester to ensure that the two are kept in synch. All too often, the test plan is ignored after it is initially written; by keeping the order the same, it is easier to synch back and forth between them and keep both up to date.

Try to avoid "test chaining" in your development. A chain is where test N does not perform any cleanup, because test N+1 relies on the result of test N, typically as a setup step. This practice increases the maintenance cost and can lead to a fragile test framework. If a test case must perform some setup actions, have that test perform the work itself, even if it duplicates work that was performed in a previous test case. Where it is easy to do, run your tests in different orders, or even run some tests multiple times; doing so can uncover additional bugs, but test chains will prevent you from doing this effectively.
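
As an illustration of keeping cases self-contained (the test name is hypothetical; the file APIs are standard Win32), the case below creates and deletes its own file instead of relying on one left behind by an earlier case:

#include <windows.h>

void Test_WriteFile_SmallWrite()
{
    // Setup is duplicated on purpose rather than reusing a file from a
    // previous test case.
    HANDLE h = CreateFileA("writefile_small.tmp", GENERIC_READ | GENERIC_WRITE,
                           0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return;   // blocked; log and bail in a real test

    DWORD written = 0;
    WriteFile(h, "abc", 3, &written, NULL);

    // Cleanup: leave nothing behind for later cases to depend on.
    CloseHandle(h);
    DeleteFileA("writefile_small.tmp");
}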

Priorities

When you first start to develop your test cases, it is fairly common to jump right into some of the invalid or "evil" tests that will (I hope) cause crashes and other interesting problems. Unfortunately, this is not necessarily the best use of a tester's time. It is actually better to focus on the mainline cases first.

Ask yourself, "What are the test cases that must work, and work well?" These are the tests upon which you should focus first. A basic guideline is to identify the most common parameters and conditions that an end developer will use when calling the API, and test these scenarios extensively.

Development

The standard for your test code should be as high as that of the product code. As such, be sure to follow your team's coding convention or style guideline for source code. If your team doesn't currently have one, actively work with your development counterparts to create a comprehensive guideline. Having all of the tests follow the same convention is important for maintenance and readability, and is especially helpful if it is the same guideline as the main product code. Working with the development team here provides a good avenue to collaborate on improvement of the overall quality of the software.

Because your tests likely will be inherited by someone else, be sure to write clean, well-documented code. If it is not immediately clear what the test code is attempting to accomplish or verify, it needs more comments. As new test cases are added to verify explicit bugs, be sure to add a reference to the bug number that documents the error. If necessary, add comments to describe what the actual fixes were in the product code; this will make it very clear what the new tests must cover.

The quality of your test code can affect the quality of the product directly. If there are bugs in your test code, you might be testing the product incorrectly and missing important issues. When you compile your test code, convert all warnings to errors and use the most aggressive warning level possible (for example, Microsoft Visual Studio's /W4 and /WX, or GCC's -W (and family) and -Werror compiler switches). In addition, if you have the tools available, static code-analysis applications also can help find common development problems.

Depending on your test, it can be very beneficial to store state information in easily accessible (via the debugger) variables. If your test breaks during stress, you will want to investigate the computer and look at these variables to glean important information quickly without having to dig very deeply. Having previous state information handy is a good idea too, especially if the test case relies on a sequence of events that deal with arbitrary data. That way, when a test fails at a later point, you can still discover some information about the events that transpired in the past.
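
A rough sketch of what this could look like for a stress test (the structure and field names are assumptions, not a prescribed layout):

#include <windows.h>

// Globals are cheap to inspect from the debugger when the test breaks
// under stress.
struct TestState
{
    DWORD iteration;               // loop iteration that was in flight
    DWORD lastError;               // last GetLastError() value observed
    char  lastInput[MAX_PATH];     // input handed to the API under test
};

TestState g_current  = {};
TestState g_previous = {};         // one step of history

void RecordState(DWORD iteration, const char* input)
{
    g_previous = g_current;        // keep the prior iteration around
    g_current.iteration = iteration;
    g_current.lastError = GetLastError();
    lstrcpynA(g_current.lastInput, input, MAX_PATH);
}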

Configuration

Changing the run-time behavior of your tests is a fairly common practice. The following are some pointers for you to keep in mind when you design your tests:

At the start of your test, log any system settings and test parameters that pertain to the run-time environment. From a log file, you should be able to recreate the test's exact configuration and parameters; you never want to guess how the environment was set up when an error occurs. If each test case performs its own configuration setup/initialization, be sure that you are actually testing real-world scenarios; your setup could be placing the test in an unrealistic configuration.

If you have a concept of build-verification tests (BVTs), regression tests, stress tests, and so forth, it is best to have all of the tests for a given API located in the same source file/project, and use a configuration setting to define which set of tests to run. The BVT tests are often just a few key test cases, whereas regression tests and stress tests will encompass a much more extensive set of scenarios.
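
One way to sketch this (the TestSet values and test-case names are hypothetical) is a single run-time switch that decides how deep the run goes:

enum TestSet { TESTSET_BVT, TESTSET_REGRESSION, TESTSET_STRESS };

// Hypothetical test cases; only the selection logic matters here.
void Test_CreateFile_BasicReadWrite();
void Test_CreateFile_OpenExisting();
void Test_CreateFile_MaxPathLength();
void Test_CreateFile_NullFileName();

void RunCreateFileTests(TestSet set)
{
    // BVTs: only the few key cases.
    Test_CreateFile_BasicReadWrite();
    Test_CreateFile_OpenExisting();
    if (set == TESTSET_BVT)
        return;

    // Regression and stress runs pick up the extended scenarios.
    Test_CreateFile_MaxPathLength();
    Test_CreateFile_NullFileName();

    if (set == TESTSET_STRESS)
    {
        // Stress: loop the key scenarios, vary inputs, run longer, etc.
    }
}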

To aid in tracking down resource leaks, you should make it easy to configure your test project to run a specific set of test cases in a tight loop. If it is not easy to run your tests in this manner, they won't be run in this manner, leaving your components open to leaks. As a bonus, this makes it straightforward to isolate an observed leak.
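
For example, the loop count and a case filter might be nothing more than two configurable values (the names and the RunFilteredCases hook are hypothetical):

// Hypothetical knobs, typically read from the command line or a config file.
int         g_loopCount  = 1;     // for example, 10000 when hunting a leak
const char* g_caseFilter = "";    // run only cases whose name contains this

void RunFilteredCases(const char* filter);   // assumed framework hook

void RunLeakLoop()
{
    for (int i = 0; i < g_loopCount; i++)
    {
        // Run the filtered case(s) back to back; watch memory counters
        // between iterations for steady growth.
        RunFilteredCases(g_caseFilter);
    }
}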

Randomized inputs can help you reach a wide variety of states easily, but make sure that your test is smart about the limits, types, and formats of the inputs. It is also very important that the seed (which you must log) be configurable, to allow reproducibility when an error is encountered. Also, be sure to log the exact data that is passed into the API, preferably in a format that makes it easy to create a stand-alone test case with the exact inputs/states that are used.
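
A minimal sketch of seed handling, assuming plain C run-time randomness and printf-style logging (the function names are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// The seed is configurable so that a failing run can be reproduced exactly.
unsigned int g_seed = 0;   // 0 means "pick one for me"

void InitRandomInputs(unsigned int seedFromConfig)
{
    g_seed = seedFromConfig ? seedFromConfig : (unsigned int)time(NULL);
    printf("Random seed: %u\n", g_seed);          // always log the seed
    srand(g_seed);
}

unsigned int NextFileSize()
{
    // Stay within sane limits for the API under test.
    unsigned int size = (unsigned int)(rand() % (64 * 1024));
    printf("Input file size: %u\n", size);        // log the exact input
    return size;
}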

Validation

Typically, validating the test's behavior is considered rather straightforward. Unfortunately, however, lack of attention here often can lead to false positives/negatives or non-actionable bugs. When approaching each test case, make the initial assumption that it currently fails, and then prove through testing that it works or doesn't work correctly. To do so, you will want to verify (and then log) a good set of detailed conditions.

For starters, whenever you encounter a failure condition, be sure to log out everything that you require to investigate the failure and file a meaningful bug. For example, the following sample provides much more information than just logging out a basic "test failed" string:

LogFailure("Unexpected result. Received %lu, expected %lu", GetLastError(), result);

A good logging methodology will let you provide a rich collection of useful state information. Consider adding some of the following items to your log output:

  • Write out the file name and line number of the test's source code where the error occurred. This allows users to jump easily to the exact location in the test code where an error was first identified. The compiler's __FILE__ and __LINE__ macros make this very easy (see the sketch after this list).
  • Dump out any resource-utilization metrics that make sense for your tests. Memory usage is a usual suspect, but CPU utilization, network bandwidth, and disk-drive IO might be useful in the context of your tests.
  • Use the logging infrastructure to record performance timing for key scenarios, which can later be analyzed over a period of time to track trends.
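
For the file-name/line-number item above, a minimal sketch of a failure-logging macro might look like the following (printf stands in for whatever logging back end you actually use):

#include <stdio.h>

// Stamps every failure with the source location where it was detected.
#define LOG_FAILURE(fmt, ...) \
    printf("FAIL %s(%d): " fmt "\n", __FILE__, __LINE__, __VA_ARGS__)

// Usage, mirroring the earlier example:
//   LOG_FAILURE("Unexpected result. Received %lu, expected %lu",
//               GetLastError(), result);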

Most tests follow a typical "setup, test, cleanup" pattern. Every step in the setup phase should be validated with as much enthusiasm as the actual test itself. If a precondition fails, you should record the error (typically, as a "this test is blocked" message) and exit from the test case. Failing to pay attention to initial failures can lead to false negatives. For example, if you expect your API under test to fail, it might be failing for the wrong reason if a precondition is not set up correctly.
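
Here is a sketch of that pattern against a Win32 file API (the test name and the blocked/fail messages are just illustrations):

#include <windows.h>
#include <stdio.h>

bool Test_CreateFile_MissingFile()
{
    // Setup: make sure the file really is absent, and validate that step.
    if (!DeleteFileA("missing.tmp") && GetLastError() != ERROR_FILE_NOT_FOUND)
    {
        printf("BLOCKED: could not remove missing.tmp (error %lu)\n",
               GetLastError());
        return false;   // blocked, not failed
    }

    // Test: the API should now fail, and fail for the *right* reason.
    HANDLE h = CreateFileA("missing.tmp", GENERIC_READ, 0, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h != INVALID_HANDLE_VALUE || GetLastError() != ERROR_FILE_NOT_FOUND)
    {
        printf("FAIL: expected ERROR_FILE_NOT_FOUND, got %lu\n", GetLastError());
        if (h != INVALID_HANDLE_VALUE)
            CloseHandle(h);   // clean up even on failure
        return false;
    }
    return true;
}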

You should run your tests periodically in an environment/configuration in which you expect them to fail. This is a good way to identify tests that provide false positives. For example, if a test is attempting to access a known resource (file path, registry setting, memory location, and so forth), run the test in an environment in which the resource does not exist (or the test does not have permission to access it). If a test case "passes" in this type of setup, it might be making false assumptions and hiding valid bugs.

Testing for assertions and crashes in the API is something that should be done periodically. However, these test cases should be coded up and configured so that they do not run by default; you don't want to halt the run or cause downstream errors when you run a series of automated tests.

If an API has out parameters, set the initial variable's value to a hard-coded but nonstandard pattern. This will enable you to check easily whether or not the API modified the out parameter. Obviously, it will depend on the specified behavior; however, for every test case, you should check to see if the value was altered, and then verify if it was set to the correct value.
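
For example (GetDiskFreeSpaceExA here is just a convenient stand-in for your API under test):

#include <windows.h>
#include <stdio.h>

void Test_GetDiskFreeSpaceEx_OutParam()
{
    // Prefill the out parameter with a recognizable, nonstandard pattern.
    ULARGE_INTEGER freeBytes;
    freeBytes.QuadPart = 0xBAADF00DBAADF00DULL;

    BOOL ok = GetDiskFreeSpaceExA("C:\\", &freeBytes, NULL, NULL);

    // On success the pattern must be gone; on failure it should be intact
    // (or whatever the documented contract says).
    if (ok && freeBytes.QuadPart == 0xBAADF00DBAADF00DULL)
        printf("FAIL: API reported success but never wrote the out parameter\n");
}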

Be sure to check also for over- and underflows of buffers and memory locations that are modified by the API under test. You can create a larger buffer than the API will be modifying, and place known values before and after the expected modified region. Then, verify that these values are not altered. Alternately, you can use read-only memory pages to trap edits that extend beyond the expected range.
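
A sketch of the guard-byte approach, using GetWindowsDirectoryA as the stand-in API:

#include <windows.h>
#include <string.h>
#include <stdio.h>

void Test_GetWindowsDirectory_GuardBytes()
{
    // Oversized buffer: 16 guard bytes on each side of the region that
    // the API is allowed to touch.
    unsigned char raw[16 + MAX_PATH + 16];
    memset(raw, 0xCD, sizeof(raw));                   // known fill pattern
    char* target = (char*)(raw + 16);                 // region handed to the API

    GetWindowsDirectoryA(target, MAX_PATH);

    // The guard regions must still hold the original pattern.
    for (int i = 0; i < 16; i++)
    {
        if (raw[i] != 0xCD || raw[16 + MAX_PATH + i] != 0xCD)
        {
            printf("FAIL: API wrote outside its buffer (guard offset %d)\n", i);
            break;
        }
    }
}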

Read-only memory is also good for ensuring that APIs do not alter memory that they aren't supposed to alter. An input buffer that is marked as "const" in C/C++ is not guaranteed to be preserved; const is a compile-time check, not a run-time enforcement. By marking memory as read-only before passing it into an API, you can quickly discover conditions in which the code is misbehaving.
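
A sketch of the read-only-page technique on Windows (GetFileAttributesA is just the stand-in for the API under test):

#include <windows.h>
#include <string.h>
#include <stdio.h>

void Test_ConstInputIsNotModified()
{
    // Place the "const" input on its own read-only page; any write by the
    // API under test raises an access violation right at the bad write.
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    char* page = (char*)VirtualAlloc(NULL, si.dwPageSize,
                                     MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (page == NULL)
    {
        printf("BLOCKED: VirtualAlloc failed (error %lu)\n", GetLastError());
        return;
    }

    strcpy_s(page, si.dwPageSize, "input.txt");

    DWORD oldProtect;
    VirtualProtect(page, si.dwPageSize, PAGE_READONLY, &oldProtect);

    // Pass the read-only buffer as the input; a misbehaving implementation
    // that scribbles on its input faults here instead of much later.
    GetFileAttributesA(page);

    VirtualFree(page, 0, MEM_RELEASE);
}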

From Here On Out

Be very careful about when you run tests that have adverse side effects. For example, calling CloseHandle() twice on one file handle is a good thing to test, but only in an isolated environment; running such code in conjunction with other tests could be disastrous. If another test is issued the same handle value immediately after the first close (as it is released back into the pool), the second call to CloseHandle() will invalidate that test's handle and cause it to start acting on bad data.

Make it easy for the developers to run your tests. Provide all of the necessary scripts and configuration files, so that running the tests takes nothing more than a double-click. If it is super easy for the developer to run your tests, that person will be more likely to do so before checking in changes.

Now that you have coded up your tests, schedule a full code review and get feedback from your peers. If your test code itself is buggy, it will be harder to find bugs in the product code. Especially watch out for cut-and-paste errors, as test cases often follow a similar pattern from one case to the next.

About the author

Josh Poley has been a tester at Microsoft since 1998. He initially worked on the very first version of the Passport authentication service (currently called Windows Live ID). In the spring of 2000, Josh moved over and joined a small handful of people who were starting to work on a project code-named Xbox. His initial responsibilities covered various pieces of the low-level operating system (file systems, peripheral communication, and so on). Shortly after the Xbox game console launched in 2001, Josh took over as lead of the Kernel Test Team, and he remained in charge of validating the core operating system throughout the development and launch of the Xbox 360. In the spring of 2007, Josh joined the Zune team, where he currently works on new media devices.