.NET Code Tuning

Make Your Apps Fly with the New Enterprise Performance Tool

John Robbins

This article is based on a prerelease version of Visual Studio 2005. All information contained herein is subject to change.

This article discusses:

  • The inner workings of profilers
  • Flexible features of the EPT
  • A sample app to profile
This article uses the following technologies:
.NET, Visual Studio, C#

Code download available at: EnterprisePerformance.exe (258 KB)

Contents

The Philosophy of Profilers
The Philosophy of the Enterprise Performance Tool
Animated Algorithm
Getting Started with the EPT
Viewing Profiler Data

Fast code is still in vogue. Even though I'm typing this on a machine that has the power and memory to simultaneously control a nuclear power plant, a Mars rover mission, and the air traffic over the American West, and still have plenty of horses left to work on SETI packets in the quest for extraterrestrials, it doesn't mean that developers no longer have to worry about the speed and efficiency of their code. In the old days of Win32® native development, we worried not only about speed but also about those nasty access violations (and General Protection Faults and Unrecoverable Application Errors for you old-timers) on the PC platform. While managed code has eliminated some of those worries, the performance issues you do experience may be a little more insidious than before. The main reason is that with managed code, we don't have the easy views into the runtime that we had in the native days.

Many times I have been working with a client, wondering how I'm going to tackle a vicious performance problem. Of course, none of these performance problems show up on any testing systems; they only show up in real-world production. Since the common language runtime (CLR) is a black box, it's pretty hard to divine what's going on if I want to begin to figure out ways to duplicate the performance problem on a test system. While there are third-party commercial performance tools on the market, most of those tools are too intrusive to even consider using on a production system. That's why I was thrilled to see that Microsoft will be delivering a brand new profiler, the Enterprise Performance Tool (EPT), as part of Visual Studio® 2005 Team Developer Edition. It's the first profiling system that I can really consider using on a production system because it offers some very lightweight means of collecting your performance data. Having led the development of a best-selling commercial profiler, I can appreciate just how hard it is to collect useful profiling data without much overhead.

In this article, I want to introduce the philosophy of EPT and show you how to get started using it. Because of the complexity of a profiler, in a future issue I will discuss how you can use EPT to track down real-world performance problems you may encounter in your co-worker's code (I know your code is perfect!). Please keep in mind that EPT is in beta (I'm using the Burton Beta 1 Refresh bits, version 40607.83) and there may be changes to the UI or to some particular steps before the product ships. Before I jump into EPT itself, I want to spend a little time talking about how profilers work in general so you can better understand what makes the Enterprise Performance Tool so special.

The Philosophy of Profilers

When you are writing a profiler, you can choose from two ways of collecting data: probing and sampling. Both are perfectly valid, but each has its trade-offs. A probe profiler collects data by inserting probes or hooks into the application so the profiler runtime is called whenever the program executes that hook. To place the probes, the profiler needs to instrument the app in the compilation step, rewrite the compiled binary, or instrument the application on the fly. To see an example of a probe profiler approach for .NET-based applications, check out Aleksandr Mikunov's phenomenal article, "Rewrite MSIL Code On the Fly with the .NET Framework Profiling API" from the September 2003 issue of MSDN® Magazine. When I get around to the discussion of EPT itself, you'll see it uses the term "instrument" to indicate the probe approach.

The key benefit of the probe profiler approach is that the inserted probes will always be called when the application executes. This way the profiler runtime has a complete picture of the run, so key information such as parent-child relationships between functions is guaranteed to be correct, and the profiler can report the perfect call trees necessary for you to easily find the call paths that took the longest time. Probe profilers aren't limited to inserting probes at function entry and exit, either; additional probes can be placed down at the source-line level so you have a complete picture of the function.
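Conceptually, the entry and exit probes turn a method into something like the following sketch. ProfilerRuntime here is a hypothetical stand-in for the profiler's injected runtime, not an actual EPT API, and Square is just an illustrative method:

```csharp
using System;
using System.Collections.Generic;

// A made-up stand-in for the profiler's injected runtime. A real probe
// profiler would record timestamps and thread IDs here; this sketch just
// records the event stream so you can see the shape of the data.
static class ProfilerRuntime
{
    public static readonly List<string> Events = new List<string>();
    public static void Enter(string method) { Events.Add("enter:" + method); }
    public static void Leave(string method) { Events.Add("leave:" + method); }
}

static class Instrumented
{
    // The original method body gets wrapped in entry and exit probes; the
    // try/finally guarantees the exit probe fires even if the body throws,
    // which is how the profiler keeps its call tree consistent.
    public static int Square(int x)
    {
        ProfilerRuntime.Enter("Square");
        try
        {
            return x * x;
        }
        finally
        {
            ProfilerRuntime.Leave("Square");
        }
    }
}
```

A real instrumenting profiler injects the equivalent probes by rewriting the binary rather than asking you to edit source, but the resulting paired enter/leave event stream is the same, which is why probe profilers can reconstruct perfect call trees.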

However, some drawbacks come with the detailed view provided by a probe profiler. The first is that the instrumentation scheme can be cumbersome to use, and since it's rewriting at the binary level, there are plenty of opportunities to introduce errors. As you can imagine, those probes also take up space, resulting in some code bloat and slower performance. With a fully instrumented application, probe profilers can cause so much of a slowdown that it is almost impossible to run the instrumented binary on a production system, so you truly can't take advantage of the profiler when you need it most.

Sampling profilers, as the name implies, take a snapshot of what's executing in the application at predefined intervals. Most developers don't realize that Microsoft has always shipped a sampling profiler with their development environments. It's called the debugger! If you start debugging your application and every 30 seconds or so break into the debugger, you can note where the application threads are executing to get a pretty good idea of what your app is doing over a run. I've solved many production performance problems by manually doing the work of a sampling profiler.

What makes sampling profilers so valuable is that they have far less overhead than probe profilers. That means you stand a much greater chance of using them on a production system without grinding the server to a halt. The problem with sampling profilers is that it's entirely possible that the samples taken of your application could show none of your own code at all. For example, if you have a database-intensive application, all the samples could be from inside the database access assemblies. Finally, traditional sampling profilers that only grab the currently executing instruction for each thread make it hard to determine the parent-child relationships between methods, so finding the worst-performing code path is much more difficult.

The Philosophy of the Enterprise Performance Tool

With an understanding of how profilers operate, I can discuss the tack taken by EPT. Simply put: it's both a sampling and a probe profiler (which Microsoft calls instrumentation). The idea is that you'll start looking at your application performance with the sampling profiler to get the general performance characteristics so you can start focusing on the hotspots in your application. Once you have an idea of the assemblies that have some issues, you can turn to instrumented profiling to look at the specific problem areas in order to fix them. Of course, if you're doing unit performance testing, there's nothing stopping you from jumping right to instrumenting just the particular modules to look at their performance in a focused scenario.

Part of what makes the EPT sampling profiler interesting is that there are numerous events you can use to trigger a sample. The default sampling trigger is clock cycles, and it's probably the one you'll use almost all the time. The default is to take a sample every one million clock cycles, but you can change that number to any value you'd like, though the smaller you make it, the more overhead EPT will incur. For your production servers, you can set the number to something quite high, like five million, to keep the overhead reasonable without completely destroying usability in the process. As you would expect, sampling every five million clock cycles means you'll need to run your application for quite a while to get a good distribution of samples across your hot spots.
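To put those intervals in perspective, a quick back-of-the-envelope calculation shows the sample rate a given setting produces. The 2 GHz clock speed used below is an assumption for illustration, not anything EPT mandates:

```csharp
using System;

// Back-of-the-envelope math for picking a sampling interval: how many
// samples per second a given clock speed and cycles-per-sample setting
// produce. The overhead of the profiler scales with this rate.
static class SamplingMath
{
    public static long SamplesPerSecond(long clockHz, long cyclesPerSample)
    {
        return clockHz / cyclesPerSample;
    }
}
```

On the assumed 2 GHz CPU, the default one-million-cycle interval works out to SamplesPerSecond(2000000000, 1000000), or 2,000 samples a second, while the production-friendly five-million-cycle setting drops the rate to 400, which is exactly why longer runs are needed to get a good distribution.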

If you have applications that are using lots of memory, you can choose to have the EPT sampling profiler trigger on page faults instead. That way you'll get your performance snaps when data is being swapped out of RAM, and you can see who's doing the pushing. If your initial profiler runs indicate that you are spending a great deal of time in areas outside your code, you can tell the profiler to do its sampling based on system calls instead. If you are profiling a heavily multithreaded application, this sampling statistic will snap the data when you are transitioning from user mode to kernel mode, which indicates that you have threads that are potentially blocking on kernel objects more than necessary. The final values you can use for the sampling trigger are the various performance counters supported by the CPU, such as branch counts or cache misses. This is an advanced option that very few of you will really need, but if you do need that data, it's nice to know it's there.

Those busy Redmontonians also solved the call stack problem, one of the biggest problems standing in the way of a useful sampling profiler. As I mentioned earlier, most sampling profilers simply snap the currently executing instruction when taking a sample. Microsoft figured out how to incorporate a superfast stack walk into the sampling profiler portion so that you'll get the benefits of the sample with the benefits of knowing how you got there at the time of that sample. That makes correlating those snaps back to your code infinitely easier.

Before I discuss an application you can profile, I want to mention a few things that you'll hopefully find interesting. The first is that if you think that Microsoft has started from scratch on the performance tool, you're only partially right. Inside Microsoft, development teams have been using the predecessors of EPT, called Call Attribute Profiler (CAP), which used instrumentation, and the Low Overhead Profiler (LOP), a sampling profiler. Since Microsoft developed these tools to collect performance information on applications such as the operating system and the entire Office suite, they shouldn't even break a sweat with your application. Having used the predecessors of EPT, I can tell you how much easier the public versions are to use. And they have some extremely interesting features, which I'll discuss shortly.

The second item of interest has to do with the technologies supported by EPT. While some of you may be thinking that you won't be able to use EPT with your Win32 application or native code since Microsoft is very focused on the .NET Framework, the EPT team has in fact committed to supporting all Win32 native applications along with .NET code. That means that you have full sampling and instrumentation support no matter what technology you use, be it ASP.NET, Windows® Forms, MFC, or Win32 Services. You'll see that from Visual Studio .NET there's no difference in using EPT across technologies.

The actual EPT setup is trivial; simply select Enterprise Performance Tools from the Enterprise Tools tree control in the Visual Studio .NET setup. Of course, since you know that EPT is still a beta product, your immediate reaction might be to run Virtual PC and safely contain everything there. However, in order to do sampling profiling, EPT uses a kernel-mode device driver to respond to the CPU performance counter interrupts, and unfortunately Virtual PC implements neither the performance counters nor an Advanced Programmable Interrupt Controller (APIC), both of which are necessary for the kernel device driver to do its magic. The good news is that if you don't have an extra machine on which to install EPT, you're not totally out of luck because the instrumentation profiler still works. Then again, maybe that's a good excuse to get your company to buy you another machine.

Animated Algorithm

To learn about any tool, you need a decent sample application that will allow you to best utilize that tool. At this point in the beta cycle, EPT does not come with any samples, but I had a perfect profiler sample already on my hard disk. A while ago, I was trying to figure out how to use multithreading in a Windows Forms application, so I wrote a great little program called Animated Algorithm that animated numerous sorting algorithms in real time. Figure 1 shows my sample application ready to sort.

Figure 1 Animated Algorithm in Action


Animated Algorithm lets you choose from among 15 different sorting algorithms in the combobox on the form. The Options menu lets you choose the sleep time between each element swap or set so you can slow down the graphical update.

I wrote Animated Algorithm a while ago using the Microsoft® .NET Framework version 1.1 so you won't find any fancy generics or the new BackgroundWorker item in the code. The sorting algorithms in the NSORT assembly came from an outstanding article posted on The Code Project (see Sorting Algorithms In C#) by Jonathan de Halleux, Marc Clifton, and Robert Rohde, which encapsulates the algorithms in a common structure so you can easily replace the class that does the element swapping and setting. Because of their very nice architecture, all I had to worry about was the UI portion.

For the rest of this article, I'll profile the Animated Algorithm program. It would be really cool if the EPT team shipped this as a sample application with the product. (Hint, hint.)

Getting Started with the EPT

In Visual Studio 2005 Beta 1, it certainly isn't obvious where you'll find EPT. You start EPT by running the Performance Wizard, which you can find under the Tools menu both with and without a project open. Keep in mind that the Performance Sessions created by the Performance Wizard are not part of your projects; they're actually separate files that have their own IDE window called Performance Explorer. You can open the Performance Sessions you create by selecting the PSESS files from the File | Open dialog.

If you do not have a project open when stepping through the Performance Wizard, the resulting Performance Session is associated with the binary you specified. However, in the beta, you must have the associated project open when you specify a binary to run. I just wanted to mention this little twist because it confused me the first time I ran into it.

Once you start the Performance Wizard, the first screen you're presented asks you to choose the application you want to profile. If you have a project open that generates multiple assemblies (like Animated Algorithm), you can only pick one from the Wizard. If you want to do sampling, it's fine to pick just the one assembly because EPT sampling profiles all loaded assemblies (including those from the Framework Class Library). However, if you want to do instrumentation profiling on multiple assemblies, the Performance Wizard only selects the one assembly, so you'll need to specify the other projects or assemblies in the resulting Performance Session in the Performance Explorer. I'll show you how to do that in just a moment.

After you've chosen the assembly or project that you want to use in your Performance Session, you have to pick the profiling method. At any point in Performance Explorer you can switch between sampling and instrumentation to your heart's content; the choice you make in this page of the wizard indicates only what you want to do initially. After selecting the profiling method, the wizard is basically complete. For the final release of EPT, you'll have more options for specifying additional information in the Performance Wizard. The final release will also let you create performance sessions directly from the Performance Explorer.

Figure 2 shows the Performance Explorer window right after completing the Performance Wizard steps to create an instrumented run of the AnimatedAlgorithms project. To add the output binaries for the other project, right-click on the Targets folder and select Add Target Project from the context menu. If you want to add specific binaries not associated with a project, select the other option, Add Target Binary. If you've selected Add Target Project, you can choose the other projects from the open solution in the resulting dialog.

Figure 2 Performance Explorer


If you've selected an instrumented run, which is denoted by text in the dropdown listbox below the green start arrow, the binary instrumentation takes place before the program executes. If you don't want to instrument a particular binary for a run, right-click on the binary and uncheck the Instrument Binary menu option.

If you've selected sampling profiling and you want to attach to a running project, clicking the Attach/Detach button, the diagonal arrow to the right of the Start button, will bring up the Attach Profiler to Process dialog. With EPT you can attach to as many processes as you need in order to get a picture of your application. The Attach Profiler to Process dialog also lets you detach profiling from specific binaries. In a future issue of MSDN Magazine, I'll be talking much more about attaching to existing processes, especially for ASP.NET performance tuning.

The last button at the top of the Performance Explorer window is the ubiquitous Properties button. Before you start your profiling runs, you'll probably want to make a trip into the Performance Sessions properties to set a few key properties. The first is on the General tab and is the location where you'll want Performance Reports stored for the Performance Session. When profiling projects, the default is to store the reports in the same directory as the solution. But it's better to put your Performance Sessions and their corresponding reports into their own directories so you can more easily store particular sets of runs. That also makes it easier to analyze before and after scenarios to see the impact of the code changes you're making.

On the General tab you can also flip between instrumentation and sampling profiling, which changes the value displayed in the Performance Explorer. In my performance tuning, I like to keep a particular session devoted to a single type of profiling so there's no confusion surrounding the reports. There's nothing stopping you from creating hundreds of different Performance Session files for all sorts of particular scenarios from profiling type to single binary instrumentation. I'll whet your appetite a bit by mentioning one last item on the General tab, the very enticingly named Managed Allocation Profiling. After I get through general profiling, I'll come back to that one.

Another interesting tab on the Performance Session property page is the Sampling tab (see Figure 3). Here is where you tell EPT what type of sampling you want to do. As I mentioned earlier, you have tremendous control over exactly how you want to sample.

Figure 3 Various EPT Sampling Counter Options


When it does a profiling run, EPT instruments the binaries in place, right where they exist on the hard disk. If you'd like to have the instrumented binaries moved to a different location, select the Binary tab in the Performance Session property page, check the Relocate Instrumented Binaries option (which has absolutely nothing to do with REBASE-style relocation), and specify where you want the changed binaries to go.

The Instrumentation tab allows you to specify programs you want to run before and after instrumentation takes place. This can be helpful if you need to perform additional tasks on the instrumented binary such as moving it into the Global Assembly Cache or to a specific location on the Web server. The Advanced tab is undocumented in this beta release. Finally, with the Counters tab you can tell EPT to collect additional data from the CPUs in the system such as L2 or L3 Cache read misses. Obviously, these are very advanced options that only a few developers will need, but if you do need them, they can make all the difference in the world.

Before I move on to viewing sampling data, I want to mention that the Performance Explorer window can have as many Performance Sessions open as you need. This is extremely helpful when you want to look at specific before and after scenarios or would like to perform separate test runs with different instrumented binaries. When you have multiple Performance Sessions open, make sure to right-click on the particular session and select Set as Current Session so that that session's settings are used for the run and its reports are filed in its Reports node.

Viewing Profiler Data

After getting your Performance Session set to what you want to do, it's time to start profiling. I'm going to start by performing a sampling profile on Animated Algorithm to see if I can find some hotspots. The key to getting good data out of sampling is to do a long run. For Animated Algorithm, I'll run each of the 15 sort algorithms twice and I'll leave the sampling set to the default one million clock cycles.

After you've completed a run, EPT puts that run's report in the Reports folder of the Performance Session. EPT collects raw performance data and streams it out to the report file during the run with no analysis. That way you avoid all that overhead when running your application, but you will pay the price in large report files. The sampling report file for the run I just finished is 3.70MB and it took about three minutes to complete. Make sure you have lots of disk space when running EPT.

All the data analysis, which entails building the call stacks and calculating the performance numbers, occurs when you open the report file itself. For the beta, there can be some slowdowns in opening the file. It may look like the view is in an infinite loop, but if the progress bar is moving in the report window, just be patient and the file will eventually pop up.

The first view in any profiling run is the Performance Report Summary, shown in Figure 4 for the Animated Algorithm sampling run just completed. As expected, sampling takes place across the whole application so what you're looking at is what you'll see in your applications as well: most of the work takes place inside the Framework Class Library or the operating system. If you do see one of your methods in the sampling Summary view, you are probably looking at a performance problem.

Figure 4 EPT Sampling Performance Report Summary


Taking a quick glance at Figure 4, you're probably wondering about the difference between Inclusive Sampled and Exclusive Sampled. Exclusive Sampled means that the method was at the top of the stack when the sample was taken; in other words, it's the function that was currently executing. Inclusive Sampled means the method appears anywhere in the call stack when a sample is taken, so an inclusive hit means the method was either executing or was a caller of the currently executing method.

In a sampling scenario, the more often a method appears in the call stack (Inclusive Sampled), the more time that function is spending in execution, so that's where you want to focus your attention for tuning. In the case of Exclusive Sampled functions, a function that appears there frequently indicates that the function is being called quite often, but its execution may actually be quite fast. For a graphics-heavy application like Animated Algorithm, I fully expect that something from GDIPLUS.DLL will be near the top of the list, just as shown. In Figure 4, the function at offset 0x5B8D in GDIPLUS.DLL, which happens to be the FLOOR function, is called all the time to calculate where to display something on the screen. When you look at your performance runs, make sure to set up your symbol servers to get the best information possible. In writing this article, I was using an unreleased version of EPT and the symbols were not available.
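To make those two numbers concrete, here's a small sketch that tallies inclusive and exclusive counts from a batch of sampled call stacks. The stacks and method names in the test are invented for illustration; this is the aggregation idea, not EPT's actual analysis code:

```csharp
using System;
using System.Collections.Generic;

// Aggregates sampled call stacks into inclusive and exclusive counts.
// Each sample is a stack with the currently executing method first.
static class SampleCounts
{
    public static void Tally(IEnumerable<string[]> samples,
                             IDictionary<string, int> inclusive,
                             IDictionary<string, int> exclusive)
    {
        foreach (string[] stack in samples)
        {
            // The top of the stack gets the exclusive hit...
            Bump(exclusive, stack[0]);

            // ...while every method on the stack, top included, gets an
            // inclusive hit (deduplicated so recursion counts once per
            // sample).
            foreach (string method in new HashSet<string>(stack))
                Bump(inclusive, method);
        }
    }

    static void Bump(IDictionary<string, int> table, string key)
    {
        int count;
        table.TryGetValue(key, out count);
        table[key] = count + 1;
    }
}
```

In this scheme a hot leaf function piles up exclusive hits, while the methods that dominate the run, like a main drawing loop, pile up inclusive hits even though they're rarely the ones executing at the instant of a sample.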

Before I jump into the other views, I want to instrument Animated Algorithm and do the same run I did for the sampling profiler in order to show the Performance Report Summary for an instrumented run. As you can guess, instrumented runs generate much more data than sampling runs. For the run, I instrumented all five assemblies in Animated Algorithm and ended up with a 375MB session file.

The main difference between sampling and instrumentation data is that sampling looks across the whole process space and will show calls inside the Framework Class Library or operating system (in other words, places where you don't have source code). Instrumentation, on the other hand, only looks at the application and the methods you directly call on non-instrumented modules. For example, if you have a "Hello World!" application whose Main only calls Console.WriteLine, you'll get timing information for any work in Main as well as timing information for the length of the Console.WriteLine, but you won't get any detail into the Console.WriteLine method.

Figure 5 shows the Performance Report Summary for the instrumented run. The first table, Most Called Functions, shows the heavily used functions. The first column in the table is mislabeled as time; it really represents the number of calls to the function. The percentage column shows the percentage of total calls that were made to this particular function. In most runs, you'll see Framework Class Library or operating system functions here. If you are seeing some from your own code, you'd better understand why you are calling that particular function so often.

Figure 5 Summary for Instrumented Run


The Functions with Most Individual Work table lists those methods that take the most time just to execute the function, without any additional function calls. This is also known as the exclusive time for the function. For the beta, the Time column unit is clock ticks. For the final release, the units will be milliseconds. However, I think that the actual raw units for a performance run are not useful for analysis. The most important numbers are the percentages. When looking at performance problems you want to know which method took the longest as compared to all the other methods in the application. You can't look at two numbers like 3519639455 and 3492589504 with any hope of comparing them. Fortunately, the table includes the percentage, and my vote is for the EPT team to drop the raw data from the chart.

The last table, Functions Taking Longest, shows what's sometimes called wall time, stopwatch time, or elapsed time for a method. The profiler records the time at the entry point and exit point of a method and subtracts the values. This number accounts for all child methods that are called, all context switches, and sleeping done by the method. In Figure 5, you can see that System.Windows.Forms.Application.Run took the longest, as you would expect for a Windows Forms application. While many developers concentrate on the exclusive time, that's only a small piece of the whole performance picture. If your method is calling out to a database or making a Web service call, the thread your method is running on will block while waiting for those calls to return data, thus moving the thread off the CPU. By keeping a close watch on the elapsed time for your method, you'll find the parts of your code that are slowing down your application.
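You can see the elapsed-time effect with a trivial sketch: wrap a blocking call between two timestamps, which is effectively what the entry and exit probes do. Thread.Sleep stands in for a database query or Web service call here, and the method name is made up:

```csharp
using System.Diagnostics;
using System.Threading;

// A method that blocks racks up elapsed (wall-clock) time even though it
// burns almost no CPU. Stopwatch plays the role of the entry/exit probes.
static class ElapsedDemo
{
    public static long TimeBlockingMethod()
    {
        Stopwatch watch = Stopwatch.StartNew();  // the "entry probe"
        Thread.Sleep(100);                       // thread moves off the CPU
        watch.Stop();                            // the "exit probe"
        return watch.ElapsedMilliseconds;        // elapsed inclusive time
    }
}
```

The method reports roughly 100 milliseconds of elapsed time despite consuming almost no CPU, which is exactly the kind of cost that sorting only on exclusive time would hide.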

While the Summary views are nice, you'll be most interested in seeing where your code stacks up with the rest of the system for sampling runs or against the other methods in your application for instrumentation runs. This is the bailiwick of the Function view, which is selectable by clicking on the Function button on the bottom of the report window. You can double-click on any method in a Summary view to jump to the Function view as well.

For sampling runs, the Function view shows you the list of all functions that appear in at least one inclusive sample. For instrumented runs, you'll see all the instrumented methods called as part of the run. No matter what type of profiling you are doing, there's quite a bit of data displayed in the Function view so you can get a handle on how your code looks.

The sampling Function view shows the Inclusive Samples and Exclusive Samples columns by default. Since I like the percentage numbers, I right-clicked on the column headers to add Inclusive Percent and Exclusive Percent as well. If you are sampling a multiple-process system, you may want to include other columns such as Process Name or Process ID so you can identify which method sampling goes with which process. You can also set column headers in the instrumented Function view, but you'll have a different set to choose from.

When analyzing a sampling run in a Function view, I like to glance at the first couple of pages in the Function view sorted by the Inclusive Samples column to get an idea about what's been executing. If I don't see any of my methods in the first couple of pages, I right-click in the Function view and select "Group by Module" so I can get the tree report view. Sorting by a particular column will sort correctly when you have the functions grouped by module—a nice feature.

For instrumented runs, the Function view has many more columns to display. If you happen to have a 40-inch monitor, you should be able to see them all without maximizing the Visual Studio .NET window. For the rest of us, the best way to look at the Function view is to press Alt + Shift + Enter for full-screen mode.

In its columns, the instrumented Function view uses the "inclusive" and "exclusive" terminology that I explained earlier. However, there's another term thrown into the mix: Application. The elapsed time, as I mentioned, is the total time from one instrumentation point to another, regardless of any context switches the thread may make. The idea with Application time is that EPT subtracts out the time spent in those context switches, so you are seeing the time that your code was actually executing on the CPU. Figure 6 lists the definitions of the not-so-obvious columns you'll see in the instrumented Function view. You may want to tape it to your monitor until the help comes online for EPT.

Figure 6 Instrumentation Function View Column Definitions

Elapsed Exclusive Time: The execution time for the code in the method only (excluding all child function times). This includes time the method was blocked or otherwise moved off the CPU (context switched).
Application Exclusive Time: The run time for just the code in the method, excluding all child methods called and any time spent off the CPU.
Elapsed Inclusive Time: The execution time for the code in the method, all child methods called, and all context switches that occurred. This is just like starting a stopwatch at the start of the method and stopping it at the end.
Application Inclusive Time: The execution time for the code in the method and all the child methods called, with no time added for when the thread was off the CPU.
Exclusive Transitions: The number of kernel-mode transitions caused by the code in the method only.
Inclusive Transitions: The number of kernel-mode transitions caused by the code in the method and any child methods called.
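The four time columns are related by simple subtraction; a small sketch with made-up numbers for a single hypothetical method shows how they fit together:

```csharp
// Exclusive time drops the children from inclusive time; Application time
// drops the off-CPU (context-switched) time from the corresponding
// elapsed time. These helpers just make that arithmetic explicit.
static class ColumnMath
{
    public static long Exclusive(long inclusiveTime, long childTime)
    {
        return inclusiveTime - childTime;
    }

    public static long Application(long elapsedTime, long offCpuTime)
    {
        return elapsedTime - offCpuTime;
    }
}
```

With an elapsed inclusive time of 1,000 units, children accounting for 600 of those, and 150 units spent off the CPU in the method's own code, Exclusive(1000, 600) gives an elapsed exclusive time of 400, and Application(400, 150) gives an application exclusive time of 250.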

When looking at the Function view for an instrumented run, I add the columns that show the percentage values of the various timings, remove the raw-number time columns, and add the two transition columns. That gives me a clearer view of the run. The first column I sort on is % Application Exclusive Time, as I want to see which function is doing most of the work. Since the instrumentation puts probes around all the child calls a method makes, it's entirely possible that you'll see a Framework Class Library or operating system function at the top of the list. In fact, with my Animated Algorithm run, System.Drawing.SolidBrush.ctor and System.Drawing.Brush.Dispose are listed as number one and two in percentage of Application Exclusive Time, with 14.982 percent and 14.867 percent, respectively. The first function I wrote comes in third: Bugslayer.SortDisplayGraph.SorterGraph.UpdateSingleFixedElement at 12.217 percent, which draws an individual bar on the graph. Depending on the application type, I may choose to sort on other columns when looking in the Function view. If there are Web service or database calls, I'd be looking at % Elapsed Inclusive Time so I could see if any particular methods were blocking for long periods of time. For an application like Animated Algorithm, I'd also look at % Application Inclusive Time.

Based on those numbers in my instrumented run, I was curious to find out who was making all those calls to the SolidBrush methods, so I right-clicked on the .ctor method and chose Show in Caller/Callee view. This view, which is also available for sampled runs, lets you see at a glance all the callers of a target method and all the methods that the target calls.

Because the .ctor method was not instrumented, the Caller/Callee view won't show any callees, but it will show the callers. I double-clicked on the only caller, which happened to be UpdateSingleFixedElement, the method with the third-highest Application Exclusive Time, and got the view shown in Figure 7.

Figure 7 EPT Caller/Callee View

In Figure 7, the dropdown combobox in the middle of the view shows the target method, in this case UpdateSingleFixedElement. The grid above it contains all the methods that call the target method (the callers). The grid below it contains all the methods the target method calls to do its work (the callees). If you want to see who called a particular caller, double-click that caller method; it becomes the target method, and the original target method drops down into the callees section. In essence, you have just walked up the stack one level.
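Conceptually, the Caller/Callee view is just a filtering of the raw call edges the profiler collects. The following C# sketch is my own illustration (the edge data and method names are invented stand-ins for the article's methods, not the profiler's actual storage format) showing how the two grids can be derived from (caller, callee) pairs:

```csharp
// Illustration only: deriving a Caller/Callee view from raw call edges.
using System;
using System.Collections.Generic;
using System.Linq;

class CallerCalleeSketch
{
    // Hypothetical call edges; a real profiler records these via its probes.
    public static readonly (string Caller, string Callee)[] Edges =
    {
        ("UpdateGraph", "UpdateSingleFixedElement"),
        ("UpdateSingleFixedElement", "SolidBrush..ctor"),
        ("UpdateSingleFixedElement", "Brush.Dispose"),
    };

    // Everything that calls the target appears in the top grid...
    public static List<string> CallersOf(string target) =>
        Edges.Where(e => e.Callee == target).Select(e => e.Caller).ToList();

    // ...and everything the target calls appears in the bottom grid.
    public static List<string> CalleesOf(string target) =>
        Edges.Where(e => e.Caller == target).Select(e => e.Callee).ToList();

    static void Main()
    {
        string target = "UpdateSingleFixedElement";
        Console.WriteLine("Callers: " + string.Join(", ", CallersOf(target)));
        Console.WriteLine("Callees: " + string.Join(", ", CalleesOf(target)));
        // Double-clicking a caller re-runs the same two queries with that
        // caller as the new target -- "walking up the stack" one level.
    }
}
```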

Just from the view in Figure 7, you can probably pick out a potential performance problem. Animated Algorithm didn't seem to have any glaring performance problems, but the fact that the SolidBrush .ctor and Dispose take up so much time and are both called inside UpdateSingleFixedElement, a method that's called 351,872 times, indicates I'm doing something wasteful: creating that brush on every trip through the method when I should be caching it. As I get into analyzing code with EPT in a future issue of MSDN Magazine, you'll see some other problems with Animated Algorithm as well.
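The fix is the classic caching pattern. The sketch below is my own illustration, not code from the sample download: FakeBrush stands in for System.Drawing.SolidBrush so the example is self-contained, and the construction counter makes the savings the profiler flagged visible.

```csharp
using System;

class BrushCacheSketch
{
    public static int Constructions;

    // Stand-in for System.Drawing.SolidBrush; counts constructions so the
    // cost the profiler flagged is visible without a graphics surface.
    class FakeBrush : IDisposable
    {
        public FakeBrush() { Constructions++; }
        public void Dispose() { }
    }

    // Before: a brand-new brush on every call, like the profiled code.
    static void DrawBarSlow()
    {
        using (var brush = new FakeBrush())
        {
            // g.FillRectangle(brush, bar) would go here
        }
    }

    // After: one brush created up front and reused by every call.
    static void DrawBarFast(FakeBrush cached)
    {
        // g.FillRectangle(cached, bar) would go here
    }

    public static (int Slow, int Fast) Compare(int calls)
    {
        Constructions = 0;
        for (int i = 0; i < calls; i++) DrawBarSlow();
        int slow = Constructions;

        Constructions = 0;
        var cached = new FakeBrush();
        for (int i = 0; i < calls; i++) DrawBarFast(cached);
        return (slow, Constructions);
    }

    static void Main()
    {
        var (slow, fast) = Compare(1000);
        Console.WriteLine($"Per-call: {slow} constructions, cached: {fast}");
        // Per-call: 1000 constructions, cached: 1
    }
}
```

With a real SolidBrush the cached instance would need to be disposed when the graph control is disposed, but the per-call constructor and Dispose traffic disappears from the profile.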

The last of the common views into your data is the Callstack view. Here you can see, in a more hierarchical form, the call stacks you looked at in the Caller/Callee window. For sampled runs, you'll see quite a few entries at the top level of the Callstack view because each one represents a unique point where an exclusive sample was taken. As you expand items in a sampling run, you'll also see items occasionally appearing at the same level, indicating that the function at the root had multiple call trees leading back to it. The item shown in the root position is the top of the stack.

For instrumented runs, the Callstack window has a root element for each thread in the application. Because Animated Algorithm has only two threads, you'll see only two items at the tree-root level. In the Callstack view you can see the complete call stacks, from the first method instrumented down to the last, so you get a real picture of how your application executed. Many times I've been surprised at the gap between what I thought the code did and what it actually does.

You can spend a great deal of time analyzing your code in the Callstack window. When looking at a particular trail through your application, you can get rid of a lot of noise by selecting the node you're interested in drilling into, right-clicking, and choosing the Set Root menu option. In Figure 8 I wanted to look at all the calls made by NSort.SwapSorter.Sort, so setting it as the root got the UI thread out of the way.

Figure 8 EPT Callstack View

In a future issue I'll talk more about the final two tabs in the EPT display area, Trace and Type. The Type view is where you can look at the objects you've allocated in an application, and it already works in the beta version. Back when I was discussing the Performance Session properties, I mentioned that the General tab has a Managed allocation profiling section. If you select the Allocations-only radio button, EPT fills in the Type view. In the beta, the report looks similar to the reports in numerous other tools, but the data collection doesn't seem to have as much overhead as I've seen elsewhere. Finally, to understand what the Enterprise Performance Tool team is thinking and to learn more about the tool, make sure to check out their blog at blogs.msdn.com/profiler.

John Robbins is a cofounder of Wintellect, a software consulting, education, and development firm that specializes in Windows and the .NET Framework. His latest book is Debugging Applications for Microsoft .NET and Microsoft Windows (Microsoft Press, 2003). You can contact John at www.wintellect.com.