Test Run

Determining .NET Assembly and Method References

James McCaffrey

Code download available at: TestRun0603.exe (133 KB)

Contents

.NET Assembly Dependencies
Outputting .NET Assembly Dependencies
.NET Method Dependencies
Comments and Discussion

Before you can test any software system effectively, you must understand the system under test. If the system includes the Microsoft® .NET Framework, understanding the system under test includes understanding its assembly and method dependencies. This is not particularly enjoyable or challenging from a conceptual point of view, but it is absolutely indispensable.

Many tools are available to uncover .NET assembly and method dependencies, but the majority are GUI-based and not particularly automation-friendly. And these tools are not very customizable. Colleagues of mine who write software test automation in a .NET-based environment use two custom-built tools to programmatically determine assembly and method references. When I teach .NET test automation classes to software engineers, I often pull out these tools, AssemblyRefs.exe and AnalyzeMethodCalls.exe. So in this month's column I'll present two very lightweight, automation-friendly, highly customizable tools for finding .NET assembly dependencies and .NET method dependencies.

Figure 1 Finding .NET Assembly Dependencies


If you can't wait to get started, take a look at Figure 1. The sample run analyzes an application named RunScenario.exe for assembly dependencies and saves output in XML. AssemblyRefs.exe's source is short (less than one page of code) and can easily be modified to suit your needs. Figure 2 shows a tool that finds .NET method dependencies. This sample run analyzes a class library named CombinationsLib.dll and sends output directly to the command shell. AnalyzeMethodCalls.exe is also small (again, less than one page of code) and is easily extended.

Figure 2 Finding .NET Method Dependencies


There is nothing especially difficult about these two tools, but they raise some very interesting design issues. And while this column is aimed primarily at those with beginning and intermediate skills, I bet that even readers with advanced coding skills will find some interesting and useful new techniques.

.NET Assembly Dependencies

Let's start by writing a small, flexible tool that determines .NET assembly dependencies. Assemblies are the fundamental building blocks of all .NET-based applications. An assembly can be located in one physical file (an .exe or .dll), or it may span several physical files. Assemblies contain several pieces of data, including an assembly manifest (which contains metadata about the assembly, such as version number), type metadata (which contains information about classes, enumerations, and so forth), the actual intermediate language code that implements the assembly, and a set of resources (typically configuration settings, images, and so on). As the MSDN® documentation points out, functionally, assemblies form a security boundary, a type boundary, a reference-scope boundary, and a version boundary. Therefore, if you are testing code, you must be able to determine which assemblies are referenced by the assembly under test. Without this information you cannot create meaningful test cases. This is also useful for debugging complex systems that will not build properly.

Now that you are motivated, let's write the tool. The key to a .NET assembly reference tool is the Assembly.GetReferencedAssemblies method in the System.Reflection namespace. GetReferencedAssemblies returns an array of type System.Reflection.AssemblyName that contains information about each of the assemblies referenced by the calling Assembly object. An AssemblyName object is a wrapper around the full display name of an Assembly object. For example, for the assembly RunScenario.exe shown in Figure 1, the AssemblyName's string representation is:

RunScenario, Version=1.0.1880.30104, Culture=neutral, PublicKeyToken=null

At first glance it seems as if the problem of generating all assembly references is solved because you can make calls along the lines of the following:

    Assembly a = Assembly.LoadFrom(pathToAssembly);
    AssemblyName[] an = a.GetReferencedAssemblies();
    foreach (AssemblyName name in an)
        Console.WriteLine(name.ToString());

This is true up to a point, but there are two problems. First, each of the referenced assemblies can reference other assemblies, which may in turn reference other assemblies, and so forth. Second, this approach writes straight to the console, which limits the type of output you can produce (and makes it automation-unfriendly). What you need is code that will get all levels of assembly references and will also provide a way to easily produce multiple output formats.

Figure 3 Assembly Dependencies Tree


The solution comes right out of a CS101 class. The assembly references form a tree structure with the assembly under test as the root node. Figure 3 shows a hypothetical set of .NET assembly dependencies. The assembly under test is node A. Node A references assemblies B, C, and D, and so forth.

You need to traverse the tree structure, storing each assembly reference. How? You could write a recursive routine. But recursion has certain disadvantages—I will have more to say about this when I discuss Microsoft hiring interviews later in this column. A better approach is to use a non-recursive algorithm that stores assembly references on a stack:

    create empty stack
    push root assembly reference(s) onto stack
    while (stack is not empty)
    {
        pop the top assembly reference
        display info about the popped item
        get all assembly references of the popped item
        foreach reference
            push the reference onto the stack, "right-to-left"
    }

This standard algorithm will traverse the reference tree in a "pre-order" traversal. If the algorithm is applied to the tree structure in Figure 3, the output will be: A, B, C, E, F, I, J, D, G, H. This output format doesn't really convey the hierarchical structure of the assemblies, so you need to store additional information that indicates which level each assembly node is on. If you define the top-level node A to be at level 0, then nodes B, C, and D are at level 1; nodes E, F, G, and H are at level 2, and nodes I and J are at level 3. You can use a tiny Info class to represent each assembly's name and level:

    class Info
    {
        public readonly string Name;
        public readonly int Level;

        public Info(string name, int level)
        {
            this.Name = name;
            this.Level = level;
        }
    } // class Info

Since this class just wraps data fields, you can use a struct instead of a class if you like. To make the tool more general, instead of popping items off the stack and displaying them directly, you can write a routine that stores the assembly Info objects in an ArrayList or a List<Info> in a pre-order fashion. With this approach you can iterate through the resulting objects and display the assembly dependencies as text, or save the info to an XML file, or use another method that meets your needs. Figure 4 shows a short method that accepts as input a path to a .NET assembly file and returns an ArrayList that holds Info objects (assembly names and levels) of referenced assemblies in pre-order fashion.

Figure 4 Storing Reference Dependencies Information

    static ArrayList StoreAssemblyReferenceInfo(string targetAssemblyPath)
    {
        Hashtable found = new Hashtable();    // used to avoid dupes
        ArrayList results = new ArrayList();  // resulting info
        Stack stack = new Stack();            // stack of assembly Info objects

        // seed the stack with the root assembly (level 0)
        Assembly assembly = Assembly.LoadFrom(targetAssemblyPath);
        stack.Push(new Info(assembly.ToString(), 0));

        // do a preorder, non-recursive traversal
        while (stack.Count > 0)
        {
            Info info = (Info)stack.Pop();  // get next assembly info
            if (!found.ContainsKey(info.Name))
            {
                found.Add(info.Name, info.Name);
                results.Add(info);  // store it to results ArrayList
                Assembly child = Assembly.Load(info.Name);
                AssemblyName[] subchild = child.GetReferencedAssemblies();
                // push right-to-left so the leftmost reference is popped first
                for (int i = subchild.Length - 1; i >= 0; --i)
                {
                    stack.Push(
                        new Info(subchild[i].ToString(), info.Level + 1));
                }
            }
        }
        return results;
    }

The StoreAssemblyReferenceInfo routine in Figure 4 uses an auxiliary Hashtable object to keep track of assemblies that have already been explored in order to avoid duplication. The loading routine begins by loading the assembly under test from its path supplied as the input argument, using the Assembly.LoadFrom method. You get this root assembly object's name, create an Info object with level 0 from the name, and add it to the stack of Info objects to be processed.

Now that the stack of Info objects has been seeded, you follow the traversal algorithm I described earlier. You pop the top assembly Info object off the stack, load it into memory using the Assembly.Load method, get all of its referenced dependencies using the GetReferencedAssemblies method, and push their associated Info objects onto the stack. This process continues until all unique instances of referenced assemblies in the assembly dependency tree have been accounted for. For the dependency tree in Figure 3, after StoreAssemblyReferenceInfo finishes, the return ArrayList object will hold {A 0}, {B 1}, {C 1}, {E 2}, {F 2}, {I 3}, {J 3}, {D 1}, {G 2}, {H 2}.
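To see the traversal order without loading any real assemblies, here is a minimal sketch that runs the same stack-based algorithm over an in-memory table. The table is an assumption: it encodes a tree shape consistent with the levels and ordering described for Figure 3 (A references B, C, D; C references E, F; F references I, J; D references G, H). The names PreorderDemo and Traverse are hypothetical, and generic collections stand in for ArrayList, Hashtable, and Stack.

```csharp
using System;
using System.Collections.Generic;

public static class PreorderDemo
{
    // Hypothetical stand-in for GetReferencedAssemblies: maps each node of the
    // assumed Figure 3 tree to its direct references, left to right.
    static readonly Dictionary<string, string[]> References =
        new Dictionary<string, string[]>
        {
            { "A", new[] { "B", "C", "D" } },
            { "C", new[] { "E", "F" } },
            { "F", new[] { "I", "J" } },
            { "D", new[] { "G", "H" } }
        };

    // Returns "name level" strings in pre-order, skipping names already seen.
    public static List<string> Traverse(string root)
    {
        List<string> results = new List<string>();
        HashSet<string> found = new HashSet<string>();
        Stack<KeyValuePair<string, int>> stack =
            new Stack<KeyValuePair<string, int>>();
        stack.Push(new KeyValuePair<string, int>(root, 0));

        while (stack.Count > 0)
        {
            KeyValuePair<string, int> current = stack.Pop();
            if (!found.Add(current.Key)) continue;   // already visited
            results.Add(current.Key + " " + current.Value);

            string[] children;
            if (!References.TryGetValue(current.Key, out children))
                continue;                            // leaf: nothing to push

            // Push right-to-left so the leftmost child is popped first.
            for (int i = children.Length - 1; i >= 0; --i)
                stack.Push(new KeyValuePair<string, int>(
                    children[i], current.Value + 1));
        }
        return results;
    }
}
```

Calling PreorderDemo.Traverse("A") yields the entries A 0, B 1, C 1, E 2, F 2, I 3, J 3, D 1, G 2, H 2, in that order. If you push the children left-to-right instead, the same nodes come back with the siblings reversed.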

Figure 5 Graph Structure


.NET assembly references can actually form a graph structure rather than a tree (see Figure 5). Because the StoreAssemblyReferenceInfo method does not store duplicate values, it will create a virtual tree structure rather than a graph structure. In practice, this approach has proven to be useful and effective. If you need to, you can easily modify the code presented here to store assembly dependencies as a graph structure. Instead of using an Info object that contains just an assembly name and the assembly tree level, you would create a Node object that stores the assembly name (as a string) and all assembly references (as an ArrayList for example). Because graph structures do not have standard traversal orderings, you can simply store all the Node objects in an ArrayList object and iterate through the list.
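One way the graph variant might look is sketched below. The Node and GraphBuilder names are hypothetical, and the getReferences delegate is a placeholder for whatever lookup you use (for example, a wrapper around Assembly.Load and GetReferencedAssemblies). Each assembly becomes exactly one Node, but every reference is recorded, including references to assemblies that already have a node.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Node type: one assembly name plus the names it references.
public class Node
{
    public readonly string Name;
    public readonly List<string> References = new List<string>();
    public Node(string name) { Name = name; }
}

public static class GraphBuilder
{
    // getReferences is a placeholder for a lookup such as
    // Assembly.Load(name).GetReferencedAssemblies().
    public static List<Node> Build(
        string root, Func<string, string[]> getReferences)
    {
        List<Node> nodes = new List<Node>();
        HashSet<string> found = new HashSet<string>();
        Stack<string> pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string name = pending.Pop();
            if (!found.Add(name)) continue;      // one node per assembly

            Node node = new Node(name);
            foreach (string reference in getReferences(name))
            {
                node.References.Add(reference);  // keep every edge
                pending.Push(reference);
            }
            nodes.Add(node);
        }
        return nodes;
    }
}
```

Because the edges live on the nodes, a shared dependency (B and C both referencing D, say) shows up in both reference lists rather than only under the first parent that reached it.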

Here it's assumed that all referenced assemblies can be successfully loaded. The Assembly.Load(assemblyString) method will throw one of the following exceptions if the assembly cannot be loaded:

ArgumentNullException - assemblyString is a null reference.
FileNotFoundException - assemblyString is not found.
BadImageFormatException - assembly assemblyString is not a valid assembly.
SecurityException - The caller does not have the required permission.

You may want to catch these exceptions, print a warning message, and continue execution even though your resulting dependencies output will not be accurate.
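A minimal sketch of that warn-and-continue approach follows. SafeLoader and TryLoad are hypothetical names. The handler also catches FileLoadException, which is my addition rather than part of the list above, and it deliberately lets ArgumentNullException propagate because a null name indicates a bug in the caller rather than a missing assembly.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Security;

public static class SafeLoader
{
    // Returns the loaded assembly, or null after printing a warning.
    public static Assembly TryLoad(string assemblyName)
    {
        try
        {
            return Assembly.Load(assemblyName);
        }
        catch (FileNotFoundException ex) { Warn(assemblyName, ex); }
        catch (FileLoadException ex) { Warn(assemblyName, ex); }      // assumption: not in the list above
        catch (BadImageFormatException ex) { Warn(assemblyName, ex); }
        catch (SecurityException ex) { Warn(assemblyName, ex); }
        return null;
    }

    static void Warn(string assemblyName, Exception ex)
    {
        // Write to stderr so warnings do not pollute redirected output.
        Console.Error.WriteLine(
            "Warning: could not load '" + assemblyName + "': " + ex.Message);
    }
}
```

In the traversal loop you would call SafeLoader.TryLoad(info.Name) and skip the GetReferencedAssemblies step when it returns null. The assembly still appears in the results, but its own references are missing from the output.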

Outputting .NET Assembly Dependencies

With the assembly dependencies information (name and level) saved in pre-order fashion in an ArrayList object, you have great flexibility in output. Suppose you just want a simple text display to the command shell. You can write this method:

    static void DisplayReferencedAssembliesAsText(string targetAssemblyPath)
    {
        ArrayList results = StoreAssemblyReferenceInfo(targetAssemblyPath);
        foreach (Info info in results)
        {
            Console.WriteLine(info.Name.PadLeft(
                info.Name.Length + info.Level * 2, ' '));
        }
    }

And call it along the lines of:

    string path = @"C:\Here\There\TargetAssembly.exe";
    DisplayReferencedAssembliesAsText(path);

You simply use the assembly-level information to format output, indenting twice as many spaces as the level number. If applied to Figure 3, the output on the console shell will resemble:

    A
      B
      C
        E
        F
          I
          J
      D
        G
        H

Suppose you want more sophisticated output. SaveReferencedAssembliesAsXML (see Figure 6) builds the pre-order ArrayList of assembly Info objects for a target assembly and writes it to an XML file.

Figure 6 Outputting Assembly Dependencies as XML

    // requires: using System.Collections; using System.Text; using System.Xml;
    static void SaveReferencedAssembliesAsXML(
        string targetAssemblyPath, string fileName)
    {
        ArrayList results = StoreAssemblyReferenceInfo(targetAssemblyPath);
        XmlTextWriter xtw = new XmlTextWriter(fileName, Encoding.Unicode);
        xtw.Formatting = Formatting.Indented;
        xtw.WriteStartDocument();
        xtw.WriteStartElement("Assemblies");  // write XML-required root node

        int pendingEndTags = 0;
        for (int i = 0; i < results.Count - 1; ++i)  // all but last node
        {
            Info curr = (Info)results[i];
            Info next = (Info)results[i + 1];
            if (next.Level == curr.Level)  // sibling up ahead
            {
                xtw.WriteStartElement("Assembly");
                xtw.WriteAttributeString("Name", curr.Name);
                xtw.WriteEndElement();
            }
            else if (next.Level > curr.Level)  // child ahead
            {
                xtw.WriteStartElement("Assembly");
                xtw.WriteAttributeString("Name", curr.Name);
                ++pendingEndTags;
            }
            else if (next.Level < curr.Level)  // last child
            {
                xtw.WriteStartElement("Assembly");
                xtw.WriteAttributeString("Name", curr.Name);
                xtw.WriteEndElement();
                for (int j = 1; j <= curr.Level - next.Level; ++j)
                {
                    xtw.WriteEndElement();
                    --pendingEndTags;
                }
            }
        }

        // take care of pesky last node
        Info last = (Info)results[results.Count - 1];
        xtw.WriteStartElement("Assembly");
        xtw.WriteAttributeString("Name", last.Name);
        xtw.WriteEndElement();

        for (int j = 1; j <= pendingEndTags; ++j)
            xtw.WriteEndElement();

        xtw.WriteEndElement();  // write end of required XML root node
        xtw.Close();
    }

The SaveReferencedAssembliesAsXML routine walks the ArrayList of assembly Info objects, looking at the current node and the next node in the list. If the level of the next node is equal to the level of the current node, the current node has a sibling and no children, so you can emit the complete node as XML with a start tag and an end tag. If the level of the next node is greater than the level of the current node, the current node has children, so you emit only the start tag of an XML element. And if the level of the next node is less than the level of the current node, the current node is the last node in a group of siblings, so you need to emit the entire node and also emit end tags for its enclosing elements. The number of those end tags is equal to the difference between the levels of the current and next nodes.

The routine finishes up by taking care of the very last node in the ArrayList of assembly Info objects. Because it is the last node, it will be a leaf node (no children), so you can emit it as XML with start and end element tags. At this point there could still be pending XML end element tags. The routine keeps track of this with the local variable pendingEndTags, and you emit that number of end element tags if required. The output from SaveReferencedAssembliesAsXML is illustrated in Figure 1.
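The same nesting can also be produced without looking ahead at the next node. The sketch below is an alternative formulation, not the routine from Figure 6: it tracks how many Assembly elements are currently open and, before writing each node, closes every open element at the same or a deeper level. LevelListXml and ToXml are hypothetical names, KeyValuePair entries stand in for Info objects, and the XML is returned as a string rather than saved to a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class LevelListXml
{
    // Each entry pairs a name (Key) with its tree level (Value), in pre-order.
    public static string ToXml(IList<KeyValuePair<string, int>> preorder)
    {
        StringWriter sw = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter w = XmlWriter.Create(sw, settings))
        {
            w.WriteStartElement("Assemblies");   // XML-required root node
            int open = 0;                        // <Assembly> elements still open

            foreach (KeyValuePair<string, int> item in preorder)
            {
                // Close every open element at this level or deeper.
                while (open > item.Value)
                {
                    w.WriteEndElement();
                    --open;
                }
                w.WriteStartElement("Assembly");
                w.WriteAttributeString("Name", item.Key);
                ++open;
            }

            // Close whatever is still open, then the root.
            while (open > 0)
            {
                w.WriteEndElement();
                --open;
            }
            w.WriteEndElement();
        }
        return sw.ToString();
    }
}
```

This version has no special case for the last node, at the cost of assuming the input really is a valid pre-order list (each level is at most one greater than the level before it).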

Note that the techniques I've demonstrated up to this point are based on metadata. Only assemblies that are statically referenced will be found by these traversal techniques. So, for example, if a configuration file stored the name or location of an assembly to be loaded at run time and used in a late-bound fashion, these techniques would not account for those assemblies.

.NET Method Dependencies

Now I'll show you a quick, automation-friendly, flexible, and easy way to determine .NET method dependencies. Knowing what methods are called by your system allows you to create meaningful test cases, and also allows you to identify "dead code"—methods that are not called at all. Manually reading through source code is usually not a viable option because in any realistic system there will just be too much code. My goal here is to list all the methods in a .NET assembly unit, and for each method list the methods it calls. At first thought this would seem to be a pretty easy task. I wanted to produce an especially lightweight (less than one page of code) and easily customizable tool.

One obvious approach you might try is to parse through the source code of the system under test. But if you've ever tried to do this, you'll know it is very tough. Another approach is to write a tool that directly parses the binary file of the assembly under test. This turns out to be very, very difficult. Yet another approach several of my colleagues have tried is to use the ildasm.exe tool, point it to the system under test, instruct it to produce text file output using the /OUT command-line switch, and then parse through the resulting intermediate language (IL). This is possible, but unfortunately the text output from ildasm.exe was not designed for automation and can be difficult to parse.

There is another way. We could manually parse through the IL stored in the assembly, but, as noted, that would be difficult.

However, others have already done this task for us. A great way to achieve easy method dependencies output is the elegant .NET Reflector tool, written by Lutz Roeder. It is available at www.aisto.com/roeder/dotnet. The Reflector tool is widely known and used. In fact, reflector.exe and ildasm.exe are two of the best-known GUI tools for .NET system discovery. What is not well-known or well-documented is that Reflector contains a great API set that dramatically simplifies working with .NET assemblies. The code that produced the output shown in Figure 2 is listed in Figure 7.

Figure 7 Finding Method Dependencies with Reflector

    static void Main(string[] args)
    {
        ArrayList listMethods = new ArrayList();
        try
        {
            Console.WriteLine("\nStart method reference analysis\n");

            IServiceProvider serviceProvider = new Application(null);
            IAssemblyLoader assemblyLoader = (IAssemblyLoader)
                serviceProvider.GetService(typeof(IAssemblyLoader));
            IAssembly rAssembly =
                assemblyLoader.LoadFile("..\\..\\..\\CombinationsLib.dll");
            Console.WriteLine("Assembly: " + rAssembly.ToString());

            foreach (IModule module in rAssembly.Modules)
            {
                foreach (ITypeDeclaration typeDeclaration in module.Types)
                {
                    foreach (IMethodDeclaration methodDeclaration
                        in typeDeclaration.Methods)
                    {
                        Console.WriteLine("Method: " + methodDeclaration);
                        IMethodBody body = methodDeclaration as IMethodBody;
                        if (body != null)
                        {
                            foreach (IInstruction inst in body.Instructions)
                            {
                                switch (inst.Code.ToString())
                                {
                                    case "call":
                                    case "callvirt":
                                    case "calli":
                                    case "newobj":
                                        Console.WriteLine(" Calls to: " +
                                            inst.Operand.ToString());
                                        break;  // C# does not allow falling out of a case
                                }
                            }
                        }
                    }
                }
            }
            assemblyLoader.Unload(rAssembly);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Fatal error: " + ex.Message);
        }
        Console.WriteLine("\nDone");
    }

The program in Figure 7 analyzes a .NET class library named CombinationsLib.dll. The associated using statements are:

    using System;
    using System.IO;
    using System.Reflection;
    using System.Collections;
    using Reflector;
    using Reflector.CodeModel;

The first four of these six namespaces are part of the .NET Framework and should be familiar. The Reflector and Reflector.CodeModel namespaces are part of the Reflector API. To call the API you must download the reflector.exe file from Lutz's Web site and add a reference to it. Visual Studio® 2005 allows you to add references to .exe files. Unfortunately, Visual Studio .NET 2003 does not. If you're using Visual Studio .NET 2003, you can rename the file to reflector.dll and then add a reference to it. An alternative is to use the command-line compiler csc.exe with the /reference switch set to reflector.exe; this will work with both the .NET Framework 1.x and the .NET Framework 2.0.

The code in Figure 7 is straightforward. You instantiate a Reflector IServiceProvider interface, then use that interface to obtain an IAssemblyLoader object. Next, call the LoadFile method to load an assembly for use by the Reflector API. Notice that LoadFile returns a Reflector IAssembly rather than a System.Reflection.Assembly object. Once the Reflector assembly has been loaded into memory, you can iterate through the assembly using the Reflector.CodeModel Modules, Types, Methods, and Instructions collections. The Reflector.CodeModel IInstruction objects are direct analogs of intermediate language instructions. Here are some Reflector-style IL instructions:

    L0021: conv.i8
    L0022: beq.s 47
    L0024: ldstr Array length does not equal k
    L0029: newobj Exception..ctor(String) : Void
    L002E: throw
    L002F: ldarg.0
    L0030: ldarg.1
    L0031: stfld Combination.n : Int64
    L0036: ldarg.0

Each Reflector instruction includes an address followed by the colon character, an operator/instruction like "ldstr" (load string) or "call" (call a method), and an optional list of operands/arguments. Informally, when the term "IL instruction" is used, it often means either an opcode like 0x3B or its assembly language equivalent (beq, branch if equal), either with or without an operand. The Reflector API uses slightly different terminology. If the variable instruction's ToString results in

L0123: ldstr "Hello"

then in Reflector terminology, instruction.Code.ToString is "ldstr" and instruction.Operand.ToString is "Hello".

Now because you are looking for .NET method calls, you need to identify exactly which IL instructions make method calls. This depends on exactly how you define a method call. The ECMA Common Language Infrastructure (CLI) specification document says in section 12.4.1.2 that IL, also called Common Intermediate Language (CIL) or Microsoft intermediate language (MSIL), has three call instructions that are used to transfer argument values to a destination method:

call Used when the destination address is fixed at the time the CIL is linked.

calli Used when the destination address is calculated at run time.

callvirt Uses the class of an object, known only at run time, to determine the method to be called.

If you examine the code in Figure 7 you'll see that I check for these three instructions. I also check for the "newobj" IL instruction, which calls a constructor. Whether or not you consider calls to constructors and properties as method calls depends on what you mean by a method. One of the benefits of lightweight tools like this one is that you can modify them to meet your own needs.
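If you already have instructions as text in the address, code, operand layout described earlier, the same four-opcode filter can be applied with plain string handling. The sketch below does not use the Reflector API at all. CallFilter and CalledTargets are hypothetical names, and the parsing assumes each line looks like "L0029: newobj Exception..ctor(String) : Void".

```csharp
using System;
using System.Collections.Generic;

public static class CallFilter
{
    // The three CLI call instructions plus newobj (constructor calls).
    static readonly HashSet<string> CallCodes =
        new HashSet<string> { "call", "callvirt", "calli", "newobj" };

    // Returns the operand of each call-type instruction in the listing.
    public static List<string> CalledTargets(IEnumerable<string> listing)
    {
        List<string> targets = new List<string>();
        foreach (string line in listing)
        {
            int colon = line.IndexOf(':');
            if (colon < 0) continue;                  // no address prefix

            // Everything after the address: "<code> [<operand>]".
            string rest = line.Substring(colon + 1).Trim();
            int space = rest.IndexOf(' ');
            string code = space < 0 ? rest : rest.Substring(0, space);
            string operand =
                space < 0 ? "" : rest.Substring(space + 1).Trim();

            if (CallCodes.Contains(code))
                targets.Add(operand);
        }
        return targets;
    }
}
```

To leave constructors out of your definition of a method call, remove "newobj" from CallCodes. No other change is needed.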

Lutz's slick Reflector code does all the work. Behind the scenes, the Reflector API code disassembles raw .NET assemblies, peeling away all file header information, and translates the remaining binary information to human-readable format. In theory, you could do this from scratch by using the information in the ECMA CLI specification document, but it is an awful lot of work.

You can extend my method dependencies tool in several ways. One interesting possibility is to modify it so that it not only keeps track of called methods, but also called-by methods. Testers are usually most concerned with which methods are called by a particular method so they can create meaningful test cases. But developers often want to know which methods call a particular method.
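Assuming you have collected the tool's output into a map from each method to the methods it calls, the called-by view is just that map inverted. The sketch below uses hypothetical names (CallMap, Invert, NeverCalled) and plain strings for method names. NeverCalled also gives a rough first cut at dead-code candidates.

```csharp
using System;
using System.Collections.Generic;

public static class CallMap
{
    // Inverts caller -> callees into callee -> callers.
    public static Dictionary<string, List<string>> Invert(
        Dictionary<string, List<string>> calls)
    {
        Dictionary<string, List<string>> calledBy =
            new Dictionary<string, List<string>>();

        foreach (KeyValuePair<string, List<string>> entry in calls)
        {
            foreach (string callee in entry.Value)
            {
                List<string> callers;
                if (!calledBy.TryGetValue(callee, out callers))
                {
                    callers = new List<string>();
                    calledBy[callee] = callers;
                }
                // Record each caller once, even if it calls repeatedly.
                if (!callers.Contains(entry.Key))
                    callers.Add(entry.Key);
            }
        }
        return calledBy;
    }

    // Methods listed as callers in the map that nothing in the map calls.
    public static List<string> NeverCalled(
        Dictionary<string, List<string>> calls)
    {
        Dictionary<string, List<string>> calledBy = Invert(calls);
        List<string> result = new List<string>();
        foreach (string method in calls.Keys)
        {
            if (!calledBy.ContainsKey(method))
                result.Add(method);
        }
        return result;
    }
}
```

Treat the NeverCalled list as candidates only. Entry points, methods called from other assemblies, and methods invoked through reflection will all show up there even though they are live.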

A second modification I've seen is a call tree structure very similar to the assembly references tree structure. The tool as presented just lists one level of method calls. But you can easily produce multilevel method call dependencies.

Comments and Discussion

The two lightweight tools I've presented here are easily modifiable and automation-friendly, but don't neglect GUI-based tools. Let me call your attention to the terrific CLR Profiler 2.0 tool. The CLR Profiler tool gives a very powerful way to analyze .NET assemblies during their run time. Figure 8 shows CLR Profiler output.

Figure 8 CLR Profiler Output


The .NET assembly dependencies tool and the .NET method dependencies tool I've presented are often used by some of my colleagues at Microsoft as the basis for interview questions. When faced with a tree traversal problem in a hiring interview, new college graduates tend to immediately jump to a recursive solution. Interviewers are often looking for at least a mention of the possible problems with a recursive solution (memory allocation issues, debugging difficulty, and so forth) and a mention of a nonrecursive solution. SDET interview questions at Microsoft rarely have one correct answer—most coding questions are chosen so that they have multiple approaches, each with pros and cons. Just like coding problems in a real production environment.

Send your questions and comments for James to testrun@microsoft.com.

James McCaffrey works for Volt Information Sciences Inc., where he manages technical training for software engineers working at Microsoft. He has worked on several Microsoft products including Internet Explorer and MSN Search. James can be reached at jmccaffrey@volt.com or v-jammc@microsoft.com. Thanks to Lutz Roeder for his help with this column.