Basic Instincts

Deploying Assemblies

Ted Pattison

Contents

Private Assemblies
The Global Assembly Cache
Deploying Assemblies with a Configured CodeBase
The Download Cache
Assembly Loading
Conclusion

This month's installment of Basic Instincts is the second in a series focused on working with assemblies. In the August 2003 issue I discussed the four parts of an assembly name. I also covered the mechanics of building an assembly with a version number and a strong name. In this month's column I'll explore the options for deploying assembly DLLs in both development and production environments.

The four-part name of an assembly is independent of location. That is, there is nothing about the name of an assembly DLL that tells the common language runtime (CLR) or a hosting application anything about the location of the actual assembly file. This provides a valuable degree of flexibility because an assembly DLL can be deployed in several different ways and in many different places. It also means that the CLR must use something other than the assembly name to locate it at run time.

There are three primary ways to deploy an assembly on a target machine. The first technique involves deploying a DLL as a private assembly by locating it in the ApplicationBase directory. The second technique involves installing it in a machine-wide repository called the Global Assembly Cache (GAC). The third technique involves configuring an assembly DLL with a <codeBase> element that allows the CLR to download the DLL on demand from across the network the first time it's used by a hosting application.

Private Assemblies

Deploying a DLL as a private assembly is the simplest approach you can take. You deploy the DLL inside the ApplicationBase directory of the hosting application or inside a subdirectory of the ApplicationBase directory if the PrivateBinPath property is set. In most cases, it's really as easy as that.

One of the biggest advantages of private assembly deployment is that it allows for XCOPY deployment. That's because an application, its configuration file, and all of its private assemblies are contained within a single directory structure. Once your application and its private assemblies have been thoroughly tested, you can deploy the application as a whole by simply copying the ApplicationBase directory structure to the target machine using a utility such as XCOPY.EXE or a network transfer protocol such as FTP. You can even deploy an application by simply using drag and drop in Windows® Explorer. Once you have copied the ApplicationBase directory structure, the application is ready for use (assuming the common language runtime is present).

One noteworthy limitation of private assembly deployment is that a private assembly can never be deployed outside the ApplicationBase directory. As a result, you cannot share a private assembly across two or more applications that you have deployed in separate directories.

So how does the CLR find and load private assemblies? Well, the CLR discovers the physical path at run time through a search process known as probing. When the CLR starts probing for a private assembly, it determines the file name of the assembly by taking its friendly name and adding an extension of .dll.

Once the CLR determines the name of the target assembly file, and after it has checked the GAC, it searches inside the ApplicationBase directory to see whether the file is there. If it is, probing stops and the CLR loads the assembly into memory. Otherwise the probing process continues in a subdirectory of the ApplicationBase directory that has the same name as the assembly itself. For example, if the CLR were probing for an assembly named MyLibrary, it would look to see if there was a subdirectory named MyLibrary in the ApplicationBase directory that contained the assembly file MyLibrary.dll.

If the CLR finds the assembly file in this subdirectory, the assembly is loaded. If not, the CLR continues probing in the same two directories for an assembly file with an .exe extension instead of a .dll extension. That means the CLR automatically looks at four different paths while probing for a private assembly. If an application were to have an ApplicationBase directory with a path of C:\MyApp, the CLR would automatically probe for the assembly file using the following four file paths:

c:\MyApp\MyLibrary.dll
c:\MyApp\MyLibrary\MyLibrary.dll
c:\MyApp\MyLibrary.exe
c:\MyApp\MyLibrary\MyLibrary.exe

So, the CLR will automatically inspect four file paths when probing for an assembly that is culture neutral. When the CLR is probing for a satellite assembly with a cultural identifier, it will additionally look in subdirectories that have the same name as the cultural identifier itself. This extra support makes it easier to deploy multiple resource-only assemblies that have been localized for different languages.

Note that although you have the flexibility to deploy a private assembly in any subdirectory inside the ApplicationBase directory, if you choose one with a name other than the ones previously outlined, the CLR will require extra configuration information to assist it in probing. You must add a special <probing> element to the application configuration file to give the CLR a hint.

Let's say you wanted to create a subdirectory named MyAssemblies inside the ApplicationBase directory and then you wanted to deploy some of your dependent assemblies inside of it. The application would not be able to load these private assemblies until you modified your application configuration file to include the probing element, as shown here:

<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <probing privatePath="MyAssemblies"/>
    </assemblyBinding>
  </runtime>
</configuration>

As you can see, the <probing> element contains a privatePath attribute that tells the CLR where to look. If you want to add more than one subdirectory to the private path, you can concatenate them together using a semicolon-delimited string.

You should also remember that the CLR looks through a predetermined sequence of directories during probing. Given a privatePath value of MyAssemblies, the CLR will now probe for an assembly named MyLibrary in the following order:

C:\MyApp\MyLibrary.dll
C:\MyApp\MyLibrary\MyLibrary.dll
C:\MyApp\MyAssemblies\MyLibrary.dll
C:\MyApp\MyAssemblies\MyLibrary\MyLibrary.dll
C:\MyApp\MyLibrary.exe
C:\MyApp\MyLibrary\MyLibrary.exe
C:\MyApp\MyAssemblies\MyLibrary.exe
C:\MyApp\MyAssemblies\MyLibrary\MyLibrary.exe

The sequence of file paths in which the CLR probes for an assembly file is important because the probing process stops once the CLR locates an assembly file with the correct file name. If you deploy an application along with one version of MyLibrary.dll in the ApplicationBase directory and a second version in the subdirectory MyAssemblies, which DLL file will the CLR load? You should see that the CLR is going to load the DLL in the ApplicationBase directory because that is always the first directory searched by the CLR during the probing process.
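The probing order is easier to reason about when you can generate it. The following Python sketch is only a model of the sequence described in this column, not of the CLR's actual implementation. The function names probing_paths and first_match are my own, and the example assumes an ApplicationBase of C:\MyApp.

```python
import ntpath  # applies Windows path rules on any operating system


def probing_paths(app_base, name, private_path=""):
    """List candidate files in the probing order described in this column."""
    # ApplicationBase comes first, then each semicolon-delimited privatePath entry.
    subdirs = [d for d in private_path.split(";") if d]
    roots = [app_base] + [ntpath.join(app_base, d) for d in subdirs]
    paths = []
    for ext in (".dll", ".exe"):  # every .dll candidate precedes any .exe
        for root in roots:
            paths.append(ntpath.join(root, name + ext))
            paths.append(ntpath.join(root, name, name + ext))
    return paths


def first_match(paths, existing):
    """Probing stops at the first candidate that exists (None if no match)."""
    return next((p for p in paths if p in existing), None)


paths = probing_paths(r"C:\MyApp", "MyLibrary", "MyAssemblies")
deployed = {r"C:\MyApp\MyAssemblies\MyLibrary.dll", r"C:\MyApp\MyLibrary.dll"}
print(first_match(paths, deployed))
```

With both copies deployed, the sketch prints C:\MyApp\MyLibrary.dll, because the ApplicationBase directory is the first candidate in the list. Calling probing_paths without a private path returns just the four default candidates.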

The Global Assembly Cache

You cannot use private assembly deployment to share an assembly DLL across several applications on the same machine. The best way to do that is to install the assembly in the GAC.

Installing an assembly DLL in the GAC eliminates path dependency problems between an application and a dependent DLL. It doesn't matter where the ApplicationBase directory for an application is located. The CLR can always find and load an assembly DLL when it has been installed in the GAC.

Another significant benefit to using the GAC is that you can install as many versions of the same assembly DLL on a single machine as you want. This is valuable because different applications can load whichever version of MyLibrary.dll works best with them. This is an example of how the GAC facilitates side-by-side assembly deployment.

The GAC has been designed to be a secure repository of assemblies. Therefore, the CLR places two important restrictions on the installation of DLLs in the GAC. First, you cannot add assemblies or remove them from the GAC unless you have either Administrators or Power Users privileges on the target machine. Second, you cannot install an assembly in the GAC unless it contains a strong name. That's because the CLR has been designed to perform a strong-name verification check on the assembly's digital signature whenever you install an assembly DLL in the GAC. This ensures that the GAC only contains assemblies that have been signed by someone in possession of the proper private key.

It is interesting to note that the CLR does not perform strong-name verification checks on an assembly's digital signature when it loads assemblies from the GAC at run time. That's because the CLR assumes that assemblies in the GAC have already been verified. Therefore, assembly deployment in the GAC provides a small performance boost. The GAC is the only place from which you can load a strongly named assembly without paying the price of strong-name verification checks at run time.

Now let's discuss how the CLR manages the GAC internally. The CLR contains a system component called the assembly manager that takes on the responsibilities of storing assembly files in the GAC and loading them at run time when they are first used by an application. The assembly manager is loaded from the system component FUSION.DLL.

I must point out that the manner in which the assembly manager stores and retrieves assembly files on a target machine should be considered a private implementation detail of the CLR. I am going to describe some of these details simply to give you a better sense of how the GAC works. You should never design applications or use deployment techniques that rely on these private details because they are likely to change in future versions of the CLR.

The assembly manager stores assembly files in the GAC using a special directory structure in the Windows file system. This directory structure is created as a subdirectory in the Windows directory with a path that looks like this:

C:\Windows\assembly\GAC

When you install assemblies into the GAC, the assembly manager creates new directories to store them in. In fact, the assembly manager will create a unique directory for each assembly that is stored in the GAC. The reason for this is that the GAC must be able to accommodate two different assemblies whose names differ only by their public key value or by their version number. After all, there could be many different assemblies with the file name MyLibrary.dll. Therefore, the assembly manager creates a unique directory for each assembly using all four parts of the assembly name. For example, imagine you install version 1.0.24.0 of MyLibrary.dll in the GAC and this assembly has a public key token of 29989d7a39acf230. When you do this, the assembly manager creates a new directory with the following path:

C:\WINDOWS\assembly\GAC\MyLibrary\1.0.24.0__29989d7a39acf230

As you can see, the assembly manager uses its own internal naming scheme for directories when it stores an assembly in the GAC. When it's time for the assembly manager to load an assembly with a specific four-part name, it knows where to locate it because it follows the same naming scheme.
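To illustrate how all four parts of the name can map to one directory, here is a small Python sketch that rebuilds the path shown above. It is a guess at the naming scheme based only on that example: I am assuming the segment between the two underscores holds the culture and is empty for a culture-neutral assembly. The function name gac_directory is hypothetical, and as noted earlier, real code should never depend on this layout.

```python
import ntpath  # applies Windows path rules on any operating system


def gac_directory(name, version, culture, public_key_token,
                  gac_root=r"C:\WINDOWS\assembly\GAC"):
    """Build a GAC-style directory path from a four-part assembly name."""
    # Assumption: a neutral culture contributes an empty segment,
    # which would explain the double underscore in the example path.
    culture_part = "" if culture.lower() == "neutral" else culture
    leaf = "_".join([version, culture_part, public_key_token.lower()])
    return ntpath.join(gac_root, name, leaf)


print(gac_directory("MyLibrary", "1.0.24.0", "neutral", "29989D7A39ACF230"))
```

Because the version and public key token are part of the directory name, two assemblies that share the file name MyLibrary.dll but differ in either value end up in different directories.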

When you want to install an assembly in the GAC, you are not required to interact directly with FUSION.DLL. Instead, you use a utility that has been written to interact with FUSION.DLL for you. If you want to install an assembly into the GAC on a development workstation for testing, you can use a command-line utility named GACUTIL.EXE that ships with the Microsoft® .NET Framework SDK. GACUTIL.EXE provides many command-line switches for installing and managing assemblies in the GAC. For example, the /i switch is used for the regular installation of an assembly, like so:

GACUTIL.EXE /i MyLibrary.dll

In order to install an assembly in the GAC using this technique, you must have access to the assembly file. However, it's also important to note that the assembly manager makes a copy of the assembly file when it's installed in the GAC. After installation, the assembly manager is only concerned with the copy it made. That means you can delete the original assembly file after it has been installed in the GAC. If you installed the assembly file from a CD, you can remove the CD from the drive without any problems; if you installed the assembly file from the network, you can disconnect from the network without any problems as well.

You can also examine and manage the assemblies in the GAC using a GUI-based utility called the Assembly Cache Viewer, an administrative utility that runs transparently in Windows Explorer as a Windows shell extension named SHFUSION.DLL. This utility is located in the \Windows\assembly directory (see Figure 1).

Figure 1 The Assembly Cache Viewer


Note that the view supplied by the Assembly Cache Viewer does not actually show you the physical layout of the directory structure maintained by the GAC. Instead, it shows you a flattened-out view where all the assemblies are shown as a single scrollable list. You should also note that every assembly is displayed with its friendly name, version number, culture, and public key token.

The Assembly Cache Viewer provides administrators and developers alike a simple way to install and remove assemblies from the GAC. If you want to install an assembly, you can drag it from the Windows Explorer and drop it in the Assembly Cache Viewer. When you want to remove an assembly from the GAC, you can select it in the Assembly Cache Viewer and press the DELETE key on your keyboard.

Remember that FUSION.DLL is the only component that's allowed to read files from and write files to the GAC. When you manage assemblies using the Assembly Cache Viewer, you should understand that the Windows shell extension SHFUSION.DLL is interacting with FUSION.DLL behind the scenes to carry out your commands for you.

Deploying Assemblies with a Configured CodeBase

The final option for deploying a dependent assembly on a target machine is to configure it using a <codeBase> element. A configured <codeBase> element is powerful because it lets you download an assembly DLL from across the network. This means the CLR can download an assembly DLL to a target machine on demand the first time it is used by an application.

You can use a <codeBase> element to deploy a strongly named assembly in any directory on the target machine. You can even deploy a strongly named assembly on a file server or a Web server. You can then use a <codeBase> element to configure an app on a target machine to download that assembly on demand. If you plan on deploying your assemblies using a <codeBase> element, build them with a strong name to allow for this extra flexibility.

An example of an application-specific <codeBase> element is shown in Figure 2. Note that you can add a <codeBase> element to the machine.config file instead of the application configuration file to manage a dependent assembly on a machine-wide basis.

Figure 2 A <codeBase> Element

<!-- MyApp.exe.config -->
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity 
          name="MyLibrary" publicKeyToken="29989D7A39ACF230" />
        <codeBase
          version="1.0.24.0"
          href="https://www.AcmeCorp.com/Downloads/MyLibrary.dll"/>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>

Examine the <codeBase> element shown in Figure 2. Note that a <codeBase> element must be placed inside a <dependentAssembly> element. The <dependentAssembly> element in this example also has an inner <assemblyIdentity> element which has attributes to identify the friendly name and public key token of the dependent assembly being configured.

While a <dependentAssembly> element only contains one <assemblyIdentity> element, it can have many inner <codeBase> elements. That's because each separate version of an assembly needs a separate <codeBase> element. The <dependentAssembly> element in Figure 2 contains only a single <codeBase> element for version 1.0.24.0, but it's possible to add additional <codeBase> elements for other versions of this assembly.

Note that the <codeBase> element contains an href attribute in addition to the version attribute. The purpose of the href attribute is to provide the CLR with a Uniform Resource Identifier (URI) that allows it to determine the location of the assembly file. If you want to configure a dependent assembly on the local file system using a <codeBase> element, you should use an href attribute with a URI that looks like this:

href="file:///c:\AcmeCorpSharedAssemblies\MyLibrary.dll"

If you want to configure a dependent assembly for download from a file server using a Universal Naming Convention (UNC) path name, you should configure the href attribute with a URI that looks like this:

href="file://AcmeCorpFileServer1\Downloads\MyLibrary.dll"

If you want to configure a dependent assembly for download from a Web server using HTTP, you should configure the href attribute with a URI that looks like this:

href="https://www.AcmeCorp.com/Downloads/MyLibrary.dll"
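The three forms differ only in their URI prefix, which the following Python sketch uses to tell them apart. It is a rough illustration rather than the CLR's own URI handling. The function name classify_href and the labels it returns are my own inventions.

```python
def classify_href(href):
    """Classify a <codeBase> href by its URI prefix."""
    lowered = href.lower()
    if lowered.startswith("file:///"):   # three slashes: local file system
        return "local file"
    if lowered.startswith("file://"):    # two slashes: server name follows
        return "UNC share"
    if lowered.startswith(("http://", "https://")):
        return "web server"
    raise ValueError("unrecognized href: " + href)


for href in (r"file:///c:\AcmeCorpSharedAssemblies\MyLibrary.dll",
             r"file://AcmeCorpFileServer1\Downloads\MyLibrary.dll",
             "http://www.AcmeCorp.com/Downloads/MyLibrary.dll"):
    print(classify_href(href), "<-", href)
```

The local form must be tested before the UNC form because every string that starts with file:/// also starts with file://.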

Keep in mind that you can also configure a <codeBase> element for a dependent assembly using the .NET Framework Configuration administrative tool, MSCORCFG.MSC, a convenient GUI-based Microsoft Management Console (MMC) snap-in that can be launched from a shortcut in the Administrative Tools group under the Windows Start menu. This eliminates the need to deal directly with the XML that goes into an application configuration file or into the machine.config file. This utility's UI is shown in Figure 3.

Figure 3 .NET Framework Configuration


The .NET Framework Configuration administrative tool is convenient because it can do the work of creating an application configuration file for you. You can configure an application by simply interacting with standard Windows controls such as textboxes, radio buttons, and checkboxes. When you configure an application in this manner, the Microsoft .NET Framework Configuration administrative tool does the grunt work of creating the appropriate XML content and adding it to the application configuration file.

The Download Cache

Now let's discuss what happens when the assembly manager of the CLR downloads a dependent assembly from across the network using a <codeBase> element. The assembly manager does not load the assembly file directly into memory within a running application. Instead, it downloads the assembly file and writes it to disk in a temporary storage area known as the download cache.

There is an obvious advantage to this caching scheme. It isn't necessary to copy the assembly file across the network more than once. The assembly file only needs to be downloaded by the assembly manager and saved to disk the first time it's used by an application. After that, the assembly manager can load the assembly file from the download cache on the local hard drive.

There are two important differences between the download cache and the GAC. The GAC is truly a machine-wide repository while the download cache is not. The download cache is actually managed by the CLR on a user-by-user basis. For example, if a user named Bob runs an application that downloads an assembly file from across the network using a <codeBase> element, the assembly manager will store the assembly file within a private subdirectory for this user under the following path:

C:\Documents And Settings\Bob\Local Settings\Application Data\

The other big difference is that the assembly manager treats the GAC as a secure and fully trusted repository of assemblies, because only privileged users can install assemblies there and each one passes a strong-name verification check at installation. The download cache is less secure and not fully trusted. Therefore, the assembly manager treats assemblies loaded from the download cache as mobile code that is subject to additional security restrictions. This is important to know because the CLR runs mobile code in a restricted sandbox to protect the host machine from attacks.

Assembly Loading

Now it's time to discuss what happens at run time when the assembly manager needs to load a dependent assembly into a running application. The assembly manager starts by determining the four-part name of the assembly it's looking for, expressed as a format string. It usually determines this format string by looking at a reference in the manifest of another assembly.

Let's walk through an example. Imagine you run MyApp.exe and it executes the first line of code that requires the CLR to load MyLibrary.dll. The assembly manager examines the assembly manifest for MyApp.exe and discovers the four-part name of the assembly file for MyLibrary.dll that was present when the consuming app was compiled. If MyLibrary.dll has a strong name, the assembly manager generates a format string containing a public key token, like this:

  MyLibrary,
  Version=1.0.24.0,
  Culture=neutral,
  PublicKeyToken=29989D7A39ACF230
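To make the structure of this format string concrete, the following Python sketch splits it back into its four parts. It assumes the simple comma-separated layout shown above and does not try to handle every case a real display name might contain. The function name parse_display_name and the "Name" key are my own.

```python
def parse_display_name(display_name):
    """Split an assembly format string into its four named parts."""
    parts = [p.strip() for p in display_name.split(",")]
    fields = {"Name": parts[0]}          # the friendly name has no key
    for part in parts[1:]:
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


name = ("MyLibrary, Version=1.0.24.0, Culture=neutral, "
        "PublicKeyToken=29989D7A39ACF230")
for key, value in parse_display_name(name).items():
    print(key, "=", value)
```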

At this point in the process, the assembly manager has the four-part name of the dependent assembly that was present during the compilation of MyApp.exe. In the next step, the assembly manager looks to see if there is any configuration information to redirect the application to use a different version number. Let's assume for the purposes of this discussion that the version number is not going to be redirected.

Once the assembly manager knows the four-part name of the assembly that it's looking for, it is ready to begin searching for the assembly file. The assembly manager conducts its search in the following order. First, it searches for the assembly in the GAC. Next, it searches for the assembly using configuration information in a <codeBase> element. Finally, it searches for the assembly by probing within the ApplicationBase directory.

The fact that the assembly manager always looks in the GAC first has an important implication. Think about what would happen if you deployed three copies of the same assembly. For example, imagine you have put one copy of the assembly DLL in the ApplicationBase directory, you have deployed a second copy using a <codeBase> element, and you have installed a third copy in the GAC. The assembly manager will always load the assembly from the GAC. Once it finds an assembly in the GAC, it stops the search.

If the assembly manager looks for a configured <codeBase> element and finds one, it then checks to see if the URI points to a location on the local hard disk. If it does, the assembly manager loads the assembly file directly from that location on disk.

If the URI in the <codeBase> element points to an assembly file on another machine, the assembly manager knows that it can only load a local copy of the assembly file from the download cache. Therefore, it checks the download cache to see if it has already been downloaded. If it hasn't, then the assembly manager copies it from across the network to the download cache. Once the assembly file is in the download cache, the assembly manager will load it.

If the assembly manager cannot find an assembly file in the GAC and it sees that the assembly has no configured <codeBase> element, it resorts to probing the ApplicationBase directory. The assembly manager conducts the probing process by looking directly in the ApplicationBase directory and then by looking in various subdirectories according to the rules outlined earlier in the section on private assemblies.
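The search described in this section can be summarized in a short Python sketch. It is a simplified model under stated assumptions, not a description of FUSION.DLL: every argument is a hypothetical stand-in (plain dictionaries keyed by assembly name or href), the "download" is faked with a string, and Python's FileNotFoundError merely stands in for the exception the CLR would raise.

```python
def locate_assembly(name, gac, code_bases, download_cache, probe_hits):
    """Return (source, location) using the GAC, codeBase, probing order.

    gac and probe_hits map assembly names to paths, code_bases maps names
    to href strings, and download_cache remembers earlier downloads.
    """
    if name in gac:                        # 1. the GAC always wins
        return ("gac", gac[name])
    href = code_bases.get(name)
    if href is not None:                   # 2. a configured <codeBase>
        if href.lower().startswith("file:///"):
            return ("codebase-local", href)   # load straight from local disk
        if href not in download_cache:     # copy across the network only once
            download_cache[href] = "cached copy of " + href
        return ("download-cache", download_cache[href])
    if name in probe_hits:                 # 3. probing under ApplicationBase
        return ("probing", probe_hits[name])
    raise FileNotFoundError(name)          # nothing found anywhere


cache = {}
source, location = locate_assembly(
    "MyLibrary",
    gac={"MyLibrary": r"C:\WINDOWS\assembly\GAC\MyLibrary"},
    code_bases={"MyLibrary": "http://www.AcmeCorp.com/Downloads/MyLibrary.dll"},
    download_cache=cache,
    probe_hits={"MyLibrary": r"C:\MyApp\MyLibrary.dll"},
)
print(source)
```

With all three copies deployed, the sketch reports the GAC as the source and never touches the download cache, which matches the three-copy scenario discussed above.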

As you can imagine, many things can go wrong when the CLR attempts to load an assembly. For example, if the assembly manager cannot find the assembly file while probing within the ApplicationBase directory, it throws a FileNotFoundException to the line of code that caused the assembly manager to begin looking for the assembly in the first place. The CLR likewise throws a FileNotFoundException when it cannot locate the assembly file configured in a <codeBase> element. When it finds a file but determines that the assembly it is attempting to load does not have the correct version number, culture setting, or public key value, it throws a FileLoadException instead.

Conclusion

This month I described the three primary ways to deploy an assembly DLL. Private assembly deployment is valuable because it provides all the advantages of XCOPY deployment. But it's limited because you must deploy the assembly DLL within the ApplicationBase directory structure. Assembly deployment in the GAC is useful because it lets you share an assembly DLL across several different applications regardless of where the ApplicationBase directory for each application is located. The GAC also provides the advantage of being a secure and fully trusted repository of assemblies. Finally, deploying an assembly DLL using a <codeBase> element allows you to deploy a strongly named assembly anywhere on the local hard drive or at any accessible location across the network. There are two main advantages to using a <codeBase> element. First, you are not restricted to deploying the assembly DLL within the ApplicationBase directory structure. Second, you can configure an assembly DLL to be downloaded by the CLR on demand from across the network.

Send your questions and comments for Ted to instinct@microsoft.com.

Ted Pattison is an independent contractor living in Los Angeles, CA. Ted is the author of Programming Distributed Applications with COM+ and Visual Basic 6.0 (Microsoft Press, 2000). His new book, Building Applications and Components with Visual Basic .NET, will be available in October 2003 from Addison-Wesley. Contact Ted at TedP@SubliminalSystems.com.