Microsoft .NET: Implement a Custom Common Language Runtime Host for Your Managed App

Steven Pratschner
This article assumes you're familiar with C++ and the .NET Framework
Download the code for this article: Clr.exe (40KB)
Browse the code for this article at Code Center: CLRHOST
SUMMARY: While most application developers may not need to write a custom host, understanding what is involved provides a great deal of insight into the architecture of the CLR. After covering how the CLR is started and loaded into a process, how to set the available configuration options, and how a host defines application domains, this article explains how to design a custom host. Important concepts include making the right decisions about the application domain boundaries for the host, configuring them correctly, loading and executing user code, and resolving references to assemblies. Setting security policy and unloading application domains as the application shuts down are also explained.

The common language runtime (CLR) is the foundation upon which the Microsoft® .NET strategy is built. The CLR provides an execution environment that manages running code and provides services that make software development easier. These services include automatic memory management, cross-language integration, interoperability with existing code and systems, simplified deployment, and a finely grained security system.
      The CLR is flexible enough to run a variety of different types of applications. For example, the benefits provided by the CLR apply equally well to console applications, Web server scripts, downloaded controls, traditional Win32®-based applications, database queries, macros in business productivity applications, and so on. In fact, the CLR can add value to most scenarios in which code is written and executed.
      In the future, support for different types of common language runtime applications will be built into operating systems, but today each application type requires a piece of code to get it up and running. This piece of code is referred to as a CLR host. Specifically, a host is responsible for loading the CLR into a process, defining the application domains within the process, and executing user code within those domains. I'll explain application domains and user code later in this article when I discuss how to write your own custom host for the CLR.
      Examples of hosts that ship with the .NET Framework include:
ASP.NET
An ISAPI filter that ships with ASP.NET is responsible for starting the CLR and initializing the plumbing needed to route Web requests to the ASP.NET processes.
Internet Explorer
The .NET Framework ships with a MIME filter that hooks into Internet Explorer 5.01 or later to execute managed code controls that are referenced from HTML pages.
Shell Executables
Each time an executable is launched from the shell, a small piece of unmanaged code gets invoked that transitions control to the CLR.
Other hosts could include:
Database Engines
A future version of Microsoft SQL Server will allow stored procedures to be written in languages that support the .NET Framework and are executed with the CLR.
Personal Organizers
Several e-mail/calendar/contact programs allow users to write scripts to customize the processing of e-mail messages, appointments, and so on. It's easy to imagine these scripts running on the CLR. The security system provided by the CLR is especially important in this scenario because of the proliferation of viruses spread by e-mail systems.
      To understand the tasks that your custom host must perform and how to implement your host to perform them, you first need to know how the CLR is started, configured, and loaded into a process, and how a particular version of the CLR is selected when several are available.

Starting the CLR

      In order to start running managed code in a process, the CLR must be loaded and initialized and, as I just described, the host is responsible for loading the CLR. Because all hosts must start with an unmanaged stub, the .NET Framework provides a set of unmanaged APIs the host can use to get the CLR running.
      Several versions can be installed and run simultaneously (also described as the CLR being fully "side-by-side") primarily to offer administrators greater flexibility in upgrading to new releases. Previous runtimes shipped by Microsoft, including the Visual Basic® runtime, the Java virtual machine, and the COM infrastructure, all forced administrators to upgrade to a new version even if only one application required it or if the runtime in question happened to be shipped in a service pack containing fixes to other Windows® components that the administrator wanted to install.
      While the flexibility of side-by-side is great for an administrator, it makes the job of hosting more difficult because the host must decide how to operate in the presence of multiple versions of the CLR, and it must pick a particular CLR version to load into a given process. Although multiple versions of the CLR may exist on a given machine, only one version may run in a particular process. So once the host chooses which version to load, all managed code that runs in that process will use that version of the CLR.
      The CLR's implementation of side-by-side requires the use of a startup shim. The shim is a thin piece of code that accepts a version number and other startup parameters from the host and starts the CLR. Only one version of the shim exists on a given machine and that version is installed on the machine's default search path (currently %windir%\system32). The shim is kept as small and straightforward as possible to ensure its compatibility with future versions of the CLR.
      The startup shim is implemented in mscoree.dll, while the bulk of the CLR's execution engine is implemented in mscorsvr.dll or mscorwks.dll, depending on whether you are running the server or workstation build (I'll describe this later on). Mscorsvr.dll and mscorwks.dll are installed in subdirectories of %windir%\Microsoft.NET\Framework, which are named by version number. All of the DLLs that implement a given version of the CLR are installed in one directory; they are not scattered across several directories on the machine. Figure 1 shows the relationship between the shim and the core CLR DLLs, and where they are installed.

Figure 1 Shim and CLR DLLs

      As described, the primary role of the shim is to accept a version number, then branch to the appropriate implementation of the CLR installed on the machine. Figure 2 describes some of the most commonly used shim APIs.

Loading the CLR into a Process

      Hosts call the CorBindToRuntimeEx API to load the CLR into a process. There are four values a host can set when calling CorBindToRuntimeEx. These settings control which CLR gets loaded and how basic functions like garbage collection and class loading will behave in the process. The four settings are: version, server versus workstation, concurrent GC, and loader optimization. The version setting determines which version of the CLR gets loaded. The server versus workstation setting specifies whether the workstation build or the server build is loaded. Concurrent GC specifies whether garbage collection (GC) is done concurrently or not. Finally, loader optimization controls whether assemblies are loaded domain-neutrally.
      A host has direct control over how the CLR is loaded by specifying values for each of these settings. However, each setting is optional. Default values are used if a particular parameter isn't supplied. The following four sections describe each of these settings in detail.
      In addition to these four settings, a host can also request an interface pointer to one of the COM interfaces exposed by the CLR. The most common interface for hosts to request is ICorRuntimeHost. I'll describe this interface later in the article as well, but in general this interface allows the host to fine-tune these options and to begin creating application domains and running user code in the process. Figure 3 shows how to call CorBindToRuntimeEx and get back a pointer to ICorRuntimeHost.

Version Setting

      The version setting specified in the pszVersion parameter dictates which version of the CLR to load. The pszVersion parameter to CorBindToRuntimeEx is a string that identifies the subdirectory under %windir%\Microsoft.NET\Framework that contains the specific version of the CLR the host wants to load. Note that the letter "v" must precede the actual version number in this string (for example, v1.0.2212).
      If null is passed for this parameter, the host is delegating the decision about which version to load to the per-machine, per-user, or application-specific settings. Unless otherwise configured, the latest version of the CLR is loaded.
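      The mapping from a version string to a directory can be sketched in portable C++. This is only a toy model, not the shim's implementation: the function names, the list of installed versions, and the rule that "latest" means the numerically greatest version are assumptions made for illustration, and the per-machine, per-user, and application-specific settings mentioned above are ignored. The second version string used in the usage note is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Parses "v1.0.2212" into {1, 0, 2212}. Returns an empty vector if the
// string lacks the leading 'v' or has a non-numeric or oversized component.
std::vector<int> parseVersion(const std::string& text) {
    std::vector<int> parts;
    if (text.size() < 2 || text[0] != 'v') return parts;
    std::istringstream in(text.substr(1));
    std::string piece;
    while (std::getline(in, piece, '.')) {
        const bool digitsOnly = !piece.empty() && piece.size() <= 9 &&
            std::all_of(piece.begin(), piece.end(),
                        [](unsigned char c) { return std::isdigit(c) != 0; });
        if (!digitsOnly) return std::vector<int>();
        parts.push_back(std::stoi(piece));
    }
    return parts;
}

// Returns the directory this model would load the CLR from, or an empty
// string on failure. `requested` plays the role of pszVersion; nullptr
// means "no explicit version", which this model treats as "pick the latest".
std::string resolveRuntimeDirectory(const std::string& frameworkRoot,
                                    const char* requested,
                                    const std::vector<std::string>& installed) {
    std::string chosen;
    if (requested != nullptr) {
        const std::string wanted(requested);
        if (parseVersion(wanted).empty()) return std::string();
        if (std::find(installed.begin(), installed.end(), wanted) == installed.end())
            return std::string();
        chosen = wanted;
    } else {
        std::vector<int> best;  // empty compares lower than any parsed version
        for (const std::string& name : installed) {
            const std::vector<int> candidate = parseVersion(name);
            if (!candidate.empty() && candidate > best) {
                best = candidate;
                chosen = name;
            }
        }
        if (chosen.empty()) return std::string();
    }
    return frameworkRoot + "\\" + chosen;
}
```

      With installed versions v1.0.2212 and v1.0.2914, requesting "v1.0.2212" yields the v1.0.2212 subdirectory, passing nullptr yields the v1.0.2914 subdirectory, and a request without the leading "v" fails.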

Server versus Workstation Setting

      The .NET Framework ships with two builds of the CLR: a workstation build and a server build. This setting specifies whether the workstation build or the server build is loaded. These builds are tuned to provide optimal performance for client applications and multiprocessor server scenarios, respectively. Specifically, the server build takes advantage of multiple processors so garbage collection can be done on each processor in parallel.
      If null is passed for this parameter, the workstation build is loaded. In addition, when running on a single-processor machine, the workstation build will always be loaded, even if svr is requested by the host. The reason for this restriction is performance: the workstation build always outperforms the server build on single-processor machines.
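      The selection rule in the last two paragraphs is small enough to express as a function. The sketch below is a model of that rule only; the function name and the processorCount parameter are hypothetical, and the string "wks" for the workstation build is an assumption, since the article names only svr.

```cpp
#include <string>

// Returns the build flavor this model would load: "svr" or "wks".
// `requested` may be nullptr, mirroring a null build parameter.
std::string selectBuild(const char* requested, unsigned processorCount) {
    const bool wantsServer =
        requested != nullptr && std::string(requested) == "svr";
    // Single-processor machines always get the workstation build,
    // even when the host asks for the server build.
    if (wantsServer && processorCount > 1) return "svr";
    return "wks";
}
```

      Under this model, selectBuild("svr", 1) and selectBuild(nullptr, 4) both return "wks", while selectBuild("svr", 2) returns "svr".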

Concurrent GC Setting

      The CLR's garbage collector can be run in one of two modes: concurrent or nonconcurrent. This setting specifies whether or not garbage collection is done concurrently. A host turns on concurrent GC by passing the STARTUP_CONCURRENT_GC flag in the dwFlags parameter to CorBindToRuntimeEx. If this flag is not set, nonconcurrent GC is used.
      When running concurrent GC, collections are done on background threads instead of the threads that run user code. By user code, I mean any managed code that is not specifically part of the host. For example, to the Internet Explorer host, user code is the managed controls and script that make up the HTML pages. To the host that runs executables from the shell, the user code is the code contained in the executable being launched. Because collections happen off the threads that run user code, applications running with concurrent GC can provide a more responsive user interface. However, although the application is more responsive, the overall performance of the garbage collection is slower. Concurrent GC is used almost exclusively for applications with complex user interfaces.
      In contrast, nonconcurrent GC performs collections on the same threads that run user code. The application is less interactive, but the overall performance of the GC is better than with concurrent GC. Non-concurrent GC is almost always used for server applications like Web servers or database servers. In fact, if the server build of the CLR is requested on a uniprocessor machine, concurrent GC will never be used.

Loader Optimization Setting

      This fourth setting controls whether or not assemblies are loaded domain-neutrally. To understand what this means and which setting to choose, you need to understand the CLR's definition of application domains. Operating systems and runtimes typically provide some form of isolation between applications running on the system. This isolation is necessary to ensure that code running in one application cannot adversely affect other unrelated applications. In Windows, this isolation has historically been achieved using process boundaries. In this model, an application runs in exactly one process, so no other applications are affected if the application crashes.
      The .NET platform has similar needs for isolation, but there are many scenarios in which crossing a process boundary is too expensive, since each crossing involves a thread switch, the resetting of call stacks, and so on.
      With .NET, user code can be verified to be type-safe, so several applications can run in the same process and guarantee that one application can't bring down the whole process. This helps provide isolation at a lower cost than the process boundary. The CLR allows multiple applications to be run in a single operating system process by using a construct called an application domain to isolate those applications from one another.
      In many respects, application domains are the CLR's equivalent of an operating system process. As such, user code is isolated to the domain in which it is loaded. That is, the code cannot be directly called from outside the containing application domain, nor can it make direct calls to code loaded in other domains. If a single assembly is used by several applications in the same process, the CLR will load multiple copies of it by default, one for each domain in which the assembly is used.
      To maintain isolation, each domain will have its own copy of the user's code and the data structures the CLR builds when executing the code. In many cases, this can be optimized so that the read-only CLR data structures are shared among all domains within the process. This optimization can significantly reduce memory usage for scenarios in which the same assembly is commonly used by a number of applications in the process. Assemblies loaded in this fashion are said to be domain-neutral.
      Although domain-neutral code consumes less memory, it does run a bit more slowly. The slower performance is related to the way in which the assembly's static variables and methods are accessed. A separate copy of the static variables must be maintained for each domain to prevent object references from leaking across domains by passing them as static variables. As a result, the CLR must maintain tables that map a given caller to the appropriate copy of the static variable. The indirection through these lookup tables causes the code to run more slowly. Access to nonstatic data and methods is equivalent regardless of whether the optimization is enabled or not. (I'll explain how to configure application domains for your host later in this article.)
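      The per-domain lookup described above can be pictured with a small C++ model. This is not how the CLR stores statics internally; the class name, the use of a domain name as the key, and the hash map are all assumptions chosen to show the idea that one shared piece of code reaches a different copy of a static variable depending on the calling domain.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Toy model of a single static variable that belongs to a
// domain-neutral assembly shared by several application domains.
class DomainNeutralStatic {
public:
    explicit DomainNeutralStatic(int initialValue) : initial_(initialValue) {}

    // Every access goes through a table keyed by the calling domain.
    // This extra lookup is the indirection that slows static access.
    int& forDomain(const std::string& domainName) {
        auto it = copies_.find(domainName);
        if (it == copies_.end()) {
            // First access from this domain: create its private copy.
            it = copies_.emplace(domainName, initial_).first;
        }
        return it->second;
    }

    // Number of domains that currently hold a copy.
    std::size_t copyCount() const { return copies_.size(); }

private:
    int initial_;
    std::unordered_map<std::string, int> copies_;
};
```

      Writing 5 through forDomain("DomainA") leaves the value seen through forDomain("DomainB") at its initial value, which is the isolation property the separate copies exist to preserve.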
      The CLR allows the host to control this loader optimization by setting the dwFlags parameter when calling CorBindToRuntimeEx. For example, the ASP.NET host takes advantage of this feature to optimize the use of assemblies like System.WebForms and System.Data. The host can specify one of the three values shown in Figure 4.
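      Because the concurrent GC flag and the loader optimization value travel in the same dwFlags parameter, a host combines them with bitwise operations. The sketch below shows one way to do that in portable C++. The constants are local stand-ins declared here so the code compiles without the SDK headers; their layout (one bit for concurrent GC, a two-bit field for loader optimization) is an assumption based on my reading of mscoree.h and should be checked against the header you build with. The helper function names are hypothetical.

```cpp
#include <cstdint>

// Local stand-ins for the startup flag values. Assumed layout:
// bit 0 selects concurrent GC, bits 1-2 hold the loader optimization.
enum ModelStartupFlags : std::uint32_t {
    MODEL_CONCURRENT_GC                = 0x1,
    MODEL_LOADER_OPT_SINGLE_DOMAIN     = 0x1 << 1,
    MODEL_LOADER_OPT_MULTI_DOMAIN      = 0x2 << 1,
    MODEL_LOADER_OPT_MULTI_DOMAIN_HOST = 0x3 << 1,
    MODEL_LOADER_OPT_MASK              = 0x3 << 1
};

// Builds a dwFlags-style value from the two independent choices.
// Bits outside the loader optimization field are discarded.
std::uint32_t makeStartupFlags(bool concurrentGc,
                               std::uint32_t loaderOptimization) {
    std::uint32_t flags = loaderOptimization & MODEL_LOADER_OPT_MASK;
    if (concurrentGc) flags |= MODEL_CONCURRENT_GC;
    return flags;
}

// Extracts the loader optimization field from a combined value.
std::uint32_t loaderOptimizationOf(std::uint32_t flags) {
    return flags & MODEL_LOADER_OPT_MASK;
}

// Reports whether the concurrent GC bit is set.
bool usesConcurrentGc(std::uint32_t flags) {
    return (flags & MODEL_CONCURRENT_GC) != 0;
}
```

      Masking on the way in keeps a stray bit in the loader optimization argument from silently turning on concurrent GC.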

ICorRuntimeHost

      As I mentioned earlier, this interface allows the host to set more granular options and to begin creating application domains and running user code in the process. Specifically, ICorRuntimeHost allows a host to access numerous additional configuration parameters, explicitly control when the CLR is started and stopped, and obtain a pointer to an initial application domain (that is, create an application domain) and transition into managed code.
      The GetConfiguration method on ICorRuntimeHost provides access to an interface called ICorConfiguration that can be used to configure specific aspects of the CLR that will be loaded into the process, or to register for additional events. For example, a host could use GetConfiguration to register a callback function to receive notification that a particular thread is about to be stopped in the debugger, or to specify the size of the GC heap. See Figure 5 for a description of the methods on ICorConfiguration.
      ICorRuntimeHost's Start and Stop methods allow a host to explicitly control the lifetime of the CLR within the process. The host isn't required to explicitly call these methods, since Start is implicitly called when the first managed code is run in the process and Stop is implicitly called when the process shuts down. However, there are scenarios in which it is useful to call these methods. For example, a host may know that it is finished running managed code and may want to stop the CLR to cause it to release the memory and resources it is using. Note that once the CLR is unloaded from a process, it cannot be started in that process again.
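      The lifetime rules in this paragraph amount to a three-state machine: not started, running, and stopped, with no path back from stopped. The following C++ model captures only that rule. It does not call ICorRuntimeHost; the class and method names are hypothetical.

```cpp
enum class RuntimeState { NotStarted, Running, Stopped };

// Toy model of the CLR's lifetime inside one process.
class RuntimeLifetimeModel {
public:
    // Explicit start. Returns true if the runtime is running afterwards,
    // false if it was already stopped and therefore cannot be restarted.
    bool start() {
        if (state_ == RuntimeState::Stopped) return false;
        state_ = RuntimeState::Running;
        return true;
    }

    // Running managed code starts the runtime implicitly when needed.
    bool runManagedCode() { return start(); }

    // Stopping only has an effect while the runtime is running.
    void stop() {
        if (state_ == RuntimeState::Running) state_ = RuntimeState::Stopped;
    }

    RuntimeState state() const { return state_; }

private:
    RuntimeState state_ = RuntimeState::NotStarted;
};
```

      A host that calls stop() to reclaim memory therefore needs to be sure it will not run managed code again in that process, because every later start() in this model fails.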
      In order to begin running managed code, a host must obtain a pointer to an application domain. In many cases, this is the default domain within the process, but the host can also create additional domains, as you will see later in this article.

Designing Your Host's Architecture

      Now that you've seen how the CLR is started and initialized, you can start making decisions about how to write your own host. This section describes the architecture of a typical CLR host. By typical architecture, I mean the architecture that performs best, is easiest to write, and offers the host the most flexibility. Most hosts consist of both unmanaged and managed code. The unmanaged code, of course, is responsible for configuring the CLR, loading it into the process, and transitioning the program into managed code. The managed portion of the host is typically responsible for creating the domains that user code will run in and dispatching user requests to those domains.
      Hosts typically contain both unmanaged and managed code for two reasons. The first is performance. There is a cost associated with calling across the managed/unmanaged boundary. It is generally a good idea to transition into managed code once and stay there instead of continually transitioning from the unmanaged host code to the managed user code. The second reason is ease of implementation. The overall goal of the CLR is to make code easier to write, so you should use it as much as possible.
      A portion of the host is likely to be written as a .NET assembly since all managed code must be in a .NET assembly. Therefore, the host must decide the application domain in which to run the managed hosting code. Each process has a default domain that is well-suited for this purpose. The default domain is created automatically by the CLR every time it is initialized into the process. When the process shuts down, the default domain is unloaded. Most hosts don't run user code in the default domain because it can't be shut down independently of the process.
      An interface pointer to the default domain can be obtained by calling ICorRuntimeHost::GetDefaultDomain. The pointer that is returned points to the instance of System.AppDomain that represents the default domain. This interface pointer is of type _AppDomain and is generated automatically by the COM interop layer of the CLR. In short, the host will be calling methods on an instance of the managed class System.AppDomain through COM interop. As I mentioned previously, it is generally advantageous for performance reasons to keep the number of calls across the unmanaged/managed boundary to a minimum. Figure 6 shows this architecture.

Figure 6 A Typical Host Architecture

      Figure 7 illustrates how to obtain the default domain and load the managed hosting code into it. In this example, the managed hosting code is contained in an assembly called MyManagedHost.dll. The sample code creates an instance of a type called HostProcessRequest in MyManagedHost.dll. Error checking has been omitted for brevity.

Appropriate Domain Boundaries

      As I just described, application domains are a means for isolating an application within a process. The definition of what an application means to a particular host, and therefore where the application domain boundaries lie, is one of the most critical decisions a host must make. For example, to the ASP.NET host an application corresponds to a vroot as defined in the Web server's admin tool. To a database server such as SQL Server, an application may correspond to a particular database.
      When determining how an application is defined by your host, you need to consider several conditions:
Code Isolation
Direct calls between two types are only allowed if both types are in the same application domain. All calls coming into or going out of the domain are indirected through proxies. In short, if a host wants to ensure that code in two .NET assemblies cannot directly call each other, those assemblies must be loaded in different domains.
Configuration and Application Isolation
Isolating both the configuration data and the location from which private components (assemblies) are loaded is critical for the ability of a developer to build an application that cannot be affected by changes made to the system on behalf of other applications. Most commonly, an application is rooted in a particular directory in the file system. Requests to load private assemblies will only be made in that directory and its subdirectories. For example, the Internet Explorer host defines an application per Web site by default. The root directory of the site is considered the root directory for the application.
Security
A host has a high degree of control over the permissions that code receives when running in a given application domain, as I'll discuss later. For example, a host may want to require that all code running in a particular domain come from a certain area on disk, or that all code running in a domain is signed by a particular publisher. Hosts can also define this security policy based on custom data (or evidence, which I'll explain later).
      Consider the scenario in which a host has some notion of user identity outside of that provided by Windows. In this case, the host may want to partition domains based on user account and define custom security policy that enforces that only code running under that person's account is allowed in a particular domain.
Unloading
Application domains are the unloading boundary in the CLR. Nothing smaller than a domain, such as an assembly or type, can be unloaded. The ability to unload and reload code often affects how a host determines domain boundaries.

Configuring Application Domains

      Once inside managed code, the host will likely want to create application domains in which to run user code based on the aforementioned criteria (security, isolation, unloading, and so on). Most hosts don't run user code in the default domain for a few reasons. First, the default domain cannot be unloaded until the process exits. Also, for security and isolation reasons it doesn't make sense for the hosting code and user code to run in the same domain.
      There are various properties a host can set on an application domain that control everything from how assemblies are found to whether DLLs are locked (and therefore can't be dynamically replaced) when loaded. These properties are defined by the managed class System.AppDomainFlags. This class's two most important properties are ApplicationBase and ConfigurationFile.
      If the host intends to load assemblies from disk, ApplicationBase will almost always be set. ApplicationBase defines the root directory for the application. The CLR will always start by looking in the ApplicationBase when resolving references to assemblies. By defining an ApplicationBase, the host allows assemblies to be private in relation to a particular application. Application-private assemblies are a key element in the ability to create an isolated application.
      The ConfigurationFile property specifies an XML file that contains settings used to configure the application running in the domain. Examples of settings in the application configuration file include assembly versioning rules and information about how to locate types that are accessed remotely by the application.
      The following sample code sets ApplicationBase and ConfigurationFile before creating a domain.

  IDictionary properties = new Hashtable(2);

  // CLR loader settings
  properties.Add(AppDomainFlags.ApplicationBase,
                 "c:\\program files\\myapp");
  properties.Add(AppDomainFlags.ConfigurationFile,
                 "c:\\program files\\myapp\\myapp.config");

  // Create the domain with no evidence and the loader settings above
  AppDomain appDomain = AppDomain.CreateDomain("MyDomain",
                                               null,
                                               null,
                                               properties);

      See the documentation for System.AppDomainFlags in the .NET Framework SDK for descriptions of the other properties that can be used to configure an application domain.

Loading and Executing User Code

      Now that the host has created and configured one or more application domains, the next step is to execute user code in those domains.
      All code that is run in the CLR must be part of an assembly. An assembly is the deployment unit for types and resources, and is the primary unit of execution in the CLR. The manner in which assemblies are loaded for execution depends largely on the host's specific scenario. In general, there are two options. The first option is to load precompiled assemblies from disk. The methods Assembly.Load, Assembly.LoadFrom, and AppDomain.CreateInstance are commonly used to load precompiled assemblies.
      The second option is to create assemblies on the fly using the APIs in the System.Reflection.Emit namespace. ASP.NET uses these APIs to dynamically create assemblies that correspond to .aspx pages in a Web application. These assemblies can then be directly run in a given domain and thrown away. Of course, the Reflection.Emit namespace does offer the capability to persist an assembly to disk as well.

Resolving References to Assemblies

      The CLR has a well-defined set of rules for resolving references to assemblies, including probing in the ApplicationBase and looking in the global assembly cache. These default rules may not be sufficient in some hosting scenarios, especially if the host is creating assemblies on the fly using Reflection.Emit. In this case, there may not be a persisted assembly to find.
      The CLR provides a hook into the class-loading process that allows a host to plug in its own rules for how assembly references are resolved. This hook is the TypeResolveEvent on the System.AppDomain class. This event takes a delegate of type ResolveEventHandler. ResolveEventHandler has the following signature:

  public delegate Assembly ResolveEventHandler(
      object sender,
      EventArgs e);

      If the CLR does not find an assembly given its default rules, it raises this event by passing the identity of the assembly it is looking for as the EventArgs parameter. Hosts that receive this event are free to resolve the reference to the assembly by any means they see fit. They may construct an assembly on the fly, find it in a custom location on disk, and so on, as long as they construct an instance of System.Reflection.Assembly to return from the delegate. The sample code in Figure 8 demonstrates how a host registers to receive the TypeResolveEvent.
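      The order of operations described here (default rules first, host handler only on failure) can be modeled in a few lines of C++. This sketch does not use the CLR's event or delegate types. The class name, the map of known assemblies standing in for the default probing rules, and the convention that a handler returns an empty string when it cannot help are all assumptions made for illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy model of assembly resolution with a host-supplied fallback.
class AssemblyResolverModel {
public:
    // A handler receives the requested assembly name and returns a
    // description of where it obtained the assembly, or an empty
    // string if it cannot resolve the reference.
    using ResolveHandler = std::function<std::string(const std::string&)>;

    // Registers an assembly that the "default rules" can find.
    void addKnownAssembly(const std::string& name,
                          const std::string& location) {
        known_[name] = location;
    }

    // Installs the host's fallback, replacing any previous one.
    void setResolveHandler(ResolveHandler handler) {
        handler_ = std::move(handler);
    }

    // Default rules run first; the handler is consulted only if they fail.
    std::string resolve(const std::string& name) const {
        const auto it = known_.find(name);
        if (it != known_.end()) return it->second;
        if (handler_) return handler_(name);
        return std::string();
    }

private:
    std::map<std::string, std::string> known_;
    ResolveHandler handler_;
};
```

      A host that generates assemblies on the fly would install a handler that recognizes its own generated names and leaves every other name unresolved, so that ordinary references still fail in the usual way.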

Setting Security Policy

      The code access security system provided by the .NET Framework is designed to allow administrators to make finely grained decisions about whether a given piece of code can access a particular resource. Decisions about what code is allowed to do are based on characteristics of the code itself, rather than on the user that happens to be executing the code. These characteristics are called evidence. Examples of evidence include the Web site or zone the code was downloaded from or the digital signature of the vendor that published the code.
      The code access security system maps this evidence to a set of permissions when the code is loaded and run. These permissions define the specific actions that the code is allowed to take. For example, code may be given a permission that allows it to read from a particular portion of the file system, or write to a particular network drive. The specific mapping between a particular piece of evidence and the permissions granted to the code is made by the administrator or host and is termed security policy. For example, an administrator may use security policy to grant code that is downloaded from the intranet a higher set of permissions (like the ability to access the file system) than code that is downloaded from the Internet.
      There are two ways a host can influence the set of permissions granted to code running inside an application domain it created. First, the host can associate evidence with the domain itself. This evidence is added to the evidence about each piece of code that is run in the domain before security policy is evaluated. This additional evidence is particularly useful when the host wants to convey a piece of information to the policy about the environment in which the code is running. For example, if a host starts a domain associated with a particular Web site, that site's URL can be set as domain-level evidence to ensure that no code running inside that domain will ever be granted more permissions than code coming from that site. The ability to set domain-level evidence requires that the host itself be granted the ControlEvidence permission. The following code demonstrates how to set domain-level evidence.

  using System.Security.Policy;

  // Associate the site's URL with the domain as host evidence
  Evidence evidence = new Evidence();
  evidence.AddHost(new Url("https://www.somesite.com"));

  AppDomain appDomain = AppDomain.CreateDomain("MyDomain",
                                               evidence,
                                               null,
                                               null);

      The second way a host can influence code access security policy is to set the application domain-level policy. The code access security system includes the concept of policy levels. Simply put, a policy level is just an entity to which policy can be associated.
      The .NET Framework supports four policy levels: Enterprise-wide policy, Machine-wide policy, Per-user policy, and Application domain-level policy.
      Each policy level can restrict the set of permissions granted by the level above it. For example, even if the machine-wide policy states that code with a particular strong name can read environment variables, the per-user policy may not grant the code the permission to do so. A given policy level can never grant more permissions than the level above it.
      Security policy is evaluated at all four policy levels. The results of evaluating policy at each level intersect to form the final set of permissions granted to the code in question.
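      The intersection rule can be made concrete with a small C++ model in which each policy level is reduced to the set of permission names it grants. This is a simplification for illustration only: real policy evaluation works on evidence and code groups, and the type alias, function name, and permission names used here are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

using PermissionSet = std::set<std::string>;

// Intersects the grants of every policy level, in the order given.
// An empty list of levels grants nothing in this model.
PermissionSet finalGrant(const std::vector<PermissionSet>& levels) {
    if (levels.empty()) return PermissionSet();
    PermissionSet result = levels.front();
    for (std::size_t i = 1; i < levels.size(); ++i) {
        PermissionSet next;
        // std::set iterates in sorted order, as set_intersection requires.
        std::set_intersection(result.begin(), result.end(),
                              levels[i].begin(), levels[i].end(),
                              std::inserter(next, next.begin()));
        result = std::move(next);
    }
    return result;
}
```

      If the enterprise level grants {FileRead, Environment, UI}, the machine level {FileRead, Environment}, the user level {FileRead}, and the application domain level {FileRead, UI}, the final grant is {FileRead}. A permission survives only if every level grants it, which is why a host can use the application domain level to narrow what code receives.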
      Enterprise-wide, machine-wide, and per-user policy are set by the administrator using the administration tools provided with the .NET Framework. Application domain-level policy is set by the host by calling AppDomain.SetAppDomainPolicy just after the domain is created.
      The fact that application domain-level policy can restrict policies specified at the higher levels gives a host a high degree of control over the security permissions granted to code in domains it has created. This high degree of control is critical in numerous situations. For example, Microsoft SQL Server uses domain-level policy to ensure that only assemblies explicitly registered in the SQL catalog are allowed to execute. It is reasonable to imagine a host wanting to ensure that no code executing on its behalf be allowed to access the registry or display a user interface that could block the current thread. Note that the host itself must be granted ControlEvidence permission to set domain-level policy.
      The sample code in Figure 9 creates domain-level policy that will only allow code with a given strong name to execute in the domain.

Unloading Application Domains

      As described earlier, application domains are the unit of code-unloading in the CLR. The AppDomain class in the System namespace includes a static method called Unload that hosts can use to free the resources associated with a particular domain. Calls to the AppDomain.Unload method result in a graceful shutdown. That is, each thread running in the domain is sent a ThreadAbort exception and then unwound to the domain boundary, no new threads are allowed to enter the domain, and all domain-specific data structures are then freed.
      If the domain has run code from a domain-neutral assembly, the domain's copy of the static variables and related CLR data structures are freed, but the code for the domain-neutral assembly remains until the process is shut down. There is no mechanism to fully unload a domain-neutral assembly other than shutting down the process or completely unloading the CLR itself using the ICorRuntimeHost::Stop method.
      Calling ICorRuntimeHost::Stop removes the CLR from the process. All domains are forcefully shut down. After the Stop method has been called, the CLR can never be loaded in that process again.

Conclusion

      The CLR has been built to support a variety of application types ranging from shell executables and Web applications to stored procedures running in a database. Because each application scenario requires a hosting environment to get it up and running, the .NET Framework SDK includes a set of interfaces that allow third parties to write custom hosts, and this is a key to the overall adoption of the Microsoft .NET strategy. By opening up the interfaces used internally to host applications like ASP.NET, Microsoft enables .NET to have as broad a reach as possible.

For related articles see:
Microsoft .NET Framework Delivers the Platform for an Integrated, Service-Oriented Web
For background information see:
.NET Developer site on MSDN Online
Steven Pratschner is a program manager on the Common Language Runtime team at Microsoft. In addition to working with hosting-related issues, Steven is also responsible for the versioning, deployment, and side-by-side support provided by the CLR.

From the March 2001 issue of MSDN Magazine