Moving to .NET and WinFX: A Roadmap for C/C++ Applications

 

Kate Gregory
Gregory Consulting

October 2004

Applies to:
   Microsoft Visual C++
   Microsoft .NET Common Language Runtime (CLR)
   C++/CLI extension
   Microsoft .NET Framework
   WinFX

Summary: This article is for C++ programmers who are (at least for now) not targeting the Microsoft .NET Framework in new or existing applications. It provides some guidelines for moving to the .NET Framework without leaving behind the investment in existing code, and explains why you should consider moving to the .NET Framework not only for new development, but for existing applications as well. (9 printed pages)

Contents

Why Use the Common Language Runtime?
A Review of the Languages Involved
Understanding C++ on .NET
Managed and Unmanaged Types
It's Still C++
Mixing Managed and Native Development
Roadmap to .NET
Conclusion

Why Use the Common Language Runtime?

Microsoft has always made it possible for programmers to access operating system features from their own code. In the early days of Windows, we used the Windows API from our C programs, making function calls (such as GetMessage(), TranslateMessage(), and so on) to achieve our ends. Over time, new Windows functionality was exposed in COM components such as Shell. To use the full power of Windows, programmers learned COM concepts and created COM applications. This evolution will continue, and to use the full power of WinFX, your applications will use the Common Language Runtime, or CLR. WinFX is a managed API, designed to be called from managed applications written for the CLR, and it is through WinFX that your applications will tap the functionality of new technologies like Avalon and Indigo. WinFX is based on the .NET object model.

Access to operating system features is not the only draw for the CLR, of course: the .NET Framework provides a better and easier component model than COM and DCOM. The libraries of managed code that are provided with the .NET Framework offer functionality above and beyond that of the operating system, so you can concentrate on the parts of your application that are specific to your situation rather than working on problems that many people have solved before. You can also build a component-based solution without much of the difficulty that has been traditionally associated with COM and DCOM deployment.

A Review of the Languages Involved

Does this mean you should rewrite all your applications? Of course not. If you wrote your applications in C++, you can compile them for the CLR today, usually with little or no change to the code. Rewriting carries a huge risk, since you could introduce bugs into working code just by trying to port it to another language such as C#. The best you can hope for is to spend a lot of time rewriting and translating your code in order to achieve the same functionality as before. There's just no need to do that when you want to target the CLR. Instead, your time and effort can be spent on using new functionality and expanding your application's capabilities.

In the years since the .NET Framework, the CLR, and C# were announced, many developers have wondered about Microsoft's plans for C++. Some have speculated that C# is a replacement for C++, but it most certainly is not. C# is a language that is easier to learn than C++, and provides access to the functionality of the CLR. For those who already know C++, there's no need to learn a new language to gain access to the functionality of the CLR, and C++ has features that are not in C#, so moving would actually involve giving up some power.

Every version of Microsoft Visual C++ is more standards-compliant than the one before, and the current release, Visual C++ .NET 2003, is about 98% compliant with the ISO standard for C++. The keywords to access CLR capabilities in this version all start with double underscores, so as not to interfere with standards compliance. While this approach works, it's awkward and not as intuitive as it could be. A new binding of C++ to .NET, which is being standardized under the name C++/CLI, is in store for Visual C++ 2005. This revision contains keywords that are not part of the current C++ standard, but they do not interfere with standard-conforming C++ programs because they are largely conforming pure extensions to ISO C++. An international standard for the C++/CLI extensions is being developed by ECMA and will eventually be submitted to ISO. Like C# today, C++/CLI will be standardized, so that Microsoft will not be the only source of compilers for C++/CLI. Visual C++ 2005 is in beta now, so you can explore the new extensions right away. In this whitepaper, the new C++/CLI syntax will be used in code examples. (CLI stands for Common Language Infrastructure and is the standardized portion of the .NET Framework, including the .NET Common Language Runtime.)

Understanding C++ on .NET

When you write code to run on the CLR, you're writing managed code. Standard C++ code, the kind that would compile in any standards-compliant C++ compiler, can be compiled either to unmanaged (native) code or to MSIL: you just use a compiler switch. Specify the /clr option and the compiler generates MSIL, an assembly that will run on the CLR. That's what makes your code managed: the compiler option. You don't need to use any special keywords or change your code much, if at all, to compile cleanly with the /clr option.
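As a rough illustration of that point, the sketch below is an ordinary ISO C++ source file with no CLR-specific keywords. The file name hello.cpp, the greeting and demo functions, and the cl command lines in the comments are assumptions made for this example; only the /clr option itself comes from the discussion above.

```cpp
// hello.cpp: ISO C++ only, with no CLR-specific keywords.
// Native build (assumed command line):   cl /EHsc /c hello.cpp
// Managed build (assumed command line):  cl /clr /c hello.cpp
#include <cassert>
#include <string>

// Builds a greeting; nothing here depends on how the file is compiled.
std::string greeting(const std::string& name) {
    return "Hello, " + name + "!";
}

// Self-check that doubles as a usage example.
void demo() {
    assert(greeting("CLR") == "Hello, CLR!");
}
```

The source is identical in both builds; the only thing that changes is whether the compiler emits native code or MSIL.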

Once you're writing managed code you can, if you wish, use CLR features such as the Base Class Libraries: powerful classes for XML manipulation, cryptography, data access, and the like. Unmanaged code, compiled without the /clr option, can't declare instances of managed classes and call their methods directly the way managed code can. You can access managed code from unmanaged code through .NET Interop, which makes a .NET object appear to be a COM component. That approach runs more slowly than compiling your code as managed code and calling other managed code directly.

Whether you use the Base Class Libraries (and other managed libraries) or not, you're still writing in C++ and you still have all the power and flexibility of C++ available to you. You can use templates, write operator overloads, create inline functions, and so on. Compiling to MSIL does not preclude you from using any C++ functionality. Multiple inheritance, for example, is not ruled out just because you're compiling to MSIL and writing managed code.

Managed and Unmanaged Types

An ordinary C++ class, of the kind taught in introductory language courses, defines an unmanaged type:

class A
{
private:
   int x;
public:
   A(int xx): x(xx) {}
};

Whether you compile this code with or without the /clr option (as managed or unmanaged code), this is an unmanaged type, also referred to colloquially as an unmanaged class or unmanaged data. Instances of this class can be allocated on the stack, again as taught in introductory language courses:

A something(3);

They can also be created on the native or unmanaged heap:

A* otherthing = new A(4);

The programmer must then remember to clean up the object on the unmanaged heap, using the delete operator:

delete otherthing;

Either way, the garbage collector will never be involved, even if the code is compiled to MSIL and the application is run on the runtime.
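The lifetime rules above can be sketched in ISO C++ as follows. The live counter, the destructor, the value accessor, and the function names are hypothetical additions for this illustration; they are not part of the class A shown earlier.

```cpp
#include <cassert>

class A {
private:
    int x;
public:
    static int live;                 // hypothetical counter of living instances
    A(int xx) : x(xx) { ++live; }
    ~A() { --live; }
    int value() const { return x; }  // hypothetical accessor
};
int A::live = 0;

// Returns how many A objects are still alive after the stack instance
// has gone out of scope but before delete is called on the heap instance.
int live_before_delete() {
    A* otherthing = nullptr;
    {
        A something(3);              // on the stack
        otherthing = new A(4);       // on the native heap
        assert(A::live == 2);
    }                                // something is destroyed here
    int stillAlive = A::live;        // the heap instance survives the block
    delete otherthing;               // manual cleanup; no garbage collector involved
    return stillAlive;
}

// Self-check that doubles as a usage example.
void demo() {
    assert(live_before_delete() == 1);
    assert(A::live == 0);            // nothing is left once delete has run
}
```

The stack instance disappears at the closing brace of its block, while the heap instance stays alive until the explicit delete.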

You may, however, want to author managed types (managed classes, managed data). These types can be called from other assemblies, other managed code running on the runtime, no matter what language those other assemblies were written in. Code written in C#, in Visual Basic .NET, or in a language you've never heard of that happens to compile to MSIL, can use your managed types. The interactions are managed by the runtime and so, in most cases, are the lifetimes of the instances of those types created by your code and by those other assemblies.

These managed types are first-class .NET objects. You create them in C++/CLI using a natural syntax that differs very little from traditional C++:

ref class R
{
private:
   int x;
public:
   R(int xx): x(xx) {}
};

This class definition uses a spaced keyword: a keyword that happens to contain a space. Technically, there is no ref keyword in C++/CLI, but there is a ref class keyword. This means that you can have a variable called ref without conflict. And this class, R, can be used by code written in C# or Visual Basic .NET, or any other .NET-supported language. You can write a class library in C++/CLI, using the skills and techniques you've developed as a C++ programmer over the years, and use this library in applications that are not written in C++ at all. They just have to be applications that run on the CLR.

It's Still C++

Moving to the CLR does not have to mean moving to C#. Many C++ developers moved to C# when it was released. Reasons varied: wizard and designer support is more available for C#, management often supported the new language just because it was new, and some developers didn't realize that managed applications could be created in C++. But many developers resisted that move, even to the extent of resisting the move to the CLR entirely. A common theme in the reasons for resistance: "I like C++." C++ has features that other languages do not, like true deterministic destruction and templates. When you choose to write managed code in C++, you get all the C++ features and all the CLR features: the best of both worlds.

Deterministic Destruction

In other .NET-supported languages, such as Visual Basic .NET, C#, or the managed extensions to C++ in Visual C++ .NET 2003, the location of an instance depends on the type you are creating. If you're creating an instance of a managed type, you create it on the managed heap:

Dim o As New R(3)  ' VB.NET
R o = new R(3); // C#
R* o = new R(3); // managed extensions for C++

The memory used by the instance (o in all these examples) is managed by the runtime and may be cleaned up or reorganized by the garbage collector.

In contrast, if you create an instance of a value type, in all three languages the instance is created on the stack:

Dim i As Int32 = 7  ' VB.NET
int i = 7; // C#
int i = 7; // managed extensions for C++

Only in C++/CLI do you gain the freedom to decide for yourself where your objects are created and whether you want to have their memory managed by the runtime. You can create an instance of a reference class (one defined with the ref class keyword as shown earlier) on the managed heap, like this:

R^ o = gcnew R(3);  // C++/CLI

If you prefer, you can create that instance on the stack:

R os(3);

The difference between o and os is their lifetimes, or specifically the amount of control you have over their lifetimes. If you're writing managed code, you probably don't mind giving up control over memory and trusting the runtime and the garbage collector to look after memory for you. But developers still care about non-memory related cleanup: closing files or connections, for example. Garbage collection alone is not enough to handle all the resources you use in an application. In C++, this non-memory cleanup typically happens in the destructor.

The object on the managed heap, accessed through the handle o, begins to exist when control reaches the line with gcnew. At some point in the future, o will go out of scope. Maybe control will pass out of the block in which it was declared with a return or break statement, maybe the block is the body of an if, for, or while statement and control leaves it in the usual way, or maybe an exception is thrown. Whatever the reason, o will go out of scope. At this point, things become a little fuzzy. If any code took a copy of the handle, and the copy is still around, then the object will continue to exist on the managed heap as long as there is a handle to it in scope somewhere. If all the handles have gone out of scope, then the object is eligible to be collected, but the exact moment at which it will be collected is unknown, and therefore the moment when any cleanup tied to collection (a finalizer, if the class has one) will run is unknown. It depends on things like the amount of memory pressure your application is exerting.

Things are very different for the object on the stack, os. Once that goes out of scope (through any of the same events that might have sent o out of scope), it's all over for the object. Its destructor, if it has one, runs right at the instant that os leaves scope. You know exactly when your non-memory cleanup happens, and it happens as soon as possible. This is deterministic destruction.
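A minimal sketch of this behavior follows. It uses an ISO C++ native class as a stand-in, because the scope rule being described is the ordinary C++ one; FileLike, the log string, and the function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

std::string eventLog;              // hypothetical log used only by this sketch

class FileLike {                   // hypothetical stand-in for a file or connection
public:
    FileLike()  { eventLog += "open;"; }
    ~FileLike() { eventLog += "close;"; }   // non-memory cleanup lives here
};

void work(bool fail) {
    FileLike f;                    // lifetime is the enclosing block
    eventLog += "work;";
    if (fail) throw std::runtime_error("boom");
}                                  // ~FileLike runs on normal exit and during unwinding

// Runs work() and returns the sequence of events that was recorded.
std::string run(bool fail) {
    eventLog.clear();
    try {
        work(fail);
    } catch (const std::runtime_error&) {
        eventLog += "caught;";
    }
    return eventLog;
}

// Self-check that doubles as a usage example.
void demo() {
    assert(run(false) == "open;work;close;");
    assert(run(true)  == "open;work;close;caught;");
}
```

In both paths the cleanup is recorded before control moves on, and in the failing path it is recorded before the exception handler runs.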

By the way, the instance os (which you think of as being on the stack) is actually consuming memory on the managed heap (which is still managed by the garbage collector). The destructor does not collect the memory used by the instance; it takes care of non-memory cleanup. The reference type can only simulate being on the stack. If you've done a good job of forgetting all about memory management and trusting the garbage collector to handle that, the simulation becomes a very good one.

The using construct in C# provides a similar ability, but automatic scope in C++ is simpler: you write less code and you can't forget to do it. There's a nice mapping between the destructor in C++ and the Dispose() method you might write in a managed type in some other language: they're the same. When C# code uses your managed type and calls Dispose(), it's actually the destructor that runs. When C++/CLI code creates an instance of a managed type that wasn't written in C++ on the stack, there is no C++ destructor to run when the instance goes out of scope, but the Dispose() method runs instead. As far as the C++ developer is concerned, it's all deterministic destruction. For example, consider the following C# code:

{
   using( System.Data.SqlClient.SqlConnection conn
     = new System.Data.SqlClient.SqlConnection(connString) ) {
      // work with the connection in some way
      // including code that might throw an exception
      using( System.Data.SqlClient.SqlCommand cmd
        = new System.Data.SqlClient.SqlCommand(
              queryString, conn) ) {
         // work with the command
         // must write "using"s to call Dispose or Close
      }
   }
}

In C++ I can write the code like this:

{
   System::Data::SqlClient::SqlConnection conn(connString);
   // work with the connection in some way
   // including code that might throw an exception
   System::Data::SqlClient::SqlCommand cmd(queryString, %conn);
   // work with the command
   // don't call Dispose or Close explicitly
}

The SqlConnection and SqlCommand objects implement IDisposable, but a C++/CLI programmer doesn't need to remember to call Dispose(). Less code and less forgetting are natural benefits of the destructor mechanism in C++. Using this mechanism on the CLR is natural and intuitive, and doesn't require libraries that are written in C++ or that implement destructors.
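One detail worth seeing in action is the order of cleanup when one scoped object depends on another. ISO C++ destroys automatic objects in the reverse order of their construction; the sketch below assumes the stack syntax for managed types follows the same rule. Connection, Command, and the trace string are hypothetical native stand-ins, not the real SqlConnection and SqlCommand classes.

```cpp
#include <cassert>
#include <string>

std::string trace;   // hypothetical log used only by this sketch

class Connection {   // hypothetical stand-in for SqlConnection
public:
    explicit Connection(const std::string&) { trace += "conn+ "; }
    ~Connection() { trace += "conn- "; }
};

class Command {      // hypothetical stand-in for SqlCommand
public:
    Command(const std::string&, Connection&) { trace += "cmd+ "; }
    ~Command() { trace += "cmd- "; }
};

// Opens a connection, creates a command on it, and returns the recorded trace.
std::string run_query() {
    trace.clear();
    {
        Connection conn("connection string");
        Command cmd("query", conn);
        trace += "work ";
    }                // cmd is destroyed first, then conn
    return trace;
}

// Self-check that doubles as a usage example.
void demo() {
    assert(run_query() == "conn+ cmd+ work cmd- conn- ");
}
```

The command is released before the connection it depends on, which is the same nesting that the two using blocks spell out by hand in the C# version.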

Templates

The same aspects of C++/CLI that make it possible to create an instance of a managed type on the stack also enable you to use managed types with traditional C++ templates:

set<String^>^ SetofStrings;
. . .
String^ s = "Hello World";
SetofStrings->insert( s );

That means that everything in STL is available to you when you're working with managed types, your own or from the Base Class Libraries. C++/CLI comes with some helper templates, including auto_close<>, a variant on a smart pointer that calls the Close method of the instance it wraps, and marshal_as<>, which converts related types such as System::String and std::string.
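For comparison, here is a sketch of the native counterpart of the snippet above, using std::set with std::string in plain ISO C++. The function names are hypothetical; the point is only that the container code has the same shape as the version that stores String^ handles.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Inserts a few strings and returns how many distinct values the set holds.
std::size_t count_unique() {
    std::set<std::string> setOfStrings;
    std::string s = "Hello World";
    setOfStrings.insert(s);
    setOfStrings.insert(s);          // duplicate; a set keeps a single copy
    setOfStrings.insert("Goodbye");
    return setOfStrings.size();
}

// Self-check that doubles as a usage example.
void demo() {
    assert(count_unique() == 2);
}
```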

Mixing Managed and Native Development

If you're hesitant to recompile all your standard unmanaged C++ code with the /clr option, don't worry! It's straightforward to mix and match managed and native (unmanaged) code within an application.

From managed code, you can call any existing unmanaged code just as you would unmanaged-to-unmanaged. Include the header file, link to the lib, and you're ready to go. It doesn't matter whether you're calling an old C library, a C++ library, or a COM API. It doesn't matter what other libraries are being used by the code you're calling. Just #include and go on as always. Your code will make the transition between the managed calling code and the unmanaged target code, then back to managed on the return, in the highest-performing manner. No other .NET-supported language has this option available. This capability, formerly known as It Just Works interop, is now called C++ Interop.
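The sketch below gives a rough idea of what "just #include and go" looks like in source form. The function legacy_checksum is a hypothetical stand-in for an existing C library routine; in a real project its declaration would come from the library's header and its body from a separately compiled lib, and only the calling code would be compiled with /clr. Both are kept in one file here so that the example is self-contained.

```cpp
#include <cassert>

// Hypothetical legacy C API. In practice this would live in a native
// library and be declared in that library's header file.
extern "C" int legacy_checksum(const char* text) {
    int sum = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        sum += static_cast<unsigned char>(*p);   // add up the character codes
    }
    return sum;
}

// Calling code: the source looks the same whether it is compiled as
// native code or with /clr; no wrapper or attribute is written by hand.
int checksum_of_abc() {
    return legacy_checksum("abc");
}

// Self-check that doubles as a usage example.
void demo() {
    assert(checksum_of_abc() == 294);   // 97 + 98 + 99
}
```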

From unmanaged code, all managed types are available to you, disguised as COM types. The regasm utility produces and registers a type library for the managed type, and you use compiler support (such as #import) to access the component from native code. There is a performance penalty associated with this choice: you might choose instead to compile the calling code with /clr and access the managed type directly through the runtime. In keeping with the philosophy of C++, it's your choice.

Roadmap to .NET

C++/CLI gives you tremendous choice for how to access the functionality offered by the CLR (and by WinFX). If you have a new application to write, the advantages of writing it in C++/CLI to target the CLR are overwhelming. You can have robustness and reliability, access to important managed APIs, and developer productivity from available managed libraries, without giving up access to native libraries or to any C++ paradigms. This decision is clear: write new applications for Windows in C++/CLI as managed applications.

What about existing applications? Some applications are at the end of their lifecycle: you don't intend to add functionality or make significant changes. Users are unlikely to ask for integration with other applications. These applications need not move to .NET. They'll run on Microsoft Windows code-named "Longhorn," when the time comes, so you can just let them be.

That leaves existing applications that are not at end of life. You intend to maintain and enhance these applications and you probably plan to use managed libraries for those enhancements. Or perhaps you would like to target the CLR to ease the deployment troubles often faced by large, distributed applications. In essence, you have three choices: Rewrite, Integrate, or Migrate.

Rewriting is the riskiest approach. You must go through your code and identify libraries and constructs that have equivalents in the .NET world. For example, you may be using MFC to build a user interface for a Windows application: the .NET equivalent would be Microsoft Windows Forms today, and Avalon in the "Longhorn" timeframe. You may be using ADO or the MFC data access classes to work with data: the .NET equivalent would be Microsoft ADO.NET using DataSet or DataReader classes. The effort to identify these libraries and constructs is significant. Then you rewrite large portions of your application, potentially introducing bugs as you do so, and you test it extensively. This takes time and costs money, but when the work is complete you have a modern application, written entirely in C++/CLI and using managed libraries at every opportunity.

If your current code base is old, has been updated several times already, or is not well understood by your development team but must be enhanced or updated in some way, rewriting may be the best approach for you. You may also need to take on some rewriting if you must create verifiable assemblies for use in a partial trust situation. While rewriting is the first plan that occurs to most developers, it should actually be your last resort. If your code is clean, well documented, and performs well today, you should resist rewriting as much as possible.

Integrating involves thinking of all the .NET libraries (the Base Class Libraries, WinFX, Indigo, Avalon, and so on) just as new libraries for you to use. You leave your existing code base alone, and extend it with new modules that use these new libraries. You might recompile your backbone application into MSIL, making direct calls on the runtime into the new libraries and using C++ Interop to access your old libraries. Alternatively, you can leave your backbone as native (unmanaged) code and use COM Callable Wrappers around the managed types to expose them to the old code. Some performance testing will help you make this choice. You can compile as much or as little of your code base with /clr as you choose.

Migrating sits between these two extremes. It works best when your application is already divided into components or layers. (A monolithic application is hard to maintain, so you might want to componentize your application anyway, but such refactoring is a form of rewriting that carries risks.) Then you can wrap each component in C++/CLI code that exposes managed types. A new backbone application can use these managed types, or the new managed types can interact with the .NET libraries. Over time, the native "cores" inside the wrappers can be rewritten as fully managed code that uses only managed libraries, if there are compelling performance or code security reasons for doing so.

Conclusion

Whatever kind of development you're doing, whether writing a new application from scratch, maintaining an existing application that needs very little work, or breathing new life into an old application that has seen better days, the .NET Framework can make your job easier. Options such as integrating your application with the CLR or migrating your application to the CLR are more feasible when you work in C++/CLI than they would be if you ported the application to C#. C++/CLI gives you all the power and flexibility of C++ on the CLR, gives you true deterministic destruction even for managed types, and gives you the highest-performance interop. It's a natural choice for any C++ programmer who has an application that should target the CLR.

 

About the author

Kate Gregory is a Microsoft Regional Director, a Visual C++ MVP, and the author of Microsoft Visual C++ .NET 2003 Kick Start. Gregory Consulting provides consulting and development services throughout North America, specializing in software development, integration projects, technical writing, mentoring, and training with leading-edge technologies.

© Microsoft Corporation. All rights reserved.