Effective Memory, Storage, and Power Management in Windows Mobile 5.0

 

Christian Forsberg
business anyplace

March 2006

Applies to:
   Windows Mobile version 5.0
   Microsoft Visual Studio .NET 2005
   Microsoft .NET Compact Framework version 2.0

Summary: Learn how to manage memory, storage, and power in your native and managed applications running on a Windows Mobile 5.0–based device. This article gives you an overview of the memory and storage models and of the device hardware. It offers concrete advice and best practices to help your applications make the best use of the limited resources on devices like Pocket PCs and Smartphones. (24 printed pages)

Contents

Introduction
Memory Management
Persistent Storage
Power Management
Performance
Tools and Resources
Conclusion

Introduction

Managing available resources has always been a critical issue for developers, and it is even more important when you are writing applications for Pocket PCs and Smartphones. In the early days of developing for desktop computer versions of Microsoft Windows, developers talked about well-behaved applications: applications that behaved in ways that did not make life harder for the operating system and for other applications running at the same time. To a large extent, writing well-behaved applications for Microsoft Windows Mobile version 5.0 means managing memory and power in an efficient way. The situation on the desktop computer today is completely different from those early days, and the average developer regularly forgets to check for successful memory allocations without any serious consequences. On a mobile device, this is still a critical issue.

The rules of memory management on Windows Mobile 5.0 are not difficult to conform to, but you need to understand them to make your application well behaved.

Memory Management

This article begins by looking at the memory architecture of Windows Mobile 5.0. Windows Mobile 5.0 is based on Microsoft Windows CE version 5.0, which is a 32-bit operating system that shares many of the attributes of desktop computer operating systems like Microsoft Windows XP. Because 32 bits can address a total memory of 4 GB, this is also the total space that Windows Mobile 5.0 can address. So far, the memory model is identical to that of Windows XP. The similarities continue with the division of the total memory between the operating system and the applications. As shown in Figure 1, the operating system has a reserved area of 2 GB (hexadecimal addresses 8000 0000–FFFF FFFF) in the upper address space where only code with privileged access (referred to as kernel mode or KMode) can run. This area is often referred to as the kernel address space.

Figure 1. Virtual address space

The lower 2 GB (0000 0000–7FFF FFFF) is the user address space, and this is where the similarities with Windows XP stop. On the desktop computer, applications can use all of this area; in Windows CE, this area is divided into an application space, a reserved area, and a large memory area. The application space (0000 0000–03FF FFFF) is used by the currently active (running) process and the loaded ROM DLLs. As the name implies, the large memory area (4200 0000–7FFF FFFF) is used for large memory allocations, such as memory-mapped files. This area is also used for the object store, but the object store is minimized to 32 KB in Windows Mobile 5.0 (for more information, see the Object Store section later in this article). The user address space is actually divided into 64 equal parts of 32 MB each called slots. The first two slots (0 and 1) are the application space that includes the currently active process (slot 0) and the loaded execute in place (XIP) DLLs located in ROM (slot 1).

As shown in Figure 2, the upper part of the user address space (slots 33–63) is the large memory area that includes things like memory-mapped files and resource-only DLLs. In a managed application, this is the place where all assemblies (including Mscorlib.dll and the application's executable file and DLLs) are loaded.

Figure 2. User address space

The reserved area shown earlier in Figure 1 actually contains all of the other loaded processes in the system (slots 2–32), as shown in Figure 2. Each new process is loaded in one of these process slots, and when it is running, it is copied (by means of some simple aliasing of the virtual address space) to the active process slot (0).

Considering that many devices already have several processes loaded when they start (such as filesys.exe and device.exe) and that the user probably wants to run other applications in parallel, the number of process slots actually available to your applications is smaller than the theoretical maximum of 32. The application space, shown in Figure 3, includes the first two slots (0 and 1). The second slot (1) is where the DLLs located in ROM run. When loaded, they are available to all processes. A good example is COREDLL.DLL, which is always loaded at the top of slot 1 and is always available to all running processes. In a managed application, this is where the common language runtime is loaded. (The runtime consists of the DLLs MSCOREE.DLL, MSCOREE2_0.DLL, and NETCFAGL2_0.DLL.)

Figure 3. Application space

Slot 0 holds the currently running process, and just above a small reserved area (guard section of 64 KB), it includes the executable code and data (starting at address 0001 0000). It also includes the virtual memory allocations, such as the application heaps and thread stacks. In a managed application, this is where the application domain heap, just-in-time (JIT) compiler heap, and garbage collection heap are located. The numerous heaps are created to avoid memory fragmentation.

On the desktop computer, it's extremely rare for a memory allocation to fail. On a device, it's very possible for that to happen.

Another important thing to understand is that all virtual memory allocation reservations are aligned to 64-KB blocks. For example, if you make 512 reservations of 4 KB, you will use all of the process memory (the 32 MB of slot 0). For details about how this is done, see the fourth rule in Writing Memory-Efficient Applications. However, within those reservations, the allocation can be committed for each memory page (4 KB).

The process heap works in a similar way. It first reserves a larger piece of memory that is later committed on a page basis. As shown earlier in Figure 3, the general virtual memory allocations occur from the bottom upward. The opposite is true for the DLL virtual memory allocations. They occur from the top downward. At the top of slot 0, the RAM–based DLLs are loaded along with DLL–related memory allocations (which include read/write data used by the ROM DLLs and ROM DLL overflow from slot 1). ROM DLLs are loaded aligned to 64-KB blocks, which means that each DLL will occupy at least 64 KB of memory.

For example, if your application includes three DLLs with a size of 20 KB each, the memory used will be 192 KB when they are loaded. However, if these DLLs are combined into a single DLL of 60 KB, it will occupy only 64 KB of memory when loaded, and you will save 128 KB.

As already mentioned, the 32-MB process memory can be a real limitation. If your application needs an even larger memory allocation—for example, 64 MB—that allocation cannot be made in the process memory. The allocation still succeeds, but it is made in the same memory as the memory-mapped files (see Figure 1 and Figure 2 earlier in this article). The threshold is 2 MB: if the requested memory allocation is larger than 2 MB, that memory area is used.

The same rule applies as for the smaller memory allocations: you first reserve a larger memory area that is later committed when needed. An important aspect of these larger memory allocations is that they can be accessed by all processes (which also raises a security issue: you should not put any data here that needs to be kept secure, because it is visible to other processes). For example, you can allocate a large memory area and make it available to other processes by sending them a Windows message or a named event with the address of the allocation. However, this option should be used with care because free memory areas may differ between devices.

In many situations, a better option for large memory allocations is to use memory-mapped files. You can make memory-mapped files work very much like ordinary file system files, including the ability to create, open, write, read, seek, close, and delete them. Therefore, a memory-mapped file can be a great alternative to any temporary file that your application can create and use only during the execution of your application. Because memory-mapped files are created in the same memory area as the large memory allocations described previously, they share the same attribute of being available to all loaded processes.

Memory-mapped files are therefore another option for sharing data between processes. Additional options include the use of TCP sockets, Message Queuing, and point-to-point message queues. For an excellent example of using point-to-point message queues from managed code, see the article Point-to-Point Message Queues with the .NET Compact Framework.

When looking closer at memory management in managed applications, some of the same rules apply, and some new ones are added. Because each managed application is a process, both of the first two rules (each process has 32 MB, and a maximum of 32 processes can be loaded at the same time) also apply to Microsoft .NET Compact Framework applications. The rules related to native memory allocation apply only if native function calls (platform invokes) are made to allocate native memory. Memory-mapped files are clearly also a valid option for a managed application, as discussed in more detail later in this article.

A lot of care has been put into the .NET Compact Framework regarding memory management. No RAM is used until an application is started, and RAM is quickly freed when the application closes. Exceptions are always thrown when the wrong (not owned) memory is accessed. The .NET Compact Framework is also designed to make applications continue to run even when memory is low. If the application needs more memory than what is available, it will be correctly closed, and all resources will be released. The .NET Compact Framework itself should not fail because of low memory.

To implement managed applications that manage memory efficiently, the most important concept that you should understand is how the garbage collector works. The blog post An Overview of the .Net Compact Framework Garbage Collector by Steven Pratschner is a good place to start learning. In general, the garbage collector will assume that you are using good programming practices like declaring and using resources as late as possible and releasing (disposing of) them as soon as possible.

Also, a typical issue to be aware of in managed code is that operations like boxing and some string manipulations will cause managed objects to be implicitly created. These implicitly created objects often easily outnumber the objects that you create explicitly. A typical example is the [] operator of the Hashtable class, as shown in the following code example.

public class Hashtable 
{
    public object this[object key] { get; set; }
}

The operator needs to be implemented as shown to allow any type of key. However, if the Hashtable class is used with integer keys, a boxing occurs every time an item is requested. If this boxing occurs frequently in an application, it rapidly creates thousands of small managed objects that need to be cleaned up.
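
The following code example is a minimal sketch of this effect; the names and values are illustrative only. The generic Dictionary class available in the .NET Compact Framework 2.0 is shown as one way to avoid the boxing.

using System.Collections;
using System.Collections.Generic;

Hashtable table = new Hashtable();
int key = 42;

// The integer key is boxed into a new heap object on every access.
table[key] = "value";       // key is boxed here
object item = table[key];   // and boxed again here

// A generic dictionary keeps the key as an unboxed value type,
// so no temporary objects are created.
Dictionary<int, string> typed = new Dictionary<int, string>();
typed[key] = "value";
string typedItem = typed[key];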

The garbage collector does this cleaning up (memory management) in a managed application. A lot of effort has been put into making this construct as intelligent as possible because the task of automatic memory management is far from trivial. The developer can trigger a garbage collection (release of memory occupied by unused resources) by calling the GC.Collect method, but a call to this method is generally not needed because the garbage collector usually determines when a garbage collection is needed.

When programming native code by using C++, you should generally release resources in a class destructor. In C#, a finalizer should be implemented only when absolutely necessary—mainly because unneeded finalizers will make the release of allocated memory happen later than necessary. When a class has a finalizer, the finalizer is called on the first garbage collection, but the memory is not collected until another garbage collection runs. Also, finalizers cannot safely interact with any managed resources because there is no way to know if the managed resources exist at the time of finalization.

Writing Memory-Efficient Applications

From the previous discussion, you can summarize ten important rules that apply to memory management:

  1. Each process (application) can occupy a maximum memory of 32 MB.
  2. The system can run no more than 32 processes at a time.
  3. Always check for successful return values from all memory allocations.
  4. Rather than doing many small reservations, make a large reservation (64 KB aligned), and then commit for each needed memory area (4 KB aligned).
  5. Combine smaller DLLs into larger ones (64 KB aligned).
  6. Large memory allocations (larger than 2 MB) are made in the same area as the memory-mapped files (large memory area).
  7. Use memory-mapped files as an alternative to large memory allocations and temporary files, and also to share memory between processes (applications).
  8. Declare and create instances as local as possible (no class-level variables if they are needed in only one method), and call Dispose as soon as you finish using an instance that implements the IDisposable interface.
  9. In most cases, avoid calling the GC.Collect method.
  10. Implement the IDisposable interface and finalizers correctly.

Now that you have a number of rules for memory management, you can examine how to follow these rules in practice.

The general approach to conform to the first rule is to be as efficient as possible in your use of memory and also to realize that the process memory is clearly limited.

For the second rule, the consequence is simply to load as few processes as possible. On the desktop computer, it might be a good idea to split a business application into a small starter (menu) application that starts a number of subapplications—one for each menu command. But even though this technique might have some advantages for distribution (for example, you need to update only one subapplication), it is not a very efficient solution on a Windows Mobile–based device, because the menu and each of the subapplications will occupy a process slot of its own.

The third rule is about the checking of return values from each memory allocation. It simply means that the returned memory pointer is checked not to be null, as shown in the following code example.

pMemory = VirtualAlloc(...);
if(!pMemory)
    // Raise error here

The preceding code example introduces the basic memory allocation function, VirtualAlloc, which allocates virtual memory (which is released by means of the VirtualFree function). All memory allocations use this function. For example, when a process (application) starts, it is used to set up the process heap that is used for local memory allocations by means of the heap allocation functions (LocalAlloc, LocalReAlloc, and LocalFree). The heap allocation functions are the easiest to use, and for many smaller memory allocation situations, they are the best choice. However, because you have no control over when the actual virtual memory allocations (VirtualAlloc) and releases (VirtualFree) occur, the heap is easily fragmented. A solution might be to set up your own private heap by using the private heap functions, as shown in the following code example.

HANDLE HeapCreate(DWORD flOptions, DWORD dwInitialSize,
                  DWORD dwMaximumSize);

BOOL HeapDestroy(HANDLE hHeap);

LPVOID HeapAlloc(HANDLE hHeap, DWORD dwFlags, DWORD dwBytes);

LPVOID HeapReAlloc(HANDLE hHeap, DWORD dwFlags, LPVOID lpMem,
                   DWORD dwBytes);

BOOL HeapFree(HANDLE hHeap, DWORD dwFlags, LPVOID lpMem);

The last three functions work much like the heap allocation functions. The main point of creating a private heap is that it gives you more control over how the allocations occur. For example, the function to delete the private heap (HeapDestroy) will release the memory despite any unreleased (by means of HeapFree) allocations.
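
To make the sequence concrete, the following code example sketches the life cycle of a private heap. It is written as managed code that calls the native functions through platform invoke (the declarations assume the functions are exported by coredll.dll); in a native application, you would call the functions directly.

using System;
using System.Runtime.InteropServices;

class PrivateHeapSketch
{
    [DllImport("coredll.dll")]
    static extern IntPtr HeapCreate(uint flOptions, uint dwInitialSize,
        uint dwMaximumSize);
    [DllImport("coredll.dll")]
    static extern IntPtr HeapAlloc(IntPtr hHeap, uint dwFlags, uint dwBytes);
    [DllImport("coredll.dll")]
    static extern bool HeapFree(IntPtr hHeap, uint dwFlags, IntPtr lpMem);
    [DllImport("coredll.dll")]
    static extern bool HeapDestroy(IntPtr hHeap);

    static void Demo()
    {
        // Create a growable private heap (initial size 64 KB, no maximum).
        IntPtr heap = HeapCreate(0, 0x10000, 0);
        if (heap == IntPtr.Zero)
            throw new OutOfMemoryException();

        IntPtr block = HeapAlloc(heap, 0, 1024);
        // ... use the 1,024-byte block ...
        HeapFree(heap, 0, block);

        // HeapDestroy releases the whole heap, even if some blocks
        // were never freed with HeapFree.
        HeapDestroy(heap);
    }
}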

To give you total control, you have to use the basic virtual memory allocation function (VirtualAlloc), and that takes you to the fourth rule. The following problematic code example shows an issue with this function.

#define PAGESIZE 0x1000   // Memory page size (4 KB)

INT i;
PVOID pMemory[512];

for(i = 0; i < 512; i++)
     pMemory[i] = VirtualAlloc(0, PAGESIZE, MEM_RESERVE | MEM_COMMIT,
        PAGE_READWRITE);

Because each call to VirtualAlloc will be aligned to 64 KB, this code will try to allocate a total of 32 MB (not the 2 MB that you might expect); it will obviously fail because this is the size of the total process memory. Conforming to rule four means that you will instead use something like the following code example.

INT i;
PVOID pReserve, pMemory[512];

pReserve = VirtualAlloc(0, 512 * PAGESIZE, MEM_RESERVE, PAGE_READWRITE);
if(!pReserve)
    // Raise error

for(i = 0; i < 512; i++)
{
    pMemory[i] = VirtualAlloc((LPBYTE)pReserve + PAGESIZE * i, PAGESIZE,
        MEM_COMMIT, PAGE_READWRITE);
    if(!pMemory[i])
        // Raise error
} 

In the preceding code example, the desired 2-MB memory area is first reserved, and pages are then committed with 4-KB (memory page) alignment. Note that the reservation does not consume any physical RAM; physical RAM is allocated only when pages are committed.

The fifth rule indicates that you should combine your smaller DLLs into larger ones, because each loaded DLL occupies a whole number of 64-KB blocks, and any space after a DLL's last block boundary is wasted. Obviously, just as for processes, the fewer DLLs that you need to load, the better.

The following code example illustrates the application of rule six.

PVOID pReserve, pMemory;
DWORD dwSize = 6 * 256 * PAGESIZE;

pReserve = VirtualAlloc(0, dwSize, MEM_RESERVE, PAGE_NOACCESS);
if(pReserve) 
    pMemory = VirtualAlloc(pReserve, dwSize, MEM_COMMIT, PAGE_READWRITE);

Because the first reservation call has a size larger than 2 MB (actually, 6 MB), the memory will be reserved in the (upper) large memory area available to all processes. The second call to VirtualAlloc simply commits all of the reserved memory. However, just as for the smaller allocations in the process memory, large memory reservations can be committed for each memory page.

Because memory is a limited resource on a device, a general rule is to always release memory as soon as possible. You can release memory by using the VirtualFree function, as shown in the following code example.

VirtualFree(pReserve, 0, MEM_RELEASE);

You can use the same function to uncommit committed memory pages by using the MEM_DECOMMIT flag as the last parameter.

VirtualAlloc is easy to use, but it reserves address space in large (64-KB) chunks. If you anticipate allocation patterns that use small amounts of memory for variable durations, but more than you can put in the local process heap, a better choice might be to set up a private heap, as mentioned previously.

To conform to the seventh rule, you can use memory-mapped files as an alternative to VirtualAlloc for larger memory allocations. As the name suggests, a memory-mapped file is a file that is mapped into memory. This mapping can be accomplished by means of a physical file in the file system to work as an in-memory cache of (a part of) the file (which was actually the initial purpose of memory-mapped files in Microsoft Windows NT). More interestingly, the memory-mapped file can be created without the connection to a physical file in the file system. It is then given a unique (systemwide) name. Any of the other loaded processes can use this name to access the same memory-mapped file, thereby enabling interprocess communication. To create a named memory-mapped file, you can use the following code example.

HANDLE hFileMapping;
PVOID  pMemory;

hFileMapping = CreateFileMapping((HANDLE)INVALID_HANDLE_VALUE,
    NULL, PAGE_READWRITE, 0, 0x200000, TEXT("MyMMF"));
if(hFileMapping)
    pMemory = MapViewOfFile(hFileMapping, FILE_MAP_WRITE, 0, 0, 0);

Because an invalid file pointer (INVALID_HANDLE_VALUE) is passed as the first parameter to the function (CreateFileMapping), the memory-mapped file will not be connected to a physical file in the file system. The third parameter (PAGE_READWRITE) enables both read and write access to the file (memory), and the fifth parameter is the maximum size. The last parameter is the name of the memory-mapped file (which any other loaded process can use to access the same memory).

If the creation of the memory-mapped file is successful, a pointer to the allocated memory is retrieved by another function call (MapViewOfFile) with the handle to the memory-mapped file as the first parameter, followed by the access (FILE_MAP_WRITE states both read and write access) and the file offset (high order and low order as separate parameters). The last parameter is the number of bytes to map (zero means that the whole file should be mapped). This way, you have created a memory pointer (pMemory) that can be used just like any other memory pointer.

The memory object that the preceding code example creates doesn't actually commit 2 MB of RAM. Instead, only the address space is reserved, and pages are automatically committed as they are accessed. An application can therefore create a huge, sparse array of pages that occupy only as much physical RAM as is needed to hold the data.

When you finish using the pointer and memory-mapped file, you can use the following code example to release the memory pointer and to close the memory-mapped file.

UnmapViewOfFile(pMemory);
CloseHandle(hFileMapping);

One implementation of a memory-mapped file for use with the .NET Compact Framework is the MemoryMappedFileStream class in the Caching Application Block included in the OpenNETCF Application Blocks 1.0, which is a conversion of the desktop computer application blocks. Just like any other stream, the implementation enables the use of the memory-mapped file, as shown in the following code example.

const int maxLength = 1024;
string data = "This is a test";

using(MemoryMappedFileStream fs = new MemoryMappedFileStream("MyMMF",
    maxLength, MemoryProtection.PageReadWrite))
{
    fs.MapViewToProcessMemory(0, 0);
    fs.Write(Encoding.ASCII.GetBytes(data + "\0"), 0, data.Length + 1);
}

This managed code creates the memory-mapped file with a specified name, maximum length, and access. Then, it writes a string to the memory-mapped file just as you would to any ordinary file (stream) in the file system. In the code, the constructor of the MemoryMappedFileStream class uses the native CreateFileMapping function, and the MapViewToProcessMemory method then uses the native MapViewOfFile function to retrieve the memory pointer. The Write method uses that memory pointer to copy the passed string to the memory of the memory-mapped file.

If another process must read the same memory-mapped file, you can use the following code example.

const int maxLength = 1024;
string data;

using(MemoryMappedFileStream fs = new MemoryMappedFileStream("MyMMF",
    maxLength, MemoryProtection.PageReadWrite))
{
    fs.MapViewToProcessMemory(0, 0);
    byte[] buffer = new byte[maxLength];
    fs.Read(buffer, 0, buffer.Length);
    // Convert only the bytes up to the terminating null character.
    int length = 0;
    while (length < buffer.Length && buffer[length] != 0)
        length++;
    data = Encoding.ASCII.GetString(buffer, 0, length);
}

The syntax for opening the already existing memory-mapped file is identical to the initial creation, and so is the retrieval of the memory pointer. The only difference is the read of the previously written string.

For special purposes, like writing device drivers, there is some important information about passing and mapping pointers (for example, about the MapPtrToProcess and MapCallerPtr functions) that you should investigate.

Moving on to the last rules that are specific to managed code, the eighth rule indicates that because you are not doing the memory management yourself, you need to be more careful with the memory allocations (declarations, instantiations, and disposals) that you have in your code. Even if there is no problem with calling the Dispose method explicitly, using the using keyword in C# is a very good practice because it declares and instantiates an object as late as possible, and it helps you dispose of the object as soon as possible. A typical example is a database connection, as shown in the following code example.

using(SqlCeConnection cn = new SqlCeConnection(connectionString))
{
    cn.Open();
    // Use connection
}

When the using clause is finished (after the last closing brace), the connection instance's (cn) Dispose method is called implicitly.

Conforming to the ninth rule is not very difficult because it means that you simply should avoid attempting to help the garbage collector in what it already does best—determining when it needs to check for memory that can be released. The main reason is that a manually forced garbage collection takes extra time that might decrease the performance of your application without providing much benefit. For more details, see the blog post The perils of GC.Collect (or when to use GC.Collect) by Scott Holden.

To help you conform to the tenth rule, you can use Table 1.

Table 1. When to implement the IDisposable interface and a finalizer

The class instance has allocated                    Implement IDisposable?  Implement finalizer?
Only managed resources that don't implement
IDisposable or have any way of being closed         No                      No
Only managed resources, but some of them
implement IDisposable or can be closed somehow      Yes                     No
Any native resources                                Yes                     Yes

As shown in Table 1, there is no need to implement either the IDisposable interface or a finalizer if the class uses only managed objects that do not need to be disposed of. If only managed objects are used, and those objects need to be disposed of (that is, they use native resources), only the IDisposable interface needs to be implemented, and this implementation can look like the following code example.

public class ClassName : IDisposable
{
    // ...
    public void Dispose()
    {
        // Clean up managed resources (by calling their Dispose method)
    }
}

But if the class allocates any native resources (also often referred to as unmanaged resources), you need to implement both the IDisposable interface and a finalizer. The following code example shows a common pattern.

protected void Dispose(bool disposing)
{
    if(disposing)
    {
        // Clean up managed resources (by calling their Dispose method)
    }
    // Clean up unmanaged resources
}

public void Dispose()
{
    Dispose(true);
    GC.SuppressFinalize(this);
}

~ClassName()
{
    Dispose(false);
}

An extra Dispose method that takes a Boolean as a parameter is implemented. This Boolean indicates whether the class is being disposed of because the IDisposable.Dispose method was called or because the finalizer ran (both the finalizer and IDisposable.Dispose call this method). If the class is being disposed of deterministically, GC.SuppressFinalize is invoked to remove the instance from the set of objects that require finalization, reducing the pressure on the garbage collector during a garbage collection. The Dispose method should avoid using managed objects outside the if block. Both the IDisposable.Dispose method and the finalizer execute the code outside the if block, and there is no way to know whether a managed object still exists at the time the finalizer executes. Accessing managed objects from a finalizer can lead to intermittent failures that are difficult to reproduce.

Low-Memory Management

In general, you should take every available action to free memory as soon as it is no longer needed, and that becomes even more important in a low-memory situation. In a native application, you should look for the WM_HIBERNATE message in your application's main message loop, as shown in the following code example.

case WM_HIBERNATE:        
    // Free memory
    break;

When you receive the WM_HIBERNATE message, you should try to save the current state and then release all virtual memory allocations (VirtualFree); free as many Graphics, Windowing, and Events Subsystem (GWES) objects as possible (for example, windows, bitmaps, brushes, and fonts); free privately created heaps; and free (LocalFree) the entire process heap.

For a managed application, you set up an event handler to listen for the WM_HIBERNATE message as shown in the following code example.

Microsoft.WindowsCE.Forms.MobileDevice.Hibernate +=
    new EventHandler(MobileDevice_Hibernate);

The preceding code is typically placed in your main form's constructor, and the event handler is implemented as shown in the following code example.

void MobileDevice_Hibernate(object sender, EventArgs e)
{
    // Free memory
}

Inside this method is the place to release any native resources (any virtual memory allocations that you have made in your code by using functions like VirtualAlloc and LocalAlloc). This handler is also where you can take control of releasing your application's resources yourself—for example, if you want to prevent the system from closing any of your hidden forms. One of the most common actions that your code can take is to dispose of objects (call Dispose on all objects that implement IDisposable) as soon as possible. This action will help the garbage collector, which also responds to the WM_HIBERNATE message.
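
The following code example is a sketch of what the handler body might look like; the cachedImage field and the ReleaseNativeBuffers helper are hypothetical names standing in for your application's own cached objects and native allocations.

void MobileDevice_Hibernate(object sender, EventArgs e)
{
    // Dispose of cached objects that can be re-created later.
    if (cachedImage != null)
    {
        cachedImage.Dispose();
        cachedImage = null;
    }

    // Release any native memory allocated through platform invoke
    // (for example, by calling VirtualFree or LocalFree).
    ReleaseNativeBuffers();
}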

To determine the available memory from native code, you use the GlobalMemoryStatus function. The OpenNETCF Wiki Compact Framework Frequently Asked Questions/Determining Available Memory shows how to call this function from managed code. Managed code developers can also use GC.GetTotalMemory to determine how much memory the garbage collector currently has allocated. In addition, the Performance section later in this article provides more information about how to analyze your applications.
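
The following code example is a sketch of such a platform invoke call, assuming the usual GlobalMemoryStatus declaration exported by coredll.dll.

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct MEMORYSTATUS
{
    public uint dwLength;
    public uint dwMemoryLoad;
    public uint dwTotalPhys;
    public uint dwAvailPhys;
    public uint dwTotalPageFile;
    public uint dwAvailPageFile;
    public uint dwTotalVirtual;
    public uint dwAvailVirtual;
}

class MemoryInfo
{
    [DllImport("coredll.dll")]
    static extern void GlobalMemoryStatus(ref MEMORYSTATUS status);

    // Returns the amount of free physical RAM, in bytes.
    public static uint GetAvailablePhysical()
    {
        MEMORYSTATUS status = new MEMORYSTATUS();
        status.dwLength = (uint)Marshal.SizeOf(typeof(MEMORYSTATUS));
        GlobalMemoryStatus(ref status);
        return status.dwAvailPhys;
    }
}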

Memory Hardware

Manufacturers compete by offering different configurations of their mobile devices. For example, Windows Mobile 5.0 software supports a range of different screen resolutions (240 x 240, 240 x 320, 480 x 480, or 480 x 640 on Pocket PCs and 176 x 220 or 240 x 320 on Smartphones). An area that might not be as obvious is the memory configuration of different devices. Basically, the ROM is used for storing things like standard software and user data (as discussed in the following Persistent Storage section), and the RAM is used only for storing code and temporary data needed to run that code. The most common configuration is probably 64 MB of RAM even though Windows Mobile 5.0 will work in 32 MB of RAM.

Even if devices with 128 MB of RAM are offered, you need to consider that their RAM will consume twice the power of the RAM in devices with 64 MB. The power that RAM consumes is proportional to its size. ROM, however, consumes the same power regardless of size; therefore, the only limiting factor for choosing a larger ROM is the price. A common ROM configuration is 64 MB, and because the standard software occupies about 32 MB, you can store about 32 MB of user data. A device with 128 MB in ROM can store user data of about 96 MB, and a device with 256 MB in ROM can store user data of about 224 MB.

The preceding specifications will appear in product advertising, but an interesting detail that might not appear in advertising is which type of ROM is used. There are two types of ROM hardware used in a Windows Mobile 5.0–based device: NOR and NAND. NOR is faster to read but slower to write, and NAND is slower to read but faster to write. Another difference is that NOR supports XIP, which means that the standard software does not need to be loaded into RAM before it is run. Therefore, a device that uses NOR requires less RAM and loads standard software faster (but it runs somewhat slower). However, because NAND is less expensive and ideal for storing user data (because it writes faster), one possibility for a device is to have 64 MB of NOR (for standard software that can be executed in place) and 64 MB of NAND (for storing user data).

As an enterprise developer, you should probably be involved in selecting the target device for a specific solution because the choice will affect memory, storage, and power management in your design.

For more details about memory hardware and device configurations, see the blog post RAM, ROM, NAND, NOR--that's a lot of capital letters by Mike Calligaro.

Persistent Storage

Persistent storage is a new RAM/ROM paradigm that changes the way memory is managed, and it is important news for both users and developers. Persistent storage is actually not something completely new because Smartphones have had it since the first version back in 2002. However, the first Pocket PCs with persistent storage will be running Windows Mobile 5.0 software. Now, rather than using RAM for both storage medium and system memory, Pocket PCs will use RAM only for system memory and will use ROM for data storage. This configuration is more similar to the way a desktop computer works with its RAM and hard disk. Because all data on a device running Windows Mobile 5.0 software is saved in ROM, it is protected from loss if the battery is fully drained.

Why this protection from data loss is an advantage is obvious to the many users who have left their devices unused for more than a few days and lost all their data. On most such occasions, the data was probably synchronized with a desktop computer, but sometimes valuable data was forever lost. In addition, the effort of redoing all device settings (such as e-mail accounts), reinstalling all applications, and restoring the data was considerable. The absolute worst case with persistent storage is that you cannot use the device until it is recharged, but the data will still be there. Because the persistent storage file system is not cleared when the power is cut (from a completely discharged battery or removal of the battery), each device manufacturer will provide some mechanism for deleting the data if you really want to. The mechanism will be different from device to device, but every device will have it.

A very interesting side effect of persistent storage is that it also affects battery life. Before Windows Mobile 5.0, the RAM was always using the battery whether the device was turned on or off. The power that the RAM used was also proportional to the size of the RAM. Keeping the data in a RAM of 128 MB requires twice the power of 64 MB. With data saved in ROM, there is less need for RAM. For most scenarios, 64 MB of RAM will probably be sufficient, and reducing the RAM from 128 MB to 64 MB will cut the amount of power that the RAM requires in half.

Before persistent storage, it was critical not to drain the battery completely because data would be lost, and that resulted in the "72-hour rule" for previous Pocket PCs. The idea was that if you discovered on Friday that the battery was critically low, the device should have kept the data until Monday morning when you returned to the charger. The critically low level would actually turn the device off to preserve the power for at least 72 hours. Because a typical fully charged battery holds about 1,000 milliampere hours (mAh) and 128 MB of RAM uses about 500 mAh to stay resident for 72 hours, the device needed to be turned off when the battery was only 50 percent used. With persistent storage, the battery can be used until it is fully drained, doubling the usage time in the current example.

The reason why you never saw 256 MB of RAM in a Microsoft Windows Mobile 2003–based Pocket PC is that the device would run for a minute, detect that the battery was critically low, and turn off. With ROM, the power consumption is different. ROM uses the same battery power independent of size; a 32-MB ROM uses about the same power as a 1-GB ROM.

This news opens the possibility of devices with much larger storage space and without a battery-life penalty. Also, because ROM is becoming cheaper than RAM, the same storage space will cost less.

The information presented so far in this section can be summarized in the following benefits for persistent storage:

  • Data stored in ROM is not lost if the battery is fully discharged.
  • With less RAM required and the ability to use all of the battery, the effective battery life improves significantly.
  • Because ROM consumes power independently of size, devices can have larger storage without affecting battery life.

However, persistent storage also presents some challenges, and you need to make your applications adapt to those challenges.

The first, and probably most important, challenge is that accessing ROM is slower than accessing RAM. Reads are somewhat slower and writes are considerably slower. Writing to ROM is sometimes ten times slower than writing to RAM.

The question here is whether the slow process of writing to ROM will make the device (and your application) unbearably slow. You can answer that question by examining the way that the standard software works, and how it handles the slower ROM access. First, application load (read) times will not change at all because the standard applications have always been in ROM. Many of the writes are buffered, which means that the slower physical writes will not affect the user experience. Also, a user will not notice something slowing down from 1/100 to 1/10 of a second. In general, typical user scenarios like everyday synchronizing, reading e-mail, browsing the Web, and listening to music will not be much affected.

But when large amounts of data need to be written, there will be a considerable slowdown. Any user initially synchronizing thousands of e-mail messages and hundreds of contacts and calendar appointments will definitely see a difference.

Another challenge is the fact that ROM does not allow an infinite number of writes. RAM can change its content repeatedly without any negative effect on the hardware, but modern flash ROMs typically can be written 1 million times. This number, however, will not be a limitation to the users of your applications. Again, you can understand the reason by examining the way that the standard software works, and how it handles the ROM writing. Whenever possible, the writes to ROM are buffered by RAM and written in larger blocks. For example, because the registry and databases (including calendar and contacts) are buffered like this, you won't write to the ROM every time you mark an e-mail message as read. There is not as much buffering when files are being written, but at least the file system will wait until a block is full before it is written to the ROM.

You can therefore make a small change and save it, and if you immediately reset your device, that change may be lost. However, everything is flushed periodically and whenever the device is suspended. Also, the writes to ROM are distributed equally so that all of the ROM blocks are written the same number of times. The result is that writing the same file many times will not cause the number of writes for one block to be completely consumed before any other.

Despite the buffering and evenly distributed writes, it is theoretically possible to reach the limit of ROM writes. Visiting thousands of Web pages per day, with the temporary Internet files causing many writes to ROM, is one such way. Although there are ways to work around this scenario (for example, disabling the cache or using a RAM disk) and other extreme scenarios, you should at least give the challenge of a limited number of writes serious thought when designing your solution.

The challenges of persistent storage can be summarized as follows:

  • ROM is slower than RAM, and writing to ROM is significantly slower than writing to RAM.
  • Flash ROM can be written only a limited number of times.

Writing Persistent Storage-Enabled Applications

In general terms, most of the practices from desktop computer development apply to writing persistent storage-enabled applications for mobile devices. Starting with general memory management, it is always better to do something in memory than to interact with the file system. To a desktop computer developer, the obvious solution is to work with a stream rather than writing a file to disk. But because of the earlier RAM-based file system, Pocket PC developers often wrote something to a file (like the XML for a DataSet) just to read it back into a string. Now, Pocket PC developers should learn how to use memory streams.
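
As a sketch of the technique, the following code example serializes a DataSet to a MemoryStream and reads the XML back into a string without touching the file system.

using System.Data;
using System.IO;

static string DataSetToXml(DataSet data)
{
    // Serialize to memory instead of to a temporary file,
    // avoiding the ROM write entirely.
    using (MemoryStream stream = new MemoryStream())
    {
        data.WriteXml(stream);
        stream.Position = 0;
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}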

Continuing with file access, the general rule of thumb is to prefer "chunky" writes over "chatty" writes, meaning that it is always better to open a file, write a large amount of data to it, and close it than to open the file, do a tiny write, close the file, open it again, do another tiny write, close it, and so on. When it is possible, try to buffer your own file operations. Also, many temporary files can be replaced with memory-mapped files. A class like MemoryMappedFileStream will have minimal impact on existing code when you are changing from file system–based file access to the use of memory-mapped files, and it can save your users many ROM writes while providing them better performance. One advantage of memory-mapped files compared to native large memory allocations is that the files can more easily be shared between processes. Another advantage is that the reserved memory pages are automatically committed as they are needed.

A practical detail is that on a Windows Mobile 5.0–based device, the default volume is the persistent storage volume. On previous generations of Pocket PCs, there was no default volume because the file system was RAM based (in the object store). On Smartphone 2002, the folder was \IPSM; on Windows Mobile 2003–based Smartphone, the folder is \Storage. Now, you should use the SHGetSpecialFolderPath function, as shown in the following code example.

SHGetSpecialFolderPath(NULL, szFolderPath, CSIDL_APPDATA, TRUE);

By using the same function, you can also get the My Documents folder (by using CSIDL_PERSONAL) and the Program Files folder (by using CSIDL_PROGRAM_FILES).
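
From managed code, you can call the same function through platform invoke. The following sketch assumes the function is exported by coredll.dll and uses the standard CSIDL constant values.

using System;
using System.Runtime.InteropServices;
using System.Text;

class SpecialFolders
{
    const int CSIDL_PERSONAL      = 0x0005; // My Documents
    const int CSIDL_APPDATA       = 0x001a; // Application Data
    const int CSIDL_PROGRAM_FILES = 0x0026; // Program Files
    const int MAX_PATH = 260;

    [DllImport("coredll.dll")]
    static extern bool SHGetSpecialFolderPath(IntPtr hwndOwner,
        StringBuilder lpszPath, int nFolder, bool fCreate);

    public static string GetPath(int csidl)
    {
        StringBuilder path = new StringBuilder(MAX_PATH);
        if (!SHGetSpecialFolderPath(IntPtr.Zero, path, csidl, true))
            return null;
        return path.ToString();
    }
}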

Moving on to database access, the general rule is to try to do as much data processing as possible in memory, which means preferring DataSet instances over instances of the new SqlCeResultSet when you are doing many updates. Note that when you are doing only database reads, the choice is probably between a SqlCeDataReader instance (forward only) and a SqlCeResultSet instance (flexible navigation), depending on the navigational needs.
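
For the read-only case, the following code example sketches a forward-only pass with SqlCeDataReader; the connection string and the Customers table are illustrative only.

using System.Data.SqlServerCe;

static void ListCustomers(string connectionString)
{
    using (SqlCeConnection cn = new SqlCeConnection(connectionString))
    {
        cn.Open();
        using (SqlCeCommand cmd = new SqlCeCommand(
            "SELECT CustomerId, Name FROM Customers", cn))
        using (SqlCeDataReader reader = cmd.ExecuteReader())
        {
            // Forward-only, read-only: the lightest way to scan rows.
            while (reader.Read())
            {
                int id = reader.GetInt32(0);
                string name = reader.GetString(1);
                // ... process the row ...
            }
        }
    }
}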

Another general recommendation for optimal storage management is to minimize both the size of your application and the amount of persisted data that it uses. You should always carefully consider the impact of including things that increase your application or storage requirements. For example, is a high-resolution image on your splash or About screen worth the storage requirements?

In addition, you should be careful with the frequency of altering persisted data such as many small updates to files, a database, or even the registry. (The new volatile registry keys feature in Windows Mobile 5.0 enables the creation of registry keys that exist only until the device is restarted. This feature is useful for sessionlike settings). In general, all periodic updates need to be questioned.

Object Store

If you read the Windows CE 5.0 documentation, you will notice that an entity called the object store still exists in the operating system of Windows Mobile 5.0. The object store is the RAM file system from previous generations of Pocket PCs. But with the file system in ROM, you may wonder why the object store is still in the documentation. Windows Mobile 5.0 still has an object store, but it is set to its minimum size (32 KB). The ability to configure it (that is, allow it to get any bigger) has been removed, and the object store itself effectively does not exist anymore (so it can't be seen). Even though Windows Mobile no longer uses the object store, other Windows CE 5.0–based devices still use it.

Power Management

Because managing memory and storage is closely related to managing the power on devices, it's important to know how to write power-efficient applications.

The persistent storage feature is a key factor in extending battery life on a Pocket PC running Windows Mobile 5.0 software—mainly because of the removal of the 72-hour rule that made the device turn off even if the battery still had half of its power. But even if the backlight, screen, and CPU sometimes used more power than the battery-backed RAM, the RAM was always using power, even when the device was suspended (all day and all night).

A benefit of persistent storage is that a user does not risk losing data just because the battery is fully discharged. However, modern batteries (like lithium ion and lithium polymer) can be damaged when they are fully discharged. You should inform users of this fact because otherwise, they might let their batteries become fully discharged more often than before (because they know that they will not lose their data).

In many respects, Pocket PCs and Smartphones are becoming more similar (for example, both now use persistent storage and soft keys), but there are some areas where they are still different. One is the touch screen and another is power management.

To simplify the discussion, the Pocket PC basically has two different states: on or asleep (named suspended in the documentation). When a Pocket PC is on, everything (for example, CPU, screen, and backlight) is running and consuming power. When the device is asleep, it uses very little battery power, and its applications simply are not running. The Smartphone also has two different states, but they are easier to comprehend: on or off. When a Smartphone is off, it does not use any battery power. When a Smartphone is on, the CPU is always active, even though it uses a minimum of power while idle (only a bit more than a sleeping Pocket PC).

Because a Smartphone is always on (two words you will be hearing a lot more), it is vulnerable to any application that consumes a large amount of power (that is, any CPU-intensive application). Therefore, application developers are responsible for writing power-friendly applications to improve the user experience and the battery lifetimes of their devices. On a Pocket PC running Windows Mobile 5.0 software, you don't need to think too much about being power efficient because the sleeping mode effectively halts any CPU-intensive functionality that your applications implement. But there are two reasons for making your applications power efficient anyway:

  • You may want to use uniform code for Pocket PC and Smartphone.
  • There is a very high probability that Pocket PCs will get the same power scheme as Smartphones in the future.

Writing a Power-Efficient Application

The most important measure for minimizing an application's use of the battery is to make sure that the application uses the CPU only when absolutely necessary. Try to avoid all kinds of polling with infinite loops waiting for something to happen, and use an event-driven design so that your application is notified of any changes that it requests. The extreme is probably an infinite loop that uses the PeekMessage function—it will drain batteries quickly. Try to find an alternative solution to anything that involves a never-ending background thread, such as that shown in the following code example.

while(TRUE)
{
    // Do something
    if(bBatteryDeadYet)
    { break; }
}

The preceding code example is obviously devastating for battery life (and performance) because the CPU is active for 100 percent of the thread's scheduled time slice. Also remember that most of the code that your application executes is not your code—it is code located in ROM. Most of the code in ROM is compressed, so before that code can run, it must be decompressed into RAM, and decompression uses CPU and power.
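
The following code example sketches an event-driven replacement for such a loop: the worker thread blocks on an event and uses no CPU time until another part of the application signals that there is work to do.

using System.Threading;

static AutoResetEvent workReady = new AutoResetEvent(false);
static bool shuttingDown;

static void WorkerThread()
{
    while (!shuttingDown)
    {
        // The thread sleeps here, consuming no CPU (and therefore
        // very little power), until the event is signaled.
        workReady.WaitOne();

        // Do the work, and then go back to waiting.
    }
}

// Elsewhere in the application, when there is work to do:
// workReady.Set();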

You should also stop using the CPU as soon as possible. A necessary background thread, or even that cool animation on your splash or About screen, is usually not a problem by itself. For example, running the animation for a short time probably will not deplete the battery, but if the animation keeps running all night, it might waste a lot of battery power. Two important guidelines to focus on are to stop using the CPU when your application is not visible to the user (that is, not in the foreground) and when the user has not used your application for some time.

To act on whether your application is visible or not, you can put the following native code example in your main message loop.

case WM_ACTIVATE:        
    active = LOWORD(wParam);
    if(active == WA_INACTIVE)
        // Now in background, so stop using CPU
    else if(active == WA_ACTIVE)
        // Now in foreground, start using CPU (if needed)
    break;

In managed code, the preceding action would look like the following code example.

protected override void OnDeactivate(EventArgs e)
{
    // Now in background, so stop using CPU
}

protected override void OnActivated(EventArgs e)
{
    // Now in foreground, start using CPU (if needed)
}

The best way to determine whether the user has used your application for some time is to use a timer (the SetTimer function in native code, and the System.Threading.Timer class in managed code) that is reset each time a key is pressed or the stylus is used.
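
The following code example is a sketch of that approach by using the System.Threading.Timer class; the one-minute threshold and the actions in OnIdle are illustrative only.

using System;
using System.Threading;

class IdleWatcher
{
    const int IdleMilliseconds = 60000; // one minute without input
    Timer idleTimer;

    public IdleWatcher()
    {
        idleTimer = new Timer(OnIdle, null, IdleMilliseconds,
            Timeout.Infinite);
    }

    // Call this method from the form's KeyPress and MouseDown handlers.
    public void ResetIdleTimer()
    {
        idleTimer.Change(IdleMilliseconds, Timeout.Infinite);
    }

    void OnIdle(object state)
    {
        // No input for a minute: stop animations, background threads,
        // and anything else that keeps the CPU busy.
    }
}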

A more advanced solution is to let a construct called the Power Manager notify you of any state change that happens in the system (for example, when the backlight or the screen is turned off). Because the Power Manager does not send Windows messages, and there is no managed wrapper provided for receiving its events, it is a bit tricky to use for state-change notifications. However, a native sample (Animate3) is available, and a managed sample is provided by fellow MVP Alex Feinman.

If you want to learn more about power management on Windows Mobile 5.0–based devices, see the blog posts by Mike Calligaro named Power to the People, Power to the Developers part 1, Power to the System, and Power to the Developers part 2. The two posts focusing on developers include some great example code (both native and managed) that shows how to implement a power-efficient application.

As a final note related to persistent storage, the advice to avoid writing to ROM when possible also applies to being power efficient. Writing to ROM consumes more power than writing to RAM, so minimizing ROM writes will also minimize power consumption.

Performance

It is a good idea to start thinking about performance before you begin the development of your application. You should also regularly evaluate performance (include these evaluations in your project schedule from the start) throughout the development process. From the use cases (user scenarios with prerequisites) and other design documentation, you can create the performance criteria—for example, that opening your application should take less than 3 seconds, synchronizing with the server should take less than 30 seconds, or opening a specific form should take less than 300 milliseconds. Always try to evaluate your solution on hardware that is as similar as possible to the production environment.

There is often a difference between perceived performance (the user's perception of how long something takes) and absolute performance (how long it really takes). The difference between the two is often related to the responsiveness of the user interface. If something blocks the user interface and you consider doing the operation on another thread, remember that threads can be hard to synchronize correctly, that testing all possible scenarios is hard, and that a performance impact may result. One area that is especially important to consider is performance related to the new persistent storage. Other examples: cache graphics device interface (GDI) objects (they should not be loaded in the Paint handler), and turn off caching when you use the WinInet functions.

When you analyze performance, and if you like doing things your own way, you can use the two native functions GetTickCount (which is called by the managed equivalent Environment.TickCount) and QueryPerformanceCounter (which can be called from managed code). The main difference between the two functions is their granularity. GetTickCount gives you millisecond granularity; the granularity of QueryPerformanceCounter is platform dependent, but it can be as fine as a microsecond. The following code example is a native example of using GetTickCount.

DWORD dwStartTime = GetTickCount();
// Do things to measure
DWORD dwMeasuredTime = GetTickCount() - dwStartTime;

The following code example is the managed equivalent of using Environment.TickCount.

int startTime = Environment.TickCount;
// Do things to measure
int measuredTime = Environment.TickCount - startTime;

The use of tick counts (as shown in the two previous code examples) for measuring is probably good enough for many profiling scenarios.
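
If you need finer granularity, the following code example sketches a platform invoke of QueryPerformanceCounter and QueryPerformanceFrequency (assuming they are exported by coredll.dll).

using System;
using System.Runtime.InteropServices;

class HighResolutionTimer
{
    [DllImport("coredll.dll")]
    static extern bool QueryPerformanceCounter(out long count);
    [DllImport("coredll.dll")]
    static extern bool QueryPerformanceFrequency(out long frequency);

    static void Demo()
    {
        long frequency, start, stop;
        QueryPerformanceFrequency(out frequency);

        QueryPerformanceCounter(out start);
        // Do things to measure
        QueryPerformanceCounter(out stop);

        // Convert counter ticks to milliseconds.
        double elapsedMs = (stop - start) * 1000.0 / frequency;
    }
}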

If you want more help in analyzing performance of your native applications, you can use Remote Call Profiler and Remote Kernel Tracker. Both these tools come with Windows CE 5.0 Platform Builder and can be used at the same time. Remote Call Profiler combines profiling and analysis tools with a graphical interface that you can use to identify algorithmic bottlenecks in your code. Remote Kernel Tracker helps you understand how your application affects the entire system by providing very detailed systemwide information, such as when a thread block is waiting for a semaphore, when an event signals, when an interrupt occurs, and any other system events that pass through the kernel. A systemwide view is useful because your application may not be directly degrading performance, but your application may be causing other parts of the system to degrade performance.

When developing managed applications, you get the benefit of type safety and automatic memory management, which can drastically improve your productivity. However, those benefits come at the price of performance. First, the environment (common language runtime) is doing work for you at run time, and it is hard to know exactly how and when the CPU and memory are used. Second, a general-purpose high-level class library (like the .NET Compact Framework), with its extensive code, is not suitable for high-performance device applications.

It is sometimes better to implement your own, more specific and more effective, solution. It is also good to understand that due to the limited system resources (mainly CPU and battery), the JIT compiler (that compiles Microsoft Intermediate Language (MSIL) code to native code) is made relatively simple and fast, and the garbage collector does most of the built-in performance optimization and memory management. Because native code (JIT-compiled) is three times the size of the MSIL code, it is probably not a good idea to precompile the code on a mobile device due to the limited memory resources. That may be why there's no Ngen utility (available for the full Microsoft .NET Framework) for the .NET Compact Framework.

If some well-written native code uses 50 percent or more of the available processor time (for example, video and audio playback), it is not a suitable candidate for the .NET Compact Framework. Fortunately, most business applications can be optimized considerably while staying within the managed environment.

With regard to code quality, one important area to consider is the method call (code) paths. A managed method or property call is up to five times slower than an optimized native call. Therefore, if your application has many deep code paths (calls through many classes), it will affect performance, and it is also hard to profile because the load is evenly distributed. In practice, this means that you should not have too many tiers in your application design.

Another important area is code related to user interaction. Just as the Windows Forms code in the .NET Compact Framework uses native code to handle heavy processing, you can implement what are called hybrid controls with a mix of native and managed code. Actually, you can use this technique whenever there is a performance bottleneck in your managed code, but the speed gain from native code must be weighed against the marshaling cost.

For the enterprise developer, the compact database manager (Microsoft SQL Server Mobile Edition) can be beneficial for handling heavy data loads in an efficient way. The improved integration with Microsoft SQL Server 2005 opens new possibilities for optimization with features like graphical query execution plans. Be sure to make use of these new possibilities.

The preceding discussion on optimization of managed code performance can be summarized in the following best practices:

  • Leave most of the memory management to the garbage collector.
  • Minimize method call (code) paths, and especially avoid recursion.
  • Avoid the use of the very general class libraries, and build more specific ones yourself.
  • Optimize with the right mix (hybrid) of managed and native code.
  • Use SQL Server Mobile Edition for data storage and use the tools to optimize performance.

Finally, remember that performance optimization can go on forever, and that is the main point of defining the performance criteria from the start. Optimize until you meet the criteria, and then stop.

Tools and Resources

Throughout the article, you have been given numerous links to articles and other resources that provide more information about memory, storage, and power management.

The following are more tools and resources for native code development:

  • Kernel Memory Tool (especially the mi command) is a tool that is included in Windows CE 5.0 Platform Builder. It displays information about kernel memory usage and runs within the Windows CE Console Debug Shell tool.
  • Application Verifier is a tool that checks the stability of an application and detects common programming mistakes. The tool can detect, pinpoint, and handle memory leaks. The tool can also detect some forms of heap corruption.
  • CE Stress is a tool that is available in the Windows CE Test Kit (CETK) in Windows CE 5.0 Platform Builder. It can help you identify whether a functional unit leaks memory, crashes, hangs, or fails to function after extended continuous operation. There are many other useful tools in the CETK, like the Resource Consumer.
  • Remote Call Profiler and Remote Kernel Tracker are two great tools for analyzing performance.
  • How to Use Remote Tools to Track Memory Leaks in Windows CE Applications by Mike Hall is an article that shows how to track memory leaks by using tools like Remote Performance Monitor and Remote Kernel Tracker.

The following are some useful tools and resources for managed code development:

  • Performance counters are dumped by the common language runtime in a file when an application closes. For details about using this file (formerly named Mscoree.stat), see the "Developing Well Performing .NET Compact Framework Applications" article later in this list. A runtime view of the performance counters will be available through a free tool unofficially referred to as TuneIn.
  • CLR Profiler is not available for device development yet, but it can be useful anyway if you can compile your application on the full .NET Framework.
  • Loader log includes the assembly and DLL load information.
  • Interop log includes the managed and native function signatures for all calls that cross an interoperability boundary. That is, it includes both calls from managed code to native code and calls from native back to managed.
  • JIT debugger can be used to attach the managed debugger to a running .NET Compact Framework application. David Kline also has a great blog post on the subject.
  • How to: Improve Performance includes many good tips on both performance and memory management within managed applications.
  • .NET Compact Framework version 2.0 Performance and Working Set FAQ by Roman Batoukov and the .NET Compact Framework Team includes valuable advice on building well-performing managed applications.
  • How to: Create Log Files by Katie Blanch and the .NET Compact Framework Team includes information about how to enable logging of managed applications.
  • Developing Well Performing .NET Compact Framework Applications by Dan Fox and Jon Box can help improve the performance of your managed applications.
  • Instrumentation for .NET Compact Framework Applications by Mark Arteaga includes a tool for monitoring your managed applications.

Conclusion

Understanding the design and behavior of the memory, storage, and power management functions will help you create more effective applications. Correct memory management will also almost always result in better performance. The main lessons are to allocate memory correctly and check for successful allocations, correctly help the garbage collector, prepare your applications for persistent storage, and trim power usage and performance by using the available tools and resources. Finally, always remember that your applications contribute to (or detract from) the overall stability of the mobile device.