Time Travel with Windows Media Center in Windows Vista

 

Stephen Toub

Microsoft Corporation

March 2007

 

Applies to:

   Windows® Media Center in Windows Vista®

   Windows® Media Center Software Development Kit (SDK)

   Microsoft .NET Framework 2.0

Summary: Discusses writing background applications for Windows Media Center in Windows Vista, and demonstrates how to write an application that allows a user to enter a time code on the remote control, causing Windows Media Center to jump to that location in the current media playback. This is a revised version of an article originally written about Windows XP Media Center Edition 2005 in April 2005.

Contents

Introduction

Integrating with Windows Media Center

Hello World!

Position Changer Application

Keyboard Hooks

Jumping to Time Codes

Configuring Position Changer Application

Conclusion

For More Information

Introduction

Remote controls are a wonderful thing. They let us manipulate devices from afar and are any couch potato's best friend. Of course, a remote control is only as useful as its programming, and often that programming is limited by the device's creator or by the target software's manufacturer.

On the surface, this would seem to be the case with Windows Media Center. Windows Media Center has been programmed by its development team to respond to certain commands from the remote control, and depending on the current media experience within Windows Media Center, that behavior changes. For example, when watching live TV, pressing digits on the remote control forces a channel change. Yet pressing those same number buttons when searching the electronic program guide allows the user to search for shows that contain certain characters and phrases. But what if you needed these buttons to behave differently? How else might you want Windows Media Center to respond?

Personally, while I think Windows Media Center is terrific, I've frequently wished for functionality equivalent to a slider bar that would allow me to seek to arbitrary locations within a video. For example, if I watch a recorded show in Windows Media Player, I'm able to use the mouse to jump to any location in the video I desire, simply by clicking on a particular location on the slider bar at the bottom of the video.

Slider bar in Windows Media Player

Windows Media Center in Windows Vista has several useful mechanisms for allowing a user to navigate media, but it currently doesn't provide a feature that allows for jumping to a specific location in a show. Similarly, when playing music, there's no way to jump to a particular location within the audio.  As a developer, when I find a perceived gap in a product's functionality, my first inclination is to ask whether I can write any code to fix it (that's actually my second inclination; my first is to ask whether someone else has already written the code to fix it). Wouldn't it be great if we could program Windows Media Center to respond to remote control commands in a fashion that would allow a user to jump around within a video, say by allowing the user to enter a time code on the remote's number buttons, thereby causing Windows Media Center to jump to that location in the playback?

In fact, this is possible. The beauty of Windows Media Center is that it is designed as both a product and an extensible platform so you can change how it responds in certain scenarios. In this article I'll demonstrate exactly how you, too, can implement this or similar functionality for your own Windows Media Center system.

Note   This application is only intended to be used with Windows Media Center in Windows Vista, and it may not function correctly or without side effects with future versions of the product.

Integrating with Windows Media Center

There are a variety of ways to integrate custom code into Windows Media Center in Windows Vista, but the primary way is through Windows Media Center Presentation Layer applications.

There are three distinct Windows Media Center Presentation Layer application types you can create using the Windows Media Center platform: Local, Web, and Background. Windows Media Center Presentation Layer local applications consist of an installed managed code assembly and related files, and because they are installed locally, they have access to all computer resources. These applications are explicitly launched by a user and typically interact with a user through interfaces written in Windows Media Center Markup Language (MCML). MCML is a declarative XML language that lets application developers take advantage of dynamic layout capabilities, integrated animation support, rich text and graphic support, and automatic keyboard, mouse, and remote control navigation. Applications written with MCML take advantage of the same rendering technology used by Windows Media Center itself to create its user interface; in fact, MCML can render remotely with full fidelity to a Windows Media Center Extender session running on Xbox 360. As an example of this type of application, the Windows Media Center SDK comes with two complete Windows Media Center Presentation Layer local applications that utilize MCML: an RSS feed reader named Q, and a gallery browser application named Z.

A very useful variation of Windows Media Center Presentation Layer applications is also available in Windows Vista: Windows Media Center Presentation Layer web applications. These are Windows Media Center Presentation Layer applications built without managed code, consisting purely of a library of MCML files that reside on an HTTP server. Installing a Windows Media Center Presentation Layer web application is simply a matter of informing Windows Media Center as to where on the Internet the starting page of the application lives. Windows Media Center provides an environment in which the MCML can be hosted, full access to Windows Media Center API methods and properties, and access to a subset of Microsoft .NET Framework types that support common scenarios of Web-delivered experiences. A Windows Media Center Presentation Layer web application has no access to local computer resources.

The focus of this article, however, will be on another type of Windows Media Center Presentation Layer application: background applications. These are managed applications loaded when Windows Media Center starts up and unloaded either when the applications choose to exit or when Windows Media Center explicitly unloads them due to the user closing the Windows Media Center shell. Background applications do not present a user interface of their own (other than an occasional dialog box that can be presented by the Windows Media Center environment on request by the application). Rather, they run in the background, performing their tasks and interacting with Windows Media Center through its API.

Hello World!

Before I get into the nitty-gritty of writing a background application to allow for jumping around in a video, let me begin by introducing you to background applications through the simple example used for any new environment or programming language: Hello, World.

A background application must implement two MediaCenter-related interfaces, both of which are defined in the Microsoft.MediaCenter.Hosting namespace and are contained in the Microsoft.MediaCenter.dll assembly, which lives in the %WINDIR%\ehome\ directory as well as in the global assembly cache (GAC) on a Windows Vista system. (Note that previous versions of Windows Media Center relied on counterparts to these two interfaces in a different namespace, Microsoft.MediaCenter.AddIn. Applications that use those interfaces are still supported, but that namespace has been deprecated in favor of Microsoft.MediaCenter.Hosting, and all new applications you write should rely on the new namespace.) One of these two interfaces, IAddInModule, provides a way for Windows Media Center to initialize and uninitialize the application, while the other interface, IAddInEntryPoint, provides the equivalent of a "Main" method for it:

public interface IAddInModule
{
    void Initialize(
        Dictionary<string, object> appInfo, 
        Dictionary<string, object> entryPointInfo);
    void Uninitialize();
}
public interface IAddInEntryPoint
{
    void Launch(AddInHost host);
}

The IAddInModule.Initialize method accepts two parameters, appInfo and entryPointInfo. These dictionary collections are populated from an XML file that's used to register an application with Windows Media Center, where appInfo contains the attribute name-value pairs from an application element in the XML, and where entryPointInfo contains the attribute name-value pairs from an entrypoint element in the XML. For background applications in Windows Media Center in Windows Vista, the application element is typically defined as:

<application title="title" id="application GUID">...</application>

while the entrypoint element is typically defined as:

<entrypoint
    id="GUID"
    addin="className, assemblyName, Version=version,
        PublicKeyToken=publicKey, Culture=culture"
    title="title"
    description="description"
    category="Background"
    context="context"
/>

The intent of entryPointInfo and appInfo is to allow the application to learn more about how it was registered, with Windows Media Center providing all of this information to it at run time. The most interesting attribute from an application's perspective is probably the context attribute on the entrypoint element (which is then accessible through the entryPointInfo dictionary). Context is an optional value that has no meaning to Windows Media Center and is simply a vehicle for passing extra information into the application. For example, a weather monitoring background application could have its context attribute set to contain a URL, username, and password for accessing an appropriate weather-related Web service. When the application's Initialize method is called, the application can pull this context out of entryPointInfo and save it for use later on, allowing it to connect to the specified service. By allowing for this sort of configuration, the same application could be registered multiple times, each time with different context information, thus allowing multiple instances of the same application for accessing multiple services. Another use would be to have debug code compiled into the application that's only executed when context is set to a predetermined value. Or you could have an application which navigates to a page in Windows Media Center under certain conditions, where the target page is specified by the context attribute. You could then register this application multiple times, each with a different target page as the context.
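As a minimal sketch of the pattern just described, the following helper reads the optional context value defensively. The ContextReader class and its GetContext method are hypothetical names introduced for illustration and are not part of the Windows Media Center API. The only detail taken from the text is that the value is exposed under the "context" key; whether Windows Media Center omits the key or stores an empty value when the attribute is absent is not stated, so the sketch assumes either may happen and handles both.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Media Center API) that reads the
// optional "context" value from the entryPointInfo dictionary.
public static class ContextReader
{
    public static string GetContext(
        Dictionary<string, object> entryPointInfo, string defaultValue)
    {
        object value;
        // TryGetValue avoids the KeyNotFoundException that the dictionary
        // indexer throws when the key is missing.
        if (entryPointInfo != null &&
            entryPointInfo.TryGetValue("context", out value))
        {
            string text = value as string;
            if (!string.IsNullOrEmpty(text)) return text;
        }
        // Missing, null, empty, or non-string context: use the fallback.
        return defaultValue;
    }
}
```

An Initialize implementation could call ContextReader.GetContext(entryPointInfo, "some default") and store the result in a field for Launch to use later.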

IAddInEntryPoint.Launch accepts a parameter of type Microsoft.MediaCenter.Hosting.AddInHost, a sealed class that provides an application access to and control of many aspects of the Windows Media Center environment.

In Windows XP Media Center Edition 2005, when the Windows Media Center shell (%WINDIR%\ehome\ehshell.exe) was launched and found a registered background application, it created a new application domain in the process in which to host that particular application. Since an assembly cannot be unloaded from an application domain, the only way to unload an assembly from a process is to unload all application domains in which that assembly is loaded. As such, ehshell created a separate application domain to house each application it loaded, allowing Windows Media Center to tear down that application domain when the application shut down. In addition to the reliability benefits resulting from this application domain isolation, this architecture also provided for an extra level of security. However, in many scenarios, the reliability provided by this model was not enough. One of the problems that users of Windows XP Media Center Edition 2005 experienced was poorly written applications and HTML applications hosted by Windows Media Center. Even though applications were hosted in separate application domains, they were still in-process in ehshell.exe. As a result, if something went wrong in one of these applications, the entire Windows Media Center experience could be affected.

To address this reliability issue, in the Update Rollup 2 release for Windows XP Media Center Edition 2005, Windows Media Center moved to an out-of-process hosting model, and this model continues to be the model used for Windows Media Center in Windows Vista. When an application is run, a new hosting application, %WINDIR%\ehome\ehexthost.exe, loads the application rather than ehshell loading it (if you look in the Windows Task Manager after an application loads, you'll see an instance of this application listed). That way, if Windows Media Center detects that the application is being a bad citizen, it can tear down the whole hosting process without affecting the main ehshell process. .NET Remoting is used to communicate between the ehshell process and the ehexthost process, utilizing a fast named-pipe channel for the communication.

Once the ehexthost process and the application domain in that process for a background application have been created, ehshell uses a set of internal application loader classes to load the application's assembly into the remote application domain and to instantiate an instance of the application (as the loader uses the System.Reflection.Assembly.Load method to do so, the application's assembly should be strong-named and installed into the GAC so that it can be discovered and loaded from there). After the assembly is loaded, ehshell uses the loaded Assembly instance's CreateInstance method to instantiate the application, which means that an application should have a default constructor accessible (and since Windows Media Center uses the IAddInModule.Initialize method to initialize the application after it's been constructed, there's no reason to do otherwise). In fact, the simplest thing to do is to not add any explicit constructors at all, as both the C# and Visual Basic compilers will then emit a public parameterless constructor for the application.
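The constructor requirement can be illustrated with a small sketch. This is not the internal loader used by Windows Media Center, whose code is not shown in this article; SampleAddIn and LoaderSketch are hypothetical names, and the sketch only demonstrates how Assembly.CreateInstance behaves with a type that relies on the compiler-generated public parameterless constructor.

```csharp
using System;
using System.Reflection;

// Stand-in for an add-in type. It declares no constructors, so the compiler
// emits a public parameterless one.
public class SampleAddIn
{
    public string Describe() { return "created"; }
}

public static class LoaderSketch
{
    // Mimics a late-bound creation step: look the type up by name in an
    // already-loaded assembly and invoke its parameterless constructor.
    public static object Create(Assembly assembly, string typeName)
    {
        // CreateInstance returns null when the type name is not found and
        // throws MissingMethodException when no matching public
        // constructor exists.
        object instance = assembly.CreateInstance(typeName);
        if (instance == null)
            throw new TypeLoadException("Type not found: " + typeName);
        return instance;
    }
}
```

Adding only a constructor that takes arguments to SampleAddIn would make the call fail at run time rather than at compile time, which is why leaving constructors out entirely is the simplest choice.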

The previous discussion was very important for background applications implemented for Windows XP Media Center Edition 2005 because it served to highlight the need for background applications to derive from System.MarshalByRefObject. As mentioned, Windows Media Center needs to call the methods on IAddInModule and IAddInEntryPoint in order to initialize, launch, and uninitialize the application. But the code in ehshell that makes these calls is running in a different application domain from the isolated application (and in Windows Vista, an entirely different process). MarshalByRefObject serves as the base class for all classes that are accessed across application domain boundaries, and any classes that do not inherit from MarshalByRefObject are implicitly marshaled by value. Thus, in Windows XP Media Center Edition 2005, in order for the instantiated application to be used correctly by ehshell, it had to derive from MarshalByRefObject. To make this more concrete, the application loader explicitly checked to make sure that the instantiated application derived from MarshalByRefObject, throwing an exception if it didn't and preventing the application from being used.

This restriction on deriving from MarshalByRefObject was removed for applications in Windows Vista. This isn't because the .NET Framework changed how .NET Remoting operates. Rather, Windows Media Center now implements several internal wrapper classes that themselves derive from MarshalByRefObject and implement the relevant interfaces (for example, the internal AddInModuleWrapper class derives from MarshalByRefObject and implements the IAddInModule interface). Instances of these wrapper classes are used to do the cross-application domain calls, and they delegate any method calls on these interfaces to the corresponding method on the actual application. These wrappers in effect serve as shims to remove the need for your applications to derive from MarshalByRefObject.
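The shim idea can be sketched as follows. The real wrapper classes are internal to Windows Media Center and their code is not shown here, so every name below (IEntryPoint, PlainAddIn, EntryPointWrapper) is hypothetical, and a string stands in for the AddInHost parameter to keep the sketch self-contained.

```csharp
using System;

// Hypothetical interface standing in for IAddInEntryPoint.
public interface IEntryPoint
{
    void Launch(string host);
}

// An ordinary add-in class that does NOT derive from MarshalByRefObject.
public class PlainAddIn : IEntryPoint
{
    public string LastHost;
    public void Launch(string host) { LastHost = host; }
}

// The shim: it derives from MarshalByRefObject, so a remoting proxy can be
// created for it, and it forwards each interface call to the wrapped add-in.
public sealed class EntryPointWrapper : MarshalByRefObject, IEntryPoint
{
    private readonly IEntryPoint _inner;

    public EntryPointWrapper(IEntryPoint inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
    }

    public void Launch(string host) { _inner.Launch(host); }
}
```

The host would hand out the wrapper across the boundary while the add-in itself stays an ordinary object inside its own application domain.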

Once an instance of the application has been created, its Initialize method is called. You can code your application's Initialize method to perform whatever configuration work your application requires, such as using the supplied configuration dictionaries, accessing the registry, or reading data from disk. However, the Windows Media Center team strongly encourages you to do as little as possible in Initialize. In fact, Windows Media Center enforces a timeout of 10 seconds on the duration of the Initialize method's invocation. If the Initialize method attempts to take longer than 10 seconds, the Initialize method will be aborted and the application will be unloaded. As such, in Initialize you should really only retrieve and store data from the two provided dictionaries; any other initialization work should be done in the Launch method. Also note that the Initialize method is called on a separate thread from the thread that invokes the Launch and Uninitialize methods, so you should avoid doing anything in Initialize that depends on the same thread being used to call Launch (for example, don't put any information into thread-local storage expecting the Launch method to find that same data available).
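The thread-local storage caveat is easy to demonstrate outside Windows Media Center. In this sketch, the calling thread plays the role of the thread that runs Initialize and a second thread plays the role of the thread that runs Launch; ThreadAffinityDemo is a hypothetical name, and no Media Center types are involved.

```csharp
using System;
using System.Threading;

public static class ThreadAffinityDemo
{
    // Each thread gets its own copy of a [ThreadStatic] field.
    [ThreadStatic]
    private static string _perThread;

    // Stores a value on the calling thread, then reports what a second
    // thread observes in the same field.
    public static string ReadFromOtherThread()
    {
        _perThread = "set during Initialize";
        string seen = "unset";
        Thread worker = new Thread(delegate()
        {
            // On this thread the field still holds its default (null).
            seen = _perThread ?? "(null)";
        });
        worker.Start();
        worker.Join();
        return seen;
    }
}
```

The second thread never sees the value stored by the first, which is why data gathered in Initialize belongs in ordinary instance fields, as the Hello World example does with _displayTitle.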

With the application initialized, Windows Media Center calls its IAddInEntryPoint.Launch method. As mentioned earlier, the Launch method is the main method of the application, and all processing for the application should be done in Launch. When Launch returns, Windows Media Center calls the application's IAddInModule.Uninitialize method. As with Initialize, this method is called on a separate thread from both Initialize and Launch, and it has 10 seconds to complete. Of course, failure to finish within the 10 seconds doesn't have as dire consequences as for Initialize, since the application's Launch method has already been called and completed, and thus the system is finished processing whatever the application was intended to do. Still, don't do anything computationally or resource intensive in Uninitialize. You should treat the Uninitialize method as being equivalent to the System.IDisposable.Dispose method, using it to free any resources acquired in either the Initialize or Launch methods that weren't explicitly cleaned up in those methods. Once the application has been uninitialized, the application domain in which it was running is torn down.

With that information in hand, the following is a minimal implementation of "Hello, World":

using System;
using System.Collections.Generic;
using Microsoft.MediaCenter;
using Microsoft.MediaCenter.Hosting;

public class HelloWorldBackgroundAddIn :
    IAddInEntryPoint, IAddInModule
{
    private string _displayTitle;

    public void Initialize(
        Dictionary<string, object> appInfo,
        Dictionary<string, object> entryPointInfo)
    {
        _displayTitle = entryPointInfo["context"] as string;
    }

    public void Launch(AddInHost host)
    {
        host.MediaCenterEnvironment.Dialog(
            "Hello, World!", _displayTitle, 
            DialogButtons.Ok, 0, true);
    }

    public void Uninitialize() { }
}

For this HelloWorldBackgroundAddIn class, I'm using the Initialize method to read the context value from the entryPointInfo dictionary, and I'm using it as a display value when calling the Dialog method. The Launch method displays a pop-up dialog to the user, as shown below, using the provided AddInHost instance.

HelloWorldBackgroundAddIn in Windows Media Center

Windows Media Center exposes information about the current environment and provides some control over that environment through this AddInHost object, which is only valid to the application until Launch returns. The AddInHost instance can be cached in a member variable and used by other methods called by Launch, but it should not be accessed after Launch returns.

In order to use a background application, the application's assembly must be installed into the GAC on the Windows Media Center system, and the application must be registered with Windows Media Center. The easiest way to register an application with Windows Media Center is to create an XML file (as discussed previously) that describes the application, and then use the RegisterMCEApp.exe utility available in %WINDIR%\ehome\ to inform Windows Media Center of the application's existence and configuration. A sample XML file for the HelloWorldBackgroundAddIn just described might look like this:

<application title="HelloWorldBackgroundAddIn" 
             id="{c5ff0e68-6666-45c5-a9f6-47161b93d06d}">
    <entrypoint id="{d27e5a02-5a62-40fd-8ded-b395fff7bba6}" 
            addin="HelloWorldBackgroundAddIn, 
                   HelloWorldBackgroundAddIn, Version=2.0.0.2, 
                   Culture=neutral, PublicKeyToken=842cfffba89328ea, 
                   Custom=null"
            title="HelloWorldBackgroundAddIn" 
            description="Simple application that displays a dialog." 
            imageURL = ".\HelloWorldBackgroundAddIn.png"
            context="My Cool Add-In">
        <category category="Background"/>
    </entrypoint>
</application>

Do note that the RegisterMCEApp.exe utility does not currently check to make sure that the referenced assembly is available. As such, if there's a mismatch between the assembly described in the configuration file and the assembly you've installed into the GAC, or if you register an application but forget to install its assembly into the GAC, RegisterMCEApp.exe will not notice. However, if you set the DWORD value EnableErrorDetails on the key HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Media Center\Settings\Extensibility to 1, when an application fails to load, Windows Media Center will allow you to see the full exception details about why the failure happened. In the case of an assembly being registered incorrectly, this frequently manifests as a FileNotFoundException as the assembly file fails to be found in the GAC.

Finally, I noted earlier that application assemblies need to be installed into the GAC; this isn't entirely true. Installing an assembly into the GAC every time you make a change during development can become painful. There are a few solutions. First, the Z application included in the Windows Media Center SDK includes a batch file that can be run as a post-build step to uninstall and then reinstall the assembly in the GAC. Alternatively, ehshell.exe supports a special command-line parameter, addinfallbackpath, that can be used to augment where Windows Media Center probes for your assembly. This feature causes Windows Media Center to use Assembly.LoadFrom with the specified directory as a fallback when the assembly isn't found in the GAC, and it should be used purely as a development tool rather than to support production deployment. For more information, see Debugging Layout Issues.

Position Changer Application

Now that you know how to write a simple background application, I'll get down to business explaining how to write an application that allows a user to enter a time code on the remote control and have Windows Media Center jump to that time in the current media.

My application is named PositionChangerAddIn, and as with all background applications, it implements IAddInEntryPoint and IAddInModule.

namespace Toub.MediaCenter.AddIns
{
    public sealed class PositionChangerAddIn : 
        IAddInEntryPoint, IAddInModule
    {
        ...
    }
}

The application's IAddInEntryPoint.Launch method is implemented as follows:

void IAddInEntryPoint.Launch(AddInHost host)
{
    try
    {
        _host = host;
        LoadPreferences();

        using (_waitForExit = new ManualResetEvent(false))
        using (_ehshellProcess = GetEhShellProcess())
        using (_timer = new System.Threading.Timer(
            delegate { AttemptTransitionToEnteredTime(); }, 
            null, Timeout.Infinite, Timeout.Infinite))
        using (new KeyboardHook(_ehshellProcess.Id, hook_KeyDown))
        {
            _waitForExit.WaitOne();
        }
        _host = null;
    }
    catch(Exception exc)
    {
        host.MediaCenterEnvironment.Dialog("Unable to launch. " + 
            exc.ToString(), "Position Changer Add-In", 
            DialogButtons.Ok, 0, true);
        throw;
    }
}

To begin, the Launch method caches the provided AddInHost so that all instance methods on the class will have access to it, and then loads some preferences related to the operation of the application (more on this later). It then immediately creates a System.Threading.ManualResetEvent. The Launch method will wait on this event until Windows Media Center shuts down, at which point Windows Media Center will call the application's Uninitialize method on a different thread. Uninitialize will set the reset event, thereby allowing the Launch method to wake up and exit. This is a standard pattern which you can use in your own applications to keep the Launch method from exiting until a user exits Windows Media Center (or until some other event happens that should result in the exiting of the background application). Another approach is to enter into an infinite loop that continually polls and sleeps. Such an approach might be appropriate for an application that checks a Web site or Web service every few minutes in order to find the local weather.
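The blocking pattern can be tried in isolation with ordinary threads. In the sketch below, BlockingAddInSketch is a hypothetical stand-in for an add-in: its Launch and Uninitialize methods take no parameters and use no Media Center types, and the LaunchReturned property exists only so the behavior can be observed.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a background add-in; no Media Center types used.
public sealed class BlockingAddInSketch
{
    private readonly ManualResetEvent _waitForExit =
        new ManualResetEvent(false);
    private volatile bool _launchReturned;

    public bool LaunchReturned { get { return _launchReturned; } }

    // Plays the role of Launch: blocks until the event is signaled.
    public void Launch()
    {
        _waitForExit.WaitOne();
        _launchReturned = true;
    }

    // Plays the role of Uninitialize: called on another thread, it signals
    // the event so that the blocked Launch call can return.
    public void Uninitialize()
    {
        _waitForExit.Set();
    }
}
```

Because a ManualResetEvent stays signaled once set, Launch returns immediately even if Uninitialize happens to run before Launch reaches WaitOne.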

Once the reset event has been created, a System.Diagnostics.Process instance for the ehshell process is retrieved. This provides the application with access to the process' ID, which is necessary for some of the functionality in the application, as you'll soon see.

private Process GetEhShellProcess()
{
    int currentSessionId;
    using (Process currentProcess = Process.GetCurrentProcess()) 
        currentSessionId = currentProcess.SessionId;

    Process[] procs = Process.GetProcessesByName("ehshell");
    Process ehshell = null;
    for (int i = 0; i < procs.Length; i++)
    {
        if (ehshell == null && 
            procs[i].SessionId == currentSessionId) ehshell = procs[i];
        else procs[i].Dispose();
    }
    return ehshell;
}

The GetEhShellProcess method could simply return the first process it finds after calling GetProcessesByName, but that could result in incorrect behavior. The reason is that while ehshell is a singleton process (meaning that there can be only one), it's only a singleton with respect to the current desktop session. There can be multiple instances of ehshell.exe running at the same time if Windows Media Center extenders are in use, such that each extender gets its own dedicated ehshell process. The trick to telling them apart is to realize that when an extender connects to your Windows Vista system, it is essentially doing so through a terminal services connection. Processes started for the extender are thus run in a different desktop session than the session for the interactive user logged into the machine. So, in order to find the correct ehshell process, I simply look for the ehshell process that exists in the same session as the background application.

After obtaining the Process instance for ehshell, the Launch method proceeds to create a System.Threading.Timer, which will be used to ensure that any time value entered by a user is processed within a predetermined interval.
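One way such a timer can be used is sketched below. This is an assumption about the general technique, not the article's implementation: the code shown so far only creates the timer in a disabled state, and how it is armed is not shown at this point. DigitEntryTimer and its members are hypothetical names. The sketch restarts a one-shot countdown each time a digit arrives and reports the accumulated digits once the interval passes with no further input.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: collects digits and raises Elapsed once no digit has
// arrived for the configured interval.
public sealed class DigitEntryTimer : IDisposable
{
    private readonly object _lock = new object();
    private readonly Timer _timer;
    private readonly int _intervalMs;
    private string _digits = "";

    public event Action<string> Elapsed;

    public DigitEntryTimer(int intervalMs)
    {
        _intervalMs = intervalMs;
        // Created disabled, like the timer in Launch: it never fires until
        // Change is called with a finite due time.
        _timer = new Timer(OnTimer, null, Timeout.Infinite, Timeout.Infinite);
    }

    public void AddDigit(char digit)
    {
        lock (_lock)
        {
            _digits += digit;
            // Restart the one-shot countdown (no periodic firing).
            _timer.Change(_intervalMs, Timeout.Infinite);
        }
    }

    private void OnTimer(object state)
    {
        string entered;
        lock (_lock) { entered = _digits; _digits = ""; }
        Action<string> handler = Elapsed;
        if (handler != null && entered.Length > 0) handler(entered);
    }

    public void Dispose() { _timer.Dispose(); }
}
```

The callback runs on a thread-pool thread, so any state it shares with the key handler needs the kind of locking shown here.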

The next action taken by the Launch method brings me to the core of the design of this application. Nothing within the Windows Media Center SDK allows an application to intercept commands from the remote control, functionality which is necessary for this application to be successful. When a user hits keys on the remote, in certain situations the application needs to prevent Windows Media Center from performing its normal processing of the input (such as changing to live TV and navigating channels if the number keys are used when watching a recorded show) and should instead consume that input and do its own processing on it. As nothing is provided by Windows Media Center, my application takes advantage of functionality provided by the Win32 libraries and Windows operating system.

There are several ways you could attempt to intercept remote control commands in a managed application. For those of you deeply familiar with Windows Forms, you might consider using the System.Windows.Forms.NativeWindow class available from the System.Windows.Forms.dll assembly. NativeWindow provides a low-level encapsulation of a window handle and associated window procedure, allowing you to subclass an existing window in order to receive and process its Windows messages. In Windows XP Media Center Edition 2005, this approach would in fact allow a background application to receive and process any Windows messages sent to ehshell's user interface. That's because in that edition of Windows Media Center, as mentioned previously, background applications were loaded in-process into ehshell. But in Windows Media Center in Windows Vista, applications are loaded into a separate process. NativeWindow can only be used to attach to windows in the current process, and thus it's not appropriate for our situation. Another solution is needed.

Keyboard Hooks

The remote control for Windows Media Center behaves very much like a keyboard. When the IR receiver for the remote control receives a signal, it translates the remote control's button press into Windows messages. For a description of how each button is translated, see Understanding Mouse, Keyboard, and Remote Control Input. Suffice it to say, however, that almost all of the remote control commands are converted into WM_INPUT, WM_APPCOMMAND, or WM_KEYDOWN messages. For example, pressing the play button on the remote control causes a WM_APPCOMMAND message to be sent with an APPCOMMAND_MEDIA_PLAY command value (this is the same message sent when you push the play button on a multimedia keyboard). Pressing the right arrow button on the remote causes a WM_KEYDOWN Windows message to be sent for the VK_RIGHT virtual key.

It just so happens that for my application's design, the only Windows messages I need to intercept are WM_KEYDOWN messages, as the only remote control buttons I care about are the arrow buttons, the number buttons, the enter button, and the clear button, all of which result in WM_KEYDOWN messages being sent. As such, I can take advantage of a low-level keyboard hook in Windows rather than intercepting all Windows messages.

Windows hooks are an extensibility mechanism built into the Windows message handling system whereby an application can register a callback function to monitor message traffic. That function can then choose to process certain messages before they reach the target window procedure, and can even prevent them from reaching the target window altogether. Most types of hooks should not be used from the .NET Framework, but low-level keyboard hooks are an exception. From the LowLevelKeyboardProc Function documentation on the MSDN Web site:

"The LowLevelKeyboardProc hook procedure is an application-defined or library-defined callback function used with the SetWindowsHookEx function. The system calls this function every time a new keyboard input event is about to be posted into a thread input queue. The keyboard input can come from the local keyboard driver or from calls to the keybd_event function. If the input comes from a call to keybd_event, the input was "injected". However, the WH_KEYBOARD_LL hook is not injected into another process. Instead, the context switches back to the process that installed the hook and it is called in its original context. Then the context switches back to the application that generated the event."

As such, all I need to do is install a keyboard hook that monitors all messages sent to the ehshell process. I can sort through those looking for particular keystrokes I'm interested in, and I can implement logic that is executed in response to finding such input.

The code download for this article includes a class called KeyboardHook that can be used to install a keyboard hook for a particular process (in actuality, the hook is installed globally and thus for all processes, but the KeyboardHook quickly filters out the messages not targeted to the single process I ask it about). You can see in my Launch method shown previously that I create an instance of KeyboardHook just before I wait on the ManualResetEvent and that I destroy the instance after finishing waiting on that reset event. The constructor for KeyboardHook is shown here:

public KeyboardHook(int targetProcessID, KeyEventHandler keyDown) 
{
    if (keyDown == null) throw new ArgumentNullException("keyDown");

    // Store the user's KeyDown delegate
    _keyDown = keyDown;
    _pid = targetProcessID;

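    // Create the callback delegate and keep it in a field so that it
    // stays alive for as long as unmanaged code may call it
    _hookProc = new LowLevelKeyboardProc(HookCallback);

    // Install the hook globally (thread ID 0); HookCallback filters
    // out input that isn't bound for the target process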
    using(Process curProcess = Process.GetCurrentProcess())
    using(ProcessModule curModule = curProcess.MainModule)
    {
        _hookHandle = SafeWindowsHookHandle.SetWindowsHookEx(
            WH_KEYBOARD_LL, _hookProc, 
            GetModuleHandle(curModule.ModuleName), 0);
    }
    if (_hookHandle.IsInvalid)    
    {
        Exception exc = new Win32Exception();
        Dispose();
        throw exc;
    }
}

The constructor is passed the process ID of ehshell as well as a KeyEventHandler delegate to be invoked whenever a WM_KEYDOWN message is received and needs processing. Both of these parameters are stored into member variables for later use. Next, I create the delegate that will serve as the actual hook procedure, get a reference to the current DLL module (which is used to tell Windows what DLL contains the callback procedure), and invoke the SetWindowsHookEx function from Win32 to actually install the hook. From this point on, any keyboard messages sent on the system will be routed through my hook procedure.
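For readers without the code download at hand, the Win32 entry points involved can be declared roughly as follows. This is only a sketch: the class name NativeHookMethods is hypothetical, and the download wraps these calls in its own SafeWindowsHookHandle type rather than exposing raw IntPtr handles as shown here. The function names, the WH_KEYBOARD_LL value (13), and the WM_KEYDOWN value (0x0100) come from the Win32 headers.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical container class (an assumption for illustration); the
// article's download uses a SafeWindowsHookHandle wrapper instead.
internal static class NativeHookMethods
{
    public const int WH_KEYBOARD_LL = 13;   // low-level keyboard hook
    public const int WM_KEYDOWN = 0x0100;   // key-down message identifier

    // Managed shape of the Win32 hook procedure for WH_KEYBOARD_LL.
    public delegate IntPtr LowLevelKeyboardProc(
        int nCode, IntPtr wParam, IntPtr lParam);

    // Installs a hook; a thread ID of 0 makes the hook global.
    [DllImport("user32.dll", SetLastError = true)]
    public static extern IntPtr SetWindowsHookEx(
        int idHook, LowLevelKeyboardProc lpfn, IntPtr hMod, uint dwThreadId);

    // Removes a hook previously installed with SetWindowsHookEx.
    [DllImport("user32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool UnhookWindowsHookEx(IntPtr hhk);

    // Passes the event to the next hook procedure in the chain.
    [DllImport("user32.dll")]
    public static extern IntPtr CallNextHookEx(
        IntPtr hhk, int nCode, IntPtr wParam, IntPtr lParam);

    // Retrieves the handle of a module already loaded in this process.
    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    public static extern IntPtr GetModuleHandle(string lpModuleName);
}
```

Whichever form the declarations take, the delegate passed to SetWindowsHookEx has to stay referenced for as long as the hook is installed, which is why the constructor stores it in a member variable.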

The hook procedure HookCallback is relatively straightforward as well:

private IntPtr HookCallback(int nCode, IntPtr wParam, IntPtr lParam)
{
    bool handled = false;
    try
    {
        if (nCode >= 0 && wParam == (IntPtr)WM_KEYDOWN)
        {
            uint pid;
            GetWindowThreadProcessId(GetForegroundWindow(), out pid);
            if (pid == _pid)
            {
                KBDLLHOOKSTRUCT hookParam = 
                    (KBDLLHOOKSTRUCT)Marshal.PtrToStructure(lParam, 
                    typeof(KBDLLHOOKSTRUCT));
                Keys key = (Keys)hookParam.vkCode;
                if (key == Keys.Packet) key = (Keys)hookParam.scanCode;

                KeyEventArgs e = new KeyEventArgs(key | ModifierKeys);
                _keyDown(this, e);
                handled = e.Handled | e.SuppressKeyPress;
            }
        }

        return handled ?
            new IntPtr(1) :
            SafeWindowsHookHandle.CallNextHookEx(
                _hookHandle, nCode, wParam, lParam);
    }
    catch (Exception exc) { Error(this, new ErrorEventArgs(exc)); }
    return new IntPtr(1);
}

This method performs several checks. First, it only processes the message if it's a WM_KEYDOWN message. It then retrieves the process associated with the window that's currently in the foreground as a fast heuristic for determining which process the incoming message is associated with. All messages not bound for the ehshell process are ignored.

If all checks pass, we then need to determine what key was pressed, which can be done by examining the lParam parameter in the callback. The parameter serves as a pointer to a KBDLLHOOKSTRUCT that contains information about the low-level keyboard input event we're processing. Inside that struct is the vkCode field, which represents the virtual-key code. Most of the time, this is exactly the value we're looking for. It can simply be cast to a System.Windows.Forms.Keys enumeration to discover the value of the key that was pressed (Keys.Right, Keys.Return, Keys.D6, and so forth). However, in some situations, the value received for the virtual-key code is equivalent to Keys.Packet (231), which by itself isn't very useful. In those situations, which typically occur when number keys are pressed in an extender environment, the scanCode field can be used to determine the actual value.
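The following sketch shows one way the structure and the Keys.Packet fallback could look in isolation. The field layout follows the Win32 definition of KBDLLHOOKSTRUCT, but the declaration in the code download may differ in its field types; the HookStructSketch class and its Simulate helper are hypothetical and exist only so the lookup can be exercised without installing a hook. Plain integers are used instead of the Keys enumeration to avoid a dependency on System.Windows.Forms.

```csharp
using System;
using System.Runtime.InteropServices;

// Field order and sizes follow the Win32 KBDLLHOOKSTRUCT definition.
[StructLayout(LayoutKind.Sequential)]
internal struct KBDLLHOOKSTRUCT
{
    public uint vkCode;        // virtual-key code
    public uint scanCode;      // hardware scan code
    public uint flags;         // extended-key, injected, and transition flags
    public uint time;          // time stamp of the event
    public IntPtr dwExtraInfo; // extra information associated with the event
}

// Hypothetical helper class, not part of the article's download.
internal static class HookStructSketch
{
    private const uint VK_PACKET = 0xE7; // 231, the same value as Keys.Packet

    // Reads the structure behind lParam and applies the fallback described
    // above: use scanCode when vkCode is only the packet placeholder.
    public static uint ResolveKeyValue(IntPtr lParam)
    {
        KBDLLHOOKSTRUCT data = (KBDLLHOOKSTRUCT)Marshal.PtrToStructure(
            lParam, typeof(KBDLLHOOKSTRUCT));
        return data.vkCode == VK_PACKET ? data.scanCode : data.vkCode;
    }

    // Builds a structure in unmanaged memory, standing in for the pointer
    // that Windows would pass to the hook procedure.
    public static uint Simulate(uint vkCode, uint scanCode)
    {
        KBDLLHOOKSTRUCT data = new KBDLLHOOKSTRUCT();
        data.vkCode = vkCode;
        data.scanCode = scanCode;

        IntPtr buffer = Marshal.AllocHGlobal(
            Marshal.SizeOf(typeof(KBDLLHOOKSTRUCT)));
        try
        {
            Marshal.StructureToPtr(data, buffer, false);
            return ResolveKeyValue(buffer);
        }
        finally { Marshal.FreeHGlobal(buffer); }
    }
}
```

With this sketch, Simulate(0x36, 0) and Simulate(0xE7, 0x36) both yield 0x36, the value of Keys.D6.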

Once we know we want to process the message, and we know the Keys value, we can invoke the KeyEventHandler delegate. The KeyEventHandler delegate accepts a KeyEventArgs instance as a parameter. One of the properties on KeyEventArgs, Handled, is a Boolean that specifies whether the KeyEventHandler was able to handle the message or not (the callback treats SuppressKeyPress the same way). If the message was handled, the callback returns a value of 1 to Windows, signaling that the message was processed. If Handled is still false when the delegate returns, the message was not handled and the callback calls the Win32 CallNextHookEx function, which passes the message along the hook chain, ultimately allowing the message to reach its original window destination (if another registered hook doesn't stop it from doing so). In this way, my application can examine the environment and the received WM_KEYDOWN message and decide how to handle the message or even whether to handle it at all.

A few words of warning about this approach are necessary. Keyboard hooks are invoked when the target thread calls to the Win32 GetMessage or PeekMessage functions as part of its message processing loop. As this is done on the UI thread, the UI thread will be blocked while any keyboard hooks you've installed are processing, thus decreasing the interactivity of the application. As such, make sure your hooks are fast, responsive, and don't block for any significant period of time. Also, it should be noted that in my application I'm relying a great deal on undocumented implementation details of Windows Media Center, and I am making a very conscious decision to do this knowing full well that any of my assumptions may change in future versions, causing my application not to work. Application authors should avoid assumptions like these as much as possible and be fully prepared for the consequences should future versions of Windows Media Center change how it does things that aren't explicitly documented in the SDK.

Jumping to Time Codes

With a keyboard hook installed and the Launch method blocking on the reset event (which will only unblock when Uninitialize is called), my application is up and running in Windows Media Center, able to handle any WM_KEYDOWN events that arrive. When one does arrive, it's handled by the hook_KeyDown event handler:

private void hook_KeyDown(object sender, KeyEventArgs e)
{
    const int MAX_WAIT_TIME = 1000;
    try
    {
        if (ShouldOverrideKeyHandling && e.Modifiers == Keys.None)
        {
            if (Monitor.TryEnter(_keyedTime, MAX_WAIT_TIME))
            {
                try
                {
                    MediaCenterEnvironment env = 
                        _host.MediaCenterEnvironment;
                    if (env == null) return;
                    MediaExperience exp = env.MediaExperience;
                    if (exp == null) return;
                    Keys keyCode = e.KeyCode;

                    bool isStart = (keyCode == Keys.Right);

                    if (isStart) ClearInputAndResetTimer();
                    else if ((DateTime.Now - 
                             _lastTimeKeyReceived).TotalMilliseconds > 
                                 _maxTimeBetweenKeys) return;

                    _lastTimeKeyReceived = DateTime.Now;

                    if (isStart) return;

                    if (keyCode == Keys.Enter)
                    {
                        if (_keyedTime.Count > 0) 
                        {
                            SetTimer(Timeout.Infinite);
                            e.Handled = 
                                AttemptTransitionToEnteredTime();
                        }
                        else ClearInputAndResetTimer();
                    }
                    else if (keyCode >= Keys.D0 && keyCode <= Keys.D9)
                    {
                        SetTimer(_maxTimeBetweenKeys);
                        _keyedTime.Push(keyCode - Keys.D0);
                        e.Handled = true;
                    }
                    else if (keyCode == Keys.Escape)
                    {
                        ClearInputAndResetTimer();
                        e.Handled = true;
                    }
                    else ClearInputAndResetTimer();
                }
                finally { Monitor.Exit(_keyedTime); }
            }
        }
    } 
    catch(InvalidOperationException){}
}

Right up front, the handler makes a decision on whether the current environment is appropriate for intercepting keystrokes, and since this check is performed for every WM_KEYDOWN event, it should be as fast and as lean as possible. I only want to do my own keystroke handling if the user is watching or listening to media in full-screen mode (otherwise the user may be navigating menus within Windows Media Center), and only if the current media is playing or is paused. Given that, my ShouldOverrideKeyHandling property is implemented as follows:

private bool ShouldOverrideKeyHandling
{
    get
    {
        MediaTransport transport = CurrentMediaTransport;
        if (transport != null)
        {
            return transport.PlayState == PlayState.Playing ||
                   transport.PlayState == PlayState.Paused;
        }
        return false;
    }
} 
private MediaTransport CurrentMediaTransport
{
    get
    {
        MediaCenterEnvironment env = _host.MediaCenterEnvironment;
        if (env != null)
        {
            MediaExperience exp = env.MediaExperience;
            if (exp != null) return exp.Transport;
        }
        return null;
    }
}

AddInHost.MediaCenterEnvironment and MediaCenterEnvironment.MediaExperience may return null if media is not currently playing. If either returns null, media is definitely not being played, and thus I definitely do not want to override key handling in this scenario. On the other hand, if they don't return null, I only want to override key handling if MediaExperience.IsFullScreen is true, meaning that the media is currently being viewed in full-screen mode rather than in a small viewport somewhere, and if MediaExperience.Transport.PlayState indicates that the user is in an appropriate play mode (other available states include Buffering, Finished, Stopped, and Undefined).

If ShouldOverrideKeyHandling returns false, then no additional processing is done in the handler and it immediately returns, signaling to the keyboard hook that my application does not want to process this keystroke. The keyboard hook then passes the message along, allowing Windows Media Center to handle it as it normally would. If, however, ShouldOverrideKeyHandling returns true, it means the environment is in a state such that the application needs to examine what keys were pressed in order to determine what it needs to do in this situation. Note that just because ShouldOverrideKeyHandling returns true doesn't mean that the application will eventually handle the message; it only means that more significant processing is required. After all, this application only uses a handful of the buttons on the remote control, and there are a bunch of buttons in which I'm not interested that result in WM_KEYDOWN messages being sent.

Once ShouldOverrideKeyHandling returns true, the application looks for a sentinel button marker. This is a key the user must press to signal to the application that it should replace the default processing. Obviously, some users may like the behavior whereby Windows Media Center changes channels when number buttons are pressed (and some users may like it only sometimes), so there needs to be a way for the user to signal to the application that number buttons pressed should be handled in the special way. For this, I decided on the right-arrow key. The right-arrow key is not used during normal playback of video or audio, so using it as a sentinel value won't conflict with most normal interactions while viewing media (as mentioned earlier, this is the case for Windows Media Center in Windows Vista, but of course that could change in future versions of the product). However, there are scenarios while watching full-screen TV where it may be validly used, such as when a dialog with multiple options is presented over the full-screen media. In this case, the left and right arrow keys are used to select the appropriate option. This must be taken into consideration when deciding what other keys to handle.

So, if the key the user pressed is the right arrow key, which I've deemed the "start" key for the application, a couple of things happen. First, any digits received up to this point are cleared in order to make sure we're starting afresh, and the application stores the current time into a variable that maintains the last time a keystroke was received. This allows the application to enforce a maximum amount of time between keystrokes that will be handled. Additionally, if the application's keystroke timer is currently enabled, it is then disabled (the keystroke timer is used to automatically jump to the entered time code when the user has finished entering it). Finally, the application returns (with e.Handled equal to false) to indicate that it doesn't want to eat the WM_KEYDOWN message. The application doesn't need to prevent the start keystroke from going through to Windows Media Center, and in fact doing so would be erroneous as it would prevent users from using dialog boxes overlaid on top of full-screen video or audio playback.

At this point, the application is primed to receive more keystrokes, and there are four different scenarios it looks for. When another WM_KEYDOWN is received, assuming it's not another right arrow key, the application checks how much time has elapsed since the previous keystroke it accepted (initially, the right arrow key). If that exceeds the allowed time between keys, the keystroke is ignored and sent along to Windows Media Center as if the application weren't in the picture. Otherwise, if the key is a digit, the application sets the keystroke timer to expire in four seconds (the default is four seconds, but as will be explained later, this can be configured via a registry value) and pushes the digit's value onto a stack for later retrieval. At this point, the application does not want to pass this WM_KEYDOWN message to Windows Media Center, so it returns with e.Handled equal to true. If, rather than a number button, the keystroke is the Enter key, it means that the user is done entering numbers and wants to jump to the time value entered thus far. If the keystroke is the Escape key, the digits entered so far are discarded and the keystroke is consumed. Otherwise, if the keystroke isn't a right arrow key, a number button, Enter, or Escape, everything is reset and the keystroke passes through to Windows Media Center.

When the enter key is pressed, or when the timer expires (signaling that it's been four seconds since the last number key was received), the application calls its AttemptTransitionToEnteredTime method. This method parses the time entered by the user and changes the current playback's position to match that time:

private bool AttemptTransitionToEnteredTime()
{
    TimeSpan time;
    bool success = false;
    if (ComputeEnteredTime(out time)) success = TransitionToTime(time);
    ClearInputAndResetTimer();
    DisplayPosition();
    return success;
}
private bool TransitionToTime(TimeSpan time)
{
    MediaTransport transport = CurrentMediaTransport;
    if (transport != null)
    {
        transport.Position = time;
        return true;
    }
    return false;
}
private bool ComputeEnteredTime(out TimeSpan time)
{
    bool startedWithZero = false;
    time = TimeSpan.FromSeconds(0);

    lock(_keyedTime)
    {
        if (_keyedTime.Count == 0) return false;

        int seconds = _keyedTime.Pop();
        if (_keyedTime.Count > 0) seconds = 
            (10*GetNextQueuedDigit(ref startedWithZero)) + seconds;

        int minutes = 0;
        if (_keyedTime.Count > 0) minutes = 
            GetNextQueuedDigit(ref startedWithZero);
        if (_keyedTime.Count > 0) minutes = 
            (10*GetNextQueuedDigit(ref startedWithZero)) + minutes;

        int hours = 0;
        int multiple = 1;
        while(_keyedTime.Count > 0) 
        {
            hours = (multiple*GetNextQueuedDigit(ref startedWithZero)) 
                    + hours;
            multiple *= 10;
        }

        time = TimeSpan.FromSeconds((((hours*60)+minutes)*60)+seconds);

        return true;
    }
} 

AttemptTransitionToEnteredTime makes use of two helper routines. The first is TransitionToTime. It simply accepts a TimeSpan value and uses the current MediaTransport's Position property to jump to the specified time code. The second routine is ComputeEnteredTime. This method is responsible for examining the Stack of user-keyed digits and parsing them into a corresponding TimeSpan.

The application expects numbers to be entered in a way that is familiar to users. So, if the user enters the sequence "123", that should be interpreted as 1 minute 23 seconds, or if the user enters "12345", that should be interpreted as 1 hour 23 minutes and 45 seconds. To support this notation, every time a digit is entered it is pushed onto a Stack. This allows ComputeEnteredTime to pull off the entered values in the reverse order from which they were entered, so if a user enters "12345", ComputeEnteredTime will process the digits in the order "54321".

The actual parsing logic is very simple. If no keystrokes have been entered, ComputeEnteredTime returns that it was unable to compute a time (though it does set the out TimeSpan parameter to 0, since all out parameters must be initialized for a successful compile). If there are digits available, it picks off the first two and treats them as a number of seconds. If there are more digits available, it picks off the next two and treats those as a number of minutes. And if there are any more digits waiting, it consumes them and treats them as a number of hours. This number of hours, minutes, and seconds is then transformed into a TimeSpan that's passed back to the caller along with a true return value to indicate success.

You'll note that no validation on the entered time is performed by this method. Thus, it allows a user to enter "65" in addition to "105", both of which will jump to the same location in the media (1 minute 5 seconds). Similarly, "99" will jump to 1 minute 39 seconds, but "100" will jump to only 1 minute. While some of these corner cases may seem obscure, this design was simple to implement and felt to me like the easiest to understand in most scenarios. If you disagree, you can certainly reimplement ComputeEnteredTime to your liking.
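To make the corner cases concrete, here is a small standalone sketch of the same parsing rules. TimeCodeSketch and its Parse method are hypothetical names. Unlike ComputeEnteredTime, the sketch takes the digits as a string, throws on empty input instead of returning false, and ignores the leading-zero handling used for backward jumps.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mirrors the parsing rules described above.
internal static class TimeCodeSketch
{
    public static TimeSpan Parse(string digits)
    {
        if (digits == null) throw new ArgumentNullException("digits");

        // Push digits in the order entered so they pop in reverse order.
        Stack<int> keyed = new Stack<int>();
        foreach (char c in digits)
        {
            if (c < '0' || c > '9')
                throw new ArgumentException("Only digits are allowed.", "digits");
            keyed.Push(c - '0');
        }
        if (keyed.Count == 0)
            throw new ArgumentException("At least one digit is required.", "digits");

        // The last two digits entered are the seconds.
        int seconds = keyed.Pop();
        if (keyed.Count > 0) seconds += 10 * keyed.Pop();

        // The two digits before those are the minutes.
        int minutes = 0;
        if (keyed.Count > 0) minutes = keyed.Pop();
        if (keyed.Count > 0) minutes += 10 * keyed.Pop();

        // Anything left over is the hours, least significant digit first.
        int hours = 0;
        int multiple = 1;
        while (keyed.Count > 0)
        {
            hours += multiple * keyed.Pop();
            multiple *= 10;
        }

        // No range validation: 65 seconds simply becomes 1 minute 5 seconds.
        return TimeSpan.FromSeconds((((hours * 60) + minutes) * 60) + seconds);
    }
}
```

Here Parse("65") and Parse("105") both produce 00:01:05, Parse("99") produces 00:01:39, and Parse("100") produces 00:01:00, matching the behavior described above.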

Once AttemptTransitionToEnteredTime has called ComputeEnteredTime and TransitionToTime, it provides some visual feedback to the user about where the user currently is in the media. This serves a few purposes. First, if the user successfully jumped to a new time, this lets the user confirm that they did in fact go where they meant to go. If on the other hand a time was not successfully parsed, it's still good to display feedback to the user alerting them to this fact. AttemptTransitionToEnteredTime does this by calling the DisplayPosition method in order to show the blue position bar to the user (the one shown when the play button on the remote control is pressed; see the figure below):

private void DisplayPosition()
{
    if (_showPosition)
    {
        MediaTransport transport = CurrentMediaTransport;
        if (transport != null)
        {
            switch (transport.PlayState)
            {
                case PlayState.Playing:
                case PlayState.Paused:
                    PostMessage(_ehshellProcess.MainWindowHandle, 
                        WM_APPCOMMAND, IntPtr.Zero,
                        (IntPtr)(APPCOMMAND_MEDIA_PLAY | 
                                 FAPPCOMMAND_OEM));
                    break;
            }
        }
    }
}

private const uint WM_APPCOMMAND = 0x0319;
private const uint APPCOMMAND_MEDIA_PLAY = 0x002E0000;
private const uint FAPPCOMMAND_OEM = 0x10000000;

[DllImport("user32.dll", SetLastError=true)]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool PostMessage(
    IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);

Position bar in Windows Media Center

To accomplish this, DisplayPosition also takes advantage of Windows messages. Remember that pressing buttons on the remote control results in Windows messages being sent to Windows Media Center, and that pressing the play button causes Windows Media Center to display the position bar. So, DisplayPosition mimics the play button by posting the equivalent Windows message back to ehshell's main window. The play button sends a WM_APPCOMMAND message for the APPCOMMAND_MEDIA_PLAY command, and thus DisplayPosition uses the Win32 PostMessage API to do the exact same thing. This same technique can be used to control other aspects of Windows Media Center, allowing for automation of the environment in any way that a user can manipulate it with the remote control.
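The constants in the listing are already shifted into position, which hides how the lParam of WM_APPCOMMAND is laid out. The sketch below, with the hypothetical class name AppCommandSketch, starts from the unshifted header values (APPCOMMAND_MEDIA_PLAY is 46, FAPPCOMMAND_OEM is 0x1000, and FAPPCOMMAND_MASK is 0xF000) and shows that the command and the device flag share the high word of lParam. The two Get methods follow the GET_APPCOMMAND_LPARAM and GET_DEVICE_LPARAM macros from the Windows headers.

```csharp
using System;

// Hypothetical helper illustrating the WM_APPCOMMAND lParam layout.
internal static class AppCommandSketch
{
    public const uint APPCOMMAND_MEDIA_PLAY = 46;  // unshifted command value
    public const uint FAPPCOMMAND_OEM = 0x1000;    // unshifted device flag
    public const uint FAPPCOMMAND_MASK = 0xF000;   // bits reserved for the device

    // Combines the command and device flag and places them in the high word.
    public static uint MakeLParam(uint command, uint device)
    {
        return (command | device) << 16;
    }

    // Same arithmetic as GET_APPCOMMAND_LPARAM: high word minus device bits.
    public static uint GetCommand(uint lParam)
    {
        return (lParam >> 16) & ~FAPPCOMMAND_MASK;
    }

    // Same arithmetic as GET_DEVICE_LPARAM: device bits of the high word.
    public static uint GetDevice(uint lParam)
    {
        return (lParam >> 16) & FAPPCOMMAND_MASK;
    }
}
```

MakeLParam(APPCOMMAND_MEDIA_PLAY, FAPPCOMMAND_OEM) evaluates to 0x102E0000, the same value the listing produces by OR-ing 0x002E0000 with 0x10000000.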

Unfortunately, sending the play button's Windows message is not a perfect solution. When Windows Media Center receives the play message, if it's already playing video and the position bar is not being displayed, Windows Media Center displays the position bar. Great. However, if it's playing and the position bar is already being displayed, it hides the position bar. You can verify this by playing a video and hitting the play button multiple times; the visibility of the position bar will continually toggle with each press. As such, if the application calls DisplayPosition and the position bar is already shown, it will result in the position bar being hidden. Not the best outcome, but something for which I've decided not to try to find a workaround.

Configuring Position Changer Application

With everything that's been implemented thus far, the application is basically complete. The only thing left to implement is providing the ability to configure certain behaviors of the application. These configuration options are exposed as registry values under the registry key at HKEY_LOCAL_MACHINE\Software\Toub\PositionChangerAddIn and are all loaded using the following method, which is called at the beginning of Launch:

private void LoadPreferences()
{
    const int minTimeBetweenKeys = 1000;
    const int maxTimeBetweenKeysDefault = 4000;

    try
    {
        using(RegistryKey preferencesKey = 
                Registry.LocalMachine.OpenSubKey(AppRegKey))
        {
            if (preferencesKey != null)
            {
                _maxTimeBetweenKeys = (int)preferencesKey.GetValue(
                     MaxTimeBetweenKeysRegValue, 
                     maxTimeBetweenKeysDefault);
                _showPosition = (int)preferencesKey.GetValue(
                     ShowPositionRegValue, 1) != 0;
            }
        }
    }
    catch(ArgumentException){}
    catch(SecurityException){}

    if (_maxTimeBetweenKeys < minTimeBetweenKeys)
        _maxTimeBetweenKeys = maxTimeBetweenKeysDefault;
}

The MaxTimeBetweenKeys value controls how much time is allowed between key presses. As mentioned earlier, this defaults to 4000 milliseconds (4 seconds), but can be changed by changing the value in the registry. After you hit the right arrow key, you'll have this much time to press the first digit of the time code, and after that digit, you'll have this much time again to press the next digit, etc. If after hitting the right arrow key this amount of time expires before you press the first digit, you'll have to press the right-arrow again in order to enter a time code. If this amount of time expires after pressing a digit that's part of a time code, the application will automatically jump to whatever time you entered thus far (making use of the System.Threading.Timer created in the Launch method, whose TimerCallback ends up calling AttemptTransitionToEnteredTime, just as if the Enter key were pressed).
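The SetTimer helper called from hook_KeyDown isn't listed in this article. One plausible shape for it, assuming it simply re-arms a one-shot System.Threading.Timer, is sketched below. The class name KeystrokeTimerSketch and the constructor parameter are hypothetical; in the actual application the callback is the one that ends up calling AttemptTransitionToEnteredTime.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper showing how a one-shot timer can be re-armed
// after each digit and disarmed once a jump has been requested.
internal sealed class KeystrokeTimerSketch : IDisposable
{
    private readonly Timer _timer;

    public KeystrokeTimerSketch(TimerCallback onTimeout)
    {
        if (onTimeout == null) throw new ArgumentNullException("onTimeout");

        // Create the timer disarmed; it fires only after SetTimer is called.
        _timer = new Timer(onTimeout, null, Timeout.Infinite, Timeout.Infinite);
    }

    // Restarts the countdown. The infinite period makes the timer fire
    // once; passing Timeout.Infinite as the due time disarms it entirely.
    public void SetTimer(int dueTimeMilliseconds)
    {
        _timer.Change(dueTimeMilliseconds, Timeout.Infinite);
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

Because Change replaces any pending due time, calling SetTimer(_maxTimeBetweenKeys) after every digit keeps pushing the automatic jump back until the user stops typing.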

The ShowPosition value controls whether the position bar is displayed when a jump is attempted. This defaults to true (1 in the registry; to disable it, set this to 0). Note that if this is enabled and a jump is attempted and the position bar is already displayed, the position bar will be hidden, just as if you hit the play button on the remote when the position bar is already being displayed. To implement this configuration option, I simply wrap the contents of DisplayPosition with an if-block conditioned on the value of _showPosition.

Last but not least, I wanted to be able to support relative jumps in addition to absolute jumps, meaning that if a user is currently five minutes into watching a recorded show and enters the time code "123," the application would jump the video to time 6:23 (or 3:37 if jumping backwards) instead of an absolute time jump to 1:23. I originally implemented this using a registry value, meaning that all jumps would be absolute or all would be relative, and in order to change the behavior a user would have to edit that registry value. This, of course, isn't very user friendly, given that a user would probably benefit from having both types of jumps available at her fingertips. As such, I changed my design slightly such that both the right arrow key and the left arrow key serve as a starting key; the only difference between the two is that the right arrow serves to denote an absolute jump and the left arrow serves to denote a relative jump. This is implemented by setting a Boolean value based on whether the start key was the right or left arrow, and then using that Boolean when transitioning.

private void hook_KeyDown(object sender, KeyEventArgs e)
{
    ...
    bool isStart = (keyCode == Keys.Right || keyCode == Keys.Left);
    ...
    if (isStart) 
    {
        _jumpIsRelative = (keyCode == Keys.Left);
        return;
    }
    ...
}

If the right arrow was pressed, the application jumps to the entered time, just as you've seen up until this point. If, however, the left arrow was pressed, the time entered is treated as a relative value and is added to the current position's time in order to arrive at the actual destination time.

private bool TransitionToTime(TimeSpan time)
{
    ...
    if (_jumpIsRelative) time = time.Add(transport.Position);
    ...
}

In order to allow for relative jumps backwards, a user needs to be able to enter a negative time, which I support in ComputeEnteredTime by checking to see if the first digit of the time is 0. So for example, entering "123" will jump forward 1:23, but entering "0123" will jump backward 1:23.
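The arithmetic behind the three kinds of jump can be summarized in a few lines. This is a sketch rather than the application's code: JumpSketch and ResolveTarget are hypothetical names, the leading-zero flag is passed in explicitly instead of being detected by ComputeEnteredTime, and the sketch assumes that a leading zero has no effect on an absolute jump.

```csharp
using System;

// Hypothetical helper illustrating absolute and relative jump targets.
internal static class JumpSketch
{
    public static TimeSpan ResolveTarget(
        TimeSpan current, TimeSpan entered, bool relative, bool startedWithZero)
    {
        // Absolute jump (right arrow): the entered time is the destination.
        if (!relative) return entered;

        // Relative jump (left arrow): a leading zero negates the offset,
        // which is then added to the current position.
        TimeSpan offset = startedWithZero ? entered.Negate() : entered;
        return current.Add(offset);
    }
}
```

Starting from a position of 5:00 with "123" entered, the sketch yields 6:23 for a forward relative jump, 3:37 for a backward one, and 1:23 for an absolute jump.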

Conclusion

For those of you who just want to have PositionChangerAddIn's functionality added to your Windows Media Center, simply close ehshell and install the compiled application available for download from this article. The next time you start ehshell, the application should be available to you. To install it, use the gacutil utility included in the .NET Framework SDK to install the application's assembly into the GAC, and then use the RegisterMceApp.exe utility from %WINDIR%\ehome\ to register the application. The commands would look something like the following (see the Windows Media Center SDK for more details):

gacutil -i PositionChangerAddIn.dll
RegisterMceApp /allusers PositionChangerAddInReg.xml

Jumping to a specific time code is enabled while watching recorded TV shows (regardless of whether it's in Recorded TV or in My Videos), while listening to music, and while watching non-DVR-MS video, as long as that video supports seeking (so indexed WMV should work fine). While enjoying the media, hit the right-arrow button on the remote control and then enter a time on the number keys. If you then wait four seconds, it'll jump to the time you entered. To jump immediately, you can press the enter key. If the left arrow key is used instead of the right arrow key, the application will jump forward the amount of time entered (rather than to that time), or if you preface the time with a 0, the application will jump backwards that same amount.

Of course, this is only the functionality as I've envisioned and implemented it, and your needs may be different. Maybe you want to use dialog notifications to provide feedback about jumps. Maybe you want to override other remote control commands to provide additional processing. For those of you who are interested in tweaking the provided source code or who are interested in creating similar applications for Windows Media Center, I hope I've provided you with a good foundation on which to create your own projects.

Happy TV watching!

Stephen Toub is the Technical Editor for MSDN Magazine, for which he also writes the .NET Matters column.

For More Information