Introduction to Microsoft Sync Framework File Synchronization Provider

Microsoft Corporation
October 2009

Introduction

Microsoft Sync Framework is a comprehensive synchronization platform that enables collaboration and offline scenarios for applications, services, and devices. Developers can build synchronization ecosystems that integrate any application and any type of data, using any protocol over any network. As part of the new framework, we're also working on simplifying the development of end-to-end synchronization solutions by providing domain-specific components for common scenarios such as synchronizing relational databases, file systems, and devices.

Toward this goal, we have developed a reusable provider for synchronizing the contents of file system directories on PCs and removable media such as USB thumb drives. In this article, I'll cover the details of this reusable file synchronization provider, along with the scenarios it enables and sample code for getting started.

Overview

The file synchronization provider is designed to be a generic, reusable component for synchronizing files and folders between any NTFS- or FAT-formatted volumes. It uses Sync Framework's powerful metadata model to enable peer-to-peer synchronization of file data in arbitrary topologies (client/server, full mesh, P2P), including removable media such as flash drives and USB thumb drives. It enables third parties to build file synchronization and roaming scenarios into their end-to-end synchronization solutions without having to interact directly with the file system.

Key features of the file synchronization provider include:

  • Incremental synchronization of changes between two file system locations, each specified by a local or UNC path.
  • Synchronization of file contents, file and folder names, file timestamps, and attributes.
  • Optional filtering of files based on file names and extensions, subdirectories, or file attributes.
  • Optional use of file hashes to detect changes to file contents when file timestamps are not reliable.
  • Reliable detection of conflicting changes to the same file, with automatic conflict resolution under a no-data-loss policy.
  • Limited user undo: file deletes and overwrites can optionally be moved to the Recycle Bin.
  • Preview mode, which shows what an incremental synchronization operation would do without committing any changes to the file system.
  • First-class support for starting synchronization when more than one replica already holds identical or partially identical file hierarchies.
  • Graceful cancellation of an ongoing synchronization operation, so that the remaining changes can be synchronized later without re-synchronizing the changes that were already applied.

Security note: The file synchronization provider does not synchronize security information, such as the Discretionary Access Control List (DACL). It is up to the application or user to secure the destination folders correctly to help prevent unauthorized access. Also, files in an encrypted folder are decrypted before they are sent and are not encrypted in the destination folder, so even if the source folder is encrypted, the synchronized copies will not be. To help prevent unauthorized access or tampering, the communication channel between the provider and the folder must be trusted.

Enabled Scenarios

One of the goals of the file synchronization provider design is to enable a broad set of scenarios for synchronizing files and folders. Since the file synchronization provider is available for use and redistribution through the Sync Framework, ISVs are now able to build file synchronization functionality into their existing applications or create new applications to enable interesting file and folder synchronization scenarios. Some of the possible end-user scenarios enabled by the Sync Framework and this component include:

  • Multi-master file synchronization between multiple PCs: An application can be written to synchronize a user's files among their various PCs. Users can update and manage their files wherever is most convenient at the time, and run a bidirectional synchronization between the PCs on demand or on an automated schedule so that the latest files are always available everywhere. For example, a user can work on a laptop while away from the office; when the laptop is reconnected to the office network, all changes can be synchronized from the laptop to the office PC and vice versa. The file synchronization provider ensures that new and updated files are propagated in each direction, and that renames and deletes are applied on the other side as well (without transferring file data). If conflicting changes were made to the same file on the two sides, the conflict is resolved in favor of the copy that was updated last; if Recycle Bin support is enabled, the losing copy is moved to the Recycle Bin so that no data is lost.
  • Synchronization between PCs using a USB Drive: Sometimes it is not possible to have a direct network connection between different PCs, such as a home PC and a work PC. In this scenario, an application can be written to enable synchronization of files between the two PCs via a USB flash drive. The application is installed on both PCs and it allows the user to synchronize their file data between the PC and the USB drive. The USB drive in this case is used as the physical file transport between the work and home location.
  • Taking a network share offline: In an information worker environment, team members often work with workgroup documents on a network share. Incrementally synchronizing files from the share to a user's mobile PC or USB flash drive lets them review or modify those files offline when they are not connected to the corporate network. The file synchronization provider can be used to build an application that provides offline access to network shares by synchronizing data from the share to the user's mobile PC, even if the share is read-only for that user.
  • Maintaining a backup copy of files: Users often want to back up files from one PC to another, or from one hard disk to another. The file synchronization provider enables rich archiving scenarios through one-way synchronization of files from the primary PC or hard disk to the backup PC or hard disk.

File Synchronization Provider Usage

To put some of the following material in context, let's start with a short code snippet showing how an application can invoke the file synchronization provider. It is part of the full code sample provided later in this article. The code is written in C# and uses the .NET wrapper classes for the file synchronization provider, but the capabilities and patterns for C++ code that invokes the native COM component are very similar.

// Copyright (c) Microsoft Corporation. All rights reserved.
public static void SyncFileSystemReplicasOneWay(
        string sourceReplicaRootPath, string destinationReplicaRootPath,
        FileSyncScopeFilter filter, FileSyncOptions options)
{
    FileSyncProvider sourceProvider = null;
    FileSyncProvider destinationProvider = null;

    try
    {
        sourceProvider = new FileSyncProvider(
            sourceReplicaRootPath, filter, options);
        destinationProvider = new FileSyncProvider(
            destinationReplicaRootPath, filter, options);

        destinationProvider.AppliedChange +=
            new EventHandler<AppliedChangeEventArgs>(OnAppliedChange);
        destinationProvider.SkippedChange +=
            new EventHandler<SkippedChangeEventArgs>(OnSkippedChange);

        SyncOrchestrator agent = new SyncOrchestrator();
        agent.LocalProvider = sourceProvider;
        agent.RemoteProvider = destinationProvider;
        agent.Direction = SyncDirectionOrder.Upload; // Synchronize source to destination

        Console.WriteLine("Synchronizing changes to replica: " +
            destinationProvider.RootDirectoryPath);
        agent.Synchronize();
    }
    finally
    {
        // Release resources
        if (sourceProvider != null) sourceProvider.Dispose();
        if (destinationProvider != null) destinationProvider.Dispose();
    }
}

Detecting Changes on the File System

The file synchronization provider synchronizes incremental changes between two file system locations, each of which Sync Framework calls a replica. To do this, at the start of every synchronization session the provider must determine which changes have been made on each replica since the last synchronization. This process is called change detection.

The provider stores a small amount of information (called metadata) that describes where and when each item was changed, which amounts to a snapshot of every file and folder in the replica. Changes are detected by comparing the current state of each item with the version last saved in the snapshot. For files, the comparison covers the file size, file times, file attributes, file name (case-sensitive), and, optionally, a hash of the file contents. For folders, it covers the folder attributes and folder name (case-sensitive).
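The comparison described above can be sketched as follows. This is a minimal illustration, not the provider's code: `FileSnapshot` and `ChangeDetectionSketch` are hypothetical names, and comparing the creation and last-write times is an assumption about which file times are used.

```csharp
using System;
using System.IO;

// Hypothetical per-file snapshot entry; the provider's real metadata
// format is internal and is not modeled here.
public sealed record FileSnapshot(
    string Name,
    long Size,
    DateTime CreationTimeUtc,
    DateTime LastWriteTimeUtc,
    FileAttributes Attributes,
    string ContentHash); // null when hashing is not enabled

public static class ChangeDetectionSketch
{
    // True if the file's current state differs from the saved snapshot.
    public static bool HasChanged(FileSnapshot saved, FileSnapshot current, bool compareHashes)
    {
        // Ordinal comparison: a case-only rename ("Report.doc" -> "report.doc") is a change.
        if (!string.Equals(saved.Name, current.Name, StringComparison.Ordinal)) return true;
        if (saved.Size != current.Size) return true;
        // Which file times the provider compares is an assumption here.
        if (saved.CreationTimeUtc != current.CreationTimeUtc) return true;
        if (saved.LastWriteTimeUtc != current.LastWriteTimeUtc) return true;
        if (saved.Attributes != current.Attributes) return true;
        // The optional hash catches content edits that left the timestamps unchanged.
        return compareHashes &&
               !string.Equals(saved.ContentHash, current.ContentHash, StringComparison.Ordinal);
    }
}
```

The hash check is last and opt-in because computing a hash means reading the whole file, which is far more expensive than comparing the other fields.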

A heuristic algorithm is used to detect file renames. If a file was only renamed or moved, just that operation is performed on the other replica, avoiding a full stream transfer, which can be expensive. Folder renames and moves, by contrast, are currently processed as a delete followed by a create on the other replicas. The files inside a renamed folder, however, are still processed as renames, so no unnecessary stream transfers occur.
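The provider's actual rename heuristic is not described in detail here. The sketch below shows one plausible approach, assuming a hypothetical `SeenFile` record: a file that disappeared is paired with a newly appeared file that has the same size, last-write time, and content hash (when hashing is enabled), and each such pair is treated as a rename rather than a delete plus a create.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of a file seen during change detection.
public sealed record SeenFile(string Path, long Size, DateTime LastWriteTimeUtc, string ContentHash);

public static class RenameHeuristicSketch
{
    // Pairs deleted files with created files that look identical.
    public static List<(string OldPath, string NewPath)> FindRenames(
        IEnumerable<SeenFile> deleted, IEnumerable<SeenFile> created)
    {
        var renames = new List<(string OldPath, string NewPath)>();
        var unmatched = created.ToList();
        foreach (var gone in deleted)
        {
            var match = unmatched.FirstOrDefault(c =>
                c.Size == gone.Size &&
                c.LastWriteTimeUtc == gone.LastWriteTimeUtc &&
                string.Equals(c.ContentHash, gone.ContentHash, StringComparison.Ordinal));
            if (match != null)
            {
                renames.Add((gone.Path, match.Path));
                unmatched.Remove(match); // a new file can match at most one old file
            }
        }
        return renames; // everything unmatched stays a plain delete or create
    }
}
```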

By default, the file synchronization provider detects changes at the start of every synchronization session. The application can change this behavior by using initialization flags and calling the provider's DetectChanges method explicitly. Because changes are detected by scanning the replica and comparing it against the stored snapshot (a method chosen so that FAT file systems are supported), change detection can be expensive when the replica contains a large number of files, and it should be run only as often as the application's user experience requires.

Progress Reporting and Preview Mode

For interactive applications, the file synchronization provider supports extensive progress reporting during the synchronization operation. This enables applications to present a progress UI to the user and display the current state of the synchronization operation.

Progress information is reported to the application via events in managed code or callback methods in unmanaged code. An application can skip a given change by handling the ApplyingChange event in managed code or the OnApplyingChange callback method in unmanaged code.

The file synchronization provider supports Preview mode synchronization, which lets applications inspect and display the changes that synchronization would make. In Preview mode, no changes are committed to the file system or the metadata; however, most progress notifications are still sent for the changes that would be applied if synchronization were run without preview. The application can use this to show the user a verification UI listing every change that will be made if the synchronization is executed.

Note that the file synchronization provider does not estimate the total number of files to be synchronized before the synchronization operation starts, because this can be expensive to compute for a large replica and not every application wants to pay that price. Applications can, however, collect these statistics with a two-pass approach: first run a Preview mode session between the two replicas to count the total number of changes (creates, deletes, updates, and so on), and then run the real synchronization session with an accurate progress bar.
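The two-pass approach can be sketched as a small counter that is fed by the preview pass and then drives the progress bar during the real pass. This is a minimal sketch: `PreviewChangeKind` and `TwoPassProgress` are hypothetical names, and wiring them to the provider's progress events is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical change kinds matching the categories counted in the preview pass.
public enum PreviewChangeKind { Create, Delete, Update, Rename }

public sealed class TwoPassProgress
{
    private readonly Dictionary<PreviewChangeKind, int> _counts = new();
    private int _processed;

    public int Total { get; private set; }

    public int Count(PreviewChangeKind kind) =>
        _counts.TryGetValue(kind, out int n) ? n : 0;

    // Pass 1: call once for each change reported during the Preview mode session.
    public void RecordPreviewChange(PreviewChangeKind kind)
    {
        _counts[kind] = Count(kind) + 1;
        Total++;
    }

    // Pass 2: call once for each change processed during the real session;
    // returns the percentage to show in the progress bar.
    public double RecordProcessedChange()
    {
        _processed++;
        // Clamp, because files can change between the two passes.
        return Total == 0 ? 100.0 : Math.Min(100.0, 100.0 * _processed / Total);
    }
}
```

The clamp matters because the preview count is only an estimate: a file edited between the two passes can make the real session process more changes than the preview reported.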

Filtering Out Files from Synchronization

An application using the file synchronization provider can configure the provider to filter out certain files or folders from the synchronization operation. For example, the user may want to synchronize everything but media files from a given folder. The file synchronization provider can be initialized with a scope filter that determines the set of files to synchronize. Synchronization applications can choose to filter files based on file names, sub-directories, or file attributes.

The supported filter types, with their defaults, are:

  • Filename-based exclusion. Default: no files excluded. Example: do NOT synchronize files that match *.mov, *.wmv, or foo.txt.
  • Filename-based inclusion. Default: all files included. Example: ONLY synchronize files that match *.doc, *.txt, or special.jpg.
  • Subdirectory exclusion. Default: no subdirectories excluded. Example: do NOT synchronize subdirectories that match "c:\users\videos".
  • File attribute-based exclusion. Default: nothing excluded based on attributes. Example: do NOT synchronize system or hidden files.

Note that only files/folders that pass ALL the scope filters are included in the synchronization. Also, applications should ensure that they use the same scope filter for all replicas in the synchronization community for a consistent user experience and to ensure convergence of synchronized data between all replicas.
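The "pass ALL filters" rule can be illustrated with a hypothetical `ScopeFilterSketch` class, which is not the provider's filter type. It assumes Windows-style paths, case-insensitive matching, and only the `*` and `?` wildcards; a file must survive every check to be included.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical filter used only to illustrate the rules; not FileSyncScopeFilter.
public sealed class ScopeFilterSketch
{
    public List<string> FileNameExcludes { get; } = new();
    public List<string> FileNameIncludes { get; } = new();     // empty = include all files
    public List<string> SubdirectoryExcludes { get; } = new();
    public FileAttributes AttributeExcludeMask { get; set; }   // 0 = exclude nothing

    // Matches a file name against a pattern using only the * and ? wildcards.
    private static bool Matches(string pattern, string name) =>
        Regex.IsMatch(name,
            "^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
            RegexOptions.IgnoreCase);

    // Assumes Windows-style paths such as c:\users\videos\clip.wmv.
    private static string FileNameOf(string path) =>
        path.Substring(path.LastIndexOfAny(new[] { '\\', '/' }) + 1);

    // A file is included only if it passes ALL of the filters.
    public bool Includes(string fullPath, FileAttributes attributes)
    {
        string name = FileNameOf(fullPath);
        if (FileNameExcludes.Any(p => Matches(p, name))) return false;
        if (FileNameIncludes.Count > 0 && !FileNameIncludes.Any(p => Matches(p, name))) return false;
        if (SubdirectoryExcludes.Any(dir => fullPath.StartsWith(
                dir.TrimEnd('\\') + "\\", StringComparison.OrdinalIgnoreCase))) return false;
        return (attributes & AttributeExcludeMask) == 0;
    }
}
```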

As mentioned above in the Progress Reporting section, applications can ask the provider to skip certain files by handling the ApplyingChange event in managed code or the OnApplyingChange callback method in unmanaged code.

Lastly, certain system files used by Windows Explorer, such as desktop.ini and thumbs.db, are always excluded from synchronization, provided they are marked with both the SYSTEM and HIDDEN attributes.
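A minimal sketch of that check, assuming case-insensitive name matching. The two names come from the examples above; the provider's full list may be longer, and `ExplorerFileExclusionSketch` is a hypothetical name.

```csharp
using System;
using System.IO;

public static class ExplorerFileExclusionSketch
{
    // Illustrative list taken from the article's examples; not exhaustive.
    private static readonly string[] ExplorerFiles = { "desktop.ini", "thumbs.db" };

    // Excluded only when the name matches AND both System and Hidden are set.
    public static bool IsAlwaysExcluded(string fileName, FileAttributes attributes)
    {
        const FileAttributes required = FileAttributes.System | FileAttributes.Hidden;
        bool knownName = Array.Exists(ExplorerFiles,
            n => string.Equals(n, fileName, StringComparison.OrdinalIgnoreCase));
        return knownName && (attributes & required) == required;
    }
}
```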

Conflict Handling

Users will sometimes modify the same file or folder independently on different file system replicas. The next time these replicas are synchronized with each other, the file synchronization provider attempts to reconcile the conflicting changes in the best possible way. In general, its conflict resolution follows a few key principles that are driven by the desired user experience (these principles are also followed and enabled by the underlying Sync Framework):

  • Conflict resolution must be done in a way that guarantees convergence of data across all replicas.
  • Built-in conflict resolution must be deterministic, meaning that the same outcome should apply regardless of which two replicas synchronize the conflicting changes.
  • Data loss must be avoided in all possible scenarios.

The file synchronization provider only supports a fixed policy for automatic resolution of conflicts.

The following is the general policy used by the file synchronization provider when automatically reconciling conflicting changes:

  • Concurrent update conflicts: If an existing file is modified independently on two replicas, the file with the later last write time (that is, the one updated later) is preserved on the next synchronization. If the last write times happen to be the same, a deterministic resolution policy is used so that the user ends up with the same file contents on all replicas. If Recycle Bin support is requested from the file synchronization provider, the “losing” file is placed in the Recycle Bin on the “losing” replica so it can be recovered if the user wishes to do so.
  • Concurrent update-delete conflicts: If an existing file is modified on one replica but the same file or its container is deleted on another replica, the file synchronization provider always favors the update over the delete. If the path needs to be recreated on the side where the file was deleted, the file synchronization provider does that as well.
  • Concurrent child create-parent delete conflicts: The same policy as above is also used for the case where a new file may be created under a given folder on one replica, but the folder may have been deleted on another replica. The create overrides the folder delete and the file creation will be synchronized to all replicas with the deleted folder being resurrected.
  • Name collision conflicts for folders: Sometimes users may end up creating a folder with the same name on two different replicas. This can also happen when users start synchronizing replicas that already have similar file system hierarchies and content, or when an existing folder is renamed to a new name that happens to be the same as the name of a new folder on another replica. In all such cases the user will end up with a single folder and the contents of the folder will be merged on the next synchronization.
  • Name collision conflicts for files: The scenarios mentioned above for folders can also happen with files. In this case however, a different resolution happens depending on the scenario:
    • Create–create collision: If two files with the same name were independently created on two replicas, the next synchronization operation preserves a single file with that name, using the contents from the side that had the later update timestamp. If the timestamps are also the same, the contents are assumed to be identical and one of the file streams is picked as the “winner” deterministically. If Recycle Bin support is requested from the file synchronization provider, the “losing” file is placed in the Recycle Bin on the “losing” replica so it can be recovered if the user wishes to do so.
    • Create–rename collision: In the case where a rename of an existing file collides with a create of a file with the same name on a different replica, the resolution is always to keep both files by renaming one of them. This is done because in this case it is always safer to assume that the user intended these two files to have different identities. The file to be renamed is picked deterministically, such that the same resolution will apply regardless of which two replicas synchronize the conflicting changes.
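The update-conflict rules above (update beats delete, last writer wins, deterministic tie-break) can be condensed into a single comparison. In this sketch, `FileVersion` and `ConflictPolicySketch` are hypothetical names, and the replica ID is an assumed tie-breaker, since the article does not say which key the provider uses when last write times are equal. Name collisions are not modeled.

```csharp
using System;

// Hypothetical description of one replica's version of a conflicting file.
public sealed record FileVersion(Guid ReplicaId, DateTime LastWriteTimeUtc, bool IsDeleted);

public static class ConflictPolicySketch
{
    // Picks the surviving version. The result does not depend on argument
    // order, so every pair of replicas reaches the same decision.
    public static FileVersion PickWinner(FileVersion a, FileVersion b)
    {
        // Update-delete conflict: the update always wins.
        if (a.IsDeleted != b.IsDeleted) return a.IsDeleted ? b : a;

        // Concurrent updates: the later last write time wins.
        int byTime = a.LastWriteTimeUtc.CompareTo(b.LastWriteTimeUtc);
        if (byTime != 0) return byTime > 0 ? a : b;

        // Tie: use a stable key (assumed here to be the replica ID).
        return a.ReplicaId.CompareTo(b.ReplicaId) >= 0 ? a : b;
    }
}
```

Making the function order-independent is what delivers the determinism principle: the same two versions yield the same winner no matter which replica initiates the synchronization.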

As these cases show, a lot of intricate logic is needed to handle these situations correctly, especially when synchronizing more than two replicas in an arbitrary synchronization topology. The good news is that the file synchronization provider handles most of this without the application writer having to worry about it.

Error Handling and Recovery

The file synchronization provider is designed to gracefully handle single-file errors while synchronizing a set of files. A host of potentially transient errors can occur when reading or writing files (locked files, files changed after change detection due to concurrent file activity, access denied, not enough disk space for a given file, and so on). The file synchronization provider handles these errors by skipping the affected files during the synchronization operation, so that the rest of the data can be synchronized successfully.

For every file change that is not applied successfully to the destination replica, an application callback method (the SkippedChange event in managed code) is invoked with the file details and error information. In some cases, the synchronizing application may want to handle these failures programmatically and re-synchronize after fixing the problem. In other cases, the failures can be presented to the user at the end of the synchronization session, so that the user is aware of the files that did not synchronize successfully.
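The skip-and-report pattern can be sketched generically. `SkipAndReportSketch` and the `applyChange` delegate are hypothetical; with the real provider, the equivalent per-file information arrives through the SkippedChange event shown in the sample code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SkipAndReportSketch
{
    // Applies each change in turn. A per-file failure is recorded and the
    // loop moves on, so the remaining files still synchronize.
    public static List<(string FilePath, Exception Error)> ApplyAll(
        IEnumerable<string> paths, Action<string> applyChange)
    {
        var failures = new List<(string FilePath, Exception Error)>();
        foreach (var path in paths)
        {
            try
            {
                applyChange(path);
            }
            catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
            {
                // Typical per-file problems: locked file, access denied, disk full.
                failures.Add((path, ex));
            }
        }
        return failures; // show to the user, or fix up and re-synchronize
    }
}
```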

The entire synchronization operation can also fail because of a transient error, in which case the synchronization session returns the appropriate error code and the application can react accordingly. For example, concurrent synchronization operations on the same replica are not currently supported, so attempting one results in a "replica in use" error, which can be handled by retrying the synchronization after some time.
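A bounded retry loop is one way to handle the "replica in use" case. The sketch assumes a caller-supplied `isReplicaInUse` predicate, because the exact exception type or error code to test for is not given here; `RetrySketch` is a hypothetical name.

```csharp
using System;
using System.Threading;

public static class RetrySketch
{
    // Runs a synchronization attempt, retrying only when the failure is the
    // transient "replica in use" case. After maxAttempts the error propagates.
    public static void RunWithRetry(Action synchronize, Func<Exception, bool> isReplicaInUse,
        int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                synchronize();
                return;
            }
            catch (Exception ex) when (attempt < maxAttempts && isReplicaInUse(ex))
            {
                Thread.Sleep(delay); // give the other session time to finish
            }
        }
    }
}
```

Other errors are not retried, because the exception filter only matches the transient case and rethrows everything else immediately.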

Concurrency and Consistency Guarantee

The file synchronization provider is designed to correctly handle concurrent local operations by other applications on files that may be part of an ongoing synchronization operation. If a file on either the source or the destination replica is changed locally after the last change detection pass, any changes to that file are not synchronized until the next synchronization session (or the next change detection pass, if the application uses explicit change detection). This prevents the concurrent change from being lost.
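The rule can be sketched as a check made just before a change is applied: the file's current state is compared with the state captured at detection time, and the change is deferred if they differ. `DetectedState` and `ConcurrencyCheckSketch` are hypothetical, and comparing only length and last-write time is a simplification.

```csharp
using System;
using System.IO;

// State captured for one file during the last change-detection pass.
public sealed record DetectedState(long Length, DateTime LastWriteTimeUtc);

public static class ConcurrencyCheckSketch
{
    public static DetectedState Capture(string path)
    {
        var info = new FileInfo(path);
        return new DetectedState(info.Length, info.LastWriteTimeUtc);
    }

    // True if the file was touched after detection; the change is then left
    // for the next session instead of overwriting the concurrent edit.
    public static bool ShouldDefer(string path, DetectedState atDetection)
    {
        if (!File.Exists(path)) return true; // deleted concurrently
        return Capture(path) != atDetection;
    }
}
```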

Also, when synchronizing changes to a file's contents, the new file stream will be applied atomically to the destination location to avoid the situation where the user ends up with a partially correct copy of the file.
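One common way to get this kind of atomic replacement is to write the new stream to a temporary file in the destination folder and then rename it over the target. This sketch is not necessarily how the provider implements it; `AtomicWriteSketch` is a hypothetical name, it assumes .NET Core 3.0 or later for `File.Move` with the `overwrite` parameter, and it relies on a rename within a single volume being atomic, as it generally is on local file systems.

```csharp
using System;
using System.IO;

public static class AtomicWriteSketch
{
    // Writes the new contents to a temporary file next to the target, then
    // swaps it into place with one rename, so readers see either the old
    // file or the complete new one, never a partial copy.
    public static void ReplaceContents(string destinationPath, Stream newContents)
    {
        string dir = Path.GetDirectoryName(Path.GetFullPath(destinationPath));
        string tempPath = Path.Combine(dir, "~" + Guid.NewGuid().ToString("N") + ".tmp");
        try
        {
            using (var temp = new FileStream(tempPath, FileMode.CreateNew, FileAccess.Write))
            {
                newContents.CopyTo(temp);
                temp.Flush(flushToDisk: true); // make sure the bytes are on disk first
            }
            File.Move(tempPath, destinationPath, overwrite: true);
        }
        finally
        {
            // Only present if something failed before the rename.
            if (File.Exists(tempPath)) File.Delete(tempPath);
        }
    }
}
```

Keeping the temporary file in the destination folder, rather than the system temp folder, keeps the final rename on the same volume.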

Using the File Synchronization Provider - Sample Code

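// Copyright (c) Microsoft Corporation.  All rights reserved.
using System;
using System.IO;
using Microsoft.Synchronization;
using Microsoft.Synchronization.Files;
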
public class FileSyncProviderSample
{
    public static void Main(string[] args)
    {
        if (args.Length < 2 ||
            string.IsNullOrEmpty(args[0]) || string.IsNullOrEmpty(args[1]) ||
            !Directory.Exists(args[0]) || !Directory.Exists(args[1]))
        {
            Console.WriteLine(
              "Usage: FileSyncSample [valid directory path 1] [valid directory path 2]");
            return;
        }

        string replica1RootPath = args[0];
        string replica2RootPath = args[1];

        try
        {
            // Set options for the synchronization operation
            FileSyncOptions options = FileSyncOptions.ExplicitDetectChanges |
                FileSyncOptions.RecycleDeletedFiles |
                FileSyncOptions.RecyclePreviousFileOnUpdates |
                FileSyncOptions.RecycleConflictLoserFiles;

            FileSyncScopeFilter filter = new FileSyncScopeFilter();
            filter.FileNameExcludes.Add("*.lnk"); // Exclude all *.lnk files

            // Explicitly detect changes on both replicas upfront, to avoid two change
            // detection passes for the two-way synchronization

            DetectChangesOnFileSystemReplica(
                replica1RootPath, filter, options);
            DetectChangesOnFileSystemReplica(
                replica2RootPath, filter, options);

            // Synchronize in both directions, using the same scope filter
            // that was used for change detection on both replicas
            SyncFileSystemReplicasOneWay(replica1RootPath, replica2RootPath, filter, options);
            SyncFileSystemReplicasOneWay(replica2RootPath, replica1RootPath, filter, options);
        }
        catch (Exception e)
        {
            Console.WriteLine("\nException from File Synchronization Provider:\n" + e.ToString());
        }
    }

    public static void DetectChangesOnFileSystemReplica(
            string replicaRootPath,
            FileSyncScopeFilter filter, FileSyncOptions options)
    {
        FileSyncProvider provider = null;

        try
        {
            provider = new FileSyncProvider(replicaRootPath, filter, options);
            provider.DetectChanges();
        }
        finally
        {
            // Release resources
            if (provider != null)
                provider.Dispose();
        }
    }

    public static void SyncFileSystemReplicasOneWay(
            string sourceReplicaRootPath, string destinationReplicaRootPath,
            FileSyncScopeFilter filter, FileSyncOptions options)
    {
        FileSyncProvider sourceProvider = null;
        FileSyncProvider destinationProvider = null;

        try
        {
            sourceProvider = new FileSyncProvider(
                sourceReplicaRootPath, filter, options);
            destinationProvider = new FileSyncProvider(
                destinationReplicaRootPath, filter, options);

            destinationProvider.AppliedChange +=
                new EventHandler<AppliedChangeEventArgs>(OnAppliedChange);
            destinationProvider.SkippedChange +=
                new EventHandler<SkippedChangeEventArgs>(OnSkippedChange);

            SyncOrchestrator agent = new SyncOrchestrator();
            agent.LocalProvider = sourceProvider;
            agent.RemoteProvider = destinationProvider;
            agent.Direction = SyncDirectionOrder.Upload; // Sync source to destination

            Console.WriteLine("Synchronizing changes to replica: " +
                destinationProvider.RootDirectoryPath);
            agent.Synchronize();
        }
        finally
        {
            // Release resources
            if (sourceProvider != null) sourceProvider.Dispose();
            if (destinationProvider != null) destinationProvider.Dispose();
        }
    }

    public static void OnAppliedChange(object sender, AppliedChangeEventArgs args)
    {
        switch (args.ChangeType)
        {
            case ChangeType.Create:
                Console.WriteLine("-- Applied CREATE for file " + args.NewFilePath);
                break;
            case ChangeType.Delete:
                Console.WriteLine("-- Applied DELETE for file " + args.OldFilePath);
                break;
            case ChangeType.Update: // an existing file was overwritten
                Console.WriteLine("-- Applied OVERWRITE for file " + args.OldFilePath);
                break;
            case ChangeType.Rename:
                Console.WriteLine("-- Applied RENAME for file " + args.OldFilePath +
                                  " as " + args.NewFilePath);
                break;
        }
    }

    public static void OnSkippedChange(object sender, SkippedChangeEventArgs args)
    {
        Console.WriteLine("-- Skipped applying " + args.ChangeType.ToString().ToUpper()
              + " for " + (!string.IsNullOrEmpty(args.CurrentFilePath) ?
                            args.CurrentFilePath : args.NewFilePath) + " due to error");

        if (args.Exception != null)
            Console.WriteLine("   [" + args.Exception.Message + "]");
    }

}

Summary

In this article we have seen how the file synchronization provider can be used to synchronize the contents of file system directories on PCs and removable media such as USB thumb drives, and discussed the details of this synchronization provider.

Microsoft Sync Framework includes everything required to integrate applications into an offline or collaboration-based network, either by using the pre-created providers or by writing new custom providers. Providers enable any data source to participate in data synchronization regardless of network or device type.