Usage Event Logging in Windows SharePoint Services

 

Radu Rusu
Les W. Smith
Microsoft Corporation

July 2004

Applies to:
    Microsoft Windows SharePoint Services

Summary: Updated September 2004: Analyze usage event data in Windows SharePoint Services most effectively by parsing the log files that Windows SharePoint Services produces when logging is enabled. This article describes the format of these log files and provides a sample that demonstrates some of the basics for creating a tool that parses the files to extract information about site usage. (12 printed pages)

Contents

Introduction
Examining the Usage Log File Format
How to Parse the Usage Event Logs
Try Out a Code Sample
Conclusion

Introduction

This article describes the best way to obtain usage event data from Microsoft Windows SharePoint Services, which is to parse the log files created when logging has been enabled in a deployment. The article provides information about the format of the log files generated by Windows SharePoint Services, and the sample at the end demonstrates how to create a C++ application that extracts the usage data from these files. (The sample in this document is provided for informational purposes only and Microsoft makes no warranties, either expressed or implied, in this document. The entire risk of the use or the results of the use of the sample in this document remains with the user.)

Other ways to obtain usage event data are available in Windows SharePoint Services, such as through managed code that implements the GetUsageBlob method of the Microsoft.SharePoint.SPWeb class, through Windows SharePoint Services Remote Procedure Call (RPC) protocol that posts the GetUsageBlob method, or through Microsoft Office FrontPage 2003, but each of these approaches has limitations when compared to parsing the SharePoint log files. For information about using the GetUsageBlob RPC method and a sample that posts this method within managed code, download the Usage Blob Parser, available in the Microsoft Download Center.

Note   The GetUsageData method of the SPWeb class returns usage data in differing formats depending on the type of report or period specified, but this method has a 2000-row limit that restricts its usefulness in site usage analysis. The GetUsageBlob method returns the same data and does not have a limit, but this method does not return data in a useful format and is difficult to parse. FrontPage 2003, which parses the same data, can be used to summarize usage in a compressed format, but the FrontPage client does not expose this data through its object model, so the data returned has no practical use for a server application. Each of these methods for returning usage data is further limited in that its information applies only per site and not per virtual server. The data accumulates usage information over a long period of time, and thus by necessity no longer stores the correlation between the fields on a hit (for example, what user saw what page at a particular time).

Windows SharePoint Services generates usage event logs on a daily basis per virtual server when Enable logging is selected on the Configure Usage Analysis Processing page in SharePoint Central Administration. When logging is enabled, Windows SharePoint Services by default creates log files in the %windir%\system32\LogFiles\STS directory on the front-end Web server, although an alternate location can be specified. The STS directory contains a folder for each virtual server on the Web server, each named with a GUID that identifies the respective virtual server. Each virtual server folder contains subfolders for each day, which in turn contain the daily usage log for each virtual server. In addition to containing information per virtual server, the Windows SharePoint Services logs are also useful because they associate users with page hits and with time stamps.

Note   To view the log files generated in Windows SharePoint Services you must be an administrator (or a member of the STS_WPG group, which includes but is not limited to the administrators) for the computer containing the files.

Supplementing Usage Event Logging with IIS Logs

The information provided through usage event logging in Windows SharePoint Services can be supplemented by the information provided through the logs generated by Microsoft Internet Information Services (IIS). IIS logs include, for example, the IP address of the server, the type of request, the port number for the request, and other information. For more information on IIS logging, see About Logging Site Activity.

Examining the Usage Log File Format

A Windows SharePoint Services log file consists of separate entries for each page hit that occurs on a virtual server. Each log entry starts with a structure in binary format whose fields indicate the number of bytes used for each subsequent part in the entry.

The following table shows the name and data type of each field in the structure and describes the information contained in each field of an entry.

Table 1. Fields represented in the structure

| Name | Data type | Description |
|------|-----------|-------------|
| pPrev | Struct | Points to the previous entry. |
| bitFlags | uint8 | Flag that indicates the type of hit, which can be one of the following values: 0 = Regular hit. 1 = Used by the Microsoft Office FrontPage 2003 client application to indicate a visit (whether or not from a site referrer URL). 2 = List update. 4 = List operation (for example, a post to owssvr.dll) that is not an update. 8 = Discussion request made through the Office Server Extensions (OSE) Discussion button in Microsoft Internet Explorer. |
| cbEntry | uint16 | Number of bytes to skip ahead to next entry. |
| cbSiteUrl | uint16 | Absolute URL of the top-level site in the site collection containing the site in which the request is made. For example, http://Server/sites/Top_Site. |
| cbWeb | uint16 | URL of the site or subsite relative to the top-level site. For example, Subsite_1/Subsite_2/Subsite_3. |
| cbDoc | uint16 | Site-relative URL of the page that is visited. For example, lists/List_Name/allitems.aspx. |
| cBytes | uint32 | Bandwidth consumed by the request, including bytes received and bytes sent. |
| httpStatus | uint16 | HTTP status code, which is the same as in IIS logs. Windows SharePoint Services only logs successful hits, so its value is always between 200 and 299 (in other words, never 304, 401, or 404). Almost all recorded hits have an HTTP status code equal to 200. |
| cbUser | uint16 | Name of the user making the request. For example, DOMAIN\User_Alias. |
| cbQS | uint16 | When applicable, query string used by a referring URL. |
| cbRef | uint16 | When applicable, URL from which the user navigated to the page. Excludes cases where the referring URL subsumes the site in which the request was made, and cases where the URL is typed fully in the browser. |
| cbUAS | uint16 | User agent. |
| reserved | int32 | Reserved. No definition required. IIS instance ID. |
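
As an illustration of how a parser might interpret the bitFlags field, the following minimal sketch maps the values listed in Table 1 to short labels. The enumeration and the DescribeHit function are hypothetical names that do not come from Windows SharePoint Services, and the sketch assumes that any value not listed in the table should simply be reported as unknown.

```cpp
#include <cstdint>
#include <string>

// Hypothetical names for the bitFlags values listed in Table 1.
enum HitType : std::uint8_t
{
    kRegularHit        = 0,
    kVisit             = 1,  // set for the FrontPage 2003 client
    kListUpdate        = 2,
    kListOperation     = 4,  // list operation that is not an update
    kDiscussionRequest = 8
};

// Returns a short label for a bitFlags value, or "unknown" for any
// value that Table 1 does not list.
std::string DescribeHit(std::uint8_t bitFlags)
{
    switch (bitFlags)
    {
    case kRegularHit:        return "regular hit";
    case kVisit:             return "visit";
    case kListUpdate:        return "list update";
    case kListOperation:     return "list operation";
    case kDiscussionRequest: return "discussion request";
    default:                 return "unknown";
    }
}
```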

In addition to containing a structure with byte counts, each log entry contains a carriage return/line feed (\r\n) to make the entry more readable, the GUID of the site in which the request was made, the time stamp of the request according to the local time zone of the server, and null-terminated strings containing values that respectively correspond to each field in the structure. Windows SharePoint Services inserts an ampersand (&) between the top-level site URL and subsite URL when processing the log files. This serves to mark the log file as already processed, and thus prevents counting data twice if the usage processing job is accidentally run again on the same day. For example, if someone changes the processing time from 01:00 (1 A.M.) to 23:00 (11 P.M.) in the middle of the day, the previous day's logs will not be counted twice.

How to Parse the Usage Event Logs

The usage event log files can be read directly, but they can also be consumed by an automated tool that parses the files and reports information about site usage.

You can create such a tool in C++ or C# that reprocesses the usage logs and runs on the same server that generates the logs. The tool can deliver output in various formats, such as in a database for further querying, or as emitted data in a .csv file.
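
If the tool emits .csv output, field values that contain commas or quotation marks (which can occur in URLs) need quoting so that columns stay aligned. The following minimal sketch applies the common convention of wrapping such fields in double quotation marks and doubling any embedded quotation marks. The QuoteCsvField name is hypothetical, and the sample later in this article does not perform this step.

```cpp
#include <string>

// Returns the field unchanged when it needs no quoting; otherwise
// wraps it in double quotation marks and doubles embedded ones.
std::string QuoteCsvField(const std::string &field)
{
    if (field.find_first_of(",\"\r\n") == std::string::npos)
        return field;

    std::string quoted = "\"";
    for (std::string::size_type i = 0; i < field.size(); ++i)
    {
        if (field[i] == '"')
            quoted += '"';      // double the embedded quotation mark
        quoted += field[i];
    }
    quoted += '"';
    return quoted;
}
```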

The tool must read a structure in the following form that precedes each log entry.

typedef struct _VLogFileEntry
{
    struct _VLogFileEntry *pPrev;
    unsigned char bitFlags;
    unsigned short cbEntry;
    unsigned short cbSiteUrl;
    unsigned short cbWeb;
    unsigned short cbDoc;
    unsigned long cBytes;
    unsigned short httpStatus;
    unsigned short cbUser;
    unsigned short cbQS;
    unsigned short cbRef;
    signed short cbUAS;
    signed long reserved;
} VLogFileEntry;
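
The declaration above depends on the compiler's pointer size and structure alignment. The following minimal sketch restates it with fixed-width types, assuming that the logs were written by a 32-bit process (so that pPrev and the long fields each occupy four bytes), that default alignment applies, and that the reading computer uses the same byte order as the writer. Under those assumptions the header occupies 36 bytes. The names WssLogHeader and ReadHeader are hypothetical, and the layout should be verified against real log files before it is relied on.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-width restatement of VLogFileEntry.
// Assumption: written by a 32-bit process with default alignment.
struct WssLogHeader
{
    std::uint32_t pPrev;      // pointer value from the writer; unused by a reader
    std::uint8_t  bitFlags;
    std::uint16_t cbEntry;
    std::uint16_t cbSiteUrl;
    std::uint16_t cbWeb;
    std::uint16_t cbDoc;
    std::uint32_t cBytes;
    std::uint16_t httpStatus;
    std::uint16_t cbUser;
    std::uint16_t cbQS;
    std::uint16_t cbRef;
    std::int16_t  cbUAS;
    std::int32_t  reserved;
};

// Fail the build if this compiler pads the structure differently.
static_assert(sizeof(WssLogHeader) == 36, "unexpected header size");
static_assert(offsetof(WssLogHeader, cBytes) == 16, "unexpected padding");

// Copies a header out of a byte buffer. Returns false if fewer than
// sizeof(WssLogHeader) bytes remain at the given offset.
bool ReadHeader(const unsigned char *data, std::size_t size,
                std::size_t offset, WssLogHeader *out)
{
    if (offset > size || size - offset < sizeof(WssLogHeader))
        return false;
    std::memcpy(out, data + offset, sizeof(WssLogHeader));
    return true;
}
```

Copying with memcpy rather than casting the buffer pointer avoids alignment problems when an entry does not start on a four-byte boundary.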

After the file has been mapped to a memory address, code such as the following can then be used to traverse each entry in a log and return site usage information.

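unsigned long cbEntrySize = 0;

for(pCur = pBase; 
    pCur < pEnd; 
    pCur += cbEntrySize)
{
    pLFE = (VLogFileEntry *)pCur;

    // Skip 2 bytes for \r\n after the binary structure
    pszSiteGuid = pCur + sizeof(VLogFileEntry) + 2;
    // Skip 1 byte for the null separator after each string
    pszTS   = pszSiteGuid + cbSiteGuid + 1;
    pszSite = pszTS + cbTimeStamp + 1;
    pszWeb  = pszSite + pLFE->cbSiteUrl + 1;
    pszDoc  = pszWeb + pLFE->cbWeb + 1;
    pszUser = pszDoc + pLFE->cbDoc + 1;

    // cbEntrySize must be set here before the loop advances
    // (see the size calculation later in this section).
}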

After casting the current entry as a structure, the example proceeds to gather the site GUID, the time stamp, the URL of the top-level site, the relative URL of the subsite, the file name of the page that was visited, and the name of the user. The example takes into account the two bytes used for the carriage return/line feed that appears between the binary structure and site GUID in each entry, as well as the single byte used in null separators between the different parts of the entry.
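
The pointer arithmetic above can also be expressed as offsets from the start of an entry, which makes the layout easier to check in isolation. The following is a minimal sketch. The FieldOffsets structure and ComputeOffsets function are hypothetical names, and the 36-character GUID and 8-character time stamp lengths are taken from the constants used in the sample later in this article.

```cpp
#include <cstddef>

// Fixed string lengths, as used by the sample in this article.
const std::size_t kCbSiteGuid  = 36;  // GUID text without braces
const std::size_t kCbTimeStamp = 8;

// Hypothetical holder for the offset of each string, measured from
// the first byte of the entry.
struct FieldOffsets
{
    std::size_t siteGuid;
    std::size_t timeStamp;
    std::size_t siteUrl;
    std::size_t web;
    std::size_t doc;
    std::size_t user;
};

// cbHeader is the size of the binary structure; the other arguments
// are the byte counts read from that structure.
FieldOffsets ComputeOffsets(std::size_t cbHeader, std::size_t cbSiteUrl,
                            std::size_t cbWeb, std::size_t cbDoc)
{
    FieldOffsets o;
    o.siteGuid  = cbHeader + 2;                   // skip \r\n
    o.timeStamp = o.siteGuid + kCbSiteGuid + 1;   // skip null separator
    o.siteUrl   = o.timeStamp + kCbTimeStamp + 1;
    o.web       = o.siteUrl + cbSiteUrl + 1;
    o.doc       = o.web + cbWeb + 1;
    o.user      = o.doc + cbDoc + 1;
    return o;
}
```

For example, with an illustrative header size of 36 bytes and byte counts of 10, 5, and 7 for the site URL, subsite, and page, the user name begins at offset 109.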

The preceding for loop should also include error handling for cases of corrupt log data. The following code gathers the total size of an entry based on specific parts of the structure, and then checks for cases where the size exceeds a specified value, where an entry merely contains a carriage return or line feed, or where for any reason an entry does not equal the total size that has been determined:

const unsigned long maxCbEntrySize = 2048;

cbEntrySize = sizeof(VLogFileEntry)
    + cbSiteGuid + cbTimeStamp + 2
    + pLFE->cbSiteUrl + pLFE->cbWeb + pLFE->cbDoc
    + pLFE->cbUser + pLFE->cbQS + pLFE->cbRef
    + pLFE->cbUAS + 9;   // 7 NULLs and 2 bytes for \r\n

// Check for corrupt log files
fError = (cbEntrySize > maxCbEntrySize ||
    !(*(pCur + sizeof(VLogFileEntry)) == '\r') ||
    !(*(pCur + sizeof(VLogFileEntry) + 1) == '\n') ||
    !(pLFE->cbEntry == cbEntrySize));

if (fError)
{
    printf("Error reading Wss log file, aborting.\n");
    goto cleanup;
}
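
The check above does not confirm that the computed entry size fits within the bytes remaining in the file, which a parser may also want to verify before advancing. The following minimal sketch shows one way to combine that bounds check with the checks above. The EntryCounts structure and the ExpectedEntrySize and IsEntryPlausible functions are hypothetical names, and the constants mirror those in the sample.

```cpp
#include <cstddef>

// Hypothetical copy of the byte counts held in the binary structure.
struct EntryCounts
{
    std::size_t cbEntry, cbSiteUrl, cbWeb, cbDoc;
    std::size_t cbUser, cbQS, cbRef, cbUAS;
};

// Total size: header, \r\n, nine strings, and nine null separators.
std::size_t ExpectedEntrySize(std::size_t cbHeader, const EntryCounts &c)
{
    const std::size_t cbSiteGuid = 36, cbTimeStamp = 8;
    return cbHeader + cbSiteGuid + cbTimeStamp + 2  // 2 separators
         + c.cbSiteUrl + c.cbWeb + c.cbDoc
         + c.cbUser + c.cbQS + c.cbRef + c.cbUAS
         + 9;  // 7 separators and 2 bytes for \r\n
}

// entry points at the first byte of the entry; cbRemaining is the
// number of bytes from that point to the end of the file.
bool IsEntryPlausible(const char *entry, std::size_t cbRemaining,
                      std::size_t cbHeader, const EntryCounts &c)
{
    const std::size_t maxCbEntrySize = 2048;
    if (cbRemaining < cbHeader + 2)
        return false;  // no room for the header and \r\n

    const std::size_t cbExpected = ExpectedEntrySize(cbHeader, c);
    return cbExpected <= maxCbEntrySize
        && cbExpected <= cbRemaining       // entry fits in the file
        && entry[cbHeader] == '\r'
        && entry[cbHeader + 1] == '\n'
        && c.cbEntry == cbExpected;
}
```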

Try Out a Code Sample

The following sample illustrates code that can be used in the Project_Name.cpp file of a C++ application to parse a Windows SharePoint Services log file and emit the usage data as a .csv file.

**Important**   The primary purpose of this sample is to demonstrate basic considerations for how to write code that parses the log files. This sample does not include all the code that would normally be found in a full production system, as a lot of the usual data validation and error handling is removed to focus the sample on what your code must accomplish. Technical support is not available for this sample.

To test the sample, open Microsoft Visual Studio .NET on the server containing the log files and create a Visual C++ console application.

To create a Visual C++ console application

  1. On the File menu in Visual Studio .NET, point to New, and then click Project.

  2. In the New Project dialog box, under Project Types, click Visual C++ Projects. In Templates, click Console Application.

  3. In the Name box, type a name for the project (Project_Name), in the Location box, type the path for where to create the application, and then click OK.

  4. In Solution Explorer, double-click the Project_Name.cpp file that is produced and replace the code that Visual Studio includes by default with the following code:

    #include "stdafx.h"
    #include "windows.h"
    #include "assert.h"
    #include <stdio.h>
    
    typedef struct _VLogFileEntry
    {
        // Point to previous entry
        struct _VLogFileEntry *pPrev;   
        unsigned char  bitFlags;
        // Number of bytes to skip ahead to next entry
        unsigned short cbEntry;
        unsigned short cbSiteUrl;
        unsigned short cbWeb;
        unsigned short cbDoc;
        // Bandwidth consumed (bytes in + bytes out)
        unsigned long cBytes;
        unsigned short httpStatus;
        unsigned short cbUser;
        unsigned short cbQS;
        unsigned short cbRef;
        signed short  cbUAS;
        signed long  reserved;
    } VLogFileEntry;
    
    int main(int argc, char* argv[])
    {
        bool fError = FALSE;
        if (argc < 3)
        {
            printf(
                "\nUsage: %s wsslogfile csvfile optionalField1 "
                "optionalField2\n", argv[0]);
            return(1);
        } 
    
        char *szFile = argv[1];
        char *szCsvFile = argv[2];
        char *szOptionalField1 = argc > 3 ? argv[3] : NULL;
        char *szOptionalField2 = argc > 4 ? argv[4] : NULL;
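    
        /* Format of each CSV line. Include optional fields
        passed as command line arguments, if any. The file is opened
        in text mode, so "\n" is written as \r\n on Windows. */
        const char *szFormat = "%s,%s,%s,%s,%s,%s,%s,%s\n";
        // Skip one leading "%s," for each optional field not supplied
        if (NULL == szOptionalField1)
            szFormat += 3;
        if (NULL == szOptionalField2)
            szFormat += 3;
    
        FILE *csvFile = fopen(szCsvFile, "a");
        if (NULL == csvFile)
        {
            printf("Can't open output file %s", szCsvFile);
            return (1);
        }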
    
        // Bytes (with no braces)
        static const unsigned long cbSiteGuid  = 36; 
        static const unsigned short cbTimeStamp = 8;
    
        printf("\r\nParsing %s to %s \r\n",  szFile, szCsvFile);
        char *pBase, *pEnd;
        HANDLE hF, hFM;
        if ((hF = CreateFileA(
                szFile,
                GENERIC_READ, 
                0, 
                NULL, 
                OPEN_EXISTING, 
                FILE_ATTRIBUTE_NOT_CONTENT_INDEXED,
                NULL)) == INVALID_HANDLE_VALUE)
        {
            printf(
                "Can't open file %s (perhaps because it doesn't exist)",
            szFile);
            return (1);
        }
    
        DWORD dwFileSize, dwFileSizeHigh = 0;
        dwFileSize = GetFileSize(hF, &dwFileSizeHigh);
    
        /* We should never encounter a file larger than about 1 GB */
        if (dwFileSizeHigh || dwFileSize > 1000000000)
        {
            printf(" File too large %s", szFile);
            CloseHandle(hF);
            return (1);
        }
    
        if (dwFileSize == 0)
        {
            printf(" Skipping empty file %s", szFile);
            CloseHandle(hF);
            return (1);
        }
    
        hFM = CreateFileMapping(hF, NULL, PAGE_WRITECOPY, 0, 0, NULL);
        if (NULL == hFM ||
            NULL == (pBase = (char *)MapViewOfFile(hFM, FILE_MAP_COPY, 0, 0, 0)))
        {
            printf(" Can't map file %s", szFile);    
            if (hFM)
                CloseHandle(hFM);
            CloseHandle(hF);
            return (1);
        }
    
        pEnd = pBase + dwFileSize - sizeof(VLogFileEntry);
    
        char *pCur, *pszSite, *pszSiteGuid, *pszTS;
        char *pszWeb, *pszDoc, *pszUser;
        VLogFileEntry *pLFE;
        unsigned long cItemsProcessed = 0;
        unsigned long cbEntrySize = 0;
        const unsigned long maxCbEntrySize = 2048;
    
        for(pCur = pBase;
            pCur < pEnd;
            pCur += cbEntrySize)
        {
            pLFE = (VLogFileEntry *)pCur;
    
            cbEntrySize = sizeof(VLogFileEntry) \
                + cbSiteGuid + cbTimeStamp + 2 \
                + pLFE->cbSiteUrl +  pLFE->cbWeb +  pLFE->cbDoc \
                + pLFE->cbUser +  pLFE->cbQS  +  pLFE->cbRef \
                + pLFE->cbUAS  + 9;   //7 NULLs and 2 bytes for \r\n
    
            // Check for corrupt log files
            fError = (cbEntrySize > maxCbEntrySize ||
                !(*(pCur + sizeof(VLogFileEntry)) == '\r') ||
                !(*(pCur + sizeof(VLogFileEntry) + 1) == '\n') ||
                !(pLFE->cbEntry == cbEntrySize));
    
            if (fError)
            {
                printf("Error reading Wss log file, aborting.\n");
                goto cleanup;
            }
    
            // Skip  2 bytes for \r\n
            pszSiteGuid = pCur + sizeof(VLogFileEntry) + 2;
            // Skip 1 byte for the NULL separator
            pszTS = pszSiteGuid + cbSiteGuid + 1;
            pszSite = pszTS + cbTimeStamp + 1;
            // Stop at the end of the site url 
            *(pszSite + pLFE->cbSiteUrl) = '\0';
            // Skip 1 byte for the NULL separator
            pszWeb = pszSite + pLFE->cbSiteUrl + 1;
            pszDoc  = pszWeb + pLFE->cbWeb + 1;
            pszUser = pszDoc + pLFE->cbDoc + 1;
    
            /* Output is in the format: timestamp, site guid, siteUrl, 
             subsite, document, user, optional1, optional2*/
            fprintf(csvFile, szFormat,
                        pszTS,
                        pszSiteGuid,
                        pszSite,
                        pszWeb,
                        pszDoc,
                        pszUser,
                        szOptionalField1,
                        szOptionalField2);
        }
    
        cleanup:
        UnmapViewOfFile(pBase);
        CloseHandle(hFM);
        CloseHandle(hF);
        fclose(csvFile);
        return fError;
    }
    
  5. On the Build menu, click Build Solution.

  6. At a command prompt, navigate to the folder containing the new .exe file of the project.

  7. At the prompt, type Project_Name.exe followed by a space, the complete path to a log file to parse followed by a space, and the path of the .csv file to create. The following example creates a .csv file in the root directory that contains usage information for April 1, 2004:

    WssLogParser.exe C:\WINDOWS\system32\LogFiles\STS\33AEF972-56BA-4294-98C7-0ACCF64585B8\2004-04-01\00.log c:\2004_04_01.csv
    

You can optionally pass two other parameters to include two additional fields in the .csv file, for example, the log date or the GUID of the virtual server, which both serve as part of the path to the log file. These extra parameters would be useful to keep the virtual server or day of the usage data clear in a scenario where the tool is used on many log files spanning multiple virtual servers or days, and the output is directed to a single .csv file.
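
Because the virtual server GUID and the log date both appear in the path, a wrapper could derive the two optional fields from the path itself rather than requiring them to be typed. The following is a minimal sketch of that idea. The SplitLogPath function and LogPathParts structure are hypothetical, and the sketch assumes the default layout described earlier (an STS folder, then a GUID folder, then a date folder, then the log file) with backslash separators.

```cpp
#include <string>

// Hypothetical result type for the two folder names of interest.
struct LogPathParts
{
    std::string virtualServerGuid;
    std::string logDate;
};

// Extracts the last two folder names from a path such as
// ...\STS\<GUID>\<date>\00.log. Returns false if the path has fewer
// than two folders before the file name.
bool SplitLogPath(const std::string &path, LogPathParts *parts)
{
    const std::string::size_type fileSep = path.rfind('\\');
    if (fileSep == std::string::npos || fileSep == 0)
        return false;

    const std::string::size_type dateSep = path.rfind('\\', fileSep - 1);
    if (dateSep == std::string::npos || dateSep == 0)
        return false;

    const std::string::size_type guidSep = path.rfind('\\', dateSep - 1);
    if (guidSep == std::string::npos)
        return false;

    parts->logDate = path.substr(dateSep + 1, fileSep - dateSep - 1);
    parts->virtualServerGuid = path.substr(guidSep + 1, dateSep - guidSep - 1);
    return true;
}
```

The two strings returned could then be passed to the sample as its third and fourth command-line arguments.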

Conclusion

The logs that Windows SharePoint Services generates provide the most convenient access to usage event logging for a site. The preceding example illustrates how to parse these logs and generate output in a specified format. You could create a tool that, instead of outputting data in .csv format, exports information to a database for further processing within the context of a larger operation.