Creating a 404 Error Tracker with Visual Basic 6.0

 

Aaron Bertrand
BlueStreak.com, Inc.

October 19, 1999

Introduction

In this article, I will walk you through the steps of building a Visual Basic® 6.0 ActiveX® dynamic-link library (DLL) that will make it easy and efficient for you to track and prevent 404 (Not Found) errors on your site. Sure, some stats packages will do parts of this work for you, but most are not flexible in their "out-of-the-box" feature set. Now you will have a chance to expand the techniques in this article to have your 404 tracking and trapping do just about anything you want it to do. Or, you could use the same techniques to build your own ActiveX DLL that has nothing to do with 404 errors.

You might have already asked yourself why I chose a DLL instead of simple Active Server Pages (ASP) code, or even an Include file. A compelling argument against using a DLL is the debugging process. Because Internet Information Server (IIS) holds DLLs in memory, IIS must be restarted whenever you change any DLL attached to the server, unless your application is set to run in a separate memory space from the other applications on your Web server. This slows the debugging process somewhat, particularly in the fine-tuning stages.

With those problems aside, I chose a DLL mainly for performance reasons. Compiled code is known to be more efficient than interpreted code, particularly with advanced tasks such as the one this DLL proposes. Also, aside from the obvious performance gains, by storing your business logic in a compiled format:

  • Your code cannot be easily "borrowed."
  • Your application cannot be easily compromised by someone who decides to modify your code.
  • Microsoft SQL Server passwords and other sensitive data are better shielded from prying eyes.

While these benefits are always important, they become even more so when your code is going to be distributed, placed on a shared server, or placed on a server completely outside of your control. In my case, I place the integrity of the code at a much higher priority than whether or not it will be stolen.

Prerequisites

Before we start, let's make sure you've got the goods. You will need the following items installed and running:

  • Microsoft® Windows NT® 4.0 (Workstation or Server), Service Pack 4 or later.
  • Windows NT 4.0 Option Pack (IIS/PWS 4.0).
  • The latest Microsoft Data Access Components (MDAC) release (2.1 Service Pack 2 at the time of writing).
  • Visual Basic 6.0, Service Pack 2 or later.
  • SQL Server 6.5 or later (you could use Microsoft Access, but in this article, SQL Server 7.0 and a stored procedure are used).

Note   If you are running Windows 2000, there are no service packs, you cannot update core MDAC files, and you will have to add IIS (if it isn't already installed).

To add IIS:

  1. In Control Panel, click the Add/Remove Programs icon.
  2. Select Windows Components, then click the Add/Remove button. This will open the Windows Components Wizard dialog box. (Do not attempt to install the Option Pack or MDAC updates on Windows 2000.)
  3. Select the Internet Information Services check box.
  4. Adjust Details as necessary.

Additionally, you will need to run this on a computer where you have appropriate permissions to do the following things:

  • Create a custom 404 page for your Web site(s).
  • Register a custom DLL on your server.
  • Restart the Web server (during development). I recommend using a workstation or noncritical Web server for development.

Quick Overview

Here are the steps involved in creating this system:

  1. Install prerequisites and verify task permissions.
  2. Enable the custom 404 error page through Internet Services Manager.
  3. Create and test the database and schema.
  4. Create a new ActiveX DLL project and add appropriate references.
  5. Write the code and compile the DLL.
  6. Register the DLL on the development computer.
  7. Edit the 404 error ASP page to instantiate the object.
  8. Before deploying to the production computer, test, test, test!

Preparing IIS

Create a file called 404.asp and place it in your root folder (or the folder of the application). This will serve as a replacement for the bland "404 Not Found" error page, and will be the calling point of the component we're creating. Feel free to customize it with graphics, a friendly "sorry" message, and of course, links to the rest of your site.

Note   If you use graphics or links on this page, remember to use absolute references or virtual references from the root. This file will "run" where it's called, so if you use <img src="images/logo.gif"> where logo.gif is in /images/logo.gif, then a 404 hit in the /test/ folder will result in one of those lovely broken images (since 404.asp is "running" in /test/, it is looking for /test/images/logo.gif instead of /images/logo.gif).

  1. Open Internet Services Manager.

If you want the file to apply to your entire site:

  • Right-click the machine name in IIS.
  • Click Properties.
  • Select Master Properties for the WWW Service, then click Edit.

If you want the file to apply to a specific application:

  • Right-click the name of the application in IIS.
  • Click Properties.

  2. Click the Custom Errors tab.
  3. Scroll down and select 404.
  4. Click the Edit Properties button and change the Message Type to URL.
  5. Enter /404.asp or /<application>/404.asp as the URL.

Figure 1. Custom 404 error dialog

The beauty of this solution is that when you use an ASP page to process 404 errors, the query string includes the page that was being requested (and the headers often provide information for the page that contained the bad link). You can do many things with this information. On the front end, the most important thing would be to conditionally offer appropriate alternatives based on the requested page (the ultimate goal is to build a learning application that adapts this for you). On the back end for each request, you would need to notify the site with the bad link; maintain a database of frequently requested 404 pages (and their preferred alternate pages, if any); and replace those nonexistent pages with a redirect page (optional). In this article, we will simply be logging the 404 entries (future articles will cover these other areas).
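
As a rough illustration of what arrives in 404.asp: the component built later in this article expects the query string to take the form 404;http://yourserver/page.asp. A minimal sketch that pulls the requested URL and the referring page out of the request might look like the following. The variable names and the wording of the messages are illustrative only and are not part of the component.

```vbscript
<%
  ' Sketch only: variable names and messages are illustrative.
  Dim strQS, strRequested, strReferrer

  strQS = Request.ServerVariables("QUERY_STRING")
  strReferrer = Request.ServerVariables("HTTP_REFERER")

  ' A genuine 404 hit starts with "404;" followed by the requested URL.
  If Len(strQS) > 4 And Left(strQS, 4) = "404;" Then
    strRequested = Mid(strQS, 5)
    Response.Write "Sorry, we could not find " & _
      Server.HTMLEncode(strRequested) & ".<br>"

    ' The referrer header is optional, so only mention it when present.
    If Len(strReferrer) > 0 Then
      Response.Write "You followed a link from " & _
        Server.HTMLEncode(strReferrer) & ".<br>"
    End If
  Else
    ' No "404;" prefix: the page was probably requested directly.
    Response.Write "Nothing to report."
  End If
%>
```

Both values come from the client, which is why the sketch passes them through Server.HTMLEncode before writing them to the page.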

Preparing SQL Server

The last thing you have to do before creating the DLL is set up a database to store the data. (I chose to use SQL Server because of its flexibility and scalability, though you can certainly use any ODBC-compliant database—as long as you are aware of possible limitations.) The database in this article is called "track404" and the table is called "stats." Stats will initially consist of six fields:

  • id (INT IDENTITY), which represents an index field.
  • ip (VARCHAR 15), which represents the user's IP address.
  • url (VARCHAR 255), which represents the URL they were trying to access.
  • ref (VARCHAR 255), which represents the URL they came from (e.g., a bad link).
  • dt (DATETIME), which represents the date and time of the failed request.
  • ua (VARCHAR 255), which represents the browser they were using.

Here is a screen shot of the table from within Enterprise Manager:

Figure 2. New table in Enterprise Manager
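
For readers who prefer a script to the Enterprise Manager designer, the same table can be sketched as a CREATE TABLE statement run through ADO from a Windows Script Host file. This is a sketch under assumptions: the nullability of each column is not stated above and is assumed here, the script name is made up, and the login shown is a placeholder that must be replaced with one that is allowed to create tables in track404.

```vbscript
' create_stats.vbs - run with: cscript //nologo create_stats.vbs
' Sketch only: the login below is a placeholder, and the NULL / NOT NULL
' choices are assumptions (the article does not specify them).
Option Explicit
Dim conn, sql

sql = "CREATE TABLE stats (" & _
      " id  INT IDENTITY(1,1) NOT NULL," & _
      " ip  VARCHAR(15)  NULL," & _
      " url VARCHAR(255) NULL," & _
      " ref VARCHAR(255) NULL," & _
      " dt  DATETIME     NULL," & _
      " ua  VARCHAR(255) NULL" & _
      ")"

Set conn = CreateObject("ADODB.Connection")
conn.Open "driver={SQL Server};server=1.1.1.1;database=track404;" & _
          "uid=youradmin;pwd=yourpassword"
conn.Execute sql
conn.Close
Set conn = Nothing

WScript.Echo "stats table created."
```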

Next you need to add a user to this database. To do this:

  1. Under Security for this SQL Server Registration, select Logins and add a Login named "stats" (with any password you choose).
  2. Give the user permission on your new database.
  3. Once that user is added, right-click your new table and click Properties.
  4. Click the Permissions button in the top right corner, and give the user "stats" INSERT permissions.

Instead of relying on simple insert techniques, you can compile the typical insertion code into a stored procedure. It is a simple stored procedure that accepts five string parameters as inputs and inserts them into a table. This will be faster than a general insert, and while the difference is negligible now, the added efficiency of stored procedures will be significant as your tracking system gets more elaborate. Here is the code from the stored procedure:

CREATE PROCEDURE STrack404
   @ip varchar(15),
   @url varchar(255),
   @ref varchar(255),
   @dt char(20),
   @ua varchar(255)

  AS

   BEGIN
    INSERT INTO stats (ip,url,ref,dt,ua)
      VALUES (@ip,@url,@ref,@dt,@ua)
   END

To create the stored procedure and give execute permissions to the stats user:

  1. Under Stored Procedures in this database, right-click and select "New Stored Procedure."
  2. Name it STrack404, and insert the above code. Click OK.
  3. Right-click the new stored procedure, then select Properties.
  4. Apply EXEC permissions for the "stats" user.

Finally, we need to create a connection string, which will be used in the component later. If you're not using integrated security, you must add the user ID (UID) and password (PWD) into the connection string, as shown in the following code example:

"driver={SQL Server};server=1.1.1.1;database=track404;uid=stats;pwd=xyz"

Test your connection string in a simple ASP page before proceeding, and make any necessary corrections before moving on. There's no greater frustration than having to recompile a DLL simply to straighten out a botched database connection string. Here is an example of what your test ASP page could look like:

<%
  set conn = createobject("adodb.connection")
  on error resume next
  conn.open "driver={SQL Server};server=1.1.1.1;database=track404;uid=stats;pwd=xyz"
  if err then
   response.write("There was an error:<br>" & err.description)
  else
   response.write("No problem.")
  end if
  set conn = nothing   
%>

Starting the DLL

Open Visual Basic and start a new ActiveX DLL project. The first thing you want to do when creating a project is right-click the Project Name and change it, as well as change the initial class name to something you'll remember. Naming a project "Project5" does very little to set it apart from "Project3" or "Project4." In this case, they are named "Bertrand" and "track404," respectively. (Incidentally, unless other project changes are made, this will cause the ProgID of the compiled DLL to be "Bertrand.track404," and that is what is used in the Server.CreateObject() call from ASP. Also, the compiled DLL will be called "Bertrand.dll.") You are more than welcome to use any name for your project and DLL. Just make sure it's unique so that you won't be crosswiring anything in the registry.

Next, select Project, then References, and add the following references:

  • Microsoft Active Server Pages Object Library.
  • Microsoft ActiveX Data Objects 2.1 Library.
  • Microsoft Transaction Server Type Library.

These references allow you to use the intrinsic ASP objects (Application, Request, Response, Server, and Session), ADO objects (such as ADODB.Connection), and the ObjectContext object through which the component reaches the ASP objects.

Note   In Windows 2000, as you can see in the following screenshot, the Microsoft Transaction Server Type Library reference has been replaced by "COM+ Services Type Library."

Figure 3. Adding references to your project

Into the Code

The first item we're going to need is the Request object. In this initial build, we have no need for Session and Application variables, nor for the Response and Server objects, so there's no point in introducing them into your code. The Request object will allow us to retrieve the user-specific information we'll want to track. We'll also set up our Connection object and Logic variables, as shown below.

Option Explicit

  ' Create the connection object:
  Public objConnection As ADODB.Connection

  ' Create the servervariable, SQL and datetime strings:
  Public strReferrer As String
  Public strQueryString As String
  Public strServerName As String
  Public strIPAddress As String
  Public strBrowser As String
  Public strSQL As String
  Public strNow As String

  ' Create the array to strip the server name out of the URL:
  Public vntServerName As Variant

  ' Create the boolean which will indicate whether to log or not:
  Public boolValid As Boolean

  ' Create the objects for getting at the ASP Request object
  Public objContext As ObjectContext
  Public objRequest As ASPTypeLibrary.Request

Below is a very simple function that grabs any server variable and escapes the always troublesome apostrophe by doubling it, so this code does not have to be repeated each time it is required. I called this function getSV, and it is as follows:

  Public Function getSV(ByVal str As String) As String
     getSV = Replace(objRequest.ServerVariables(str), "'", "''")
  End Function

The above code retrieves information such as the user's IP address and the URL they were referred from (if that information is available). Now, create a method called log(); this is where all the action takes place. Once the Request object is obtained, we call getSV() to get the user-specific variables. We set boolValid = False so that the default is NOT to log this hit into the database unless the 404 page was accessed legitimately. For example, if someone just types in the URL of your 404 page, the hit shouldn't be logged.

Next you need to test that the query string is longer than four characters and that the first four characters are "404;". If both conditions are true (and the request is not for favicon.ico, as described below), you set the boolValid flag to True, strip the "404;" prefix, and parse the URL for everything after the server name (that's the reason for vntServerName). For example, if the user attempts to get to the URL http://yourserver/page.asp, then the only relevant information is /page.asp; there is no sense filling up the database with http://yourserver/ prefixes (unless, of course, you're using a common 404 page for several distinct domain names in a multiple host header situation, in which case you can simply remove the innermost If/End If block, the one that strips the server name). As mentioned earlier, there are safeguards in place that do not log the entry if there is no query string. Additionally, we don't log the entry if the file requested was favicon.ico (which happens when a user adds your site to their favorites in Internet Explorer 4.0 or 5.0). This will be addressed more in future articles.

  Public Function log() As Long

    ' get the Request object, and parse out the ServerVariables items:
    Set objContext = GetObjectContext()
    Set objRequest = objContext("Request")
    strQueryString = LCase(getSV("query_string"))
    strServerName = LCase(getSV("server_name"))
    strReferrer = LCase(getSV("http_referer"))
    strIPAddress = getSV("remote_addr")
    strBrowser = getSV("http_user_agent")
    strNow = FormatDateTime(Now(), vbShortDate) & " "
    strNow = strNow & FormatDateTime(Now(), vbShortTime)

    ' initialize boolValid as false until it passes
    boolValid = False

    ' make sure the querystring is longer than 4 characters, otherwise
    ' they entered the URL directly and should not be logged
    If Len(strQueryString) > 4 Then

      ' Do not log if the querystring does not begin with "404;"
      If Left(strQueryString, 4) = "404;" Then

        ' Do not log if it's a search for favicon.ico, which is what
        '  happens when someone adds your site to their favorites:
        If InStr(strQueryString, "favicon.ico") <= 0 Then
          strQueryString = Right(strQueryString, Len(strQueryString) - 4)
          boolValid = True

          ' Remove the server name from the URL in the QueryString
          If InStr(strQueryString, strServerName) > 0 Then
            vntServerName = Split(strQueryString, strServerName)
            strQueryString = vntServerName(1)
          End If
        End If
      End If
    End If

    ' Clean up ObjectContext
    Set objRequest = Nothing
    Set objContext = Nothing

    ' Log the 404 hit, if valid:
    If boolValid Then
      strSQL = "exec STrack404"
      strSQL = strSQL & "  @ip  ='" & strIPAddress & "'"
      strSQL = strSQL & ", @url ='" & strQueryString & "'"
      strSQL = strSQL & ", @ref ='" & strReferrer & "'"
      strSQL = strSQL & ", @dt  ='" & strNow & "'"
      strSQL = strSQL & ", @ua  ='" & strBrowser & "'"
      Set objConnection = CreateObject("adodb.connection")
      objConnection.open "driver={SQL Server};server=1.1.1.1;database=track404;uid=stats;pwd=xyz"
      objConnection.Execute (strSQL)
      objConnection.Close
      Set objConnection = Nothing
    End If
  End Function
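
The string concatenation above depends on every value having been passed through getSV() to double its apostrophes. One alternative, sketched here rather than taken from the original component, is to call the stored procedure through an ADODB.Command with explicit parameters so that ADO handles the quoting. LogHit is a hypothetical helper name; the values passed to it are assumed to be the raw server variables (not apostrophe-doubled), and they are trimmed to the sizes declared in STrack404 so that they fit the parameters.

```vb
' Sketch only: LogHit is a hypothetical helper, not part of the original class.
' Requires the Microsoft ActiveX Data Objects 2.1 Library reference.
Private Sub LogHit(ByVal strIP As String, ByVal strURL As String, _
                   ByVal strRef As String, ByVal strWhen As String, _
                   ByVal strUA As String)
  Dim cn As ADODB.Connection
  Dim cmd As ADODB.Command

  Set cn = New ADODB.Connection
  cn.Open "driver={SQL Server};server=1.1.1.1;database=track404;uid=stats;pwd=xyz"

  Set cmd = New ADODB.Command
  Set cmd.ActiveConnection = cn
  cmd.CommandText = "STrack404"
  cmd.CommandType = adCmdStoredProc

  ' Parameters are appended in the same order as they are declared in
  ' the stored procedure: @ip, @url, @ref, @dt, @ua.
  With cmd.Parameters
    .Append cmd.CreateParameter("@ip", adVarChar, adParamInput, 15, Left$(strIP, 15))
    .Append cmd.CreateParameter("@url", adVarChar, adParamInput, 255, Left$(strURL, 255))
    .Append cmd.CreateParameter("@ref", adVarChar, adParamInput, 255, Left$(strRef, 255))
    .Append cmd.CreateParameter("@dt", adChar, adParamInput, 20, Left$(strWhen, 20))
    .Append cmd.CreateParameter("@ua", adVarChar, adParamInput, 255, Left$(strUA, 255))
  End With

  ' The procedure returns no rows, so skip building a Recordset.
  cmd.Execute , , adExecuteNoRecords

  cn.Close
  Set cmd = Nothing
  Set cn = Nothing
End Sub
```

If you try this approach, read the server variables directly instead of through getSV(); otherwise the doubled apostrophes would be stored in the table as they are.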

So in summary, here's the entire listing of track404.cls (comments removed):

  Option Explicit

  Public objConnection As ADODB.Connection

  Public strReferrer As String
  Public strQueryString As String
  Public strServerName As String
  Public strIPAddress As String
  Public strBrowser As String
  Public strSQL As String
  Public strNow As String

  Public vntServerName As Variant

  Public boolValid As Boolean

  Public objContext As ObjectContext
  Public objRequest As ASPTypeLibrary.Request

  Public Function getSV(ByVal str As String) As String
    getSV = Replace(objRequest.ServerVariables(str), "'", "''")
  End Function

  Public Function log() As Long
    Set objContext = GetObjectContext()
    Set objRequest = objContext("Request")
    strQueryString = LCase(getSV("query_string"))
    strServerName = LCase(getSV("server_name"))
    strReferrer = LCase(getSV("http_referer"))
    strIPAddress = getSV("remote_addr")
    strBrowser = getSV("http_user_agent")
    strNow = FormatDateTime(Now(), vbShortDate) & " "
    strNow = strNow & FormatDateTime(Now(), vbShortTime)
    boolValid = False
    If Len(strQueryString) > 4 Then
      If Left(strQueryString, 4) = "404;" Then
        If InStr(strQueryString, "favicon.ico") <= 0 Then
          strQueryString = Right(strQueryString, Len(strQueryString) - 4)
          boolValid = True
          If InStr(strQueryString, strServerName) > 0 Then
            vntServerName = Split(strQueryString, strServerName)
            strQueryString = vntServerName(1)
          End If
        End If
      End If
    End If
    Set objRequest = Nothing
    Set objContext = Nothing
    If boolValid Then
      strSQL = "exec STrack404"
      strSQL = strSQL & "  @ip  ='" & strIPAddress & "'"
      strSQL = strSQL & ", @url ='" & strQueryString & "'"
      strSQL = strSQL & ", @ref ='" & strReferrer & "'"
      strSQL = strSQL & ", @dt  ='" & strNow & "'"
      strSQL = strSQL & ", @ua  ='" & strBrowser & "'"
      Set objConnection = CreateObject("adodb.connection")
      objConnection.open "driver={SQL Server};server=1.1.1.1;database=track404;uid=stats;pwd=xyz"
      objConnection.Execute (strSQL)
      objConnection.Close
      Set objConnection = Nothing
    End If
  End Function

That's it for the code. Select File, then Make Bertrand.dll, and you're almost set! Now, in case this doesn't compile for you right away, code for a batch file is also included that will allow you to quickly restart IIS and other dependent services. (As previously noted, once a DLL is attached to IIS, it's very difficult to "detach" it.) A shortcut to this batch file is sitting on my taskbar, which should tell you how often I use it. It is called Restart.bat and contains the following code:

  net stop iisadmin /y
  net start w3svc
  net start msftpsvc
  net start smtpsvc

The last two commands are only necessary if you run a File Transfer Protocol (FTP) service and Simple Mail Transfer Protocol (SMTP) service (our workstations here, for the most part, only require the first two lines). Once the Web server has been stopped, you can recompile your DLL without getting a Permission Denied error.
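
Because every change to the DLL means another restart, it can help to try out the parsing rules in a stand-alone script before compiling. The following Windows Script Host sketch mirrors the checks in log(). Parse404 is a hypothetical function written for this illustration; it adds one guard the component does not have (it skips the split when the server name is empty) and returns an empty string for hits that would not be logged.

```vbscript
' parse404.vbs - run with: cscript //nologo parse404.vbs
' Sketch only: Parse404 is a hypothetical stand-alone copy of the rules in log().
Option Explicit

' Returns the path to log, or "" when the hit should be ignored.
Function Parse404(ByVal strQueryString, ByVal strServerName)
  Dim parts
  Parse404 = ""
  strQueryString = LCase(strQueryString)
  strServerName = LCase(strServerName)

  ' Same three tests as the component: length, "404;" prefix, favicon.ico.
  If Len(strQueryString) <= 4 Then Exit Function
  If Left(strQueryString, 4) <> "404;" Then Exit Function
  If InStr(strQueryString, "favicon.ico") > 0 Then Exit Function

  ' Drop the "404;" prefix, then everything up to and including the server name.
  strQueryString = Mid(strQueryString, 5)
  If Len(strServerName) > 0 And InStr(strQueryString, strServerName) > 0 Then
    parts = Split(strQueryString, strServerName)
    strQueryString = parts(1)
  End If

  Parse404 = strQueryString
End Function

WScript.Echo Parse404("404;http://yourserver/page.asp", "yourserver")    ' /page.asp
WScript.Echo Parse404("404;http://yourserver/favicon.ico", "yourserver") ' (empty: ignored)
WScript.Echo Parse404("", "yourserver")                                  ' (empty: ignored)
```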

The final two steps are registering your new DLL and writing the ASP code to call it. Click the Start button, click Run, and type the following (replace <path> with the absolute path where you saved your DLL):

  regsvr32 <path>\bertrand.dll

Here's the ASP code to insert into your custom 404 page:

  <%
    set track = server.createobject("bertrand.track404")
    track.log()
    set track = nothing
  %>
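
If you would rather the visitor still see your friendly error page when the component is missing or the database is unreachable, one option is to wrap the call in VBScript error handling. This is only a sketch of that idea, not part of the original page; it silently skips logging when anything goes wrong.

```vbscript
<%
  ' Sketch only: tolerate a missing or failing component so that the
  ' rest of the 404 page still renders.
  Dim track
  On Error Resume Next
  Set track = Server.CreateObject("bertrand.track404")
  If Err.Number = 0 Then
    track.log
  End If
  Err.Clear
  Set track = Nothing
  On Error GoTo 0
%>
```

During development you will probably want the plain version above instead, so that errors are visible rather than swallowed.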

Now that may not be everything. Once you port this DLL to the production server, you may have problems if the correct Visual Basic run-time files are not installed. You can get around this by using the Package and Deployment Wizard, by installing Visual Basic 6.0, or (recommended) by installing the Visual Basic 6.0 run-time library on the server (the download is described in the article "VBRun60sp3.exe Installs Visual Basic 6.0 Run-Time Files").

Conclusion

Well, happy coding. Within no time you should have a stats table full of 404 information, just waiting to be extracted. Here is a sample ASP 404 page, as well as a small sampling of test data retrieved from the SQL Server database instantly:

Figure 4. Sample 404 page

Stay tuned for the next article, where I will enhance this component to provide a viewing and maintenance interface for the stats, and reveal methodology behind making this application smarter. In the meantime, you could expand this technique to other kinds of errors.

Additional Information

For more information on creating ActiveX DLLs with Visual Basic 6.0 and IIS4+, and creating custom 404 error messages, visit the following sites:

About the Author

Aaron Bertrand (aaron@desktop.on.ca) is an ASP/SQL developer for BlueStreak.com, Inc., and runs an ASP Tips site at http://www.aspfaq.com/.