Lightning Strings

VBA Tech

Fast, Undocumented String-handling Techniques

By Steven Roman, Ph.D.

I recently finished writing a book entitled Win32 API Programming with Visual Basic (O'Reilly & Assoc., 1999). The most frequently asked question in connection with this book is "Why would I, as a VB/VBA programmer, want to use the Win32 API?"

There are many ways to answer this question. The following are a few:

  • Using the Win32 API, a VB programmer can manipulate the user interface more completely than with VB alone. For instance, the Win32 API makes it relatively easy to add tab stops or horizontal scroll bars to a list box, or use a bitmap as a menu item.
  • The Win32 API allows the VB/VBA programmer to get more information about the state of the system - information such as the version of the operating system, a list of installed printers and fonts, or the number of buttons on the mouse. It can also be used to get a list of all open windows or all running applications.
  • The Win32 API can be used to dig deeply into the operating system. For instance, it can be used to subclass a control to change its behavior, to hook the operating system in order to watch for keystrokes or mouse actions and possibly alter their behavior, or to extract data from controls in foreign processes. You can even force one application to run code written from another application.

The book delves into all these aspects of the Win32 API and more. In this article, however, I want to show you a very simple, but very important, use of the Win32 API: sorting VB strings.

FIGURE 1 shows the results of a simple program that sorts two string arrays: an array consisting of 100 short strings, each of length 10,000 characters, and an array consisting of 100 long strings, each of length 100,000 characters.


FIGURE 1: A sorting program.

The application uses two methods for sorting. Both methods (slow and quick) use the same simple sorting algorithm, which puts the smallest string in the first position, then puts the second smallest string in the second position, and so on. The pseudocode is:

For i = 1 To NumStrings
    For j = i + 1 To NumStrings
        If strings(i) > strings(j) Then
            swap strings(i) and strings(j)
        End If
    Next
Next

This sorting algorithm isn't very efficient, but the algorithm isn't the important issue here. What matters is the method used to swap two strings when required. Indeed, even more efficient algorithms, such as the venerable quicksort method, require swapping.
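The pseudocode can be written as a runnable routine. The following is a minimal sketch, not the program behind FIGURE 1: the name SlowSort is hypothetical, the swap uses plain VB assignments, and Option Compare Binary is stated explicitly as an assumption so that the > comparison behaves the same in every host.

```vba
Option Explicit
Option Compare Binary

' Hypothetical helper: sorts a String array in place with the
' simple exchange sort described in the pseudocode.
Public Sub SlowSort(strings() As String)
    Dim i As Long, j As Long
    Dim temp As String

    For i = LBound(strings) To UBound(strings) - 1
        For j = i + 1 To UBound(strings)
            If strings(i) > strings(j) Then
                ' Swap using ordinary VB string assignments.
                temp = strings(i)
                strings(i) = strings(j)
                strings(j) = temp
            End If
        Next j
    Next i
End Sub
```

Calling SlowSort on an array holding "pear", "apple", "fig" should leave it in the order "apple", "fig", "pear".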

The swapping is done two ways. The slow way uses VB assignments:

' Swap strings s and t.
temp = s
s = t
t = temp

and the quick way using the Win32 API function CopyMemory:

CopyMemory lng, ByVal VarPtr(s), 4
CopyMemory ByVal VarPtr(s), ByVal VarPtr(t), 4
CopyMemory ByVal VarPtr(t), lng, 4

As you can see from FIGURE 1, for the long string array, the quick method (using CopyMemory) is 500 times faster than the slow method, even on a rather small 100-item array. Moreover, the time it takes the quick method does not depend upon the length of the strings. Wow!

To understand how the quick method works, we need to take a look at the internal nature of VB strings. Before doing that, however, let's take a look at the Win32 API function CopyMemory.

CopyMemory - A VB Hacker's Dream

The purpose of CopyMemory is simply to copy a block of memory byte-by-byte from one memory address to another. This opens up a whole new set of possibilities for VB programmers, because VB doesn't have this sort of capability, except in the rather restricted form of LSet. Even then, the documentation recommends against using LSet for this purpose.

The simplest VB declaration for CopyMemory is:

Declare Sub CopyMemory Lib "kernel32" _
    Alias "RtlMoveMemory" (lpDest As Any, _
    lpSource As Any, ByVal cbCopy As Long)

In this case, lpDest is the address of the first byte of the destination memory, lpSource is the address of the first byte of the source memory, and cbCopy is the number of bytes to copy.

This VB declaration is a bit dangerous, because the As Any form tells VB to skip any type checking, and an invalid type can lead to the dreaded General Protection Fault. Thus, great care must be taken when using this declaration. (Be sure to save all programs before running code containing this function.) We can (and will) override the default ByRef setting by including ByVal in the call to this function, as in:

CopyMemory lng, ByVal AnAddress, 4
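As a minimal sketch of both calling styles, the following copies a Long two ways: once passing both variables by reference, and once passing the source address by value. The function names are hypothetical. The #If VBA7 branch is an assumption added for 64-bit-capable Office hosts, which require PtrSafe and LongPtr; it is not part of the original declaration, which targets 32-bit VB. VarPtr, used here to obtain an address, is discussed later in this article.

```vba
Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Sub CopyMemory Lib "kernel32" _
        Alias "RtlMoveMemory" (lpDest As Any, _
        lpSource As Any, ByVal cbCopy As LongPtr)
#Else
    Private Declare Sub CopyMemory Lib "kernel32" _
        Alias "RtlMoveMemory" (lpDest As Any, _
        lpSource As Any, ByVal cbCopy As Long)
#End If

' Both arguments ByRef (the default): VB passes the addresses
' of dst and src, so the four bytes of src are copied into dst.
Public Function CopyByRef(ByVal n As Long) As Long
    Dim src As Long, dst As Long
    src = n
    CopyMemory dst, src, 4
    CopyByRef = dst
End Function

' Source passed ByVal: the number itself is used as the address.
Public Function CopyFromAddress(ByVal n As Long) As Long
    Dim src As Long, dst As Long
    src = n
    CopyMemory dst, ByVal VarPtr(src), 4
    CopyFromAddress = dst
End Function
```

Both functions should return the value they were given. Omitting ByVal in the second call would copy the first four bytes of the temporary holding the address, not the bytes stored at that address.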

VB Strings

Let's now turn to a discussion of VB strings. We'll also discuss the very useful, but undocumented, VB functions VarPtr and StrPtr. I devote a 40-page chapter in my book to VB strings. Here's a very abbreviated version.

The VB string data type, BSTR, is shown in FIGURE 2.


FIGURE 2: The BSTR data type.

The string in this figure corresponds to the following VB code:

Dim str As String
      str = "help" 

There are several important things to note about the BSTR data type:

  • A BSTR is actually a pointer variable. It has a size of 32 bits, like all Win32 pointers, and points to the first byte in a Unicode character array. Thus, a Unicode character array and a BSTR are not the same thing. This can cause great confusion, because the term string sometimes refers to the BSTR and sometimes to the character array. To be absolutely clear, we'll use the term VB string to refer to the BSTR, not the character array.
  • The Unicode character array that is pointed to by a BSTR must be preceded by a 4-byte length field and terminated by a single, null, 2-byte character (ANSI = 0).
  • There may be additional null (2-byte) characters anywhere within the Unicode character array, so we cannot rely on a null character to signal the end of the character array. This is why the length field is vital.
  • The length field contains the number of bytes (not the number of characters) in the character array, excluding the terminating null bytes. Because the array is Unicode, the character count is one-half the byte count.
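The length field can be inspected directly. The sketch below reads the four bytes that immediately precede the character array and returns them as a Long, which should agree with VB's own LenB. The function name BstrByteLength is hypothetical, StrPtr is described later in this article, and the #If VBA7 branch is an added assumption for 64-bit-capable hosts.

```vba
Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Sub CopyMemory Lib "kernel32" _
        Alias "RtlMoveMemory" (lpDest As Any, _
        lpSource As Any, ByVal cbCopy As LongPtr)
#Else
    Private Declare Sub CopyMemory Lib "kernel32" _
        Alias "RtlMoveMemory" (lpDest As Any, _
        lpSource As Any, ByVal cbCopy As Long)
#End If

' Hypothetical helper: returns the byte count stored in the
' BSTR length field of s.
Public Function BstrByteLength(s As String) As Long
    Dim cb As Long

    ' An unassigned string is a null pointer: there is no array to read.
    If StrPtr(s) = 0 Then Exit Function

    ' The 4-byte length field sits just before the first character.
    CopyMemory cb, ByVal StrPtr(s) - 4, 4
    BstrByteLength = cb
End Function
```

For "help" the function should return 8 (four 2-byte characters), and for a three-character string with an embedded null character it should return 6, which illustrates why the length field, not a null terminator, marks the end of the array.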

Let's emphasize that code such as:

Dim str As String
      str = "help" 

means that str is the name of a BSTR, not a Unicode character array. In other words, str is the name of the variable that holds the address xxxx, as shown in FIGURE 2. (Of course, the variable str has its own address, denoted by aaaa in FIGURE 2.)

Here is a brief experiment we can do to test the fact that a VB string is a pointer to a character array and not a character array. Consider the following code, which defines a structure whose members are strings:

Private Type utTest
    astring As String
    bstring As String
End Type

Dim uTest As utTest
Dim s As String
s = "testing"
uTest.astring = "testing"
uTest.bstring = "testing"
Debug.Print Len(s)
Debug.Print Len(uTest)

The output from this code is:

7
8

In the case of the string variable s, the Len function reports the length of the character array. In this case, there are seven characters in the character array "testing". In the case of the structure variable uTest, however, the Len function actually reports the length of the structure (in bytes). The return value 8 clearly indicates that each of the two BSTRs has length 4. This is because a BSTR is a pointer.

VarPtr and StrPtr

The functions VarPtr and StrPtr aren't documented by Microsoft, but they can be very useful in manipulating BSTRs.

If var is any variable, then:

VarPtr(var) 

is the address of that variable, returned as a Long. If str is a BSTR variable, then:

StrPtr(str) 

is the contents of the BSTR, which, as we've seen, is the address of the Unicode character array pointed to by the BSTR.

Let's verify these statements using the string in FIGURE 2. Note that the variable str has address aaaa, and the character array begins at address xxxx, which is the contents of the pointer variable str.

To see that:

VarPtr(str) = aaaa
StrPtr(str) = xxxx

run the code in FIGURE 3.

Dim lng As Long, i As Integer, s As String
Dim b(1 To 10) As Byte
Dim sp As Long, vp As Long

s = "help"
sp = StrPtr(s)
Debug.Print "StrPtr:" & sp
vp = VarPtr(s)
Debug.Print "VarPtr:" & vp

' Verify that sp = xxxx and vp = aaaa
' by moving the long at address vp
' to the variable lng and then comparing it to sp.
CopyMemory lng, ByVal vp, 4
Debug.Print lng = sp

' To see that sp contains address of char array,
' copy from that address to a byte array and print
' the byte array. We should get "help" in Unicode.
CopyMemory b(1), ByVal sp, 10
For i = 1 To 10
    Debug.Print b(i);
Next

FIGURE 3: Verify that sp = xxxx and vp = aaaa.

A sample of the output is:

StrPtr:1836612
VarPtr:1243988
True
 104  0  101  0  108  0  112  0  0  0

Swapping Strings Using CopyMemory

Now we have the necessary background to understand our string sorting application. As mentioned earlier, the only difference between the slow and quick sorting methods lies in how they handle string swapping. The slow method uses the obvious approach:

' Swap strings s and t.
temp = s
s = t
t = temp

Unfortunately, for each string assignment:

str1 = str2

VB must make a copy of the entire Unicode array pointed to by the BSTR str2 and assign a BSTR str1 that points to the copied array. It's clear that this process is very time consuming, and depends on the length of the strings.
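One way to observe this copy is to compare StrPtr values after an assignment. The following is a minimal sketch with a hypothetical function name; it uses only StrPtr and makes no API calls.

```vba
Option Explicit

' Returns True when assigning s to t produced a second character
' array, that is, when the two BSTRs hold equal text at
' different addresses.
Public Function AssignmentMakesCopy() As Boolean
    Dim s As String, t As String

    s = "testing"
    t = s

    AssignmentMakesCopy = (t = s) And (StrPtr(t) <> StrPtr(s))
End Function
```

The function should return True: the two variables compare equal as strings, yet each BSTR points to its own character array.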

On the other hand, the quick method uses the swapping code:

CopyMemory lng, ByVal VarPtr(s), 4
CopyMemory ByVal VarPtr(s), ByVal VarPtr(t), 4
CopyMemory ByVal VarPtr(t), lng, 4

This code simply swaps the contents of the BSTRs s and t. That is, it swaps the addresses of the corresponding Unicode arrays. In this way, we only need to swap 4-byte addresses, no matter how long the Unicode arrays may be.

In the first line of code, the long variable lng will receive the address of the first Unicode array. Because this address is stored in the BSTR s, we pass the address of s by value.

Actually, you might think that the code:

CopyMemory lng, s, 4

would also work, but it doesn't. In brief, the reason is that when VB sees that a string is being passed to an API function, it makes a copy of the array in ANSI format (rather than Unicode) and passes the ANSI version to the function. (For a more detailed discussion of this issue, please see my book.)

The remaining two lines of code complete the swapping.
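Collected into a reusable routine, the three calls might look like the sketch below. SwapStrings is a hypothetical name. The pointer-sized temporary and the #If VBA7 branches are assumptions added so that the same code copies 4 bytes on 32-bit hosts and 8 bytes on 64-bit Office; the article's own code assumes 4-byte pointers throughout.

```vba
Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Sub CopyMemory Lib "kernel32" _
        Alias "RtlMoveMemory" (lpDest As Any, _
        lpSource As Any, ByVal cbCopy As LongPtr)
#Else
    Private Declare Sub CopyMemory Lib "kernel32" _
        Alias "RtlMoveMemory" (lpDest As Any, _
        lpSource As Any, ByVal cbCopy As Long)
#End If

' Hypothetical helper: swaps the two BSTR pointers.
' The character arrays themselves are never copied.
Public Sub SwapStrings(s As String, t As String)
#If VBA7 Then
    Dim saved As LongPtr
#Else
    Dim saved As Long
#End If
    Dim cb As Long

    cb = LenB(saved)    ' Size of a pointer on this host.

    ' Save the address held in s, overwrite s with the
    ' address held in t, then store the saved address in t.
    CopyMemory saved, ByVal VarPtr(s), cb
    CopyMemory ByVal VarPtr(s), ByVal VarPtr(t), cb
    CopyMemory ByVal VarPtr(t), saved, cb
End Sub
```

In the sorting loop, the body of the If statement would then become a single call such as SwapStrings strings(i), strings(j). Because both parameters are passed by reference, VarPtr yields the addresses of the caller's own BSTR variables, which is what makes the swap visible to the caller.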

Conclusion

It is usually said that the Win32 API can be useful to VB/VBA programmers who want to delve more deeply into the Windows operating system and do things that cannot be done with VB alone. Here is an example, however, of a case where VB can do the job, but the Win32 API can do it much, much better.

Dr Steven Roman is an Emeritus Professor at the California State University, Fullerton. He has written 35 books, including Access Database Design & Programming [1999], Writing Word Macros [1999], Writing Excel Macros [1999], Developing Visual Basic Add-Ins [1999], and Win32 API Programming with Visual Basic [1999], all published by O'Reilly & Associates, and Concepts of Object-Oriented Programming with Visual Basic [1997] and Understanding Personal Computer Hardware [1998], both published by Springer-Verlag. He has written a special object library browser, called Object Model Browser, that displays a structured view of object libraries, rather than the usual flat view. For more information about Dr Roman and his books, articles, and software, please visit his Web site at http://www.romanpress.com/.