.NET Matters

BigInteger, GetFiles, and More

Stephen Toub

Code download available at: NETMatters0512.exe (123 KB)

Q I want to do some work with large numbers, larger than the sizes allowed by UInt64. Does the Microsoft® .NET Framework have any support for this?

A The base class libraries (BCL) don't expose publicly any big number libraries, although some of the cryptography-related classes do make use of an internal big number implementation. However, all is not lost. While you can search the Web to find a plethora of implementations in C#, C++, and a variety of other languages, it might not be necessary. If you don't mind taking a dependency on the J# libraries, you already have a big number implementation at your disposal. In fact, you have two.

The J# run-time library, vjslib.dll, is available as a redistributable component, just like the .NET Framework. You can download it from Visual J# Downloads (it's also installed as a prerequisite by Visual Studio®). In the same manner that a C# or C++ application can make use of Microsoft.VisualBasic.dll (the Visual Basic run-time library), C#, Visual Basic®, and C++ applications can use the J# run-time library and the numerous interesting classes it exposes.

Some folks use the J# Zip libraries to meet their compression requirements (see Zip Your Data: Using the Zip Classes in the J# Class Libraries to Compress Files and Data with C#), but beyond that there are some very interesting gems hidden in the library. For your needs, I suggest you see the java.math.BigInteger class (a BigDecimal class is also available). Here's an example of using BigInteger, showing that it can be used with values larger than the largest UInt64:

    // Requires a reference to vjslib.dll and "using java.math;".
    BigInteger i = new BigInteger(ulong.MaxValue.ToString());
    BigInteger iSquared = i.multiply(i);
    Console.WriteLine("i:\t" + i);
    Console.WriteLine("i^2:\t" + iSquared);

This outputs:

    i:      18446744073709551615
    i^2:    340282366920938463426481119284349108225

BigInteger exposes a large number of operations. These include the standard arithmetic operations, such as addition, subtraction, multiplication, division, and modulus, but it also exposes functionality such as finding the greatest common divisor (GCD) of two BigIntegers, primality testing, bit set testing, and conversion to other data types. All in all, it's a very useful class.

Of course, an answer on this subject wouldn't be complete without a strict warning. Most of the time I hear people asking for such primitives, it's because they want to roll their own cryptographic operations. Don't do it. You'll quite likely be opening yourself up to a world of hurt. Instead, use the classes in the System.Security.Cryptography namespace. If they don't meet your needs (which they should most of the time), consider using the unmanaged Crypto API. If that doesn't meet your needs, look into implementations from trusted vendors. Don't roll your own.

While you're considering exploring vjslib.dll, look around a bit. There are a lot of interesting classes available there that might help you in your projects.

Q I have an attribute that I use on classes to help with my development process, but I don't want this information exposed to the public. As such, I've been wrapping all applications of the attribute to classes with compilation directives such that they're only applied in debug builds:

    #if DEBUG
    [MySecretAttribute("private data")]
    #endif
    public class MyClass { ... }

Is this the best way?

A In the .NET Framework 1.x, yes; you're not going to find anything much better. However, the .NET Framework 2.0 adds support for applying ConditionalAttribute to attribute classes. In other words, you can tag your custom attribute class with Conditional("DEBUG"), and any applications of your attribute will only be compiled into builds in which the DEBUG symbol is defined. To demonstrate this, I've compiled the code in Figure 1 into a debug build as well as into a release build. Figure 2 shows the results of running ILDASM on each of the resulting executables. The top screenshot is the debug build and, as you can see, the attribute was compiled into the build. The bottom screenshot is the release build and, as desired, it lacks the attribute. This is a very welcome change to the compiler.

Figure 1 Using Conditional Compilation with Attributes

    using System;
    using System.Diagnostics;

    [MySecret("private data")]
    class Program
    {
        static void Main() { }
    }

    [AttributeUsage(AttributeTargets.Class)]
    [Conditional("DEBUG")]
    class MySecretAttribute : Attribute
    {
        public MySecretAttribute(string value) { _value = value; }
        private string _value;
        public string Value { get { return _value; } }
    }

Figure 2 ILDASM of Debug and Release Builds
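As an alternative to inspecting the assembly with ILDASM, the same effect can be observed at run time with reflection. The following is a minimal sketch rather than part of the original sample: TaggedClass and AttributeProbe are hypothetical names introduced for illustration, and the result depends on whether DEBUG was defined when the code was compiled.

```csharp
using System;
using System.Diagnostics;

// Same shape as the attribute in Figure 1: applications of it are only
// emitted where the DEBUG symbol is defined.
[AttributeUsage(AttributeTargets.Class)]
[Conditional("DEBUG")]
public sealed class MySecretAttribute : Attribute
{
    private readonly string _value;
    public MySecretAttribute(string value) { _value = value; }
    public string Value { get { return _value; } }
}

// Hypothetical class used only to carry the attribute.
[MySecret("private data")]
public class TaggedClass { }

// Hypothetical helper, not part of the original article.
public static class AttributeProbe
{
    // Returns true only if the compiler actually emitted [MySecret] on the type.
    public static bool IsTagged(Type type)
    {
        return Attribute.IsDefined(type, typeof(MySecretAttribute), false);
    }
}
```

Calling AttributeProbe.IsTagged(typeof(TaggedClass)) should return true in a build compiled with DEBUG defined and false otherwise, which mirrors what the two ILDASM screenshots show.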


Q I'm working on a utility and, as part of that utility, I need to display the contents of a file in a nice fashion, ideally in hexadecimal. What are my options?

A That's a very generic request, but what you want to do is certainly possible. First, if you just want to get the contents of the file as a string of hexadecimal numbers, and if you don't care much about the formatting, you can simply use the System.BitConverter class:

    byte[] data = ...;
    string result = BitConverter.ToString(data);

If you need more control, you can format the string however you like by manually iterating over the data:

    byte[] data = ...;
    StringBuilder text = new StringBuilder(data.Length * 2);
    foreach (byte b in data) text.AppendFormat("{0:X2}", b);
    string result = text.ToString();
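To make the difference between the two approaches concrete: BitConverter.ToString separates the bytes with hyphens (the bytes 0x4D, 0x5A come back as "4D-5A"), whereas a manual loop can produce any layout. The sketch below shows one possible layout, a classic hex dump with an offset column and a printable-ASCII column. HexFormatter and ToHexDump are names invented for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original article.
public static class HexFormatter
{
    // Formats data as rows of "offset  hex bytes  printable ASCII".
    public static string ToHexDump(byte[] data, int bytesPerLine)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (bytesPerLine <= 0) throw new ArgumentOutOfRangeException("bytesPerLine");

        StringBuilder text = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += bytesPerLine)
        {
            int count = Math.Min(bytesPerLine, data.Length - offset);
            text.AppendFormat("{0:X8}  ", offset);

            // Hex column, padded so the ASCII column lines up on a short last row.
            for (int i = 0; i < bytesPerLine; i++)
            {
                if (i < count) text.AppendFormat("{0:X2} ", data[offset + i]);
                else text.Append("   ");
            }

            text.Append(' ');

            // ASCII column: printable characters as-is, '.' for everything else.
            for (int i = 0; i < count; i++)
            {
                byte b = data[offset + i];
                text.Append(b >= 32 && b < 127 ? (char)b : '.');
            }
            text.Append('\n');
        }
        return text.ToString();
    }
}
```

A call such as HexFormatter.ToHexDump(File.ReadAllBytes(path), 16) would give the familiar 16-bytes-per-row view; the row width is a parameter only so the layout is easy to adjust.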

However, I'll take this opportunity to discuss two things that are slightly more complex. The first is a class that has gone largely unnoticed in the .NET Framework but which I think deserves more recognition, and the second is the new debugger visualizations feature available in Visual Studio 2005, one of the most useful additions to the new version.

Did you know that System.Design.dll contains a Windows® Forms control ready-made for viewing the contents of files? System.ComponentModel.Design.ByteViewer has been available in the .NET Framework since 1.x, but I've rarely seen anyone use it. The control has a SetBytes method that accepts a byte array to be rendered. It also has a SetFile method that accepts the full path to a file as a string and loads the contents of the file (of course, this is just as easy to do in the .NET Framework 2.0 using the new File.ReadAllBytes method).

To show off its usefulness, I've created a debugger visualizer for visualizing sequences of bytes. Debugger visualizers are a feature of the Visual Studio debugger interface, displaying an object in a meaningful way that is appropriate to the object's data type. For example, Visual Studio comes with visualizers that allow you to view managed strings as HTML, XML, or plaintext. Writing visualizers entails writing a piece of code to execute in the context of the debugger, and possibly a piece of code to run within the context of the application. For my purposes, I only need to create the debugger-side code.

This visualizer can be used for MemoryStream and FileInfo objects (byte arrays would be a natural addition, but unfortunately Visual Studio 2005 doesn't support debugger visualizers for array types). For a MemoryStream, the visualizer displays the contained bytes in the ByteViewer by passing the retrieved byte array to the viewer's SetBytes method. For a FileInfo, it passes the FileInfo's FullName property to the SetFile method.

Figure 3 shows my implementation. To implement a simple visualizer like this one, you need to create a class that derives from DialogDebuggerVisualizer, a class contained in Microsoft.VisualStudio.DebuggerVisualizers.dll, which is installed with Visual Studio 2005. The derived class overrides the Show method, which accepts two parameters.

Figure 3 ByteArrayVisualizer

    using System;
    using System.IO;
    using System.Windows.Forms;
    using System.ComponentModel.Design;
    using Microsoft.VisualStudio.DebuggerVisualizers;

    [assembly: System.Diagnostics.DebuggerVisualizer(
        typeof(ByteArrayVisualizer), typeof(VisualizerObjectSource),
        Target = typeof(FileInfo),
        Description = ByteArrayVisualizer.Description)]
    [assembly: System.Diagnostics.DebuggerVisualizer(
        typeof(ByteArrayVisualizer), typeof(VisualizerObjectSource),
        Target = typeof(MemoryStream),
        Description = ByteArrayVisualizer.Description)]

    public class ByteArrayVisualizer : DialogDebuggerVisualizer
    {
        public const string Description = "Byte Array Visualizer";

        protected override void Show(
            IDialogVisualizerService windowService,
            IVisualizerObjectProvider objectProvider)
        {
            object data = objectProvider.GetObject();
            using (Form f = new Form())
            using (ByteViewer viewer = new ByteViewer())
            {
                // Load the bytes from whichever supported type was passed in.
                if (data is FileInfo)
                    viewer.SetFile(((FileInfo)data).FullName);
                else if (data is MemoryStream)
                    viewer.SetBytes(((MemoryStream)data).ToArray());
                else return;

                viewer.SetDisplayMode(DisplayMode.Hexdump);
                viewer.Dock = DockStyle.Fill;
                f.FormBorderStyle = FormBorderStyle.SizableToolWindow;
                f.Text = "Byte Array Viewer";
                f.ClientSize = viewer.Size;
                f.Controls.Add(viewer);
                windowService.ShowDialog(f);
            }
        }
    }

Figure 4 Selecting a Visualizer in Visual Studio


The first parameter is of type IDialogVisualizerService; its ShowDialog method can be used to display a CommonDialog or a Control in the debugger. The second parameter is of type IVisualizerObjectProvider, and can be used to retrieve the data for the object to be visualized. Once the Show method has been implemented to display the necessary information, the only code left to write is the assembly attributes that inform Visual Studio how and where this visualizer can be used. As my visualizer can be used with two different types, I have two DebuggerVisualizerAttribute instances at the assembly level, informing the debugger that ByteArrayVisualizer should be usable for the two corresponding types. After compiling the code, copy the compiled assembly to your My Documents\Visual Studio 2005\Visualizers directory, and it'll be available the next time you run Visual Studio 2005 (see Figure 4 and Figure 5).

Figure 5 Byte Array Visualizer in Action


Q I'd like to use DirectoryInfo.GetFiles to enumerate all of the files in a hierarchy of folders, but I have a few concerns. First, GetFiles only appears to enumerate the files in the specified folder itself and not those within subfolders. Second, I only need to do a little work for each of the files, but in some of my directories I have thousands of files, and it seems like a waste to load them all into an array before processing any. Suggestions?

A The .NET Framework 2.0 already addresses your first concern. A new GetFiles overload has been added to both Directory and DirectoryInfo that accepts a value from the new SearchOption enumeration. SearchOption exposes two values: TopDirectoryOnly and AllDirectories. The 1.x behavior you're used to is equivalent to specifying SearchOption.TopDirectoryOnly; specifying AllDirectories causes GetFiles to traverse into subdirectories as well. So, for example, to find all of the files on your C drive, you could write code like:

    // Directory.GetFiles returns paths; use DirectoryInfo.GetFiles for FileInfo objects.
    string[] files = Directory.GetFiles(
        "C:\\", "*.*", SearchOption.AllDirectories);

Of course, as you point out, this will end up finding all files on your disk and creating an array out of all of them before it returns any to you. Moreover, if you don't have access to some of those subdirectories, the whole operation could result in an exception—that one forbidden directory preventing you from retrieving a list of any of them.
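The all-or-nothing behavior described above can be seen in a small wrapper. This is only a sketch under the assumption that the failure surfaces as an UnauthorizedAccessException, which is what Directory.GetFiles throws when the caller lacks permission; FileListing and TryGetAllFiles are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original article.
public static class FileListing
{
    // Returns every file under root, or null if any directory in the tree
    // could not be read. Directory.GetFiles gives no partial results: one
    // inaccessible subdirectory discards everything found so far.
    public static string[] TryGetAllFiles(string root)
    {
        try
        {
            return Directory.GetFiles(root, "*.*", SearchOption.AllDirectories);
        }
        catch (UnauthorizedAccessException)
        {
            return null;
        }
    }
}
```

The caller learns that something was off-limits, but not which files were reachable, which is the limitation the rest of this answer works around.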

I'll show you how to overcome some of these obstacles, but before continuing, make sure these really are problems for you. Redeveloping Directory.GetFiles might be a fun learning project, but the code is actually quite difficult to get right, given the myriad of security issues you need to be concerned about.

Directory.GetFiles relies on the Win32® functions FindFirstFile, FindNextFile, and FindClose, all of which are exposed from kernel32.dll. FindFirstFile searches a directory for the first file whose name matches the specified pattern. It returns a handle that allows for further files to be enumerated by the FindNextFile function, and that handle can be closed using FindClose. My implementation also uses these functions. In order to ensure that the handle returned from FindFirstFile is properly released, I wrap it in a SafeHandle (for more information on safe handles, see my article on reliability in the October 2005 issue of MSDN® Magazine, available at High Availability: Keep Your Code Running with the Reliability Features of the .NET Framework). My SafeFindHandle is shown in Figure 6, and is used in the FindFirstFile and FindNextFile signatures as follows:

    [DllImport("kernel32.dll", CharSet=CharSet.Auto, SetLastError=true)]
    static extern SafeFindHandle FindFirstFile(string lpFileName,
        [In, Out, MarshalAs(UnmanagedType.LPStruct)]
        WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet=CharSet.Auto, SetLastError=true)]
    static extern bool FindNextFile(SafeFindHandle hndFindFile,
        [In, Out, MarshalAs(UnmanagedType.LPStruct)]
        WIN32_FIND_DATA lpFindFileData);

Figure 6 SafeHandle for Use with FindClose

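    using System;
    using System.Runtime.ConstrainedExecution;
    using System.Runtime.InteropServices;
    using System.Security.Permissions;
    using Microsoft.Win32.SafeHandles;

    sealed class SafeFindHandle : SafeHandleZeroOrMinusOneIsInvalid
    {
        [SecurityPermission(SecurityAction.LinkDemand, UnmanagedCode=true)]
        private SafeFindHandle() : base(true) { }

        // Called by the runtime exactly once to release the find handle.
        protected override bool ReleaseHandle()
        {
            return FindClose(this.handle);
        }

        [DllImport("kernel32.dll")]
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
        private static extern bool FindClose(IntPtr handle);
    }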

In order to allow intermediate results to be returned, I've implemented my GetFiles as a C# iterator, as shown in Figure 7. This allows me to yield results as I come across them, with the C# compiler implementing the necessary state machine rather than forcing me to do it manually. To begin, a stack of DirectoryInfo objects is created, and the initial directory is pushed onto it. In this fashion, I maintain an explicit stack of directories to be examined, including subdirectories, rather than using recursion, which can add an extra expense if the file hierarchy is particularly deep (for more information, see Recursive iterator performance, part 3). While there are still directories to be examined, I pull the top one off the stack and enumerate all of the files it contains.

Figure 7 Iterator Using FindFirstFile and FindNextFile

    public static IEnumerable<FileInfo> GetFiles(
        DirectoryInfo dir, string pattern, SearchOption searchOption)
    {
        if (dir == null) throw new ArgumentNullException("dir");
        if (pattern == null) throw new ArgumentNullException("pattern");

        WIN32_FIND_DATA findData = new WIN32_FIND_DATA();
        Stack<DirectoryInfo> directories = new Stack<DirectoryInfo>();
        directories.Push(dir);

        ErrorModes origErrorMode = SetErrorMode(ErrorModes.FailCriticalErrors);
        try
        {
            while (directories.Count > 0)
            {
                dir = directories.Pop();
                string dirPath = dir.FullName.Trim();
                if (dirPath.Length == 0) continue;

                // Make sure the path ends with a separator before appending the pattern.
                char lastChar = dirPath[dirPath.Length - 1];
                if (lastChar != Path.DirectorySeparatorChar &&
                    lastChar != Path.AltDirectorySeparatorChar)
                {
                    dirPath += Path.DirectorySeparatorChar;
                }

                SafeFindHandle handle = FindFirstFile(dirPath + pattern, findData);
                if (handle.IsInvalid)
                {
                    Errors error = (Errors)Marshal.GetLastWin32Error();
                    // An inaccessible directory is skipped entirely.
                    if (error == Errors.AccessDenied) continue;
                    // FileNotFound means nothing matched here; fall through so that
                    // subdirectories are still examined. Anything else is fatal.
                    if (error != Errors.FileNotFound)
                        throw new Win32Exception((int)error);
                }
                else
                {
                    try
                    {
                        do
                        {
                            if ((findData.dwFileAttributes &
                                FileAttributes.Directory) == 0)
                            {
                                yield return new FileInfo(
                                    dirPath + findData.cFileName);
                            }
                        }
                        while (FindNextFile(handle, findData));

                        // FindNextFile returned false; make sure it was for the expected reason.
                        Errors error = (Errors)Marshal.GetLastWin32Error();
                        if (error != Errors.NoMoreFiles)
                            throw new Win32Exception((int)error);
                    }
                    finally { handle.Dispose(); }
                }

                if (searchOption == SearchOption.AllDirectories)
                {
                    foreach (DirectoryInfo childDir in dir.GetDirectories())
                    {
                        // Skip reparse points to avoid cycles in the hierarchy.
                        if ((File.GetAttributes(childDir.FullName) &
                            FileAttributes.ReparsePoint) == 0)
                        {
                            directories.Push(childDir);
                        }
                    }
                }
            }
        }
        finally { SetErrorMode(origErrorMode); }
    }

First, I ensure that the directory name ends with a path separator character (typically a '\'). I then pass this path along with the search pattern to the FindFirstFile function, getting back a SafeFindHandle. If the handle is invalid, then I examine the failure that caused this result. If FindFirstFile failed because it was denied access to the directory, or because no file in the directory matched the pattern, I don't treat that as fatal and move on to any other directories I might have come across. If it failed for any other reason, I abort the operation with an exception.

If the handle is valid, then FindFirstFile will have filled the WIN32_FIND_DATA structure I passed to it. After validating that the file found was not in fact a directory, I wrap the path in a FileInfo and yield it to the caller of this iterator. Then, while FindNextFile returns true (meaning that it continues to find files), I continually yield the relevant FileInfo objects.

When FindNextFile eventually returns false, I verify that it stopped because it ran out of files, and not because of some other error. At this point, the SafeFindHandle is no longer needed and can be disposed (if something were to prevent the finally block containing the call to Dispose from running, the SafeFindHandle's critical finalizer would ensure the handle was eventually cleaned up).

Now, if the caller asked for the files in all subdirectories to be enumerated rather than just those in the top-level directory, the DirectoryInfo's GetDirectories method is used to retrieve a list of the child directories (this could have been done explicitly using FindFirstFile and FindNextFile, as was done for files), each of which is added to the stack of directories to be examined. The only ones that aren't added are reparse points. Reparse points are collections of user-defined data and are used to implement NTFS links. These can cause cycles to occur in directory structures, so to avoid stumbling across one and ending up in an infinite operation, I've ignored them completely.
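The traversal logic described above (an explicit stack, skipping unreadable directories, skipping reparse points) can also be sketched in purely managed code. This is not the implementation from Figure 7: it swaps the Win32 find functions for Directory.GetFiles and Directory.GetDirectories on one directory at a time, so results arrive per directory rather than per file. ManagedFileSearcher and EnumerateFiles are hypothetical names, and races such as a directory being deleted mid-walk are deliberately not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original article.
public static class ManagedFileSearcher
{
    public static IEnumerable<string> EnumerateFiles(string root, string pattern)
    {
        // Explicit stack of directories still to visit, instead of recursion.
        Stack<string> directories = new Stack<string>();
        directories.Push(root);

        while (directories.Count > 0)
        {
            string dir = directories.Pop();
            string[] files;
            string[] subdirs;
            try
            {
                // C# does not allow yield inside a try block that has a catch
                // clause, so this directory's entries are collected first.
                files = Directory.GetFiles(dir, pattern);
                subdirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // skip directories that cannot be read
            }

            foreach (string file in files) yield return file;

            foreach (string subdir in subdirs)
            {
                // Skip reparse points so that links cannot create cycles.
                if ((File.GetAttributes(subdir) & FileAttributes.ReparsePoint) == 0)
                    directories.Push(subdir);
            }
        }
    }
}
```

The trade-off is that each directory's file list is still materialized as an array, so a single directory with many thousands of files gains less from this version than from the FindFirstFile-based iterator.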

You'll notice I'm also making use of the SetErrorMode function exported from kernel32.dll. This is to prevent the system from directly notifying the user of the application when a serious I/O related error has occurred, such as when the code attempts to enumerate the files on a drive that is unavailable. For more information, see my .NET Matters column from the January 2005 issue of MSDN Magazine.
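Figure 7 refers to the Errors and ErrorModes enumerations and to SetErrorMode without showing their declarations; those live in the code download. The following is one plausible set of declarations, offered as an assumption rather than a copy of that code. The numeric values are the standard Win32 constants (ERROR_FILE_NOT_FOUND, ERROR_ACCESS_DENIED, ERROR_NO_MORE_FILES, and the SEM_* flags), while the member names beyond those used in Figure 7 and the NativeErrorMode class name are my own.

```csharp
using System;
using System.Runtime.InteropServices;

// Win32 error codes checked by the iterator (values from winerror.h).
public enum Errors
{
    FileNotFound = 2,   // ERROR_FILE_NOT_FOUND
    AccessDenied = 5,   // ERROR_ACCESS_DENIED
    NoMoreFiles = 18    // ERROR_NO_MORE_FILES
}

// Flags accepted and returned by SetErrorMode.
[Flags]
public enum ErrorModes : uint
{
    Default = 0x0000,
    FailCriticalErrors = 0x0001,     // SEM_FAILCRITICALERRORS
    NoGpFaultErrorBox = 0x0002,      // SEM_NOGPFAULTERRORBOX
    NoAlignmentFaultExcept = 0x0004, // SEM_NOALIGNMENTFAULTEXCEPT
    NoOpenFileErrorBox = 0x8000      // SEM_NOOPENFILEERRORBOX
}

// Hypothetical container class for the P/Invoke declaration.
public static class NativeErrorMode
{
    // Sets the process error mode and returns the previous one, which is why
    // Figure 7 can restore the original mode in its finally block.
    [DllImport("kernel32.dll")]
    public static extern ErrorModes SetErrorMode(ErrorModes uMode);
}
```

Because SetErrorMode changes a process-wide setting, restoring the previous value in a finally block, as Figure 7 does, keeps the change scoped to the enumeration.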

With that in place, my implementation is complete. To take advantage of it, I can simply use it in a foreach loop:

    foreach (FileInfo fi in FileSearcher.GetFiles(
        new DirectoryInfo(@"C:\"), "*.*", SearchOption.AllDirectories))
    {
        Console.WriteLine(fi.FullName);
    }

This implementation does have some advantages. First and foremost, it allows you access to each file path as it is discovered, rather than forcing you to wait until all files have been found. Second, it doesn't bail when a directory is encountered that can't be enumerated, although an exception might be the behavior you desire in that situation. Third, you can modify this implementation to account for your particular needs; for example, by using FindFirstFileEx instead of FindFirstFile if you need more control over filtering, or by specifying an additional search pattern that should be used to determine which subdirectories to examine (rather than examining all of them).

However, there are disadvantages to this approach, and I strongly urge you to consider using Directory.GetFiles or DirectoryInfo.GetFiles if they work for your application's needs. First, the classes in System.IO have been optimized and tested for years by millions of applications using the .NET Framework. Second, my implementation doesn't support code access security (CAS) nearly as well as the Framework does. When you use Directory.GetFiles or DirectoryInfo.GetFiles, your application only needs the relevant FileIOPermission for the operation. My implementation requires the UnmanagedCode SecurityPermission, a much more privileged permission that is rarely granted to anything but full-trust applications. I could assert that permission and then demand the appropriate file permissions, but in order to do that I would need to perform a significant amount of validation on the supplied directory and search paths in order to ensure that the caller is not circumventing CAS. The classes in System.IO do this, and they do it very well.

Send your questions and comments to netqa@microsoft.com.

Stephen Toub is the Technical Editor for MSDN Magazine.