How to: Combine LINQ Queries with Regular Expressions

This example shows how to use the Regex class to create a regular expression for more complex matching in text strings. The LINQ query makes it easy to filter on exactly the files that you want to search with the regular expression, and to shape the results.

Example

Class LinqRegExVB

    Shared Sub Main()

        ' Root folder to query, along with all subfolders. 
        ' Modify this path as necessary. 
        Dim startFolder As String = "C:\program files\Microsoft Visual Studio 9.0\" 

        ' Take a snapshot of the file system. 
        Dim fileList As IEnumerable(Of System.IO.FileInfo) = GetFiles(startFolder)

        ' Create a regular expression to find all things "Visual". 
        Dim searchTerm As System.Text.RegularExpressions.Regex = _
            New System.Text.RegularExpressions.Regex("Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)")

        ' Search the contents of each .htm file. 
        ' Remove the first Where clause to search files of every type. 
        ' This query produces a list of files where a match 
        ' was found, and a list of the matches in that file. 
        ' Note: Each item is explicitly converted to Match in the 
        ' inner query. This is required because MatchCollection 
        ' is not a generic IEnumerable collection. 

        Dim queryMatchingFiles = From afile In fileList _
                                Where afile.Extension = ".htm" _
                                Let fileText = System.IO.File.ReadAllText(afile.FullName) _
                                Let fileNameMatches As System.Text.RegularExpressions.MatchCollection = searchTerm.Matches(fileText) _
                                Where fileNameMatches.Count > 0 _
                                Select New With {.Name = afile.FullName, _
                                       .Matches = From match In fileNameMatches _
                                                  Let match2 As System.Text.RegularExpressions.Match = CType(match, System.Text.RegularExpressions.Match) _
                                                 Select match2.Value}

        ' The query executes when the For Each loop below enumerates it.
        Console.WriteLine("The term " & searchTerm.ToString() & " was found in:")

        For Each fileMatches In queryMatchingFiles
            ' Trim the path a bit, then write  
            ' the file name in which a match was found. 
            Dim s = fileMatches.Name.Substring(startFolder.Length - 1)
            Console.WriteLine(s)

            ' For this file, write out all the matching strings 
            For Each match In fileMatches.Matches
                Console.WriteLine("  " + match)
            Next 
        Next 

        ' Keep the console window open in debug mode
        Console.WriteLine("Press any key to exit")
        Console.ReadKey()
    End Sub 

    ' Function to retrieve a list of files. Note that this is a copy 
    ' of the file information. 
    Shared Function GetFiles(ByVal root As String) As IEnumerable(Of System.IO.FileInfo)
        Return From file In My.Computer.FileSystem.GetFiles _
                  (root, FileIO.SearchOption.SearchAllSubDirectories, "*.*") _
               Select New System.IO.FileInfo(file)
    End Function 

End Class

class QueryWithRegEx
{
    public static void Main()
    {
        // Modify this path as necessary. 
        string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";

        // Take a snapshot of the file system.
        IEnumerable<System.IO.FileInfo> fileList = GetFiles(startFolder);

        // Create the regular expression to find all things "Visual".
        System.Text.RegularExpressions.Regex searchTerm = 
            new System.Text.RegularExpressions.Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

        // Search the contents of each .htm file. 
        // Remove the first where clause to search files of every type. 
        // This query produces a list of files where a match 
        // was found, and a list of the matches in that file. 
        // Note: Explicit typing of "match" in the inner from clause.
        // This is required because MatchCollection is not a  
        // generic IEnumerable collection. 
        var queryMatchingFiles =
            from file in fileList
            where file.Extension == ".htm" 
            let fileText = System.IO.File.ReadAllText(file.FullName)
            let matches = searchTerm.Matches(fileText)
            where matches.Count > 0
            select new
            {
                name = file.FullName,
                matches = from System.Text.RegularExpressions.Match match in matches
                          select match.Value
            };

        // The query executes when the foreach loop below enumerates it.
        Console.WriteLine("The term \"{0}\" was found in:", searchTerm.ToString());

        foreach (var v in queryMatchingFiles)
        {
            // Trim the path a bit, then write  
            // the file name in which a match was found. 
            string s = v.name.Substring(startFolder.Length - 1);
            Console.WriteLine(s);

            // For this file, write out all the matching strings 
            foreach (var v2 in v.matches)
            {
                Console.WriteLine("  " + v2);
            }
        }

        // Keep the console window open in debug mode
        Console.WriteLine("Press any key to exit");
        Console.ReadKey();
    }

    // This method assumes that the application has discovery  
    // permissions for all folders under the specified path. 
    static IEnumerable<System.IO.FileInfo> GetFiles(string path)
    {
        if (!System.IO.Directory.Exists(path))
            throw new System.IO.DirectoryNotFoundException("Directory not found: " + path);

        List<System.IO.FileInfo> files = new List<System.IO.FileInfo>();

        string[] fileNames = System.IO.Directory.GetFiles(path, "*.*", System.IO.SearchOption.AllDirectories);
        foreach (string name in fileNames)
        {
            files.Add(new System.IO.FileInfo(name));
        }
        return files;
    }
}

Note that you can also query the MatchCollection object that is returned by a Regex search. This example produces only the value of each match in the results, but you can also use LINQ to filter, sort, and group that collection. Because MatchCollection is a non-generic IEnumerable collection, you have to explicitly state the type of the range variable in the query.
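
As an illustration of grouping and sorting a MatchCollection, the following minimal sketch counts how often each matched term occurs. It reuses the regular expression from the example above; the sample text and the names GroupMatchesSketch, matchCounts, Term, and Count are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class GroupMatchesSketch
{
    static void Main()
    {
        // Hypothetical sample text standing in for the contents of a file.
        string text = "Visual Studio ships Visual Basic and Visual C#. " +
                      "Visual Studio also hosts Visual C++.";

        Regex searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

        // The range variable is typed explicitly as Match, as described
        // above for a non-generic IEnumerable collection.
        var matchCounts =
            from Match match in searchTerm.Matches(text)
            group match by match.Value into g
            orderby g.Count() descending, g.Key
            select new { Term = g.Key, Count = g.Count() };

        // Write each distinct term with its number of occurrences,
        // most frequent first.
        foreach (var item in matchCounts)
        {
            Console.WriteLine("{0}: {1}", item.Term, item.Count);
        }
    }
}
```

The same pattern could be applied to the matches collection inside the file query above, for example to report a count per term for each file instead of listing every match.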

Compiling the Code

  • Create a Visual Studio project that targets the .NET Framework version 3.5. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imports statement (Visual Basic) for the System.Linq namespace. In C# projects, add a using directive for the System.IO namespace.

  • Copy this code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.

See Also

Tasks

How to: Generate XML from CSV Files

Concepts

LINQ and Strings

LINQ and File Directories