Remove Comments from IIS Logs

If Log Parser feels a bit slow for your needs (for example, when you're dealing with very large IIS logs) and you would rather bulk import the logs into SQL Server, you'll first have to remove the # comment lines from the log files. Microsoft provides the PrepWebLog utility for this, but it seems to choke on files larger than about 100 MB, and you would also have to wrap it in a batch file to run it over a whole directory of files.
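
For context, those # lines are the W3C extended log format directives that IIS writes at the top of each log file (and typically again whenever IIS restarts mid-file), which is why simply skipping a fixed number of rows during a bulk import isn't reliable. The exact values and fields depend on your logging configuration, but a header block looks roughly like this:

#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-01-01 00:00:00
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken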

I wrote a Perl script that's relatively fast (faster than PrepWebLog) and can sweep through multiple folders and subfolders by using wildcards in the path. Each cleaned file is written next to the original with a .txt suffix appended. Here it is:

# parse.pl
# Example:
#   perl parse.pl c:\temp\logs\logs*\*.log
#
# Requirement: no spaces in the directory or file names
# (the glob pattern is split on whitespace).
# This gets called via run.bat.

use strict;
use warnings;
 
sub getFileList
{
    # Returns the list of files (with full paths) that match
    # the given glob pattern, e.g. getFileList("*.log").
    my @files = glob($_[0]);
    return @files;
}
 
 
sub remove_comments
{
  # Strip the # comment lines from a single log file.
  # $_[0] = filename; output goes to "<filename>.txt".
  my $file = shift;

  open (my $in, "<", $file)
      or die "Cannot open $file for reading: $!";

  open (my $out, ">", "$file.txt")
      or die "Cannot open $file.txt for writing: $!";
 
  while( my $line = <$in>)
  {
      print $out $line
          unless $line =~ /^#/;
  }
 
  close $in;
  close $out;
}
 
 
########## MAIN #############
my $arg = $ARGV[0]
    or die "Usage: perl parse.pl <glob pattern>\n";

# Example pattern for the root directory of the log files:
#$arg = 'c:\temp\logs\logs*\*.log';

# Double up backslashes so glob() treats them as literal
# path separators rather than escape characters.
$arg =~ s/\\/\\\\/g;

# Loop through all the matching log files.
for my $file (getFileList($arg))
{
  print "Processing file $file ...\n";
  remove_comments($file);
}

The Perl script gets called via run.bat:

REM No spaces in directory and file names.
perl Parse.pl D:\statesites\W3SVC*\*.log
pause
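
As an aside, if you only need to clean up a single file, the same filtering can be done with a Perl one-liner (a sketch; the paths here are just examples, adjust them to suit):

perl -ne "print unless /^#/" D:\statesites\W3SVC1\ex130101.log > D:\statesites\W3SVC1\ex130101.log.txt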
