2013/06/13

Scripting Games 2013 - Advanced Event 5 - The Logfile Labyrinth

This is my solution for Advanced Event 5.
I did not have much time to work on this event, but here is the script I submitted.

Instruction
Download [Skydrive]

Dr. Scripto finds himself in possession of a bunch of IIS log files, much like the one at
http://morelunches.com/files/powershell3/LogFiles.zip, if you need one to practice with. He’s keeping all of the log files in a folder, and he’s left the log files with their default filenames, which he’s given a .LOG filename extension. All of the files are for a single Web site, on a single Web server.

He’d like you to write a tool that accepts a path, and then simply scans through each file in that path somehow, generating a list of each unique client IP address that have been used to access the Web site. No IP address should appear more than once in your output, and you don’t need to sort the output in any way.

Your tool should optionally accept an IP address mask like “192.0.1.*” and only display IP addresses that match the specified pattern. If run without a pattern, display all IP addresses.

Regardless of the addresses found in the sample file linked above, you should assume that any legal IP address may appear in the files Dr. Scripto needs to scan. Your command should scan all of the files in the folder (and the folder doesn’t contain any other kind of file) and produce a single set of results. If an IP address appears in multiple log files, and it’s likely that will be the case, then your final output should still only list that IP address once.



Solution

Dr Scripto requirements

  • Tool to scan IIS logs
    • Parameter Path (to specify the location of the logs)
    • Parameter IP Address mask like "192.0.1.*" to filter the output on a specific pattern
      • No pattern specified: show all unique client IP addresses
  • Output should show the unique client IP addresses that have been used to access the website
    • No IP should appear more than once in the output
    • No Sorting

IIS Log File formats overview
In my script I only focused on the W3C format, but here is the list of formats supported by IIS:
  • W3C (World Wide Web Consortium) Extended log file format – This is the default log file format used by IIS. It uses ASCII text, and times are recorded in UTC. This is the only format whose properties you can customize, so you can limit the size of the log files while still capturing detailed information. The properties written to the log files are separated by spaces.
  • IIS (Microsoft Internet Information Services) log file format – This format also uses ASCII text and has a fixed number of properties. The IIS log file format is used when you don’t need detailed information from the logs; it logs more information than the NCSA Common format but less than the W3C format. It is a comma-separated file and uses local time.
  • NCSA (National Center for Supercomputing Applications) log file format – This format logs only the basic information. Like the IIS log file format, it uses a fixed number of properties. It records the time using local time, and properties are separated by spaces. Note that the NCSA log file format does not support FTP sites. Since the entries are small with this format, it requires less storage space than the other formats.
  • Centralized Binary Logging – Centralized binary logging is used when multiple web sites running on a server write binary, unformatted log data to a single log file. Each web server running IIS creates one log file for all sites on that server. IIS writes the log in binary format and uses a single file, thereby making it memory efficient. This type of logging is not supported at the web site level.
  • ODBC log file format – This method is used when you want to log access information directly to a database. Enabling ODBC logging disables the kernel-mode cache, so it may affect server performance. Only supported at the site level.
Source: http://www.surfray.com/blog/2009/08/11/iis-log-file-formats-overview/

Header
The header of the W3C log format starts with the "#Fields: " pattern.
Let's use that to find our properties.
IIS Log - W3C Format

Using Select-String, I find the line with the pattern '#fields: '. Then I use the Substring() method to get the rest of the line. Finally I use -split to get all the different property names.

((Select-String -path .\W3SVC1\u_ex120420.log -Pattern "#fields: " | 
 Select-Object -First 1).line.Substring("#Fields: ".Length) -split ' ')
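
To make the result of that expression easier to picture, here is a minimal sketch that applies the same Substring() and -split steps to a string in memory. The sample line is made up for illustration and is not taken from the event's log files.

```powershell
# Hypothetical "#Fields:" directive, invented for this example
$sampleLine = '#Fields: date time s-ip cs-method cs-uri-stem c-ip sc-status'

# Drop the "#Fields: " prefix, then split the rest of the line on single spaces
$header = $sampleLine.Substring('#Fields: '.Length) -split ' '

$header.Count              # 7 field names
$header -contains 'c-ip'   # True: the client IP column is present
```

The resulting array is what gets passed to the -Header parameter of Import-Csv in the next step.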



Working with the Data
After playing a bit with different Cmdlets to read the logs, I finally decided to use Import-CSV.
That's super lazy since this Cmdlet already has a "Delimiter" Parameter :-) ! Exactly what I need!

In the following code, I use Import-Csv with the header previously determined and delimit the data on white space ' '. I also ignore any lines starting with "#".

$header = ((Select-String -path .\W3SVC1\u_ex120420.log -Pattern "#fields: " | 
 Select-Object -First 1).line.Substring("#Fields: ".Length) -split ' ')

Import-Csv '.\W3SVC1\u_ex120420.log' -Header $header -Delimiter ' ' | 
 Where-Object { -not($_.Date.StartsWith('#'))}


I repeat the same process for each file (using foreach) and store the information in an array.
This technique has its limits: everything is held in memory, so it can't handle large files.
Check out the references below; Emin Atac did awesome work on this event, especially on this part.
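
For comparison, here is a rough sketch of one way to avoid keeping every row in memory. This is not what I submitted and not Emin's solution; the function name Get-UniqueClientIP and its parameters are hypothetical. It streams each line through the pipeline and only remembers the distinct addresses in a HashSet.

```powershell
function Get-UniqueClientIP {
    # Hypothetical helper for illustration; not part of the submitted script
    param(
        [Parameter(Mandatory)][string]$LogFile,
        [string]$Pattern = '*'
    )

    # Memory use grows with the number of distinct IPs, not with the file size
    $seen    = New-Object 'System.Collections.Generic.HashSet[string]'
    $ipIndex = -1

    # Get-Content sends the file down the pipeline one line at a time
    Get-Content -Path $LogFile | ForEach-Object {
        if ($_ -like '#Fields: *') {
            # Re-read the column layout whenever a #Fields directive appears
            $fields  = $_.Substring('#Fields: '.Length) -split ' '
            $ipIndex = [array]::IndexOf($fields, 'c-ip')
        }
        elseif ($ipIndex -ge 0 -and $_ -notlike '#*') {
            $ip = ($_ -split ' ')[$ipIndex]
            # Add() returns $true only the first time a value is stored,
            # so each matching address is emitted once
            if ($ip -and $ip -like $Pattern -and $seen.Add($ip)) { $ip }
        }
    }
}
```

Assuming the same sample file as above, it could be called as Get-UniqueClientIP -LogFile .\W3SVC1\u_ex120420.log -Pattern '10.211.55.*'. To deduplicate across several files, the HashSet would have to be shared between calls, which this sketch does not do.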



Processing the final Data
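
In the function, this step comes down to Select-Object -Unique on the 'c-ip' column, followed by Where-Object with -like when a pattern is given. Here is a minimal sketch of the same idea on a few made-up rows; the addresses are invented and $info stands in for the imported log entries.

```powershell
# Made-up rows standing in for the entries imported from the logs
$info = @(
    [pscustomobject]@{ 'c-ip' = '10.211.55.25' }
    [pscustomobject]@{ 'c-ip' = '10.211.55.26' }
    [pscustomobject]@{ 'c-ip' = '10.211.55.25' }
    [pscustomobject]@{ 'c-ip' = '172.20.96.9' }
)

# Rename the c-ip column to IPAddress and keep one object per distinct value
$unique = $info |
    Select-Object -Property @{Label='IPAddress';Expression={$_.'c-ip'}} -Unique

# Optional filter: keep only the addresses that match the wildcard pattern
$filtered = $unique | Where-Object { $_.IPAddress -like '10.211.55.*' }

$unique     # 3 objects
$filtered   # 2 objects: 10.211.55.25 and 10.211.55.26
```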

Outputting
Get-IISLogClientIPAddress -Path .\ -Pattern *55.2*
Get-IISLogClientIPAddress -Path .\



What I Missed
From the other entries I saw and the comments I got, my script misses the following points:
  • Handling massive log files
  • Test-Path with -PathType Container on the Path parameter (in the PARAM() block)
  • Try/Catch around the Out-File calls (in the Catch{} part of the Process block)
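
Here is a rough sketch of what the last two points could look like. The function name Test-LogFolderParameter is hypothetical and the body is reduced to the minimum; it only shows the folder validation and the guarded write to the error log.

```powershell
function Test-LogFolderParameter {
    # Hypothetical function, used only to demonstrate the two points above
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        # -PathType Container accepts existing folders and rejects files
        [ValidateScript({ Test-Path -Path $_ -PathType Container })]
        [string]$Path,

        [string]$ErrorLog
    )

    try {
        Get-ChildItem -Path $Path -Filter *.log -ErrorAction Stop
    }
    catch {
        $problem = $_
        Write-Warning -Message $problem.Exception.Message

        if ($ErrorLog) {
            # Writing the error log can fail too (bad path, no permission)
            try {
                $problem | Out-File -FilePath $ErrorLog -Append -ErrorAction Stop
            }
            catch {
                Write-Warning -Message "Could not write to ${ErrorLog}: $($_.Exception.Message)"
            }
        }
    }
}
```

With this validation, passing a file or a missing folder as -Path fails at parameter binding, before any log is read.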

References
Small list of the interesting blog articles/documentation I found.

Script

function Get-IISLogClientIPAddress {
<#
.SYNOPSIS
   The function Get-IISLogClientIPAddress generates a list of each unique client IP address that has been used to access the website.
   The function checks the IIS logs in the path specified by the Path parameter.

.DESCRIPTION
   The function Get-IISLogClientIPAddress generates a list of each unique client IP address that has been used to access the website.
   The function checks the IIS logs in the path specified by the Path parameter.

.PARAMETER Path
   Specifies the Path to the files to be searched. Wildcards are permitted.

.PARAMETER Pattern
   Specifies the IP address pattern to match, for example '192.0.1.*'. Wildcards are permitted.

.PARAMETER Delimiter
   Specifies the delimiter that separates the property values. Default value is ' '

.PARAMETER ErrorLog
   Specifies the full path of the Error log file.

.EXAMPLE
   Get-IISLogClientIPAddress -Path .\ -Pattern '10.211.55.*5'

    IPAddress
    ----
    10.211.55.25

   This example generates a list of IP addresses from the logs located in the current directory ".\"
   with a pattern '10.211.55.*5'.

.EXAMPLE
   Get-IISLogClientIPAddress -Path .\ -Delimiter ' ' -Pattern '*.55.*'

    IPAddress
    ----
    10.211.55.25
    10.211.55.29 
    10.211.55.31 
    10.211.55.28 
    10.211.55.27 
    10.211.55.26 
    10.211.55.30  

   This example generates a list of IP addresses from the logs located in the current directory ".\"
   with a pattern '*.55.*'

.EXAMPLE
    Get-IISLogClientIPAddress -Path c:\sysadmin\IISLog\W3SVC8 -Pattern "172.20.96.*" -ErrorLog Errors.log

    IPAddress
    ----
    172.20.96.9
    172.20.96.10
    172.20.96.18

   This example generates a list of IP addresses from the logs located in the directory c:\sysadmin\IISLog\W3SVC8
   with a pattern '172.20.96.*'. Errors will be logged in Errors.log.
  
.INPUTS
   String

.OUTPUTS
   Selected.System.Management.Automation.PSCustomObject

.NOTES
   Scripting Games 2013 - Advanced Event #5
#>
    
[CmdletBinding()]
    PARAM(
        [Parameter(Mandatory,HelpMessage = "FullPath to IIS Log files",Position=0,ValueFromPipeline)]
        [PSDefaultValue(Help='Specifies the Path to the IIS Log files')]
        [Alias("Directory")]        
        [ValidateScript({Test-Path -path $_})]
        [String]$Path="",
        
        [PSDefaultValue(Help='Specifies the IPAddress Pattern to search')]
        [String]$Pattern,
        
        [PSDefaultValue(Help='Specifies the Delimiter. Default is " "')]
        [String]$Delimiter=" ",
        
        [PSDefaultValue(Help='Specifies the FullPath to Error log file')]
        [ValidateScript({Test-path -Path $_ -IsValid})]
        [String]$ErrorLog
    )
    BEGIN{
        $info=@()
    }#BEGIN BLOCK

    PROCESS{

        TRY{
            $Everything_is_OK = $true

            Write-Verbose -Message "Listing and Searching in all *.LOG files in $Path"
            FOREACH ($LogFile in (Get-ChildItem -Path $Path -include *.log -ErrorAction Stop -Recurse -ErrorVariable GCIErrors)) {
                Write-Verbose -Message "$($Logfile.Name)"

                # HEADER, The #Fields directive lists a sequence of field identifiers specifying the information recorded in each entry
                Write-Verbose -Message "$($Logfile.Name) - Identifying Header"
                $header = ((Select-String -path $LogFile -Pattern "#fields: " | 
                    Select-Object -First 1).line.Substring("#Fields: ".Length) -split ' ')

                # PARSING/IMPORTING as a CSV format.
                Write-Verbose -Message "$($Logfile.Name) - Importing Data as a CSV format"
                $csv = Import-Csv -path $LogFile -Header $Header -Delimiter $Delimiter -ErrorAction Stop -ErrorVariable ImportCSVErrors | 
                    Where-Object { -not($_.Date.StartsWith('#'))}
                
                # Outputting information to $info variable
                Write-Verbose -Message "$($Logfile.Name) - Sending data to final variable"
                $info += $csv

            }#FOREACH

        }#TRY

        CATCH{
            
            # ERROR HANDLING
            $Everything_is_OK = $false
            Write-Warning -Message "Wow! Something Went wrong !"
            Write-Warning -Message "$($_.Exception.Message)"

            IF ($PSBoundParameters['ErrorLog']) {
                $GCIErrors | Out-file -FilePath $ErrorLog -Append -ErrorAction Continue
                $ImportCSVErrors | Out-file -FilePath $ErrorLog -Append -ErrorAction Continue
                Write-Warning -Message "Logged in $ErrorLog"}

        }#CATCH

        IF ($Everything_is_OK){
            
            IF ($PSBoundParameters['Pattern']) {
                Write-Verbose -Message "Applying Pattern: $pattern and Outputting Final Result"
                $info | Select-Object -Property @{Label="IPAddress";Expression={$_.'c-ip'}} -Unique | 
                    Where-Object {$_.IPAddress -like $Pattern}
            }ELSE {
                Write-Verbose -Message "Outputting Final Result (No Pattern Specified)"
                $info | 
                    Select-Object -Property @{Label="IPAddress";Expression={$_.'c-ip'}} -Unique
            }
        }

    }#PROCESSBLOCK

    END{Write-Verbose -Message "Script completed"}#END BLOCK
}#function

Thanks for Reading! If you have any questions, leave a comment or send me an email at fxcat@lazywinadmin.com. I invite you to follow me on Twitter: @lazywinadm
