2014/09/01

PowerShell - Sum similar entries from multiple CSV files

One of my script is scheduled to download everyday the proxy logs files from multiple proxies (Approx 1>2GB per file) of the previous day. The second step is to parse each of them and get the top 200 domain names within a specific environment. Finally at the end of the month another script create a report on the monthly internet usage.

I thought this was an interesting exercise even if some tools would probably do a better job ($$$). Also we shouldn't take those results too seriously since some protocols like Ajax or HTML5 talk a lot to the servers, keep refreshing pages even if you are not actively working on them.

In this post, I will talk about the last part of this process and how I combine all those files to get a real monthly top domains.





Example of daily report


Here is a sample of daily CSV file result after the parsing has done its job:


So at the end of the month I have a bunch of CSV files that were generated during the months and  I need to know how many time each domains were accessed.




One-Liner Magic!


To calculate the Sum of each domains for each months I can simply use the following One-Liner:

Get-ChildItem -Path .\*.csv | # Get each CSV files
    ForEach-Object -Process {
        Import-Csv -Path $PSItem.FullName # Import CSV data
    } | 
    Group-Object -Property Name | # Group per Domain Name
    Select-Object -Unique -Property Name, @{
        Label = "Sum";
        Expression = {
            # Sum all the counts for each domain
            ($PSItem.group | Measure-Object -Property Count -sum).Sum
        }
    } |
    Sort-Object -Property Sum -Descending |
    Out-GridView -Title "Top Domains"



Step by Step



Get all the CSV File



Now we import the data inside each file. I added a Sort-Object so you can see that we have duplicates (since we have the same values coming from different files).



Using Group-Object by Domain Name. This will group also all the different "counts" for each domain


Finally we use Select-Object to show a unique Domain Name and Sum each counts present in the "Group" property from the Group-Object Cmdlet (see above).






Thanks for reading! If you have any questions, leave a comment or send me an email at fxcat@lazywinadmin.com. I invite you to follow me on Twitter @lazywinadm / Google+ / LinkedIn. You can also follow the LazyWinAdmin Blog on Facebook Page and Google+ Page.

No comments:

Post a Comment