2011/08/15

Scaling and Queuing PowerShell Background Jobs


Great article from Travis Jones [Source]

A couple of months ago I asked the PowerShell MVPs for suggestions on blog topics. Karl Prosser, one of our awesome MVPs, brought up the topic of scaling and queuing background jobs.
The scenario is familiar: You have a file containing a bunch of input that you want to process and you don’t want to overburden your computer by starting up hundreds of instances of PowerShell at once to process them.
After playing around for about an hour on Friday afternoon, here is what I came up with… This example assumes you have a text file containing the names of many event logs and you want to get the content of each log.

# How many jobs we should run simultaneously
$maxConcurrentJobs = 3;

# Read the input and queue it up
$jobInput = get-content .\input.txt
$queue = [System.Collections.Queue]::Synchronized( (New-Object System.Collections.Queue) )
foreach($item in $jobInput)
{
    $queue.Enqueue($item)
}


# Function that pops input off the queue and starts a job with it
function RunJobFromQueue
{
    if( $queue.Count -gt 0)
    {
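        # Dequeue the next log name in this session and hand it to a new background job as $x
        $j = Start-Job -ScriptBlock { param($x) Get-WinEvent -LogName $x } -ArgumentList $queue.Dequeue()
        # When the job changes state, start the next one, then drop this subscription and its event job (which is named after the SourceIdentifier)
        Register-ObjectEvent -InputObject $j -EventName StateChanged -Action { RunJobFromQueue; Unregister-Event -SourceIdentifier $eventsubscriber.SourceIdentifier; Remove-Job -Name $eventsubscriber.SourceIdentifier } | Out-Null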
    }
}


# Start up to the max number of concurrent jobs
# Each job will take care of running the rest
for( $i = 0; $i -lt $maxConcurrentJobs; $i++ )
{
    RunJobFromQueue
}

The English version of this script is:
  • Given a file input.txt containing the names of many event logs, queue up each line of input
  • Kick off a small number of jobs to process one line of input each. Each job just gets the content of a particular log.
  • When a job finishes (determined by the StateChanged Event), start a new job with the next piece of input from the queue
  • Clean up the jobs corresponding to the event subscription so at the end we only have jobs containing event data
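Once the queue has drained, the jobs that remain hold the output. Here is a minimal sketch of reading it back. It assumes stand-in jobs that just echo their input rather than calling Get-WinEvent, and the $jobs and $results names are illustrative only, not part of the original script.

```powershell
# Stand-in jobs (assumption): each one echoes its argument instead of reading an event log
$jobs = foreach ($name in 'Application', 'System', 'Security') {
    Start-Job -ScriptBlock { param($x) "processed $x" } -ArgumentList $name
}

# Block until every job has finished, then collect what each one wrote to its output stream
$results = @($jobs | Wait-Job | Receive-Job)

# Remove the finished jobs so they no longer show up in Get-Job
$jobs | Remove-Job

$results
```

With the queued script, the equivalent would presumably be Get-Job | Receive-Job once the last job has completed, since the event jobs have already been cleaned up by then.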
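The “Synchronized” code you see when defining the queue is just for good measure: it makes sure that only one thread can touch the queue at a time. The jobs themselves never see the queue. Each item is dequeued in the parent session and passed to its job as an argument.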
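To see what the wrapper gives you, here is a small sketch that can be run on its own. The log names are placeholder values and the $plain, $first and $remaining variables are illustrative.

```powershell
# Wrap an ordinary queue in a thread-safe wrapper
$plain = New-Object System.Collections.Queue
$queue = [System.Collections.Queue]::Synchronized($plain)

# Placeholder input (assumption): three log names instead of the contents of input.txt
foreach ($name in 'Application', 'System', 'Security') {
    $queue.Enqueue($name)
}

# The wrapper reports itself as synchronized; the underlying queue does not
$queue.IsSynchronized
$plain.IsSynchronized

# Items come back in the order they were queued (first in, first out)
$first = $queue.Dequeue()
$remaining = $queue.Count
"$first dequeued, $remaining left"
```

The wrapper and the original queue share the same storage, so the script should only ever go through the synchronized one.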
Have something you want to see on the PowerShell blog? Leave a comment… Can’t promise we’ll get to everything but it’s nice to see what everyone is interested in.

Travis Jones
Windows PowerShell PM
Microsoft Corporation
