2014/09/06

PowerShell - ConvertFrom-String and the TemplateFile parameter

I'm continuing to play with the new ConvertFrom-String cmdlet (available in the latest WMF 5.0 September preview, released yesterday), which makes parsing really easy for both simple and complex output.

This cmdlet supports two modes: Basic Delimited Parsing (see yesterday's post) and Auto-Generated Example-Driven Parsing, which I will cover in this post.

The Auto-Generated Example-Driven Parsing mode is based on the FlashExtract research work at Microsoft Research.

Important: This post is based on the September 2014 preview release of WMF 5.0. This is pre-release software, so this information may change.

The research core of FlashExtract comes from Sumit Gulwani and Vu Le:
FlashExtract: A Framework for Data Extraction by Examples, PLDI 2014, Vu Le, Sumit Gulwani

Abstract: Various document types that combine model and view (e.g., text files, webpages, spreadsheets) make it easy to organize (possibly hierarchical) data, but make it difficult to extract raw data for any further manipulation or querying. We present a general framework FlashExtract to extract relevant data from semi-structured documents using examples. It includes: (a) an interaction model that allows end-users to give examples to extract various fields and to relate them in a hierarchical organization using structure and sequence constructs. (b) an inductive synthesis algorithm to synthesize the intended program from few examples in any underlying domain-specific language for data extraction that has been built using our specified algebra of few core operators (map, filter, merge, and pair). We describe instantiation of our framework to three different domains: text files, webpages, and spreadsheets. On our benchmark comprising 75 documents, FlashExtract is able to extract intended data using an average of 2.36 examples in 0.84 seconds per field

NetStat.exe -na

Once again, I will use the NetStat.exe command line to demo ConvertFrom-String with the TemplateFile parameter. Here is the default output.



Creating the Template File

The -TemplateFile parameter allows us to specify a file that contains the data structure pattern of the information we want to automatically extract.

This file simply needs curly braces around the data that you want to extract, along with a property name of your choice.

Asterisk (*)
In some cases the property you define can appear multiple times; add an asterisk (*) after its name to indicate that it results in multiple records.

Example:
Consider the following line from netstat -na

  TCP    0.0.0.0:49164          0.0.0.0:0              LISTENING

You would translate it to the following (don't forget the whitespace):

  {Protocol*:TCP}    {LocalAddress:0.0.0.0:49164}          {ForeignAddress:0.0.0.0:0}              {State:LISTENING}


Missing property:
In the previous example I defined 4 properties: Protocol, LocalAddress, ForeignAddress and State.
However, what if a line does not contain any State information, like this one?

  UDP    0.0.0.0:443            *:*

If I write the following:
  {Protocol*:UDP}    {LocalAddress:0.0.0.0:443}          {ForeignAddress:*:*}

State will show the same value as ForeignAddress :-/



This can be solved by adding the State property anyway, using the regex whitespace metacharacter \s:
  {Protocol*:UDP}    {LocalAddress:0.0.0.0:443}            {ForeignAddress:*:*}{State:\s}





Different types of lines in NetStat -na
Looking at the output of NetStat -na, we can see some very different types of lines:
IPv4, IPv6, with and without local/foreign ports, and some without the State property...

You have to identify each of those cases in your template so the cmdlet knows what to do with each one.

  TCP    0.0.0.0:49164          0.0.0.0:0              LISTENING
  TCP    192.168.1.51:54331     74.125.228.42:80       ESTABLISHED
  TCP    [::]:135               [::]:0                 LISTENING
  UDP    0.0.0.0:443            *:*
  UDP    [::]:3389              *:*
  UDP    [::1]:1900             *:*
  UDP    [fe80::98b9:6db4:216a:2f9f%18]:1900  *:*


TemplateFile

Given all the previous elements, here is the TemplateFile:


  {Protocol*:TCP}    {LocalAddress:0.0.0.0:49164}          {ForeignAddress:0.0.0.0:0}              {State:LISTENING}
  {Protocol*:TCP}    {LocalAddress:192.168.1.51:54331}     {ForeignAddress:74.125.228.42:80}       {State:ESTABLISHED}
  {Protocol*:TCP}    {LocalAddress:[::]:135}               {ForeignAddress:[::]:0}                 {State:LISTENING}
  {Protocol*:UDP}    {LocalAddress:0.0.0.0:443}            {ForeignAddress:*:*}{State:\s}
  {Protocol*:UDP}    {LocalAddress:[::]:3389}              {ForeignAddress:*:*}{State:\s}
  {Protocol*:UDP}    {LocalAddress:[::1]:1900}              {ForeignAddress:*:*}{State:\s}
  {Protocol*:UDP}    {LocalAddress:[fe80::98b9:6db4:216a:2f9f%18]:1900}  {ForeignAddress:*:*}{State:\s}
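
One way to create that file without leaving the console is a single-quoted here-string passed to Set-Content. This is only a sketch: the file name netstat_template.txt is the one used by the command in this post, while saving it in the current directory and the $templatePath / $template variable names are my own assumptions.

```powershell
# Sketch: save the example lines above as the template file.
# Assumption: the file is written to the current directory;
# $templatePath and $template are illustrative names.
$templatePath = Join-Path -Path (Get-Location).Path -ChildPath 'netstat_template.txt'

# A single-quoted here-string keeps every character literal
# (brackets, %, asterisks and the \s metacharacter are not expanded).
$template = @'
  {Protocol*:TCP}    {LocalAddress:0.0.0.0:49164}          {ForeignAddress:0.0.0.0:0}              {State:LISTENING}
  {Protocol*:TCP}    {LocalAddress:192.168.1.51:54331}     {ForeignAddress:74.125.228.42:80}       {State:ESTABLISHED}
  {Protocol*:TCP}    {LocalAddress:[::]:135}               {ForeignAddress:[::]:0}                 {State:LISTENING}
  {Protocol*:UDP}    {LocalAddress:0.0.0.0:443}            {ForeignAddress:*:*}{State:\s}
  {Protocol*:UDP}    {LocalAddress:[::]:3389}              {ForeignAddress:*:*}{State:\s}
  {Protocol*:UDP}    {LocalAddress:[::1]:1900}              {ForeignAddress:*:*}{State:\s}
  {Protocol*:UDP}    {LocalAddress:[fe80::98b9:6db4:216a:2f9f%18]:1900}  {ForeignAddress:*:*}{State:\s}
'@

Set-Content -Path $templatePath -Value $template

# Quick sanity check: count the template lines that were written.
(Get-Content -Path $templatePath).Count
```

Using a here-string rather than typing the lines into an editor makes it easy to paste real netstat lines and wrap them in curly braces in one place.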


netstat -na |
    ConvertFrom-String -TemplateFile .\netstat_template.txt |
    Select-Object -Property Protocol, LocalAddress, ForeignAddress, State


This is super cool !!

Extra: Retrieving the ports too !

Now we might want to split the information in the LocalAddress so we have one property for the IP and another for the port, and do the same for the ForeignAddress.

Notice that the two pieces of information are separated by a colon (:) character (the last one, in the case of IPv6 addresses), so we need to split on that. Example:

{LocalAddress:0.0.0.0:49164}

Becomes

{LocalAddress:0.0.0.0}:{LocalPort:49164}
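
The split rule itself can be sanity-checked outside the template. The sketch below is not part of ConvertFrom-String and is not how the cmdlet works internally; Split-Endpoint is a hypothetical helper name of my own. It uses a plain regex that splits on the last colon, so bracketed IPv6 addresses (which contain colons themselves) stay in one piece.

```powershell
# Hypothetical helper (not from the cmdlet): split "address:port" on the LAST colon.
function Split-Endpoint {
    param(
        [Parameter(Mandatory)]
        [string]$Endpoint
    )

    # Greedy .+ takes everything up to the last colon; [^:]+ is what follows it.
    if ($Endpoint -match '^(?<Address>.+):(?<Port>[^:]+)$') {
        [pscustomobject]@{
            Address = $Matches['Address']
            Port    = $Matches['Port']
        }
    }
    else {
        throw "Unexpected endpoint format: $Endpoint"
    }
}

# Sample values taken from the netstat lines shown in this post.
'0.0.0.0:49164', '[::]:135', '[fe80::98b9:6db4:216a:2f9f%18]:1900', '*:*' |
    ForEach-Object { Split-Endpoint -Endpoint $_ }
```

The same rule is what the template expresses: the colon left outside the curly braces is the one right before the port.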

TemplateFile:

And here is the final Template.

  {Protocol*:TCP}    {LocalAddress:0.0.0.0}:{LocalPort:57037}          {ForeignAddress:0.0.0.0}:{ForeignPort:0}              {State:LISTENING}
  {Protocol*:TCP}    {LocalAddress:10.100.3.31}:{LocalPort:3389}       {ForeignAddress:10.100.44.36}:{ForeignPort:51992}     {State:ESTABLISHED}
  {Protocol*:TCP}    {LocalAddress:[::]}:{LocalPort:80}                {ForeignAddress:[::]}:{ForeignPort:0}                 {State:LISTENING}
  {Protocol*:UDP}    {LocalAddress:[::]}:{LocalPort:123}               {ForeignAddress:*}:{ForeignPort:*}                    {State:\s}
  {Protocol*:UDP}    {LocalAddress:[fe80::98b9:6db4:216a:2f9f%18]}:{LocalPort:1900}  {ForeignAddress:*}:{ForeignPort:*}{State:\s}
  {Protocol*:UDP}    {LocalAddress:[fe80::98b9:6db4:216a:2f9f%18]}:{LocalPort:59108}  {ForeignAddress:*}:{ForeignPort:*}{State:\s}


Output:

netstat -na |
    ConvertFrom-String -TemplateFile .\netstat_template_with_ports.txt |
    Select-Object -Property Protocol, LocalAddress, LocalPort, ForeignAddress, ForeignPort, State |
    Out-GridView



Thanks for reading! If you have any questions, leave a comment or send me an email at fxcat@lazywinadmin.com.

4 comments:

  1. Hello. I would like to know whether I can keep using PowerShell Studio 2012 on Windows 8.1.

  2. Google Translate:

    Hello, I would think so. You can contact Sapien.com support to make sure. You can probably upgrade to PowerShell Studio 2014 for not much money.

    Hope this helps

  3. How would you have it display a set text?
    $SRV is a text box for the server name; if it is empty, I want an error to show in the grid view box.

    if ($SRV.Text -eq "")
    {
        $Err = "Server Name Blank. Please enter Server Name!"
        Load-DataGridView -DataGridView $datagridview1 -Item $Err.ToUpper()
    }
    else
    {
        Get-WmiObject Win32_Service -ComputerName $SRV.Text | Select-Object DisplayName, Name, StartMode, State | Sort-Object StartMode | Out-GridView
    }

  4. Why not disable the button(s) instead?
    Look for an event called "TextChanged" (not sure of the name).

    You can also show a message box to the user.

    Hope this helps