PSScriptAnalyzer: Tokens and ASTs

Introduction

Bumped into PSScriptAnalyzer the other day, along with some very interesting work by Daniel Bohannon on detecting PowerShell script obfuscation using ASTs. So I figured it might be interesting to create a small write-up about ASTs, PSScriptAnalyzer, and custom PSScriptAnalyzer rules, and to put all the info that's out there in one spot for the interested parties. So here goes the rant... 😁

ASTs and PSScriptAnalyzer

Abstract Syntax Trees (ASTs) represent the structure of source code as a tree, abstracting away details such as whitespace and punctuation. They are mostly used by compilers.

PSScriptAnalyzer is an open source tool developed by Microsoft that was designed as a static code checker for PowerShell scripts and modules. The main idea is to check the quality of PowerShell code by running the script content through a set of rules built on the script's AST object types.

PowerShell Script ASTs Analysis

Ok great! At this point we know what ASTs are and that PSScriptAnalyzer is built to work with ASTs. Cool... now what?

Well, let's take a look at what a PowerShell AST looks like!

A PowerShell script AST is built from a set of tokens. The process of creating a set of tokens from code is called lexical analysis. You can extract all the tokens from a PowerShell script using the ASTHelper module.
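To get a feel for what lexical analysis produces before installing anything, here is a minimal sketch that uses the built-in `[System.Management.Automation.PSParser]::Tokenize` method on a one-liner. The one-liner and the variable names are made up for illustration, and I am only assuming the ASTHelper output has a similar shape (I have not checked how the module tokenizes internally).

```powershell
# Hypothetical one-liner to tokenize; any PowerShell code string will do
$code = "Get-ChildItem -Path 'C:\Temp' | Select-Object Name"

# Tokenize fills this variable with any parse errors it hits
$parseErrors = $null
$sampleTokens = [System.Management.Automation.PSParser]::Tokenize($code, [ref]$parseErrors)

# Each token carries the piece of text it covers (Content) and its category (Type)
$sampleTokens | Select-Object Type, Content
```

Nothing is executed here; the code string is only split into tokens.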

In the example below, we will be using the "Get-VaultCredential.ps1" script from PowerSploit since it is publicly available, but any custom PowerShell script will do.

$script = "C:\Research\Get-VaultCredential.ps1"
$tokens = Invoke-Tokenize $script
$tokens > full_tokens_dataset.txt

Each token will have related "Content" and "Type" fields. Let's take a look at the different token types present in our script:

$tokens | Group-Object Type | Sort-Object Count -Descending | Select-Object Count, Name

Hm, so in here, we have a bunch of interesting token types: CommandArgument, Command, String, CommandParameter, and many more.

Fig 1. PowerShell Script Token Types Summary

Now that we know the token types, let's analyze the ones we like, such as Command, String, or CommandArgument:

$tokens | Where-Object {$_.Type -eq 'Command'} | Group-Object Content | Sort-Object Count -Descending | Select-Object Count, Name
$tokens | Where-Object {$_.Type -eq 'CommandArgument'} | Group-Object Content | Sort-Object Count -Descending | Select-Object Count, Name
$tokens | Where-Object {$_.Type -eq 'String'} | Group-Object Content | Sort-Object Count -Descending | Select-Object Count, Name
Fig 2. Token Command, CommandArgument, Strings Types Extraction

Things that stand out, to name a few, are Get-VaultCredential, System.Reflection.AssemblyName, vaultcli.dll, VaultOpenVault, and so on.

Ok, I wonder if we can create a custom PSScriptAnalyzer rule to pick up scripts like that? How would one go about creating one?

Well, first we would need to figure out which AST object types these interesting strings belong to. The process of creating an AST is called syntax analysis, and it converts our tokens into a tree that represents the actual structure of the code. We have two options to view the PowerShell script AST:

  1. We can use the PowerShell module ShowPSAst to visualize the tree
  2. Or we can use the PowerShell module ASTHelper to pull all AST object types from the AST and to investigate each AST object type separately

Let's take a look at our script using ShowPSAst first. Each assignment statement, loop, and command inside the PowerShell script will be represented as some kind of an AST object type. For example, the full $OSVersion = [Environment]::OSVersion.Version statement has an AST object type AssignmentStatementAst. The AssignmentStatementAst consists of VariableExpressionAst and CommandExpressionAst which in turn consist of other AST object types.

Fig 3. ShowPSAst PowerShell Module
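The same structure can be checked from the console. The sketch below parses that one statement with the built-in `[System.Management.Automation.Language.Parser]::ParseInput` method and walks the assignment's `Left` and `Right` properties; the variable names are mine, and the statement is parsed only, never run.

```powershell
# The statement discussed above, as a string (single quotes keep $OSVersion literal)
$code = '$OSVersion = [Environment]::OSVersion.Version'

$parseTokens = $null
$parseErrors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$parseTokens, [ref]$parseErrors)

# Grab the first assignment statement found anywhere in the tree
$assignment = $ast.Find({
    param($node)
    $node -is [System.Management.Automation.Language.AssignmentStatementAst]
}, $true)

$assignment.GetType().Name                     # type of the whole statement
$assignment.Left.GetType().Name                # left-hand side: the variable
$assignment.Right.GetType().Name               # right-hand side: wrapper around the expression
$assignment.Right.Expression.GetType().Name    # the expression inside that wrapper
```

The first three names should line up with the types named above; the last line goes one level deeper into the right-hand side.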

If you want to list all AST object types present in the script or dig into some specific AST object types like CommandAst you can use ASTHelper cmdlets for this.

Fig 4. ASTHelper Get-AstType cmdlet in action
Fig 5. ASTHelper Get-AstObject cmdlet in action
Get-AstType $script
Get-AstObject $script -Type CommandAst | select -First 1
$commandASTs = Get-AstObject $script -Type CommandAst | select -First 1
$commandASTs.Extent.Text
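If you would rather not depend on a module, roughly the same information can be pulled with the parser that ships with PowerShell. This is a sketch under my own assumptions: the inline script is invented, and I am not claiming this is how the ASTHelper cmdlets work internally. For a file on disk, `[System.Management.Automation.Language.Parser]::ParseFile` takes a path in place of the code string.

```powershell
# Hypothetical inline script standing in for a real file
$code = @'
$items = Get-ChildItem -Path .
foreach ($item in $items) { Write-Output $item.Name }
'@

$parseTokens = $null
$parseErrors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$parseTokens, [ref]$parseErrors)

# Collect every node in the tree, nested script blocks included
$allNodes = @($ast.FindAll({ $true }, $true))

# Summary of the AST object types present in the script
$allNodes | Group-Object { $_.GetType().Name } | Sort-Object Count -Descending | Select-Object Count, Name

# Only the command invocations, with the source text each one covers
$commandNodes = @($allNodes | Where-Object { $_ -is [System.Management.Automation.Language.CommandAst] })
$commandNodes | ForEach-Object { $_.Extent.Text }
```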

Great! Now we have a good understanding of what kind of tokens and AST object types the script contains, and we can proceed to rule creation.

PSScriptAnalyzer Custom Rule

Let's say we want to create a PSScriptAnalyzer rule that will pick up all the scripts that contain the VaultOpenVault, vaultcli.dll, DefinePInvokeMethod, and ::Winapi strings in their AssignmentStatementAst AST object types (the type the rule below actually inspects).

The process of creating custom PSScriptAnalyzer rules is kind of documented by Microsoft. But I think that "A Crash Course in Writing Your Own PSScriptAnalyzer Rules" by Thomas Rayner is a bit more useful than the Microsoft documentation. Another great source for figuring out how to put these rules together is Daniel Bohannon's custom set of rules designed to detect obfuscated PowerShell scripts.

In any case, PSScriptAnalyzer rules must be stored in a .psm1 file, and as long as you know the AST object types you want to create a rule for, it does not take too long to put one together.

<#
.DESCRIPTION
    Custom Rule Description
#>
function Detect-GetVaultCredential
{
    [CmdletBinding()]
    [OutputType([Microsoft.Windows.Powershell.ScriptAnalyzer.Generic.DiagnosticRecord[]])]
    param
    (
        [Parameter(Mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [System.Management.Automation.Language.ScriptBlockAst]
        $ScriptBlockAst
    )

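    # Predicate for FindAll: true only for assignment statements whose text matches the pattern
    [ScriptBlock] $predicate = {
        param ([System.Management.Automation.Language.Ast] $Ast)

        # -as yields $null for any node that is not an assignment statement
        $targetAst = $Ast -as [System.Management.Automation.Language.AssignmentStatementAst]
        if ($null -eq $targetAst)
        {
            return $false
        }

        # Collapse line breaks so a call spread over several lines is matched as one string
        return (($targetAst.Extent.Text -replace '\r?\n', '') -match 'DefinePInvokeMethod.*VaultOpenVault.*vaultcli\.dll.*::winapi')
    }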

    $foundNodes = $ScriptBlockAst.FindAll($predicate, $false)

    foreach ($foundNode in $foundNodes)
    {
        [Microsoft.Windows.Powershell.ScriptAnalyzer.Generic.DiagnosticRecord] @{
            "Message"  = "Found: " + $foundNode.Extent.Text
            "Extent"   = $foundNode.Extent
            "RuleName" = "CustomRule1"
            "Severity" = "Warning"
        }
    }
}
Fig 6. PSScriptAnalyzer Rule Testing
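Before wiring the rule into PSScriptAnalyzer, the pattern itself can be sanity-checked with nothing but the built-in parser. The snippet being scanned is invented for this sketch: it is only shaped like a P/Invoke definition and is not copied from any real script, and all variable names are hypothetical.

```powershell
# Hypothetical input: one harmless assignment and one shaped like a P/Invoke definition
$sample = @'
$name = 'harmless'
$method = $TypeBuilder.DefinePInvokeMethod('VaultOpenVault', 'vaultcli.dll',
    'Public, Static', [Reflection.CallingConventions]::Standard, [Int32], $argTypes,
    [Runtime.InteropServices.CallingConvention]::Winapi, 'Auto')
'@

$parseTokens = $null
$parseErrors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($sample, [ref]$parseTokens, [ref]$parseErrors)

# Same regex as in the rule; -match is case-insensitive, so ::Winapi matches ::winapi
$pattern = 'DefinePInvokeMethod.*VaultOpenVault.*vaultcli\.dll.*::winapi'

# Keep only assignment statements whose text, with line breaks removed, matches the pattern
$hits = @($ast.FindAll({
    param($node)
    $node -is [System.Management.Automation.Language.AssignmentStatementAst] -and
        (($node.Extent.Text -replace '\r?\n', '') -match $pattern)
}, $false))

# Show the variable each matching statement assigns to
$hits | ForEach-Object { $_.Left.Extent.Text }
```

Once the rule function is saved in a module file (say, a hypothetical CustomRules.psm1), Invoke-ScriptAnalyzer can run it by pointing -Path at the script and -CustomRulePath at the module.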

And there you go - now you know a little bit about ASTs, PSScriptAnalyzer, and PSScriptAnalyzer custom rules. Idk, to me, it seems that ASTs could be used to detect not only obfuscated but also malicious PowerShell scripts, and with ML involved, ASTs could provide a script evaluation/detection solution that will pick up custom scripts just as well as boilerplate PowerShell Empire scripts - but I am too lazy to dig into this.


Here is a good start though.

References

https://github.com/thomasrayner/AstHelper

https://github.com/danielbohannon/DevSec-Defense

https://www.youtube.com/watch?v=xHqj7Icc3LM

https://tosbourn.com/abstract-syntax-trees/

https://github.com/PowerShell/PSScriptAnalyzer

https://www.twilio.com/blog/abstract-syntax-trees

https://arxiv.org/pdf/1810.09230.pdf