Yara Cell Magic by Eric Hutchins

Yara is a powerful, flexible signature language. In the words of its documentation (PDF), it is a tool "aimed at helping malware researchers to identify and classify malware families." Its usefulness goes far beyond malware, though. Yara lets you define sets of rules, each with complex conditions built from robust regular expressions, match offset specifications, and many other features. Combined, these make Yara a Swiss Army knife for finding stuff. Wherever you have stuff and need to find whatever satisfies a complex set of conditions, Yara can help.

It also has a nice Python API. The typical way to use Yara in Python is to either read the rules from an external file or specify them in a Python string. Using one language (Python) to write another language (Yara) always frustrates me. At least Python's triple quotes (""") let you write multi-line strings cleanly.

import yara
yarasig = """rule helloworld
{
strings:
$a = "hello" ascii
$b = "world" nocase
condition:
any of them
}"""
myrules = yara.compile(source=yarasig)


But still, I want to edit a blob of text for a language in a native editor. I want to see matching parentheses and brackets as I type them. I want to be able to bulk comment out lines. I want to see line numbers so I can debug Yara error messages. I want syntax highlighting.

IPython lets us accomplish this in a nice, clean fashion. I wrote a custom magic, %%yara, that turns a code cell into an inline Yara editor. You write rules with proper syntax highlighting, using a new CodeMirror mode for Yara that I also wrote. When you run the cell, Yara compiles the code and the compiled rules object is placed back in your namespace. Then use it to Find Stuff.
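yaramagic.py itself isn't reproduced here, so the following is only a rough sketch of the flow just described. It is not the author's implementation, and it registers nothing with IPython. A plain dict stands in for the notebook's user namespace, and a hypothetical `compile_rules` callable stands in for `yara.compile(source=...)`. The fallback name `rules`, used when no `-n` is given, is also an assumption.

```python
import argparse
import shlex


def yara_cell_magic(line, cell, namespace, compile_rules, default_name="rules"):
    """Hypothetical sketch of a %%yara-style cell magic (not yaramagic.py).

    line: text after "%%yara" on the first line, e.g. "-n myrules"
    cell: the Yara rule source typed into the cell body
    namespace: dict standing in for the notebook's user namespace
    compile_rules: callable standing in for yara.compile(source=...)
    """
    parser = argparse.ArgumentParser(prog="yara", add_help=False)
    parser.add_argument("-n", dest="name", default=default_name)
    # Other options (such as -e) are ignored in this sketch
    args, _unused = parser.parse_known_args(shlex.split(line))

    # Compile the cell body and push the result back into the namespace
    compiled = compile_rules(cell)
    namespace[args.name] = compiled
    print('Adding compiled rules as "%s" to namespace' % args.name)
    return compiled


# Usage with a dummy compiler; a real magic would call yara.compile instead
user_ns = {}
source = 'rule helloworld { strings: $a = "hello" condition: $a }'
yara_cell_magic("-n myrules", source, user_ns, compile_rules=lambda src: ("compiled", src))
```

Because the compiler and namespace are passed in as parameters, the flow can be exercised outside a notebook. A real magic would take the namespace from IPython and call yara.compile.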

Notebook Prerequisites

Modules

Custom

• yaramagic.py -- Custom code for %%yara cell magic
• yara.js -- CodeMirror custom mode for yara syntax highlighting

Data

Hello World

In [1]:
from pprint import pprint

In [2]:
%%yara -n myrules
/*
My first rule called "helloworld"
with category "testing"
*/
rule helloworld : testing
{
meta:
version = "0.1"
strings:
$a = "hello" ascii
$b = "world" nocase
condition:
all of them
}

Adding compiled rules as "myrules" to namespace


In [3]:
pprint( myrules.match_data(data="This is a hello WoRlD test") )

{'main': [{'matches': True,
           'meta': {'version': '0.1'},
           'rule': 'helloworld',
           'strings': [{'data': 'WoRlD',
                        'flags': 27,
                        'identifier': '$b',
                        'offset': 16L},
                       {'data': 'hello',
                        'flags': 19,
                        'identifier': '$a',
                        'offset': 10L}],
           'tags': ['testing']}]}
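The result is a plain dict keyed by rule namespace ('main' here). Each match lists its string hits with the identifier, offset, and matched data. The sketch below uses only the standard library to flatten that structure into one tuple per hit. It works on the dict printed above, with Python 2's `L` long suffix dropped so that it runs under Python 3. Newer yara-python releases may return match objects rather than dicts, and in that case the lookups would need adapting.

```python
# The match_data() result printed above, minus the Python 2 "L" suffixes
result = {'main': [{'matches': True,
                    'meta': {'version': '0.1'},
                    'rule': 'helloworld',
                    'strings': [{'data': 'WoRlD', 'flags': 27,
                                 'identifier': '$b', 'offset': 16},
                                {'data': 'hello', 'flags': 19,
                                 'identifier': '$a', 'offset': 10}],
                    'tags': ['testing']}]}


def flatten_hits(result):
    """Yield (namespace, rule, identifier, offset, data) for every string hit."""
    for namespace, matches in result.items():
        for match in matches:
            for hit in match.get('strings', []):
                yield (namespace, match['rule'], hit['identifier'],
                       hit['offset'], hit['data'])


# Sort by offset so the hits come out in the order they appear in the data
for row in sorted(flatten_hits(result), key=lambda r: r[3]):
    print(row)
```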



To help debug a rule, toggle the line numbers in your %%yara cell by typing Ctrl-m-l. The Yara CodeMirror mode numbers the %%yara line itself as 0, so the line numbers in Yara's error messages match the notebook display.
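Yara's error messages take the form `<source>:<line>: <message>`, as in the syntax error in the next cell. A small helper could use that to print the offending line directly. This is a hypothetical stdlib-only sketch and is not part of the %%yara magic. It assumes the reported line is 1-based within the cell body. With the %%yara line displayed as line 0, that is the same number the notebook shows.

```python
import re

# Hypothetical helper: match the "<source>:<line>:" prefix of a Yara error
ERROR_LINE = re.compile(r"^[^:]*:(\d+):")


def offending_line(error_message, cell_body):
    """Return (line_number, text) for the line a Yara error points at.

    Assumes line numbers are 1-based within the rule source, i.e. the cell
    body without the %%yara magic line. Returns None if no number is found.
    """
    match = ERROR_LINE.match(error_message)
    if match is None:
        return None
    number = int(match.group(1))
    lines = cell_body.splitlines()
    if 1 <= number <= len(lines):
        return number, lines[number - 1]
    return number, None


badrule = '''rule badrule
{
strings:
$a = "hello"
conditio: //<-- oops
all of them
}'''

msg = "<undef>:5: syntax error, unexpected _IDENTIFIER_, expecting _CONDITION_"
print(offending_line(msg, badrule))
```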

In [4]:
%%yara
rule badrule
{
strings:
$a = "hello"
conditio: //<-- oops
all of them
}

Syntax error! <undef>:5: syntax error, unexpected _IDENTIFIER_, expecting _CONDITION_

### Classic Use Case: Volatility

The Volatility Framework has a built-in YaraScan plugin, but this only accepts an external yara rule file or a plain string. In order to use our compiled rules, I'm hacking together pieces from the various classes under YaraScan for this demonstration.

I'm using the same ds_fuzz_hidden_proc.img sample from Volatility's public memory images page. The following code will flag processes that match a specific Yara rules object.

In [5]:
# Base Volatility import
import volatility.conf as conf
import volatility.registry as registry
import volatility.commands as commands
import volatility.addrspace as addrspace

registry.PluginImporter()
config = conf.ConfObject()
registry.register_global_options(config, commands.Command)
registry.register_global_options(config, addrspace.BaseAddressSpace)
cmds = registry.get_plugin_classes(commands.Command, lower = True)

In [6]:
# Use-case specific imports
from volatility.plugins.filescan import PSScan
import volatility.utils as utils
import volatility.constants as constants

In [7]:
config.PROFILE = "WinXPSP2x86"
config.LOCATION = "file:///c:/ds_fuzz_hidden_proc.img"

In [8]:
# A simplified (and surely imperfect) merging of code from YaraScan, VadYaraScanner, and BaseYaraScanner from volatility.malfind
def yrscan(task, rules, contextsize=16):
    results = []
    for vad, address_space in task.get_vads():
        offset = vad.Start
        maxlen = vad.Length

        # Start scanning from offset until maxlen:
        i = offset
        while i < offset + maxlen:
            # Read some data and match it.
            to_read = min(constants.SCAN_BLOCKSIZE + 1024, offset + maxlen - i)
            data = address_space.zread(i, to_read)
            if data:
                for match in rules.match_data(data).get('main', []):
                    if all([hit['offset'] < constants.SCAN_BLOCKSIZE for hit in match.get('strings', [])]):
                        results.append((i, match))
            i += constants.SCAN_BLOCKSIZE
    return results

In [9]:
%%yara -n volarules
rule exe_on_desktop
{
// Look for files on the Desktop that end in .exe
strings:
$a = /\\Desktop\\[\w .-]{1,20}\.exe/ nocase
condition:
all of them
}

Adding compiled rules as "volarules" to namespace


In [10]:
ps = PSScan(config)

In [11]:
for task in ps.calculate():

    # Pass the compiled rules object to our method yrscan
    hits = yrscan(task, volarules)

    if len(hits) > 0:
        print '-----------------------------------'
        print 'Process name: %s' % task.ImageFileName
        print 'PID: %s' % task.UniqueProcessId
        print 'PPID: %s' % task.InheritedFromUniqueProcessId
        print 'Create time: %s' % (task.CreateTime or '')
        print 'Exit time: %s' % (task.ExitTime or '')
    else:
        # No Yara hits in this process; move on to the next one
        continue

    for addr, hit in hits:
        print '> Rule name: %s' % hit.get('rule')
        for string in hit.get('strings', []):
            # Modified from original
            # https://code.google.com/p/volatility/source/browse/tags/Volatility-2.1.0/volatility/plugins/malware/malfind.py#481
            print "".join(
                ["{0:#010x}  {1:<48}  {2}\n".format(string.get('offset') + addr + o, h, ''.join(c))
                 for o, h, c in utils.Hexdump(string.get('data', ''))
                ])

-----------------------------------
Process name: wuauclt.exe
PID: 1372
PPID: 1064
Create time: 2008-11-26 07:39:38
Exit time:
> Rule name: exe_on_desktop
0x0214342f  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x0214343f  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x0214342f  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x0214343f  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

> Rule name: exe_on_desktop
0x022fecb7  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x022fecc7  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x022fecb7  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x022fecc7  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

> Rule name: exe_on_desktop
0x023e9e87  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x023e9e97  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x023e9e87  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x023e9e97  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

-----------------------------------
Process name: explorer.exe
PID: 1516
PPID: 1452
Create time: 2008-11-26 07:38:27
Exit time:
> Rule name: exe_on_desktop
0x0214342f  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x0214343f  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x0214342f  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x0214343f  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

> Rule name: exe_on_desktop
0x022fecb7  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x022fecc7  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x022fecb7  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x022fecc7  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

-----------------------------------
Process name: svchost.exe
PID: 844
PPID: 672
Create time: 2008-11-26 07:38:18
Exit time:
> Rule name: exe_on_desktop
0x0214342f  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x0214343f  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x0214342f  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x0214343f  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x022fecb7  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x022fecc7  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x022fecb7  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x022fecc7  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x023e9e87  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x023e9e97  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x023e9e87  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x023e9e97  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

-----------------------------------
Process name: svchost.exe
PID: 1064
PPID: 672
Create time: 2008-11-26 07:38:20
Exit time:
> Rule name: exe_on_desktop
0x0214342f  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x0214343f  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x0214342f  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x0214343f  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

> Rule name: exe_on_desktop
0x022fecb7  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x022fecc7  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x022fecb7  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x022fecc7  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

> Rule name: exe_on_desktop
0x023e9e87  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x023e9e97  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe

0x023e9e87  5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b   \Desktop\network
0x023e9e97  5f 6c 69 73 74 65 6e 65 72 2e 65 78 65            _listener.exe
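The scanning loop in yrscan works in overlapping windows. It reads SCAN_BLOCKSIZE + 1024 bytes at a time, keeps only the matches whose hits all start within the first SCAN_BLOCKSIZE bytes, and then advances by SCAN_BLOCKSIZE. The overlap lets a match that straddles a block boundary be seen whole, and the offset check stops the next window from reporting it a second time. The sketch below applies the same idea to a single regular expression. It uses Python's re module on an in-memory buffer in place of Yara and Volatility address spaces, and the block and overlap sizes are arbitrary small values chosen for illustration.

```python
import re

# Small stand-in sizes for illustration; Volatility's constants.SCAN_BLOCKSIZE
# and yrscan's extra 1024 bytes are far larger
BLOCKSIZE = 16
OVERLAP = 8


def window_scan(buf, pattern, blocksize=BLOCKSIZE, overlap=OVERLAP):
    """Return absolute offsets of regex matches in buf, scanned block by block.

    Each read covers blocksize + overlap bytes, so a match that straddles a
    block boundary is still seen whole. Matches that start in the overlap are
    skipped because the next window starts there and will report them.
    """
    regex = re.compile(pattern)
    hits = []
    start = 0
    while start < len(buf):
        window = buf[start:start + blocksize + overlap]
        for m in regex.finditer(window):
            if m.start() < blocksize:
                hits.append(start + m.start())
        start += blocksize
    return hits


# "needle" at 14 straddles the first block boundary (16); "needle" at 33 sits
# in the second window's overlap and is reported only once, by the third window
buf = b"A" * 14 + b"needle" + b"B" * 13 + b"needle" + b"C" * 5
print(window_scan(buf, b"needle"))
```

Matches longer than the overlap can still be missed at a block boundary, in this sketch as in yrscan.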



### Unorthodox Use Case: Filtering Logs in Pandas

As stated, Yara is great at Finding Stuff. When you have a lot of data in a Pandas DataFrame, you'll want to filter it to find the important stuff. Pandas provides robust string matching and searching capabilities, but perhaps you already have detections defined as Yara sigs and want to apply them to this data. Perhaps your analysts are also more accustomed to writing Yara sigs than complex Pandas filtering conditions.

For this example, we load a handful of user-agent strings into a DataFrame and use Yara to match them against a set of rules. This example highlights one more feature of Yara and the %%yara magic: external variables. The Yara engine normally scans a single blob of data, but it can also be given named values, called external variables, as a dictionary. A DataFrame is naturally chunked into columns, so we can define one external variable for each column we want to match on.

In [12]:
import pandas as pd
from cStringIO import StringIO

In [13]:
useragentcsv = """useragent
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.65 Safari/537.36"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22"
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
"Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B144 Safari/8536.25"
"Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A403 Safari/8536.25"
Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0
Googlebot/2.1 (+http://www.google.com/bot.html)
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17"
"""

In [14]:
df = pd.read_csv(StringIO(useragentcsv))
df.head(5)

Out[14]:
useragent
0 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
2 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53...
3 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3...
4 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)...

In the cell below, the -e option to the %%yara magic specifies the external variable. Externals must be declared at compile time, which for the %%yara cell magic means when you write the cell. %%yara accepts multiple -e parameters as well as comma,separated,lists. External variables are referenced directly in the condition section rather than in a strings section. Make sure the external variable names match the DataFrame column names. (For more details, see the docstring by running %%yara? in a cell.)
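The magic's code for merging repeated -e options with comma-separated lists isn't shown. One plausible approach is argparse's `append` action followed by a split on commas. The `parse_externals` function below is a hypothetical sketch of that idea and is not code from yaramagic.py.

```python
import argparse
import shlex


def parse_externals(line):
    """Collect the rules name and external variable names from %%yara-style options.

    Hypothetical illustration: accepts repeated -e options as well as
    comma-separated lists, e.g. "-n uarules -e useragent,referer -e host".
    """
    parser = argparse.ArgumentParser(prog="yara", add_help=False)
    parser.add_argument("-n", dest="name")
    parser.add_argument("-e", dest="externals", action="append", default=[])
    args = parser.parse_args(shlex.split(line))

    names = []
    for chunk in args.externals:
        # Split "a,b,c" and drop empty pieces left by stray commas
        names.extend(part.strip() for part in chunk.split(",") if part.strip())
    return args.name, names


name, externals = parse_externals("-n uarules -e useragent,referer -e host")
print(name, externals)
```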

In [15]:
%%yara -n uarules -e useragent
rule iPad_iPhone
{
condition:
useragent contains "iPhone;" or
useragent contains "iPad;"
}

rule Chrome25Plus
{
condition:
useragent matches /Chrome\/((2[5-9])|3[0-9])/
}

Adding compiled rules as "uarules" to namespace
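Before running these rules over the DataFrame, it can help to sanity-check what the two conditions should flag. The sketch below approximates them in plain Python. It assumes Yara's `contains` behaves like a substring test and `matches` like an unanchored regular-expression search (`re.search`). It checks the logic against three user-agent strings from the CSV above and does not replace the compiled rules.

```python
import re

# Rough Python approximations of the two Yara conditions above
CHROME_25_PLUS = re.compile(r"Chrome/((2[5-9])|3[0-9])")


def ipad_iphone(useragent):
    # useragent contains "iPhone;" or useragent contains "iPad;"
    return "iPhone;" in useragent or "iPad;" in useragent


def chrome25plus(useragent):
    # useragent matches /Chrome\/((2[5-9])|3[0-9])/
    return CHROME_25_PLUS.search(useragent) is not None


samples = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.65 Safari/537.36",
    "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A403 Safari/8536.25",
]
for ua in samples:
    print(ipad_iphone(ua), chrome25plus(ua))
```

As written, the Chrome25Plus pattern only covers major versions 25 through 39, so a Chrome/40 user agent would not match.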



For the next step, we have to get a little creative. DataFrame.apply applies a function to the rows or columns of a DataFrame. It does not, however, let you specify parameters to the invoked function; the only parameter will be the data from the row or column. (In our example, we use axis=1, which tells pandas to pass the data row by row.)

To keep the flexibility to pass in various Yara rule sets (and to avoid global variables), we write a general yarafilter function that takes a compiled Yara rules object as a parameter. It builds the worker function that DataFrame.apply will accept, and that worker loads a subset of columns as external variables for the Yara engine. The worker returns a comma-separated list of the rules that matched each row, or an empty string if nothing matched.
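The closure pattern itself doesn't depend on Yara or pandas. In the sketch below, a hypothetical `match_rules` callable stands in for the compiled rules object: any function that takes a dict of external values and returns the names of the matching rules. Plain dicts stand in for DataFrame rows.

```python
def make_row_filter(match_rules, columns):
    """Build a one-argument worker, in the spirit of yarafilter.

    match_rules: hypothetical stand-in for compiled rules; takes a dict of
                 external values and returns a list of matching rule names
    columns: the row fields to hand over as external variables
    """
    def worker(row):
        externals = {name: row[name] for name in columns}
        # Comma-separated rule names, or '' when nothing matched
        return ','.join(match_rules(externals))
    return worker


# Toy matcher modelled on the iPad_iPhone rule (illustration only)
def toy_rules(externals):
    ua = externals['useragent']
    return ['iPad_iPhone'] if ('iPhone;' in ua or 'iPad;' in ua) else []


worker = make_row_filter(toy_rules, ['useragent'])
rows = [{'useragent': 'Googlebot/2.1 (+http://www.google.com/bot.html)'},
        {'useragent': 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 '
                      '(KHTML, like Gecko) Version/6.0 Mobile/10A403 Safari/8536.25'}]
print([worker(row) for row in rows])
```

With pandas, `df.apply(worker, axis=1)` would hand each row to worker as a Series, which supports the same `row[name]` lookup. `functools.partial` is another common way to pre-bind extra arguments to a function.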

In [16]:
def yarafilter(rules):

    # Specify a list of the column names we used as external variables
    # in the yara rules
    externals = ['useragent']

    def worker(row):
        # Yara needs some data to scan; these rules only look at the externals
        m = rules.match(data="  ", externals=row[externals].to_dict())

        if m:
            return ','.join( [y.get('rule', '') for y in m.get('main', [])] )
        else:
            return ''

    return worker

In [17]:
df['yarahits'] = df.apply(yarafilter(uarules), axis=1)

In [18]:
df

Out[18]:
useragent yarahits
0 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
2 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53... Chrome25Plus
3 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... Chrome25Plus
4 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)... Chrome25Plus
5 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
6 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... Chrome25Plus
7 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)... Chrome25Plus
8 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ...
9 Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Ma... iPad_iPhone
10 Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) A... iPad_iPhone
11 Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20...
12 Googlebot/2.1 (+http://www.google.com/bot.html)
13 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... Chrome25Plus
14 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...

© 2013 Lockheed Martin Corporation. All Rights Reserved.