Yara Cell Magic by Eric Hutchins
Yara is a powerful, flexible signature language. From the documentation (PDF), it is a tool "aimed at helping malware researchers to identify and classify malware families." Far beyond just malware, Yara provides a language for defining sets of rules, each with its own complex conditions, backed by robust regular expressions, match offset specifications, and many other features that, combined, make Yara a Swiss Army knife for finding stuff. Wherever you have stuff and want to find a complex set of conditions, Yara can help.
It also has a nice Python API. The typical way to use Yara in Python is to either read the rules from an external file or specify them in a Python string. Using one language (Python) to write another language (Yara) always frustrates me. At least Python's triple quote """ lets you write multi-line strings nicely.
import yara
yarasig = """rule helloworld
{
    strings:
        $a = "hello" ascii
        $b = "world" nocase
    condition:
        any of them
}"""
myrules = yara.compile(source=yarasig)
But still, I want to edit a blob of text for a language in a native editor. I want to see matching parentheses and brackets as I type them. I want to be able to bulk comment out lines. I want to see line numbers so I can debug Yara error messages. I want syntax highlighting.
IPython lets us accomplish this in a nice, clean fashion. I wrote a custom cell magic, %%yara, that turns a code cell into an inline Yara editor. Write rules with proper syntax highlighting (using a new CodeMirror mode for Yara that I also wrote). Run the cell, and Yara compiles the code and puts the compiled rules object back in your namespace. Then use it to Find Stuff.
Notebook Prerequisites
Modules
Custom
%%yara cell magic
Data
Hello World
from pprint import pprint
%%yara -n myrules
/*
    My first rule called "helloworld"
    with category "testing"
*/
rule helloworld : testing
{
    meta:
        version = "0.1"
    strings:
        $a = "hello" ascii
        $b = "world" nocase
    condition:
        all of them
}
Adding compiled rules as "myrules" to namespace
pprint( myrules.match_data(data="This is a hello WoRlD test") )
{'main': [{'matches': True,
           'meta': {'version': '0.1'},
           'rule': 'helloworld',
           'strings': [{'data': 'WoRlD',
                        'flags': 27,
                        'identifier': '$b',
                        'offset': 16L},
                       {'data': 'hello',
                        'flags': 19,
                        'identifier': '$a',
                        'offset': 10L}],
           'tags': ['testing']}]}
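Since the result is a plain dictionary keyed by namespace, it can be walked with ordinary Python. Here is a minimal sketch that flattens a result of that shape into one tuple per string hit, sorted by offset. The flatten_hits helper is a hypothetical name of mine, and the sample dictionary is copied from the output above (with the Python 2 L suffixes dropped) rather than produced by a live scan.

```python
def flatten_hits(result):
    """Flatten a match result into (offset, rule, identifier, data) tuples.

    Assumes the {'namespace': [match, ...]} shape shown in the output above.
    """
    rows = []
    for namespace, matches in result.items():
        for match in matches:
            if not match.get('matches'):
                continue
            for hit in match.get('strings', []):
                rows.append((hit['offset'], match['rule'],
                             hit['identifier'], hit['data']))
    return sorted(rows)


# Sample copied from the helloworld output above, not from a live scan.
sample = {'main': [{'matches': True,
                    'meta': {'version': '0.1'},
                    'rule': 'helloworld',
                    'strings': [{'data': 'WoRlD', 'flags': 27,
                                 'identifier': '$b', 'offset': 16},
                                {'data': 'hello', 'flags': 19,
                                 'identifier': '$a', 'offset': 10}],
                    'tags': ['testing']}]}

for offset, rule, identifier, data in flatten_hits(sample):
    print('%s %s at offset %d: %r' % (rule, identifier, offset, data))
```

Sorting by offset puts the hits in the order they appear in the scanned data, which is usually easier to read than the order the engine reports them in.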
To help debug a rule, toggle the line numbers in your %%yara cell by typing Ctrl-m-l. The yara CodeMirror mode starts line numbering at 0 so that the line number in Yara's error message matches the notebook display.
%%yara
rule badrule
{
    strings:
        $a = "hello"
    conditio: //<-- oops
        all of them
}
Syntax error! <undef>:5: syntax error, unexpected _IDENTIFIER_, expecting _CONDITION_
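To make the line-number correspondence concrete, here is a small sketch that pulls the line number out of an error message of the shape shown above and looks up the offending line in the cell text. The locate_error helper is hypothetical and is not part of the %%yara magic. It assumes the magic line occupies line 0 of the cell, so Yara's 1-based line numbers index straight into the cell's lines.

```python
import re


def locate_error(cell_text, error_message):
    """Return (line_number, offending_line, message), or None if unparseable."""
    # Looks for the ":<line>:" part of "<source>:<line>: <message>".
    found = re.search(r':(\d+):\s*(.*)$', error_message)
    if found is None:
        return None
    line_number = int(found.group(1))
    lines = cell_text.splitlines()
    # Line 0 is the %%yara magic line, so the rule text starts at line 1.
    if line_number >= len(lines):
        return None
    return line_number, lines[line_number], found.group(2)


cell = """%%yara
rule badrule
{
    strings:
        $a = "hello"
    conditio: //<-- oops
        all of them
}"""
error = ("Syntax error! <undef>:5: syntax error, "
         "unexpected _IDENTIFIER_, expecting _CONDITION_")

print(locate_error(cell, error))
```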
The Volatility Framework has a built-in YaraScan plugin, but this only accepts an external yara rule file or a plain string. In order to use our compiled rules, I'm hacking together pieces from the various classes under YaraScan for this demonstration. I'm using the same ds_fuzz_hidden_proc.img sample from Volatility's public memory images page.
The following code will flag processes that match a specific Yara rules object.
# Base Volatility import
import volatility.conf as conf
import volatility.registry as registry
import volatility.commands as commands
import volatility.addrspace as addrspace
registry.PluginImporter()
config = conf.ConfObject()
registry.register_global_options(config, commands.Command)
registry.register_global_options(config, addrspace.BaseAddressSpace)
cmds = registry.get_plugin_classes(commands.Command, lower = True)
# Use-case specific imports
from volatility.plugins.filescan import PSScan
import volatility.utils as utils
import volatility.constants as constants
config.PROFILE = "WinXPSP2x86"
config.LOCATION = "file:///c:/ds_fuzz_hidden_proc.img"
# A simplified (and surely imperfect) merging of code from YaraScan, VadYaraScanner, and BaseYaraScanner from volatility.malfind
def yrscan(task, rules, contextsize=16):
    results = []
    for vad, address_space in task.get_vads():
        offset = vad.Start
        maxlen = vad.Length
        # Start scanning from offset until maxlen:
        i = offset
        while i < offset + maxlen:
            # Read one block plus 1024 bytes of overlap and match it.
            to_read = min(constants.SCAN_BLOCKSIZE + 1024, offset + maxlen - i)
            data = address_space.zread(i, to_read)
            if data:
                for match in rules.match_data(data).get('main', []):
                    # Skip matches with hits in the overlap; the next block reports them.
                    if all([hit['offset'] < constants.SCAN_BLOCKSIZE for hit in match.get('strings', [])]):
                        results.append((i, match))
            i += constants.SCAN_BLOCKSIZE
    return results
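The overlapping-block idea in yrscan is easier to see on a toy buffer. In this sketch a compiled regular expression stands in for the Yara rules, and the tiny block_size and overlap values are made up for illustration; scan_in_blocks is a hypothetical helper, not Volatility code. Each window reads a little past the block boundary so a match that straddles the boundary is still seen, and a match is only reported by the window whose block it starts in.

```python
import re


def scan_in_blocks(data, pattern, block_size=16, overlap=8):
    """Scan data in overlapping windows; return (absolute_offset, match) pairs."""
    results = []
    i = 0
    while i < len(data):
        # Read one block plus some overlap from the following block.
        window = data[i:i + block_size + overlap]
        for found in pattern.finditer(window):
            # Matches that start in the overlap are left for the next window.
            if found.start() < block_size:
                results.append((i + found.start(), found.group()))
        i += block_size
    return results


# The first 'needle' straddles the boundary at offset 16.
data = b'.' * 12 + b'needle' + b'.' * 20 + b'needle' + b'.' * 5
pattern = re.compile(b'needle')

print(scan_in_blocks(data, pattern))
```

A match longer than the overlap that starts near the end of a block would still be missed, which is the same trade-off yrscan makes with its 1024-byte overlap. Note also that yrscan drops a whole rule match if any of its string hits falls in the overlap, while this sketch filters hit by hit.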
%%yara -n volarules
rule exe_on_desktop
{
    // Look for files on the Desktop that end in .exe
    strings:
        $a = /\\Desktop\\[\w .-]{1,20}\.exe/ nocase
    condition:
        all of them
}
Adding compiled rules as "volarules" to namespace
ps = PSScan(config)
for task in ps.calculate():
    # Pass the compiled rules object to our method yrscan
    hits = yrscan(task, volarules)
    if not hits:
        # Nothing matched in this process; move on to the next one
        continue
    print '-----------------------------------'
    print 'Process name: %s' % task.ImageFileName
    print 'PID: %s' % task.UniqueProcessId
    print 'PPID: %s' % task.InheritedFromUniqueProcessId
    print 'Create time: %s' % (task.CreateTime or '')
    print 'Exit time: %s' % (task.ExitTime or '')
    for addr, hit in hits:
        print '> Rule name: %s' % hit.get('rule')
        for string in hit.get('strings', []):
            # Modified from original
            # https://code.google.com/p/volatility/source/browse/tags/Volatility-2.1.0/volatility/plugins/malware/malfind.py#481
            print "".join(
                ["{0:#010x} {1:<48} {2}\n".format(string.get('offset') + addr + o, h, ''.join(c))
                 for o, h, c in utils.Hexdump(string.get('data', ''))
                ])
-----------------------------------
Process name: wuauclt.exe
PID: 1372
PPID: 1064
Create time: 2008-11-26 07:39:38
Exit time:
> Rule name: exe_on_desktop
0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
> Rule name: exe_on_desktop
0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
> Rule name: exe_on_desktop
0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
-----------------------------------
Process name: explorer.exe
PID: 1516
PPID: 1452
Create time: 2008-11-26 07:38:27
Exit time:
> Rule name: exe_on_desktop
0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
> Rule name: exe_on_desktop
0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
-----------------------------------
Process name: svchost.exe
PID: 844
PPID: 672
Create time: 2008-11-26 07:38:18
Exit time:
> Rule name: exe_on_desktop
0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
-----------------------------------
Process name: svchost.exe
PID: 1064
PPID: 672
Create time: 2008-11-26 07:38:20
Exit time:
> Rule name: exe_on_desktop
0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
> Rule name: exe_on_desktop
0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
> Rule name: exe_on_desktop
0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b  \Desktop\network
0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65           _listener.exe
As stated, Yara is great at Finding Stuff. When you have a lot of data in a Pandas DataFrame, you'll want to filter it to find the important stuff. Pandas provides robust string matching and searching capabilities, but perhaps you already have detections defined in Yara sigs and want to apply those sigs to this data. Perhaps your analysts are more accustomed to writing Yara sigs than complex Pandas filtering conditions.
For this example, we load a handful of user-agent strings into a DataFrame and use Yara to match them against a set of rules. This example highlights one more feature of Yara and the %%yara magic: external variables. The Yara engine normally scans a single blob of data, but it can also take named values, passed in as a dictionary, called external variables. Since a DataFrame is intuitively chunked into columns, we can specify an external variable for each column in the DataFrame.
import pandas as pd
from cStringIO import StringIO
useragentcsv = """useragent
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.65 Safari/537.36"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22"
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
"Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B144 Safari/8536.25"
"Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A403 Safari/8536.25"
Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0
Googlebot/2.1 (+http://www.google.com/bot.html)
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17"
"""
df = pd.read_csv(StringIO(useragentcsv))
df.head(5)
 | useragent |
---|---|
0 | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... |
1 | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... |
2 | Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53... |
3 | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... |
4 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)... |
In the cell below, the -e option to the %%yara magic specifies the external variable. The externals must be specified at the time of compilation, which, for the %%yara cell magic, means at write-time. %%yara accepts multiple -e parameters as well as comma,separated,lists. External variables are referenced directly in the condition block rather than in a strings statement. Ensure the external variable names match the column names. (For more details, see the docstring by running %%yara? in a cell.)
%%yara -n uarules -e useragent
rule iPad_iPhone
{
    condition:
        useragent contains "iPhone;" or
        useragent contains "iPad;"
}
rule Chrome25Plus
{
    condition:
        useragent matches /Chrome\/((2[5-9])|3[0-9])/
}
Adding compiled rules as "uarules" to namespace
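For readers curious how a magic line such as "-n uarules -e useragent" could be turned into a rule name and a list of external names, here is a rough sketch using argparse. It is not the actual %%yara implementation, which is not shown here; the parse_yara_magic_line function and its handling of defaults are assumptions for illustration only.

```python
import argparse


def parse_yara_magic_line(line):
    """Parse '-n NAME -e a,b -e c' into (name, [external names]).

    Hypothetical helper; the real %%yara magic may parse its line differently.
    """
    parser = argparse.ArgumentParser(prog='%%yara', add_help=False)
    parser.add_argument('-n', dest='name', default=None)
    # -e may be repeated, and each value may itself be a comma-separated list.
    parser.add_argument('-e', dest='externals', action='append', default=None)
    args = parser.parse_args(line.split())

    names = []
    for chunk in args.externals or []:
        names.extend(name for name in chunk.split(',') if name)
    return args.name, names


print(parse_yara_magic_line('-n uarules -e useragent'))
print(parse_yara_magic_line('-e useragent,referer -e host'))
```

Because the names are known before the rules are compiled, they can be declared to the compiler up front, which is what lets the condition block refer to useragent directly.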
For the next method, we have to get a little creative. DataFrame.apply provides the means to apply a function to rows or columns of the DataFrame. The function it invokes receives the row or column data as its argument, so rather than threading extra parameters through apply, we build a worker function that already knows which rules to use. (In our example, we use axis=1, which tells pandas to pass the data row by row.)
In order to have the flexibility to pass in various Yara rule files (and avoid global variables), we have a general yarafilter method that takes a compiled Yara rules object as a parameter. The function generates the worker function that DataFrame.apply will accept and preloads a subset of columns as external variables for the Yara engine. The worker returns a comma-separated list of the rules that matched on each row, or an empty string if there were no matches.
def yarafilter(rules):
    # Specify a list of the column names we used for external variables
    # in the yara rules
    externals = ['useragent']
    def worker(row):
        # The scanned data is a dummy blob; the rules only look at the externals
        m = rules.match(data=" ", externals=row[externals].to_dict())
        if m:
            return ','.join([y.get('rule', '') for y in m.get('main', [])])
        else:
            return ''
    return worker
df['yarahits'] = df.apply(yarafilter(uarules), axis=1)
df
 | useragent | yarahits |
---|---|---|
0 | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | |
1 | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | |
2 | Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53... | Chrome25Plus |
3 | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... | Chrome25Plus |
4 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)... | Chrome25Plus |
5 | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | |
6 | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | Chrome25Plus |
7 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)... | Chrome25Plus |
8 | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ... | |
9 | Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Ma... | iPad_iPhone |
10 | Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) A... | iPad_iPhone |
11 | Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20... | |
12 | Googlebot/2.1 (+http://www.google.com/bot.html) | |
13 | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... | Chrome25Plus |
14 | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | |
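The closure pattern behind yarafilter does not depend on Yara or pandas. As a minimal sketch, the version below uses compiled regular expressions as stand-ins for the two Yara rules above and plain dictionaries as stand-ins for DataFrame rows; make_row_filter and the shortened user-agent strings are illustrative assumptions, not part of the notebook.

```python
import re


def make_row_filter(rules, columns):
    """Build a one-argument worker that remembers `rules` and `columns`."""
    def worker(row):
        hits = []
        for rule_name, column, pattern in rules:
            # Only consult the columns that were preloaded for matching.
            if column in columns and pattern.search(row.get(column, '')):
                hits.append(rule_name)
        return ','.join(hits)
    return worker


# Regex stand-ins for the iPad_iPhone and Chrome25Plus rules above.
ua_rules = [
    ('iPad_iPhone', 'useragent', re.compile(r'iPhone;|iPad;')),
    ('Chrome25Plus', 'useragent', re.compile(r'Chrome/((2[5-9])|3[0-9])')),
]

# Shortened user-agent strings, used here as plain dict "rows".
rows = [
    {'useragent': 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26'},
    {'useragent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 Chrome/29.0.1547.66'},
    {'useragent': 'Googlebot/2.1 (+http://www.google.com/bot.html)'},
]

worker = make_row_filter(ua_rules, ['useragent'])
print([worker(row) for row in rows])
```

The worker takes a single argument, so it can be handed to anything that calls a function once per row, while the rules travel along inside the closure instead of living in a global.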
© 2013 Lockheed Martin Corporation. All Rights Reserved.