Volatility memory analysis notebook by Eric Hutchins
The Volatility Framework is a powerful and flexible library for analyzing volatile memory (e.g., memory dumps). The primary way analysts use this framework is to run the vol.py script from the terminal with various plugins and parameters, which prints the results of the command to stdout.
$ python vol.py -f ds_fuzz_hidden_proc.img --profile=WinXPSP2x86 psscan
Offset(P) Name PID PPID PDB Time created Time exited
---------- ---------------- ------ ------ ---------- -------------------- --------------------
0x0181b748 alg.exe 992 660 0x08140260 2008-11-15 23:43:25
0x01843b28 wuauclt.exe 1372 1064 0x08140180 2008-11-26 07:39:38
This is a quintessential use case for IPython Notebook: a place to document the commands you run, the results of those commands, and markup that explains the methodology and significance. In other words: your full analysis! Furthermore, as we show at the end, rich inline images make IPython Notebook a fantastic way to guide and document memory analysis.
The downside is that, by the developers' own admission, using the tools as a library is not perfect. Be prepared to read some source code.
"Although it's possible to use Volatility as a library, we hope to support it better in the future."
It is, however, very exciting to see this on the roadmap:
Interactive IPython shell
Notebook Prerequisites
Modules
Data
from cStringIO import StringIO
# Imports following example from
# https://code.google.com/p/volatility/wiki/BasicUsage21#Using_Volatility_as_a_Library
import volatility.conf as conf
import volatility.registry as registry
import volatility.commands as commands
import volatility.addrspace as addrspace
import volatility.utils as utils
import volatility.win32.network as network
import volatility.plugins.taskmods as taskmods
import volatility.plugins.vadinfo as vadinfo
registry.PluginImporter()
config = conf.ConfObject()
registry.register_global_options(config, commands.Command)
registry.register_global_options(config, addrspace.BaseAddressSpace)
# You can print the cmds dictionary to see list of available plugins
# These are the same commands you would specify to the command line vol.py script
cmds = registry.get_plugin_classes(commands.Command, lower = True)
# These parameters simulate the command line settings "--profile" and "-f" respectively
config.PROFILE = "WinXPSP2x86"
config.LOCATION = "file:///c:/ds_fuzz_hidden_proc.img"
The PSScan module in The Volatility Framework scans physical memory for EPROCESS allocations. This method can discover processes that have been hidden from, or removed from, the normal process list.
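To make the scanning idea concrete, here is a toy sketch in plain Python. It walks a byte buffer and reports every offset where a 4-byte tag occurs. The tag value `b"Pro\xe3"` is an assumption here: it stands in for the pool tag of process allocations on Windows XP. Volatility's real scanner also validates the pool header and the EPROCESS fields before it accepts a hit, so this illustrates signature scanning in general, not Volatility's implementation.

```python
from typing import List


def scan_for_tag(buf: bytes, tag: bytes) -> List[int]:
    """Return every offset in buf where tag occurs (overlapping matches included)."""
    hits = []
    pos = buf.find(tag)
    while pos != -1:
        hits.append(pos)
        pos = buf.find(tag, pos + 1)  # resume one byte later so no match is skipped
    return hits


# Assumed pool tag for process allocations on Windows XP (illustrative only).
PROCESS_TAG = b"Pro\xe3"

# A fake "physical memory" image: two tagged allocations separated by filler bytes.
memory = b"\x00" * 16 + PROCESS_TAG + b"\x00" * 12 + PROCESS_TAG + b"\x00" * 8
print(scan_for_tag(memory, PROCESS_TAG))
```

The scan never consults any list maintained by the operating system. That is why it can also turn up structures that have been unlinked from that list or have already been freed.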
from volatility.plugins.filescan import PSScan
import pandas as pd
Here is the common way to invoke plugins. First, instantiate the plugin by passing it the config. Each plugin should provide a calculate method and a render_[something] method; the most common renderer is render_text. Since Volatility is primarily intended to be run stand-alone from a terminal, its renderers expect to write output to a file-like object (such as stdout). In our use case, we want to capture that output in an in-memory buffer instead, using the StringIO library.
This approach should work for most Volatility plugins. You will have to check the code for each module to see how to populate additional config parameters as needed.
ps = PSScan(config)
pstable = StringIO()
psdata = ps.calculate()
ps.render_text(pstable, psdata)
print pstable.getvalue()
Offset(P) Name PID PPID PDB Time created Time exited
---------- ---------------- ------ ------ ---------- -------------------- --------------------
0x0181b748 alg.exe 992 660 0x08140260 2008-11-15 23:43:25
0x01843b28 wuauclt.exe 1372 1064 0x08140180 2008-11-26 07:39:38
0x0184e3a8 wscntfy.exe 560 1064 0x081402a0 2008-11-26 07:44:57
0x018557e0 alg.exe 512 672 0x08140260 2008-11-26 07:38:53
0x0185dda0 cmd.exe 940 1516 0x081401a0 2008-11-26 07:43:39 2008-11-26 07:45:49
0x018a13c0 VMwareService.e 1756 672 0x08140220 2008-11-26 07:38:45
0x018af448 VMwareUser.exe 1904 1516 0x08140100 2008-11-26 07:38:31
0x018af860 VMwareTray.exe 1896 1516 0x08140200 2008-11-26 07:38:31
0x018e75e8 spoolsv.exe 1648 672 0x081401e0 2008-11-26 07:38:28
0x019456e8 csrss.exe 592 360 0x08140040 2008-11-15 23:42:56
0x01946020 svchost.exe 828 660 0x081400c0 2008-11-15 23:42:57
0x019467e0 services.exe 660 616 0x08140080 2008-11-15 23:42:56
0x0194f658 svchost.exe 1016 660 0x08140100 2008-11-15 23:42:57
0x019533c8 svchost.exe 924 660 0x081400e0 2008-11-15 23:42:57
0x019ca478 explorer.exe 1516 1452 0x081401c0 2008-11-26 07:38:27
0x019dbc30 lsass.exe 684 620 0x081400a0 2008-11-26 07:38:15
0x019e4670 smss.exe 360 4 0x08140020 2008-11-26 07:38:11
0x019f7da0 svchost.exe 1164 672 0x08140140 2008-11-26 07:38:23
0x01a0e6f0 svchost.exe 1264 672 0x08140160 2008-11-26 07:38:25
0x01a1bd78 csrss.exe 596 360 0x08140040 2008-11-26 07:38:13
0x01a2b100 winlogon.exe 620 360 0x08140060 2008-11-26 07:38:14
0x01a3ba78 services.exe 672 620 0x08140080 2008-11-26 07:38:15
0x01a3d360 svchost.exe 932 672 0x081400e0 2008-11-26 07:38:18
0x01a59d70 svchost.exe 844 672 0x081400c0 2008-11-26 07:38:18
0x01aa2300 svchost.exe 1064 672 0x08140120 2008-11-26 07:38:20
0x01bcc830 System 4 0 0x00319000
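You can try the calculate/render_text contract without a memory image. The sketch below uses a hypothetical `FakeProcessPlugin` class, which is not part of Volatility. Its methods only mimic the shape described above: `calculate()` yields records, and `render_text(outfd, data)` writes them to any file-like object. It uses Python 3's `io.StringIO` in place of `cStringIO`.

```python
import io


class FakeProcessPlugin:
    """Hypothetical stand-in that mimics a Volatility plugin's calculate/render_text pair."""

    def __init__(self, config):
        self.config = config

    def calculate(self):
        # A real plugin would walk or scan memory; here we yield canned (name, pid, ppid) tuples.
        yield ("System", 4, 0)
        yield ("smss.exe", 360, 4)

    def render_text(self, outfd, data):
        # Write a header and one line per record to whatever file-like object we are given.
        outfd.write("{:<16} {:>6} {:>6}\n".format("Name", "PID", "PPID"))
        for name, pid, ppid in data:
            outfd.write("{:<16} {:>6} {:>6}\n".format(name, pid, ppid))


plugin = FakeProcessPlugin(config={})
buf = io.StringIO()  # in-memory buffer instead of stdout
plugin.render_text(buf, plugin.calculate())
text = buf.getvalue()
print(text)
```

Because the renderer only calls `write()` on the object it receives, any file-like target works: stdout, an open file, or an in-memory buffer.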
It would be nice to load the PSScan output into a data structure for sorting/filtering/etc. We can loop through the task list ourselves and extract and normalize the key parameters into a dictionary. Pandas can convert a list of dicts into a DataFrame trivially. And since our PSScan object is already in memory and Volatility has its own cache, traversing these objects again is very fast.
taskinfo = []
for task in ps.calculate():
info = {}
info['Name'] ='%s' % task.ImageFileName
info['PID'] = '%i' % task.UniqueProcessId
info['PPID'] = '%i' % task.InheritedFromUniqueProcessId
info['Threads'] = '%s' % task.ActiveThreads
info['HandleCount'] = '%s' % task.ObjectTable.HandleCount
info['SessionID'] = '%s' % task.SessionId
info['Wow64'] = '%s' % task.IsWow64
info['Start'] = str(task.CreateTime or '')
info['Exit'] = str(task.ExitTime or '')
taskinfo.append(info)
No handlers could be found for logger "volatility.obj"
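This same loop is repeated later for PSList, so it could be factored into one helper. Below is a minimal sketch of that refactor. The helper name `task_to_row` is hypothetical, and `types.SimpleNamespace` objects stand in for Volatility's EPROCESS objects; only the attribute names come from the loop above.

```python
from types import SimpleNamespace

COLUMNS = ['Name', 'PID', 'PPID', 'Threads', 'HandleCount',
           'SessionID', 'Wow64', 'Start', 'Exit']


def task_to_row(task):
    """Flatten one process object into a dict of strings, mirroring the loop above."""
    return {
        'Name': '%s' % task.ImageFileName,
        'PID': '%i' % task.UniqueProcessId,
        'PPID': '%i' % task.InheritedFromUniqueProcessId,
        'Threads': '%s' % task.ActiveThreads,
        'HandleCount': '%s' % task.ObjectTable.HandleCount,
        'SessionID': '%s' % task.SessionId,
        'Wow64': '%s' % task.IsWow64,
        'Start': str(task.CreateTime or ''),  # missing timestamps become ''
        'Exit': str(task.ExitTime or ''),
    }


# Hypothetical stand-in for an EPROCESS object, using the System process's values from this image.
system = SimpleNamespace(
    ImageFileName='System', UniqueProcessId=4, InheritedFromUniqueProcessId=0,
    ActiveThreads=51, ObjectTable=SimpleNamespace(HandleCount=254),
    SessionId='', IsWow64=False, CreateTime=None, ExitTime=None)

rows = [task_to_row(t) for t in [system]]
print(rows[0])
```

With the helper in place, each loop collapses to a single line, e.g. `taskinfo = [task_to_row(t) for t in ps.calculate()]`.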
Convert the list of dicts into a DataFrame. First we specify the column ordering (otherwise pandas may order the columns alphabetically). Second, we set the index of the table to the PID of each process. Then, for fun, we interpret the Start and Exit timestamps as actual datetime objects so we can sort the output DataFrame by Start time.
psscandf = pd.DataFrame(taskinfo, columns=['Name', 'PID', 'PPID', 'Threads', 'HandleCount', 'SessionID', 'Wow64', 'Start', 'Exit'])
psscandf.index = psscandf.PID
psscandf['Start'] = pd.to_datetime(psscandf['Start'])
psscandf['Exit'] = pd.to_datetime(psscandf['Exit'])
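# DataFrame.sort() was removed in newer pandas; sort_values() replaces it.
# na_position='first' keeps rows without a start time (NaT) at the top, as in the output below.
psscandf.sort_values('Start', na_position='first')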
| PID | Name | PID | PPID | Threads | HandleCount | SessionID | Wow64 | Start | Exit |
|---|---|---|---|---|---|---|---|---|---|
| 4 | System | 4 | 0 | 51 | 254 | | False | NaT | NaT |
| 660 | services.exe | 660 | 616 | 15 | -2121378248 | | False | 2008-11-15 23:42:56 | NaT |
| 592 | csrss.exe | 592 | 360 | 10 | 131072 | | False | 2008-11-15 23:42:56 | NaT |
| 828 | svchost.exe | 828 | 660 | 14 | 0 | | False | 2008-11-15 23:42:57 | NaT |
| 1016 | svchost.exe | 1016 | 660 | 51 | 0 | | False | 2008-11-15 23:42:57 | NaT |
| 924 | svchost.exe | 924 | 660 | 7 | 0 | | False | 2008-11-15 23:42:57 | NaT |
| 992 | alg.exe | 992 | 660 | 5 | 4784160 | | False | 2008-11-15 23:43:25 | NaT |
| 360 | smss.exe | 360 | 4 | 3 | 19 | | False | 2008-11-26 07:38:11 | NaT |
| 596 | csrss.exe | 596 | 360 | 10 | 322 | | False | 2008-11-26 07:38:13 | NaT |
| 620 | winlogon.exe | 620 | 360 | 16 | 503 | | False | 2008-11-26 07:38:14 | NaT |
| 672 | services.exe | 672 | 620 | 15 | 245 | | False | 2008-11-26 07:38:15 | NaT |
| 684 | lsass.exe | 684 | 620 | 21 | 347 | | False | 2008-11-26 07:38:15 | NaT |
| 932 | svchost.exe | 932 | 672 | 10 | 229 | | False | 2008-11-26 07:38:18 | NaT |
| 844 | svchost.exe | 844 | 672 | 19 | 198 | | False | 2008-11-26 07:38:18 | NaT |
| 1064 | svchost.exe | 1064 | 672 | 63 | 1308 | | False | 2008-11-26 07:38:20 | NaT |
| 1164 | svchost.exe | 1164 | 672 | 5 | 77 | | False | 2008-11-26 07:38:23 | NaT |
| 1264 | svchost.exe | 1264 | 672 | 14 | 209 | | False | 2008-11-26 07:38:25 | NaT |
| 1516 | explorer.exe | 1516 | 1452 | 12 | 362 | | False | 2008-11-26 07:38:27 | NaT |
| 1648 | spoolsv.exe | 1648 | 672 | 12 | 112 | | False | 2008-11-26 07:38:28 | NaT |
| 1904 | VMwareUser.exe | 1904 | 1516 | 1 | 28 | | False | 2008-11-26 07:38:31 | NaT |
| 1896 | VMwareTray.exe | 1896 | 1516 | 1 | 26 | | False | 2008-11-26 07:38:31 | NaT |
| 1756 | VMwareService.e | 1756 | 672 | 3 | 45 | | False | 2008-11-26 07:38:45 | NaT |
| 512 | alg.exe | 512 | 672 | 6 | 105 | | False | 2008-11-26 07:38:53 | NaT |
| 1372 | wuauclt.exe | 1372 | 1064 | 8 | 225 | | False | 2008-11-26 07:39:38 | NaT |
| 940 | cmd.exe | 940 | 1516 | 0 | | | False | 2008-11-26 07:43:39 | 2008-11-26 07:45:49 |
| 560 | wscntfy.exe | 560 | 1064 | 1 | 31 | | False | 2008-11-26 07:44:57 | NaT |
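Here is the same "parse the timestamps, then sort by Start" step done with the standard library. It is a sketch that assumes timestamps look exactly like those in the table above (`YYYY-MM-DD HH:MM:SS`). Other Volatility versions may append a timezone suffix, which this format string would reject. Empty strings become `None`, playing the role of `NaT`.

```python
from datetime import datetime

rows = [
    {'Name': 'smss.exe', 'PID': '360', 'Start': '2008-11-26 07:38:11'},
    {'Name': 'System', 'PID': '4', 'Start': ''},
    {'Name': 'csrss.exe', 'PID': '592', 'Start': '2008-11-15 23:42:56'},
]


def parse_time(text):
    """Parse 'YYYY-MM-DD HH:MM:SS'; an empty string becomes None (like NaT)."""
    text = text.strip()
    return datetime.strptime(text, '%Y-%m-%d %H:%M:%S') if text else None


def start_key(row):
    # Rows without a start time sort first, mirroring where NaT landed in the table above.
    start = parse_time(row['Start'])
    return (start is not None, start or datetime.min)


for row in sorted(rows, key=start_key):
    print(row['PID'], row['Name'], row['Start'])
```

The tuple key makes the placement of missing values explicit, much as `na_position` does in pandas.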
With the data in Pandas, we can filter with conditions like: find all child processes of processes called svchost.exe. First we filter the psscandf by Name and extract the unique PID values. Then we can go back to the dataframe and filter for any PPIDs that exist in that list.
svchostpids = psscandf.loc[psscandf['Name'] == 'svchost.exe']['PID'].unique()
svchostpids
array(['828', '1016', '924', '1164', '1264', '932', '844', '1064'], dtype=object)
psscandf.loc[psscandf['PPID'].isin(svchostpids)]
| PID | Name | PID | PPID | Threads | HandleCount | SessionID | Wow64 | Start | Exit |
|---|---|---|---|---|---|---|---|---|---|
| 1372 | wuauclt.exe | 1372 | 1064 | 8 | 225 | | False | 2008-11-26 07:39:38 | NaT |
| 560 | wscntfy.exe | 560 | 1064 | 1 | 31 | | False | 2008-11-26 07:44:57 | NaT |
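The two-step filter is easy to express without pandas. This sketch uses a small subset of rows copied from the tables in this notebook: first build a set of svchost.exe PIDs, then keep every row whose PPID falls in that set.

```python
rows = [
    {'Name': 'services.exe', 'PID': '672', 'PPID': '620'},
    {'Name': 'svchost.exe', 'PID': '1064', 'PPID': '672'},
    {'Name': 'svchost.exe', 'PID': '844', 'PPID': '672'},
    {'Name': 'wuauclt.exe', 'PID': '1372', 'PPID': '1064'},
    {'Name': 'wscntfy.exe', 'PID': '560', 'PPID': '1064'},
    {'Name': 'explorer.exe', 'PID': '1516', 'PPID': '1452'},
]

# Step 1: collect the PIDs of every process named svchost.exe.
svchost_pids = {r['PID'] for r in rows if r['Name'] == 'svchost.exe'}
# Step 2: keep rows whose parent PID is in that set.
children = [r for r in rows if r['PPID'] in svchost_pids]
print([(r['Name'], r['PID']) for r in children])
```

Using a set for step 1 keeps each membership test in step 2 constant-time, which is the same role `isin` plays above.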
Or accomplish the same thing by joining the table to itself SQL-style.
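svchosts = psscandf.loc[psscandf['Name'] == 'svchost.exe']
# Drop the PID index on the left side first: newer pandas refuses to merge on
# 'PID' when it is both an index level name and a column label
svchosts.reset_index(drop=True).merge(psscandf,
                                      left_on=['PID'],
                                      right_on=['PPID'],
                                      suffixes=('_parent', '_child'))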
| | Name_parent | PID_parent | PPID_parent | Threads_parent | HandleCount_parent | SessionID_parent | Wow64_parent | Start_parent | Exit_parent | Name_child | PID_child | PPID_child | Threads_child | HandleCount_child | SessionID_child | Wow64_child | Start_child | Exit_child |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | svchost.exe | 1064 | 672 | 63 | 1308 | | False | 2008-11-26 07:38:20 | NaT | wuauclt.exe | 1372 | 1064 | 8 | 225 | | False | 2008-11-26 07:39:38 | NaT |
| 1 | svchost.exe | 1064 | 672 | 63 | 1308 | | False | 2008-11-26 07:38:20 | NaT | wscntfy.exe | 560 | 1064 | 1 | 31 | | False | 2008-11-26 07:44:57 | NaT |
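Under the hood, a self-join on `PID == PPID` can be done as a hash join: index the children by PPID, then probe that index with each parent's PID. The helper `self_join` below is a hypothetical illustration of the technique, not how pandas implements `merge`. The rows are a subset copied from this notebook's tables.

```python
from collections import defaultdict

rows = [
    {'Name': 'services.exe', 'PID': '672', 'PPID': '620'},
    {'Name': 'svchost.exe', 'PID': '1064', 'PPID': '672'},
    {'Name': 'svchost.exe', 'PID': '844', 'PPID': '672'},
    {'Name': 'wuauclt.exe', 'PID': '1372', 'PPID': '1064'},
    {'Name': 'wscntfy.exe', 'PID': '560', 'PPID': '1064'},
]


def self_join(rows, parent_name):
    """Pair each process named parent_name with each of its children (inner join on PID == PPID)."""
    by_ppid = defaultdict(list)  # build side: children grouped by their parent PID
    for r in rows:
        by_ppid[r['PPID']].append(r)
    pairs = []
    for parent in rows:  # probe side: only parents with the requested name
        if parent['Name'] != parent_name:
            continue
        for child in by_ppid.get(parent['PID'], []):
            pairs.append((parent['Name'], parent['PID'], child['Name'], child['PID']))
    return pairs


print(self_join(rows, 'svchost.exe'))
```

Like an inner join, a parent with no children (svchost.exe PID 844 here) produces no output rows.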
The whole purpose of the sample image ds_fuzz_hidden_proc.img is to illustrate hidden processes. The PSScan module we ran above scans physical memory for process structures, so it can find processes whether or not they are linked into the operating system's active process list. Another module, PSList, walks that active process list instead, so it shows the processes you would see in Task Manager. Anything in PSScan that isn't in PSList is therefore a candidate hidden process.
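To see why a list walk can miss a process that a scan finds, here is a toy model in plain Python. It is a deliberate simplification. The addresses and PIDs are made up, and the list is singly linked and terminated by 0, whereas Windows links EPROCESS structures through a doubly linked, circular list. Unlinking an entry from the list (often called DKOM, direct kernel object manipulation) hides it from the walk. The structure itself stays in memory, where a scan can still find it.

```python
# Hypothetical toy model: each "structure" lives at an address and records its PID
# and the address of the next structure in the active-process list (0 ends the list).
memory = {
    0x100: {'pid': 4, 'next': 0x200},
    0x200: {'pid': 100, 'next': 0x300},
    0x300: {'pid': 200, 'next': 0x400},
    0x400: {'pid': 300, 'next': 0},
}


def unlink(memory, prev_addr, victim_addr):
    """Hide a structure by pointing its predecessor past it; the structure stays in memory."""
    memory[prev_addr]['next'] = memory[victim_addr]['next']


def walk_list(memory, head):
    """PSList-style: follow next pointers from the head of the list."""
    pids, addr = [], head
    while addr:
        pids.append(memory[addr]['pid'])
        addr = memory[addr]['next']
    return pids


def scan_all(memory):
    """PSScan-style: visit every structure in memory, linked or not."""
    return sorted(s['pid'] for s in memory.values())


unlink(memory, 0x200, 0x300)  # hide PID 200
print(walk_list(memory, 0x100))
print(scan_all(memory))
```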
The Volatility Framework has native tools to highlight these discrepancies (for example, the psxview plugin), but it's also easy enough to do it ourselves with pandas for the sake of example. We already have the PSScan output in the psscandf DataFrame; now we build the same data structure from PSList.
from volatility.plugins.taskmods import PSList
psl = PSList(config)
taskinfo = []
for task in psl.calculate():
info = {}
info['Name'] ='%s' % task.ImageFileName
info['PID'] = '%i' % task.UniqueProcessId
info['PPID'] = '%i' % task.InheritedFromUniqueProcessId
info['Threads'] = '%s' % task.ActiveThreads
info['HandleCount'] = '%s' % task.ObjectTable.HandleCount
info['SessionID'] = '%s' % task.SessionId
info['Wow64'] = '%s' % task.IsWow64
info['Start'] = str(task.CreateTime or '')
info['Exit'] = str(task.ExitTime or '')
taskinfo.append(info)
pslistdf = pd.DataFrame(taskinfo, columns=['Name', 'PID', 'PPID', 'Threads', 'HandleCount', 'SessionID', 'Wow64', 'Start', 'Exit'])
pslistdf.index = pslistdf.PID
pslistdf['Start'] = pd.to_datetime(pslistdf['Start'])
pslistdf['Exit'] = pd.to_datetime(pslistdf['Exit'])
pslistdf
| PID | Name | PID | PPID | Threads | HandleCount | SessionID | Wow64 | Start | Exit |
|---|---|---|---|---|---|---|---|---|---|
| 4 | System | 4 | 0 | 51 | 254 | | False | NaT | NaT |
| 360 | smss.exe | 360 | 4 | 3 | 19 | | False | 2008-11-26 07:38:11 | NaT |
| 596 | csrss.exe | 596 | 360 | 10 | 322 | 0 | False | 2008-11-26 07:38:13 | NaT |
| 620 | winlogon.exe | 620 | 360 | 16 | 503 | 0 | False | 2008-11-26 07:38:14 | NaT |
| 672 | services.exe | 672 | 620 | 15 | 245 | 0 | False | 2008-11-26 07:38:15 | NaT |
| 684 | lsass.exe | 684 | 620 | 21 | 347 | 0 | False | 2008-11-26 07:38:15 | NaT |
| 844 | svchost.exe | 844 | 672 | 19 | 198 | 0 | False | 2008-11-26 07:38:18 | NaT |
| 932 | svchost.exe | 932 | 672 | 10 | 229 | 0 | False | 2008-11-26 07:38:18 | NaT |
| 1064 | svchost.exe | 1064 | 672 | 63 | 1308 | 0 | False | 2008-11-26 07:38:20 | NaT |
| 1164 | svchost.exe | 1164 | 672 | 5 | 77 | 0 | False | 2008-11-26 07:38:23 | NaT |
| 1264 | svchost.exe | 1264 | 672 | 14 | 209 | 0 | False | 2008-11-26 07:38:25 | NaT |
| 1516 | explorer.exe | 1516 | 1452 | 12 | 362 | 0 | False | 2008-11-26 07:38:27 | NaT |
| 1648 | spoolsv.exe | 1648 | 672 | 12 | 112 | 0 | False | 2008-11-26 07:38:28 | NaT |
| 1896 | VMwareTray.exe | 1896 | 1516 | 1 | 26 | 0 | False | 2008-11-26 07:38:31 | NaT |
| 1904 | VMwareUser.exe | 1904 | 1516 | 1 | 28 | 0 | False | 2008-11-26 07:38:31 | NaT |
| 1756 | VMwareService.e | 1756 | 672 | 3 | 45 | 0 | False | 2008-11-26 07:38:45 | NaT |
| 512 | alg.exe | 512 | 672 | 6 | 105 | 0 | False | 2008-11-26 07:38:53 | NaT |
| 1372 | wuauclt.exe | 1372 | 1064 | 8 | 225 | 0 | False | 2008-11-26 07:39:38 | NaT |
| 560 | wscntfy.exe | 560 | 1064 | 1 | 31 | 0 | False | 2008-11-26 07:44:57 | NaT |
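Next, we take the list of PIDs from the PSList DataFrame and filter the PSScan DataFrame for any row whose PID is not in that list (the ~ operator negates the boolean mask). Thus, we've discovered seven processes that PSScan sees but PSList does not. Note that cmd.exe has an exit time, so it had already terminated rather than being actively hidden.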
This particular memory sample was created to demonstrate a very clever technique to hide processes. In fact, there is another hidden process not shown in the list below. I'll leave that as an exercise for the reader. For more, see Jesse Kornblum's blog post.
psscandf.loc[~psscandf.PID.isin(pslistdf.PID.tolist())].sort_values('Start')
| PID | Name | PID | PPID | Threads | HandleCount | SessionID | Wow64 | Start | Exit |
|---|---|---|---|---|---|---|---|---|---|
| 592 | csrss.exe | 592 | 360 | 10 | 131072 | | False | 2008-11-15 23:42:56 | NaT |
| 660 | services.exe | 660 | 616 | 15 | -2121378248 | | False | 2008-11-15 23:42:56 | NaT |
| 828 | svchost.exe | 828 | 660 | 14 | 0 | | False | 2008-11-15 23:42:57 | NaT |
| 924 | svchost.exe | 924 | 660 | 7 | 0 | | False | 2008-11-15 23:42:57 | NaT |
| 1016 | svchost.exe | 1016 | 660 | 51 | 0 | | False | 2008-11-15 23:42:57 | NaT |
| 992 | alg.exe | 992 | 660 | 5 | 4784160 | | False | 2008-11-15 23:43:25 | NaT |
| 940 | cmd.exe | 940 | 1516 | 0 | | | False | 2008-11-26 07:43:39 | 2008-11-26 07:45:49 |
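The pandas filter above is a set difference on PIDs. Here is a pandas-free sketch, with the PIDs copied from the PSScan output and the PSList table in this notebook.

```python
# PIDs copied from the PSScan output and the PSList table above.
psscan_pids = {4, 360, 592, 596, 620, 660, 672, 684, 828, 844, 924, 932, 940, 992,
               1016, 1064, 1164, 1264, 1372, 1516, 1648, 1756, 1896, 1904, 512, 560}
pslist_pids = {4, 360, 596, 620, 672, 684, 844, 932, 1064, 1164, 1264, 1516, 1648,
               1896, 1904, 1756, 512, 1372, 560}

# Processes the memory scan found that the list walk did not.
only_in_scan = sorted(psscan_pids - pslist_pids)
print(only_in_scan)
```

Keying on PID alone assumes that no stale process shares a PID with a live one. Comparing `(PID, start time)` pairs would be stricter if PIDs had been reused.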
The most exciting application of Volatility analysis in IPython Notebook, to me at least, is inline graphing. In addition to the primary render_text output option, the PSScan module also has a render_dot method for the Graphviz dot format. The typical Volatility workflow for Graphviz generation would go like this: run vol.py with the parameters --output=dot --output-file=out.dot, then open the resulting out.dot file in Graphviz. In IPython, we can do that in just one step and keep the analysis, output, and documentation all in one place!
To render the dot files, I'm using the IPython extension hierarchymagic. This extension adds a new IPython cell magic, %%dot, so you can write a cell like:
%%dot
digraph processtree {
graph [rankdir = "TB"];
pid672 -> pid844 [];
pid672 -> pid932 [];
//more cool dot stuff
}
and get the SVG output right in your notebook. We won't actually use the %%dot magic, though. Instead, we will call run_dot, the underlying worker function inside hierarchymagic. We use the render_dot method to generate the graph in dot syntax. After importing the hierarchymagic library, we can pass the dot text directly to run_dot, which returns SVG image data (which is just XML). Finally, IPython's handy SVG display class renders the graphic right in the notebook. No need to keep track of temporary files!
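If you want to see what dot syntax for a process tree looks like, this sketch builds it by hand from (PID, PPID, Name) rows. It follows the style of the %%dot example above. The helper `process_tree_dot` is hypothetical, and its output is not the exact format that Volatility's render_dot emits.

```python
def process_tree_dot(rows):
    """Build DOT text: one labelled node per process and a PPID -> PID edge for each row.

    Hypothetical helper for illustration; not the format produced by render_dot.
    """
    lines = ['digraph processtree {', '  graph [rankdir = "TB"];']
    for r in rows:
        # "\n" inside a DOT label is a line break, so name and PID print on two lines.
        lines.append('  pid%s [label="%s\\n%s"];' % (r['PID'], r['Name'], r['PID']))
    for r in rows:
        # Parents that were never declared (e.g. an exited process) are created implicitly by Graphviz.
        lines.append('  pid%s -> pid%s [];' % (r['PPID'], r['PID']))
    lines.append('}')
    return '\n'.join(lines)


rows = [
    {'Name': 'services.exe', 'PID': '672', 'PPID': '620'},
    {'Name': 'svchost.exe', 'PID': '844', 'PPID': '672'},
    {'Name': 'svchost.exe', 'PID': '932', 'PPID': '672'},
]
dot = process_tree_dot(rows)
print(dot)
```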
%load_ext hierarchymagic
import hierarchymagic #to put the library explicitly in the namespace
from IPython.display import SVG
psdot = StringIO()
psdata = ps.calculate()
ps.render_dot(psdot, psdata)
SVG(hierarchymagic.run_dot(psdot.getvalue(), format='svg'))
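If hierarchymagic is unavailable, one swapped-in approach is to pipe the dot text straight to the Graphviz `dot` executable with `subprocess`. This is a minimal sketch that assumes Graphviz is installed and `dot` is on PATH. It needs Python 3.7+ for `capture_output`, and the helper name `dot_to_svg` is hypothetical.

```python
import shutil
import subprocess


def dot_to_svg(dot_text):
    """Render DOT text to SVG markup by piping it through the Graphviz `dot` executable.

    Assumes `dot` is installed and on PATH; raises RuntimeError otherwise.
    """
    exe = shutil.which('dot')
    if exe is None:
        raise RuntimeError('Graphviz "dot" executable not found on PATH')
    # -Tsvg asks Graphviz for SVG output on stdout; check=True raises on a syntax error.
    result = subprocess.run([exe, '-Tsvg'], input=dot_text,
                            capture_output=True, text=True, check=True)
    return result.stdout


# Usage in a notebook (assumes IPython and the psdot buffer from the cell above):
# from IPython.display import SVG
# SVG(dot_to_svg(psdot.getvalue()))
```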
© 2013 Lockheed Martin Corporation. All Rights Reserved.