For us to learn: the boto library.
This notebook duplicates some of Day_20_CommonCrawl_Starter.
For moving files between your computer and PiCloud, look at Day_20_Moving_files_to_PiCloud.ipynb.
For understanding the actual content of the files in Common Crawl, we'll look at Day_21_CommonCrawl_Content.ipynb
Good to review Dave Lester's talk: http://www.slideshare.net/davelester/introduction-to-common-crawl
If you need a general intro to Common Crawl, watch the Common Crawl Video.
The Common Crawl data structure is documented at https://commoncrawl.atlassian.net/wiki/display/CRWL/About+the+Data+Set. To quote the docs:
The entire Common Crawl data set is stored on Amazon S3 as a Public Data Set:
http://aws.amazon.com/datasets/41740
The data set is divided into three major subsets:
The two archived crawl data sets are stored in folders organized by the year, month, date, and hour the content was crawled. For example:
s3://aws-publicdatasets/common-crawl/crawl-002/2010/01/06/10/1262847572760_10.arc.gz
The current crawl data set is stored in the "parse-output" folder in a similar manner to how Nutch stores archives. Crawl data is stored in a "segments" subfolder, then in a folder that starts with the UNIX timestamp of crawl start time. For example:
s3://aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/1341826131693_45.arc.gz
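The two layouts quoted above can be told apart by their path components. As a rough sketch only (the helper name parse_cc_path and the returned field names are made up for illustration and are not part of any Common Crawl tooling), one might split a path like this, assuming it points at a file in one of the two layouts:

```python
def parse_cc_path(path):
    """Split a Common Crawl S3 path into its parts (illustrative helper)."""
    prefix = 's3://'
    if path.startswith(prefix):
        path = path[len(prefix):]
    parts = path.split('/')
    bucket, rest = parts[0], parts[1:]
    info = {'bucket': bucket, 'filename': rest[-1]}
    if 'parse-output' in rest:
        # current crawl: .../parse-output/segment/<segment id>/<file>
        info['layout'] = 'current'
        info['segment'] = rest[rest.index('segment') + 1]
    else:
        # archived crawl: .../<crawl name>/<year>/<month>/<day>/<hour>/<file>
        info['layout'] = 'archived'
        info['year'], info['month'], info['day'], info['hour'] = rest[-5:-1]
    return info

current = parse_cc_path(
    's3://aws-publicdatasets/common-crawl/parse-output/segment/'
    '1341690169105/1341826131693_45.arc.gz')
archived = parse_cc_path(
    's3://aws-publicdatasets/common-crawl/crawl-002/2010/01/06/10/'
    '1262847572760_10.arc.gz')
print(current['segment'])
print('-'.join([archived['year'], archived['month'], archived['day']]))
```

The segment folder name is the piece we will need later, since it is what valid_segments.txt lists.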
# this key, secret access to aws-publicdatasets only -- created for WwOD 13 student usage
# turns out there is an anonymous mode in boto for public data sets:
# https://github.com/keiw/common_crawl_index/commit/ad341d0a41a828f260c9c08419dadff0dac6cf5b#L0R33
# conn=S3Connection(anon=True) will work instead of conn= S3Connection(KEY, SECRET) -- but there seems to be
# a bug in how S3Connection gets pickled for anon=True -- so for now, just use the KEY, SECRET
KEY = 'AKIAJH2FD7572FCTVSSQ'
SECRET = '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'
You can use this key/secret pair to configure both boto
and s3cmd
# s3cmd installed in custom PiCloud environment -- and maybe in your local environment too
# confirm s3://aws-publicdatasets/common-crawl/crawl-002/2010/01/06/10/1262847572760_10.arc.gz
# doc for s3cmd: http://s3tools.org/s3cmd
!s3cmd ls s3://aws-publicdatasets/common-crawl/crawl-002/2010/01/06/10/1262847572760_10.arc.gz
2012-01-05 19:19 100001092 s3://aws-publicdatasets/common-crawl/crawl-002/2010/01/06/10/1262847572760_10.arc.gz
s3://aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/1341826131693_45.arc.gz
!s3cmd ls s3://aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/1341826131693_45.arc.gz
2012-07-09 10:43 100001274 s3://aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/1341826131693_45.arc.gz
# looking at parse-output itself
!s3cmd ls s3://aws-publicdatasets/common-crawl/parse-output
DIR s3://aws-publicdatasets/common-crawl/parse-output-test/
DIR s3://aws-publicdatasets/common-crawl/parse-output/
2012-09-04 05:03 0 s3://aws-publicdatasets/common-crawl/parse-output-test_$folder$
2012-11-09 11:28 0 s3://aws-publicdatasets/common-crawl/parse-output_$folder$
# looking at what is contained by parse-output "folder"
!s3cmd ls s3://aws-publicdatasets/common-crawl/parse-output/
DIR s3://aws-publicdatasets/common-crawl/parse-output/checkpoint_staging/
DIR s3://aws-publicdatasets/common-crawl/parse-output/checkpoints/
DIR s3://aws-publicdatasets/common-crawl/parse-output/segment/
DIR s3://aws-publicdatasets/common-crawl/parse-output/valid_segments2/
2012-10-17 00:11 0 s3://aws-publicdatasets/common-crawl/parse-output/checkpoint_staging_$folder$
2012-11-09 00:10 0 s3://aws-publicdatasets/common-crawl/parse-output/checkpoints_$folder$
2012-09-05 05:13 0 s3://aws-publicdatasets/common-crawl/parse-output/segment_$folder$
2012-11-09 11:28 2478 s3://aws-publicdatasets/common-crawl/parse-output/valid_segments.txt
2012-09-05 05:13 0 s3://aws-publicdatasets/common-crawl/parse-output/valid_segments2_$folder$
2012-07-09 15:07 0 s3://aws-publicdatasets/common-crawl/parse-output/valid_segments_$folder$
There is a list of "valid segments" in
s3://aws-publicdatasets/common-crawl/parse-output/valid_segments.txt
-- a list of segments that are part of the current crawl. Let's download it and study it.
!s3cmd ls s3://aws-publicdatasets/common-crawl/parse-output/valid_segments.txt
2012-11-09 11:28 2478 s3://aws-publicdatasets/common-crawl/parse-output/valid_segments.txt
# we can download it:
!s3cmd get --force s3://aws-publicdatasets/common-crawl/parse-output/valid_segments.txt
s3://aws-publicdatasets/common-crawl/parse-output/valid_segments.txt -> ./valid_segments.txt [1 of 1] 2478 of 2478 100% in 0s 4.48 kB/s done
!head valid_segments.txt
1346823845675
1346823846036
1346823846039
1346823846110
1346823846125
1346823846150
1346823846176
1346876860445
1346876860454
1346876860467
# http://boto.s3.amazonaws.com/s3_tut.html
import boto
from boto.s3.connection import S3Connection
from itertools import islice
conn = S3Connection(KEY,SECRET)
# turns out there is an anonymous mode in boto for public data sets:
# https://github.com/keiw/common_crawl_index/commit/ad341d0a41a828f260c9c08419dadff0dac6cf5b#L0R33
#conn=S3Connection(anon=True)
bucket = conn.get_bucket('aws-publicdatasets')
for key in islice(bucket.list(prefix="common-crawl/parse-output/", delimiter="/"), None):
    print key.name.encode('utf-8')
common-crawl/parse-output/checkpoint_staging_$folder$
common-crawl/parse-output/checkpoints_$folder$
common-crawl/parse-output/segment_$folder$
common-crawl/parse-output/valid_segments.txt
common-crawl/parse-output/valid_segments2_$folder$
common-crawl/parse-output/valid_segments_$folder$
common-crawl/parse-output/checkpoint_staging/
common-crawl/parse-output/checkpoints/
common-crawl/parse-output/segment/
common-crawl/parse-output/valid_segments2/
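The listing above mixes three kinds of entries: real files, zero-byte _$folder$ marker keys, and prefixes ending in a slash. Here is a minimal sketch for separating them by name alone, assuming the naming pattern seen in this output holds generally (split_listing is a hypothetical helper, not a boto function):

```python
def split_listing(names):
    """Sort listing names into files, folder markers and prefixes."""
    files, markers, prefixes = [], [], []
    for name in names:
        if name.endswith('/'):
            prefixes.append(name)        # "directory"-like prefix
        elif name.endswith('_$folder$'):
            markers.append(name)         # zero-byte folder marker key
        else:
            files.append(name)           # an actual file
    return {'files': files, 'markers': markers, 'prefixes': prefixes}

# A few of the names printed above
names = [
    'common-crawl/parse-output/segment_$folder$',
    'common-crawl/parse-output/valid_segments.txt',
    'common-crawl/parse-output/valid_segments_$folder$',
    'common-crawl/parse-output/segment/',
]
listing = split_listing(names)
print(listing['files'])
```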
# get valid_segments
# https://commoncrawl.atlassian.net/wiki/display/CRWL/About+the+Data+Set
import boto
from boto.s3.connection import S3Connection
conn = S3Connection(KEY, SECRET)
bucket = conn.get_bucket('aws-publicdatasets')
k = bucket.get_key("common-crawl/parse-output/valid_segments.txt")
s = k.get_contents_as_string()
valid_segments = filter(None, s.split("\n"))
print len(valid_segments), valid_segments[0]
177 1346823845675
# valid_segments are Unix timestamps (in ms) -- confirm current crawl is from 2012
import datetime
datetime.datetime.fromtimestamp(float(valid_segments[0])/1000.)
datetime.datetime(2012, 9, 4, 22, 44, 5, 675000)
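Note that datetime.fromtimestamp reports local time, so the result above depends on the time zone of the machine running the notebook. Here is a small sketch that converts a segment id to UTC instead, using only epoch arithmetic (segment_to_utc is an illustrative name, not something from boto or Common Crawl):

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def segment_to_utc(seg_id):
    """Interpret a segment id (Unix time in milliseconds) as a naive UTC datetime."""
    return EPOCH + datetime.timedelta(milliseconds=int(seg_id))

first = segment_to_utc('1346823845675')
print(first.isoformat())  # 2012-09-05T05:44:05.675000
```

This is seven hours later than the local-time value shown above; either way the segment dates from early September 2012.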
As of this writing (April 4, 2013), there are 177 valid segments in the current crawl. Now it's time to write a Python function called segment_stats that takes a segment id and an optional stop parameter (the maximum number of keys to iterate through), of the form
def segment_stats(seg_id, stop=None):
    pass
    # YOUR EXERCISE TO FILL IN
and returns a dict with 2 keys:
count: the number of keys inside the given valid segment
size: the total number of bytes held in the keys
Each is broken down by file type (there are 3 major types): arc.gz, metadata, and textData.
For example:
segment_stats('1346823845675', None)
should return:
{
'count': {'arc.gz': 11904, 'metadata': 4377, 'success': 1, 'textData': 4377},
'size': {'arc.gz': 967409519222,
'metadata': 187079951008,
'success': 0,
'textData': 129994977292}
}
Since it can take 10-50 seconds or so to retrieve all the keys in a valid segment, it's worth limiting the listing to, say, the first 10 keys to get a feel for what you can do with a key. Run the following:
from itertools import islice
import boto
from boto.s3.connection import S3Connection
conn = S3Connection(KEY, SECRET)
bucket = conn.get_bucket('aws-publicdatasets')
for key in islice(bucket.list(prefix="common-crawl/parse-output/segment/1346823845675/", delimiter="/"), 10):
    print key.name.encode('utf-8')
common-crawl/parse-output/segment/1346823845675/1346864466526_10.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864469604_0.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864469638_1.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864471290_4.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864477152_29.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864479613_6.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864480261_2.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864480936_5.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864484063_39.arc.gz
common-crawl/parse-output/segment/1346823845675/1346864484163_3.arc.gz
# WARNING -- this might take a bit of time to run -- run it to see how long it takes you to get all the keys in this
# segment. time depends on where you are running this code
%time all_files = list(islice(bucket.list(prefix="common-crawl/parse-output/segment/1346823845675/", delimiter="/"),None))
print len(all_files), all_files[0]
CPU times: user 4.77 s, sys: 0.29 s, total: 5.06 s
Wall time: 44.76 s
20659 <Key: aws-publicdatasets,common-crawl/parse-output/segment/1346823845675/1346864466526_10.arc.gz>
It's useful now to have all_files hold all the keys under the segment 1346823845675.
Note, for example, that you can get the name and size of each file, and that each one is of type boto.s3.key.Key:
# http://boto.readthedocs.org/en/latest/ref/s3.html#module-boto.s3.key
file0 = all_files[0]
type(file0), file0.name, file0.size
(boto.s3.key.Key, u'common-crawl/parse-output/segment/1346823845675/1346864466526_10.arc.gz', 100011998)
import boto
from boto.s3.connection import S3Connection
# this key, secret access to aws-publicdatasets only -- created for WwOD 13 student usage
KEY = 'AKIAJH2FD7572FCTVSSQ'
SECRET = '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'
from itertools import islice
from pandas import DataFrame
conn= S3Connection(KEY, SECRET)
bucket = conn.get_bucket('aws-publicdatasets')
# you might find this conversion function between DataFrame and a list of a regular dict useful
#https://gist.github.com/mikedewar/1486027#comment-804797
def df_to_dictlist(df):
    return [{k:df.values[i][v] for v,k in enumerate(df.columns)} for i in range(len(df))]

def cc_file_type(path):
    fname = path.split("/")[-1]
    if fname[-7:] == '.arc.gz':
        return 'arc.gz'
    elif fname[:9] == 'textData-':
        return 'textData'
    elif fname[:9] == 'metadata-':
        return 'metadata'
    elif fname == '_SUCCESS':
        return 'success'
    else:
        return 'other'
# a first pass, using DataFrame. Might not be so efficient considering we are returning only totals
def segment_stats(seg_id, stop=None):
    all_files = islice(bucket.list(prefix="common-crawl/parse-output/segment/{0}/".format(seg_id), delimiter="/"), stop)
    df = DataFrame([{'size': f.size if hasattr(f, 'size') else 0, 'name': f.name, 'type': cc_file_type(f.name)} for f in all_files])
    return {'count': df_to_dictlist(df[['size','type']].groupby('type').count()[['size']].T)[0],
            'size': df_to_dictlist(df[['size', 'type']].groupby('type').sum().astype('int64').T)[0]}
# another version of segment_stats that doesn't use DataFrame; probably easier to comprehend what's going on too -- and possibly
# faster
def segment_stats2(seg_id, stop=None):
    from collections import Counter
    file_count = Counter()
    byte_count = Counter()
    all_files = islice(bucket.list(prefix="common-crawl/parse-output/segment/{0}/".format(seg_id), delimiter="/"), stop)
    for f in all_files:
        file_type = cc_file_type(f.name)
        file_count.update({file_type: 1})
        # prefixes returned by the listing have no size attribute
        byte_count.update({file_type: f.size if hasattr(f, 'size') else 0})
    return {'count': dict(file_count),
            'size': dict(byte_count)}
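Because each real run lists tens of thousands of keys over the network, it can help to check the counting logic offline first. The sketch below mimics the Counter-based approach with stand-in objects; FakeKey, FakePrefix, file_type and tally_keys are all hypothetical names, and the textData/metadata names and sizes are made up for illustration.

```python
from collections import Counter, namedtuple

# Stand-ins for boto objects: real keys have a size, prefixes do not.
FakeKey = namedtuple('FakeKey', ['name', 'size'])
FakePrefix = namedtuple('FakePrefix', ['name'])

def file_type(path):
    """Same classification rules as cc_file_type, written with startswith/endswith."""
    fname = path.split('/')[-1]
    if fname.endswith('.arc.gz'):
        return 'arc.gz'
    if fname.startswith('textData-'):
        return 'textData'
    if fname.startswith('metadata-'):
        return 'metadata'
    if fname == '_SUCCESS':
        return 'success'
    return 'other'

def tally_keys(keys):
    """Count files and bytes per file type for any iterable of key-like objects."""
    file_count, byte_count = Counter(), Counter()
    for k in keys:
        kind = file_type(k.name)
        file_count[kind] += 1
        byte_count[kind] += getattr(k, 'size', 0)  # prefixes contribute 0 bytes
    return {'count': dict(file_count), 'size': dict(byte_count)}

base = 'common-crawl/parse-output/segment/1346823845675/'
sample = [
    FakeKey(base + '1346864466526_10.arc.gz', 100011998),
    FakeKey(base + 'textData-00000', 500),   # made-up name and size
    FakeKey(base + 'metadata-00000', 700),   # made-up name and size
    FakeKey(base + '_SUCCESS', 0),
    FakePrefix(base + 'some_prefix/'),       # made-up prefix without a size
]
stats = tally_keys(sample)
print(stats['count'])
```

Passing the key iterable in as an argument keeps the counting separate from the S3 listing, which is what makes it testable without a connection.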
# recall the first segment -- let's work on that segment
valid_segments[0]
'1346823845675'
# look at how long it takes to run locally
%time segment_stats(valid_segments[0], None)
CPU times: user 4.18 s, sys: 0.15 s, total: 4.33 s Wall time: 28.97 s
{'count': {'arc.gz': 11904, 'metadata': 4377, 'success': 1, 'textData': 4377}, 'size': {'arc.gz': 967409519222, 'metadata': 187079951008, 'success': 0, 'textData': 129994977292}}
# here's how to run it on PiCloud
# Prerequisite: http://docs.picloud.com/primer.html <--- READ THIS AND STUDY TO REFRESH YOUR MEMORY
import cloud
jid = cloud.call(segment_stats, '1346823845675', None, _env='/rdhyee/Working_with_Open_Data')
# pull up status -- refresh until done
cloud.status(jid)
'processing'
# this will block until job is done or errors out
cloud.join(jid)
# get your result
cloud.result(jid)
{'count': {'arc.gz': 11904, 'metadata': 4377, 'success': 1, 'textData': 4377}, 'size': {'arc.gz': 967409519222, 'metadata': 187079951008, 'success': 0, 'textData': 129994977292}}
# get some basic info
cloud.info(jid)
{1788: {'runtime': 16.1878, 'status': 'done', 'stderr': None, 'stdout': None}}
# get some specific info
cloud.info(jid, info_requested=['created', 'finished', 'runtime', 'cputime'])
{1788: {'cputime.system': 2.55, 'cputime.user': 8.0, 'created': datetime.datetime(2013, 4, 16, 15, 59, 32), 'finished': datetime.datetime(2013, 4, 16, 15, 59, 52), 'runtime': 16.1878}}
I had to retry 2 jobs
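One way to find which segments need a retry is to pair each segment id with its job status and keep those that ended badly. This is only a sketch: segments_to_retry is a hypothetical helper, the statuses below are made up, and the set of status strings treated as failures is an assumption that should be checked against the PiCloud docs.

```python
# Assumed failure states; verify the exact status strings in the PiCloud docs.
FAILED_STATES = ('error', 'killed', 'stalled')

def segments_to_retry(seg_ids, statuses, failed_states=FAILED_STATES):
    """Return the segment ids whose job status is one of the failure states."""
    return [seg for seg, status in zip(seg_ids, statuses)
            if status in failed_states]

seg_ids = ['1346823845675', '1346876860789', '1350433106986']
statuses = ['done', 'error', 'error']   # made-up statuses for illustration
retry = segments_to_retry(seg_ids, statuses)
print(retry)
```

Jobs that are still queued or processing are deliberately left alone, so the check can be rerun while a batch is in flight.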
# now tally everything noting the retries -- might be worth writing this generally
# THIS CODE REFERS SPECIFICALLY TO RAYMOND YEE'S JOBS -- REPLACE WITH YOUR OWN IDS
from pandas import DataFrame
import cloud
import datetime  # needed for datetime.timedelta below
from itertools import izip, ifilter, chain, islice
from matplotlib import pyplot as plt
valid_segments
segment_jids = xrange(319, 496)
retries_seg_ids = ['1346876860789', '1350433106986']
retries_jids = xrange(496, 498)
tally = list(ifilter(lambda x: x[2] == 'done',
izip(chain(valid_segments, retries_seg_ids), chain(segment_jids, retries_jids),
cloud.status(list(chain(segment_jids, retries_jids))))))
result = cloud.result([jid for (seg_id, jid, status) in tally])
# http://docs.picloud.com/moduledoc.html#module-cloud
jobs_info = cloud.info(list(islice(chain(segment_jids, retries_jids),None)),
info_requested=['created', 'finished', 'runtime', 'cputime', 'core']
)
started = [{'jid':k, 'time':v['finished'] - datetime.timedelta(seconds=v['runtime']), 'count': 1} for (k,v) in jobs_info.items()]
finished = [{'jid':k, 'time':v['finished'], 'count': -1} for (k,v) in jobs_info.items()]
df = DataFrame(started + finished)
exclude_n = 4
by_time = df.sort_index(by='time')
plt.plot(by_time['time'][:-exclude_n], by_time['count'].cumsum()[:-exclude_n])
[<matplotlib.lines.Line2D at 0x82a4990>]
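The plot counts how many jobs were running at each moment: every job contributes +1 at its estimated start (finished minus runtime) and -1 when it finishes, and the running total over the time-sorted events gives the curve. The same idea can be sketched without pandas. The function name concurrency_curve is made up, and the three sample entries are copied from the jobs_info output for jobs 319 to 321.

```python
import datetime

def concurrency_curve(jobs_info):
    """Return (time, number of running jobs) pairs from PiCloud-style job info."""
    events = []
    for jid, info in jobs_info.items():
        start = info['finished'] - datetime.timedelta(seconds=info['runtime'])
        events.append((start, 1))               # job starts running
        events.append((info['finished'], -1))   # job stops running
    events.sort()
    running = 0
    curve = []
    for when, delta in events:
        running += delta
        curve.append((when, running))
    return curve

sample_info = {
    319: {'finished': datetime.datetime(2013, 4, 4, 3, 1, 59), 'runtime': 13.1426},
    320: {'finished': datetime.datetime(2013, 4, 4, 3, 2, 24), 'runtime': 24.1863},
    321: {'finished': datetime.datetime(2013, 4, 4, 3, 2, 31), 'runtime': 5.93496},
}
curve = concurrency_curve(sample_info)
peak = max(count for when, count in curve)
print(peak)
```

For these three jobs the estimated run intervals do not overlap, so the peak is 1; the full set of jobs is what produces the higher concurrency in the plot.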
from collections import Counter
file_counter = Counter()
byte_counter = Counter()
result = cloud.result([jid for (seg_id, jid, status) in tally])
for r in result:
    file_counter.update(r['count'])
    byte_counter.update(r['size'])
file_counter, byte_counter
(Counter({'arc.gz': 856589, 'textData': 341525, 'metadata': 341517, 'success': 71, 'other': 17}), Counter({'arc.gz': 71106384571350, 'metadata': 11010558690874, 'textData': 6978342039325, 'other': 1626219, 'success': 0}))
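Raw byte totals of this size are hard to read. A small helper can convert them to binary units for display; this is just a sketch, human_size is an illustrative name, and the choice of 1024-based units (KiB, MiB, ...) rather than 1000-based ones is an assumption.

```python
def human_size(num_bytes):
    """Format a byte count using 1024-based units, one decimal place."""
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']
    value = float(num_bytes)
    for unit in units:
        if value < 1024.0 or unit == units[-1]:
            return '{0:.1f} {1}'.format(value, unit)
        value /= 1024.0

# Totals copied from the byte_counter output above
totals = {'arc.gz': 71106384571350,
          'metadata': 11010558690874,
          'textData': 6978342039325}
for kind in sorted(totals):
    print('{0}: {1}'.format(kind, human_size(totals[kind])))
```

By this measure the arc.gz files in the tallied segments add up to roughly 64.7 TiB.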
jobs_info
{319L: {'cputime.system': 2.25, 'cputime.user': 6.4, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 1, 59), 'runtime': 13.1426}, 320L: {'cputime.system': 0.625, 'cputime.user': 11.200000000000001, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 2, 24), 'runtime': 24.1863}, 321L: {'cputime.system': 1.925, 'cputime.user': 3.5, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 2, 31), 'runtime': 5.93496}, 322L: {'cputime.system': 0, 'cputime.user': 0.9749999999999996, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 2, 35), 'runtime': 3.2074}, 323L: {'cputime.system': 0, 'cputime.user': 6.4, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 2, 48), 'runtime': 13.1659}, 324L: {'cputime.system': 0.32499999999999973, 'cputime.user': 7.350000000000001, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 3, 3), 'runtime': 14.6188}, 325L: {'cputime.system': 0, 'cputime.user': 4.474999999999998, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 3, 16), 'runtime': 13.0314}, 326L: {'cputime.system': 0.3250000000000002, 'cputime.user': 5.449999999999999, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 3, 29), 'runtime': 12.9266}, 327L: {'cputime.system': 0, 'cputime.user': 4.800000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 3, 42), 'runtime': 12.1657}, 328L: {'cputime.system': 0.3250000000000002, 'cputime.user': 10.25, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 4, 5), 'runtime': 22.8888}, 329L: {'cputime.system': 0.2999999999999998, 'cputime.user': 5.099999999999994, 'created': 
datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 4, 19), 'runtime': 14.095}, 330L: {'cputime.system': 0.3250000000000002, 'cputime.user': 3.200000000000003, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 4, 27), 'runtime': 7.54293}, 331L: {'cputime.system': 0, 'cputime.user': 5.125, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 4, 41), 'runtime': 13.6969}, 332L: {'cputime.system': 0, 'cputime.user': 5.125, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 4, 55), 'runtime': 13.7375}, 333L: {'cputime.system': 0.6499999999999995, 'cputime.user': 5.125, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 5, 11), 'runtime': 15.6019}, 334L: {'cputime.system': 0, 'cputime.user': 5.424999999999997, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 5, 24), 'runtime': 12.3113}, 335L: {'cputime.system': 0, 'cputime.user': 4.5, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 5, 33), 'runtime': 9.28734}, 336L: {'cputime.system': 0.3250000000000002, 'cputime.user': 3.825000000000003, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 5, 44), 'runtime': 10.8933}, 337L: {'cputime.system': 0, 'cputime.user': 3.8499999999999943, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 5, 54), 'runtime': 9.42668}, 338L: {'cputime.system': 0, 'cputime.user': 5.75, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 6, 8), 'runtime': 14.3095}, 339L: {'cputime.system': 0.625, 'cputime.user': 3.8500000000000085, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 6, 19), 'runtime': 9.92344}, 
340L: {'cputime.system': 0, 'cputime.user': 6.075000000000003, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 6, 33), 'runtime': 14.4308}, 341L: {'cputime.system': 1.9249999999999998, 'cputime.user': 9.9, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 7), 'runtime': 25.4636}, 342L: {'cputime.system': 0, 'cputime.user': 7.375, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 6, 51), 'runtime': 17.0747}, 343L: {'cputime.system': 0, 'cputime.user': 9.274999999999991, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 7, 16), 'runtime': 25.5032}, 344L: {'cputime.system': 1.6, 'cputime.user': 3.825, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 11), 'runtime': 68.5595}, 345L: {'cputime.system': 1.9249999999999998, 'cputime.user': 7.675, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 7, 35), 'runtime': 20.742}, 346L: {'cputime.system': 0, 'cputime.user': 5.75, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 7, 31), 'runtime': 14.6856}, 347L: {'cputime.system': 0.3250000000000002, 'cputime.user': 8.325000000000003, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 7, 50), 'runtime': 18.6423}, 348L: {'cputime.system': 0, 'cputime.user': 4.4750000000000005, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 7, 46), 'runtime': 11.2397}, 349L: {'cputime.system': 1.9249999999999998, 'cputime.user': 5.1, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 7, 59), 'runtime': 11.6796}, 350L: {'cputime.system': 0.6500000000000004, 'cputime.user': 4.15, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 
'finished': datetime.datetime(2013, 4, 4, 3, 7, 57), 'runtime': 10.263}, 351L: {'cputime.system': 2.25, 'cputime.user': 5.425, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 2), 'runtime': 10.7821}, 352L: {'cputime.system': 0, 'cputime.user': 4.800000000000011, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 1), 'runtime': 10.786}, 353L: {'cputime.system': 0.625, 'cputime.user': 5.449999999999999, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 9), 'runtime': 11.5266}, 354L: {'cputime.system': 0, 'cputime.user': 2.25, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 5), 'runtime': 5.7994}, 355L: {'cputime.system': 0, 'cputime.user': 3.5249999999999773, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 8), 'runtime': 6.19568}, 356L: {'cputime.system': 0.3250000000000002, 'cputime.user': 6.3999999999999995, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 16), 'runtime': 13.7778}, 357L: {'cputime.system': 1.6, 'cputime.user': 4.15, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 24), 'runtime': 69.7021}, 358L: {'cputime.system': 0, 'cputime.user': 5.125, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 17), 'runtime': 11.5543}, 359L: {'cputime.system': 0, 'cputime.user': 5.125, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 20), 'runtime': 11.5575}, 360L: {'cputime.system': 0.3250000000000002, 'cputime.user': 5.75, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 21), 'runtime': 12.2515}, 361L: {'cputime.system': 1.9249999999999998, 'cputime.user': 10.225, 
'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 34), 'runtime': 83.3468}, 362L: {'cputime.system': 0.2999999999999998, 'cputime.user': 5.449999999999999, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 27), 'runtime': 10.4084}, 363L: {'cputime.system': 0, 'cputime.user': 2.2249999999999996, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 23), 'runtime': 5.62689}, 364L: {'cputime.system': 0.6500000000000004, 'cputime.user': 4.475000000000023, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 31), 'runtime': 10.8136}, 365L: {'cputime.system': 0, 'cputime.user': 5.450000000000003, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 35), 'runtime': 14.1835}, 366L: {'cputime.system': 0, 'cputime.user': 7.050000000000001, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 39), 'runtime': 16.1453}, 367L: {'cputime.system': 0, 'cputime.user': 8.325000000000003, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 43), 'runtime': 15.9796}, 368L: {'cputime.system': 0, 'cputime.user': 1.5999999999999943, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 36), 'runtime': 5.46459}, 369L: {'cputime.system': 1.6, 'cputime.user': 4.475, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 48), 'runtime': 11.1548}, 370L: {'cputime.system': 0.3249999999999993, 'cputime.user': 6.724999999999994, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 50), 'runtime': 13.8263}, 371L: {'cputime.system': 0, 'cputime.user': 7.674999999999983, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': 
datetime.datetime(2013, 4, 4, 3, 8, 53), 'runtime': 16.2544}, 372L: {'cputime.system': 0, 'cputime.user': 7.349999999999998, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 8), 'runtime': 14.9538}, 373L: {'cputime.system': 1.275, 'cputime.user': 4.15, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 51), 'runtime': 9.09589}, 374L: {'cputime.system': 0, 'cputime.user': 4.800000000000001, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 50), 'runtime': 9.93375}, 375L: {'cputime.system': 1.6, 'cputime.user': 1.6, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 10, 2), 'runtime': 72.9493}, 376L: {'cputime.system': 0, 'cputime.user': 1.2749999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 45), 'runtime': 1.83439}, 377L: {'cputime.system': 1.5999999999999999, 'cputime.user': 1.9, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 49), 'runtime': 2.75356}, 378L: {'cputime.system': 1.3000000000000003, 'cputime.user': 2.55, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 51), 'runtime': 3.27721}, 379L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.9499999999999993, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 47), 'runtime': 1.76929}, 380L: {'cputime.system': 0, 'cputime.user': 0.9750000000000014, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 49), 'runtime': 1.75147}, 381L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 57), 'runtime': 1.89922}, 382L: {'cputime.system': 1.5999999999999999, 
'cputime.user': 2.225, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 54), 'runtime': 2.92445}, 383L: {'cputime.system': 1.5999999999999999, 'cputime.user': 1.6, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 55), 'runtime': 3.24286}, 384L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.9750000000000001, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 51), 'runtime': 1.32308}, 385L: {'cputime.system': 2.25, 'cputime.user': 1.9, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 54), 'runtime': 2.90845}, 386L: {'cputime.system': 0.3250000000000002, 'cputime.user': 1.2749999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 52), 'runtime': 1.84549}, 387L: {'cputime.system': 0.3250000000000002, 'cputime.user': 1.2749999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 52), 'runtime': 1.88334}, 388L: {'cputime.system': 1.9250000000000003, 'cputime.user': 2.225, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 55), 'runtime': 3.2299}, 389L: {'cputime.system': 0, 'cputime.user': 0.9500000000000028, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 53), 'runtime': 1.96082}, 390L: {'cputime.system': 0.2999999999999998, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 53), 'runtime': 1.7653}, 391L: {'cputime.system': 0.32499999999999973, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 53), 'runtime': 1.54679}, 392L: {'cputime.system': 1.275, 'cputime.user': 1.9, 
'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 10, 7), 'runtime': 74.0247}, 393L: {'cputime.system': 0, 'cputime.user': 0.9750000000000014, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 54), 'runtime': 1.82901}, 394L: {'cputime.system': 0, 'cputime.user': 0.6500000000000057, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 55), 'runtime': 1.81283}, 395L: {'cputime.system': 0, 'cputime.user': 0.9749999999999996, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 55), 'runtime': 1.59021}, 396L: {'cputime.system': 1.925, 'cputime.user': 2.575, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 58), 'runtime': 3.13537}, 397L: {'cputime.system': 0.9500000000000002, 'cputime.user': 1.9, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 59), 'runtime': 2.8934}, 398L: {'cputime.system': 0, 'cputime.user': 0.9750000000000014, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 56), 'runtime': 1.70866}, 399L: {'cputime.system': 0, 'cputime.user': 0.32499999999999973, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 56), 'runtime': 62.2221}, 400L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 56), 'runtime': 1.8694}, 401L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 57), 'runtime': 1.82157}, 402L: {'cputime.system': 1.6, 'cputime.user': 2.225, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 59), 'runtime': 2.67548}, 
403L: {'cputime.system': 0, 'cputime.user': 1.2749999999999995, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 57), 'runtime': 1.97104}, 404L: {'cputime.system': 0, 'cputime.user': 0.950000000000017, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 57), 'runtime': 1.80539}, 405L: {'cputime.system': 2.225, 'cputime.user': 2.225, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9), 'runtime': 3.00357}, 406L: {'cputime.system': 0, 'cputime.user': 1.6000000000000014, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 58), 'runtime': 1.8707}, 407L: {'cputime.system': 0, 'cputime.user': 0.9749999999999979, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 58), 'runtime': 1.71175}, 408L: {'cputime.system': 0, 'cputime.user': 0.32499999999999973, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 59), 'runtime': 1.68704}, 409L: {'cputime.system': 1.5999999999999999, 'cputime.user': 1.9, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 2), 'runtime': 2.92198}, 410L: {'cputime.system': 0, 'cputime.user': 1.2749999999999773, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 59), 'runtime': 1.90711}, 411L: {'cputime.system': 0, 'cputime.user': 0.9750000000000005, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 8, 59), 'runtime': 1.51018}, 412L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9), 'runtime': 1.86641}, 413L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': 
datetime.datetime(2013, 4, 4, 3, 9), 'runtime': 1.76413}, 414L: {'cputime.system': 0, 'cputime.user': 1.2750000000000021, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9), 'runtime': 1.90701}, 415L: {'cputime.system': 0, 'cputime.user': 0.6499999999999999, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 1), 'runtime': 1.91548}, 416L: {'cputime.system': 0, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 1), 'runtime': 1.94664}, 417L: {'cputime.system': 1.5999999999999999, 'cputime.user': 2.55, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 4), 'runtime': 2.99209}, 418L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 1), 'runtime': 1.60673}, 419L: {'cputime.system': 0, 'cputime.user': 1.3000000000000114, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 1), 'runtime': 1.75869}, 420L: {'cputime.system': 0.32499999999999973, 'cputime.user': 0.9750000000000001, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 1), 'runtime': 1.71851}, 421L: {'cputime.system': 0, 'cputime.user': 1.5999999999999943, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 2), 'runtime': 1.77275}, 422L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.6500000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 2), 'runtime': 1.66671}, 423L: {'cputime.system': 0, 'cputime.user': 0.6499999999999999, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 2), 'runtime': 1.74193}, 424L: {'cputime.system': 0, 
'cputime.user': 1.2750000000000021, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 2), 'runtime': 1.66118}, 425L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.6499999999999995, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 3), 'runtime': 1.65309}, 426L: {'cputime.system': 1.6, 'cputime.user': 1.6, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 6), 'runtime': 3.05715}, 427L: {'cputime.system': 0.2999999999999998, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 3), 'runtime': 1.60067}, 428L: {'cputime.system': 0, 'cputime.user': 1.2999999999999998, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 3), 'runtime': 1.44499}, 429L: {'cputime.system': 0, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 3), 'runtime': 1.68051}, 430L: {'cputime.system': 0, 'cputime.user': 0.32500000000000284, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 4), 'runtime': 1.58001}, 431L: {'cputime.system': 0, 'cputime.user': 1.2749999999999995, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 4), 'runtime': 1.68855}, 432L: {'cputime.system': 0, 'cputime.user': 1.2750000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 4), 'runtime': 1.67802}, 433L: {'cputime.system': 0, 'cputime.user': 0.6499999999999999, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 10, 4), 'runtime': 61.6087}, 434L: {'cputime.system': 0, 'cputime.user': 0.6499999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': 
datetime.datetime(2013, 4, 4, 3, 9, 4), 'runtime': 1.46981}, 435L: {'cputime.system': 0, 'cputime.user': 1.2749999999999995, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 5), 'runtime': 1.8635}, 436L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 4), 'runtime': 1.40582}, 437L: {'cputime.system': 0, 'cputime.user': 0.32499999999998863, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 5), 'runtime': 1.82435}, 438L: {'cputime.system': 0, 'cputime.user': 0.6499999999999995, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 5), 'runtime': 1.70683}, 439L: {'cputime.system': 0, 'cputime.user': 0.32500000000000284, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 5), 'runtime': 1.39514}, 440L: {'cputime.system': 0, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 6), 'runtime': 1.75967}, 441L: {'cputime.system': 0, 'cputime.user': 0.9750000000000005, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 6), 'runtime': 1.65567}, 442L: {'cputime.system': 0, 'cputime.user': 0.9499999999999993, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 6), 'runtime': 1.79096}, 443L: {'cputime.system': 0, 'cputime.user': 0.9499999999999957, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 6), 'runtime': 1.79765}, 444L: {'cputime.system': 0, 'cputime.user': 0.6500000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 6), 'runtime': 1.52851}, 445L: {'cputime.system': 0, 'cputime.user': 
0.9500000000000011, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 7), 'runtime': 1.63952}, 446L: {'cputime.system': 0, 'cputime.user': 0.950000000000017, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 7), 'runtime': 1.44924}, 447L: {'cputime.system': 0.3250000000000002, 'cputime.user': 1.2750000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 7), 'runtime': 1.7404}, 448L: {'cputime.system': 0, 'cputime.user': 1.2749999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 7), 'runtime': 1.46547}, 449L: {'cputime.system': 0, 'cputime.user': 0.2999999999999998, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 8), 'runtime': 1.60016}, 450L: {'cputime.system': 0, 'cputime.user': 1.2749999999999995, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 8), 'runtime': 1.64608}, 451L: {'cputime.system': 0, 'cputime.user': 0.9749999999999996, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 8), 'runtime': 1.67322}, 452L: {'cputime.system': 0, 'cputime.user': 0.6500000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 10, 18), 'runtime': 71.9488}, 453L: {'cputime.system': 0, 'cputime.user': 0.6500000000000057, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 8), 'runtime': 1.49604}, 454L: {'cputime.system': 0, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 8), 'runtime': 1.41834}, 455L: {'cputime.system': 0, 'cputime.user': 0.9749999999999996, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': 
datetime.datetime(2013, 4, 4, 3, 9, 9), 'runtime': 1.78469}, 456L: {'cputime.system': 1.2750000000000001, 'cputime.user': 2.55, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 12), 'runtime': 3.06034}, 457L: {'cputime.system': 0, 'cputime.user': 1.299999999999983, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 9), 'runtime': 1.59818}, 458L: {'cputime.system': 0, 'cputime.user': 0.9499999999999957, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 9), 'runtime': 1.67393}, 459L: {'cputime.system': 0, 'cputime.user': 0.6499999999999999, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 9), 'runtime': 1.54002}, 460L: {'cputime.system': 0, 'cputime.user': 0.6500000000000057, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 10), 'runtime': 1.76928}, 461L: {'cputime.system': 0, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 10), 'runtime': 1.8821}, 462L: {'cputime.system': 0, 'cputime.user': 1.2750000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 10), 'runtime': 1.57101}, 463L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 10), 'runtime': 1.85145}, 464L: {'cputime.system': 0, 'cputime.user': 0.9499999999999957, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 10), 'runtime': 1.57632}, 465L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 11), 'runtime': 1.60546}, 466L: {'cputime.system': 0, 'cputime.user': 0.9750000000000014, 
'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 11), 'runtime': 1.47648}, 467L: {'cputime.system': 0, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 11), 'runtime': 1.71701}, 468L: {'cputime.system': 0, 'cputime.user': 0.6499999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 11), 'runtime': 1.49436}, 469L: {'cputime.system': 0, 'cputime.user': 1.2749999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 12), 'runtime': 1.66813}, 470L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 12), 'runtime': 1.70157}, 471L: {'cputime.system': 0, 'cputime.user': 0.6499999999999999, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 12), 'runtime': 1.50946}, 472L: {'cputime.system': 0, 'cputime.user': 0.9749999999999996, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 12), 'runtime': 1.78406}, 473L: {'cputime.system': 0, 'cputime.user': 0.6499999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 13), 'runtime': 1.98973}, 474L: {'cputime.system': 0, 'cputime.user': 0.9750000000000227, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 13), 'runtime': 1.41618}, 475L: {'cputime.system': 0, 'cputime.user': 0.9500000000000011, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 13), 'runtime': 1.61929}, 476L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 
3, 9, 13), 'runtime': 1.43119}, 477L: {'cputime.system': 0, 'cputime.user': 1.2999999999999998, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 14), 'runtime': 1.567}, 478L: {'cputime.system': 0, 'cputime.user': 0.9499999999999993, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 14), 'runtime': 1.6739}, 479L: {'cputime.system': 0, 'cputime.user': 1.2749999999999986, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 14), 'runtime': 1.79159}, 480L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.6499999999999995, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 14), 'runtime': 1.70314}, 481L: {'cputime.system': 0, 'cputime.user': 0.9500000000000002, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 14), 'runtime': 1.70992}, 482L: {'cputime.system': 0.2999999999999998, 'cputime.user': 1.5999999999999996, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 14), 'runtime': 1.76887}, 483L: {'cputime.system': 2.225, 'cputime.user': 2.225, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 18), 'runtime': 3.32801}, 484L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.6500000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 14), 'runtime': 1.6893}, 485L: {'cputime.system': 0.3250000000000002, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 15), 'runtime': 1.56225}, 486L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 15), 'runtime': 1.71709}, 487L: {'cputime.system': 0, 
'cputime.user': 0.6500000000000004, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 16), 'runtime': 1.89705}, 488L: {'cputime.system': 0, 'cputime.user': 0.625, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 15), 'runtime': 1.64788}, 489L: {'cputime.system': 0, 'cputime.user': 1.3000000000000043, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 16), 'runtime': 1.59969}, 490L: {'cputime.system': 0, 'cputime.user': 0.9750000000000014, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 16), 'runtime': 1.64547}, 491L: {'cputime.system': 0, 'cputime.user': 0.32499999999999973, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 16), 'runtime': 1.51186}, 492L: {'cputime.system': 0.3249999999999993, 'cputime.user': 0.9750000000000014, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 16), 'runtime': 1.69827}, 493L: {'cputime.system': 0.6500000000000004, 'cputime.user': 0.9500000000000011, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 16), 'runtime': 1.68349}, 494L: {'cputime.system': 0, 'cputime.user': 0.9500000000000011, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 17), 'runtime': 1.74946}, 495L: {'cputime.system': 0, 'cputime.user': 0, 'created': datetime.datetime(2013, 4, 4, 3, 1, 44), 'finished': datetime.datetime(2013, 4, 4, 3, 9, 16), 'runtime': 0.195143}, 496L: {'cputime.system': 2.55, 'cputime.user': 11.2, 'created': datetime.datetime(2013, 4, 4, 5, 21, 18), 'finished': datetime.datetime(2013, 4, 4, 5, 22, 39), 'runtime': 42.0444}, 497L: {'cputime.system': 1.9249999999999998, 'cputime.user': 2.225, 'created': datetime.datetime(2013, 4, 4, 5, 21, 18), 
'finished': datetime.datetime(2013, 4, 4, 5, 22, 46), 'runtime': 4.60588}}
from collections import Counter
jobs_counter= Counter()
# accumulate CPU time and runtime over all jobs
for v in jobs_info.values():
    jobs_counter.update(dict((k, v[k]) for k in ('cputime.system', 'cputime.user', 'runtime')))
jobs_counter
#print (jobs_counter['cputime.user'] + jobs_counter['cputime.system']), (jobs_counter['cputime.user'] + jobs_counter['cputime.system'])/3600. * 0.05
print jobs_counter['runtime'], (jobs_counter['runtime'])/3600. * 0.05
1509.430293 0.020964309625
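The two numbers printed above are the total runtime in seconds and a cost estimate derived from it. A minimal sketch of that arithmetic follows, assuming the factor 0.05 in the cell is a price in dollars per core-hour; the names `RATE_PER_CORE_HOUR` and `estimated_cost` are ours, not part of the PiCloud API.

```python
# Assumption: 0.05 is the price in dollars per core-hour, as implied by the
# expression runtime / 3600. * 0.05 in the notebook cell.
RATE_PER_CORE_HOUR = 0.05


def estimated_cost(runtime_seconds, rate_per_hour=RATE_PER_CORE_HOUR):
    """Convert a total runtime in seconds to hours, then multiply by the hourly rate."""
    return runtime_seconds / 3600.0 * rate_per_hour


# Total runtime reported above: about 1509 seconds, i.e. roughly 0.42 core-hours.
cost = estimated_cost(1509.430293)
print("%.4f" % cost)  # about two cents
```

Note that this sums wall-clock runtime per job rather than CPU time; the commented-out line in the cell above shows the CPU-time variant.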
# maybe use pickle to serialize results
import pickle
s = pickle.loads(pickle.dumps(dict(zip([seg_id for (seg_id, jid, status) in tally], result))))
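The line above only round-trips the results through pickle in memory. To keep results between sessions, one option is to write them to a file, sketched below. The dictionary contents and the file name `segment_results.pickle` are made-up stand-ins, not values from this notebook.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for dict(zip(segment_ids, result)): segment id -> per-segment stats.
results_by_segment = {
    'segment-a': {'arc.gz': 3, 'metadata': 1},
    'segment-b': {'arc.gz': 5, 'metadata': 2},
}

# Write to a scratch directory so the sketch does not clobber anything.
path = os.path.join(tempfile.mkdtemp(), 'segment_results.pickle')

# Pickle files must be opened in binary mode.
with open(path, 'wb') as f:
    pickle.dump(results_by_segment, f, pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as f:
    restored = pickle.load(f)
```

The restored object compares equal to the original dictionary, so later cells can load it instead of re-running the jobs.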
# http://docs.picloud.com/moduledoc.html#module-cloud
jobs_info = cloud.info(list(chain(segment_jids, retries_jids)),
                       info_requested=['created', 'finished', 'runtime', 'cputime'])
from matplotlib import pyplot as plt
started = [{'jid':k, 'time':v['finished'] - datetime.timedelta(seconds=v['runtime']), 'count': 1} for (k,v) in jobs_info.items()]
finished = [{'jid':k, 'time':v['finished'], 'count': -1} for (k,v) in jobs_info.items()]
df = DataFrame(started + finished)
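# drop the last 4 events from the plot; these presumably belong to the two jobs
# created around 05:21 (start and finish events each), which would stretch the x-axis
exclude_n = 4
df_sorted = df.sort_index(by='time')
plt.plot(df_sorted['time'][:-exclude_n], df_sorted['count'].cumsum()[:-exclude_n])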
[<matplotlib.lines.Line2D at 0x8226ad0>]
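The plot above counts how many jobs are running at each moment: every job contributes a +1 event at its start and a -1 event at its finish, and the running total over the time-sorted events is the concurrency. Here is a small sketch of the same idea without pandas, on made-up job records; `concurrency_timeline` is a hypothetical helper name. As in the cell above, the start time is inferred as `finished - runtime`, which is only approximate because `finished` is rounded to whole seconds.

```python
import datetime


def concurrency_timeline(jobs):
    """Return a list of (time, number_of_running_jobs) pairs.

    jobs maps a job id to a dict with 'finished' (datetime) and 'runtime' (seconds).
    """
    events = []
    for info in jobs.values():
        start = info['finished'] - datetime.timedelta(seconds=info['runtime'])
        events.append((start, 1))              # job starts: one more running
        events.append((info['finished'], -1))  # job ends: one fewer running
    # Sort by time; on ties -1 sorts before +1, so a finish is counted before a start.
    events.sort()

    running = 0
    timeline = []
    for when, delta in events:
        running += delta
        timeline.append((when, running))
    return timeline


# Two made-up jobs that overlap for five seconds.
sample_jobs = {
    1: {'finished': datetime.datetime(2013, 4, 4, 3, 0, 10), 'runtime': 10},
    2: {'finished': datetime.datetime(2013, 4, 4, 3, 0, 15), 'runtime': 10},
}
timeline = concurrency_timeline(sample_jobs)
peak = max(count for _, count in timeline)
```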
Run jobs locally using cloud.mp (set USE_LOCAL = True)
# http://docs.picloud.com/cloud_cloudmp.html
USE_LOCAL = False
if USE_LOCAL:
    CLOUD = cloud.mp
else:
    CLOUD = cloud
# try setting n_tasks to something less than # of all segments to test out code
n_tasks = len(valid_segments)
jids = CLOUD.map(segment_stats2, valid_segments[:n_tasks], [None]*n_tasks, _env='Working_with_Open_Data')
jids
xrange(1789, 1966)
CLOUD.status(jids)[:5]
['done', 'processing', 'done', 'done', 'done']
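With many jobs, a tally of statuses is easier to read than the raw list. A minimal sketch follows, using a hard-coded list shaped like the output above rather than a live call, so it runs without PiCloud; the variable names are ours.

```python
from collections import Counter

# Hypothetical status list, copied from the shape of CLOUD.status(jids)[:5] above.
statuses = ['done', 'processing', 'done', 'done', 'done']

# Count how many jobs are in each state.
status_counts = Counter(statuses)

# True only when every job has finished.
all_done = status_counts['done'] == len(statuses)

# Positions of jobs that are not done yet, useful for deciding what to wait on or retry.
pending = [i for i, s in enumerate(statuses) if s != 'done']
```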
jobs_info = CLOUD.info(jids,
                       info_requested=['created', 'finished', 'runtime', 'cputime'])
from collections import Counter
jobs_counter= Counter()
# accumulate CPU time and runtime over all jobs
for v in jobs_info.values():
    jobs_counter.update(dict((k, v[k]) for k in ('cputime.system', 'cputime.user', 'runtime')))
jobs_counter
#print (jobs_counter['cputime.user'] + jobs_counter['cputime.system']), (jobs_counter['cputime.user'] + jobs_counter['cputime.system'])/3600. * 0.05
print "total runtime (s): ", jobs_counter['runtime'], "estimated cost: ", (jobs_counter['runtime'])/3600. * 0.05
total runtime (s): 1336.6622 estimated cost: 0.0185647527778
# plot # cores running vs time
started = [{'jid':k, 'time':v['finished'] - datetime.timedelta(seconds=v['runtime']), 'count': 1} for (k,v) in jobs_info.items()]
finished = [{'jid':k, 'time':v['finished'], 'count': -1} for (k,v) in jobs_info.items()]
df = DataFrame(started + finished)
df_sorted = df.sort_index(by='time')
plt.plot(df_sorted['time'], df_sorted['count'].cumsum())
[<matplotlib.lines.Line2D at 0x8428a10>]
byte_counter, file_counter
(Counter({'arc.gz': 71106384571350L, 'metadata': 11010558690874L, 'textData': 6978342039325L, 'other': 1626219, 'success': 0}), Counter({'arc.gz': 856589, 'textData': 341525, 'metadata': 341517, 'success': 71, 'other': 17}))
# http://stackoverflow.com/a/1823101/7782
import locale
locale.setlocale(locale.LC_ALL, 'en_US')
locale.format("%d", byte_counter['arc.gz'], grouping=True)
'71,106,384,571,350'
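If the `en_US` locale is not installed, `locale.setlocale` raises an error. A locale-independent alternative is the `,` option of the built-in `format` (Python 2.7 and later), sketched below on the `arc.gz` byte count shown above. The conversion to terabytes assumes decimal units (1 TB = 10**12 bytes).

```python
# Byte count for 'arc.gz' files, taken from byte_counter above.
arc_gz_bytes = 71106384571350

# The ',' format option inserts thousands separators without touching the locale.
grouped = format(arc_gz_bytes, ',')

# Rough size in decimal terabytes (assumption: 1 TB = 10**12 bytes).
terabytes = arc_gz_bytes / 1e12

print(grouped)                  # 71,106,384,571,350
print("%.1f TB" % terabytes)    # 71.1 TB
```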