Blaze provides a lightweight interface on top of pre-existing computational infrastructure. This notebook gives a quick overview of how Blaze interacts with a variety of data types.
from blaze import Data, by, compute
Blaze interacts with normal Python objects. Operations on Blaze Data objects create expression trees. These expressions deliver an intuitive NumPy/pandas-like feel.
x = Data(1)
x
x.dshape
dshape("int64")
x + 1
print(type(x + 1))
print(type(compute(x + 1)))
<class 'blaze.expr.arithmetic.Add'>
<type 'int'>
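The two types printed above reflect the split between building an expression and computing it. As a rough illustration of that idea, here is a toy sketch in plain Python. The names `Expr`, `Symbol`, `Add`, and `evaluate` are invented for this example and are not Blaze's actual classes or implementation.

```python
# Toy deferred-expression classes; an illustration only, NOT Blaze's code.
class Expr(object):
    def __add__(self, other):
        # Record the operation instead of performing it.
        return Add(self, other)


class Symbol(Expr):
    """A leaf node that stands in for a piece of data."""
    def __init__(self, value):
        self.value = value


class Add(Expr):
    """An interior node that remembers its two operands."""
    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs


def evaluate(expr):
    """Walk the tree and do the arithmetic with ordinary Python values."""
    if isinstance(expr, Symbol):
        return expr.value
    if isinstance(expr, Add):
        return evaluate(expr.lhs) + evaluate(expr.rhs)
    return expr  # plain Python constants evaluate to themselves


tree = Symbol(1) + 1     # builds an Add node; nothing is computed yet
result = evaluate(tree)  # the arithmetic happens only here
```

Here `tree` is an `Add` object and `result` is a plain Python integer, mirroring the expression/result distinction in the cell above.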
Starting small, Blaze interacts happily with collections of data.
It uses Pandas for pretty notebook printing.
x = Data([1, 2, 3, 4, 5])
x
 | _2 |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
x[x > 2] * 10
 | _2 |
---|---|
0 | 30 |
1 | 40 |
2 | 50 |
x.dshape
dshape("5 * int64")
Slightly more exciting: Blaze also operates on tabular data.
L = [[1, 'Alice', 100],
     [2, 'Bob', -200],
     [3, 'Charlie', 300],
     [4, 'Dennis', 400],
     [5, 'Edith', -500]]
x = Data(L, fields=['id', 'name', 'amount'])
x.dshape
dshape("5 * {id: int64, name: string, amount: int64}")
x
 | id | name | amount |
---|---|---|---|
0 | 1 | Alice | 100 |
1 | 2 | Bob | -200 |
2 | 3 | Charlie | 300 |
3 | 4 | Dennis | 400 |
4 | 5 | Edith | -500 |
deadbeats = x[x.amount < 0].name
deadbeats
 | name |
---|---|
0 | Bob |
1 | Edith |
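Evaluated over a plain Python list, the selection above amounts to a filtering loop. A hand-written equivalent might look like the sketch below. It is an illustration of the idea, not the code Blaze actually generates, and `fields`, `amount_idx`, and `name_idx` are names chosen for this example.

```python
# Hand-written equivalent of x[x.amount < 0].name over a list of rows.
L = [[1, 'Alice', 100],
     [2, 'Bob', -200],
     [3, 'Charlie', 300],
     [4, 'Dennis', 400],
     [5, 'Edith', -500]]
fields = ['id', 'name', 'amount']

# Look up column positions once instead of hard-coding them.
amount_idx = fields.index('amount')
name_idx = fields.index('name')

deadbeats = []
for row in L:
    if row[amount_idx] < 0:              # the selection: amount < 0
        deadbeats.append(row[name_idx])  # the projection: keep only name
```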
Blaze doesn't do the work itself; it tells other systems to do the work.
In the previous example, Blaze told Python which for-loops to write. In this example, it calls the right functions in pandas.
The user experience is identical; only performance differs.
from pandas import DataFrame
df = DataFrame([[1, 'Alice', 100],
                [2, 'Bob', -200],
                [3, 'Charlie', 300],
                [4, 'Denis', 400],
                [5, 'Edith', -500]], columns=['id', 'name', 'amount'])
df
 | id | name | amount |
---|---|---|---|
0 | 1 | Alice | 100 |
1 | 2 | Bob | -200 |
2 | 3 | Charlie | 300 |
3 | 4 | Denis | 400 |
4 | 5 | Edith | -500 |
x = Data(df)
x
 | id | name | amount |
---|---|---|---|
0 | 1 | Alice | 100 |
1 | 2 | Bob | -200 |
2 | 3 | Charlie | 300 |
3 | 4 | Denis | 400 |
4 | 5 | Edith | -500 |
deadbeats = x[x.amount < 0].name
deadbeats
 | name |
---|---|
1 | Bob |
4 | Edith |
Calling compute, we see that Blaze returns the same kind of object it was given.
type(compute(deadbeats))
pandas.core.series.Series
Blaze extends beyond Python and pandas (that's the main motivation).
Here it drives SQLAlchemy.
from sqlalchemy import Table, Column, MetaData, Integer, String, create_engine
tab = Table('bank', MetaData(),
            Column('id', Integer),
            Column('name', String),
            Column('amount', Integer))
x = Data(tab)
x.dshape
dshape("var * {id: ?int32, name: ?string, amount: ?int32}")
Just as computations on pandas objects produce pandas objects, computations on SQLAlchemy tables produce SQLAlchemy Select statements.
deadbeats = x[x.amount < 0].name
compute(deadbeats)
<sqlalchemy.sql.selectable.Select at 0x7f2543f2fc10; Select object>
print(compute(deadbeats))  # SQLAlchemy generates actual SQL
SELECT bank.name FROM bank WHERE bank.amount < :amount_1
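To see what running that statement amounts to, here is a small sketch using the standard-library `sqlite3` module rather than SQLAlchemy. It assumes an in-memory `bank` table filled with the sample rows from the earlier list example, and it binds the `:amount_1` placeholder to `0`, the value from `x.amount < 0`.

```python
import sqlite3

# In-memory database standing in for a real bank table (an assumption
# made for this sketch; the notebook's table is not connected to data).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE bank (id INTEGER, name TEXT, amount INTEGER)')
conn.executemany('INSERT INTO bank VALUES (?, ?, ?)', [
    (1, 'Alice', 100),
    (2, 'Bob', -200),
    (3, 'Charlie', 300),
    (4, 'Dennis', 400),
    (5, 'Edith', -500),
])

# The same SQL text as printed above, with a named bind parameter.
query = 'SELECT bank.name FROM bank WHERE bank.amount < :amount_1'
names = [row[0] for row in conn.execute(query, {'amount_1': 0})]
conn.close()
```

The bind parameter keeps the literal out of the SQL text, which is why the generated statement shows `:amount_1` instead of `0`.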
When we drive a SQLAlchemy table that is connected to a database, we get actual computation.
engine = create_engine('sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db')
x = Data(engine)
x
x.iris
 | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
10 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa |
by(x.iris.species, shortest=x.iris.sepal_length.min(),
                   longest=x.iris.sepal_length.max())
 | species | longest | shortest |
---|---|---|---|
0 | Iris-setosa | 5.8 | 4.3 |
1 | Iris-versicolor | 7.0 | 4.9 |
2 | Iris-virginica | 7.9 | 4.9 |
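A grouped reduction like this one corresponds roughly to a SQL `GROUP BY` query. The sketch below runs such a query with the standard-library `sqlite3` module. The SQL text is hand-written for illustration and is not necessarily what Blaze emits, and the rows are a tiny hand-made sample rather than the real iris table, so the numbers differ from the output above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE iris (sepal_length REAL, species TEXT)')
# A tiny hand-made sample, NOT the real dataset.
conn.executemany('INSERT INTO iris VALUES (?, ?)', [
    (5.1, 'Iris-setosa'),
    (4.9, 'Iris-setosa'),
    (4.7, 'Iris-setosa'),
    (7.0, 'Iris-versicolor'),
    (4.9, 'Iris-versicolor'),
])

# Split rows by species, then reduce each group to its max and min.
query = """
    SELECT species,
           MAX(sepal_length) AS longest,
           MIN(sepal_length) AS shortest
    FROM iris
    GROUP BY species
    ORDER BY species
"""
summary = conn.execute(query).fetchall()
conn.close()
```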
Often just figuring out how to produce the relevant Python object can be a challenge.
Blaze accepts URI strings in many formats.
x = Data('sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db::iris')
x
 | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
10 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa |
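The URI above has two parts separated by `::`: the resource to connect to and the name of a table inside it. The helper below is a hypothetical sketch of that convention, written for this example. `split_uri` is not a Blaze function, and Blaze's own parsing may differ.

```python
def split_uri(uri):
    """Split 'resource::name' into (resource, name).

    Returns (uri, None) when no '::' separator is present.
    Hypothetical helper for illustration; not part of Blaze.
    """
    resource, sep, name = uri.rpartition('::')
    if not sep:
        return uri, None
    return resource, name


# Example paths and hosts below are placeholders chosen for this sketch.
sqlite_parts = split_uri('sqlite:///example.db::iris')
mongo_parts = split_uri('mongodb://localhost/github::users')
bare_parts = split_uri('sqlite:///example.db')
```

Using `rpartition` splits on the last `::`, so the single colons inside the scheme and host portion of the URI are left alone.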
x = Data('impala://ec2-54-90-201-28.compute-1.amazonaws.com')
GitHub's database is mirrored in a MongoDB collection hosted in the Netherlands.
We connect through an SSH tunnel. See http://ghtorrent.org/ to obtain access.
users = Data('mongodb://ghtorrentro:ghtorrentro@localhost/github::users')
users
avatar_url | bio | blog | company | created_at | followers | following | gravatar_id | hireable | html_url | id | location | login | name | public_gists | public_repos | type | url | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | https://secure.gravatar.com/avatar/a7e55f31bb4... | None | None | None | 2012-05-04T13:59:54Z | None | 0 | 0 | a7e55f31bb45321f30211e901cd89ffa | None | https://github.com/Michaelwussler | 1706010 | None | Michaelwussler | None | 0 | 3 | User | https://api.github.com/users/Michaelwussler |
1 | https://secure.gravatar.com/avatar/eb8139078bc... | None | None | None | 2012-05-03T18:47:13Z | None | 0 | 0 | eb8139078bc623dee103ed3917c080dc | None | https://github.com/praiser | 1703505 | None | praiser | None | 0 | 3 | User | https://api.github.com/users/praiser |
2 | https://secure.gravatar.com/avatar/13c7b665e0c... | None | 2010-04-07T12:15:00Z | vad.viktor@gmail.com | 2 | 3 | 13c7b665e0cbd94e0155387c35957d13 | False | https://github.com/vadviktor | 238703 | Budapest | vadviktor | Vad Viktor | 0 | 10 | User | https://api.github.com/users/vadviktor | ||
3 | https://secure.gravatar.com/avatar/b7937805411... | None | Appcelerator | 2012-04-02T16:13:58Z | yjin@appcelerator.com | 0 | 0 | b7937805411d278ceb839175e251e2a0 | False | https://github.com/ypjin | 1598831 | Beijing | ypjin | Yuping | 0 | 5 | User | https://api.github.com/users/ypjin | |
4 | https://secure.gravatar.com/avatar/89e109fca84... | http://blogs.perl.org/users/steven_haryanto | - | 2010-02-26T01:28:09Z | stevenharyanto@gmail.com | 39 | 307 | 89e109fca8474e5636c9feef7a8422ea | False | https://github.com/sharyanto | 211084 | Jakarta, Indonesia | sharyanto | Steven Haryanto | 5 | 195 | User | https://api.github.com/users/sharyanto | |
5 | https://secure.gravatar.com/avatar/7490b4e3e9c... | Perl, C, C++, JavaScript, PHP, Haskell, Ruby, ... | http://c9s.me | 2009-02-01T15:20:08Z | cornelius.howl@gmail.com | 330 | 599 | 7490b4e3e9cb85a1f7dc0c8ea01a86e5 | True | https://github.com/c9s | 50894 | Taipei, Taiwan | c9s | Yo-An Lin | 281 | 206 | User | https://api.github.com/users/c9s | |
6 | https://secure.gravatar.com/avatar/dc078ac4dbd... | None | azhari.harahap.us | CapungRiders | 2010-10-31T05:53:40Z | azhari@harahap.us | 26 | 11 | dc078ac4dbdc06d3e3c0ec0b6801b53d | False | https://github.com/back2arie | 461397 | Indonesia | back2arie | Azhari Harahap | 1 | 15 | User | https://api.github.com/users/back2arie |
7 | https://secure.gravatar.com/avatar/fb844ffed6c... | Git Ninja and language-agnostic problem solver... | http://dukeleto.pl | Leto Labs LLC | 2008-10-22T03:02:15Z | jonathan@leto.net | 175 | 635 | fb844ffed6c5a2e69638627e3b721308 | True | https://github.com/leto | 30298 | Portland, OR | leto | Jonathan "Duke" Leto | 276 | 112 | User | https://api.github.com/users/leto |
8 | https://secure.gravatar.com/avatar/3843ec7861e... | http://alanhaggai.org/ | Thought Ripples | 2009-01-13T16:25:15Z | haggai@cpan.org | 46 | 365 | 3843ec7861e271e803ea076035d683dd | False | https://github.com/alanhaggai | 46288 | IN | alanhaggai | Alan Haggai Alavi | 4 | 54 | User | https://api.github.com/users/alanhaggai | |
9 | https://secure.gravatar.com/avatar/f611628c558... | None | arisdottle.net | Team Rooster Pirates | 2009-05-12T19:29:09Z | amiri@roosterpirates.com | 16 | 87 | f611628c5588f7a0a72c65ec1f94dfb8 | False | https://github.com/amiri | 83806 | Los Angeles, CA | amiri | Amiri Barksdale | 16 | 18 | User | https://api.github.com/users/amiri |
10 | https://secure.gravatar.com/avatar/c57483c5cfe... | None | http://www.geekfarm.org/wu/muse/WebHome.html | None | 2009-02-08T03:28:54Z | git-c@geekfarm.org | 16 | 87 | c57483c5cfe159b98a6e33ee7e9eec38 | False | https://github.com/wu | 52700 | None | wu | Alex White | 0 | 15 | User | https://api.github.com/users/wu |
import h5py
f = h5py.File('/home/mrocklin/Downloads/OMI-Aura_L2-OMAERO_2014m1105t2304-o54838_v003-2014m1106t215558.he5')
x = Data(f)
x.dshape
dshape("""{
  HDFEOS: {
    ADDITIONAL: {FILE_ATTRIBUTES: {}},
    SWATHS: {
      ColumnAmountAerosol: {
        Data Fields: {
          AerosolIndexUV: 1643 * 60 * int16,
          AerosolIndexVIS: 1643 * 60 * int16,
          AerosolModelMW: 1643 * 60 * uint16,
          AerosolModelsPassedThreshold: 1643 * 60 * 10 * uint16,
          AerosolOpticalThicknessMW: 1643 * 60 * 14 * int16,
          AerosolOpticalThicknessMWPrecision: 1643 * 60 * int16,
          AerosolOpticalThicknessNUV: 1643 * 60 * 2 * int16,
          AerosolOpticalThicknessPassedThreshold: 1643 * 60 * 10 * 9 * int16,
          AerosolOpticalThicknessPassedThresholdMean: 1643 * 60 * 9 * int16,
          AerosolOpticalThicknessPassedThresholdStd: 1643 * 60 * 9 * int16,
          CloudFlags: 1643 * 60 * uint8,
          CloudPressure: 1643 * 60 * int16,
          EffectiveCloudFraction: 1643 * 60 * int8,
          InstrumentConfigurationId: 1643 * uint8,
          MeasurementQualityFlags: 1643 * uint8,
          NumberOfModelsPassedThreshold: 1643 * 60 * uint8,
          ProcessingQualityFlagsMW: 1643 * 60 * uint16,
          ProcessingQualityFlagsNUV: 1643 * 60 * uint16,
          RootMeanSquareErrorOfFitPassedThreshold: 1643 * 60 * 10 * int16,
          SingleScatteringAlbedoMW: 1643 * 60 * 14 * int16,
          SingleScatteringAlbedoMWPrecision: 1643 * 60 * int16,
          SingleScatteringAlbedoNUV: 1643 * 60 * 2 * int16,
          SingleScatteringAlbedoPassedThreshold: 1643 * 60 * 10 * 9 * int16,
          SingleScatteringAlbedoPassedThresholdMean: 1643 * 60 * 9 * int16,
          SingleScatteringAlbedoPassedThresholdStd: 1643 * 60 * 9 * int16,
          SmallPixelRadiancePointerUV: 1643 * 2 * int16,
          SmallPixelRadiancePointerVIS: 1643 * 2 * int16,
          SmallPixelRadianceUV: 6783 * 60 * float32,
          SmallPixelRadianceVIS: 6786 * 60 * float32,
          SmallPixelWavelengthUV: 6783 * 60 * uint16,
          SmallPixelWavelengthVIS: 6786 * 60 * uint16,
          TerrainPressure: 1643 * 60 * int16,
          TerrainReflectivity: 1643 * 60 * 9 * int16,
          XTrackQualityFlags: 1643 * 60 * uint8
          },
        Geolocation Fields: {
          GroundPixelQualityFlags: 1643 * 60 * uint16,
          Latitude: 1643 * 60 * float32,
          Longitude: 1643 * 60 * float32,
          OrbitPhase: 1643 * float32,
          SolarAzimuthAngle: 1643 * 60 * float32,
          SolarZenithAngle: 1643 * 60 * float32,
          SpacecraftAltitude: 1643 * float32,
          SpacecraftLatitude: 1643 * float32,
          SpacecraftLongitude: 1643 * float32,
          TerrainHeight: 1643 * 60 * int16,
          Time: 1643 * float64,
          ViewingAzimuthAngle: 1643 * 60 * float32,
          ViewingZenithAngle: 1643 * 60 * float32
          }
        }
      }
    },
  HDFEOS INFORMATION: {
    ArchiveMetadata.0: string[65535, 'A'],
    CoreMetadata.0: string[65535, 'A'],
    StructMetadata.0: string[32000, 'A']
    }
  }""")
x.HDFEOS.SWATHS.ColumnAmountAerosol.Data_Fields.CloudPressure
x.HDFEOS.SWATHS.ColumnAmountAerosol.Data_Fields.CloudPressure.max()