# Using agate in a Jupyter notebook¶

First we import agate. Then we create an agate Table by loading data from a CSV file.

In :
import agate

table = agate.Table.from_csv('examples/realdata/ks_1033_data.csv')

print(table)

<agate.table.Table object at 0x106a929b0>


## Question 1: What was the total cost to Kansas City area counties?¶

To answer this question, we first must filter the table to only those rows which refer to a Kansas City area county.

In :
kansas_city = table.where(lambda r: r['county'] in ('JACKSON', 'CLAY', 'CASS', 'PLATTE'))

print(len(table.rows))
print(len(kansas_city.rows))

1575
15


We can then print the Sum of the costs of all those rows. (The cost column is named total_cost.)

In :
print('$%d' % kansas_city.columns['total_cost'].aggregate(agate.Sum()))  $3716


## Question 2: Which counties spent the most?¶

This question is more complicated. First we group the data by county, which gives us a TableSet named counties. A TableSet is a group of tables with the same columns.

In :
# Group by county
counties = table.group_by('county')

print(counties.keys())

('ALLEN', 'ANDERSON', 'BARTON', 'BOURBON', 'BROWN', 'BUTLER', 'CHAUTAUQUA', 'CHEROKEE', 'CHEYENNE', 'CLAY', 'COFFEY', 'COMANCHE', 'COWLEY', 'CRAWFORD', 'DECATUR', 'DICKINSON', 'DOUGLAS', 'ELLIS', 'ELLSWORTH', 'FINNEY', 'FORD', 'FRANKLIN', 'GEARY', 'PRATT', 'GRAHAM', 'GRANT', 'GRAY', 'GREELEY', 'GREENWOOD', 'HAMILTON', 'HARVEY', 'HASKELL', 'JACKSON', 'JEFFERSON', 'JOHNSON', 'KIOWA', 'LABETTE', 'LEAVENWORTH', 'LINN', 'LOGAN', 'LYON', 'MARION', 'MARSHALL', 'MCPHERSON', 'MEADE', 'MIAMI', 'MONTGOMERY', 'NEMAHA', 'NEOSHO', 'NORTON', 'OSAGE', 'OTTAWA', 'PAWNEE', 'POTTAWATOMIE', 'RAWLINS', 'RENO', 'RICE', 'RILEY', 'ROOKS', 'SALINE', 'SCOTT', 'SEDGWICK', 'SHAWNEE', 'SHERMAN', 'SMITH', 'STAFFORD', 'SUMNER', 'THOMAS', 'WABAUNSEE', 'WICHITA', 'WILSON', 'WOODSON', 'WYANDOTTE')


We then use the aggregate function to sum the total_cost column for each table in the group. The resulting values are collapsed into a new table, totals, which has a row for each county and a column named total_cost_sum containing the new total.

In :
# Aggregate totals for all counties
totals = counties.aggregate([
('total_cost', agate.Sum(), 'total_cost_sum')
])

print(totals.column_names)

('county', 'total_cost_sum')


Finally, we sort the counties by their total cost, limit the results to the top 10 and then print the results as a text bar chart.

In :
totals.order_by('total_cost_sum', reverse=True).limit(20).print_bars('county', 'total_cost_sum', width=100)

county     total_cost_sum
SEDGWICK       977,174.45 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
COFFEY         691,749.03 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
MONTGOMERY     447,581.20 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
JOHNSON        420,628.00 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
SALINE         245,450.24 ▓░░░░░░░░░░░░░░░░░░
FINNEY         171,862.20 ▓░░░░░░░░░░░░░
BROWN          145,254.96 ▓░░░░░░░░░░░
KIOWA           97,974.00 ▓░░░░░░░
WILSON          74,747.10 ▓░░░░░
FORD            70,780.00 ▓░░░░░
GREENWOOD       69,722.00 ▓░░░░░
DOUGLAS         68,069.42 ▓░░░░░
MIAMI           64,691.09 ▓░░░░░
LYON            51,236.00 ▓░░░░
HAMILTON        47,989.00 ▓░░░░
WYANDOTTE       45,259.50 ▓░░░
CHEYENNE        35,970.92 ▓░░░
PRATT           35,745.60 ▓░░░
FRANKLIN        34,309.00 ▓░░░
HARVEY          27,387.42 ▓░░
+-----------------+-----------------+------------------+-----------------+
0              250,000           500,000            750,000      1,000,000