Topic: Help with extremely simple reshaping of dataframe:
I'm taking advantage of this time to try and understand pandas a little better.
I'm trying to do something that seems extremely simple. I have a dataframe with this fake data about a number of people (say, participants in a study). I want to make a frequency matrix like shown below. I've tried various grouping, reindexing, and pivoting with no luck.
from pandas import DataFrame
df = DataFrame({'gender':['F','M','F','M','M','F','M'],
'major':['Biology','Chemistry','Biology','English','Biology'
,'Chemistry','English']})
df
gender | major | |
---|---|---|
0 | F | Biology |
1 | M | Chemistry |
2 | F | Biology |
3 | M | English |
4 | M | Biology |
5 | F | Chemistry |
6 | M | English |
7 rows × 2 columns
df.groupby(['gender','major']).agg('count').unstack()['gender'].fillna(0)
major | Biology | Chemistry | English |
---|---|---|---|
gender | |||
F | 2 | 1 | 0 |
M | 1 | 1 | 2 |
2 rows × 3 columns
df.pivot_table(rows='gender',cols='major', aggfunc='count')
gender | major | |||||
---|---|---|---|---|---|---|
major | Biology | Chemistry | English | Biology | Chemistry | English |
gender | ||||||
F | 2 | 1 | NaN | 2 | 1 | NaN |
M | 1 | 1 | 2 | 1 | 1 | 2 |
2 rows × 6 columns
df.pivot_table(rows='gender',cols='major', aggfunc='count')['gender']
major | Biology | Chemistry | English |
---|---|---|---|
gender | |||
F | 2 | 1 | NaN |
M | 1 | 1 | 2 |
2 rows × 3 columns
df.pivot_table(rows='gender',cols='major', aggfunc='count')['gender'].fillna(0)
major | Biology | Chemistry | English |
---|---|---|---|
gender | |||
F | 2 | 1 | 0 |
M | 1 | 1 | 2 |
2 rows × 3 columns
# maybe a clearer solution if you use 'values' --
# which can be either 'major' or 'gender' in this case
df.pivot_table(values='major', rows='gender',cols='major', aggfunc='count').fillna(0)
major | Biology | Chemistry | English |
---|---|---|---|
gender | |||
F | 2 | 1 | 0 |
M | 1 | 1 | 2 |
2 rows × 3 columns