Lesson 8

How to pull data from a microsoft sql database

In [1]:
# Import libraries
from pandas import DataFrame
import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, select
In [2]:
print 'Pandas version: ' + pd.__version__
Pandas version: 0.11.0

Grab Data from SQL

In this section we use the sqlalchemy library to grab data from a sql database. Make sure to use your own ServerName, Database, TableName.

In [3]:
# Parameters
ServerName = "RepSer2"
Database = "BizIntel"
TableName = "DimDate"

# Create the connection
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database)
conn = engine.connect()

# Required for querying tables
metadata = MetaData(conn)

# Table to query
tbl = Table(TableName, metadata, autoload=True, schema="dbo")
#tbl.create(checkfirst=True)

# Select all
sql = tbl.select()

# run sql code
result = conn.execute(sql)

# Insert to a dataframe
df = DataFrame(data=list(result), columns=result.keys())

# Close connection
conn.close()

print 'Done'
Done

Select the contents in the dataframe.

In [4]:
df
Out[4]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5479 entries, 0 to 5478
Data columns (total 16 columns):
DateSK          5479  non-null values
Date            5479  non-null values
Day             5479  non-null values
DaySuffix       5479  non-null values
DayOfWeek       5479  non-null values
DOWInMonth      5479  non-null values
DayOfYear       5479  non-null values
WeekOfYear      5479  non-null values
WeekOfMonth     5479  non-null values
Month           5479  non-null values
MonthName       5479  non-null values
Quarter         5479  non-null values
QuarterName     5479  non-null values
Year            5479  non-null values
StandardDate    5479  non-null values
HolidayText     210  non-null values
dtypes: object(16)
In [5]:
df.head()
Out[5]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 16 columns):
DateSK          5  non-null values
Date            5  non-null values
Day             5  non-null values
DaySuffix       5  non-null values
DayOfWeek       5  non-null values
DOWInMonth      5  non-null values
DayOfYear       5  non-null values
WeekOfYear      5  non-null values
WeekOfMonth     5  non-null values
Month           5  non-null values
MonthName       5  non-null values
Quarter         5  non-null values
QuarterName     5  non-null values
Year            5  non-null values
StandardDate    5  non-null values
HolidayText     1  non-null values
dtypes: object(16)

Convert to specific data types. The code below will have to be modified to match your table.

In [6]:
# Convert data types 
df.Date = df.Date.astype('datetime64')
df.StandardDate = df.StandardDate.astype('datetime64')
df.DateSK = df.DateSK.astype('int')
df.Day = df.Day.astype('int')
df.DOWInMonth = df.DOWInMonth.astype('int')
df.DayOfYear = df.DayOfYear.astype('int')
df.WeekOfYear = df.WeekOfYear.astype('int')
df.WeekOfMonth = df.WeekOfMonth.astype('int')
df.Month = df.Month.astype('int')
df.Quarter = df.Quarter.astype('int')
df.Year = df.Year.astype('int')

print 'Data Types'
print df.dtypes
Data Types
DateSK                   int32
Date            datetime64[ns]
Day                      int32
DaySuffix               object
DayOfWeek               object
DOWInMonth               int32
DayOfYear                int32
WeekOfYear               int32
WeekOfMonth              int32
Month                    int32
MonthName               object
Quarter                  int32
QuarterName             object
Year                     int32
StandardDate    datetime64[ns]
HolidayText             object
dtype: object

In [7]:
df.head()
Out[7]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 16 columns):
DateSK          5  non-null values
Date            5  non-null values
Day             5  non-null values
DaySuffix       5  non-null values
DayOfWeek       5  non-null values
DOWInMonth      5  non-null values
DayOfYear       5  non-null values
WeekOfYear      5  non-null values
WeekOfMonth     5  non-null values
Month           5  non-null values
MonthName       5  non-null values
Quarter         5  non-null values
QuarterName     5  non-null values
Year            5  non-null values
StandardDate    5  non-null values
HolidayText     1  non-null values
dtypes: datetime64[ns](2), int32(9), object(5)

Author: David Rojas LLC