This notebook describes a nearly fully automated process for configuring a MySQL database with the Daily Global Weather Measurements data on Amazon EC2. The original data is available as a collection of text files on an Amazon EBS volume snapshot (snap-ac47f4c5), as described here:
http://aws.amazon.com/datasets/2759
Before running this script, three files are required:
rootkey.csv : containing the AWSAccessKeyId and AWSSecretKey values for the AWS account being used
passwords.csv : containing an ssh key name, an access password for the EC2 instance, an access password for the MySQL database, and the user's IP address.
populateDB.py : a python file that will be sent to the EC2 instance to run one of the data loading processes. The text of this file is copied below, in the notes just before In[34].
A security group must also be set up in AWS prior to running this script, with the expected name of 'onlyMyIP' (see In[5] below).
The notebook also requires three Python libraries beyond the standard installation: boto, Paramiko and Pandas (see the Libraries note below).
Finally, this notebook sadly cannot be run successfully via the Run All option. Instead, do the following:
scroll down to the heading "--- Manual Modification for Require TTY ---", select that cell
from there, select Run All Above from the Cell menu, wait for completion (5-10 minutes)
follow the instructions directly below that heading in order to change the EC2 instance to not require a TTY connection (which will then let us use Paramiko for scripted ssh connections to the machine)
then select Run All Below from the Cell menu, and the rest of the script will run to completion
The full script should take about 8-10 hours to run, but requires very few resources on the local machine (most of the work is done on the remote EC2 machine).
Libraries: boto is used for AWS interaction, Paramiko for ssh, and Pandas is only used here for nice table outputs in IPython (could thus be cut out if desired).
import sys, os, time
import paramiko as pm
import boto as bt
import pandas as pd
Access keys and passwords: in external files, formatted as lines of "[name] = [value]", as per the AWS rootkey.csv download.
AWSAccessKeyId, AWSSecretKey = ( line.strip().split('=')[1] for line in open('/Users/brian/rootkey.csv','r') )
sshKeyName, instancePass, mysqlPass, myIP = ( line.strip().split('=')[1] for line in open('/Users/brian/passwords.csv','r') )
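The one-line generator parse above assumes no value contains an '=' character. A slightly more defensive sketch (the helper name readKeyValueFile is ours, not part of the notebook) splits on the first '=' only and skips blank lines:

```python
def readKeyValueFile(path):
    """Parse lines of '[name] = [value]' into a list of values, in file order.

    Splits on the first '=' only, so values that themselves contain '='
    (as AWS secret keys can) survive intact."""
    values = []
    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            name, sep, value = line.partition('=')
            values.append(value.strip())
    return values

# usage, mirroring the cells above:
# AWSAccessKeyId, AWSSecretKey = readKeyValueFile('rootkey.csv')
```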
Convenience methods: just one in this case - waitUntilReady is used throughout to wait for completed status, particularly when interacting with AWS.
def waitUntilReady(obj,desiredStatus):
    while obj.update() != desiredStatus:
        sys.stdout.write(".")
        time.sleep(1)
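waitUntilReady blocks forever if a resource never reaches the desired status. A variant with a timeout could look like the following sketch (the name waitUntilReadyOrTimeout is ours, not part of the notebook):

```python
import sys, time

def waitUntilReadyOrTimeout(obj, desiredStatus, timeout=600, poll=1):
    """Poll obj.update() until it returns desiredStatus.

    Returns True on success, or False once timeout seconds have elapsed,
    so a stuck instance or volume cannot hang the notebook forever."""
    waited = 0
    while obj.update() != desiredStatus:
        if waited >= timeout:
            return False
        sys.stdout.write(".")
        time.sleep(poll)
        waited += poll
    return True
```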
Establish a connection to EC2.
ec2 = bt.connect_ec2( aws_access_key_id = AWSAccessKeyId,
                      aws_secret_access_key = AWSSecretKey )
Start an EC2 instance based on the basic Amazon Linux AMI.
res = ec2.run_instances( 'ami-146e2a7c',                 # "Amazon Linux AMI"
                         key_name = sshKeyName,
                         instance_type = 'r3.large',
                         security_groups = ['onlyMyIP'], # previously-configured security group
                         placement = 'us-east-1a' )      # availability zone
allInstances = res.instances
instance = allInstances[0]
Start two EBS volumes, one empty, the other based on the weather data (snapshot 'snap-ac47f4c5').
mysqlVolume = ec2.create_volume(size=100, zone='us-east-1a', snapshot=None, volume_type='gp2')
givenDataVolume = ec2.create_volume(size=20, zone='us-east-1a', snapshot='snap-ac47f4c5', volume_type='gp2')
Wait until the instance and volumes are ready.
time.sleep(10) # to make sure the instance is registered before checking if it is ready
waitUntilReady(instance,'running')
waitUntilReady(mysqlVolume,'available')
waitUntilReady(givenDataVolume,'available')
instance, mysqlVolume, givenDataVolume
......
(Instance:i-088a96f2, Volume:vol-c39b2d89, Volume:vol-a69a2cec)
Attach the volumes to the instance.
att1 = ec2.attach_volume(mysqlVolume.id, instance.id, '/dev/xvdc')
att2 = ec2.attach_volume(givenDataVolume.id, instance.id, '/dev/xvdb')
Wait until they are properly attached.
waitUntilReady(mysqlVolume,'in-use')
waitUntilReady(givenDataVolume,'in-use')
At this point, a manual step is required: connect to the instance over SSH and disable the 'Defaults requiretty' setting in /etc/sudoers (e.g. run 'sudo visudo' and comment out that line). This will let us use Paramiko to run sudo commands without allocating a TTY. Pain in the butt; this quirk of the Amazon Linux AMI is widely discussed online.
The following methods will be used hereafter for sending commands and transfering files to the EC2 instance via SSH.
def establishSSHconnection():
    rsaKey = pm.RSAKey.from_private_key_file(sshKeyName + '.pem')
    ssh = pm.SSHClient()
    ssh.set_missing_host_key_policy(pm.AutoAddPolicy())
    totalTime = 0
    connected = False
    while not connected:
        try:
            ssh.connect( str(instance.public_dns_name),
                         username = 'ec2-user',
                         pkey = rsaKey )
            connected = True
        except Exception as e:
            sys.stdout.write(".")
            time.sleep(1)
            totalTime += 1
            if totalTime >= 300:
                sys.stdout.write( str(instance.public_dns_name) + ': ' +
                                  "Timed out waiting for instance to get ssh ready\n" )
                break
    return ssh
def prettyOut(obj):
    l = [r.strip('\n').split('\t') for r in obj.readlines()]
    if len(l) > 1:
        df = pd.DataFrame(l[1:len(l)])
        df.columns = l[0]
    else:
        df = pd.DataFrame(l)
    return df
def sshCommandWait(cmd,pretty=False):
    ssh = establishSSHconnection()
    stdin, stdout, stderr = ssh.exec_command(cmd)
    if pretty:
        stdout = prettyOut(stdout)
        stderr = prettyOut(stderr)
    else:
        stdout = stdout.readlines()
        stderr = stderr.readlines()
    ssh.close()
    return stdout, stderr
def sshCommandNoWait(cmd):
    ssh = establishSSHconnection()
    ssh.exec_command(cmd)
    ssh.close()
def sftpSendFile(localFile,remoteFile):
    ssh = establishSSHconnection()
    ftp = ssh.open_sftp()
    ftp.put(localFile,remoteFile)
    ftp.close()
    ssh.close()
def sftpGetFile(remoteFile,localFile):
    ssh = establishSSHconnection()
    ftp = ssh.open_sftp()
    try:
        ftp.get(remoteFile,localFile)
        success = True
    except:
        success = False
    ftp.close()
    ssh.close()
    return success
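establishSSHconnection retries once per second for up to five minutes. The same retry-until-timeout pattern, factored into a generic helper (retryUntil is our name, shown only as a sketch):

```python
import sys, time

def retryUntil(fn, timeoutSeconds=300, waitSeconds=1):
    """Call fn() until it stops raising, returning its result.

    Mirrors the connect loop in establishSSHconnection: print a dot per
    failed attempt, give up and return None after timeoutSeconds."""
    waited = 0
    while True:
        try:
            return fn()
        except Exception:
            if waited >= timeoutSeconds:
                return None
            sys.stdout.write(".")
            time.sleep(waitSeconds)
            waited += waitSeconds
```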
A quick test of the SSH command method:
o,e = sshCommandWait('sudo echo hello')
o,e
([u'hello\n'], [])
View the currently mounted volumes:
o,e = sshCommandWait('df -h')
o
[u'Filesystem Size Used Avail Use% Mounted on\n', u'/dev/xvda1 7.8G 1.1G 6.6G 14% /\n', u'devtmpfs 7.5G 64K 7.5G 1% /dev\n', u'tmpfs 7.5G 0 7.5G 0% /dev/shm\n']
View all available volumes before starting the mounting process:
o,e = sshCommandWait('lsblk')
o
[u'NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT\n', u'xvda 202:0 0 8G 0 disk \n', u'\u2514\u2500xvda1 202:1 0 8G 0 part /\n', u'xvdb 202:16 0 20G 0 disk \n', u'xvdc 202:32 0 100G 0 disk \n']
Format the new volume (only the new one, not the given data one):
o,e = sshCommandWait('sudo mkfs -t ext4 /dev/xvdc')
Make directories on the EC2 instance and mount the volumes. The mounting of the given data volume will take a few minutes.
o,e = sshCommandWait('sudo mkdir /mnt/givenDataVolume')
o,e = sshCommandWait('sudo mkdir /mnt/mysqlVolume')
o,e = sshCommandWait('sudo mount /dev/xvdb /mnt/givenDataVolume')
o,e = sshCommandWait('sudo mount /dev/xvdc /mnt/mysqlVolume')
View list of mounted volumes again to be sure that the new volumes are mounted correctly.
o,e = sshCommandWait('df -h')
o
[u'Filesystem Size Used Avail Use% Mounted on\n', u'/dev/xvda1 7.8G 1.1G 6.6G 14% /\n', u'devtmpfs 7.5G 64K 7.5G 1% /dev\n', u'tmpfs 7.5G 0 7.5G 0% /dev/shm\n', u'/dev/xvdb 20G 17G 2.3G 88% /mnt/givenDataVolume\n', u'/dev/xvdc 99G 61M 94G 1% /mnt/mysqlVolume\n']
The Amazon Linux AMI ships with yum repositories that provide mysql and mysql-server - we just need to update the installed packages and run the installers.
o,e = sshCommandWait('sudo yum -y update')
o,e = sshCommandWait('sudo yum -y install mysql')
o,e = sshCommandWait('sudo mkdir /var/lib/mysql')
o,e = sshCommandWait('sudo yum -y install mysql-server')
To use the attached volume for the database instead of the default folder, set up the necessary folders and permissions, and then change the data directory and the tmp directory in the MySQL configuration file.
o,e = sshCommandWait('sudo mkdir -p /mnt/mysqlVolume/data')
o,e = sshCommandWait('sudo mkdir -p /mnt/mysqlVolume/tmp')
o,e = sshCommandWait('sudo chown -R mysql.mysql /mnt/mysqlVolume/data')
o,e = sshCommandWait('sudo chown -R mysql.mysql /mnt/mysqlVolume/tmp')
cmd = 'sudo sed -i '
cmd += '"s/datadir\=\/var\/lib\/mysql/datadir\=\/mnt\/mysqlVolume\/data\\ntmpdir\=\/mnt\/mysqlVolume\/tmp/g" '
cmd += '/etc/my.cnf'
o,e = sshCommandWait(cmd)
o,e
([], [])
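The sed substitution above is hard to read through all the escaping; the same edit, sketched in Python with re.sub for clarity (the sample my.cnf text below is illustrative only):

```python
import re

def repointDatadir(cnfText):
    """Replace the stock datadir line with the mounted volume's data
    directory and append a tmpdir line, mirroring the sed command above."""
    return re.sub(r'datadir=/var/lib/mysql',
                  'datadir=/mnt/mysqlVolume/data\ntmpdir=/mnt/mysqlVolume/tmp',
                  cnfText)

sample = "[mysqld]\ndatadir=/var/lib/mysql\nsocket=/var/lib/mysql/mysql.sock\n"
```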
Start MySQL
o,e = sshCommandWait('sudo /etc/init.d/mysqld start')
Set root password for MySQL, and configure it to run on startup:
o,e = sshCommandWait('mysqladmin -u root password "' + mysqlPass + '"')
o,e = sshCommandWait('sudo chkconfig mysqld on')
A quick test of the server setup:
o,e = sshCommandWait('mysql -e "SHOW DATABASES;" -u root --password=' + mysqlPass, True)
o
 | Database |
---|---|
0 | information_schema |
1 | mysql |
2 | performance_schema |
3 | test |
Add a database named 'weather' to MySQL:
o,e = sshCommandWait('mysql -e "CREATE DATABASE weather;" -u root --password=' + mysqlPass)
Create three tables in the database - 'country', 'station' and 'observation'. The first two will be small, the last one very large.
This table will hold the simple mapping between a FIPS ID and a country name, from the 'country-list.txt' file in the top-level folder of the given data volume.
sqlcommand = 'CREATE TABLE country '
sqlcommand += '( FIPS_ID VARCHAR(2) PRIMARY KEY, country_name VARCHAR(47) ) '
sqlcommand += ';'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather')
sqlcommand = 'DESCRIBE country;'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather', True)
o
 | Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|---|
0 | FIPS_ID | varchar(2) | NO | PRI | NULL | |
1 | country_name | varchar(47) | YES | | NULL | |
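These cells splice the SQL text and the password directly into a shell command, which works as long as neither contains shell metacharacters. A more defensive sketch (buildMysqlCommand is our name, not part of the notebook) quotes both with the standard library:

```python
try:
    from shlex import quote   # Python 3
except ImportError:
    from pipes import quote   # Python 2

def buildMysqlCommand(sqlcommand, mysqlPass, database='weather'):
    """Assemble the mysql -e invocation used throughout this notebook,
    shell-quoting the SQL and the password."""
    return ('mysql -e ' + quote(sqlcommand) +
            ' -u root --password=' + quote(mysqlPass) +
            ' ' + database)

# e.g. sshCommandWait(buildMysqlCommand('DESCRIBE station;', mysqlPass), True)
```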
This table will hold some metadata about the stations, including name, station ID, country ID, latitude, longitude and elevation. The data will come from the 'ish-history.csv' file in the top-level folder of the given data directory.
sqlcommand = 'CREATE TABLE station '
sqlcommand += '( USAF VARCHAR(6) PRIMARY KEY, WBAN VARCHAR(6), '
sqlcommand += ' station_name VARCHAR(30), CTRY VARCHAR(2), FIPS_ID VARCHAR(2), ST VARCHAR(2), callid VARCHAR(4), '
sqlcommand += ' lat INT, lon INT, elev INT ) '
sqlcommand += ';'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather')
sqlcommand = 'DESCRIBE station;'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather', True)
o
 | Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|---|
0 | USAF | varchar(6) | NO | PRI | NULL | |
1 | WBAN | varchar(6) | YES | | NULL | |
2 | station_name | varchar(30) | YES | | NULL | |
3 | CTRY | varchar(2) | YES | | NULL | |
4 | FIPS_ID | varchar(2) | YES | | NULL | |
5 | ST | varchar(2) | YES | | NULL | |
6 | callid | varchar(4) | YES | | NULL | |
7 | lat | int(11) | YES | | NULL | |
8 | lon | int(11) | YES | | NULL | |
9 | elev | int(11) | YES | | NULL | |
This table will hold the weather data for each station (and is by far the largest of the three tables). The data will come from the fixed-width text files held in a large number of folders on the data volume. Note that in many of the fields in this table, the values will be stored as integers representing tenths of the real value (e.g. a value of 320 in the temp column is used to represent 32.0 degrees F).
sqlcommand = 'CREATE TABLE observation '
sqlcommand += '( STNYMD VARCHAR(14) PRIMARY KEY, '
sqlcommand += ' STN VARCHAR(6), WBAN VARCHAR(6), year INT, month INT, day INT, '
sqlcommand += ' temp INT, dewp INT, slp INT, stp INT, visib INT, '
sqlcommand += ' wdsp INT, mxspd INT, gust INT, maxtemp INT, mintemp INT, '
sqlcommand += ' prcp INT, sndp INT, FRSHTT VARCHAR(6) ) '
sqlcommand += ';'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather')
sqlcommand = 'DESCRIBE observation;'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather', True)
o
 | Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|---|
0 | STNYMD | varchar(14) | NO | PRI | NULL | |
1 | STN | varchar(6) | YES | | NULL | |
2 | WBAN | varchar(6) | YES | | NULL | |
3 | year | int(11) | YES | | NULL | |
4 | month | int(11) | YES | | NULL | |
5 | day | int(11) | YES | | NULL | |
6 | temp | int(11) | YES | | NULL | |
7 | dewp | int(11) | YES | | NULL | |
8 | slp | int(11) | YES | | NULL | |
9 | stp | int(11) | YES | | NULL | |
10 | visib | int(11) | YES | | NULL | |
11 | wdsp | int(11) | YES | | NULL | |
12 | mxspd | int(11) | YES | | NULL | |
13 | gust | int(11) | YES | | NULL | |
14 | maxtemp | int(11) | YES | | NULL | |
15 | mintemp | int(11) | YES | | NULL | |
16 | prcp | int(11) | YES | | NULL | |
17 | sndp | int(11) | YES | | NULL | |
18 | FRSHTT | varchar(6) | YES | | NULL | |
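As described above, most numeric columns in 'observation' store tenths of the real value, and the source data uses sentinels like 9999 or 99999 for missing readings. A small decoding helper for query results (fromTenths is our name; the sentinel handling is an assumption about how missing data should be treated):

```python
def fromTenths(value, sentinels=(9999, 99999)):
    """Convert a stored integer in tenths back to a float.

    Returns None for missing-value sentinels; e.g. a stored temp of 320
    decodes to 32.0 degrees F, while 9999 decodes to None."""
    if value is None or value in sentinels:
        return None
    return value / 10.0
```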
Populating the 'country' table requires just a simple file read.
sqlcommand = 'LOAD DATA LOCAL INFILE \'/mnt/givenDataVolume/country-list.txt\' '
sqlcommand += 'INTO TABLE country '
sqlcommand += 'FIELDS TERMINATED BY \' \' '
sqlcommand += 'LINES TERMINATED BY \'\n\' '
sqlcommand += 'IGNORE 2 LINES '
sqlcommand += '; '
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather --local-infile')
sqlcommand = 'SELECT * FROM country LIMIT 5;'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather', True)
o
 | FIPS_ID | country_name |
---|---|---|
0 | AA | ARUBA |
1 | AC | ANTIGUA AND BARBUDA |
2 | AF | AFGHANISTAN |
3 | AG | ALGERIA |
4 | AI | ASCENSION ISLAND |
Populating the 'station' table also requires just a simple file read.
sqlcommand = 'LOAD DATA LOCAL INFILE \'/mnt/givenDataVolume/ish-history.csv\' '
sqlcommand += 'INTO TABLE station '
sqlcommand += 'FIELDS TERMINATED BY \',\' ENCLOSED BY \'\\"\' '
sqlcommand += 'LINES TERMINATED BY \'\n\' '
sqlcommand += 'IGNORE 1 LINES '
sqlcommand += '; '
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather --local-infile')
sqlcommand = 'SELECT * FROM station LIMIT 5;'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather', True)
o
 | USAF | WBAN | station_name | CTRY | FIPS_ID | ST | callid | lat | lon | elev |
---|---|---|---|---|---|---|---|---|---|---|
0 | 000000 | 99999 | NYGGBUKTA GREENLAND- STA | GL | GL | | | 73483 | 21567 | 30 |
1 | 000010 | 99999 | JAN HAYEN | NO | NO | | | 70983 | -7700 | 229 |
2 | 000020 | 99999 | ISFJORD RADIO SPITZBERGEN | NO | NO | | | 78067 | 13633 | 79 |
3 | 000030 | 99999 | BJORNOYA BARENTS SEA | NO | NO | | | 74467 | 19283 | 290 |
4 | 000040 | 99999 | VAROO | NO | NO | | | 70367 | 31100 | 119 |
Populating the 'observation' table is much more complicated and time-consuming. The data is stored in multiple files in multiple folders:
gsod/[year]/[filename_by_station_ID]
Some of these files have a .op extension and some have no extension at all. They are in fixed-width format, which LOAD DATA cannot parse directly, and they contain some extra text that we do not want. The following Python script therefore first converts each file into a CSV of the desired form, then loads it into the observation table.
import os, sys

mysqlPass = sys.argv[1]
given_path = "/mnt/givenDataVolume/gsod/"
mysql_path = "/mnt/mysqlVolume/data/"
fixed_widths = [[1,6],[8,12],[15,18],[19,20],[21,22],[25,30],[36,41],[47,52],[58,63],
                [69,73],[79,83],[89,93],[96,100],[103,108],[111,116],[119,123],[126,130],
                [133,138]]
with open('log.txt','w') as logfile:
    for yr in range(1929,2010):
        files = []
        for (dirpath, dirnames, filenames) in os.walk(given_path + str(yr) + '/'):
            files.extend(filenames)
            break
        for f in files:
            logfile.write('year # ' + str(yr) + ', filename ' + f + '\n')
            # read file as fixed width, write as csv
            with open(given_path + str(yr) + '/' + f,'r') as fi:
                with open(mysql_path + f.replace(".op","") + '.csv','w') as fo:
                    for li in fi:
                        if li[0:3] != 'STN':
                            # primary key = concat(stn,yr,mo,da)
                            fo.write(li[0:6] + li[14:22] + ',')
                            # remaining fields
                            for k, w in enumerate(fixed_widths):
                                if k < len(fixed_widths) - 1:
                                    if k > 4:
                                        fo.write( str(int(float(li[w[0]-1:w[1]])*10)) + ',')
                                    else:
                                        fo.write( li[w[0]-1:w[1]] + ',')
                                else:
                                    fo.write( li[w[0]-1:w[1]] + '\n')
            # load text file to mysql
            logfile.write("loading to mysql" + '\n')
            cmd = 'mysql -e '
            cmd += '"LOAD DATA LOCAL INFILE \'' + mysql_path + f.replace(".op","") + '.csv' + '\' '
            cmd += 'INTO TABLE observation FIELDS TERMINATED BY \',\' '
            cmd += 'LINES TERMINATED BY \'\n\' '
            cmd += ';" '
            cmd += '-u root --password=' + mysqlPass + ' weather --local-infile'
            os.system(cmd)
            # delete csv file
            os.system('sudo rm ' + mysql_path + f.replace(".op","") + '.csv')
with open('end.txt','w') as f:
    f.write('complete')
The script above should be saved to a file named 'populateDB.py' in the local working directory. The following command then sends it to the EC2 instance:
sftpSendFile('populateDB.py','populateDB.py')
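The core of populateDB.py slices each line by 1-based, inclusive [start, end] column pairs (the fixed_widths table). The mechanic can be illustrated with a synthetic spec and line - the widths and data below are made up for illustration and are not the real GSOD layout:

```python
def sliceFixedWidth(line, widths):
    """Cut a fixed-width line into fields, using the same 1-based
    inclusive [start, end] convention as fixed_widths in populateDB.py."""
    return [line[start - 1:end] for start, end in widths]

# synthetic example: three fields at columns 1-3, 5-8 and 10-13
widths = [[1, 3], [5, 8], [10, 13]]
line = "ABC 2009 78.2"
fields = sliceFixedWidth(line, widths)          # ['ABC', '2009', '78.2']
asTenths = str(int(float(fields[2]) * 10))      # '782', as stored in MySQL
```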
We can now run it on the remote machine. This process may take 3-3.5 hours; nohup and output redirection allow it to keep running independently of the SSH session.
sshCommandNoWait('sudo nohup python populateDB.py ' + mysqlPass + ' > proc.out 2> proc.err < /dev/null &')
fileReady = False
fillTime = 0
while not fileReady:
    fileReady = sftpGetFile('end.txt','test.txt')
    if not fileReady:
        time.sleep(60)
        fillTime += 1
        if fillTime > 100*60:
            break
print fillTime
203
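The polling loop above can be factored into a reusable helper; pollUntilTrue is our name, with sftpGetFile replaced by a generic check function:

```python
import time

def pollUntilTrue(check, intervalSeconds=60, maxPolls=100 * 60):
    """Call check() until it returns True, sleeping between attempts.

    Returns the number of sleeps taken, or None if maxPolls is exceeded
    first (mirroring the fillTime loop above)."""
    polls = 0
    while not check():
        time.sleep(intervalSeconds)
        polls += 1
        if polls > maxPolls:
            return None
    return polls

# usage sketch: pollUntilTrue(lambda: sftpGetFile('end.txt', 'test.txt'))
```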
A quick test to make sure that the populating script worked:
sqlcommand = 'SELECT * FROM observation LIMIT 5;'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather', True)
o
 | STNYMD | STN | WBAN | year | month | day | temp | dewp | slp | stp | visib | wdsp | mxspd | gust | maxtemp | mintemp | prcp | sndp | FRSHTT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00820920090309 | 008209 | 99999 | 2009 | 3 | 9 | 782 | 710 | 99999 | 99999 | 9999 | 13 | 19 | 9999 | 806 | 716 | 0 | 9999 | 000000 |
1 | 00820920090310 | 008209 | 99999 | 2009 | 3 | 10 | 678 | 656 | 99999 | 99999 | 9999 | 12 | 41 | 9999 | 806 | 554 | 0 | 9999 | 000000 |
2 | 00820920090311 | 008209 | 99999 | 2009 | 3 | 11 | 681 | 646 | 99999 | 99999 | 9999 | 13 | 29 | 9999 | 824 | 554 | 0 | 9999 | 000000 |
3 | 00820920090312 | 008209 | 99999 | 2009 | 3 | 12 | 666 | 639 | 99999 | 99999 | 9999 | 12 | 41 | 9999 | 806 | 572 | 0 | 9999 | 000000 |
4 | 00820920090313 | 008209 | 99999 | 2009 | 3 | 13 | 669 | 622 | 99999 | 99999 | 9999 | 13 | 41 | 9999 | 806 | 554 | 0 | 9999 | 000000 |
The following shows some general information about the table. Note that it holds roughly 110 million rows (the Rows value of 109,880,661 is InnoDB's estimate).
sqlcommand = "SHOW TABLE STATUS LIKE 'observation';"
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather')
o
[u'Name\tEngine\tVersion\tRow_format\tRows\tAvg_row_length\tData_length\tMax_data_length\tIndex_length\tData_free\tAuto_increment\tCreate_time\tUpdate_time\tCheck_time\tCollation\tChecksum\tCreate_options\tComment\n', u'observation\tInnoDB\t10\tCompact\t109880661\t145\t15998124032\t0\t0\t7340032\tNULL\t2015-03-24 02:27:57\tNULL\tNULL\tlatin1_swedish_ci\tNULL\t\t\n']
The following checks the disk usage of the database. Note that at this stage (before configuring the table indexes), the database requires about 16 GB of disk space.
o,e = sshCommandWait('df -h')
o
[u'Filesystem Size Used Avail Use% Mounted on\n', u'/dev/xvda1 7.8G 1.4G 6.4G 18% /\n', u'devtmpfs 7.5G 64K 7.5G 1% /dev\n', u'tmpfs 7.5G 0 7.5G 0% /dev/shm\n', u'/dev/xvdb 20G 17G 2.3G 88% /mnt/givenDataVolume\n', u'/dev/xvdc 99G 16G 79G 17% /mnt/mysqlVolume\n']
Since we are not using the given data volume any more (all of the data has been copied to the MySQL database), we can detach and delete it so that we are no longer being charged for it.
o,e = sshCommandWait('sudo umount -d /dev/xvdb')
givenDataVolume.detach()
waitUntilReady(givenDataVolume,'available')
givenDataVolume.delete()
........
True
To make queries reasonably fast (particularly against the large observation table), we need to add indexes.
sqlcommand = 'ALTER TABLE station '
sqlcommand += 'ADD INDEX (FIPS_ID), '
sqlcommand += 'ADD INDEX (lat), '
sqlcommand += 'ADD INDEX (lon) '
sqlcommand += ';'
cmd = 'sudo nohup mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather'
cmd += ' > proc2.out 2> proc2.err < /dev/null &'
sshCommandNoWait(cmd)
sqlcommand = 'DESCRIBE station;'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather', True)
o
 | Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|---|
0 | USAF | varchar(6) | NO | PRI | NULL | |
1 | WBAN | varchar(6) | YES | | NULL | |
2 | station_name | varchar(30) | YES | | NULL | |
3 | CTRY | varchar(2) | YES | | NULL | |
4 | FIPS_ID | varchar(2) | YES | MUL | NULL | |
5 | ST | varchar(2) | YES | | NULL | |
6 | callid | varchar(4) | YES | | NULL | |
7 | lat | int(11) | YES | MUL | NULL | |
8 | lon | int(11) | YES | MUL | NULL | |
9 | elev | int(11) | YES | | NULL | |
sqlcommand = 'ALTER TABLE observation '
sqlcommand += 'ADD INDEX (STN), '
sqlcommand += 'ADD INDEX (year), '
sqlcommand += 'ADD INDEX (month), '
sqlcommand += 'ADD INDEX (day), '
sqlcommand += 'ADD INDEX (temp), '
sqlcommand += 'ADD INDEX (maxtemp), '
sqlcommand += 'ADD INDEX (mintemp) '
sqlcommand += ';'
cmd = 'sudo time nohup mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather'
cmd += ' > proc2.out 2> proc2.err < /dev/null &'
sshCommandNoWait(cmd)
time.sleep(60*fillTime*2)
sqlcommand = 'DESCRIBE observation;'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass + ' weather', True)
o
 | Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|---|
0 | STNYMD | varchar(14) | NO | PRI | NULL | |
1 | STN | varchar(6) | YES | MUL | NULL | |
2 | WBAN | varchar(6) | YES | | NULL | |
3 | year | int(11) | YES | MUL | NULL | |
4 | month | int(11) | YES | MUL | NULL | |
5 | day | int(11) | YES | MUL | NULL | |
6 | temp | int(11) | YES | MUL | NULL | |
7 | dewp | int(11) | YES | | NULL | |
8 | slp | int(11) | YES | | NULL | |
9 | stp | int(11) | YES | | NULL | |
10 | visib | int(11) | YES | | NULL | |
11 | wdsp | int(11) | YES | | NULL | |
12 | mxspd | int(11) | YES | | NULL | |
13 | gust | int(11) | YES | | NULL | |
14 | maxtemp | int(11) | YES | MUL | NULL | |
15 | mintemp | int(11) | YES | MUL | NULL | |
16 | prcp | int(11) | YES | | NULL | |
17 | sndp | int(11) | YES | | NULL | |
18 | FRSHTT | varchar(6) | YES | | NULL | |
The following again checks the disk usage of the database. Note that adding the indexes has more than doubled the disk space requirement of the database.
o,e = sshCommandWait('df -h')
o
[u'Filesystem Size Used Avail Use% Mounted on\n', u'/dev/xvda1 7.8G 1.4G 6.4G 18% /\n', u'devtmpfs 7.5G 60K 7.5G 1% /dev\n', u'tmpfs 7.5G 0 7.5G 0% /dev/shm\n', u'/dev/xvdc 99G 34G 60G 37% /mnt/mysqlVolume\n']
The following change to the MySQL configuration file makes the database remotely accessible. A database restart is required after the change.
cmd = 'sudo sed -i "s/mysql\\.sock/mysql\\.sock\\nbind-address\\=0\\.0\\.0\\.0/g" /etc/my.cnf'
o,e = sshCommandWait(cmd)
o,e = sshCommandWait('sudo /etc/init.d/mysqld restart')
Set up a new MySQL user named 'weatherdbuser' and grant it privileges on the weather database when connecting from the user's IP address (myIP).
sqlcommand = 'GRANT ALL ON weather.* TO weatherdbuser@' + myIP + ' IDENTIFIED BY \'' + mysqlPass + '\';'
o,e = sshCommandWait('mysql -e "' + sqlcommand + '" -u root --password=' + mysqlPass)
Another (shorter) IPython notebook, weather_queries.ipynb, complements this one by running a variety of queries against the remote database. We just provide a quick example here.
import MySQLdb as mdb
print 'opening connection'
con = mdb.connect( host = instance.public_dns_name,
                   passwd = mysqlPass,
                   user = 'weatherdbuser',
                   db = 'weather' )
cur = con.cursor()
cur.execute("SELECT * FROM observation LIMIT 10;")
for i in range(cur.rowcount):
    row = cur.fetchone()
    print row
print 'closing connection'
con.close()
opening connection
('00820920090309', '008209', '99999', 2009L, 3L, 9L, 782L, 710L, 99999L, 99999L, 9999L, 13L, 19L, 9999L, 806L, 716L, 0L, 9999L, '000000')
('00820920090310', '008209', '99999', 2009L, 3L, 10L, 678L, 656L, 99999L, 99999L, 9999L, 12L, 41L, 9999L, 806L, 554L, 0L, 9999L, '000000')
('00820920090311', '008209', '99999', 2009L, 3L, 11L, 681L, 646L, 99999L, 99999L, 9999L, 13L, 29L, 9999L, 824L, 554L, 0L, 9999L, '000000')
('00820920090312', '008209', '99999', 2009L, 3L, 12L, 666L, 639L, 99999L, 99999L, 9999L, 12L, 41L, 9999L, 806L, 572L, 0L, 9999L, '000000')
('00820920090313', '008209', '99999', 2009L, 3L, 13L, 669L, 622L, 99999L, 99999L, 9999L, 13L, 41L, 9999L, 806L, 554L, 0L, 9999L, '000000')
('00820920090314', '008209', '99999', 2009L, 3L, 14L, 700L, 691L, 99999L, 99999L, 9999L, 11L, 60L, 9999L, 806L, 662L, 0L, 9999L, '000000')
('00820920090315', '008209', '99999', 2009L, 3L, 15L, 715L, 715L, 99999L, 99999L, 9999L, 15L, 29L, 9999L, 806L, 698L, 0L, 9999L, '000000')
('00820920090316', '008209', '99999', 2009L, 3L, 16L, 692L, 692L, 99999L, 99999L, 9999L, 15L, 80L, 9999L, 716L, 644L, 0L, 9999L, '000000')
('00820920090317', '008209', '99999', 2009L, 3L, 17L, 667L, 604L, 99999L, 99999L, 9999L, 26L, 70L, 9999L, 824L, 590L, 0L, 9999L, '000000')
('00820920090318', '008209', '99999', 2009L, 3L, 18L, 652L, 599L, 99999L, 99999L, 9999L, 18L, 60L, 9999L, 806L, 554L, 0L, 9999L, '000000')
closing connection