Salmonella E-burst groups

This IPython notebook (ipynb) is intended to convey the genetic diversity (or homogeneity!) of different Salmonella serotypes, as we move from phenotypic typing methods to genotypic ones.

It is populated with 14 of the UK's most frequently observed Salmonella serotypes. Each serotype has an associated Python data structure (details below) that contains its e-burst groups (EBGs), and each EBG lists the MLST sequence types (STs) that make it up. For more details see:

http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1002776

For people familiar with Python syntax, the dictionary takes the format:

{serotype: {ebg: [sts], ebg: [sts]}, serotype: {ebg: [sts], ...}, ...}
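As a minimal sketch of that layout, the snippet below uses just two serotypes copied from the full dictionary in the next cell. The name `example` is only for illustration and is not part of the notebook. EBG numbers are integer keys, and STs are stored as strings because some of them carry labels such as 'SLV16'. It uses print() calls so that it should also run under Python 3.

```python
# Two serotypes copied from the full dictionary, to show the nesting.
# `example` is an illustrative name, not part of the notebook.
example = {
    'braenderup': {24: ['194', '311', '21', '22']},
    'virchow': {
        9: ['618', '755', '303', 'SLV16', 'TLV16', '38', '16', '181',
            '326', '648', '359'],
        70: ['197', '333'],
    },
}

# Outer key: serotype name -> dictionary of EBGs
virchow_ebgs = example['virchow']

# Inner key: EBG number (an int) -> list of STs (strings)
ebg70_sts = example['virchow'][70]

print(sorted(virchow_ebgs))  # EBG numbers recorded for virchow
print(ebg70_sts)             # STs that make up EBG 70
```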

In [46]:
sero_ebg_st = {'parab-java': {32: ['681', '423', '42', '733', '734'], 19: ['88', '372', '127'], 59: ['28'], 155: ['404', '679'], 5: ['896', '772', '570', '307', '267', '110', '265', '264', '149', '86', '43', '325', '266']}, 'enteritidis': {32: ['74'], 4: ['745', '310', '616', '460', '168', '136', '11', '691', '640', '11SLV', '183', '366', '1479', '814'], 93: ['180', '172']}, 'typhimurium': {1: ['376', '429', '205', '204', '323', '19', '302', '394', '35', '34', '99', '98', '159', '137', '456', '209', '128', '313', '332', '328', '213'], 26: ['15'], 138: ['SLV36', '36'], 54: ['13']}, 'virchow': {9: ['618', '755', '303', 'SLV16', 'TLV16', '38', '16', '181', '326', '648', '359'], 70: ['197', '333']}, 'braenderup': {24: ['194', '311', '21', '22']}, 'kentucky': {56: ['727', '198'], 164: ['832', 'SLV314', '314'], 15: ['151', '152', '318', '723', '221']}, 'agona': {26: ['15'], 83: ['463'], 54: ['13', '37', '1328', '1215']}, 'newport': {3: ['201', '1496', '157', '211', '46', '158', '45', '355', '193', '121', '131', '116', '353', '125', '184', '350', '614', '165'], 2: ['199', '190', '115', '117', 'SLV118', '119', '118', '345', '347', '5', '187', '189', '120', '122', '123', '164', '167', '223', '163', '354', '352', '351', '375'], 35: ['166', '360', '156'], 154: ['808', '807'], 7: ['132', '346', '200', '348', '31', '191', '188', '349']}, 'oranienburg': {41: ['47', '1523', '1522', '23'], 203: ['1538', '1392', '1513', '1512'], 44: ['1553', '1516', '320', '169', '1515', '174'], 45: ['292'], 50: ['91', '1514'], 52: ['179']}, 'infantis': {31: ['295', '32', '1032', '41', 'SLV32']}, 'typhi': {13: ['1', '890', '3', '2', '8', '911', 'SLV2', '892']}, 'paratyphi-a': {11: ['130', '129', '495', '494', 'SLV85', '479', '85']}, 'stanley': {29: ['1027', 'DLV29', '51', '29', '182', 'SLV51']}, 'montevideo': {208: ['749', '1491'], 4: ['195', '305', '316', '1493', '1489', '1488', '81', '699', '1537', '1536', '1535', '4', '1531'], 39: ['748', '138', '1518']}}

The code in the section below reports how many E-burst groups there are in each serotype.

In [47]:
print 'serotype\tnumber of EBGs\n'
for serotype in sero_ebg_st:
    print serotype, '\t', len(sero_ebg_st[serotype])
serotype	number of EBGs

parab-java 	5
enteritidis 	3
typhimurium 	4
braenderup 	1
infantis 	1
stanley 	1
oranienburg 	6
virchow 	2
kentucky 	3
agona 	3
typhi 	1
newport 	5
paratyphi-a 	1
montevideo 	3
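The order of the lines above is arbitrary, because a plain dictionary in the Python 2 used by this notebook does not keep any particular order. If a stable order is wanted, one option is to sort by the number of EBGs. The following is a minimal sketch on a three-serotype subset copied from the full dictionary; the names `subset` and `counts` are assumptions made for the example.

```python
# Subset copied from the full dictionary so the sketch runs on its own.
subset = {
    'braenderup': {24: ['194', '311', '21', '22']},
    'kentucky': {56: ['727', '198'],
                 164: ['832', 'SLV314', '314'],
                 15: ['151', '152', '318', '723', '221']},
    'oranienburg': {41: ['47', '1523', '1522', '23'],
                    203: ['1538', '1392', '1513', '1512'],
                    44: ['1553', '1516', '320', '169', '1515', '174'],
                    45: ['292'],
                    50: ['91', '1514'],
                    52: ['179']},
}

# Build (number of EBGs, serotype) pairs and sort them, largest first.
counts = sorted(
    ((len(ebgs), serotype) for serotype, ebgs in subset.items()),
    reverse=True,
)

for n_ebgs, serotype in counts:
    print('{0}\t{1}'.format(serotype, n_ebgs))
```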

The code below prints the number of each EBG in each serotype, together with how many STs that EBG contains.

In [48]:
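print 'serotype\tNum of EBGs\n'
for serotype in sero_ebg_st:
    # serotype name and its number of EBGs
    print serotype, '\t', len(sero_ebg_st[serotype]), '\t'
    print 'EBG : Num STs'
    for ebg in sero_ebg_st[serotype]:
        # the trailing comma keeps all EBGs of one serotype on the same line
        print '\t', ebg, ':', len(sero_ebg_st[serotype][ebg]), '\t\t',
    # finish the line of EBG counts
    print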
serotype	Num of EBGs

parab-java 	5 	
EBG : Num STs
	32 : 5 			59 : 1 			19 : 3 			155 : 2 			5 : 13 		
enteritidis 	3 	
EBG : Num STs
	32 : 1 			4 : 14 			93 : 2 		
typhimurium 	4 	
EBG : Num STs
	1 : 21 			26 : 1 			138 : 2 			54 : 1 		
braenderup 	1 	
EBG : Num STs
	24 : 4 		
infantis 	1 	
EBG : Num STs
	31 : 5 		
stanley 	1 	
EBG : Num STs
	29 : 6 		
oranienburg 	6 	
EBG : Num STs
	41 : 4 			203 : 4 			44 : 6 			45 : 1 			50 : 2 			52 : 1 		
virchow 	2 	
EBG : Num STs
	9 : 11 			70 : 2 		
kentucky 	3 	
EBG : Num STs
	56 : 2 			164 : 3 			15 : 5 		
agona 	3 	
EBG : Num STs
	26 : 1 			83 : 1 			54 : 4 		
typhi 	1 	
EBG : Num STs
	13 : 8 		
newport 	5 	
EBG : Num STs
	35 : 3 			2 : 23 			3 : 18 			154 : 2 			7 : 8 		
paratyphi-a 	1 	
EBG : Num STs
	11 : 7 		
montevideo 	3 	
EBG : Num STs
	208 : 2 			4 : 13 			39 : 3 		

Conclusion

Admittedly, this does nothing that a simple blog post couldn't do, but I think it would be neat to let people interact with the notebook, e.g. to enter an ST and find out which EBG it belongs to. I think I will save that for another day though.
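In the meantime, a minimal sketch of such an ST lookup might look like the following. The function name `find_ebgs` and the `subset` dictionary are assumptions made for the example; the subset is copied from the full dictionary so the code runs on its own. It returns a list rather than a single EBG because the same ST can be listed under more than one serotype (ST 15 in EBG 26, for instance, appears under both typhimurium and agona above).

```python
def find_ebgs(st, sero_ebg_st):
    """Return sorted (serotype, EBG) pairs whose ST list contains `st`."""
    st = str(st)  # STs are stored as strings in the dictionary
    hits = []
    for serotype, ebgs in sero_ebg_st.items():
        for ebg, sts in ebgs.items():
            # exact match only, so '32' does not match 'SLV32'
            if st in sts:
                hits.append((serotype, ebg))
    return sorted(hits)


# Small subset copied from the full dictionary (hypothetical variable name).
subset = {
    'agona': {26: ['15'], 83: ['463'], 54: ['13', '37', '1328', '1215']},
    'infantis': {31: ['295', '32', '1032', '41', 'SLV32']},
    'typhi': {13: ['1', '890', '3', '2', '8', '911', 'SLV2', '892']},
}

print(find_ebgs(32, subset))      # an int is accepted and converted
print(find_ebgs('463', subset))
print(find_ebgs('9999', subset))  # unknown ST gives an empty list
```

Passing the full `sero_ebg_st` dictionary instead of `subset` should work in the same way.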

I also think it is quite cool for any budding Pythonistas, since it shows off the advantages of using nested dictionaries. You can download the dictionary and the code and experiment with dictionaries within dictionaries. They're great!
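As one possible experiment, here is a sketch that turns the nesting inside out to find EBG numbers recorded under more than one serotype. It uses a three-serotype subset copied from the dictionary above; the names `subset`, `ebg_to_serotypes` and `shared` are assumptions made for the example.

```python
from collections import defaultdict

# Subset copied from the full dictionary so the sketch runs on its own.
subset = {
    'typhimurium': {1: ['376', '429', '205', '204', '323', '19', '302',
                        '394', '35', '34', '99', '98', '159', '137', '456',
                        '209', '128', '313', '332', '328', '213'],
                    26: ['15'],
                    138: ['SLV36', '36'],
                    54: ['13']},
    'agona': {26: ['15'], 83: ['463'], 54: ['13', '37', '1328', '1215']},
    'kentucky': {56: ['727', '198'],
                 164: ['832', 'SLV314', '314'],
                 15: ['151', '152', '318', '723', '221']},
}

# Invert the outer level: EBG number -> serotypes that list it.
ebg_to_serotypes = defaultdict(list)
for serotype, ebgs in subset.items():
    for ebg in ebgs:
        ebg_to_serotypes[ebg].append(serotype)

# Keep only EBGs that appear under at least two serotypes.
shared = {ebg: sorted(names)
          for ebg, names in ebg_to_serotypes.items()
          if len(names) > 1}

print(shared)
```

On this subset, EBGs 26 and 54 come out as shared between agona and typhimurium, which matches what the full dictionary records.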