Pairinglist¶

Some tools or processes generate output, usually in some form of CSV output which holds a line entry for a given type of object, and any number of associated properties, all of the same type. This can be viewed as a compressed form of a table describing a many to many relationship, sometimes called a junction table or as I prefer in the context of this example, a pairing table.

Example¶

Consider a list of people and the operating systems they work with. I have included the trailing empty entries that can occur in csv files.

In [1]:

example = [
    ['Tom', 'Windows', 'Linux', ''],
    ['Dick', 'Windows', 'OSX', 'AIX'],
    ['Harry', 'Linux', 'BSD', '']
    ]

2-dimensional table¶

Although this might originally be stored somehwere as a pairing table already, in a database somewhere, if we do not control the source, or whatever mechanism generates the output, how do we extract the underlying structure? Let's say we want to end up with a representation like this:

	Windows	Linux	OSX	BSD	AIX
Tom	X	X
Dick	X		X		X
Harry		X		X

Because the columns in the original data are not aligned we need to extract the original pairing table to enable a pivot transformation.

In [2]:

def pairinglist(csvlist):
    # clear out the empty strings from csv conversion
    nonemptylist = map(lambda line: filter(None, line), csvlist)
    
    # generate all combinations as sublists of tuples
    pairings = map(lambda line: [(line[0], pairing) for pairing in line[1:]], nonemptylist)

    # flatten the sublists
    return [pairing for line in pairings for pairing in line]

In [3]:

for entry in example:
    print entry

['Tom', 'Windows', 'Linux', '']
['Dick', 'Windows', 'OSX', 'AIX']
['Harry', 'Linux', 'BSD', '']

In [4]:

for entry in pairinglist(example):
    print entry

('Tom', 'Windows')
('Tom', 'Linux')
('Dick', 'Windows')
('Dick', 'OSX')
('Dick', 'AIX')
('Harry', 'Linux')
('Harry', 'BSD')

Visualizations¶

The list of tuples generated by the function can be fed to a csvwriter, or some other tool. I tend to use Excel pivottables for this kind of data, but some python visualization could also be used.