The previous version of this notebook attempted to run the CPAN module for InterologWalk, but there were problems installing it and getting it to run locally. It turned out that InterologWalk had already been run over a large set of proteins at Edinburgh and that the output file was available, which makes this task much easier.
First, we locate this file and look at its contents before loading it:
cd ../../InterologWalk/
/home/gavin/Documents/MRes/InterologWalk
ls
IW_entrez.csv
!head IW_entrez.csv
import csv
As was done with the STRING notebook, we will create an ocbio.ppipred.features
object to store the dictionary of interactions.
This can then be pickled and loaded when assembling feature vectors.
# Key each row by a frozenset so that the order of the two proteins does not matter
featuredict = {}
with open("IW_entrez.csv") as f:
    for line in csv.reader(f, delimiter="\t"):
        featuredict[frozenset(line)] = ['1']
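The choice of frozenset keys is what makes the lookup independent of the order in which a protein pair is given. The sketch below illustrates this on two made-up rows; the identifiers and the two-column, tab-separated layout are assumptions for illustration, not values taken from IW_entrez.csv.

```python
import csv

# Hypothetical rows standing in for lines of IW_entrez.csv
# (assumed layout: two tab-separated Entrez IDs per line).
sample_rows = ["1000\t2000", "3000\t4000"]

pairs = {}
for row in csv.reader(sample_rows, delimiter="\t"):
    # frozenset is hashable and unordered, so it can serve as a dict key
    pairs[frozenset(row)] = ['1']

forward = frozenset(["1000", "2000"])
reverse = frozenset(["2000", "1000"])

same_key = (forward == reverse)   # both orderings are the same key
found = reverse in pairs          # so a reversed pair is still found
```

One consequence worth keeping in mind is that a row naming the same protein twice would collapse to a one-element key.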
import sys
sys.path.append("../opencast-bio/")
import ocbio.ppipred
features = ocbio.ppipred.features(featuredict,1)
Testing with arbitrary keys:
realkey = next(iter(featuredict))  # any stored key; also works where keys() is not indexable
fakekey = frozenset(["1234","4321"])
features[realkey]
['1']
features[fakekey]
['0']
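The outputs above show the behaviour we rely on: a stored pair returns its value, and an unknown pair returns ['0']. As a rough illustration only, the same behaviour can be mimicked with a small wrapper around dict.get. The class below is a hypothetical stand-in and not the ocbio.ppipred.features implementation; in particular, treating the second constructor argument as the length of the default vector is an assumption.

```python
class DefaultFeatures(object):
    """Hypothetical stand-in for illustration, not ocbio.ppipred.features."""

    def __init__(self, featuredict, length):
        self.featuredict = featuredict
        # Assumption: the second argument sets the length of the default vector
        self.default = ['0'] * length

    def __getitem__(self, key):
        # Return the stored vector, or the default for pairs that were never seen
        return self.featuredict.get(key, self.default)


# Made-up identifiers, used only to exercise the class
example = DefaultFeatures({frozenset(["1000", "2000"]): ['1']}, 1)
known = example[frozenset(["2000", "1000"])]
unknown = example[frozenset(["1234", "4321"])]
```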
Finally, we pickle this instance so that it can be accessed by the assembler to create feature vectors:
import pickle
# The with block closes the file even if pickling raises an error
with open("human.interologwalk.features.pickle", "wb") as f:
    pickle.dump(features, f)
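For reference, reading a pickle back mirrors writing it. The sketch below round-trips a plain dictionary through a temporary file so that it runs without ocbio installed; the file name and identifiers are made up. Loading the real human.interologwalk.features.pickle works the same way, but pickle stores a class by its module path, so ocbio.ppipred must be importable in the session that loads it.

```python
import os
import pickle
import tempfile

# Made-up table standing in for the features object
table = {frozenset(["1000", "2000"]): ['1']}

# Write to a temporary directory rather than the working directory
path = os.path.join(tempfile.mkdtemp(), "example.features.pickle")
with open(path, "wb") as handle:
    pickle.dump(table, handle)

# Binary mode is needed for reading as well as writing
with open(path, "rb") as handle:
    restored = pickle.load(handle)
```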