Edges weighted by the combined score from the STRING database will be useful in two ways. They give a baseline to compare against our own method, and they let us test the community detection analysis before the edge weights from our method are ready. There are two ways to obtain these weights: query the online STRING service, or reuse the pickled STRING features we have already built.
Unfortunately, the online service returns a table that does not include the Entrez IDs submitted as input, so its output would have to be mapped back to Entrez IDs before our pipeline could use it. The fastest route is to use the pickled object created in the notebook above to generate features and keep only the combined score:
cd ../../features
/home/gavin/Documents/MRes/features
import csv
ls
abundance.Entrez.full.txt@
abundance.Entrez.traintest.txt@
autogit.log
c2s.Entrez.full.txt@
c2s.Entrez.traintest.txt@
head.training.nolabel.negative.Entrez.vectors.txt@
pulldown.edges.Entrez.txt@
pulldown.nolabel.Entrez.vectors.txt@
training.nolabel.negative.Entrez.vectors.txt
training.nolabel.positive.Entrez.vectors.txt
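import sys
import pickle
# make the opencast-bio package importable
sys.path.append("/home/gavin/Documents/MRes/opencast-bio/")
# ocbio.string must be importable so pickle can rebuild any of its objects stored in the file
import ocbio.string
# load the STRING features, keyed by pairs of Entrez IDs; pickle files are opened in binary mode
with open("../string/human.Entrez.string.pickle", "rb") as f:
    stringfeatures = pickle.load(f)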
# read Entrez ID pairs from the pulldown data and write out those with a
# non-zero STRING combined score as weighted edges
with open("../forGAVIN/pulldown_data/pulldown.interactions.Entrez.tsv") as pulldownpairfile, \
        open("pulldown.string.edges.tsv", "w") as stringedgefile:
    cp = csv.reader(pulldownpairfile, delimiter="\t")
    cs = csv.writer(stringedgefile, delimiter="\t")
    for l in cp:
        # interactions are undirected, so index the feature dictionary
        # with an unordered pair (frozenset) of the two Entrez IDs
        pair = frozenset(l)
        # the combined score is the last element of the feature vector
        combinedscore = float(stringfeatures[pair][-1])
        # only write pairs whose combined score is non-zero
        if combinedscore > 1e-7:
            cs.writerow(l + [combinedscore])
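As a sketch, the filtering step can be pulled out into a function that works on any iterable of rows, which makes it easy to check on toy data. It assumes `stringfeatures` behaves like a plain dict keyed by `frozenset` pairs with the combined score last (the real object comes from `ocbio.string` and may differ). Skipping pairs that are absent from STRING, rather than raising a `KeyError`, is a design choice that the original loop does not make. The helper name `string_edges` and the IDs and scores below are hypothetical toy values, and the snippet is written for Python 3.

```python
import csv
import io


def string_edges(rows, features, threshold=1e-7):
    """Yield [id_a, id_b, score] for pairs whose combined score exceeds threshold.

    Hypothetical helper: assumes `features` is dict-like, keyed by
    frozenset({id_a, id_b}), with the combined score as the last element.
    """
    for row in rows:
        pair = frozenset(row)
        if pair not in features:
            # pair absent from STRING: skip it instead of raising KeyError
            continue
        score = float(features[pair][-1])
        if score > threshold:
            yield row + [score]


# toy feature table; key order does not matter because keys are frozensets
toy_features = {
    frozenset(["100", "200"]): ["0.1", "0.0", "0.912"],
    frozenset(["300", "400"]): ["0.0", "0.0", "0.0"],
}
# note "200" comes before "100" here, yet the lookup still succeeds
toy_input = io.StringIO("200\t100\n300\t400\n500\t600\n")
rows = csv.reader(toy_input, delimiter="\t")
edges = list(string_edges(rows, toy_features))
print(edges)  # [['200', '100', 0.912]]
```

The zero-score pair is dropped by the threshold and the pair unknown to STRING is skipped, leaving a single weighted edge.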
!head pulldown.string.edges.tsv
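The file written above is a plain three-column TSV: two Entrez IDs and a combined score. As a sketch of how it might be fed into the community detection step, the function below reads such a file into an undirected weighted adjacency dictionary using only the standard library. The name `load_weighted_edges` is hypothetical, and the snippet is written for Python 3. If networkx is available, `networkx.read_weighted_edgelist(path, delimiter="\t")` builds an equivalent graph.

```python
import csv
import io
from collections import defaultdict


def load_weighted_edges(handle):
    """Read tab-separated (id_a, id_b, weight) rows into an undirected adjacency dict.

    Hypothetical helper: afterwards adjacency[a][b] == adjacency[b][a] == weight.
    """
    adjacency = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        id_a, id_b, weight = row
        w = float(weight)
        adjacency[id_a][id_b] = w
        adjacency[id_b][id_a] = w
    return adjacency


# in-memory stand-in for pulldown.string.edges.tsv (toy values)
toy_file = io.StringIO("100\t200\t0.912\n200\t300\t0.45\n")
graph = load_weighted_edges(toy_file)
print(graph["200"])  # {'100': 0.912, '300': 0.45}
```

On the real data, the same function would be called with a file handle, e.g. `with open("pulldown.string.edges.tsv") as f: graph = load_weighted_edges(f)`.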