Here we see how much space can be saved when working with sparse data.
In the worst case, the database stores a value for every pair of proteins. Writing a file of that size is time-consuming, and even if it can be written, it may be impossible to index.
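To make the sparse case concrete, here is a minimal sketch, not the ocbio implementation: a plain dictionary holds only the pairs whose value differs from a default, and every other pair reads back as that default. The class name `SparsePairStore`, its methods, and the toy identifiers are hypothetical and used only for illustration.

```python
import itertools


class SparsePairStore(object):
    """Hypothetical illustration: keep only pairs that differ from a default."""

    def __init__(self, default=0):
        self.default = default
        self._values = {}

    @staticmethod
    def _key(a, b):
        # Order-independent key, so (a, b) and (b, a) address the same entry.
        return frozenset((a, b))

    def set(self, a, b, value):
        key = self._key(a, b)
        if value == self.default:
            # Default values are never stored; drop any existing entry.
            self._values.pop(key, None)
        else:
            self._values[key] = value

    def get(self, a, b):
        # Pairs that were never stored read back as the default.
        return self._values.get(self._key(a, b), self.default)

    def __len__(self):
        return len(self._values)


# Toy example with made-up identifiers (not real Entrez IDs).
toy_ids = ["A", "B", "C", "D"]
store = SparsePairStore(default=0)
store.set("A", "B", 1)
store.set("D", "C", 1)

total_pairs = len(list(itertools.combinations(toy_ids, 2)))
print("%d of %d pairs stored" % (len(store), total_pairs))
```

The saving depends entirely on how many pairs carry a non-default value: if most pairs do, this layout stores nearly as much as the dense one.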
cd ~/Documents/MRes/geneconversion/
/home/gavin/Documents/MRes/geneconversion
ls
gene2go gene_info human.Entrez.pickle testgen.pickle
import pickle
# open in binary mode; the with block closes the file automatically
with open("human.Entrez.pickle", "rb") as f:
    ids = pickle.load(f)
import itertools
import sys
sys.path.append("/home/gavin/Documents/MRes/opencast-bio/")
import ocbio.extract
cd /mnt/external/remotes/geneontology/
/mnt/external/remotes/geneontology
db = ocbio.extract.openpairshelf("test.db")
for p in itertools.combinations(ids,2):
    db[p] = 1
db.close()
print("something")
print(len(ids))
18217
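The size of the worst case follows directly from this count: n identifiers give n(n-1)/2 unordered pairs. The sketch below computes that number for n = 18217 and shows one way to estimate how long the full loop might take, by timing a small sample of writes and extrapolating linearly. A plain dict stands in for the pair shelf, so the timing is illustrative only; the helper names, the stand-in identifiers and the sample size are assumptions.

```python
import itertools
import time


def count_pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def time_sample_writes(store, ids, sample_size):
    """Write the first sample_size pairs into store; return (written, seconds)."""
    start = time.time()
    written = 0
    # combinations() is lazy, so islice stops after sample_size pairs
    # without generating the rest.
    for pair in itertools.islice(itertools.combinations(ids, 2), sample_size):
        store[pair] = 1
        written += 1
    return written, time.time() - start


n_ids = 18217  # len(ids) as printed above
n_pairs = count_pairs(n_ids)
print("pairs in the worst case: %d" % n_pairs)

# Stand-in identifiers and an in-memory dict (assumptions, not the real data
# or the real pair shelf); swap in ids and db to time the actual store.
fake_ids = range(1000)
written, elapsed = time_sample_writes({}, fake_ids, 10000)

# Linear extrapolation from the sample to the full set of pairs.
estimate = elapsed / written * n_pairs
print("estimated seconds for all pairs: %.1f" % estimate)
```

For 18217 identifiers this gives 165920436 pairs, which is why both writing and indexing the dense file are expensive.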