The current ENCODE-DCC portal lacks an obvious way to do batch downloads of data. The pieces to support it are present, but it isn't clear how to put them together.
This is my attempt to illustrate how one might go about doing it without too many additional dependencies.
I have my own way of querying data from the DCC that is more flexible but requires significantly more software to be installed.
For example, a report I wrote to show our current submission status uses SPARQL to piece together multiple objects. My attempt to explain how the SPARQL solution works is here. (Neither downloads data directly, but both show how you can extract metadata from the DCC portal.)
I developed this under Python 2; however, the following import should make it more compatible with Python 3.
from __future__ import print_function
The purpose of this is to illustrate how one might try to do batch downloading from the new ENCODE-DCC portal.
import requests
import json
import os
The first step is to find what we want to batch download. Probably the simplest path is to work directly with the website to build the url containing the list of search strings and filters.
Once you have the URL from the portal, copy and paste it into your script and append &format=json to the end.
For example, I drilled down using the facets on the portal and ended up with this in my browser's URL bar.
https://www.encodedcc.org/search/?type=experiment&organ_slims=brain&assay_term_name=DNase-seq
I copied it into the requests.get() call with the &format=json appended at the end.
Also it can be useful to add &format=json directly to the DCC web interface as you can get additional information that their user interface doesn't render. (You just might want a browser or browser add-on that tries to pretty print the json).
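Appending the parameter by hand is easy to forget. Here is a minimal sketch of a helper that adds format=json to a URL copied from the browser, using only the standard library. The name `ensure_json_format` is my own invention, not part of the portal or of requests, and the sketch assumes Python 3 (`urllib.parse`).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def ensure_json_format(url):
    """Return url with exactly one format=json query parameter.

    Hypothetical helper: any existing 'format' parameter is dropped
    and format=json is appended at the end of the query string.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any format=... the URL already carries, then add ours.
    query = [(key, value) for key, value in query if key != 'format']
    query.append(('format', 'json'))
    return urlunsplit(parts._replace(query=urlencode(query)))


search_url = ensure_json_format(
    'https://www.encodedcc.org/search/'
    '?type=experiment&organ_slims=brain&assay_term_name=DNase-seq')
print(search_url)
```

The result can be passed straight to requests.get(). Note that `urlencode` re-encodes the query, so a value containing a space would come back with a `+` in its place.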
req = requests.get('https://www.encodedcc.org/search/?type=experiment&organ_slims=brain&assay_term_name=DNase-seq&format=json')
if req.status_code != 200:
    print("Oh no, not successful: {}".format(req.status_code))
else:
    print("Success!")
Success!
Once the search request succeeds, we need to convert the JSON returned by the DCC portal into a Python dictionary.
result = json.loads(req.content)
The search response contains some extra properties as it's primarily designed for the DCC Web UI to display results. But we can still take advantage of it to build a list of candidate things to download.
result.keys()
[u'title', u'notification', u'@graph', u'@type', u'filters', u'total', u'@id', u'facets', u'columns']
Out of curiosity, what's in some of the other attributes...
result['title']
u'Search'
That wasn't very interesting.
This one shows more promise, showing how many objects we found.
result['total']
25
result['facets']
returns a lot of summary information about what was matched, but it was long enough that I didn't want to clutter up this example with it.
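If you do want to peek at a long property like this without flooding the screen, one option is to pretty-print it and cut the output off after a few lines. This is only a sketch: `preview` is a name I made up, and the sample data below is a stand-in, since the real structure of result['facets'] may well differ.

```python
import json


def preview(obj, max_lines=20):
    """Pretty-print a JSON-like object, truncated to max_lines lines."""
    lines = json.dumps(obj, indent=2, sort_keys=True).splitlines()
    shown = lines[:max_lines]
    if len(lines) > max_lines:
        # Say how much was cut so the truncation is not mistaken for the end.
        shown.append('... ({} more lines)'.format(len(lines) - max_lines))
    return '\n'.join(shown)


# Made-up stand-in for result['facets']; not real portal output.
example = [{'field': 'assay_term_name',
            'terms': [{'key': 'DNase-seq', 'count': 25}]}]
print(preview(example, max_lines=5))
```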
The list of objects we searched for is returned in the '@graph' property of the search result. The search result only includes a small amount of information about each object; to get the rest we need to download each object individually. To do that we need the object's @id.
So this fragment generates a list of experiment ids.
experiments = [x['@id'] for x in result['@graph']]
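A slightly more defensive version of the same step might compare the number of ids collected against the 'total' the search reports. I have not checked whether the portal ever returns fewer objects than 'total', so treat this as a precaution rather than a known problem. `experiment_ids` is a hypothetical helper, and the dictionary below is a hand-written stand-in for a real search result.

```python
def experiment_ids(result):
    """Collect the @id of every object in a search result.

    Prints a warning if the count disagrees with result['total'].
    """
    graph = result.get('@graph', [])
    ids = [x['@id'] for x in graph if '@id' in x]
    total = result.get('total')
    if total is not None and total != len(ids):
        print("Warning: search reports {} matches but returned {}".format(
            total, len(ids)))
    return ids


# Hand-written stand-in for a search result, not real portal output.
mock_result = {
    'total': 1,
    '@graph': [{'@id': '/experiments/ENCSR000EIY/'}],
}
print(experiment_ids(mock_result))
```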
Before batch downloading everything, it would be helpful to see what information is actually available about an experiment.
experiments[0]
u'/experiments/ENCSR000EIY/'
response = requests.get('https://www.encodedcc.org/experiments/ENCSR000EIY/?format=json')
if response.status_code == 200:
    encsr000eiy = json.loads(response.content)
    print("Success")
Success
Let's see what fields are available for an experiment. (And let's also sort them for ease of finding things.)
sorted(encsr000eiy.keys())
[u'@id', u'@type', u'accession', u'aliases', u'alternate_accessions', u'assay_term_id', u'assay_term_name', u'assembly', u'award', u'biosample_term_id', u'biosample_term_name', u'biosample_type', u'dataset_type', u'date_created', u'date_released', u'dbxrefs', u'description', u'developmental_slims', u'documents', u'files', u'hub', u'lab', u'organ_slims', u'original_files', u'possible_controls', u'references', u'related_files', u'replicates', u'schema_version', u'status', u'submitted_by', u'synonyms', u'system_slims', u'uuid', u'visualize_ucsc']
Since we're trying to download data, the files property looks promising.
file1 = encsr000eiy['files'][0]
file1.keys()
[u'status', u'submitted_by', u'assembly', u'uuid', u'file_format', u'@type', u'md5sum', u'accession', u'schema_version', u'dataset', u'download_path', u'alternate_accessions', u'date_created', u'output_type', u'@id', u'submitted_file_name']
Now that we've seen some of the available data, we can get some of the links.
Unfortunately requests doesn't have a requests.download function, so we need to implement one.
def DownloadFile(url, local_filename):
    """Download a file, streaming it to disk in chunks.
    Based on
    https://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
    """
    r = requests.get(url, stream=True)
    # The with block closes the file even if the download fails part way.
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
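The file objects shown earlier carry an 'md5sum' property, which suggests a way to check a download once it finishes. The sketch below assumes that value is the hexadecimal MD5 digest of the whole file, which I have not confirmed against the portal. `md5_of_file` and `matches_md5` are names I made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=512 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as stream:
        # iter() with a sentinel keeps reading until read() returns b''.
        for chunk in iter(lambda: stream.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_md5):
    """True if the file's digest equals expected_md5 (case-insensitive).

    Assumption: expected_md5 is a hex digest such as a file object's
    'md5sum' value.
    """
    return md5_of_file(path) == expected_md5.lower()
```

After a download you could call something like matches_md5(local_filename, f['md5sum']) and re-fetch the file on a mismatch; that would mean passing the file object through alongside the link.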
def get_links(experiment, terms=None):
    """Given an experiment object and a list of terms, yield
    tuples of (list of term values, download link), one per file.
    e.g. get_links(experiment, ['assembly', 'biosample_term_name']) ->
    (['hg19', 'frontal cortex', 'ENCFF000SJO.bigWig'], 'http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SJO.bigWig')
    It will always include the base filename as the last term.
    """
    DOWNLOAD_ROOT = 'http://encodedcc.sdsc.edu/warehouse/'
    if terms is None:
        terms = []
    for f in experiment['files']:
        record = []
        for t in terms:
            record.append(experiment[t])
        path, filename = os.path.split(f['download_path'])
        record.append(filename)
        # yield makes this a "generator": it hands back one result per file
        # and picks up here again when the next one is asked for.
        yield (record, DOWNLOAD_ROOT + f['download_path'])
Now that we have our helper functions we can download all of the files associated with the experiments we selected.
This iterates over all of the experiment ids, requests the full experiment object from the DCC, and passes it to get_links, which finds all the files and annotates them with the metadata we're interested in.
The commented out part does the actual downloading.
for object_id in experiments:
    # object_id already starts with '/', e.g. '/experiments/ENCSR000EIY/'
    response = requests.get('https://www.encodedcc.org{}?format=json'.format(object_id))
    experiment = json.loads(response.content)
    for attributes, file_url in get_links(experiment, ['assembly', 'biosample_term_name']):
        local_filename = '-'.join(attributes)
        print(file_url, local_filename)
        # Uncomment this if you actually want to download.
        #DownloadFile(file_url, local_filename)
http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SJO.bigWig hg19-frontal cortex-ENCFF000SJO.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SJP.bam hg19-frontal cortex-ENCFF000SJP.bam http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SJQ.narrowPeak.bigBed hg19-frontal cortex-ENCFF000SJQ.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SJR.bam hg19-frontal cortex-ENCFF000SJR.bam http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SJS.bigWig hg19-frontal cortex-ENCFF000SJS.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SKA.fastq.gz hg19-frontal cortex-ENCFF000SKA.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SKE.fastq.gz hg19-frontal cortex-ENCFF000SKE.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001UUX.narrowPeak.gz hg19-frontal cortex-ENCFF001UUX.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFQ.bam hg19-frontal cortex-ENCFF000SFQ.bam http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFR.bigWig hg19-frontal cortex-ENCFF000SFR.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFT.narrowPeak.bigBed hg19-frontal cortex-ENCFF000SFT.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFU.bam hg19-frontal cortex-ENCFF000SFU.bam http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFV.bigWig hg19-frontal cortex-ENCFF000SFV.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SGJ.fastq.gz hg19-frontal cortex-ENCFF000SGJ.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SGL.fastq.gz hg19-frontal cortex-ENCFF000SGL.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001UUJ.narrowPeak.gz hg19-frontal cortex-ENCFF001UUJ.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFK.bigWig hg19-cerebellum-ENCFF000SFK.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFL.bam hg19-cerebellum-ENCFF000SFL.bam http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFM.narrowPeak.bigBed 
hg19-cerebellum-ENCFF000SFM.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFN.bam hg19-cerebellum-ENCFF000SFN.bam http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFO.bigWig hg19-cerebellum-ENCFF000SFO.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFW.fastq.gz hg19-cerebellum-ENCFF000SFW.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SFX.fastq.gz hg19-cerebellum-ENCFF000SFX.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001UUI.narrowPeak.gz hg19-cerebellum-ENCFF001UUI.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEO.bam mm9-brain-ENCFF001QEO.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEP.bam mm9-brain-ENCFF001QEP.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEQ.broadPeak.bigBed mm9-brain-ENCFF001QEQ.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QER.broadPeak.bigBed mm9-brain-ENCFF001QER.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QES.narrowPeak.bigBed mm9-brain-ENCFF001QES.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QET.narrowPeak.bigBed mm9-brain-ENCFF001QET.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEU.fastq.gz mm9-brain-ENCFF001QEU.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEX.bigWig mm9-brain-ENCFF001QEX.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEY.bigWig mm9-brain-ENCFF001QEY.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QFB.fastq.gz mm9-brain-ENCFF001QFB.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YVE.broadPeak.gz mm9-brain-ENCFF001YVE.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YVF.broadPeak.gz mm9-brain-ENCFF001YVF.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YVG.narrowPeak.gz mm9-brain-ENCFF001YVG.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YVH.narrowPeak.gz 
mm9-brain-ENCFF001YVH.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OZH.fastq.gz mm9-brain-ENCFF001OZH.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OZI.fastq.gz mm9-brain-ENCFF001OZI.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEF.bam mm9-brain-ENCFF001QEF.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEH.broadPeak.bigBed mm9-brain-ENCFF001QEH.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEJ.narrowPeak.bigBed mm9-brain-ENCFF001QEJ.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEK.fastq.gz mm9-brain-ENCFF001QEK.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEN.bigWig mm9-brain-ENCFF001QEN.bigWig http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YVB.broadPeak.gz mm9-brain-ENCFF001YVB.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YVD.narrowPeak.gz mm9-brain-ENCFF001YVD.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OZF.broadPeak.bigBed mm9-brain-ENCFF001OZF.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OZG.narrowPeak.bigBed mm9-brain-ENCFF001OZG.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PAM.bigWig mm9-brain-ENCFF001PAM.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PAO.bigWig mm9-brain-ENCFF001PAO.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PCC.bam mm9-brain-ENCFF001PCC.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEE.bam mm9-brain-ENCFF001QEE.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEG.broadPeak.bigBed mm9-brain-ENCFF001QEG.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEI.narrowPeak.bigBed mm9-brain-ENCFF001QEI.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEL.fastq.gz mm9-brain-ENCFF001QEL.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEM.bigWig mm9-brain-ENCFF001QEM.bigWig 
http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YMC.broadPeak.gz mm9-brain-ENCFF001YMC.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YMD.narrowPeak.gz mm9-brain-ENCFF001YMD.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YVA.broadPeak.gz mm9-brain-ENCFF001YVA.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YVC.narrowPeak.gz mm9-brain-ENCFF001YVC.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OYT.fastq.gz mm9-brain-ENCFF001OYT.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OYU.fastq.gz mm9-brain-ENCFF001OYU.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QCU.bam mm9-brain-ENCFF001QCU.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QCV.bam mm9-brain-ENCFF001QCV.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QCW.bam mm9-brain-ENCFF001QCW.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QCY.broadPeak.bigBed mm9-brain-ENCFF001QCY.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QCZ.broadPeak.bigBed mm9-brain-ENCFF001QCZ.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDA.bam mm9-brain-ENCFF001QDA.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDB.broadPeak.bigBed mm9-brain-ENCFF001QDB.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDC.broadPeak.bigBed mm9-brain-ENCFF001QDC.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDD.broadPeak.bigBed mm9-brain-ENCFF001QDD.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDE.broadPeak.bigBed mm9-brain-ENCFF001QDE.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDF.bam mm9-brain-ENCFF001QDF.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDH.narrowPeak.bigBed mm9-brain-ENCFF001QDH.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDI.narrowPeak.bigBed mm9-brain-ENCFF001QDI.narrowPeak.bigBed 
http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDJ.narrowPeak.bigBed mm9-brain-ENCFF001QDJ.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDK.narrowPeak.bigBed mm9-brain-ENCFF001QDK.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDL.narrowPeak.bigBed mm9-brain-ENCFF001QDL.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDM.bam mm9-brain-ENCFF001QDM.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDN.narrowPeak.bigBed mm9-brain-ENCFF001QDN.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDP.fastq.gz mm9-brain-ENCFF001QDP.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDQ.fastq.gz mm9-brain-ENCFF001QDQ.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDU.fastq.gz mm9-brain-ENCFF001QDU.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDV.bigWig mm9-brain-ENCFF001QDV.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDW.fastq.gz mm9-brain-ENCFF001QDW.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDX.bigWig mm9-brain-ENCFF001QDX.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDY.bigWig mm9-brain-ENCFF001QDY.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDZ.fastq.gz mm9-brain-ENCFF001QDZ.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEA.bigWig mm9-brain-ENCFF001QEA.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEB.bigWig mm9-brain-ENCFF001QEB.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QEC.fastq.gz mm9-brain-ENCFF001QEC.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QED.bigWig mm9-brain-ENCFF001QED.bigWig http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUN.broadPeak.gz mm9-brain-ENCFF001YUN.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUO.broadPeak.gz mm9-brain-ENCFF001YUO.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUP.broadPeak.gz 
mm9-brain-ENCFF001YUP.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUQ.broadPeak.gz mm9-brain-ENCFF001YUQ.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUR.broadPeak.gz mm9-brain-ENCFF001YUR.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUS.broadPeak.gz mm9-brain-ENCFF001YUS.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUU.narrowPeak.gz mm9-brain-ENCFF001YUU.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUV.narrowPeak.gz mm9-brain-ENCFF001YUV.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUW.narrowPeak.gz mm9-brain-ENCFF001YUW.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUX.narrowPeak.gz mm9-brain-ENCFF001YUX.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUY.narrowPeak.gz mm9-brain-ENCFF001YUY.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUZ.narrowPeak.gz mm9-brain-ENCFF001YUZ.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OYR.broadPeak.bigBed mm9-brain-ENCFF001OYR.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OYS.narrowPeak.bigBed mm9-brain-ENCFF001OYS.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OZD.bigWig mm9-brain-ENCFF001OZD.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001OZE.bigWig mm9-brain-ENCFF001OZE.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PAP.bam mm9-brain-ENCFF001PAP.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QCT.bam mm9-brain-ENCFF001QCT.bam http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QCX.broadPeak.bigBed mm9-brain-ENCFF001QCX.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDG.narrowPeak.bigBed mm9-brain-ENCFF001QDG.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDO.fastq.gz mm9-brain-ENCFF001QDO.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/19/ENCFF001QDT.bigWig 
mm9-brain-ENCFF001QDT.bigWig http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YMA.broadPeak.gz mm9-brain-ENCFF001YMA.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YMB.narrowPeak.gz mm9-brain-ENCFF001YMB.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUM.broadPeak.gz mm9-brain-ENCFF001YUM.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YUT.narrowPeak.gz mm9-brain-ENCFF001YUT.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFO.bam mm9-telencephalon-ENCFF001PFO.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFP.bam mm9-telencephalon-ENCFF001PFP.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFR.broadPeak.bigBed mm9-telencephalon-ENCFF001PFR.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFS.broadPeak.bigBed mm9-telencephalon-ENCFF001PFS.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFT.broadPeak.bigBed mm9-telencephalon-ENCFF001PFT.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFU.narrowPeak.bigBed mm9-telencephalon-ENCFF001PFU.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFV.narrowPeak.bigBed mm9-telencephalon-ENCFF001PFV.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFW.narrowPeak.bigBed mm9-telencephalon-ENCFF001PFW.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFX.bam mm9-telencephalon-ENCFF001PFX.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFY.bigWig mm9-telencephalon-ENCFF001PFY.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFZ.bigWig mm9-telencephalon-ENCFF001PFZ.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PGA.bigWig mm9-telencephalon-ENCFF001PGA.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PGB.fastq.gz mm9-telencephalon-ENCFF001PGB.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PGC.fastq.gz 
mm9-telencephalon-ENCFF001PGC.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PGF.fastq.gz mm9-telencephalon-ENCFF001PGF.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNM.broadPeak.gz mm9-telencephalon-ENCFF001YNM.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNN.broadPeak.gz mm9-telencephalon-ENCFF001YNN.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNO.broadPeak.gz mm9-telencephalon-ENCFF001YNO.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNP.narrowPeak.gz mm9-telencephalon-ENCFF001YNP.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNQ.narrowPeak.gz mm9-telencephalon-ENCFF001YNQ.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNR.narrowPeak.gz mm9-telencephalon-ENCFF001YNR.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PEY.broadPeak.bigBed mm9-cerebellum-ENCFF001PEY.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PEZ.broadPeak.bigBed mm9-cerebellum-ENCFF001PEZ.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFA.narrowPeak.bigBed mm9-cerebellum-ENCFF001PFA.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFB.narrowPeak.bigBed mm9-cerebellum-ENCFF001PFB.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFC.bam mm9-cerebellum-ENCFF001PFC.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFD.fastq.gz mm9-cerebellum-ENCFF001PFD.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFG.bigWig mm9-cerebellum-ENCFF001PFG.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFH.bam mm9-cerebellum-ENCFF001PFH.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFI.bigWig mm9-cerebellum-ENCFF001PFI.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFJ.bam mm9-cerebellum-ENCFF001PFJ.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFK.broadPeak.bigBed 
mm9-cerebellum-ENCFF001PFK.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFL.narrowPeak.bigBed mm9-cerebellum-ENCFF001PFL.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFM.bigWig mm9-cerebellum-ENCFF001PFM.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFN.fastq.gz mm9-cerebellum-ENCFF001PFN.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001PFQ.fastq.gz mm9-cerebellum-ENCFF001PFQ.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNG.broadPeak.gz mm9-cerebellum-ENCFF001YNG.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNH.broadPeak.gz mm9-cerebellum-ENCFF001YNH.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNI.narrowPeak.gz mm9-cerebellum-ENCFF001YNI.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNJ.narrowPeak.gz mm9-cerebellum-ENCFF001YNJ.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNK.broadPeak.gz mm9-cerebellum-ENCFF001YNK.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001YNL.narrowPeak.gz mm9-cerebellum-ENCFF001YNL.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/15/ENCFF000AFF.narrowPeak.bigBed hg19-choroid plexus epithelial cell-ENCFF000AFF.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001BBL.broadPeak.bigBed hg19-choroid plexus epithelial cell-ENCFF001BBL.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001BBM.narrowPeak.bigBed hg19-choroid plexus epithelial cell-ENCFF001BBM.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001BBN.bigWig hg19-choroid plexus epithelial cell-ENCFF001BBN.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001BBP.bigWig hg19-choroid plexus epithelial cell-ENCFF001BBP.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001BDE.bam hg19-choroid plexus epithelial cell-ENCFF001BDE.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001BGI.fastq.gz hg19-choroid 
plexus epithelial cell-ENCFF001BGI.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBK.broadPeak.bigBed hg19-choroid plexus epithelial cell-ENCFF001DBK.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBL.bam hg19-choroid plexus epithelial cell-ENCFF001DBL.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBN.narrowPeak.bigBed hg19-choroid plexus epithelial cell-ENCFF001DBN.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBP.bigWig hg19-choroid plexus epithelial cell-ENCFF001DBP.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBZ.fastq.gz hg19-choroid plexus epithelial cell-ENCFF001DBZ.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001SQT.narrowPeak.gz hg19-choroid plexus epithelial cell-ENCFF001SQT.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001VZR.broadPeak.gz hg19-choroid plexus epithelial cell-ENCFF001VZR.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001VZS.narrowPeak.gz hg19-choroid plexus epithelial cell-ENCFF001VZS.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WIE.broadPeak.gz hg19-choroid plexus epithelial cell-ENCFF001WIE.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WIG.narrowPeak.gz hg19-choroid plexus epithelial cell-ENCFF001WIG.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBI.bam hg19-choroid plexus epithelial cell-ENCFF001DBI.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBJ.broadPeak.bigBed hg19-choroid plexus epithelial cell-ENCFF001DBJ.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBM.narrowPeak.bigBed hg19-choroid plexus epithelial cell-ENCFF001DBM.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBO.bigWig hg19-choroid plexus epithelial cell-ENCFF001DBO.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001DBY.fastq.gz hg19-choroid plexus epithelial cell-ENCFF001DBY.fastq.gz 
http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WID.broadPeak.gz hg19-choroid plexus epithelial cell-ENCFF001WID.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WIF.narrowPeak.gz hg19-choroid plexus epithelial cell-ENCFF001WIF.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/15/ENCFF000AEW.narrowPeak.bigBed hg19-astrocyte of the cerebellum-ENCFF000AEW.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CWT.bam hg19-astrocyte of the cerebellum-ENCFF001CWT.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CWU.broadPeak.bigBed hg19-astrocyte of the cerebellum-ENCFF001CWU.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CWV.broadPeak.bigBed hg19-astrocyte of the cerebellum-ENCFF001CWV.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CWW.narrowPeak.bigBed hg19-astrocyte of the cerebellum-ENCFF001CWW.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CWX.narrowPeak.bigBed hg19-astrocyte of the cerebellum-ENCFF001CWX.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CWY.bam hg19-astrocyte of the cerebellum-ENCFF001CWY.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CWZ.bigWig hg19-astrocyte of the cerebellum-ENCFF001CWZ.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXA.bigWig hg19-astrocyte of the cerebellum-ENCFF001CXA.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXD.fastq.gz hg19-astrocyte of the cerebellum-ENCFF001CXD.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXM.fastq.gz hg19-astrocyte of the cerebellum-ENCFF001CXM.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001SQK.narrowPeak.gz hg19-astrocyte of the cerebellum-ENCFF001SQK.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WGN.broadPeak.gz hg19-astrocyte of the cerebellum-ENCFF001WGN.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WGO.broadPeak.gz 
hg19-astrocyte of the cerebellum-ENCFF001WGO.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WGP.narrowPeak.gz hg19-astrocyte of the cerebellum-ENCFF001WGP.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WGQ.narrowPeak.gz hg19-astrocyte of the cerebellum-ENCFF001WGQ.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/15/ENCFF000AEY.narrowPeak.bigBed hg19-astrocyte of the hippocampus-ENCFF000AEY.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001AVJ.broadPeak.bigBed hg19-astrocyte of the hippocampus-ENCFF001AVJ.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001AVK.narrowPeak.bigBed hg19-astrocyte of the hippocampus-ENCFF001AVK.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001AVL.bigWig hg19-astrocyte of the hippocampus-ENCFF001AVL.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001AVN.fastq.gz hg19-astrocyte of the hippocampus-ENCFF001AVN.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001AVO.fastq.gz hg19-astrocyte of the hippocampus-ENCFF001AVO.fastq.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001AWT.bigWig hg19-astrocyte of the hippocampus-ENCFF001AWT.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001AXX.bam hg19-astrocyte of the hippocampus-ENCFF001AXX.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXP.bam hg19-astrocyte of the hippocampus-ENCFF001CXP.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXQ.broadPeak.bigBed hg19-astrocyte of the hippocampus-ENCFF001CXQ.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXS.narrowPeak.bigBed hg19-astrocyte of the hippocampus-ENCFF001CXS.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXU.bigWig hg19-astrocyte of the hippocampus-ENCFF001CXU.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CYE.fastq.gz hg19-astrocyte of the hippocampus-ENCFF001CYE.fastq.gz 
http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001SQM.narrowPeak.gz hg19-astrocyte of the hippocampus-ENCFF001SQM.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001VZH.broadPeak.gz hg19-astrocyte of the hippocampus-ENCFF001VZH.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001VZI.narrowPeak.gz hg19-astrocyte of the hippocampus-ENCFF001VZI.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WGW.broadPeak.gz hg19-astrocyte of the hippocampus-ENCFF001WGW.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WGY.narrowPeak.gz hg19-astrocyte of the hippocampus-ENCFF001WGY.narrowPeak.gz http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXN.bam hg19-astrocyte of the hippocampus-ENCFF001CXN.bam http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXO.broadPeak.bigBed hg19-astrocyte of the hippocampus-ENCFF001CXO.broadPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXR.narrowPeak.bigBed hg19-astrocyte of the hippocampus-ENCFF001CXR.narrowPeak.bigBed http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CXT.bigWig hg19-astrocyte of the hippocampus-ENCFF001CXT.bigWig http://encodedcc.sdsc.edu/warehouse/2013/4/18/ENCFF001CYF.fastq.gz hg19-astrocyte of the hippocampus-ENCFF001CYF.fastq.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WGV.broadPeak.gz hg19-astrocyte of the hippocampus-ENCFF001WGV.broadPeak.gz http://encodedcc.sdsc.edu/warehouse/2014/1/27/ENCFF001WGX.narrowPeak.gz hg19-astrocyte of the hippocampus-ENCFF001WGX.narrowPeak.gz
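Some of the names printed above contain spaces (for example hg19-frontal cortex-ENCFF000SJO.bigWig), which can be awkward on the command line. One possible way to tidy them up, as a sketch only: `safe_filename` is a hypothetical helper, and replacing every run of other characters with an underscore is just one convention among many.

```python
import re


def safe_filename(attributes, separator='-'):
    """Join attributes into a filename without spaces.

    Within each attribute, every run of characters other than letters,
    digits, '.' and '_' becomes a single underscore, so the separator
    only ever appears between attributes.
    """
    cleaned = [re.sub(r'[^A-Za-z0-9._]+', '_', str(a)).strip('_')
               for a in attributes]
    return separator.join(cleaned)


print(safe_filename(['hg19', 'frontal cortex', 'ENCFF000SJO.bigWig']))
# prints: hg19-frontal_cortex-ENCFF000SJO.bigWig
```

Because the accession is still part of the name, the cleaned names stay unique as long as the originals were.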
I did test running the downloader while it was uncommented, but as I'm at an airport the bandwidth isn't really up to downloading a bunch of bioinformatics data.
Another improvement might be to filter the list of files so that only specific file types are downloaded, for instance just bam files or just fastqs. Something like this:
AnExample = True
if not AnExample:
    if attributes[-1].endswith('.bam'):
        local_file = '-'.join(attributes)
        DownloadFile(file_url, local_file)
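To accept more than one file type, the same test can take a tuple of suffixes, since str.endswith() accepts a tuple. This is a small sketch; `wanted` is a made-up name and the default suffixes are only an example. The filenames are taken from the listing above.

```python
def wanted(filename, suffixes=('.bam', '.fastq.gz')):
    """True if filename ends with any of the given suffixes."""
    return filename.endswith(tuple(suffixes))


names = [
    'hg19-frontal cortex-ENCFF000SJO.bigWig',
    'hg19-frontal cortex-ENCFF000SJP.bam',
    'hg19-frontal cortex-ENCFF000SKA.fastq.gz',
    'hg19-frontal cortex-ENCFF001UUX.narrowPeak.gz',
]
selected = [name for name in names if wanted(name)]
print(selected)
```

Matching on '.fastq.gz' rather than '.gz' matters here, because the peak files are gzipped too.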
Another possibility would be to return information from both the experiment and the file objects; the current example doesn't do very much with the information available in the file object.
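A rough sketch of what that could look like: a variant of the link generator that takes a second list of terms to look up on each file object. The function name `get_annotated_links` and its parameters are hypothetical. The 'file_format' key does appear in the file object listing shown earlier, but the values in the mock experiment below are guesses for illustration, not real portal data.

```python
import os


def get_annotated_links(experiment, experiment_terms=None, file_terms=None,
                        download_root='http://encodedcc.sdsc.edu/warehouse/'):
    """Yield (record, url) for each file of an experiment.

    record holds the experiment-level term values, then the file-level
    term values, then the base filename.
    """
    for f in experiment['files']:
        record = [experiment[t] for t in (experiment_terms or [])]
        record.extend(f[t] for t in (file_terms or []))
        record.append(os.path.basename(f['download_path']))
        yield record, download_root + f['download_path']


# Mock experiment; the 'file_format' value is assumed, not real output.
mock_experiment = {
    'assembly': 'hg19',
    'biosample_term_name': 'frontal cortex',
    'files': [{'download_path': '2013/4/17/ENCFF000SJO.bigWig',
               'file_format': 'bigWig'}],
}
for record, url in get_annotated_links(
        mock_experiment, ['assembly', 'biosample_term_name'], ['file_format']):
    print(url, '-'.join(record))
```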
from __future__ import print_function
import requests
import json
import os
def dcc_download(query_url, dry_run=True):
    # need to implement check to make sure &format=json is in the query string.
    response = requests.get(query_url)
    if response.status_code != 200:
        print("Oh no, not successful: {}".format(response.status_code))
        return
    result = json.loads(response.content)
    experiments = [x['@id'] for x in result['@graph']]
    for object_id in experiments:
        # object_id already starts with '/', e.g. '/experiments/ENCSR000EIY/'
        response = requests.get('https://www.encodedcc.org{}?format=json'.format(object_id))
        experiment = json.loads(response.content)
        for attributes, file_url in get_file_links(experiment, ['assembly', 'biosample_term_name']):
            local_filename = '-'.join(attributes)
            print(file_url, local_filename)
            if not dry_run:
                DownloadFile(file_url, local_filename)
def get_file_links(experiment, terms=None):
    """Given an experiment object and a list of terms, yield
    tuples of (list of term values, download link), one per file.
    e.g. get_file_links(experiment, ['assembly', 'biosample_term_name']) ->
    (['hg19', 'frontal cortex', 'ENCFF000SJO.bigWig'], 'http://encodedcc.sdsc.edu/warehouse/2013/4/17/ENCFF000SJO.bigWig')
    It will always include the base filename as the last term.
    """
    DOWNLOAD_ROOT = 'http://encodedcc.sdsc.edu/warehouse/'
    if terms is None:
        terms = []
    for f in experiment['files']:
        record = []
        for t in terms:
            record.append(experiment[t])
        path, filename = os.path.split(f['download_path'])
        record.append(filename)
        # yield makes this a "generator": it hands back one result per file
        # and picks up here again when the next one is asked for.
        yield (record, DOWNLOAD_ROOT + f['download_path'])
def DownloadFile(url, local_filename):
    """Download a file, streaming it to disk in chunks.
    Based on
    https://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
    """
    r = requests.get(url, stream=True)
    # The with block closes the file even if the download fails part way.
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)