Programmatically Access Materials Project Electrolyte Genome Data

Donny Winston and Xiaohui Qu
Created: November 18, 2015
Last Update: April 19, 2018

This notebook documents URL patterns to access Electrolyte Genome data and provides examples of access using the Python requests library.

If you have questions, please contact the Materials Project team. Contact information is available at https://materialsproject.org.

URL patterns

There is one way to query for molecules that match search criteria (results), and there are three ways to obtain data for an individual molecule: in full with metadata (json), or just the structure, either for display (svg) or for analysis (xyz). Below are the four corresponding URL patterns.

In [1]:

urlpattern = {
    "results": "https://materialsproject.org/molecules/results?query={spec}",
    "mol_json": "https://materialsproject.org/molecules/{mol_id}/json",
    "mol_svg": "https://materialsproject.org/molecules/{mol_id}/svg",
    "mol_xyz": "https://materialsproject.org/molecules/{mol_id}/xyz",
}
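As a quick check of how these templates are filled in, the sketch below formats two of the per-molecule patterns with the ID mol-64060 (an ID that appears in the output later in this notebook). It only builds strings; no request is sent. The "results" pattern is left out here because its {spec} value must be percent-encoded first.

```python
# Per-molecule URL templates, copied from the cell above.
urlpattern = {
    "mol_json": "https://materialsproject.org/molecules/{mol_id}/json",
    "mol_svg": "https://materialsproject.org/molecules/{mol_id}/svg",
    "mol_xyz": "https://materialsproject.org/molecules/{mol_id}/xyz",
}

# Fill the {mol_id} placeholder with str.format.
json_url = urlpattern["mol_json"].format(mol_id="mol-64060")
xyz_url = urlpattern["mol_xyz"].format(mol_id="mol-64060")

print(json_url)
print(xyz_url)
```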

Setup

In [2]:

import json
import os
import sys
if sys.version_info[0] == 2:
    from urllib import quote_plus
else:
    from urllib.parse import quote_plus

import requests

In [ ]:

# Ensure you have an API key, which is located on your dashboard
# (https://materialsproject.org/dashboard).

MAPI_KEY = "fAkEaP1K4y" # <-- replace with your api key

# Please do NOT share a notebook with others with your API key hard-coded in it.
# One alternative: Load API key from a set environment variable, e.g.
#
# MAPI_KEY = os.environ['PMG_MAPI_KEY']
#
# Best alternative: Store and load API key using pymatgen, e.g.
### Do once, on command line (without "!" in front) or in notebook
# !pmg config --add PMG_MAPI_KEY "your_api_key_goes_here"
### Then, in notebook/script:
# from pymatgen import SETTINGS
# MAPI_KEY = SETTINGS.get("PMG_MAPI_KEY")

Getting a set of molecules

In [4]:

# Here is a function we'll use to get results. We'll walk through some examples that use it.

def get_results(spec, fields=None):
    """Take a specification document (a `dict`), and return a list of matching molecules.

    `fields` is accepted but currently unused.
    """
    # Serialize `spec` as JSON (which uses double quotes) and percent-encode it...
    # `json.dumps` also handles booleans, None, and apostrophes inside strings.
    str_spec = quote_plus(json.dumps(spec))
    # ...because the spec is the value of a "query" key in the final URL.
    url = urlpattern["results"].format(spec=str_spec)
    return (requests.get(url, headers={'X-API-KEY': MAPI_KEY})).json()
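To see what the encoding step produces, here is a minimal sketch that builds the query URL without sending it. The helper name `build_results_url` is hypothetical, and serializing with `json.dumps` assumes the server expects JSON in the "query" value, as the double-quote comment above suggests. The import is the Python 3 one.

```python
import json
from urllib.parse import quote_plus, unquote_plus

RESULTS_URL = "https://materialsproject.org/molecules/results?query={spec}"

def build_results_url(spec):
    """Hypothetical helper: return the results URL for a spec dict."""
    # json.dumps emits double-quoted strings; quote_plus percent-encodes
    # the braces, quotes, and colon, and turns spaces into "+".
    return RESULTS_URL.format(spec=quote_plus(json.dumps(spec)))

url = build_results_url({"elements": "O-P"})
print(url)

# Decoding the query value recovers the original spec.
decoded = json.loads(unquote_plus(url.split("query=", 1)[1]))
print(decoded)
```

Printing the URL before sending it is a cheap way to debug a spec that returns nothing.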

In [5]:

# Find molecules containing oxygen and phosphorus,
# and collect the ionization energies (relative to a lithium electrode) of the results.

# Separate elements with a "-"
spec = {"elements": "O-P"}

results = get_results(spec)

# Not all molecules have data for all available properties
ionization_energies = [molecule["IE"] for molecule in results if "IE" in molecule]
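Because some molecules lack an "IE" value, any summary has to skip them. The sketch below shows one way to do that. The function name `summarize` is hypothetical, and the sample records use made-up placeholder IDs and values rather than real Electrolyte Genome data.

```python
def summarize(results, key="IE"):
    """Return count/min/max/mean of `key` over records that have it, else None."""
    values = [m[key] for m in results if m.get(key) is not None]
    if not values:
        return None
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

# Placeholder records (NOT real data): the second one has no "IE" key.
sample_results = [
    {"task_id": "mol-1", "IE": 5.0},
    {"task_id": "mol-2"},
    {"task_id": "mol-3", "IE": 7.0},
]

summary = summarize(sample_results)
print(summary)
```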

In [6]:

# Molecules with ionization energies ("IE") will have oxidation potentials relative to metallic electrodes,
# available as "oxidation_<ELECTRODE>" keys. "IE" itself is relative to lithium.
# There is an analogous relationship between the presence of electron affinity ("EA") values
# and corresponding "reduction_<ELECTRODE>" keys for reduction potentials using a reference metal.

# `task_id` is the molecule's identifier, which we'll use later in this notebook.

# `MW` is molecular weight
# `smiles`: https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system

for key in results[0]:
    print(key)

smiles
oxidation_magnesium
task_id
oxidation_lithium
svg
oxidation_hydrogen
EA
point_group
reduction_hydrogen
MW
charge
reduction_magnesium
formula
IE
reduction_lithium

In [7]:

# A "silly" example specification that demonstrates many keys available to query, and
# the expected format of their value specifications.
#
# The "$"-prefixed keys are MongoDB syntax (https://docs.mongodb.org/manual/reference/operator/query/).

spec = {
    "elements": "C-H-O-F",
    "notelements": ["Al", "Br"], # a list (inconsistent for now with "elements" -- sorry)
    "charge": {"$in": [0, -1]}, # each listed value is one of 0, 1, -1
    "pointgroup": "C1",
    "functional_groups": {"$in": ["-COOH"]},
    "base_molecule": {"$in": ["s3"]},
    "nelements": 4,
    "EA": {"$gte": 0.4}, # >= 0.4
    "IE": {"$lt":  5}, # < 5
    "formula": "H11 C11 O4 F1", # "H11C11O4F" works too
}

results = get_results(spec)
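For readers unfamiliar with the "$"-prefixed operators, the sketch below mimics on the client side what `$in`, `$gte`, and `$lt` mean. It is an illustration of the operator semantics only, not the server's implementation: the function `matches` is hypothetical, it covers just these three operators plus plain equality, and it does not model how keys such as "elements" or "formula" are interpreted. The sample records are made-up placeholders.

```python
def matches(record, spec):
    """Return True if `record` satisfies every condition in `spec`."""
    ops = {
        "$in": lambda value, arg: value in arg,
        "$gte": lambda value, arg: value >= arg,
        "$lt": lambda value, arg: value < arg,
    }
    for key, condition in spec.items():
        if key not in record:
            return False
        value = record[key]
        if isinstance(condition, dict):
            # Every operator in the condition must hold.
            if not all(ops[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # A bare value means plain equality.
            return False
    return True

# Placeholder records (NOT real data).
records = [
    {"task_id": "mol-a", "charge": 0, "EA": 0.5, "IE": 4.2},
    {"task_id": "mol-b", "charge": 1, "EA": 0.9, "IE": 4.0},
    {"task_id": "mol-c", "charge": -1, "EA": 0.1, "IE": 3.0},
]
local_spec = {"charge": {"$in": [0, -1]}, "EA": {"$gte": 0.4}, "IE": {"$lt": 5}}

selected = [r["task_id"] for r in records if matches(r, local_spec)]
print(selected)
```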

What if we just want "everything"? Let's use an empty spec.

In [8]:

results = get_results({})
print("{} molecules in total right now".format(len(results)))

21954 molecules in total right now

The above request might take some time, but hopefully not much more than a few seconds. Why do we allow this? Well, we don't return all the data for each molecule, and the total size of what we send right now is less than 10 MB.

As our collection of molecules grows in size, this policy may change. So, please use targeted query specifications to get the results you need, especially if you want to periodically check for new molecules that meet some specification.
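If you do check periodically for new molecules, one simple approach is to remember the `task_id` values you have already seen and compare each fresh result list against them. The sketch below assumes every result record carries a `task_id` key, as in the key listing earlier. The function name `new_task_ids` and the sample IDs are hypothetical.

```python
def new_task_ids(seen_ids, results):
    """Return task_ids in `results` that are not in `seen_ids`, in result order."""
    seen = set(seen_ids)
    return [m["task_id"] for m in results if m["task_id"] not in seen]

# Placeholder IDs (NOT real data): IDs stored after a previous check...
previously_seen = ["mol-1", "mol-2"]
# ...and the records returned by a later call with the same spec.
latest_results = [
    {"task_id": "mol-1"},
    {"task_id": "mol-3"},
    {"task_id": "mol-2"},
    {"task_id": "mol-4"},
]

fresh = new_task_ids(previously_seen, latest_results)
print(fresh)
```

Storing only the IDs between checks keeps the saved state small, and the full record for each new ID can then be fetched individually.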

Getting data for individual molecules

You can get all data for a molecule given its ID.

In [9]:

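def get_molecule(mol_id, fmt='json'):
    """Fetch one molecule by ID.

    `fmt` is 'json' (returns the parsed document), or 'svg' or 'xyz'
    (returns the raw response body as bytes).
    """
    url = urlpattern["mol_" + fmt].format(mol_id=mol_id)
    response = requests.get(url, headers={'X-API-KEY': MAPI_KEY})
    if fmt == 'json':
        return response.json()
    else:
        return response.content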

In [10]:

first_result = results[0]
mol_id = first_result['task_id']
print("ID: {}".format(mol_id))

# Get all data by default
molecule = get_molecule(mol_id)
print("There are {} key/value pairs in molecule {}. Have a look around!".format(len(molecule), mol_id))

# The SVG format provides a two-dimensional "pretty picture" of the molecular structure.
# `get_molecule` returns raw bytes for non-JSON formats, so the files are opened in binary mode.
svg_of_molecule = get_molecule(mol_id, fmt='svg')
with open('molecule.svg', 'wb') as f:
    f.write(svg_of_molecule)
    print("scalable vector graphic saved")

# The XYZ representation provided is the optimized geometry of the molecule in a charge-neutral state.
xyz_of_molecule = get_molecule(mol_id, fmt='xyz')
with open('molecule.xyz', 'wb') as f:
    f.write(xyz_of_molecule)
    print("XYZ file saved. Can load into molecule-viewer software.")

ID: mol-64060
There are 29 key/value pairs in molecule mol-64060. Have a look around!
scalable vector graphic saved
XYZ file saved. Can load into molecule-viewer software.
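If you would rather analyze the geometry directly in Python than load it into a viewer, a small parser is enough. The sketch below assumes the response follows the common XYZ layout (atom count on the first line, a comment line, then one "element x y z" line per atom); that layout is an assumption about the server's output, not something confirmed here. The function name `parse_xyz` is hypothetical, and the water coordinates are illustrative placeholders.

```python
def parse_xyz(xyz):
    """Parse XYZ-format text (str or bytes) into (comment, [(element, x, y, z), ...])."""
    if isinstance(xyz, bytes):
        xyz = xyz.decode("utf-8")
    lines = xyz.strip().splitlines()
    n_atoms = int(lines[0])        # first line: number of atoms
    comment = lines[1].strip()     # second line: free-form comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("expected {} atoms, found {}".format(n_atoms, len(atoms)))
    return comment, atoms

# Illustrative input (NOT data from the Materials Project).
sample_xyz = b"""3
water (illustrative coordinates)
O 0.000 0.000 0.117
H 0.000 0.757 -0.471
H 0.000 -0.757 -0.471
"""

comment, atoms = parse_xyz(sample_xyz)
print(comment)
for atom in atoms:
    print(atom)
```

In the notebook, the same call would be `parse_xyz(xyz_of_molecule)`.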