As a half-French, half-German person, I’ve often found myself discussing electricity policies in the last months, especially in the context of the climate crisis.

Since 2011, Germany has been heavily investing in renewable energy sources (wind and solar) and shutting down nuclear. These developements can notably be traced back to the Fukushima nuclear crisis, which was a decisive factor in the enactement of the so-called Energiewende (energy transition).

France has been using low-carbon electricity for decades now, due to its use of nuclear energy (last I checked it was in the top3 nuclearized countries in the world, with the US and Japan).

There are a lot of questions I’ve been wanting to ask for a long time regarding German and French electricity. Since this morning, I’ve learnt that there is an easy way to get access to electric data using Python called entsoe-py. This notebook is a quick exploration of its capabilities.

I got started with this blog post through a tweet, which I’ll post below, and that :

J’ai ressorti la calculette parce que certains n’ont toujours pas compris que BotElectricity n’est pas ElectricityMap.https://t.co/1E14LVOwa5

Donc regardons dans le détail. On va se rapprocher des chiffres de ElectricityMap.
— Thomas 💉💉💉 (@Thomas_Auriel) January 13, 2022

Querying the production¶

In [94]:

import entsoe
import pandas as pd

api_key = open(".entsoetoken").read()

client = entsoe.EntsoePandasClient(api_key)

start = pd.Timestamp('20220101', tz='Europe/Brussels')
end = pd.Timestamp('20220117', tz='Europe/Brussels')
country_code = 'DE'  # Belgium

df = client.query_generation(country_code, start=start, end=end, psr_type=None)

df

Out[94]:

	Biomass	Fossil Brown coal/Lignite	Fossil Gas	Fossil Hard coal	Fossil Oil	Geothermal	Hydro Pumped Storage		Hydro Run-of-river and poundage	Hydro Water Reservoir	Nuclear	Other	Other renewable	Solar		Waste	Wind Offshore	Wind Onshore
	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Consumption	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Consumption	Actual Aggregated	Actual Aggregated	Actual Aggregated	Actual Consumption
2022-01-01 00:00:00+01:00	4322.0	3596.0	2668.0	2048.0	279.0	23.0	764.0	1340.0	1429.0	58.0	3208.0	363.0	126.0	0.0	0.0	858.0	6137.0	25833.0	0.0
2022-01-01 00:15:00+01:00	4323.0	3582.0	2761.0	2041.0	279.0	23.0	553.0	1549.0	1429.0	41.0	3325.0	363.0	126.0	0.0	0.0	860.0	6042.0	25426.0	0.0
2022-01-01 00:30:00+01:00	4292.0	3526.0	2756.0	2038.0	279.0	23.0	338.0	1809.0	1428.0	30.0	3344.0	363.0	126.0	0.0	0.0	850.0	5966.0	25078.0	0.0
2022-01-01 00:45:00+01:00	4299.0	3544.0	2753.0	2030.0	279.0	23.0	230.0	1984.0	1428.0	34.0	3342.0	363.0	126.0	0.0	0.0	835.0	5773.0	25144.0	0.0
2022-01-01 01:00:00+01:00	4295.0	3565.0	2634.0	2053.0	279.0	23.0	488.0	1570.0	1424.0	119.0	3351.0	363.0	126.0	0.0	0.0	854.0	5693.0	24891.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2022-01-16 22:45:00+01:00	4497.0	9699.0	4288.0	8126.0	322.0	23.0	53.0	3984.0	1151.0	162.0	4089.0	416.0	179.0	0.0	0.0	702.0	4710.0	27177.0	0.0
2022-01-16 23:00:00+01:00	4492.0	9830.0	4151.0	7908.0	323.0	23.0	307.0	3161.0	1152.0	235.0	4091.0	410.0	177.0	0.0	0.0	702.0	4698.0	27379.0	0.0
2022-01-16 23:15:00+01:00	4493.0	9702.0	4056.0	7613.0	323.0	23.0	210.0	3420.0	1148.0	162.0	4092.0	410.0	177.0	0.0	0.0	698.0	4696.0	27653.0	0.0
2022-01-16 23:30:00+01:00	4494.0	9679.0	3973.0	7333.0	323.0	23.0	121.0	4026.0	1148.0	140.0	4090.0	410.0	177.0	0.0	0.0	696.0	4706.0	28004.0	0.0
2022-01-16 23:45:00+01:00	4488.0	9750.0	3836.0	7162.0	323.0	23.0	106.0	4283.0	1149.0	147.0	4085.0	399.0	171.0	0.0	0.0	688.0	4700.0	28148.0	0.0

1536 rows × 19 columns

Let’s transform the multi-index to a normal one.

In [95]:

def convert_df(df):
    """Gets rid of the multi index"""
    cols = df.columns
    new_cols = ["_".join([col[0], col[1]]) for col in cols]
    new_df = pd.DataFrame(data=df.values, columns=new_cols, index=df.index)
    return new_df

df = convert_df(df)

And now, let’s plot the result.

In [97]:

import hvplot.pandas

df.hvplot(responsive=True, height=500)

Out[97]:

Estimating CO2 impact¶

As explained in the above twitter thread (in French), ENTSOE does not compute CO2 intensity per se. To do that, we need to multiply the production values by an intensity.

We can conveniently copy the script from BotElectricity found here: https://gitlab.com/ThomasAuriel/BotElectricity/-/blob/master/scripts/data/co2Intensity.py

In [98]:

intensities = {"Mixed": 600,
            "Generation": 600,
            "Load": 600,
            "Biomass": 230,
            "Fossil Brown coal/Lignite": 1200,  # According to https://www.energy-charts.de/emissions.htm?source=lignite&view=specific&emission=co2&year=2017 but IPCC Does no do the difference between lignite and Coal
            "Fossil Coal-derived gas": 820,
            "Fossil Gas": 490,
            "Fossil Hard coal": 820,
            "Fossil Oil": 490,
            "Fossil Oil shale": 490,
            "Fossil Peat": 820,
            "Geothermal": 38,
            "Hydro Pumped Storage": 24,
            "Hydro Run-of-river and poundage": 24,
            "Hydro Water Reservoir": 24,
            "Marine": 24,
            "Nuclear": 12,
            "Other renewable": 30,
            "Solar": 48,
            "Waste": 230,
            "Wind Offshore": 12,
            "Wind Onshore": 11,
            "Other": 600,
}

And now, let’s map this and sum it to get an average CO2 intensity in gCO2/kWh.

In [99]:

import numpy as np

def compute_intensity(df):
    """Computes CO2 intensity by multiplying kWh * intensity/kWh."""
    co2_intensity = pd.Series(data=df.iloc[:, 0] * 0, index=df.index, name="CO2 intensity gCO2/kWh")
    for col in df.columns:
        label = col.split("_")[0]
        intensity = intensities[label]
        co2_intensity += intensity * df[col].replace(np.nan, 0)
    co2_intensity /= df.iloc[:, :-1].sum(axis=1).values
    return co2_intensity

co2_intensity = compute_intensity(df)

co2_intensity.hvplot(responsive=True, height=500)

Out[99]:

Case study France vs Germany¶

With these first things in place, I’d like to conduct a little case study.

I’d like to use as much data as I can, so I’m going to dowload data as far back as January 2015, which is apparently easily accessible in the ENTSOE database.

In [100]:

from tqdm import tqdm

def download_yearly_data(country_code, year):
    if year == 2015:
        start = pd.Timestamp('20150105', tz='Europe/Brussels')
    else:
        start = pd.Timestamp(f'{year}0101', tz='Europe/Brussels')
    end = pd.Timestamp(f'{year}1231', tz='Europe/Brussels')
    df = client.query_generation(country_code, start=start, end=end, psr_type=None)
    return df

def download_all_data():
    for year in tqdm(range(2015, 2022)):
        for country_code in tqdm(["FR", "DE"]): 
            df = download_yearly_data(country_code, year)
            df.to_pickle(f"entsoe_data_{country_code}_{year}.pickle")

Once this is done (it takes almost an hour to do this), we can read the data that was written on disk and make big dataframes out of it:

In [102]:

data = {}
for country_code in ["FR", "DE"]:
    dfs = []
    for year in range(2015, 2022):
        df = pd.read_pickle(f"entsoe_data_{country_code}_{year}.pickle")
        dfs.append(df)
    df = pd.concat(dfs)
    df = convert_df(df)
    data[country_code] = compute_intensity(df)

With that, we can draw a plot that shows the min, mean and max values for CO2 intensity.

In [103]:

def resample(df, rule="W"):
    c = pd.concat([df.resample(rule="W").min(),
           df.resample(rule="W").mean(), 
           df.resample(rule="W").max(),],
          axis=1)
    c.columns = ["min", "mean", "max"]
    return c 

def plot(key):
    resampled = resample(data[key])
    return resampled.hvplot.area(x="index", y="min", y2="max", alpha=0.5) * \
           resampled.hvplot(x="index", y="mean", label=key, ylabel="CO2 intensity (gCO2/kWh)")


plot("FR") * plot("DE")

Out[103]:

Compared to Germany, France shines with really low CO2 intensity. Of course, one has to ask, at what price? Is the nuclear risk worth taking?

On the other side, seing this rather CO2 high intensity in Germany, one could ask: so 1000 Billion $ got you this kind of "green electricity"? Where did the money go because it doesn’t look green to me, although a lot of it is renewable. But that means the German electricity is contributing more to Climate Change (4x more) than French electricity.

What about averages per year? Can we see a trend in "greener electricity" in Germany?

In [107]:

df_average = pd.DataFrame([data["FR"].resample("Y").mean(),
              data["DE"].resample("Y").mean()]).transpose()

df_average.columns = ["yearly averages FR", "yearly averages DE"]

df_average

Out[107]:

	yearly averages FR	yearly averages DE
2015-12-30 23:00:00+00:00	49.500008	511.071502
2016-12-30 23:00:00+00:00	62.085879	502.348246
2017-12-30 23:00:00+00:00	71.003115	479.810522
2018-12-30 23:00:00+00:00	54.157468	463.221327
2019-12-30 23:00:00+00:00	55.883369	397.493792
2020-12-30 23:00:00+00:00	56.664114	353.771539
2021-12-30 23:00:00+00:00	56.247407	405.879554

In [110]:

df_average.hvplot(ylabel="gCO2/kWh")

Out[110]:

Emitted tons of CO2¶

Another way to reason about this discrepancy is to simply count emitted CO2.

In [113]:

def compute_emitted_co2(df):
    emitted_co2 = pd.Series(data=df.iloc[:, 0] * 0, index=df.index, name="CO2 emitted (tons of CO2)")
    for col in df.columns:
        label = col.split("_")[0]
        intensity = intensities[label]
        emitted_co2 += intensity * df[col].replace(np.nan, 0)
    emitted_co2 /= 1000 * 1000 # convert to tons
    return emitted_co2

data = {}
for country_code in ["FR", "DE"]:
    dfs = []
    for year in range(2015, 2022):
        df = pd.read_pickle(f"entsoe_data_{country_code}_{year}.pickle")
        dfs.append(df)
    df = pd.concat(dfs)
    df = convert_df(df)
    data[country_code] = compute_emitted_co2(df).resample(rule="W").sum()

from bokeh.models.formatters import DatetimeTickFormatter

formatter = DatetimeTickFormatter(months='%b %Y')
df_totals = pd.DataFrame([data["FR"].resample("Y").sum(),
              data["DE"].resample("Y").sum(),]).transpose() / 1e6
df_totals.columns = ["FR", "DE"] 

df_totals.hvplot.bar(rot=45, ylabel="CO2 emitted (1e6 tons of CO2)")

Out[113]:

In [115]:

df_totals

Out[115]:

	FR	DE
2015-12-30 23:00:00+00:00	0.026750	1.092464
2016-12-30 23:00:00+00:00	0.031737	1.101497
2017-12-30 23:00:00+00:00	0.037664	1.073566
2018-12-30 23:00:00+00:00	0.028733	1.021152
2019-12-30 23:00:00+00:00	0.028553	0.837654
2020-12-30 23:00:00+00:00	0.026308	0.702176
2021-12-30 23:00:00+00:00	0.028044	0.816597
2022-12-30 23:00:00+00:00	0.000251	0.008577

In [116]:

data["FR"].cumsum().hvplot(label="FR") * data["DE"].cumsum().hvplot(label="DE", title="CO2 emitted (tons of CO2)", ylabel="emitted tons of CO2 since 2015")

Out[116]:

Subtleties¶

production vs consumption: here come exports
prices: looking at electricity map, it seems prices in France are way higher than elsewhere -- what are these prices? who pays them? what do they mean?
investments: Nuclear is often criticized for being expensive, but how about global investments in Germany for grid + production (what is the number)?
safety of nuclear (report 2012 european stress test) https://www.french-nuclear-safety.fr/asn-inspects/european-stress-tests/stress-test-news/asn-report-on-the-complementary-safety-assessments-csa
we have data from the european environment agency about longer time series: https://www.eea.europa.eu/data-and-maps/daviz/co2-emission-intensity-9/#tab-chart_2_filters=%7B%22rowFilters%22%3A%7B%22ugeo%22%3A%5B%22France%22%5D%7D%3B%22columnFilters%22%3A%7B%7D%7D

In [ ]: