Let's analyse the CSV file provided by the authors of the 33-year summary of Chaos Communication Congress talks.
The raw data is here: https://gitlab.com/maxigas/cccongresstalks/blob/master/cccongresstalks.csv
import pandas as pd
url = "https://gitlab.com/maxigas/cccongresstalks/raw/master/cccongresstalks.csv"
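# The file is pipe-separated; malformed lines are skipped.
# on_bad_lines (pandas >= 1.3) replaces error_bad_lines, which was removed in pandas 2.0.
df = pd.read_csv(url, sep='|', engine='c', on_bad_lines='skip')
# Replace missing values with empty strings so string methods work on every row
# (pd.np was removed in pandas 2.0, so use fillna instead of pd.np.nan).
df = df.fillna("")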
df
| | year | congress | title | abstract | link | tags |
|---|---|---|---|---|---|---|
| 0 | 1984 | 1C3 | Kommunikation über den Computer. Grundwissen | Workshop | | t |
| 1 | 1984 | 1C3 | Datex und ähnliche Netzwerke | Workshop | | t |
| 2 | 1984 | 1C3 | Professionelle Mailboxen: Konzepte und Beispiele | Workshop | | t |
| 3 | 1984 | 1C3 | Bedienerfreundlichkeit und Datenschutz in Mail... | Diskussion | | t |
| 4 | 1984 | 1C3 | Psychische Störungen durch Computermißbrauch | Workshop | | s |
| 5 | 1984 | 1C3 | Offene Netze. 15 Jahre Erfahrung aus den USA | Workshop | | s/t |
| 6 | 1984 | 1C3 | Rund um Bildschrimtext | Workshop | | t |
| 7 | 1984 | 1C3 | Einführung ins Telefonsystem | Workshop | | t |
| 8 | 1984 | 1C3 | Datenfunk | Workshop | | t |
| 9 | 1984 | 1C3 | Jura für Hacker | Workshop | | s |
| 10 | 1984 | 1C3 | Modems. Von Datenklo bis Autodial und -answer | Workshop | | t |
| 11 | 1985 | 2C3 | [Die Entwicklung von Mailbox als Medium und di... | | https://web.archive.org/web/20061208062857/htt... | |
| 12 | 1985 | 2C3 | [Freunde aus anderen Ländern (bisher: AU, CH, ... | | https://web.archive.org/web/20061208062857/htt... | |
| 13 | 1985 | 2C3 | [Der CCC erörtert die preisgünstigste Datenver... | | https://web.archive.org/web/20061208062857/htt... | |
| 14 | 1986 | 3C3 | Sichere Kopplung an das Postnetz. Was läuft be... | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 15 | 1986 | 3C3 | Parlakom – das Parlament am Netz. Computer im ... | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 16 | 1986 | 3C3 | Computer Artists Cologne stellen sich vor | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 17 | 1986 | 3C3 | Datenfernübertragung für Anfänger | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 18 | 1986 | 3C3 | PC-Virenforum | - Was sind Computerviren; - Wie arebietne Vire... | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 19 | 1986 | 3C3 | Video über den letzten Kongress | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 20 | 1986 | 3C3 | Kompromittierende Abstrahlung: Abhören von Mon... | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 21 | 1986 | 3C3 | Frühschoppen | Fünf Hacker aus sechs Ländern | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 22 | 1986 | 3C3 | Renümee des Sysoptages vom letzten Congress. K... | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 23 | 1986 | 3C3 | Informationen zum Netzverbund FIDO-NET | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 24 | 1986 | 3C3 | Auswirkungen des 2. WiKg. | Workshop auch über “Hacker-Jäger” | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 25 | 1986 | 3C3 | Desktop Publishing – die Zeitschrift vom Schre... | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 26 | 1986 | 3C3 | Regionale Vernetzung von Mailboxen, Serversystem | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 27 | 1986 | 3C3 | Btx als preiswerter Datenserver. Vorschlag zum... | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 28 | 1986 | 3C3 | Mailboxen – neue Konzepte | | http://www.offiziere.ch/trust-us/ds/17/001.htm | |
| 29 | 1989 | 6C3 | [Wie koennen Heimcomputer und Datenfernuebertr... | | http://www.offiziere.ch/trust-us/ds/32/012_fem... | |
| ... | ... | ... | ... | ... | ... | ... |
| 2420 | 2016 | 33C3 | Methodisch inkorrekt!: How thousands of compan... | Wer hat diese Jungs wieder reingelassen?! Nico... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2421 | 2016 | 33C3 | Corporate surveillance, digital tracking, big ... | Today virtually everything we do is monitored ... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2422 | 2016 | 33C3 | Memory Deduplication: The Curse that Keeps on ... | We are 4 security researchers who have collect... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2423 | 2016 | 33C3 | Liberté, Égalité, Fraternité... and privacy ?! | France is under a state of emergency since Nov... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2424 | 2016 | 33C3 | Do as I Say not as I Do: Stealth Modification ... | Input/Output is the mechanisms through which e... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2425 | 2016 | 33C3 | Talking Behind Your Back: Dissecting a Modern ... | In the last two years, the marketing industry ... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2426 | 2016 | 33C3 | Decoding the LoRa PHY: Wie Wertschätzung in (T... | LoRa is an emerging Low Power Wide Area Networ... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2427 | 2016 | 33C3 | Von Alpakas, Hasenbären und Einhörnern – Über ... | Wie würdigen verschiedene Tech-Communities das... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2428 | 2016 | 33C3 | From Server Farm to Data Table: Hackers' know... | Early digital computers were the size of room... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2429 | 2016 | 33C3 | Hacking collective as a laboratory | Talk presents findings from sociological inves... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2430 | 2016 | 33C3 | The 12 Networking Truths: Power and politics i... | In *The 12 Networking Truths* Swedish artist J... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2431 | 2016 | 33C3 | Ethics in the data society: Why Censorship is ... | This talk presents the idea that ethics as log... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2432 | 2016 | 33C3 | The Economic Consequences of Internet Censorship | Internet censorship today is widespread, both ... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2433 | 2016 | 33C3 | The High Priests of the Digital Age | The High Priests of the Digital Age Are Workin... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2434 | 2016 | 33C3 | Genetic Codes and what they tell us – and ever... | The genome – the final frontier – or just a co... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2435 | 2016 | 33C3 | Datenschutzgrundverordnung: Rechte für Mensche... | Ziel des Vortrages ist es, einen Überblick übe... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2436 | 2016 | 33C3 | The Ultimate Game Boy Talk | The 8-bit Game Boy was sold between 1989 and 2... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2437 | 2016 | 33C3 | Security Nightmares 0x11 | Was hat sich im letzten Jahr im Bereich IT-Sic... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2438 | 2016 | 33C3 | 33C3 Closing Ceremony: Mind-sets, state-of-the... | | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2439 | 2016 | 33C3 | Mass Surveillance through Computational Lingui... | Even though the Snowden revelations for the fi... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2440 | 2016 | 33C3 | Beyond Virtual and Augmented Reality: 50 most ... | With recent development in capture technology,... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2441 | 2016 | 33C3 | Retail Surveillance / Retail Countersurveillan... | From geo-magnetic tracking for smartphones to ... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2442 | 2016 | 33C3 | Rebel Cities: Was der Anti-Terror-Kampf von de... | Cities are emerging as a space for local actio... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2443 | 2016 | 33C3 | Privatisierung der Rechtsdurchsetzung: About m... | 2016 drehte der Anti-Terror-Kampf in der EU au... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2444 | 2016 | 33C3 | Surveilling the surveillers: Post-trump open b... | In the last years, technology-savvy artists an... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2445 | 2016 | 33C3 | Prediction Fail: Lightning Talks | Live action role play as method of fast social... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2446 | 2016 | 33C3 | Lightning Talks Day 4: The usual extremely fac... | Lightning Talks are short lectures (almost) an... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2447 | 2016 | 33C3 | 33C3 Infrastructure Review: Deciding between t... | NOC, POC, VOC and QOC show interesting facts a... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2448 | 2016 | 33C3 | The Transhumanist Paradox: Theresa May’s effor... | How does a pluralist society – a society built... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
| 2449 | 2016 | 33C3 | Understanding the Snooper’s Charter: Secure Bo... | The ‚Investigative Powers Bill‘ is about to be... | https://events.ccc.de/congress/2016/Fahrplan/s... | |
2450 rows × 6 columns
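To see what `sep='|'` and `on_bad_lines='skip'` do without downloading the file, here is a minimal sketch on a few made-up rows in the same pipe-separated layout (only the column names come from the real file; the row contents are hypothetical). The last row has one field too many and is silently dropped, and empty fields become empty strings after `fillna("")`.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as cccongresstalks.csv.
# The last data row has 7 fields instead of 6, so it counts as a "bad line".
sample = """year|congress|title|abstract|link|tags
1984|1C3|Datenfunk|Workshop||t
2016|33C3|The Ultimate Game Boy Talk|The 8-bit Game Boy...|https://example.org|
2016|33C3|Broken|row|with|too|many
"""

df_sample = pd.read_csv(io.StringIO(sample), sep="|", on_bad_lines="skip")
# Empty fields are parsed as NaN; turn them into empty strings
df_sample = df_sample.fillna("")
print(df_sample.shape)  # (2, 6)
```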
Now that we have loaded the data, let's compute some statistics.
First, the number of talks per year.
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('bmh')
gb = df.groupby(by='year')
gb.congress.count().describe()
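| statistic | talks per year |
|---|---|
| count | 31 |
| mean | 79.03 |
| std | 58.21 |
| min | 3 |
| 25% | 26 |
| 50% | 88 |
| 75% | 105 |
| max | 214 |

So the data covers 31 Congress years, with between 3 and 214 talks per year and a median of 88.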
gb.congress.count().plot.bar()
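A note on what is being counted: `groupby(...).congress.count()` counts the non-missing `congress` values in each year, while `groupby(...).size()` counts rows. After the missing-value replacement at load time the two agree, but they diverge if that step is skipped. A toy sketch with made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; one 'congress' value is missing in 1986
talks = pd.DataFrame({
    "year": [1984, 1984, 1985, 1986, 1986, 1986],
    "congress": ["1C3", "1C3", "2C3", "3C3", np.nan, "3C3"],
})

by_year = talks.groupby("year")
counted = by_year.congress.count().tolist()  # NaN is not counted
sized = by_year.size().tolist()              # every row is counted
print(counted)  # [2, 1, 2]
print(sized)    # [2, 1, 3]
```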
We can do the same for the average abstract length per year.
df['abstract_len'] = df['abstract'].str.len()  # number of characters per abstract
df.groupby(by='year').abstract_len.mean().plot.bar()
There seems to be an anomaly in 1994: apparently entire talks were embedded instead of just their abstracts. To keep the other years readable, let's replot with the y-axis capped at 1000 characters.
df.groupby(by='year').abstract_len.mean().plot.bar()
plt.ylim(0, 1000)
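One way to check the 1994 hypothesis is to list the talks with the longest abstracts and to compare the per-year mean with the median, which is much less sensitive to a handful of very long entries. A minimal sketch on hypothetical toy rows; with the real data the same calls would apply to `df`.

```python
import pandas as pd

# Hypothetical toy data: in 1994 one "abstract" is a whole talk transcript
talks = pd.DataFrame({
    "year": [1993, 1994, 1994, 1994, 1995],
    "title": ["A", "B", "C", "D", "E"],
    "abstract": ["short text", "x" * 5000, "another short one", "tiny", "medium " * 20],
})
talks["abstract_len"] = talks["abstract"].str.len()

# Longest abstracts first: these are the outliers behind a suspicious yearly mean
longest = talks.nlargest(2, "abstract_len")[["year", "title", "abstract_len"]]
print(longest)

# Mean vs. median per year: one huge entry inflates the mean but not the median
summary = talks.groupby("year").abstract_len.agg(["mean", "median"])
print(summary)
```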
Now let's look at the abstracts themselves. Which words are most common?
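from collections import Counter
# Split each lower-cased abstract on runs of whitespace (no empty tokens),
# then flatten the per-talk lists into a single list of words
all_abstracts = df.abstract.str.lower().str.split().explode().dropna().tolist()
c = Counter(all_abstracts)
fig, ax = plt.subplots(figsize=(10, 6))
pd.DataFrame(c.most_common(50), columns=['word', 'count']).set_index('word').plot.bar(ax=ax)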
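A detail worth checking in this kind of word count: splitting on a single whitespace character (`'\s'`) yields an empty string wherever two whitespace characters are adjacent, and those empty strings then get counted as a "word". Python's `str.split()` with no argument splits on runs of whitespace and drops them. A small standard-library illustration on made-up strings:

```python
import re
from collections import Counter

abstracts = ["Hacking  the Game Boy", "the Game Boy\nand  more"]

# Splitting on one whitespace character leaves "" where whitespace repeats
single = [w for a in abstracts for w in re.split(r"\s", a.lower())]
# str.split() without arguments splits on runs of whitespace
runs = [w for a in abstracts for w in a.lower().split()]

print(Counter(single)[""])  # 2
print("" in Counter(runs))  # False
```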
Not very informative. Let's try a word cloud instead.
from wordcloud import WordCloud
wordcloud = WordCloud(width=1000, height=1000).generate(" ".join(all_abstracts))
plt.figure(figsize=(10, 10))
plt.imshow(wordcloud)
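Still not very useful. `WordCloud` already drops English stopwords by default (when `stopwords` is not given it uses the built-in `STOPWORDS` set), but German function words presumably still dominate. Let's extend the stopword set with common German words and a few uninformative English ones.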
from wordcloud import STOPWORDS
STOPWORDS.update(['und', 'die', 'zu', 'auf', 'der', 'von',
'das', 'den', 'ein', 'dem', 'im', 'auch',
'noch', 'ist', 'es', 'mit', 'talk', 'des',
'wie', 'sich', 'wir', 'will', 'für', 'nicht', 'um', 'als',
'wird', 'werden', 'eine', 'über', 'sie', 'wenn', 'durch', 'aber', 'dabei',
'zum', 'aus', 'sind', 'nur', 'gibt', 'einer', '-', 'show', 'hat', 'man',
'einem', 'welche', 'kann', 'einen', 'last', 'well', 'nach'])
wordcloud = WordCloud(background_color='white', width=1000, height=1000, stopwords=STOPWORDS).generate(" ".join(all_abstracts))
plt.figure(figsize=(10, 10))
plt.imshow(wordcloud)
plt.axis('off')
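The word cloud applies the stopword list internally, but the earlier bar chart of most common words does not. A minimal sketch of a helper that applies the same filtering to the plain `Counter` approach; `top_words` is a hypothetical name, and splitting on whitespace keeps punctuation attached to words, which is a simplification. With the notebook's objects it could be called as `top_words(df.abstract, STOPWORDS)`.

```python
from collections import Counter
from typing import Iterable


def top_words(texts: Iterable[str], stopwords: set, n: int = 50) -> list:
    """Return the n most common lower-cased words, ignoring stopwords."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in stopwords
    )
    return counts.most_common(n)


# Hypothetical mini stopword list mixing English and German function words
mini_stopwords = {"the", "of", "die", "der"}
print(top_words(["The state of the art", "Die Zukunft der Überwachung: state"],
                mini_stopwords, 1))  # [('state', 2)]
```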
We can also generate a word cloud for each year of the Congress.
wordclouds = {}
for year in df.year.unique():
    # Concatenate all abstracts of this year into one lower-cased text
    abstracts = " ".join(df.abstract[df.year == year].str.lower())
    # Only build a word cloud if the year has some abstract text
    if len(abstracts.strip()) > 0:
        wordcloud = WordCloud(background_color='white', width=1000, height=1000, stopwords=STOPWORDS).generate(abstracts)
        wordclouds[year] = wordcloud
    print(year)  # progress indicator
1984 1985 1986 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
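Word clouds are hard to compare side by side; a plain-text alternative is to list the few most frequent non-stopwords for each year. A sketch, assuming a DataFrame with `year` and `abstract` columns like `df`; `top_words_by_year` is a hypothetical helper, and years without any remaining words are skipped, mirroring the check in the loop above. With the real data it could be called as `top_words_by_year(df, STOPWORDS)`.

```python
from collections import Counter

import pandas as pd


def top_words_by_year(frame: pd.DataFrame, stopwords: set, n: int = 5) -> dict:
    """Map each year to its n most frequent lower-cased non-stopwords."""
    result = {}
    for year, abstracts in frame.groupby("year")["abstract"]:
        counts = Counter(
            word
            for text in abstracts
            for word in text.lower().split()
            if word not in stopwords
        )
        if counts:  # skip years with no usable abstract text
            result[year] = [word for word, _ in counts.most_common(n)]
    return result


# Hypothetical toy rows; 1985 has no abstract text at all
toy = pd.DataFrame({
    "year": [1984, 1984, 1985, 2016],
    "abstract": ["Datenfunk Workshop", "Workshop", "", "The Game Boy talk"],
})
print(top_words_by_year(toy, {"the"}, n=1))
```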
Let's now browse the per-year word clouds interactively with a slider:
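from ipywidgets import interact

# A (min, max) tuple as default makes interact render an integer slider;
# a list such as [0, n] would instead produce a dropdown with just those two values
@interact
def display_year(year_index=(0, len(wordclouds) - 1)):
    "Displays each wordcloud by year."
    year = list(wordclouds.keys())[year_index]
    wordcloud = wordclouds[year]
    plt.figure(figsize=(10, 10))
    plt.title(str(year))
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.show()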