The National Archives provides a web-based search interface for searching the index catalogues of its various collections.
As well as a simple search box that runs a free-text search over all the record columns (presumably?), we can also run advanced searches that can include reference and date limits.
Search results containing the index records for your search hits can be downloaded as a CSV file.
By searching for records associated with a particular collection tag / reference, we can obtain, and thence download, a copy of the collection's index records.
We can then load these records into our own database and search them using our own search tools, as well as annotating the records using techniques such as named entity recognition.
So let's have a go at that...
Searching for index records associated with HO-40-1 over the period 1810-15 leads us to a search results page with the URL:
https://discovery.nationalarchives.gov.uk/results/r?_cr=HO%2040-1&_dss=range&_sd=1810&_ed=1815&_ro=any&_st=adv
This HTTP GETs the URL https://discovery.nationalarchives.gov.uk/results/r
with arguments:
_cr:'HO 40-1'
_dss:'range'
_sd:1810
_ed:1815
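Equivalently, we can build that query string ourselves from a parameter dict. A minimal sketch using only the standard library; note that `quote_via=quote` is needed to encode the space in the reference as `%20`, matching the URL above (the default, `quote_plus`, would give `HO+40-1`):

```python
from urllib.parse import urlencode, quote

url = 'https://discovery.nationalarchives.gov.uk/results/r'
params = {'_cr': 'HO 40-1', '_dss': 'range', '_sd': 1810, '_ed': 1815}

# quote_via=quote percent-encodes the space as %20 rather than '+'
query = urlencode(params, quote_via=quote)
full_url = f'{url}?{query}'
```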
To download the data records, we then need to click a form button, rather than a web link.
We can automate this procedure by constructing the desired URL with appropriate arguments, ensuring the correct form download options are set, "clicking" the download button, and capturing the response.
# MechanicalSoup combines a simple stateful virtual browser (in the spirit of
# mechanize, built on requests) with a web scraping package (BeautifulSoup)
import mechanicalsoup
import mechanicalsoup
Define the URL of the search results and download page:
url='https://discovery.nationalarchives.gov.uk/results/r'
Specify the search limits around the collection we are interested in:
params = {'_cr':'HO 40-1','_dss':'range','_sd':1810,'_ed':1815}
Open the page with those parameters:
browser = mechanicalsoup.StatefulBrowser()
browser.open(url, params=params)
<Response [200]>
Configure the search form:
browser.select_form('form[action="/search/download"]')
browser["expSize"] = "10"
#browser.get_current_form().print_summary()
"Click" the download button:
response = browser.submit_selected()
Read the response into a pandas dataframe and preview the result, casting date fields into date format:
#StringIO is a function for wrapping a file pointer around a string
from io import StringIO
#Pandas is a package for working with tabular datasets
import pandas as pd
df = pd.read_csv(StringIO(response.text))
#Force the start and end date columns into a date format
df['Start Date'] = pd.to_datetime(df['Start Date'],errors='coerce', dayfirst=True)
df['End Date'] = pd.to_datetime(df['End Date'],errors='coerce', dayfirst=True)
df.head(3)
Citable Reference | Context Description | Title | Description | Start Date | Start Date (num) | End Date | End Date (num) | Covering Dates | Held by | Catalogue level | References | Opening Date | Closure Status | Closure Type | Closure Code | Subjects | Digitised | ID | Score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | HO 40/1 | Home Office: Disturbances Correspondence. | HO 40. The Luddite riots - reports | HO 40. The Luddite riots - reports. | 1812-01-01 | 18120101 | 1855-12-31 | 18551231 | 1812-1855 | The National Archives, Kew | 6 | NaN | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | Yes | C3083303 | 0.177554 |
1 | HO 40/1/6 | Home Office: Disturbances Correspondence. HO 4... | Lancashire. Lt. Gen. (copies of (1) above) Mai... | Lancashire. Lt. Gen. (copies of (1) above) Mai... | 1812-05-01 | 18120501 | 1812-06-30 | 18120630 | 1812 May - June | The National Archives, Kew | 7 | \r\nFormer Reference Pro: HO 40/1/(6) | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | NaN | C6573173 | 0.158834 |
2 | HO 40/1/7 | Home Office: Disturbances Correspondence. HO 4... | Yorkshire magistrates reports (copies of (1) a... | Yorkshire magistrates reports (copies of (1) a... | 1812-03-01 | 18120301 | 1812-05-31 | 18120531 | 1812 Mar. - May | The National Archives, Kew | 7 | \r\nFormer Reference Pro: HO 40/1/(7) | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | NaN | C6573174 | 0.158834 |
We can build up a larger index by extending our search, or by combining the downloads from multiple searches.
Create a function to do the download of a single index:
def get_index(reference, start=1810, end=1815, typ='ref'):
    """Download the index for a specified reference and convert it to a dataframe."""
    url = 'https://discovery.nationalarchives.gov.uk/results/r'
    params = {'_dss': 'range', '_sd': start, '_ed': end}
    if typ == 'search':
        params['_q'] = reference
    else:
        params['_cr'] = reference
    browser = mechanicalsoup.StatefulBrowser()
    browser.open(url, params=params)
    #No results
    if browser.get_current_page().find("div", {"class": "emphasis-block no-results"}):
        return pd.DataFrame()
    browser.select_form('form[action="/search/download"]')
    browser["expSize"] = "10"
    response = browser.submit_selected()
    _df = pd.read_csv(StringIO(response.text))
    #Force the start and end date columns into a date format
    _df['Start Date'] = pd.to_datetime(_df['Start Date'], errors='coerce', dayfirst=True)
    _df['End Date'] = pd.to_datetime(_df['End Date'], errors='coerce', dayfirst=True)
    return _df
get_index('HO 42').head(3)
Citable Reference | Context Description | Title | Description | Start Date | Start Date (num) | End Date | End Date (num) | Covering Dates | Held by | Catalogue level | References | Opening Date | Closure Status | Closure Type | Closure Code | Subjects | Digitised | ID | Score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | HO 42 | Home Office: Domestic Correspondence, George III. | Home Office: Domestic Correspondence, George III | Original Home Office domestic letters. PLEASE ... | 1782-01-01 | 17820101 | 1820-12-31 | 18201231 | 1782-1820 | The National Archives, Kew | 3 | NaN | NaN | NaN | Normal Closure before FOI Act: | 30 | NaN | NaN | C8906 | 0.032754 |
1 | HO 42/108 | Home Office: Domestic Correspondence, George III. | HO 42. Letters and Papers. Supplementary. | HO 42. Letters and Papers. Supplementary. | 1810-07-01 | 18100701 | 1810-10-31 | 18101031 | 1810 July 01-1810 Oct 31 | The National Archives, Kew | 6 | NaN | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | NaN | Yes | C1905727 | 0.026892 |
2 | HO 42/111 | Home Office: Domestic Correspondence, George III. | HO 42. Letters and Papers | HO 42. Letters and Papers. | 1811-04-01 | 18110401 | 1811-06-30 | 18110630 | 1811 Apr 01-1811 June 30 | The National Archives, Kew | 6 | NaN | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | NaN | Yes | C1905730 | 0.026892 |
Note that some searches seem to be quite wide-ranging against particular codes (rather than lookups by reference), and some responses also appear to contain transcripts in the Description field.
#Pull out the first 500 characters of records longer than 2,000 characters
[r[:500] for r in get_index('HO 42',typ='search')['Description'].to_list() if len(r)>2000]
['Report of Soulden Lawrence on 16 individual petitions (13 from the prisoner; H Neale, officer of marines; Mr Castle, Clerk of the Crown for Durham and A Graham) and 4 collective petitions (34 members of the corporation of Durham; 2 others (31 and 34 people) with similar signatories and 3 people, the prisoner and others of London) on behalf of John Davison, late a captain in the Royal Marines, convicted at the Somerset Assizes held in Taunton in August 1809, for the theft of 6 yards of muslin, va', 'General registers, early warrant and entry books and other records covering the multifarious subjects for which the Home Office has had responsibility; also records of subjects which do not fit into other divisional categories. Broadly, the subjects and their series in this division are as follows: Addresses, HO 55, HO 57, HO 249 Admiralty, HO 28, HO 29 Advertisements, HO 174 Animals and wild birds, HO 183, HO 285 Automatic data processing, HO 337 Betting, gaming and lotteries, HO 320 Bouillon p', "Board and Committee minutes HO.RVI/1/1-7 Board of Governors minutes, Court of Governors before 1948. 1911 - 1971 (7 volumes) See also HO.RVI/47 for Joint Minutes with Regional Hospital Board. HO.RVI/2/1-61 House Committee minutes, 1751 - 1971 (61 volumes) Volumes 1-4, 6-11, 17-18 contain patients' admissiosn and discharges. HO.RVI/151/1-6 Rough House Committee minutes, 1752 - 1755 (6 volumes) HO.RVI/3/1-2 Anaesthetic Committee minutes, 1924 - 1948 (2 volumes) HO.RVI/4/1-2 Appeal Committee minute", 'Administration HO.PM/1/1-18 Minutes 1760 - 1945 From 1760 to 1822 weekly court minutes, to 1900 also House Committee minutes, from 1900 also Finance Committee and Management Committee minutes. 
(18 volumes, 27 papers) HO.PM/2 Charity for the Relief of Poor Women Lying-in at Their Own Homes, minutes 1787 - 1858 Lying-in hospital House Committee minutes, 1859 (1 volume) HO.PM/3/1-3 Medical Staff meetings minutes, 1917 - 1951 (3 volumes) HO.PM/45 Honorary medical staff meetings minutes, 1935 - 1949 ', "Report of Alexander Thomson on 1 individual petition (the prisoner [detailed, gives information concerning family and business]) on behalf of Peter Degraves, merchant of London and Manchester, Lancashire, tried at the 'last' Lancaster Assizes held in 1810 and convicted of stealing a large quantity of goods including French cambrics, value between £2,000-3,000, property of John Parson, merchant of Manchester, from the warehouse of Thomas Benbridge/Thomas Bainbridge. Evidences supplied by John Pa", '1707-1812 Watchet Harbour, copies of Acts, 1707-08, 1720-21. 1770. 1809 printed and ms.) with petitions etc.; copies of Minehead Harbour Act 1711; accounts and estimates for repair c.1720 with undated agreement to build a quay at Watchet by Wm. Rowe of Bridgwater, mason, 1708; petitions re. need for improvements, 1811 with correspondence, 1812. 1 bundle 1707-1809 ms copies of Watchet Harbour Acts (as above). 
1 volume 1772-1808 Watchet Quay maintenance accounts 1772-1808 (1 volume) 1782-1808 (2 c', "HIL/1 Records of firm and Hillman family HIL/2 Co-partnership agreements HIL/3 Apprenticeship indentures HIL/4 Assignments of debts HIL/5 Wills and executorship papers HIL/6 Title deeds of clients' properties HIL/6/1-14 Lewes: St Thomas at Cliffe HIL/6/15-29 Lewes: other parishes HIL/6/30-32 Alfriston HIL/6/33 Arlington HIL/6/34-36 Barcombe HIL/6/37 Bishopstone HIL/6/38-40 Brighton HIL/6/41 Ditchling HIL/6/42-48 Eastbourne HIL/6/49 Framfield HIL/6/50 Friston HIL/6/51-53 Hailsham HIL/6/54 Helling", 'SUMMARY OF CONTENTS L/C Lieutenancy - County L/C/D Deputy Lieutenancy L/C/C County - commissions L/C/C/1 Original commissions; 1853-1861 L/C/C/2 Letters of royal approval; 1835-1913 L/C/C/3 Correspondence and papers; 1778-1915 L/C/C/4 Draft commissions and precedents; 1804-1870 L/C/C/5 Lists of Deputy Lieutenants; 1807-1852 L/C/G General meetings L/C/M Militia L/C/M/1 Lists of men enrolled; 1803-1855 L/C/M/2 Subdivisional returns; 1806-1831 L/C/M/3 Regimental returns; 1804-1874 L/C/M/4 Correspon']
We can now use that function to download and combine indexes for multiple references:
search_references = ['HO 40-1', 'HO 40-2', 'HO 43-19', 'HO 43-20', 'HO 43-21', 'HO 42-110']
df_combined = pd.DataFrame()
for reference in search_references:
    _df = get_index(reference)
    print(f'{reference}: {len(_df)}')
    #DataFrame.append has been removed from recent pandas versions, so use pd.concat
    df_combined = pd.concat([df_combined, _df])
df_combined = df_combined.sort_values('Citable Reference').reset_index(drop=True)
df_combined.head()
HO 40-1: 9
HO 40-2: 10
HO 43-19: 1
HO 43-20: 1
HO 43-21: 1
HO 42-110: 1
Citable Reference | Context Description | Title | Description | Start Date | Start Date (num) | End Date | End Date (num) | Covering Dates | Held by | Catalogue level | References | Opening Date | Closure Status | Closure Type | Closure Code | Subjects | Digitised | ID | Score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | HO 40/1 | Home Office: Disturbances Correspondence. | HO 40. The Luddite riots - reports | HO 40. The Luddite riots - reports. | 1812-01-01 | 18120101 | 1855-12-31 | 18551231 | 1812-1855 | The National Archives, Kew | 6 | NaN | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | Yes | C3083303 | 0.177578 |
1 | HO 40/1/1 | Home Office: Disturbances Correspondence. HO 4... | Cheshire, Lancashire, Yorkshire ff 1-173 ff 17... | Cheshire, Lancashire, Yorkshire ff 1-173 ff 17... | 1812-03-01 | 18120301 | 1812-06-30 | 18120630 | 1812 Mar. - June | The National Archives, Kew | 7 | \r\nFormer Reference Pro: HO 40/1/(1) | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | NaN | C6573168 | 0.152372 |
2 | HO 40/1/2 | Home Office: Disturbances Correspondence. HO 4... | Cheshire magistrates reports (copies of (1) ab... | Cheshire magistrates reports (copies of (1) ab... | 1812-03-01 | 18120301 | 1812-06-30 | 18120630 | 1812 Mar. - June | The National Archives, Kew | 7 | \r\nFormer Reference Pro: HO 40/1/(2) | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | NaN | C6573169 | 0.152372 |
3 | HO 40/1/3 | Home Office: Disturbances Correspondence. HO 4... | Lancashire magistrates reports (copies of (1) ... | Lancashire magistrates reports (copies of (1) ... | 1812-03-01 | 18120301 | 1812-05-31 | 18120531 | 1812 Mar. - May | The National Archives, Kew | 7 | \r\nFormer Reference Pro: HO 40/1/(3) | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | NaN | C6573170 | 0.156902 |
4 | HO 40/1/4 | Home Office: Disturbances Correspondence. HO 4... | Lancashire magistrates reports (copies of (1) ... | Lancashire magistrates reports (copies of (1) ... | 1812-03-01 | 18120301 | 1812-06-30 | 18120630 | 1812 Mar. - June | The National Archives, Kew | 7 | \r\nFormer Reference Pro: HO 40/1/(4) | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | NaN | C6573171 | 0.157119 |
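Since overlapping searches can return the same record in more than one download, it may also be worth deduplicating the combined frame on the record ID column. A minimal sketch on made-up rows in the same shape (the values here are illustrative, not live data):

```python
import pandas as pd

# Two downloads that both returned record C3083303
df_a = pd.DataFrame({'ID': ['C3083303', 'C6573168'],
                     'Citable Reference': ['HO 40/1', 'HO 40/1/1']})
df_b = pd.DataFrame({'ID': ['C3083303'],
                     'Citable Reference': ['HO 40/1']})

df_combined = pd.concat([df_a, df_b])
# Keep only the first occurrence of each record ID
df_deduped = df_combined.drop_duplicates(subset='ID').reset_index(drop=True)
```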
We can get a better view over the descriptions:
df_combined['Description'].to_list()
['HO 40. The Luddite riots - reports.', 'Cheshire, Lancashire, Yorkshire ff 1-173 ff 174-283.', 'Cheshire magistrates reports (copies of (1) above) ff 284-341.', 'Lancashire magistrates reports (copies of (1) above) ff 342-371.', 'Lancashire magistrates reports (copies of (1) above) ff 372-471.', 'Enclosures to a letter dated (copies of (1) above) 16 May, 1812 in (4) above ff 472-485.', "Lancashire. Lt. Gen. (copies of (1) above) Maitland's reports ff 486-540.", 'Yorkshire magistrates reports (copies of (1) above) ff 541-596.', 'Yorkshire Sir Francis Lindley (copies of (1) above) Wood, Vice Lt. West Riding; reports ff 597-624.', 'HO 40. The Luddite riots - military reports.', 'Cheshire ff 1a - 115.', 'Lancashire ff 116-253.', 'Yorkshire ff 254-399 ff 400-562.', 'Chelmsford, London and miscellaneous ff 563-646.', 'Notebook containing names of known and suspected Luddites.', 'Notebook containing various payments to constables, etc.', 'Copies of letters addressed to Lt. Gen. Maitland.', 'Copies of letters addressed to Lt. Gen. Maitland.', 'Copies of letters addressed to Lt. Gen. Maitland.', 'HO 42. Letters and Papers.', 'Domestic Letter Book.', 'Domestic Letter Book.', 'Domestic Letter Book.']
The title field appears to be a subset of the description field (up to the first N characters).
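We can spot-check that claim against values visible in the first row of the downloaded index; a row-wise check over a full dataframe is sketched in the comment, but is untested against the live data:

```python
# Values from the first row of the downloaded index
title = "HO 40. The Luddite riots - reports"
description = "HO 40. The Luddite riots - reports."

is_prefix = description.startswith(title)

# Across a whole dataframe df, a row-wise version of the same check might be:
# df.apply(lambda r: str(r['Description']).startswith(str(r['Title'])), axis=1)
```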
We can parse named entities out of the description field to make searching the records easier.
The spacy natural language processing (NLP) package provides a named entity tagger that is good enough to get us started.
import spacy
#Install the package that provides the named entity model
#!python -m spacy download en_core_web_sm
Here's an example of running the named entity tagger:
nlp = spacy.load("en_core_web_sm")
TEST_STRING = "Joseph Radcliffe, wrote a letter to the Home Office on March 5th, 1812 about the Luddites."
doc = nlp(TEST_STRING)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
Joseph Radcliffe 0 16 PERSON
the Home Office 36 51 ORG
March 5th, 1812 55 70 DATE
Luddites 81 89 GPE
GPE is a "geo-political entity". There is also a related NORP: "nationalities or religious or political groups". The numbers are character offsets into the original string: the index of the first character of the extracted string, and the index one past its last character.
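Because the offsets follow Python's slicing convention (start inclusive, end exclusive), slicing the original string with them recovers the entity text exactly:

```python
TEST_STRING = "Joseph Radcliffe, wrote a letter to the Home Office on March 5th, 1812 about the Luddites."

# Offsets as reported by the tagger above
person = TEST_STRING[0:16]
org = TEST_STRING[36:51]
```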
We can create a simple function to pull out the elements we want, returning a list of all elements extracted from a block of text.
def entity_rec(txt):
    """Extract entities from a text and return a list of (entity text, entity type) tuples."""
    doc = nlp(txt)
    ents = []
    for ent in doc.ents:
        #ents.append((ent.text, ent.start_char, ent.end_char, ent.label_))
        #Exclude certain entity types from the returned list
        if ent.label_ not in ['CARDINAL']:
            ents.append((ent.text, ent.label_))
    return ents
We can apply this function to the Description text associated with each row:
df['Entities'] = df['Description'].apply(lambda x: entity_rec(x))
df.head(3)
Citable Reference | Context Description | Title | Description | Start Date | Start Date (num) | End Date | End Date (num) | Covering Dates | Held by | ... | References | Opening Date | Closure Status | Closure Type | Closure Code | Subjects | Digitised | ID | Score | Entities | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | HO 40/1 | Home Office: Disturbances Correspondence. | HO 40. The Luddite riots - reports | HO 40. The Luddite riots - reports. | 1812-01-01 | 18120101 | 1855-12-31 | 18551231 | 1812-1855 | The National Archives, Kew | ... | NaN | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | Yes | C3083303 | 0.177554 | [(Luddite, NORP)] |
1 | HO 40/1/6 | Home Office: Disturbances Correspondence. HO 4... | Lancashire. Lt. Gen. (copies of (1) above) Mai... | Lancashire. Lt. Gen. (copies of (1) above) Mai... | 1812-05-01 | 18120501 | 1812-06-30 | 18120630 | 1812 May - June | The National Archives, Kew | ... | \r\nFormer Reference Pro: HO 40/1/(6) | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | NaN | C6573173 | 0.158834 | [(Lancashire, ORG), (Maitland, GPE)] |
2 | HO 40/1/7 | Home Office: Disturbances Correspondence. HO 4... | Yorkshire magistrates reports (copies of (1) a... | Yorkshire magistrates reports (copies of (1) a... | 1812-03-01 | 18120301 | 1812-05-31 | 18120531 | 1812 Mar. - May | The National Archives, Kew | ... | \r\nFormer Reference Pro: HO 40/1/(7) | NaN | Open Document, Open Description | Normal Closure before FOI Act: | 30 | C10086 Public disorder | NaN | C6573174 | 0.158834 | [(Yorkshire, PERSON)] |
3 rows × 21 columns
We can then generate a long-format data frame that associates each entity tuple with each record, as identified by the record ID:
df_entities = df.explode('Entities').reset_index(drop=True)[['ID','Entities']]
df_entities.head(3)
ID | Entities | |
---|---|---|
0 | C3083303 | (Luddite, NORP) |
1 | C6573173 | (Lancashire, ORG) |
2 | C6573173 | (Maitland, GPE) |
We can then split out the entity tuple elements into separate columns, noting that the entity type recognition, as well as the entity extraction itself, may be a bit ropey:
df_entities[['Entity','Type']] = df_entities['Entities'].apply(pd.Series)
df_entities.drop(columns='Entities', inplace=True)
df_entities.head(10)
ID | Entity | Type | |
---|---|---|---|
0 | C3083303 | Luddite | NORP |
1 | C6573173 | Lancashire | ORG |
2 | C6573173 | Maitland | GPE |
3 | C6573174 | Yorkshire | PERSON |
4 | C6573175 | Yorkshire Sir | PERSON |
5 | C6573175 | Francis Lindley | PERSON |
6 | C6573175 | West Riding | GPE |
7 | C6573171 | Lancashire | ORG |
8 | C6573170 | Lancashire | ORG |
9 | C6573168 | Cheshire | ORG |
If we wanted to work on this a bit more, it would be handy to be able to recognise English county and place names as such. We could also try to munge any DATE elements through a robust date parser in order to get the dates into actual date objects.
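For example, pandas (which delegates free-form strings to the dateutil parser) copes with the date string extracted in the earlier example, including the ordinal day suffix:

```python
import pandas as pd

# "March 5th, 1812" is the DATE entity extracted from the test string above
d = pd.to_datetime("March 5th, 1812")
```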
Other useful bits of information are the folio / page numbers.
import re
TEST_STRING_2 = "Cheshire, Lancashire, Yorkshire ff 1-173 ff 174-283."
FF_PATTERN = r"ff \d+-\d+"
m = re.findall(FF_PATTERN, TEST_STRING_2, re.MULTILINE)
m
['ff 1-173', 'ff 174-283']
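A variant of the pattern with capture groups pulls the start and end folio numbers out directly as integers, rather than leaving them embedded in an `ff ...` string:

```python
import re

TEST_STRING_2 = "Cheshire, Lancashire, Yorkshire ff 1-173 ff 174-283."

# Capture groups around each run of digits
FF_GROUPED = r"ff (\d+)-(\d+)"
pairs = [(int(a), int(b)) for a, b in re.findall(FF_GROUPED, TEST_STRING_2)]
```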
Again, we can capture these into a long dataframe:
df['Pages'] = df['Description'].apply(lambda x: re.findall(FF_PATTERN, x, re.MULTILINE))
df[['Description','Pages']].head(10)
Description | Pages | |
---|---|---|
0 | HO 40. The Luddite riots - reports. | [] |
1 | Lancashire. Lt. Gen. (copies of (1) above) Mai... | [ff 486-540] |
2 | Yorkshire magistrates reports (copies of (1) a... | [ff 541-596] |
3 | Yorkshire Sir Francis Lindley (copies of (1) a... | [ff 597-624] |
4 | Lancashire magistrates reports (copies of (1) ... | [ff 372-471] |
5 | Lancashire magistrates reports (copies of (1) ... | [ff 342-371] |
6 | Cheshire, Lancashire, Yorkshire ff 1-173 ff 17... | [ff 1-173, ff 174-283] |
7 | Cheshire magistrates reports (copies of (1) ab... | [ff 284-341] |
8 | Enclosures to a letter dated (copies of (1) ab... | [ff 472-485] |
We can make the table longer by exploding multiple page references for any given record, and then also splitting out the first and last page reference:
df_pages = df.explode('Pages').reset_index(drop=True)[['ID','Pages']].dropna()
df_pages[['Start', 'End']] = df_pages['Pages'].str.replace('ff','').str.strip().str.split('-').apply(pd.Series)
df_pages.sort_values(['ID','Start'], inplace=True)
df_pages.reset_index(drop=True, inplace=True)
df_pages.head(10)
ID | Pages | Start | End | |
---|---|---|---|---|
0 | C6573168 | ff 1-173 | 1 | 173 |
1 | C6573168 | ff 174-283 | 174 | 283 |
2 | C6573169 | ff 284-341 | 284 | 341 |
3 | C6573170 | ff 342-371 | 342 | 371 |
4 | C6573171 | ff 372-471 | 372 | 471 |
5 | C6573172 | ff 472-485 | 472 | 485 |
6 | C6573173 | ff 486-540 | 486 | 540 |
7 | C6573174 | ff 541-596 | 541 | 596 |
8 | C6573175 | ff 597-624 | 597 | 624 |
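Note that the Start and End columns are still strings at this point, so sorting on them is lexical rather than numeric. A sketch, on a couple of sample rows in the shape of df_pages, of casting them to integers and deriving a folio count per range:

```python
import pandas as pd

# Sample rows in the shape of df_pages; Start/End are strings after the split
df_pages = pd.DataFrame({'ID': ['C6573168', 'C6573168'],
                         'Pages': ['ff 1-173', 'ff 174-283'],
                         'Start': ['1', '174'],
                         'End': ['173', '283']})

df_pages[['Start', 'End']] = df_pages[['Start', 'End']].astype(int)
# Folio ranges are inclusive at both ends
df_pages['Folios'] = df_pages['End'] - df_pages['Start'] + 1
```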
When downloading a scanned collection from the National Archives, the scan associated with a reference (for example, HO 40/1) may be split into several separate PDF documents.
We can merge these into a single document, which makes working with the collection slightly easier from a programmatic point of view, albeit at the cost of slightly heavier memory requirements...
The following cell finds the filenames of all the PDFs I downloaded as part of the HO-40-1 download and sorts them.
from os import listdir
reference = 'HO-40-1'
pdfs = [f'../HO - Home Office/{f}' for f in listdir('../HO - Home Office') if f.startswith(reference)]
pdfs.sort()
pdfs[:3]
['../HO - Home Office/HO-40-1_01.pdf', '../HO - Home Office/HO-40-1_02.pdf', '../HO - Home Office/HO-40-1_03.pdf']
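The plain sort works here because the filename suffixes are zero-padded (`_01`, `_02`, ...). If they weren't, `_10` would sort before `_2` lexically, and a natural sort key would be needed; a sketch on hypothetical unpadded filenames:

```python
import re

def natural_key(s):
    """Split a filename into text and integer chunks for natural-order sorting."""
    return [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', s)]

# Hypothetical unpadded filenames that a plain sort would misorder
files = ['HO-40-1_10.pdf', 'HO-40-1_2.pdf', 'HO-40-1_1.pdf']
ordered = sorted(files, key=natural_key)
```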
We can then merge all these separate PDFs into a single PDF and save it as a new file:
#Note: in recent PyPDF2 versions, PdfFileMerger has been renamed PdfMerger
from PyPDF2 import PdfFileMerger
merger = PdfFileMerger()
for pdf in pdfs:
    merger.append(pdf)
#Save the merged PDF
merger.write(f"{reference}_result.pdf")
merger.close()
We can view a specified page within the merged PDF as an image, converted from the PDF using ImageMagick.
page_num = 500
#The wand package provides a Python API for the Imagemagick application
#!pip3 install --user Wand
from wand.image import Image as WImage
print(f'Displaying at PDF page {page_num}.')
WImage(filename=f'{reference}_result.pdf[{page_num}]',resolution=200)
Displaying at PDF page 500.