(Update:
(It's like 'ReadMe', for notebooks, innit. Hilarious.)
To achieve our aims we shall do the following:
That's nearly everything.
It isn't quite everything though. Several other files in the folder above also need modifying for this to work.
I'll note that down here or somewhere.
Let's get cracking.
Setup
import os
import numpy as np
from IPython.display import Image, SVG, HTML, display, clear_output
main_dir = '/media/sf_WINDOWS_D_DRIVE/Neurodebian/code/git_repos/jg_fork_of_kk_phd-thesis-template'
thesis_dir = '/media/sf_WINDOWS_D_DRIVE/Neurodebian/code/git_repos/DoctoralThesis'
nbc_dir = main_dir + '/JG_hacks/Nbconvertness'
do_real = True
chaps2do = [1,2,3,4,5,6,7,8]
do_mock = False#True
mockchaps2do = [1,2,3,4]
# (just in case we weren't there already...)
os.chdir(main_dir + '/JG_hacks')
Previously, I had been using IPython.display.Image to embed figures in the notebooks, and I really would rather do things that way. But it turns out that nbconvert uses a different function to embed those images than this template seems to expect. Markdown image embedding, however, seems to be handled the same way (with 'includegraphics').
So I reluctantly changed all my figures so that they are markdown.
An important additional step here was making sure they were the correct dimensions for A4 paper. Embedding images deals with this automatically, but markdown images don't. Also, this means that the images themselves now need to be copied over, and the appropriate paths set. This is done below.
Fortunately, it seems that the compile function in this template does a better job than nbconvert's '--post PDF' post-processor when the images don't fit. They still look wrong, but you can see what you need to fix in order to get them correct.
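One way to bring a figure down to a page-friendly size before it is copied over is ImageMagick's 'convert'. The sketch below only assumes that ImageMagick is installed and on the PATH; the helper names are hypothetical, and the '320x85^' geometry is simply the value used in the disabled cells further down. Those cells pass '-size', whereas '-resize' is the ImageMagick option that rescales an existing image, so that is what is used here.

```python
import subprocess


def build_resize_command(src, dst, geometry='320x85^'):
    """Return the argv list for an ImageMagick resize of src into dst."""
    # Passing a list (rather than one shell string) keeps paths with
    # spaces intact and avoids shell interpretation of the '^' flag.
    return ['convert', src, '-resize', geometry, dst]


def resize_png(src, dst, geometry='320x85^'):
    """Run the resize; raises CalledProcessError if convert fails."""
    subprocess.check_call(build_resize_command(src, dst, geometry))
```

Keeping the command builder separate from the call makes it easy to print the command and check it before touching any files.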
In the following:
os.system('rm -r ' + nbc_dir)
os.system('mkdir ' + nbc_dir)
os.system('mkdir %s/figures' %nbc_dir)
os.system('mkdir %s/figures_orig' %nbc_dir)
0
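The shell calls above work, but 'rm -r' on a path built by string concatenation is easy to get wrong. A minimal sketch of the same reset using only the standard library, assuming nothing in the target folder needs to survive ('reset_dir' is a hypothetical helper, not part of the template):

```python
import os
import shutil


def reset_dir(path, subdirs=('figures', 'figures_orig')):
    """Delete `path` if it exists, then recreate it with empty subfolders."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    for sub in subdirs:
        # makedirs also creates `path` itself on the first pass
        os.makedirs(os.path.join(path, sub))
    return path
```

With the variables defined in Setup this would be called as reset_dir(nbc_dir).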
os.chdir(nbc_dir)
# Note this basically deletes everything and starts from scratch!
if do_real:
    os.system('rm -r ' + main_dir + '/Chapter*')
    for c in chaps2do:
        os.system('mkdir ' + main_dir + '/Chapter%s' %c)
        os.system('mkdir ' + main_dir + '/Chapter%s/figures' %c)
        os.system('mkdir ' + main_dir + '/Chapter%s/figures_orig' %c)
        os.system('cp %s/Chapter%s/Chapter%s.ipynb %s/Chapter%s/' %(thesis_dir,c,c,main_dir,c) )
        os.system('cp %s/Chapter%s/figures/*.png %s/Chapter%s/figures/' %(thesis_dir,c,main_dir,c) )
        os.system('cp %s/Chapter%s/figures/*.png %s/Chapter%s/figures_orig/' %(thesis_dir,c,main_dir,c) )
        os.system('cp ' + main_dir + '/Chapter%s/figures/*.png %s/figures/' %(c,nbc_dir))
        os.system('cp ' + main_dir + '/Chapter%s/figures_orig/*.png %s/figures_orig/' %(c,nbc_dir))
    # also copy figures to a common dir in the root folder
    # ...because the chapter parsing was mixing up numbers a bit...
    # (also added a path in graphicspath to this folder..)
    os.system('cp -r %s/figures %s/ ' %(nbc_dir,main_dir))
0
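os.system only hands back the exit status of the last call (the 0 shown above), so a chapter with no PNGs, or a failed copy part-way through the loop, goes unnoticed. A small sketch of the copy step that reports what it actually copied; 'copy_pngs' is a hypothetical helper and the folder layout is assumed to match the one used above:

```python
import glob
import os
import shutil


def copy_pngs(src_dir, dst_dir):
    """Copy every .png in src_dir into dst_dir; return the copied file names."""
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    copied = []
    for src in sorted(glob.glob(os.path.join(src_dir, '*.png'))):
        shutil.copy(src, dst_dir)
        copied.append(os.path.basename(src))
    return copied
```

An empty return value for a chapter is then an immediate hint that its figures folder is missing or misnamed.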
It's useful to note that I am actually using the full 'nbconvert --to latex --post PDF' function here. So we do also get an nbconverted PDF. I then do some other things with the tex files to get a more fancy PDF that I can submit as-is.
import glob
# \newf = f.replace('/figures_orig/', '/figures/')
"""
if do_real:
for c in chaps2do:
for f in glob.glob('%s/Chapter%s/figures_orig/*.png' %(main_dir,c)):
#print f
os.system('convert -size 320x85^ %s %s' %(f,f.replace('/figures_orig/', '/figures/')))
"""
# skipping this for now; trying the latex command...
"""
fs= glob.glob(nbc_dir + '/figures/*.png')
for f in fs:
print 'applying imagemack convert to %s; size=320x85' %f.split('/figures/')[-1]
os.system('convert -size 320x85^ %s %s' %(f,f) ) #f.replace('/figures_orig/', '/figures/')))
""";
Make template
%%writefile jg_thesis_nbconvert_template.tplx
((= This line inherits from the built in template that you want to use. =))
((* extends 'report.tplx' *))
((* block date *))
\date{\today}
((* endblock date *))
((* block author *))
\author{John David Griffiths}
((* endblock author *))
((* block title *))
\title{The white matter disconnection syndrome in neurocognitive ageing}
((* endblock title *))
((* block packages *))
((( super() )))
\usepackage[round]{natbib}
\usepackage[doublespacing]{setspace}
\usepackage{parskip}
((* endblock packages *))
((* block commands *))
% Prevent overflowing lines due to hard-to-break entities
\sloppy
% Setup hyperref package
\hypersetup{
breaklinks=true, % so long urls are correctly broken across lines
hidelinks
}
% Slightly bigger margins than the latex defaults
\geometry{verbose,tmargin=1in,bmargin=1in,lmargin=1in,rmargin=1in}
%\parskip=2\baselineskip \advance\parskip by 0pt plus 20pt
\setlength{\parskip}{0pt} % 1ex plus 0.5ex minus 0.2ex}
\setlength{\parindent}{0pt}
((* endblock commands *))
((* block bibliography *))
\bibliographystyle{apalike}
\bibliography{Thesis}
((* endblock bibliography *))
% Disable input cells
((* block input_group *))
((* endblock input_group *))
((= This line selects the cell style. =))
((* set cell_style = 'style_bw_python.tplx' *))
% Define block headings
% Note: latex will only number headings that aren't starred
% (i.e. \subsection , but not \subsection* )
((* block h1 -*))
((* endblock h1 -*))
((* block h2 -*))\chapter((* endblock h2 -*))
((* block h3 -*))\section((* endblock h3 -*))
((* block h4 -*))\subsection((* endblock h4 -*))
((* block h5 -*))\subsubsection((* endblock h5 -*))
((* block h6 -*))\paragraph*((* endblock h6 -*))
((* block h7 -*))\subparagraph*((* endblock h7 -*))
Writing jg_thesis_nbconvert_template.tplx
Important note about section headings:
In order to get both markdown (notebook) and latex playing ball, I:
I am also, at the moment at least, not using headings for the abstract. So that's just bold font markdown.
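To make the heading shift concrete, here is the mapping from the template above written out as a lookup. This is only an illustration of what ends up in the .tex file, not how nbconvert does it internally; 'heading_to_latex' is a hypothetical helper. Note that the empty h1 block drops the command but not the braces, which is why a level-1 heading surfaces as a bare '{Chapter One}' that has to be parsed out later.

```python
import re

# Mirrors the h1..h6 blocks in the template; h1 is deliberately empty.
HEADING_TO_LATEX = {
    1: '',
    2: '\\chapter',
    3: '\\section',
    4: '\\subsection',
    5: '\\subsubsection',
    6: '\\paragraph*',
}


def heading_to_latex(markdown_line):
    """Translate one '#'-style markdown heading into its LaTeX form."""
    match = re.match(r'^(#{1,6})\s+(.*?)\s*$', markdown_line)
    if match is None:
        raise ValueError('not a markdown heading: %r' % markdown_line)
    level = len(match.group(1))
    return '%s{%s}' % (HEADING_TO_LATEX[level], match.group(2))
```

So '## A Disconnectionist Manifesto' in a notebook becomes a \chapter, and '### Structure of thesis' becomes a \section.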
"""
((* block h1 -*))\chapter((* endblock h1 -*))
((* block h2 -*))\section((* endblock h2 -*))
((* block h3 -*))\subsection((* endblock h3 -*))
((* block h4 -*))\subsubsection((* endblock h4 -*))
((* block h5 -*))\paragraph((* endblock h5 -*))
((* block h6 -*))\subparagraph((* endblock h6 -*))
""";
#%load jg_thesis_nbconvert_template.tplx
Copy over bibliography
os.system('cp /media/sf_WINDOWS_D_DRIVE/CloudStorage/Dropbox/Mendeley/Bibtex_sync/AspiraJohn/Thesis.bib %s/' %nbc_dir)
0
Also copy to the 'References' folder, and rename:
os.system('cp %s/Thesis.bib %s/References/references.bib' %(nbc_dir, main_dir))
0
Merge chapters
(Add notes about nbmerge? )
os.system('cp %s/Compile_Documents/jg_nbmerge.py %s ' %(thesis_dir, nbc_dir) )
0
if do_real:
    dirstr = ' '.join(['%s/Chapter%s/Chapter%s.ipynb ' %(main_dir,c,c) for c in chaps2do])
    !python $nbc_dir/jg_nbmerge.py $dirstr > $nbc_dir/jg_thesis_chapters.ipynb
if do_mock:
    dirstr = ' '.join(['%s/mock_Chapter%s/mock_Chapter%s.ipynb ' %(main_dir,c,c) for c in mockchaps2do])
    !python $nbc_dir/jg_nbmerge.py $dirstr > $nbc_dir/jg_mock_thesis_chapters.ipynb
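For reference, the idea behind merging is just concatenating cell lists. jg_nbmerge.py itself isn't shown here, so the following is not that script; it is a minimal sketch that assumes nbformat-4 notebooks, where cells live in a top-level 'cells' list (older v3 notebooks keep them inside 'worksheets' and would need different handling). 'merge_notebooks' is a hypothetical name.

```python
import json


def merge_notebooks(paths):
    """Concatenate the cells of several nbformat-4 notebooks, in order."""
    merged = None
    for path in paths:
        with open(path) as handle:
            nb = json.load(handle)
        if merged is None:
            merged = nb  # the first notebook supplies the metadata
        else:
            merged['cells'].extend(nb['cells'])
    return merged
```

The result can be written back out with json.dump to get a single notebook for nbconvert.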
Run nbconvert
if do_real:
    !ipython nbconvert --to latex --post PDF --template jg_thesis_nbconvert_template.tplx jg_thesis_chapters.ipynb
    clear_output()
    print('nbconvert finished.')
nbconvert finished.
"""
%((* block h1 -*))\chapter((* endblock h1 -*))
%((* block h2 -*))\section((* endblock h2 -*))
%((* block h3 -*))\subsection((* endblock h3 -*))
%((* block h4 -*))\subsubsection((* endblock h4 -*))
%((* block h5 -*))\paragraph((* endblock h5 -*))
%((* block h6 -*))\subparagraph((* endblock h6 -*))
""";
if do_mock:
    !ipython nbconvert --to latex --post PDF --template jg_thesis_nbconvert_template.tplx jg_mock_thesis_chapters.ipynb
    clear_output()
    print('nbconvert finished. ')
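One thing to watch with the cells above: clear_output() wipes any error messages, so 'nbconvert finished.' is printed whether or not the conversion worked. A sketch that checks the exit status instead, using exactly the flags from the cells above (newer nbconvert releases may use a different command line, so treat this as tied to the version used here; the function names are hypothetical):

```python
import subprocess


def nbconvert_command(notebook, template):
    """Build the nbconvert call used in this notebook as an argv list."""
    return ['ipython', 'nbconvert', '--to', 'latex', '--post', 'PDF',
            '--template', template, notebook]


def run_nbconvert(notebook, template):
    """Run nbconvert and fail loudly on a non-zero exit status."""
    status = subprocess.call(nbconvert_command(notebook, template))
    if status != 0:
        raise RuntimeError('nbconvert failed for %s (exit %s)' % (notebook, status))
```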
The .tex file from nbconvert includes \documentclass, \usepackage, etc. stuff that I don't need.
All I need are the chapters.
Need to do the following:
addit = '\n\n \chapter \n\n \graphicspath{{Chapter%s/figures/}} \n\n'
mock_addit = '\n\n \chapter \n\n \graphicspath{{mock_Chapter%s/figures/}} \n\n'
Note: I think actually the ignore-heading-1 thing you're doing here isn't working. It's ending up at the end of the previous chapter.
Might be better to have a clearer loop for the parsing, with separate lines for each bit. That would be clearer and more extensible.
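A rough sketch of what that clearer loop could look like, with one small function per step. Everything here is an assumption layered on the approach used in this notebook: the function names are made up, the title is assumed to contain no nested braces (the split is on the first '}'), and the '{Chapter One}' markers are stripped from the whole document before splitting, which is one possible way around the marker landing at the end of the previous chapter.

```python
NUMWORDS = {1: 'One', 2: 'Two', 3: 'Three', 4: 'Four',
            5: 'Five', 6: 'Six', 7: 'Seven', 8: 'Eight'}


def strip_chapter_markers(tex):
    """Remove every bare '{Chapter <Word>}' left behind by the empty h1 block."""
    for word in NUMWORDS.values():
        tex = tex.replace('{Chapter ' + word + '}', '')
    return tex


def split_chapters(tex):
    """Return one chunk per chapter command; text before the first is dropped."""
    return ['\\chapter' + part for part in tex.split('\\chapter')[1:]]


def rewrite_captions(chunk):
    """Turn {SHORTCAPTION= a LONGCAPTION= b} into [ a ]{ b}."""
    return chunk.replace('{SHORTCAPTION=', '[').replace('LONGCAPTION=', ']{')


def add_graphicspath(chunk, folder):
    """Insert a graphicspath line right after the chapter title."""
    head, brace, rest = chunk.partition('}')
    return head + brace + '\n\n\\graphicspath{{},{' + folder + '/}}\n\n' + rest


def parse_chapters(tex, chapter_numbers, prefix='Chapter'):
    """Apply each step in turn and return one parsed chunk per chapter."""
    chunks = split_chapters(strip_chapter_markers(tex))
    parsed = []
    for number, chunk in zip(chapter_numbers, chunks):
        chunk = rewrite_captions(chunk)
        chunk = add_graphicspath(chunk, '%s%s' % (prefix, number))
        parsed.append(chunk)
    return parsed
```

The mock chapters would then only differ in the arguments, e.g. parse_chapters(thetex, mockchaps2do, prefix='mock_Chapter').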
My chapters
numwords = {1:'One', 2:'Two', 3:'Three',4:'Four', 5:'Five', 6:'Six', 7: 'Seven', 8: 'Eight'}
if do_real:
    #mock_addit = '\n\n \chapter%s} \n\n \graphicspath{{mock_Chapter%s/figures/}} \n\n %s'
    #addit = '\n\n \chapter%s} \n\n \graphicspath{{Chapter%s/}} \n\n %s'
    thetex = open(nbc_dir + '/jg_thesis_chapters.tex', 'r').read()
    chapchunks = ['\\chapter' + c for c in thetex.split('\\chapter')][1:]
    parsedchunks = []
    for c_it, c in enumerate(chaps2do):
        ch = chapchunks[c_it]
        # This is a bit clumsy, but haven't found a better solution yet. I want to be able to write 'Chapter 1', etc.
        # in the notebook, but latex adds that in for me according to \chapter sections.
        # So solution = I use word numbers, and parse out that text here
        # The text to replace will look like {Chapter One}
        ch = ch.replace('{Chapter ' + numwords[c] + '}', '')
        # Caption stuff:
        # This is the solution I came up with to give me the short caption / long caption and label format used
        # in this latex template. In the markdown cells with figures in, the latex captions are written as
        # ![SHORTCAPTION= blah blah blah LONGCAPTION= blah blah blah LABEL= blah blah blah ]
        ch = ch.replace('{SHORTCAPTION=', '[')\
               .replace('LONGCAPTION=',']{')
        # 'LABEL' - NOT DOING THIS YET
        #c = ''.join(cc[0] + '\\label{fig:' + cc.split('}')[0] + '}' for cc in c.split('LABEL='))
        # The [H] is supposed to make the image and caption stick together as a block, or something like that
        # REMOVED - IS THIS STICKING ALL THE FIGS AT THE END OF EACH CHAPTER??
        #c = c.replace('[htbp]', '[H]')
        # Add in a subtitle; need to replace {title} with [title]{subtitle}
        # (that's not quite right. Not doing this.)
        # Add graphics path
        ch = ch.split('}')[0] + \
             '} \n\n \\graphicspath{{},{Chapter' + str(c) + '/}} \n\n ' + \
             '}'.join(ch.split('}')[1:])
        # add a resize figures command - COMMENTED OUT. SEEMS TO MESS UP COMPILATION.
        #c = c.replace('includegraphics{', 'includegraphics[width=0.7\\textwidth]{' )
        parsedchunks.append(ch)
        with open(main_dir + '/Chapter%s/chapter%s.tex' %(c,c), 'w') as f:
            f.write(ch)
# Could add: something for getting chapter short and long titles...
"""
#parsedtex = [addit %(s.split('}')[0],s_it,'}'.join(s.split('}')[1:]))\
# for s_it, s in enumerate(''.join(thetex).split('\chapter'))][1:]
#parsedtex_mod = [p.replace('{SHORTCAPTION=', '[').replace('LONGCAPTION=',']{').replace('[htbp]', '[H]') for p in parsedtex]
#parsedtex = [mock_addit %s_it + s for s_it, s in enumerate(''.join(thetex).split('\chapter'))][1:]
#for p_it, p in enumerate(parsedtex): open(main_dir + 'Chapter%s/chapter%s.tex' %(p_it+1, p_it+1), 'w').writelines(p)
#for c,p in zip(chaps2do, parsedtex_mod): open(main_dir + '/Chapter%s/chapter%s.tex' %(c,c), 'w').writelines(p)
""";
"""
if do_real:
#mock_addit = '\n\n \chapter%s} \n\n \graphicspath{{mock_Chapter%s/figures/}} \n\n %s'
addit = '\n\n \chapter%s} \n\n \graphicspath{{Chapter%s/}} \n\n %s'
thetex = open(nbc_dir + '/jg_thesis_chapters.tex', 'r').read()
parsedtex = [addit %(s.split('}')[0],s_it,'}'.join(s.split('}')[1:]))\
for s_it, s in enumerate(''.join(thetex).split('\chapter'))][1:]
parsedtex_mod = [p.replace('{SHORTCAPTION=', '[').replace('LONGCAPTION=',']{').replace('[htbp]', '[H]') for p in parsedtex]
#parsedtex = [mock_addit %s_it + s for s_it, s in enumerate(''.join(thetex).split('\chapter'))][1:]
#for p_it, p in enumerate(parsedtex): open(main_dir + 'Chapter%s/chapter%s.tex' %(p_it+1, p_it+1), 'w').writelines(p)
for c,p in zip(chaps2do, parsedtex_mod): open(main_dir + '/Chapter%s/chapter%s.tex' %(c,c), 'w').writelines(p)
""";
Mock chapters
if do_mock:
    #mock_addit = '\n\n \chapter%s} \n\n \graphicspath{{mock_Chapter%s/figures/}} \n\n %s'
    #addit = '\n\n \chapter%s} \n\n \graphicspath{{Chapter%s/}} \n\n %s'
    thetex = open(nbc_dir + '/jg_mock_thesis_chapters.tex', 'r').read()
    chapchunks = ['\\chapter' + c for c in thetex.split('\\chapter')][1:]
    parsedchunks = []
    for c_it, c in enumerate(mockchaps2do):
        ch = chapchunks[c_it]
        # This is a bit clumsy, but haven't found a better solution yet. I want to be able to write 'Chapter 1', etc.
        # in the notebook, but latex adds that in for me according to \chapter sections.
        # So solution = I use word numbers, and parse out that text here
        # The text to replace will look like {Chapter One}
        ch = ch.replace('{Chapter ' + numwords[c] + '}', '')
        # Caption stuff:
        # This is the solution I came up with to give me the short caption / long caption and label format used
        # in this latex template. In the markdown cells with figures in, the latex captions are written as
        # ![SHORTCAPTION= blah blah blah LONGCAPTION= blah blah blah LABEL= blah blah blah ]
        ch = ch.replace('{SHORTCAPTION=', '[')\
               .replace('LONGCAPTION=',']{')
        # 'LABEL' - NOT DOING THIS YET
        #c = ''.join(cc[0] + '\\label{fig:' + cc.split('}')[0] + '}' for cc in c.split('LABEL='))
        # The [H] is supposed to make the image and caption stick together as a block, or something like that
        # REMOVED - IS THIS STICKING ALL THE FIGS AT THE END OF EACH CHAPTER??
        #c = c.replace('[htbp]', '[H]')
        # Add in a subtitle; need to replace {title} with [title]{subtitle}
        # (that's not quite right. Not doing this.)
        # Add graphics path
        ch = ch.split('}')[0] + \
             '} \n\n \\graphicspath{{},{mock_Chapter' + str(c) + '/}} \n\n ' + \
             '}'.join(ch.split('}')[1:])
        # add a resize figures command - COMMENTED OUT. SEEMS TO MESS UP COMPILATION.
        #c = c.replace('includegraphics{', 'includegraphics[width=0.7\\textwidth]{' )
        parsedchunks.append(ch)
        with open(main_dir + '/mock_Chapter%s/mock_chapter%s.tex' %(c,c), 'w') as f:
            f.write(ch)
# Could add: something for getting chapter short and long titles...
"""
#parsedtex = [addit %(s.split('}')[0],s_it,'}'.join(s.split('}')[1:]))\
# for s_it, s in enumerate(''.join(thetex).split('\chapter'))][1:]
#parsedtex_mod = [p.replace('{SHORTCAPTION=', '[').replace('LONGCAPTION=',']{').replace('[htbp]', '[H]') for p in parsedtex]
#parsedtex = [mock_addit %s_it + s for s_it, s in enumerate(''.join(thetex).split('\chapter'))][1:]
#for p_it, p in enumerate(parsedtex): open(main_dir + 'Chapter%s/chapter%s.tex' %(p_it+1, p_it+1), 'w').writelines(p)
#for c,p in zip(chaps2do, parsedtex_mod): open(main_dir + '/Chapter%s/chapter%s.tex' %(c,c), 'w').writelines(p)
""";
"""
if do_mock:
#mock_addit = '\n\n \chapter%s} \n\n \graphicspath{{mock_Chapter%s/figures/}} \n\n %s'
mock_addit = '\n\n \chapter%s} \n\n \graphicspath{{mock_Chapter%s/}} \n\n %s'
thetex = open(nbc_dir + '/jg_mock_thesis_chapters.tex', 'r').read()
parsedtex = [mock_addit %(s.split('}')[0],s_it,'}'.join(s.split('}')[1:]))\
for s_it, s in enumerate(''.join(thetex).split('\chapter'))][1:]
#parsedtex = [mock_addit %s_it + s for s_it, s in enumerate(''.join(thetex).split('\chapter'))][1:]
#for p_it, p in enumerate(parsedtex): open(main_dir + 'Chapter%s/chapter%s.tex' %(p_it+1, p_it+1), 'w').writelines(p)
parsedtex_mod = [p.replace('{SHORTCAPTION=', '[').replace('LONGCAPTION=',']{').replace('[htbp]', '[H]') for p in parsedtex]
for c,p in zip(mockchaps2do, parsedtex_mod): open(main_dir + '/mock_Chapter%s/mock_chapter%s.tex' %(c,c), 'w').writelines(p)
""";

...which doesn't appear to be achievable via conventional markdown syntax. It's important for the thesis compilation because the 'list of figures' will include the full caption if there isn't a square-brackets one next to it.
See the 'mock_Chapter1' notebook for examples
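As a sketch of where this convention could go (including the LABEL part, which the parsing cells don't handle yet), the three markers can be pulled out of the image alt text with a regular expression. This is an illustration under assumptions, not what the notebook currently does: 'caption_to_latex' is a hypothetical helper, the markers are assumed to appear in the order SHORTCAPTION, LONGCAPTION, LABEL, and the 'fig:' label prefix is borrowed from the commented-out line in the parsing cells.

```python
import re

CAPTION_RE = re.compile(
    r'SHORTCAPTION=(?P<short>.*?)LONGCAPTION=(?P<long>.*?)'
    r'(?:LABEL=(?P<label>.*?))?$',
    re.DOTALL)


def caption_to_latex(alt_text):
    """Build a LaTeX caption (and optional label) from marked-up alt text."""
    text = alt_text.strip()
    match = CAPTION_RE.match(text)
    if match is None:
        # No markers: fall back to a plain single-part caption.
        return '\\caption{%s}' % text
    out = '\\caption[%s]{%s}' % (match.group('short').strip(),
                                 match.group('long').strip())
    label = match.group('label')
    if label and label.strip():
        out += '\\label{fig:%s}' % label.strip()
    return out
```

The square-bracket part is what the 'list of figures' picks up, so the short caption ends up there and the long one under the figure.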
os.chdir(main_dir)
if do_real:
    !sh compile-thesis.sh compile thesis
    !okular thesis.pdf
if do_mock:
    !sh compile-thesis.sh compile mock_thesis
    !okular mock_thesis.pdf
print('finished!')
finished!
THE LIMIT FOR THESES IS 60,000 WORDS.
(Including intro and headers etc??)
Apparently this is really difficult to do.
Discussion of this here
import numpy as np
import pandas as pd
I'm opting for 'texcount'. It gives the following kind of output:
os.chdir(main_dir)
!texcount Chapter1/chapter1.tex
File: Chapter1/chapter1.tex
Encoding: ascii
Words in text: 433
Words in headers: 19
Words in float captions: 6
Number of headers: 5
Number of floats: 2
Number of math inlines: 0
Number of math displayed: 0
Subcounts:
  text+headers+captions (#headers/#floats/#inlines/#displayed)
  132+3+0 (1/0/0/0) Chapter: A Disconnectionist Manifesto.
  77+5+0 (1/0/0/0) Section: Ageing as a disconnection syndrome
  101+4+0 (1/0/0/0) Section: Conduction delays: a hypothesis
  100+4+6 (1/2/0/0) Section: Schemas for disconnection / etc.
  23+3+0 (1/0/0/0) Section: Structure of thesis
Doing this for all chapters:
res1 = !texcount Chapter1/chapter1.tex
res2 = !texcount Chapter2/chapter2.tex
res3 = !texcount Chapter3/chapter3.tex
res4 = !texcount Chapter4/chapter4.tex
res5 = !texcount Chapter5/chapter5.tex
res6 = !texcount Chapter6/chapter6.tex
res7 = !texcount Chapter7/chapter7.tex
runsumlist = [] ; matchme = 'Words in text'
for r_it,r in enumerate([res1,res2,res3,res4,res5,res6,res7]):
    for rr in r:
        if matchme in rr:
            print('\nChapter %s: %s' %(r_it+1,rr))
            runsumlist.append(rr.split(' ')[-1])
print('\ntotal words: %1.0f' %pd.DataFrame(runsumlist).astype('int').sum().values[0])
Chapter 1: Words in text: 433
Chapter 2: Words in text: 5499
Chapter 3: Words in text: 8286
Chapter 4: Words in text: 8229
Chapter 5: Words in text: 5928
Chapter 6: Words in text: 3050
Chapter 7: Words in text: 3441
total words: 34866
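The same tally can be done without the IPython '!' capture or pandas. A minimal sketch, assuming texcount is on the PATH and prints a 'Words in text: N' line as in the output shown earlier; the function names are hypothetical:

```python
import re
import subprocess

WORDS_RE = re.compile(r'Words in text:\s*(\d+)')


def words_in_text(texcount_output):
    """Pull the 'Words in text' figure out of one texcount report."""
    match = WORDS_RE.search(texcount_output)
    if match is None:
        raise ValueError('no "Words in text" line found')
    return int(match.group(1))


def count_chapter(tex_path):
    """Run texcount on one file and return its text word count."""
    output = subprocess.check_output(['texcount', tex_path])
    return words_in_text(output.decode('utf-8', 'replace'))
```

The thesis total is then sum(count_chapter('Chapter%s/chapter%s.tex' % (c, c)) for c in chaps2do), which also avoids writing out one res variable per chapter.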
Alternative, using pdftotext, counts more words.
# Alternative using pdftotext:
!pdftotext thesis.pdf - | wc -w
43323
This will include page numbers etc., so the higher count is to be expected. It may also behave unpredictably with equations, so it wouldn't be expected to be bang on. But it helps complement the other method. It seems to be favoured along with texcount in the post listed above.
Another alternative, which is clearly wrong (perhaps it needs different .dvi files for the whole thesis or for each chapter??):
!catdvi thesis.dvi | wc -w
3250
"""
#http://stackoverflow.com/questions/19575702/pythonhow-to-split-file-into-chunks-by-the-occurrence-of-the-header-word
fname = 'jg_thesis_chapters.tex'
#main_dir = '../newf = 'kk_latex_template/jg_Chapter2/jg_parsed_chapter_'
token = '\chapter'
chunks = []
current_chunk = []
for line in open(fname):
#if line.startswith(token) and current_chunk:
if token in line and current_chunk:
# if line starts with token and the current chunk is not empty
chunks.append(current_chunk[:]) # add not empty chunk to chunks
current_chunk = [] # make current chunk blank
# just append a line to the current chunk on each iteration
current_chunk.append(line)
chunks.append(current_chunk) # append the last chunk outside the loop
#So having file with contents:
for c_it, c in enumerate(chunks[1:]):
f = open('%s_%s.tex' %(newf,c_it+1), 'w')
crep = [cc.replace('.png', '') for cc in c]
f.writelines(crep)#c)
f.close()
#%load newtexfile.tex
""";