Gino Kalkman
PhD researcher at Eep Talstra Centre for Bible and Computing
Faculty of Theology
VU University
This program serves two major functions. First, it has provided its developer with some initial experience in the use of the LAF-Fabric workbench, which is being developed by Dirk Roorda and is expected to play a significant role in future research conducted by the Eep Talstra Centre for Bible and Computing, of which the program's developer is a member. Second, the specific queries conducted in the program provide new insights into the object of the developer's PhD research, i.e. the functions of the verbal forms in Biblical Hebrew poetry.
The main assumption in this research project is that the meaning of the Biblical Hebrew verbal forms is mainly determined by the clause patterns in which they occur. Thus, verbal forms in a daughter clause may adopt volitive and non-volitive aspects of meaning from verbal forms in their mother clauses. Furthermore, the verbal forms essentially fulfill a function at the discourse level, at which they denote shifts in type (narrative vs. discursive) and level (mainline vs. background) of communication.
The current program investigates the differences between the Biblical Hebrew genres (prose, poetry and prophecy) in their use of these clause patterns. Since the number of different types of clause patterns is quite high, we decided to focus on the patterns that involve an asyndetic relation between two clauses. In our database, the type of clause pattern is annotated by a three-digit label, which is attached to both the mother clause and the daughter clause. The asyndetic clause connections are assigned labels from 100 up to 168. The first digit in the offset from 100 is the tense of the verbal predicate of the daughter clause atom, the second that of the mother clause atom. The digit-representations of the Hebrew tenses are as follows:
0 None
1 Imperfect --> yiqtol
2 Perfect --> qatal
3 Imperative
4 Infinitive Construct
5 Infinitive Absolute
6 Participle
7 Imperfect Consecutive --> wayyiqtol
8 Imperfect Copulative --> weyiqtol
Since the final two tenses (7 and 8) cannot be used in asyndetic clause connections - the waw being the Hebrew copulative - the first digit in the offset from 100 will never be higher than 6.
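To make this encoding concrete, the following sketch decodes a three-digit label into the tense pair it represents. The decode_label helper is our own illustration and is not part of the notebook's pipeline; the actual mapping is done by the map_label() function further below.

```python
# The 'tenses' mapping from the list above; 'decode_label' is our own
# illustrative helper, not part of the notebook's pipeline.
tenses = {0: "nominal", 1: "yiqtol", 2: "qatal", 3: "imperative",
          4: "infinConstr", 5: "infinAbsol", 6: "participle",
          7: "wayyiqtol", 8: "weyiqtol"}

def decode_label(label):
    """Decode an asyndetic clause-pattern label (100-168) into 'mother -> daughter'."""
    s = str(label)
    assert 100 <= int(s) <= 168, "not an asyndetic clause connection"
    daughter = tenses[int(s[1])]   # first digit of the offset from 100
    mother = tenses[int(s[2])]     # second digit of the offset from 100
    return mother + " -> " + daughter

print(decode_label(122))   # qatal -> qatal
print(decode_label(121))   # yiqtol -> qatal
```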
The program calculates, for each of the possible patterns in the range 100-168, the number of occurrences in the books and chapters that are included in the three categories of prosaic, poetic and prophetic texts. Since the total numbers of patterns differ considerably between the genres - for instance, a large part of the Hebrew poetry (e.g. the Psalms) has not yet been analyzed in the dataset currently used by LAF-Fabric - the program also calculates for each clause pattern its relative value, i.e. the portion of the total number of clause patterns in a given genre that is represented by that specific clause pattern.
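As an arithmetic illustration of this relative value, take the qatal -> qatal pattern with the per-genre counts and totals that appear in the results table at the end of this notebook:

```python
# Counts of the qatal -> qatal pattern per genre and the genre totals
# (taken from the results table in this notebook); the relative value is
# simply the pattern's share of all asyndetic patterns counted for that genre.
counts = {"prose": 106, "poetry": 23, "prophecy": 335}
totals = {"prose": 2677, "poetry": 221, "prophecy": 3478}
shares = {g: round(counts[g] / totals[g] * 100, 2) for g in counts}
print(shares)   # prose: 3.96, poetry: 10.41, prophecy: 9.63
```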
In the next cell the required libraries are imported and a task processor is created. The LafFabric class in laf.fabric implements a LAF-Fabric task processor.
import sys
import collections
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()
0.00s This is LAF-Fabric 4.2.8 http://laf-fabric.readthedocs.org/texts/API-reference.html
In the next cell we load the data needed by the processor. In our program, we will need a few features of nodes. Since this program aims to extract information about the clause atom relation values of a clause atom, we will load the clause_atom_relation features of all clause atoms. These features label the relationship between a clause atom and its mother. Next to that, we need to keep track of the Biblical books and the chapter numbers in order to be able to correctly categorize the clause patterns into the three genres of prose, poetry and prophecy.
The load() function draws that data in. It needs to know the name of the LAF source ('bhs3'). The '--' argument makes explicit that we do not load an extra annotation package.
The string 'ClauseFunctions' represents the name we give to this task. This name determines where on the developer's filesystem the log file and the output file will be put.
fabric.load('bhs3', '--', 'ClauseFunctions', {
    "xmlids": {
        "node": False,
        "edge": False,
    },
    "features": {
        "shebanq": {
            "node": [
                "db.otype",
                "ft.clause_atom_relation",
                "sft.chapter,book",
            ],
            "edge": [
            ],
        },
    },
})
exec(fabric.localnames.format(var='fabric'))
0.00s LOADING API: please wait ... 0.01s INFO: DATA COMPILED AT: 2014-04-18T18-24-37 2.57s LOGFILE=C:\Users\Gino/laf-fabric-data/bhs3/tasks/ClauseFunctions/__log__ClauseFunctions.txt 2.57s INFO: DATA LOADED FROM SOURCE bhs3 AND ANNOX -- FOR TASK ClauseFunctions
In the cell below some global variables are initialized. It should be noted that only variables whose values will not be modified or updated by the program are made global here. This concerns several lists and dictionaries of items that will be used for checking if a book (and chapter) belongs to the corpus we want to analyze and for determining the genre of a book or chapter.
#first create a list containing the names of all books that will be investigated
books = ["Genesis", "Exodus", "Deuteronomy", "Judges", "I_Samuel", "Isaiah", "Jeremiah", "Lamentations"]
#create lists for each of the genres containing the books that belong to that specific genre
#note that for the genre of poetry, the list consists not of booknames, but of dictionaries containing combinations of booknames (keys) and
#the specific chapters in those books that are poetic (values). Our implementation guarantees that the other chapters of these books -
#except for the book of Lamentations - are included in the prosaic corpus (note the overlap in booknames)
prose = ["Genesis", "Exodus", "Deuteronomy", "Judges", "I_Samuel"]
poetry = {"Exodus":["15"], "Deuteronomy":["32","33"], "Judges":["5"], "I_Samuel":["22"], "Lamentations":["3"]}
prophecy = ["Isaiah", "Jeremiah"]
#create a dictionary used for mapping the numerical values of tenses to their string equivalents (cf. 0. Introduction).
tenses = {0: "nominal", 1: "yiqtol", 2: "qatal", 3: "imperative", 4: "infinConstr", 5: "infinAbsol", 6: "participle", 7: "wayyiqtol", 8: "weyiqtol"}
#a library function is used to create or reopen the output file 'table.csv' and make it ready for being written to.
out = outfile('table.csv')
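The order in which these collections are consulted matters: poetic chapters must be checked before the prose list, otherwise a book like Exodus would send its poetic chapter 15 to the prosaic corpus. The toy sketch below (our own illustration, separate from the notebook's pipeline; genre_of mirrors the set_genre() function defined further on, and the chapter values are compared as strings) makes that resolution order explicit:

```python
# Toy illustration of the genre-resolution order; 'genre_of' is our own
# hypothetical helper mirroring the notebook's set_genre() function.
prose = ["Genesis", "Exodus", "Deuteronomy", "Judges", "I_Samuel"]
poetry = {"Exodus": ["15"], "Deuteronomy": ["32", "33"], "Judges": ["5"],
          "I_Samuel": ["22"], "Lamentations": ["3"]}

def genre_of(bookname, chapter):
    # poetic chapters take precedence over a shared bookname in the prose list
    if bookname in poetry and chapter in poetry[bookname]:
        return "poetry"
    elif bookname in prose:
        return "prose"
    # everything else in our corpus (Isaiah, Jeremiah) is prophecy
    return "prophecy"

print(genre_of("Exodus", "15"))   # poetry
print(genre_of("Exodus", "14"))   # prose
print(genre_of("Isaiah", "40"))   # prophecy
```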
The next cell contains the main part of our program. The program starts by calling the function perform_task(), which defines several local variables pointing to modifiable lists and dictionaries. These lists and dictionaries will store and update the number of clause atom relations for each genre and the number of occurrences of each type of clause atom relation per genre.
The perform_task() function then calls the iterate_through_nodes() function, which, in turn, defines the local variables 'bookname' and 'chapter', which are updated in that function and used to assign the correct value to the variable 'genre'.
After all nodes have been iterated through and the results of the queries have been stored, the perform_task() function finishes by calling the write_to_output() function, which writes the result data to an output file.
#write the total numbers of the clause atom relation types for each of the genres to the output file
def print_totals(prose, poetry, prophecy):
    out.write("{},{},{},{},{},{},{}\n".format("TOTALS", prose, 100, poetry, 100, prophecy, 100))
#Maps digits in numerical label of clause atom relation type to corresponding values of the verb forms in the mother and daughter clause
#by using the dictionary 'tenses'. The first digit in the offset from 100 is the tense of the verbal predicate of the daughter clause atom,
#the second that of the mother clause atom (cf. 0. 'Introduction')
#Subsequently, combine the string values of the verb forms into a single string representation of the clause atom relation as a whole.
def map_label(pattern):
    #retrieve the string representation of the tense in the daughter clause
    tense_daughter = tenses[int(pattern[1])]
    #retrieve the string representation of the tense in the mother clause
    tense_mother = tenses[int(pattern[2])]
    #return a combination of the string representations of the tense values of the mother and the daughter clause
    return (tense_mother + " -> " + tense_daughter)
#write the number of occurrences of a clause atom relation type (pattern) in a genre to the output file
def print_number(out, pattern, car_genre, total_genre):
    #if the clause atom relation type occurs in this specific genre, retrieve the number of occurrences and write it to the output file
    if pattern in car_genre:
        out.write("{:10},".format(car_genre[pattern]))
        #calculate the relative portion of clause atom relations in this genre represented by this specific clause atom relation type and
        #write the result to the output file
        out.write("{:10}".format(car_genre[pattern] / total_genre * 100))
    #if the clause atom relation type does not occur in this specific genre, write an absolute value of '-' and a relative value of 0.0 to
    #the output file
    else:
        out.write("{:10},".format("-"))
        out.write("{:10}".format("0.0"))
#the results of our queries are written to the output file 'table.csv'
def write_to_output(out, car_prose, car_poetry, car_prophecy, car_all, totals):
    #write the total numbers of the clause atom relation types for each of the genres to the output file
    print_totals(totals["prose"], totals["poetry"], totals["prophecy"])
    #iterate through all clause atom relation types in the sorted list and for each of these types write the number of its occurrences in
    #each of the genres to the output file
    for pattern in car_all:
        #map the three-digit clause atom relation label to a string representation
        pattern_string = map_label(pattern)
        #write the string representation of the clause atom relation type (i.e.: the clause pattern) to the output file
        out.write("{},".format(pattern_string))
        #write the number of occurrences of this clause atom relation type (clause pattern) in our prosaic corpus to the output file
        print_number(out, pattern, car_prose, totals["prose"])
        out.write(",")
        #write the number of occurrences of this clause atom relation type (clause pattern) in our poetic corpus to the output file
        print_number(out, pattern, car_poetry, totals["poetry"])
        out.write(",")
        #write the number of occurrences of this clause atom relation type (clause pattern) in our prophetic corpus to the output file
        print_number(out, pattern, car_prophecy, totals["prophecy"])
        out.write("\n")
#test the current genre
def set_genre(bookname, chapter):
    #remember that only the poetry collection has key-value pairs as elements
    if ((bookname in poetry.keys()) and (chapter in poetry[bookname])):
        return "poetry"
    #booknames shared by prose and poetry will not cause problems, since the poetic chapters have been filtered out at this point
    elif (bookname in prose):
        return "prose"
    #our implementation guarantees that all clause atom relations that have not been categorized at this point belong to our corpus of
    #prophetic books
    else:
        return "prophecy"
#a clause atom relation is added to the dictionary of the correct genre. If the clause atom relation type (key) has already been
#encountered in that genre (i.e.: the key already exists), the value of that key will be simply raised by 1; if not, a new key is added
#to the dictionary and its value, having a default value of 0, will be set to 1.
def add_car_to_dictionary(car, car_genre):
    car_genre[car] += 1
#handle a new clause atom relation label
def analyze_new_car(car, car_prose, car_poetry, car_prophecy, car_total, totals, bookname, chapter):
    #only continue if the clause atom relation is an asyndetic one (i.e.: has a three-digit value between 100 and 168; see 0. 'Introduction')
    if 100 <= int(car) <= 168:
        #if the clause atom relation type has not been encountered before, add its label to the list of clause pattern types
        #attested in our corpus
        if car not in car_total:
            car_total.append(car)
        #determine the genre to which the current clause atom relation belongs by testing whether the current bookname (and chapter)
        #is listed in one of the genre collections created in the previous cell
        genre = set_genre(bookname, chapter)
        #include the clause atom relation in the correct genre and adjust the total number of clause atom relations counted for that
        #genre; note that we use only one function for adding a clause atom relation to the correct dictionary
        if (genre == "prose"):
            add_car_to_dictionary(car, car_prose)
            totals["prose"] += 1
        elif (genre == "poetry"):
            add_car_to_dictionary(car, car_poetry)
            totals["poetry"] += 1
        else:
            add_car_to_dictionary(car, car_prophecy)
            totals["prophecy"] += 1
def iterate_through_nodes(car_prose, car_poetry, car_prophecy, car_total, totals):
    #bookname is initially set to None and will be reset every time the program encounters a node having the object type 'book'
    bookname = None
    #chapter is initially set to None and will be reset every time the program encounters a node having the object type 'chapter'
    chapter = None
    #for each new node, the program tests the node's object type and calls the corresponding functions
    for i in NN():
        otype = F.shebanq_db_otype.v(i)
        #test if the node's object type is 'clause_atom' and only continue if the current book (+ chapter) is included in our corpus
        #of prosaic, poetic and prophetic texts by searching for its presence in the list 'books'. Note that for the book of Lamentations
        #we only analyze the third chapter (since the other chapters have not yet been analyzed in the current dataset); the chapter
        #feature values are strings, so we compare against "3" rather than the integer 3
        if ((otype == "clause_atom") and (bookname in books) and not (bookname == "Lamentations" and chapter != "3")):
            #retrieve the three-digit label of the current clause atom relation
            car = F.shebanq_ft_clause_atom_relation.v(i)
            #handle the current clause atom relation
            analyze_new_car(car, car_prose, car_poetry, car_prophecy, car_total, totals, bookname, chapter)
        #if the node's object type is 'book', the bookname is reset to the value of the current node. Note that since a 'book node'
        #always precedes the 'clause atom nodes' in that book, the bookname will always be set correctly. Thus, its initial value None
        #will always have been reset before the program encounters a clause atom relation for the first time
        elif otype == "book":
            bookname = F.shebanq_sft_book.v(i)
            #as an intermediate test, print all booknames that are encountered to the standard error stream (see pink block below this cell)
            sys.stderr.write("{} ".format(bookname))
        #if the node's object type is 'chapter', the chapter number is reset to the value of the current node. Note that since a
        #'chapter node' always precedes the 'clause atom nodes' in that chapter, the chapter number will always be set correctly. Thus,
        #its initial value None will always have been reset before the program encounters a clause atom relation for the first time
        elif otype == "chapter":
            chapter = F.shebanq_sft_chapter.v(i)
def perform_task():
    #create 3 dictionaries for prose, poetry and prophecy. These dictionaries will contain combinations of a clause pattern type (key) and the
    #number of occurrences of that clause pattern type (value). Every new key (clause pattern type) will be assigned a default value (number of
    #occurrences) of 0 by using the lambda function
    car_prose = collections.defaultdict(lambda: 0)
    car_poetry = collections.defaultdict(lambda: 0)
    car_prophecy = collections.defaultdict(lambda: 0)
    #create an empty list that will contain all clause pattern types found in the books we are comparing. This will enable us to
    #output only those patterns that are attested at least once in our corpus of prosaic, poetic and prophetic texts
    car_total = []
    #create a dictionary holding the total number of clause patterns in each of the 3 genres we are comparing. This will enable us to
    #calculate the relative portion of all clause patterns in a given genre that is represented by a specific clause pattern type
    totals = {"prose": 0, "poetry": 0, "prophecy": 0}
    #iterate through all nodes in the LAF dataset
    iterate_through_nodes(car_prose, car_poetry, car_prophecy, car_total, totals)
    #after the program has iterated over all nodes, a sorted list of all clause atom relation types (between 100 and 168) that are attested
    #in our corpus is stored in the variable 'car_all'
    car_all = sorted(car_total)
    #the results are written to the output file
    write_to_output(out, car_prose, car_poetry, car_prophecy, car_all, totals)

perform_task()
Genesis Exodus Leviticus Numbers Deuteronomy Joshua Judges I_Samuel II_Samuel I_Kings II_Kings Isaiah Jeremiah Ezekiel Hosea Joel Amos Obadiah Jonah Micah Nahum Habakkuk Zephaniah Haggai Zechariah Malachi Psalms Job Proverbs Ruth Canticles Ecclesiastes Lamentations Esther Daniel Ezra Nehemiah I_Chronicles II_Chronicles
At the end of our program we need to close all open files (i.e. the log file and the output file). This is done by calling the close() function provided by the processor.
close()
13s Results directory: C:\Users\Gino/laf-fabric-data/bhs3/tasks/ClauseFunctions __log__ClauseFunctions.txt 214 Thu May 22 13:52:16 2014 table.csv 5616 Thu May 22 13:52:28 2014
After the program has finished, the results stored in the output file have to be displayed. This is done in the form of an HTML table, in which relatively high frequencies (> 5%) are marked in red.
from IPython.display import display, HTML
import csv, re
table_file = open('../../../laf-fabric-data/bhs3/tasks/ClauseFunctions/table.csv', 'r')
reader = csv.reader(table_file)
numerical = re.compile("[0-9]")
total = '<table class="presentation" id="Analysis">'
head = '<tr><th style="text-align: center">' + "CARnumber" + '</th><th style="text-align: center">' + "Prose" + '</th><th style="text-align: center">' + "% of total in prose" + '</th><th style="text-align: center">' + "Poetry" + '</th><th style="text-align: center">' + "% of total in poetry" + '</th><th style="text-align: center">' + "Prophecy" + '</th><th style="text-align: center">' + "% of total in prophecy" + '</th></tr>'
total += head
for line in reader:
    code = '<tr>'
    for word in line:
        if (word.strip() == '0.0'):
            code += '<td style="text-align: center">' + "-" + '</td>'
        elif (len(word) > 4 and numerical.match(word[0])):
            word = str(round(float(word), 2))
            if (float(word) > 5):
                code += '<td style="color: Red; text-align: center">' + word + '</td>'
            else:
                code += '<td style="text-align: center">' + word + '</td>'
        else:
            code += '<td style="text-align: center">' + word + '</td>'
    code += '</tr>'
    total += code
total += '</table>'
display(HTML(total))
CARnumber | Prose | % of total in prose | Poetry | % of total in poetry | Prophecy | % of total in prophecy |
---|---|---|---|---|---|---|
TOTALS | 2677 | 100 | 221 | 100 | 3478 | 100 |
nominal -> nominal | 375 | 14.01 | 38 | 17.19 | 263 | 7.56 |
yiqtol -> nominal | 105 | 3.92 | 17 | 7.69 | 213 | 6.12 |
qatal -> nominal | 145 | 5.42 | 15 | 6.79 | 299 | 8.6 |
imperative -> nominal | 48 | 1.79 | 4 | 1.81 | 187 | 5.38 |
infinConstr -> nominal | 17 | 0.64 | - | - | 31 | 0.89 |
infinAbsol -> nominal | - | - | - | - | 4 | 0.12 |
participle -> nominal | 31 | 1.16 | 4 | 1.81 | 104 | 2.99 |
wayyiqtol -> nominal | 114 | 4.26 | 1 | 0.45 | 24 | 0.69 |
weyiqtol -> nominal | 6 | 0.22 | 1 | 0.45 | 4 | 0.12 |
nominal -> yiqtol | 229 | 8.55 | 17 | 7.69 | 240 | 6.9 |
yiqtol -> yiqtol | 317 | 11.84 | 22 | 9.95 | 264 | 7.59 |
qatal -> yiqtol | 141 | 5.27 | 13 | 5.88 | 259 | 7.45 |
imperative -> yiqtol | 73 | 2.73 | 5 | 2.26 | 82 | 2.36 |
infinConstr -> yiqtol | 10 | 0.37 | - | - | 7 | 0.2 |
infinAbsol -> yiqtol | 7 | 0.26 | - | - | 1 | 0.03 |
participle -> yiqtol | 35 | 1.31 | 1 | 0.45 | 64 | 1.84 |
wayyiqtol -> yiqtol | 11 | 0.41 | 3 | 1.36 | 18 | 0.52 |
weyiqtol -> yiqtol | 9 | 0.34 | - | - | 11 | 0.32 |
nominal -> qatal | 156 | 5.83 | 21 | 9.5 | 234 | 6.73 |
yiqtol -> qatal | 31 | 1.16 | 7 | 3.17 | 118 | 3.39 |
qatal -> qatal | 106 | 3.96 | 23 | 10.41 | 335 | 9.63 |
imperative -> qatal | 30 | 1.12 | 4 | 1.81 | 88 | 2.53 |
infinConstr -> qatal | 5 | 0.19 | - | - | 5 | 0.14 |
participle -> qatal | 13 | 0.49 | 1 | 0.45 | 46 | 1.32 |
wayyiqtol -> qatal | 126 | 4.71 | 4 | 1.81 | 38 | 1.09 |
weyiqtol -> qatal | 1 | 0.04 | - | - | 3 | 0.09 |
nominal -> imperative | 123 | 4.59 | 4 | 1.81 | 59 | 1.7 |
yiqtol -> imperative | 45 | 1.68 | 3 | 1.36 | 38 | 1.09 |
qatal -> imperative | 46 | 1.72 | 1 | 0.45 | 39 | 1.12 |
imperative -> imperative | 21 | 0.78 | 1 | 0.45 | 28 | 0.81 |
infinConstr -> imperative | 2 | 0.07 | - | - | 1 | 0.03 |
infinAbsol -> imperative | 3 | 0.11 | - | - | 1 | 0.03 |
participle -> imperative | 13 | 0.49 | - | - | 8 | 0.23 |
wayyiqtol -> imperative | 2 | 0.07 | - | - | 1 | 0.03 |
nominal -> infinAbsol | 2 | 0.07 | - | - | 4 | 0.12 |
yiqtol -> infinAbsol | 6 | 0.22 | - | - | 8 | 0.23 |
qatal -> infinAbsol | 2 | 0.07 | - | - | 11 | 0.32 |
imperative -> infinAbsol | 1 | 0.04 | - | - | 4 | 0.12 |
infinConstr -> infinAbsol | - | - | - | - | 4 | 0.12 |
infinAbsol -> infinAbsol | - | - | - | - | 1 | 0.03 |
participle -> infinAbsol | - | - | - | - | 2 | 0.06 |
wayyiqtol -> infinAbsol | 4 | 0.15 | - | - | 3 | 0.09 |
weyiqtol -> infinAbsol | - | - | - | - | 1 | 0.03 |
nominal -> participle | 84 | 3.14 | 7 | 3.17 | 109 | 3.13 |
yiqtol -> participle | 32 | 1.2 | - | - | 52 | 1.5 |
qatal -> participle | 43 | 1.61 | 2 | 0.9 | 79 | 2.27 |
imperative -> participle | 14 | 0.52 | - | - | 19 | 0.55 |
infinConstr -> participle | 8 | 0.3 | - | - | 6 | 0.17 |
infinAbsol -> participle | 1 | 0.04 | - | - | - | - |
participle -> participle | 31 | 1.16 | - | - | 41 | 1.18 |
wayyiqtol -> participle | 52 | 1.94 | 2 | 0.9 | 13 | 0.37 |
weyiqtol -> participle | 1 | 0.04 | - | - | 4 | 0.12 |
From the table above some interesting insights can be deduced. We find several patterns that are more strongly attested in one genre than in another. In poetry, for instance, a sequence of two perfects (122: qatal -> qatal) is the second most frequent asyndetic clause pattern, while in prose other combinations of mother and daughter clause are more prominent, especially those containing an imperfect consecutive (wayyiqtol) in the mother clause.
On the other hand, the differences in numbers of occurrences are not extreme, which appears to support our research hypothesis that prose and poetry make use of one single verbal system. The genres clearly show some mutual differences in their preferences for specific patterns within this system, but the table suggests that this does not entail that the two genres make use of wholly different linguistic systems (or - as many Hebraists implicitly assume - that poetry's use of verbal forms is not bound to any grammatical rules at all).
Yet, these conclusions are based on a rather small and not very balanced dataset. In particular, the largest collection of Biblical Hebrew poetic texts - the Book of Psalms - had not yet been analyzed in the version of the data used for the current task and could therefore not be incorporated in our dataset. For a more extensive and better supported analysis of clause patterns, we refer the reader to the collection of notebooks accompanying our dissertation.