
Participles in the Hebrew Bible¶

About¶

This task gives an inventory of participles and their context. It is based on a request by Janet Dyk for data with which the verbal and nominal roles of participles can be studied.

This is work in progress. When we started, we had not yet identified exactly which features in the database would give us clues, so this notebook contains a number of attempts.

Firing up LAF-Fabric¶

We fire up the engine, collect data, format the data and write it to a tab delimited file.

Then the LAF-Fabric task is completed.

After that we play around a bit with the data and see how we can visualize it with the Python module pandas.

Get a LAF processor¶

In [1]:

import sys
import collections
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.3.3
http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html

Load data for this task¶

In [2]:

fabric.load('etcbc4', '--', 'participle', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        otype
        book chapter verse label
        g_word_utf8 g_cons_utf8
        vt
        sp pdp
        rela typ
        prs vbe
        g_prs
    ''','''
        functional_parent
    '''),
    "primary": False,
})
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s INFO: USING DATA COMPILED AT: 2014-07-14T16-45-08
  5.69s LOGFILE=/Users/dirk/laf-fabric-output/etcbc4/participle/__log__participle.txt
  5.69s INFO: DATA LOADED FROM SOURCE etcbc4 AND ANNOX -- FOR TASK participle AT 2014-07-15T16-40-04

Data exploration¶

In this section we investigate the values that some features take in the database.

Part of speech tags¶

The cell below collects the pos-tags that occur in the features sp and pdp.

For most word occurrences, the two values coincide. Where they differ, the output file of this task lists both, separated by a ~.

In [3]:

poss = set()
for i in NN(test=F.otype.v, values=['word']):
    pos = F.sp.v(i)
    pdpos = F.pdp.v(i)
    poss.add(pos)
    poss.add(pdpos)
poss

Out[3]:

{'adjv',
 'advb',
 'art',
 'conj',
 'inrg',
 'intj',
 'nega',
 'nmpr',
 'prde',
 'prep',
 'prin',
 'prps',
 'subs',
 'verb'}
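As an illustration of the ~ convention, here is a minimal sketch of how the two tags of a word could be combined into one. The function name `combined_tag` and the dictionary `POS_ABBREV` are made up for this example; the abbreviations are a subset of the pos_table defined further down in this notebook.

```python
# Subset of the abbreviations in pos_table (defined later in this notebook).
POS_ABBREV = {'art': 'dt', 'conj': 'cj', 'subs': 'n', 'prep': 'pp', 'verb': 'vb'}

def combined_tag(sp, pdp, table=POS_ABBREV):
    """Return a single tag if sp and pdp agree, otherwise 'sp~pdp'."""
    a, b = table[sp], table[pdp]
    return a if a == b else "{}~{}".format(a, b)

print(combined_tag('verb', 'verb'))   # both features agree
print(combined_tag('subs', 'prep'))   # lexical noun used as preposition
print(combined_tag('art', 'conj'))    # article used as conjunction
```

Tags such as n~pp and dt~cj are what appears in the p_pre and p_post columns of the output table at the end of this notebook.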

Trees¶

Construct trees according to the parent relationship, and make indexes for finding the sentence above each clause and for finding the words in each clause.

In [4]:

top_nodes = set(NN(test=F.otype.v, value='sentence'))
msg("Top nodes found: {}".format(len(top_nodes)))

  6.93s Top nodes found: 66045

Clause constituent relations¶

In [5]:

nodes_seen = set()
to_sentence = {}
clause_words = collections.defaultdict(lambda: set())
sentence_words = collections.defaultdict(lambda: set())
sentence_verse = {}
verse_label = None

def walk_tree(node, sentence, clause):
    if node in nodes_seen:
        return
    
    nodes_seen.add(node)
    to_sentence[node] = sentence
    new_clause = clause

    otype = F.otype.v(node)
    if otype == 'clause': new_clause = node
    if otype == 'word':
        clause_words[clause].add(node)
        sentence_words[sentence].add(node)
    
    children = Ci.functional_parent.v(node)
    for child in children:
        walk_tree(child, sentence, new_clause)

s = 0
sc = 0
chunk = 10000

for node in NN(top_nodes | set(NN(test=F.otype.v, value='verse'))):
    if F.otype.v(node) == 'verse':
        verse_label = F.label.v(node)
        continue
    sentence_verse[node] = verse_label
    nodes_seen = set()
    walk_tree(node, node, None)
    s += 1
    sc += 1
    if sc == chunk:
        msg("{} trees visited".format(s))
        sc = 0
    
msg("{} trees visited".format(s))

    11s 10000 trees visited
    12s 20000 trees visited
    13s 30000 trees visited
    14s 40000 trees visited
    14s 50000 trees visited
    15s 60000 trees visited
    16s 66045 trees visited
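The tree walk above can be tried out without LAF-Fabric on a toy tree. The sketch below is only an illustration: the node names, the `children` mapping (standing in for `Ci.functional_parent.v`) and the `otype` mapping (standing in for `F.otype.v`) are hypothetical.

```python
import collections

# Hypothetical toy tree: one sentence, two clauses, three words.
children = {'s1': ['c1', 'c2'], 'c1': ['w1', 'w2'], 'c2': ['w3']}
otype = {'s1': 'sentence', 'c1': 'clause', 'c2': 'clause',
         'w1': 'word', 'w2': 'word', 'w3': 'word'}

to_sentence = {}                              # node -> sentence above it
clause_words = collections.defaultdict(set)   # clause -> words below it

def walk(node, sentence, clause, seen):
    """Visit node and its descendants, remembering the enclosing clause."""
    if node in seen:
        return
    seen.add(node)
    to_sentence[node] = sentence
    if otype[node] == 'clause':
        clause = node
    elif otype[node] == 'word':
        clause_words[clause].add(node)
    for child in children.get(node, ()):
        walk(child, sentence, clause, seen)

walk('s1', 's1', None, set())
print(to_sentence['w3'])
print(sorted(clause_words['c1']))
```

Passing the set of visited nodes as an argument, instead of resetting a global variable for every tree, is a design choice of this sketch, not something the notebook does.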

In [6]:

rels = collections.defaultdict(int)
verse_label = None
nr_of_examples = 3
examples = collections.defaultdict(lambda: set())
for i in NN(test=F.otype.v, values=['clause', 'verse']):
    if F.otype.v(i) == 'verse':
        verse_label = F.label.v(i)
    else:
        ccr = F.rela.v(i)
        rels[ccr] += 1
        if len(examples[ccr]) < nr_of_examples:
            examples[ccr].add(i)
for ccr in rels:
    print("{}: {:>6} x".format(ccr, rels[ccr]))

print("\n")

for ccr in sorted(examples):
    for clause in examples[ccr]:
        sentence = to_sentence[clause]
        cwords = sorted(clause_words[clause])
        swords = sorted(sentence_words[sentence])
        vlabel = sentence_verse[sentence]
#        print("{} in {}: {}".format(ccr, vlabel, " ".join([x[1] for x in P.data(i)])))
        print("{:<4} {:<10} {}\n{:<16}{}".format(
            ccr, 
            vlabel, 
            " ".join([F.g_word_utf8.v(word) for word in cwords]),
            '',
            " ".join([F.g_word_utf8.v(word) for word in swords]),
        ))
        

PrAd:      1 x
Subj:    436 x
Objc:   1347 x
Resu:   1193 x
Adju:   5872 x
RgRc:    198 x
Cmpl:    241 x
CoVo:    305 x
Spec:     41 x
Attr:   5930 x
NA:  69400 x
PreC:    156 x
Coor:   2858 x


Adju  GEN 01,16 לְ הָאִ֖יר עַל הָ אָֽרֶץ
                וַ יִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּ רְקִ֣יעַ הַ שָּׁמָ֑יִם לְ הָאִ֖יר עַל הָ אָֽרֶץ וְ לִ מְשֹׁל֙ בַּ  יֹּ֣ום וּ בַ  לַּ֔יְלָה וּֽ לֲ הַבְדִּ֔יל בֵּ֥ין הָ אֹ֖ור וּ בֵ֣ין הַ חֹ֑שֶׁךְ
Adju  GEN 01,15 לְ הָאִ֖יר עַל הָ אָ֑רֶץ
                וְ הָי֤וּ לִ מְאֹורֹת֙ בִּ רְקִ֣יעַ הַ שָּׁמַ֔יִם לְ הָאִ֖יר עַל הָ אָ֑רֶץ
Adju  GEN 01,14 לְ הַבְדִּ֕יל בֵּ֥ין הַ יֹּ֖ום וּ בֵ֣ין הַ לָּ֑יְלָה
                יְהִ֤י מְאֹרֹת֙ בִּ רְקִ֣יעַ הַ שָּׁמַ֔יִם לְ הַבְדִּ֕יל בֵּ֥ין הַ יֹּ֖ום וּ בֵ֣ין הַ לָּ֑יְלָה
Attr  GEN 01,07 אֲשֶׁ֖ר מֵ עַ֣ל לָ  רָקִ֑יעַ
                וַ יַּבְדֵּ֗ל בֵּ֤ין הַ מַּ֨יִם֙ אֲשֶׁר֙ מִ תַּ֣חַת לָ  רָקִ֔יעַ וּ בֵ֣ין הַ מַּ֔יִם אֲשֶׁ֖ר מֵ עַ֣ל לָ  רָקִ֑יעַ
Attr  GEN 01,11 מַזְרִ֣יעַ זֶ֔רַע
                תַּֽדְשֵׁ֤א הָ אָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב מַזְרִ֣יעַ זֶ֔רַע עֵ֣ץ פְּרִ֞י עֹ֤שֶׂה פְּרִי֙ לְ מִינֹ֔ו אֲשֶׁ֥ר זַרְעֹו בֹ֖ו עַל הָ אָ֑רֶץ
Attr  GEN 01,07 אֲשֶׁר֙ מִ תַּ֣חַת לָ  רָקִ֔יעַ
                וַ יַּבְדֵּ֗ל בֵּ֤ין הַ מַּ֨יִם֙ אֲשֶׁר֙ מִ תַּ֣חַת לָ  רָקִ֔יעַ וּ בֵ֣ין הַ מַּ֔יִם אֲשֶׁ֖ר מֵ עַ֣ל לָ  רָקִ֑יעַ
Cmpl  GEN 13,16 לִ מְנֹות֙ אֶת עֲפַ֣ר הָ אָ֔רֶץ
                וְ שַׂמְתִּ֥י אֶֽת זַרְעֲךָ֖ כַּ עֲפַ֣ר הָ אָ֑רֶץ אֲשֶׁ֣ר אִם יוּכַ֣ל אִ֗ישׁ לִ מְנֹות֙ אֶת עֲפַ֣ר הָ אָ֔רֶץ גַּֽם זַרְעֲךָ֖ יִמָּנֶֽה
Cmpl  GEN 13,06 לָ שֶׁ֥בֶת יַחְדָּֽו
                וְ לֹ֥א יָֽכְל֖וּ לָ שֶׁ֥בֶת יַחְדָּֽו
Cmpl  GEN 04,07 אִם תֵּיטִיב֙
                הֲ לֹ֤וא אִם תֵּיטִיב֙ שְׂאֵ֔ת
CoVo  GEN 16,08 אֵֽי מִ זֶּ֥ה בָ֖את
                הָגָ֞ר שִׁפְחַ֥ת שָׂרַ֛י אֵֽי מִ זֶּ֥ה בָ֖את וְ אָ֣נָה תֵלֵ֑כִי
CoVo  GEN 15,02 מַה תִּתֶּן לִ֔י
                אֲדֹנָ֤י יֱהוִה֙ מַה תִּתֶּן לִ֔י וְ אָנֹכִ֖י הֹולֵ֣ךְ עֲרִירִ֑י
CoVo  GEN 23,11 שְׁמָעֵ֔נִי
                לֹֽא אֲדֹנִ֣י שְׁמָעֵ֔נִי
Coor  GEN 01,16 וְ לִ מְשֹׁל֙ בַּ  יֹּ֣ום וּ בַ  לַּ֔יְלָה
                וַ יִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּ רְקִ֣יעַ הַ שָּׁמָ֑יִם לְ הָאִ֖יר עַל הָ אָֽרֶץ וְ לִ מְשֹׁל֙ בַּ  יֹּ֣ום וּ בַ  לַּ֔יְלָה וּֽ לֲ הַבְדִּ֔יל בֵּ֥ין הָ אֹ֖ור וּ בֵ֣ין הַ חֹ֑שֶׁךְ
Coor  GEN 01,16 וּֽ לֲ הַבְדִּ֔יל בֵּ֥ין הָ אֹ֖ור וּ בֵ֣ין הַ חֹ֑שֶׁךְ
                וַ יִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּ רְקִ֣יעַ הַ שָּׁמָ֑יִם לְ הָאִ֖יר עַל הָ אָֽרֶץ וְ לִ מְשֹׁל֙ בַּ  יֹּ֣ום וּ בַ  לַּ֔יְלָה וּֽ לֲ הַבְדִּ֔יל בֵּ֥ין הָ אֹ֖ור וּ בֵ֣ין הַ חֹ֑שֶׁךְ
Coor  GEN 02,04 וְ כָל עֵ֥שֶׂב הַ שָּׂדֶ֖ה טֶ֣רֶם יִצְמָ֑ח
                בְּ יֹ֗ום עֲשֹׂ֛ות יְהוָ֥ה אֱלֹהִ֖ים אֶ֥רֶץ וְ שָׁמָֽיִם וְ כֹ֣ל שִׂ֣יחַ הַ שָּׂדֶ֗ה טֶ֚רֶם יִֽהְיֶ֣ה בָ  אָ֔רֶץ וְ כָל עֵ֥שֶׂב הַ שָּׂדֶ֖ה טֶ֣רֶם יִצְמָ֑ח
NA    GEN 01,01 בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַ שָּׁמַ֖יִם וְ אֵ֥ת הָ אָֽרֶץ
                בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַ שָּׁמַ֖יִם וְ אֵ֥ת הָ אָֽרֶץ
NA    GEN 01,02 וְ הָ אָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָ בֹ֔הוּ
                וְ הָ אָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָ בֹ֔הוּ
NA    GEN 01,02 וְ חֹ֖שֶׁךְ עַל פְּנֵ֣י תְהֹ֑ום
                וְ חֹ֖שֶׁךְ עַל פְּנֵ֣י תְהֹ֑ום
Objc  GEN 01,12 כִּי טֹֽוב
                וַ יַּ֥רְא אֱלֹהִ֖ים כִּי טֹֽוב
Objc  GEN 01,04 כִּי טֹ֑וב
                וַ יַּ֧רְא אֱלֹהִ֛ים אֶת הָ אֹ֖ור כִּי טֹ֑וב
Objc  GEN 01,10 כִּי טֹֽוב
                וַ יַּ֥רְא אֱלֹהִ֖ים כִּי טֹֽוב
PrAd  DAN 12,01 כָּת֥וּב בַּ  סֵּֽפֶר
                וּ בָ  עֵ֤ת הַ הִיא֙ יִמָּלֵ֣ט עַמְּךָ֔ כָּל הַ נִּמְצָ֖א כָּת֥וּב בַּ  סֵּֽפֶר
PreC  GEN 02,13 הַ סֹּובֵ֔ב אֵ֖ת כָּל אֶ֥רֶץ כּֽוּשׁ
                ה֣וּא הַ סֹּובֵ֔ב אֵ֖ת כָּל אֶ֥רֶץ כּֽוּשׁ
PreC  GEN 02,11 הַ סֹּבֵ֗ב אֵ֚ת כָּל אֶ֣רֶץ הַֽ חֲוִילָ֔ה
                ה֣וּא הַ סֹּבֵ֗ב אֵ֚ת כָּל אֶ֣רֶץ הַֽ חֲוִילָ֔ה אֲשֶׁר שָׁ֖ם הַ זָּהָֽב
PreC  GEN 02,14 הַֽ הֹלֵ֖ךְ קִדְמַ֣ת אַשּׁ֑וּר
                ה֥וּא הַֽ הֹלֵ֖ךְ קִדְמַ֣ת אַשּׁ֑וּר
Resu  GEN 02,04 וְ כֹ֣ל שִׂ֣יחַ הַ שָּׂדֶ֗ה טֶ֚רֶם יִֽהְיֶ֣ה בָ  אָ֔רֶץ
                בְּ יֹ֗ום עֲשֹׂ֛ות יְהוָ֥ה אֱלֹהִ֖ים אֶ֥רֶץ וְ שָׁמָֽיִם וְ כֹ֣ל שִׂ֣יחַ הַ שָּׂדֶ֗ה טֶ֚רֶם יִֽהְיֶ֣ה בָ  אָ֔רֶץ וְ כָל עֵ֥שֶׂב הַ שָּׂדֶ֖ה טֶ֣רֶם יִצְמָ֑ח
Resu  GEN 02,14 ה֥וּא פְרָֽת
                וְ הַ נָּהָ֥ר הָֽ רְבִיעִ֖י ה֥וּא פְרָֽת
Resu  GEN 02,17 לֹ֥א תֹאכַ֖ל מִמֶּ֑נּוּ
                וּ מֵ עֵ֗ץ הַ דַּ֨עַת֙ טֹ֣וב וָ רָ֔ע לֹ֥א תֹאכַ֖ל מִמֶּ֑נּוּ
RgRc  GEN 02,17 אֲכָלְךָ֥ מִמֶּ֖נּוּ
                כִּ֗י בְּ יֹ֛ום אֲכָלְךָ֥ מִמֶּ֖נּוּ מֹ֥ות תָּמֽוּת
RgRc  GEN 02,04 עֲשֹׂ֛ות יְהוָ֥ה אֱלֹהִ֖ים אֶ֥רֶץ וְ שָׁמָֽיִם
                בְּ יֹ֗ום עֲשֹׂ֛ות יְהוָ֥ה אֱלֹהִ֖ים אֶ֥רֶץ וְ שָׁמָֽיִם וְ כֹ֣ל שִׂ֣יחַ הַ שָּׂדֶ֗ה טֶ֚רֶם יִֽהְיֶ֣ה בָ  אָ֔רֶץ וְ כָל עֵ֥שֶׂב הַ שָּׂדֶ֖ה טֶ֣רֶם יִצְמָ֑ח
RgRc  GEN 03,05 אֲכָלְכֶ֣ם מִמֶּ֔נּוּ
                כִּ֚י יֹדֵ֣עַ אֱלֹהִ֔ים כִּ֗י בְּ יֹום֙ אֲכָלְכֶ֣ם מִמֶּ֔נּוּ וְ נִפְקְח֖וּ עֵֽינֵיכֶ֑ם וִ הְיִיתֶם֙ כֵּֽ אלֹהִ֔ים יֹדְעֵ֖י טֹ֥וב וָ רָֽע
Spec  GEN 48,07 לָ בֹ֣א אֶפְרָ֑תָה
                וַ אֲנִ֣י בְּ בֹאִ֣י מִ פַּדָּ֗ן מֵ֩תָה֩ עָלַ֨י רָחֵ֜ל בְּ אֶ֤רֶץ כְּנַ֨עַן֙ בַּ  דֶּ֔רֶךְ בְּ עֹ֥וד כִּבְרַת אֶ֖רֶץ לָ בֹ֣א אֶפְרָ֑תָה
Spec  GEN 42,06 ה֥וּא
                וְ יֹוסֵ֗ף ה֚וּא הַ שַּׁלִּ֣יט עַל הָ אָ֔רֶץ ה֥וּא הַ מַּשְׁבִּ֖יר לְ כָל עַ֣ם הָ אָ֑רֶץ
Spec  EXO 35,09 לְ שָׁרֵ֣ת בַּ  קֹּ֑דֶשׁ
                וְ כָל חֲכַם לֵ֖ב בָּכֶ֑ם יָבֹ֣אוּ וְ יַעֲשׂ֔וּ אֵ֛ת כָּל אֲשֶׁ֥ר צִוָּ֖ה יְהוָֽה אֶת הַ֨ מִּשְׁכָּ֔ן אֶֽת אָהֳלֹ֖ו וְ אֶת מִכְסֵ֑הוּ אֶת קְרָסָיו֙ וְ אֶת קְרָשָׁ֔יו אֶת ֯בריחו אֶת עַמֻּדָ֖יו וְ אֶת אֲדָנָֽיו אֶת הָ אָרֹ֥ן וְ אֶת בַּדָּ֖יו אֶת הַ כַּפֹּ֑רֶת וְ אֵ֖ת פָּרֹ֥כֶת הַ מָּסָֽךְ אֶת הַ שֻּׁלְחָ֥ן וְ אֶת בַּדָּ֖יו וְ אֶת כָּל כֵּלָ֑יו וְ אֵ֖ת לֶ֥חֶם הַ פָּנִֽים וְ אֶת מְנֹרַ֧ת הַ מָּאֹ֛ור וְ אֶת כֵּלֶ֖יהָ וְ אֶת נֵרֹתֶ֑יהָ וְ אֵ֖ת שֶׁ֥מֶן הַ מָּאֹֽור וְ אֶת מִזְבַּ֤ח הַ קְּטֹ֨רֶת֙ וְ אֶת בַּדָּ֔יו וְ אֵת֙ שֶׁ֣מֶן הַ מִּשְׁחָ֔ה וְ אֵ֖ת קְטֹ֣רֶת הַ סַּמִּ֑ים וְ אֶת מָסַ֥ךְ הַ פֶּ֖תַח לְ פֶ֥תַח הַ מִּשְׁכָּֽן אֵ֣ת מִזְבַּ֣ח הָ עֹלָ֗ה וְ אֶת מִכְבַּ֤ר הַ נְּחֹ֨שֶׁת֙ אֲשֶׁר לֹ֔ו אֶת בַּדָּ֖יו וְ אֶת כָּל כֵּלָ֑יו אֶת הַ כִּיֹּ֖ר וְ אֶת כַּנֹּֽו אֵ֚ת קַלְעֵ֣י הֶ חָצֵ֔ר אֶת עַמֻּדָ֖יו וְ אֶת אֲדָנֶ֑יהָ וְ אֵ֕ת מָסַ֖ךְ שַׁ֥עַר הֶ חָצֵֽר אֶת יִתְדֹ֧ת הַ מִּשְׁכָּ֛ן וְ אֶת יִתְדֹ֥ת הֶ חָצֵ֖ר וְ אֶת מֵיתְרֵיהֶֽם אֶת בִּגְדֵ֥י הַ שְּׂרָ֖ד לְ שָׁרֵ֣ת בַּ  קֹּ֑דֶשׁ אֶת בִּגְדֵ֤י הַ קֹּ֨דֶשׁ֙ לְ אַהֲרֹ֣ן הַ כֹּהֵ֔ן וְ אֶת בִּגְדֵ֥י בָנָ֖יו לְ כַהֵֽן
Subj  GEN 24,50 דַּבֵּ֥ר אֵלֶ֖יךָ רַ֥ע אֹו טֹֽוב
                לֹ֥א נוּכַ֛ל דַּבֵּ֥ר אֵלֶ֖יךָ רַ֥ע אֹו טֹֽוב
Subj  GEN 02,18 הֱיֹ֥ות הָֽ אָדָ֖ם לְ בַדֹּ֑ו
                לֹא טֹ֛וב הֱיֹ֥ות הָֽ אָדָ֖ם לְ בַדֹּ֑ו
Subj  GEN 04,17 בֹּ֣נֶה עִ֔יר
                וַֽ יְהִי֙ בֹּ֣נֶה עִ֔יר

In [7]:

ccrs = {
 'Adju': 'adjunct clause',
 'Attr': 'attributive clause',
 'Cmpl': 'complement clause, but not subject or object',
 'CoVo': 'continuation of the vocative',
 'Coor': 'coordination',
 'Objc': 'object clause',
 'PrAd': 'predicative adjunct clause',
 'PreC': 'predicative complement clause',
 'Resu': 'clause after resumptive extrapolated fronted element',
 'RgRc': 'Regens rectum (governing governed)',
 'Spec': 'Specification clause',
 'Subj': 'subject clause',
 'NA': 'not known/not marked',
}
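The counts printed by cell In [6] come out in arbitrary order. A small sketch of how they could be listed by frequency together with their descriptions, using the counts and the descriptions exactly as they appear above (the variable name `ranked` is made up for this example):

```python
import collections

# Counts copied from the output of cell In [6].
rels = {'PrAd': 1, 'Subj': 436, 'Objc': 1347, 'Resu': 1193, 'Adju': 5872,
        'RgRc': 198, 'Cmpl': 241, 'CoVo': 305, 'Spec': 41, 'Attr': 5930,
        'NA': 69400, 'PreC': 156, 'Coor': 2858}

# Descriptions copied from the ccrs dictionary in cell In [7].
ccrs = {
    'Adju': 'adjunct clause',
    'Attr': 'attributive clause',
    'Cmpl': 'complement clause, but not subject or object',
    'CoVo': 'continuation of the vocative',
    'Coor': 'coordination',
    'Objc': 'object clause',
    'PrAd': 'predicative adjunct clause',
    'PreC': 'predicative complement clause',
    'Resu': 'clause after resumptive extrapolated fronted element',
    'RgRc': 'Regens rectum (governing governed)',
    'Spec': 'Specification clause',
    'Subj': 'subject clause',
    'NA': 'not known/not marked',
}

# most_common() sorts the (code, count) pairs by descending count.
ranked = collections.Counter(rels).most_common()
for code, count in ranked:
    print("{:<5}{:>6} x  {}".format(code, count, ccrs[code]))
```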

Subject¶

clause that has the function of subject

Object¶

clause that has the function of object

Complement¶

clause that has a function of a verb complement, but not subject or object

Attributive¶

clause that has an attributive function (often with a relative pronoun)

Adjunct¶

clauses with additional information, usually without a finite verb

Predicative clause¶

clause that has a predicative function

Coordination¶

multiple dependent clauses coordinated (with and, or etc) to each other under the same head (a main clause or a phrase (asher))

Continuation of the vocative¶

clause that follows a vocative, for example: Adam, where are you?

Resumptive¶

King David, Nathan the prophet spoke severely to him. [Here King David is the casus pendens, or extrapolated element.]

Regens Rectum¶

You shall reign over the birds and the animals and all that creeps on the face of the earth. [Here all governs the reptiles.]

none¶

No clause constituent relation marked.

Phrase atom types¶

In [8]:

pats = set()
for i in NN(test=F.otype.v, values=['phrase_atom']):
    pat = F.typ.v(i)
    pats.add(pat)
pats

Out[8]:

{'AdjP',
 'AdvP',
 'CP',
 'DPrP',
 'IPrP',
 'InjP',
 'InrP',
 'NP',
 'NegP',
 'PP',
 'PPrP',
 'PrNP',
 'VP'}

Tense¶

In [9]:

tenses = set()
for i in NN(test=F.otype.v, values=['word']):
    tense = F.vt.v(i)
    tenses.add(tense)
tenses

Out[9]:

{'NA', 'impf', 'impv', 'infa', 'infc', 'perf', 'ptca', 'ptcp', 'wayq'}

Pronominal suffixes (paradigmatic, graphical, plain)¶

In [10]:

ppss = set()
gpss = set()
for i in NN(test=F.otype.v, values=['word']):
    pps = F.prs.v(i)
    gps = F.g_prs.v(i)
    ppss.add(pps)
    gpss.add(gps)
    
output = "paradigmatic pronoun suffix: "
output += ", ".join(sorted(ppss))
output += "\n" + "graphical pronoun suffix: "
output += ", ".join(sorted(gpss))
print(output)

paradigmatic pronoun suffix: H, H=, HJ, HM, HN, HW, HWN, J, K, K=, KM, KN, KWN, M, MW, N, N>, NJ, NW, W, absent, n/a
graphical pronoun suffix: , +, +:@K@, +:AHOM, +:AHOWN, +:AK@, +:AK@H, +:AKEM, +:AKOM, +:AKOWN, +:H@, +:HEM, +:HEN, +:HOM, +:HOWM, +:HOWN, +:HW., +:K@, +:K@H, +:KEM, +:KEN, +:KOM, +:KOWN, +:NIJ, +:NW., +;>, +;H., +;HW., +;K, +;K:, +;K;H, +;KIJ, +;M, +;MOW, +;N.@H, +;NIJ, +;NW., +>, +@>, +@H, +@H,, +@H., +@H:N@H, +@H;M, +@H;N, +@HAM, +@HEM, +@HEN, +@HW., +@K:, +@K@H, +@KEM, +@KEN@H, +@M, +@MOW, +@N, +@N@H, +@NIJ, +@NW., +A, +AH., +AJ, +AM, +AN, +AN.IJ, +AN@>, +ANIJ, +D, +EH@, +EK:, +EK@, +EK@H, +EM, +EN@H, +H, +H., +H.EM, +H;M@H, +H;N, +H>, +H@, +HEM, +HEN, +HEN@H, +HIJ, +HM, +HOM, +HOWN, +HW, +HW., +HWN, +IJ, +IK, +IK:, +J, +JNJ, +K, +K.@, +K.@H, +K.EM, +K:, +K@, +K@H, +KEM, +KEN, +KEN@H, +KIJ, +KJ, +KM, +KOWN, +M, +MOW, +MW., +N, +N>, +N@>, +NH, +NIJ, +NJ, +NW, +NW., +OH, +OW, +W, +W., +WMW, +WNJ, +WW

Verbal ending (paradigmatic)¶

In [11]:

pves = set()
for i in NN(test=F.otype.v, value='word'):
    pve = F.vbe.v(i)
    pves.add(pve)
pves

Out[11]:

{'',
 'H',
 'H=',
 'J',
 'JN',
 'N',
 'N>',
 'NH',
 'NW',
 'T',
 'T=',
 'T==',
 'TJ',
 'TM',
 'TN',
 'TWN',
 'W',
 'WN',
 'n/a'}

Conventions for all tasks¶

In [12]:

pos_table = {
 'adjv': 'aj',
 'advb': 'av',
 'art': 'dt',
 'conj': 'cj',
 'intj': 'ij',
 'inrg': 'ir',
 'nega': 'ng',
 'subs': 'n',
 'nmpr': 'n-pr',
 'prep': 'pp',
 'prps': 'pr-ps',
 'prde': 'pr-dem',
 'prin': 'pr-int',
 'verb': 'vb',
}

pron_suffix_table = {
 '',
 'H',
 'H=',
 'J',
 'JN',
 'N',
 'N>',
 'NH',
 'NW',
 'T',
 'T=',
 'T==',
 'TJ',
 'TM',
 'TN',
 'TWN',
 'W',
 'WN',
 'n/a',
}

Task: Participles in Clause Atoms¶

Specification¶

We want to analyse participles in their clause-atoms, not in their full clauses. We are particularly interested in verbal complements that these participles have in their clause-atom.

Let us start with pronominal suffixes attached to the participle.

We need to find all words whose tense (feature vt) is ptca or ptcp. From there, we need all surrounding words in the same clause-atom. Of all words, we need the sp and the pdp features, and of the participle we need the prs feature as well.

We output a tab delimited file.

One row per participle, containing the following fields:

sequence number | passage label |

pos-tags of words before | pos-tag of ptc | pronoun suffix (paradigmatic) | pos-tags of words after |

plain text of words after | plain text of ptc | plain text of words before

Every participle is shown within its clause-atom.

If there are several participles in the same clause-atom, we put every participle in a separate row.
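A minimal sketch of the "one row per participle" rule, on a made-up clause atom. The function name `split_on_participles` and the words A to D are hypothetical; each word is a tuple whose first member tells whether the word is a participle, as in the data collected below.

```python
def split_on_participles(words):
    """Yield (pre, ptc, post) once for every participle in a clause atom."""
    for n, word in enumerate(words):
        if word[0]:                       # word[0] is the participle flag
            yield (words[:n], word, words[n + 1:])

# Hypothetical clause atom with two participles (B and D).
atom = [
    (False, 'A', 'cj'),
    (True, 'B', 'vb'),
    (False, 'C', 'n'),
    (True, 'D', 'vb'),
]

rows = list(split_on_participles(atom))
for pre, ptc, post in rows:
    print([w[1] for w in pre], ptc[1], [w[1] for w in post])
```

The clause atom yields two rows; in the row for D, the participle B counts as an ordinary word before it.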

Execute the task: data collection¶

In [14]:

msg("Get the participles...")

book = None
chapter = None
verse = None
label = None

found_total = 0
found_in_book = 0
clause_atoms = []
current_clause = []
has_participle = None

for i in NN(test=F.otype.v, values=['book', 'chapter', 'verse', 'clause_atom', 'word']):
    otype = F.otype.v(i)
    if otype == 'word':
        tense = F.vt.v(i)
        is_participle = tense == 'ptca' or tense == 'ptcp'
        pron_suff_para = None
        if is_participle:
            has_participle = True
            pron_suff_para = F.prs.v(i)
            found_total += 1
        pos = pos_table[F.sp.v(i)]
        pdpos = pos_table[F.pdp.v(i)]
        rpos = pos if pos == pdpos else "{}~{}".format(pos, pdpos)
        current_clause.append((
            is_participle,
            F.g_cons_utf8.v(i),
            rpos,
            pron_suff_para,
        ))
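    elif otype == 'clause_atom':
        # A new clause atom starts here, so we store the previous one.
        # Caveat: label is the most recently seen verse, i.e. the verse in which
        # the new clause atom starts; this can be one verse past the stored atom.
        if has_participle:
            clause_atoms.append((label, current_clause))
            found_in_book += 1
        current_clause = []
        has_participle = False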
    elif otype == 'book':
        if book is not None:
            msg("{} ({})".format(book, found_in_book), withtime=False)
            found_in_book = 0
        book = F.book.v(i)
    elif otype == 'chapter':
        chapter = F.chapter.v(i)
    elif otype == 'verse':
        verse = F.verse.v(i)
        label = "{} {}:{}".format(book, chapter, verse)
if has_participle:
    clause_atoms.append((label, current_clause))
msg("{} ({})".format(book, found_in_book), withtime=False)

msg("Found {} participles in {} clause atoms".format(found_total, len(clause_atoms)))

 1m 10s Get the participles...
Genesis (354)
Exodus (340)
Leviticus (232)
Numeri (383)
Deuteronomium (435)
Josua (187)
Judices (227)
Samuel_I (325)
Samuel_II (251)
Reges_I (287)
Reges_II (290)
Jesaia (807)
Jeremia (743)
Ezechiel (505)
Hosea (71)
Joel (28)
Amos (82)
Obadia (7)
Jona (15)
Micha (72)
Nahum (48)
Habakuk (28)
Zephania (44)
Haggai (10)
Sacharia (132)
Maleachi (51)
Psalmi (907)
Iob (217)
Proverbia (490)
Ruth (34)
Canticum (63)
Ecclesiastes (126)
Threni (58)
Esther (108)
Daniel (335)
Esra (110)
Nehemia (204)
Chronica_I (193)
Chronica_II (355)
 1m 14s Found 9664 participles in 9154 clause atoms

Execute the task: formatting output¶

In [15]:

split_clause_atoms = []
for (label, clause) in clause_atoms:
    ptcs = [n for (n, w) in enumerate(clause) if w[0]]
    for ptc in ptcs:
        split_clause_atoms.append((
            label,
            clause[0:ptc],
            clause[ptc],
            clause[ptc+1:],
        ))

In [16]:

ptc_cl_atoms = outfile("ptc_cl_atoms.csv")
ptc_cl_atoms.write("n\tpassage\tp_pre\tp_ptc\tp_suff\tp_post\tt_post\tt_ptc\tt_pre\n")
for (n, (label, pre, ptc, post)) in enumerate(split_clause_atoms):
    fields = [str(n+1), label]
    fields.append("|".join([w[2] for w in pre]))
    fields.append(ptc[2])
    fields.append(ptc[3])
    fields.append("|".join([w[2] for w in post]))
    fields.append(" ".join([w[1] for w in post]))
    fields.append(ptc[1])
    fields.append(" ".join([w[1] for w in pre]))
    ptc_cl_atoms.write("{}\n".format("\t".join(fields)))
close()

 1m 20s Results directory:
/Users/dirk/laf-fabric-output/etcbc4/participle

__log__participle.txt                  1084 Tue Jul 15 18:41:24 2014
ptc_cl_atoms.csv                     825100 Tue Jul 15 18:41:24 2014
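The cell above builds each line by joining the fields with tabs by hand. As a hedged alternative, the standard csv module can write the same tab-delimited layout and takes care of quoting should a field ever contain a tab. The sketch writes to an in-memory buffer instead of the file handle returned by `outfile`, which is an assumption made to keep the example self-contained; the data row is the first row of the table shown at the end of this notebook.

```python
import csv
import io

header = ['n', 'passage', 'p_pre', 'p_ptc', 'p_suff',
          'p_post', 't_post', 't_ptc', 't_pre']
# First row of the result table shown at the end of this notebook.
row = [1, 'Genesis 1:3', 'cj|n|n', 'vb', 'absent',
       'pp|n|dt|n', 'על פני ה מים', 'מרחפת', 'ו רוח אלהים']

buffer = io.StringIO()      # stand-in for the real output file
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(header)
writer.writerow(row)

lines = buffer.getvalue().splitlines()
print(len(lines), len(lines[1].split('\t')))
```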

Playing with the output¶

First of all: I opened ptc_tab.csv (a tab-delimited file) in OpenOffice, where I formatted some rows and columns, defined a region, and sorted the rows. I saved the result as ptc_tab.ods (also on GitHub, in the same directory as this notebook).

Let's get an impression of what we've got in our tab delimited file.

In [17]:

%matplotlib inline
import pandas
from IPython.display import display
pandas.set_option('display.notebook_repr_html', True)

In [18]:

table_file = my_file('ptc_cl_atoms.csv')
df = pandas.read_csv(table_file, sep="\t", keep_default_na=False, na_values=[])
df.head(10)

Out[18]:

	n	passage	p_pre	p_ptc	p_suff	p_post	t_post	t_ptc	t_pre
0	1	Genesis 1:3	cj|n|n	vb	absent	pp|n|dt|n	על פני ה מים	מרחפת	ו רוח אלהים
1	2	Genesis 1:7	cj|vb	vb	absent	n~pp|n|pp|n	בין מים ל מים	מבדיל	ו יהי
2	3	Genesis 1:11		vb	absent	n	זרע	מזריע
3	4	Genesis 1:11		vb	absent	n|pp|n	פרי ל מינו	עשׂה
4	5	Genesis 1:12		vb	absent	n|pp|n	זרע ל מינהו	מזריע
5	6	Genesis 1:12		vb	absent	n	פרי	עשׂה
6	7	Genesis 1:21	dt~cj	vb	absent			רמשׂת	ה
7	8	Genesis 1:27	dt~cj	vb	absent	pp|dt|n	על ה ארץ	רמשׂ	ה
8	9	Genesis 1:29	dt~cj	vb	absent	pp|dt|n	על ה ארץ	רמשׂת	ה
9	10	Genesis 1:29		vb	absent	n	זרע	זרע

10 rows × 9 columns
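The call to read_csv above passes keep_default_na=False and na_values=[]. The following sketch suggests why that matters for this data: by default pandas treats strings such as n/a and the empty string as missing values, while n/a is a genuine value of the prs feature and empty cells are expected when no words precede or follow the participle. The two sample rows are made up for this illustration.

```python
import io
import pandas

# Two hypothetical rows; 'n/a' is one of the prs values listed earlier.
sample = "n\tpassage\tp_pre\tp_suff\n1\tGenesis 1:3\tcj|n|n\tabsent\n2\tGenesis 1:7\t\tn/a\n"

# Default behaviour: 'n/a' and the empty field become NaN.
default = pandas.read_csv(io.StringIO(sample), sep="\t")

# Behaviour used in this notebook: every field is kept as a literal string.
literal = pandas.read_csv(
    io.StringIO(sample), sep="\t", keep_default_na=False, na_values=[],
)

print(default['p_suff'].isna().sum(), default['p_pre'].isna().sum())
print(literal['p_suff'].tolist(), literal['p_pre'].tolist())
```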