We mark all occurrences of נָתַן + object + complement, recording whether the complement is a proper indirect object or a locative.
This is part of Janet's project of creating a valence dictionary and using a flowchart to arrive at the meaning of a verb in its context of complements.
First we use an MQL query to get all occurrences of נָתַן with an object and a complement. Then we apply a few heuristics to detect those cases where the complement is a locative or an indirect object.
The query is also available on SHEBANQ, in a version by Dirk and a version by Janet.
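Before the real implementation below, here is a simplified standalone sketch of the two heuristics (not the notebook code itself): a complement scores as locative for each known locative lexeme and each occurrence of the preposition B, and as an indirect object for each occurrence of L carrying a pronominal suffix. The real code also counts topographical names and L before a proper name; lexeme spellings follow the ETCBC transcription used throughout.

```python
# Excerpt of the fixed locative lexeme list used below (ETCBC transcription).
LOCATIVE_LEXEMES = {'>RY/', 'BJT/', 'MQWM/'}

def score_complement(words):
    # words: list of (lexeme, has_pronominal_suffix) pairs for one complement.
    # Locativity: locative lexemes plus occurrences of the preposition B.
    loca = sum(1 for (lex, prs) in words if lex in LOCATIVE_LEXEMES or lex == 'B')
    # Indirect object: occurrences of L with a pronominal suffix.
    indi = sum(1 for (lex, prs) in words if lex == 'L' and prs)
    return (loca, indi)

print(score_complement([('B', False), ('BJT/', False)]))  # locative-leaning
print(score_complement([('L', True)]))                    # indirect-object-leaning
```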
import sys
import collections
import subprocess
from lxml import etree
import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()
  0.00s This is LAF-Fabric 4.5.0
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: http://shebanq-doc.readthedocs.org/en/latest/texts/welcome.html
version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lexicon', 'ntn', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        oid otype monads
        function
        g_word_utf8 trailer_utf8
        lex prs sp nametype
        book chapter verse label number
    ''', ''),
    "prepare": prepare,
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))
  0.00s LOADING API: please wait ...
  0.00s DETAIL: COMPILING m: UP TO DATE
  0.10s INFO: USING DATA COMPILED AT: 2015-05-04T13-46-20
  0.10s DETAIL: COMPILING a: UP TO DATE
  0.10s INFO: USING DATA COMPILED AT: 2015-05-04T14-07-34
  0.11s DETAIL: keep main: G.node_anchor_min
  0.11s DETAIL: keep main: G.node_anchor_max
  0.11s DETAIL: keep main: G.node_sort
  0.12s DETAIL: keep main: G.node_sort_inv
  0.12s DETAIL: keep main: G.edges_from
  0.12s DETAIL: keep main: G.edges_to
  0.12s DETAIL: keep main: F.etcbc4_db_monads [node]
  0.12s DETAIL: keep main: F.etcbc4_db_oid [node]
  0.12s DETAIL: keep main: F.etcbc4_db_otype [node]
  0.12s DETAIL: keep main: F.etcbc4_ft_function [node]
  0.12s DETAIL: keep main: F.etcbc4_ft_g_word_utf8 [node]
  0.12s DETAIL: keep main: F.etcbc4_ft_lex [node]
  0.12s DETAIL: keep main: F.etcbc4_ft_number [node]
  0.13s DETAIL: keep main: F.etcbc4_ft_prs [node]
  0.13s DETAIL: keep main: F.etcbc4_ft_sp [node]
  0.13s DETAIL: keep main: F.etcbc4_lex_nametype [node]
  0.13s DETAIL: keep main: F.etcbc4_sft_book [node]
  0.13s DETAIL: keep main: F.etcbc4_sft_chapter [node]
  0.13s DETAIL: keep main: F.etcbc4_sft_label [node]
  0.13s DETAIL: keep main: F.etcbc4_sft_verse [node]
  0.13s DETAIL: keep annox: F.etcbc4_db_monads [node]
  0.14s DETAIL: keep annox: F.etcbc4_db_oid [node]
  0.14s DETAIL: keep annox: F.etcbc4_db_otype [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_function [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_g_word_utf8 [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_lex [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_number [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_prs [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_sp [node]
  0.14s DETAIL: keep annox: F.etcbc4_lex_nametype [node]
  0.14s DETAIL: keep annox: F.etcbc4_sft_book [node]
  0.14s DETAIL: keep annox: F.etcbc4_sft_chapter [node]
  0.14s DETAIL: keep annox: F.etcbc4_sft_label [node]
  0.15s DETAIL: keep annox: F.etcbc4_sft_verse [node]
  0.15s DETAIL: load main: F.etcbc4_ft_trailer_utf8 [node]
  0.38s DETAIL: load annox: F.etcbc4_ft_trailer_utf8 [node]
  0.39s DETAIL: prep prep: G.node_sort
  0.46s DETAIL: prep prep: G.node_sort_inv
  1.11s DETAIL: prep prep: L.node_up
  7.22s DETAIL: prep prep: L.node_down
    17s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK ntn AT 2015-05-28T10-20-03
For each result, we write out a line of information. Here is a description of the columns.

order
    in what order the Predicate, Object, and Complement have been encountered
verb
    the verb occurrence in vocalised Hebrew
object
    the text of the complete (direct) object in Hebrew
# loc lexemes
    how many distinct lexemes with a locative meaning occur in the complement (given by a fixed list)
# topo
    how many lexemes with nametype = topo occur in the complement (nametype is a feature of the lexicon)
# prep_b
    how many occurrences of the preposition B occur in the complement
locativity
    a crude measure of the locativity of the complement, just the sum of # loc lexemes, # topo, and # prep_b
# prep_l
    how many occurrences of the preposition L with a pronominal suffix occur in the complement
# L prop
    how many occurrences of L plus a proper name occur in the complement
indirect object
    a crude indicator of whether the complement is an indirect object, just the sum of # prep_l and # L prop
complement text
    the text of the complete complement as a sequence of transcribed, consonantal lexemes
clause text
    the text of the complete clause

locative_lexemes = {
    '>RY/',
    'BJT/',
    'DRK/',
    'HR/',
    'JM/',
    'JRDN/',
    'JRWCLM/',
    'JFR>L/',
    'MDBR/',
    'MW<D/',
    'MZBX/',
    'MYRJM/',
    'MQWM/',
    'SBJB/',
    '<JR/',
    'FDH/',
    'CM',
    'CMJM/',
    'CMC/',
    'C<R/',
}
no_prs = {'absent', 'n/a'}
statclass = {
    'o': 'info',
    '+': 'good',
    '-': 'error',
    '?': 'warning',
    '!': 'special',
    '*': 'note',
}
statsym = dict((x[1], x[0]) for x in statclass.items())
def cert_status(cert):
    if cert == 0: return 'error'
    elif cert == 1: return 'warning'
    elif cert <= 10: return 'good'
    else: return 'special'
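The note status assigned below depends on a certainty score, computed later in the notebook as `abs(loca - indi) * max((loca, indi))`. A small standalone illustration (the function is repeated here only so the sketch runs on its own) of how that score maps onto the statuses:

```python
def cert_status(cert):
    # 0 means the two heuristics tie (or both are zero): undecided
    if cert == 0: return 'error'
    elif cert == 1: return 'warning'
    elif cert <= 10: return 'good'
    else: return 'special'

# illustrative (loca, indi) score pairs and the resulting status
for (loca, indi) in [(0, 0), (2, 0), (0, 1), (13, 0)]:
    certainty = abs(loca - indi) * max((loca, indi))
    print(loca, indi, certainty, cert_status(certainty))
```

The further apart the two scores are, and the larger the winning score, the more certain the classification.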
tsvfile = outfile('ntn.csv')
notefile = outfile('ntn-note.csv')
nresults = 0
nclauses = 0
orders = collections.Counter()
certs = collections.Counter()
tsvfile.write('book\tchapter\tverse\torder\tverb\tobject\tloc\tloc\tloc\tloc\tind\tind\tind\tcomplement text\tca_num\tclause text\n')
tsvfile.write('book\tchapter\tverse\torder\tverb\tobject\t# loc lexemes\t# topo\t# prep_b\tlocativity\t# prep_l\t# L prop\tindirect object\tcomplement text\tca_num\tclause text\n')
pclass = collections.Counter()
pclass['LI'] = 0
notefile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\n'.format(
    'version', 'book', 'chapter', 'verse', 'clause_atom', 'is_shared', 'is_published', 'status', 'keywords', 'ntext',
))
keywords = 'ntn-loca'
is_shared = 'T'
is_published = ''
status = statsym['info']
ntext_fmt = 'locative versus indirect object: L={} I={}; {}'
climit = 900
kws = ''.join(' {} '.format(k) for k in set(keywords.strip().split()))
for clause in F.otype.s('clause'):
    nclauses += 1
    # collect the words of the Pred, Objc, and Cmpl phrases of this clause,
    # noting the order in which these functions are first encountered
    phrases = {}
    order = ''
    verb = None
    for phrase in L.d('phrase', clause):
        pf = F.function.v(phrase)
        if pf in {'Pred', 'Objc', 'Cmpl'}:
            words = L.d('word', phrase)
            if pf not in phrases:
                order += pf[0]
                phrases[pf] = words
            else:
                phrases[pf].extend(words)
    # keep only clauses whose predicate contains the verb NTN
    is_ntn = False
    for w in phrases.get('Pred', []):
        if F.sp.v(w) == 'verb' and F.lex.v(w) == 'NTN[':
            is_ntn = True
            verb = w
            break
    if not is_ntn: continue
    nresults += 1
    orders[order] += 1
    book = F.book.v(L.u('book', verb))
    chapter = F.chapter.v(L.u('chapter', verb))
    verse = F.verse.v(L.u('verse', verb))
    clause_atom = F.number.v(L.u('clause_atom', verb))
    verb_txt = F.g_word_utf8.v(verb)
    obj_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in phrases.get('Objc', []))
    cmpl_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in phrases.get('Cmpl', []))
    if len(cmpl_txt) > climit:
        cmpl_txt = cmpl_txt[0:climit]+'...'
    clause_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in L.d('word', clause))
    # locativity heuristics: locative lexemes, topographical names, preposition B
    compl_wnodes = phrases.get('Cmpl', [])
    compl_lexemes = [F.lex.v(w) for w in compl_wnodes]
    compl_lset = set(compl_lexemes)
    lex_locativity = len(locative_lexemes & compl_lset)
    prep_b = len([x for x in compl_lexemes if x == 'B'])
    # indirect object heuristics: L with pronominal suffix, L before a proper name
    prep_l = len([x for x in compl_wnodes if F.lex.v(x) == 'L' and F.prs.v(x) not in no_prs])
    prep_lpr = 0
    lwn = len(compl_wnodes)
    for (n, wn) in enumerate(compl_wnodes):
        if F.lex.v(wn) == 'L':
            if n+1 < lwn:
                if F.sp.v(compl_wnodes[n+1]) == 'nmpr':
                    prep_lpr += 1
    topo = len([x for x in compl_wnodes if F.nametype.v(x) == 'topo'])
    loca = lex_locativity + topo + prep_b
    indi = prep_l + prep_lpr
    this_class = ''
    this_class += 'L' if loca else ''
    this_class += 'I' if indi else ''
    pclass[this_class] += 1
    tsvfile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}'.format(
        book,
        chapter,
        verse,
        order,
        verb_txt,
        obj_txt,
        lex_locativity,
        topo,
        prep_b,
        loca,
        prep_l,
        prep_lpr,
        indi,
        ' '.join(compl_lexemes),
        clause_atom,
        clause_txt,
    ).replace('\n', ' ')+'\n')
    # write a SHEBANQ note for this occurrence, with a status based on
    # how clearly the two heuristics diverge
    ntext = ntext_fmt.format(loca, indi, cmpl_txt)
    certainty = abs(loca - indi) * max((loca, indi))
    certs[certainty] += 1
    status = statsym[cert_status(certainty)]
    notefile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}'.format(
        version, book, chapter, verse, clause_atom, is_shared, is_published, status, kws, ntext,
    ).replace('\n', ' ')+'\n')
tsvfile.close()
notefile.close()
for order in sorted(orders):
    print("{:<5}: {:>3} results".format(order, orders[order]))
for cert in sorted(certs):
    print("{:>5} = {:<8}: {:>3} results".format(cert, cert_status(cert), certs[cert]))
for this_class in pclass:
    print("{:<2}: {:>3} results".format(this_class, pclass[this_class]))
print('Total: {:>3} results in {} clauses'.format(nresults, nclauses))
CP   :  22 results
CPO  :  59 results
OCP  :  17 results
OP   :  32 results
OPC  : 139 results
P    :  60 results
PC   : 351 results
PCO  : 372 results
PO   : 200 results
POC  : 364 results
    0 = error   : 790 results
    1 = warning : 736 results
    2 = good    :   1 results
    4 = good    :  57 results
    9 = good    :   9 results
   16 = special :   5 results
   25 = special :   1 results
   36 = special :   2 results
   49 = special :   1 results
  156 = special :   6 results
  169 = special :   5 results
  196 = special :   2 results
  676 = special :   1 results
I : 497 results
  : 781 results
L : 322 results
LI:  16 results
Total: 1616 results in 87900 clauses
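The rows where both scores are zero (the 'error' status, 790 cases) need manual inspection. A sketch of how one could filter them out of the TSV afterwards, using a small inline sample with simplified columns (a single header row and only the two score columns; the real ntn.csv has sixteen columns and two header rows):

```python
import csv, io

# Inline stand-in for (part of) ntn.csv: tab-separated, one header row.
sample = (
    'book\tchapter\tverse\tlocativity\tindirect object\n'
    'Genesis\t1\t17\t2\t0\n'
    'Genesis\t3\t6\t0\t0\n'
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter='\t'))
# keep rows where both heuristics came up empty: these are the undecided cases
ambiguous = [r for r in rows if r['locativity'] == '0' and r['indirect object'] == '0']
print(len(ambiguous))
```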
!head -n 10 {my_file('ntn.csv')}
book chapter verse order verb object loc loc loc loc ind ind ind complement text ca_num clause text
book chapter verse order verb object # loc lexemes # topo # prep_b locativity # prep_l # L prop indirect object complement text ca_num clause text
Genesis 1 17 POC יִּתֵּ֥ן אֹתָ֛ם 1 0 1 2 0 0 0 B RQJ</ H CMJM/ 67 וַיִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּרְקִ֣יעַ הַשָּׁמָ֑יִם
Genesis 1 29 PCO נָתַ֨תִּי אֶת־כָּל־עֵ֣שֶׂב ׀ וְאֶת־כָּל־הָעֵ֛ץ 0 0 0 0 1 0 1 L 121 הִנֵּה֩ נָתַ֨תִּי לָכֶ֜ם אֶת־כָּל־עֵ֣שֶׂב ׀ וְאֶת־כָּל־הָעֵ֛ץ
Genesis 3 6 PC תִּתֵּ֧ן 0 0 0 0 0 0 0 GM L >JC/ 258 וַתִּתֵּ֧ן גַּם־לְאִישָׁ֛הּ עִמָּ֖הּ
Genesis 3 12 PC נָתַ֣תָּה 0 0 0 0 0 0 0 <MD/ 285 אֲשֶׁ֣ר נָתַ֣תָּה עִמָּדִ֔י
Genesis 3 12 PC נָֽתְנָה 0 0 0 0 0 0 0 MN H <Y/ 286 הִ֛וא נָֽתְנָה־לִּ֥י מִן־הָעֵ֖ץ
Genesis 4 12 POC תֵּת כֹּחָ֖הּ 0 0 0 0 1 0 1 L 385 תֵּת־כֹּחָ֖הּ לָ֑ךְ
Genesis 9 2 CP נִתָּֽנוּ 0 0 1 1 0 0 0 B JD/ 762 בְּיֶדְכֶ֥ם נִתָּֽנוּ׃
!head -n 10 {my_file('ntn-note.csv')}
version book chapter verse clause_atom is_shared is_published status keywords ntext
4b Genesis 1 17 67 T + ntn-loca locative versus indirect object: L=2 I=0; בִּרְקִ֣יעַ הַשָּׁמָ֑יִם
4b Genesis 1 29 121 T ? ntn-loca locative versus indirect object: L=0 I=1; לָכֶ֜ם
4b Genesis 3 6 258 T - ntn-loca locative versus indirect object: L=0 I=0; גַּם־לְאִישָׁ֛הּ
4b Genesis 3 12 285 T - ntn-loca locative versus indirect object: L=0 I=0; עִמָּדִ֔י
4b Genesis 3 12 286 T - ntn-loca locative versus indirect object: L=0 I=0; מִן־הָעֵ֖ץ
4b Genesis 4 12 385 T ? ntn-loca locative versus indirect object: L=0 I=1; לָ֑ךְ
4b Genesis 9 2 762 T ? ntn-loca locative versus indirect object: L=1 I=0; בְּיֶדְכֶ֥ם
4b Genesis 9 3 766 T ? ntn-loca locative versus indirect object: L=0 I=1; לָכֶ֖ם
4b Genesis 9 13 797 T ? ntn-loca locative versus indirect object: L=1 I=0; בֶּֽעָנָ֑ן
Download the result files from my SURFdrive: the tab-separated file and a formatted OpenOffice spreadsheet.
We need per note: