We mark all occurrences of נָתַן + object + complement, recording whether the complement is a proper indirect object or a locative.
This is part of Janet's project of creating a valence dictionary and using a flowchart to arrive at the meaning of a verb in its context of complements.
First we use an MQL query to get all occurrences of נָתַן with an object and a complement. Then we apply a few heuristics to detect those cases where the complement is a locative or an indirect object.
The query is also available on SHEBANQ, in a version by Dirk and a version by Janet.
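Before the real implementation below, here is a simplified standalone sketch of the two heuristics (not the notebook code itself): a complement scores as locative for each known locative lexeme and each occurrence of the preposition B, and as an indirect object for each occurrence of L carrying a pronominal suffix. The real code also counts topographical names and L before a proper name; lexeme spellings follow the ETCBC transcription used throughout.

```python
# Excerpt of the fixed locative lexeme list used below (ETCBC transcription).
LOCATIVE_LEXEMES = {'>RY/', 'BJT/', 'MQWM/'}

def score_complement(words):
    # words: list of (lexeme, has_pronominal_suffix) pairs for one complement.
    # Locativity: locative lexemes plus occurrences of the preposition B.
    loca = sum(1 for (lex, prs) in words if lex in LOCATIVE_LEXEMES or lex == 'B')
    # Indirect object: occurrences of L with a pronominal suffix.
    indi = sum(1 for (lex, prs) in words if lex == 'L' and prs)
    return (loca, indi)

print(score_complement([('B', False), ('BJT/', False)]))  # locative-leaning
print(score_complement([('L', True)]))                    # indirect-object-leaning
```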
import sys
import collections
import subprocess
from lxml import etree
import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()
  0.00s This is LAF-Fabric 4.5.0
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: http://shebanq-doc.readthedocs.org/en/latest/texts/welcome.html
version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lexicon', 'ntn', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        oid otype monads
        function
        g_word_utf8 trailer_utf8
        lex prs sp nametype
        book chapter verse label number
    ''', ''),
    "prepare": prepare,
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))
  0.00s LOADING API: please wait ...
  0.00s DETAIL: COMPILING m: UP TO DATE
  0.10s INFO: USING DATA COMPILED AT: 2015-05-04T13-46-20
  0.10s DETAIL: COMPILING a: UP TO DATE
  0.10s INFO: USING DATA COMPILED AT: 2015-05-04T14-07-34
  0.11s DETAIL: keep main: G.node_anchor_min
  0.11s DETAIL: keep main: G.node_anchor_max
  0.11s DETAIL: keep main: G.node_sort
  0.12s DETAIL: keep main: G.node_sort_inv
  0.12s DETAIL: keep main: G.edges_from
  0.12s DETAIL: keep main: G.edges_to
  0.12s DETAIL: keep main: F.etcbc4_db_monads [node]
  0.12s DETAIL: keep main: F.etcbc4_db_oid [node]
  0.12s DETAIL: keep main: F.etcbc4_db_otype [node]
  0.12s DETAIL: keep main: F.etcbc4_ft_function [node]
  0.12s DETAIL: keep main: F.etcbc4_ft_g_word_utf8 [node]
  0.12s DETAIL: keep main: F.etcbc4_ft_lex [node]
  0.12s DETAIL: keep main: F.etcbc4_ft_number [node]
  0.13s DETAIL: keep main: F.etcbc4_ft_prs [node]
  0.13s DETAIL: keep main: F.etcbc4_ft_sp [node]
  0.13s DETAIL: keep main: F.etcbc4_lex_nametype [node]
  0.13s DETAIL: keep main: F.etcbc4_sft_book [node]
  0.13s DETAIL: keep main: F.etcbc4_sft_chapter [node]
  0.13s DETAIL: keep main: F.etcbc4_sft_label [node]
  0.13s DETAIL: keep main: F.etcbc4_sft_verse [node]
  0.13s DETAIL: keep annox: F.etcbc4_db_monads [node]
  0.14s DETAIL: keep annox: F.etcbc4_db_oid [node]
  0.14s DETAIL: keep annox: F.etcbc4_db_otype [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_function [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_g_word_utf8 [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_lex [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_number [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_prs [node]
  0.14s DETAIL: keep annox: F.etcbc4_ft_sp [node]
  0.14s DETAIL: keep annox: F.etcbc4_lex_nametype [node]
  0.14s DETAIL: keep annox: F.etcbc4_sft_book [node]
  0.14s DETAIL: keep annox: F.etcbc4_sft_chapter [node]
  0.14s DETAIL: keep annox: F.etcbc4_sft_label [node]
  0.15s DETAIL: keep annox: F.etcbc4_sft_verse [node]
  0.15s DETAIL: load main: F.etcbc4_ft_trailer_utf8 [node]
  0.38s DETAIL: load annox: F.etcbc4_ft_trailer_utf8 [node]
  0.39s DETAIL: prep prep: G.node_sort
  0.46s DETAIL: prep prep: G.node_sort_inv
  1.11s DETAIL: prep prep: L.node_up
  7.22s DETAIL: prep prep: L.node_down
    17s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK ntn AT 2015-05-28T10-20-03
For each result, we write out a line of information. Here is a description of the columns.

order
    in what order the Predicate, Object, and Complement have been encountered
verb
    the verb occurrence in vocalised Hebrew
object
    the text of the complete (direct) object in Hebrew
# loc lexemes
    how many distinct lexemes with a locative meaning occur in the complement (given by a fixed list)
# topo
    how many lexemes with nametype = topo occur in the complement (nametype is a feature of the lexicon)
# prep_b
    how many occurrences of the preposition B occur in the complement
locativity
    a crude measure of the locativity of the complement, just the sum of # loc lexemes, # topo, and # prep_b
# prep_l
    how many occurrences of the preposition L with a pronominal suffix occur in the complement
# L prop
    how many occurrences of L plus a proper name occur in the complement
indirect object
    a crude indicator of whether the complement is an indirect object, just the sum of # prep_l and # L prop
complement text
    the text of the complete complement as a sequence of transcribed, consonantal lexemes
clause text
    the text of the complete clause

locative_lexemes = {
    '>RY/',
    'BJT/',
    'DRK/',
    'HR/',
    'JM/',
    'JRDN/',
    'JRWCLM/',
    'JFR>L/',
    'MDBR/',
    'MW<D/',
    'MZBX/',
    'MYRJM/',
    'MQWM/',
    'SBJB/',
    '<JR/',
    'FDH/',
    'CM',
    'CMJM/',
    'CMC/',
    'C<R/',
}
no_prs = {'absent', 'n/a'}
statclass = {
    'o': 'info',
    '+': 'good',
    '-': 'error',
    '?': 'warning',
    '!': 'special',
    '*': 'note',
}
statsym = dict((x[1], x[0]) for x in statclass.items())
def cert_status(cert):
    if cert == 0: return 'error'
    elif cert == 1: return 'warning'
    elif cert <= 10: return 'good'
    else: return 'special'
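The note status assigned below depends on a certainty score, computed later in the notebook as `abs(loca - indi) * max((loca, indi))`. A small standalone illustration (the function is repeated here only so the sketch runs on its own) of how that score maps onto the statuses:

```python
def cert_status(cert):
    # 0 means the two heuristics tie (or both are zero): undecided
    if cert == 0: return 'error'
    elif cert == 1: return 'warning'
    elif cert <= 10: return 'good'
    else: return 'special'

# illustrative (loca, indi) score pairs and the resulting status
for (loca, indi) in [(0, 0), (2, 0), (0, 1), (13, 0)]:
    certainty = abs(loca - indi) * max((loca, indi))
    print(loca, indi, certainty, cert_status(certainty))
```

The further apart the two scores are, and the larger the winning score, the more certain the classification.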
tsvfile = outfile('ntn.csv')
notefile = outfile('ntn-note.csv')
nresults = 0
nclauses = 0
orders = collections.Counter()
certs = collections.Counter()
tsvfile.write('book\tchapter\tverse\torder\tverb\tobject\tloc\tloc\tloc\tloc\tind\tind\tind\tcomplement text\tca_num\tclause text\n')
tsvfile.write('book\tchapter\tverse\torder\tverb\tobject\t# loc lexemes\t# topo\t# prep_b\tlocativity\t# prep_l\t# L prop\tindirect object\tcomplement text\tca_num\tclause text\n')
pclass = collections.Counter()
pclass['LI'] = 0
notefile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\n'.format(
    'version', 'book', 'chapter', 'verse', 'clause_atom', 'is_shared', 'is_published', 'status', 'keywords', 'ntext',
))
keywords = 'ntn-loca'
is_shared = 'T'
is_published = ''
status = statsym['info']
ntext_fmt = 'locative versus indirect object: L={} I={}; {}'
climit = 900
kws = ''.join(' {} '.format(k) for k in set(keywords.strip().split()))
for clause in F.otype.s('clause'):
    nclauses += 1
    # collect the words of the Pred, Objc, and Cmpl phrases of this clause,
    # noting the order in which these functions are first encountered
    phrases = {}
    order = ''
    verb = None
    for phrase in L.d('phrase', clause):
        pf = F.function.v(phrase)
        if pf in {'Pred', 'Objc', 'Cmpl'}:
            words = L.d('word', phrase)
            if pf not in phrases:
                order += pf[0]
                phrases[pf] = words
            else:
                phrases[pf].extend(words)
    # keep only clauses whose predicate contains the verb NTN
    is_ntn = False
    for w in phrases.get('Pred', []):
        if F.sp.v(w) == 'verb' and F.lex.v(w) == 'NTN[':
            is_ntn = True
            verb = w
            break
    if not is_ntn: continue
    nresults += 1
    orders[order] += 1
    book = F.book.v(L.u('book', verb))
    chapter = F.chapter.v(L.u('chapter', verb))
    verse = F.verse.v(L.u('verse', verb))
    clause_atom = F.number.v(L.u('clause_atom', verb))
    verb_txt = F.g_word_utf8.v(verb)
    obj_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in phrases.get('Objc', []))
    cmpl_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in phrases.get('Cmpl', []))
    if len(cmpl_txt) > climit:
        cmpl_txt = cmpl_txt[0:climit]+'...'
    clause_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in L.d('word', clause))
    # locativity heuristics: locative lexemes, topographical names, preposition B
    compl_wnodes = phrases.get('Cmpl', [])
    compl_lexemes = [F.lex.v(w) for w in compl_wnodes]
    compl_lset = set(compl_lexemes)
    lex_locativity = len(locative_lexemes & compl_lset)
    prep_b = len([x for x in compl_lexemes if x == 'B'])
    # indirect object heuristics: L with pronominal suffix, L before a proper name
    prep_l = len([x for x in compl_wnodes if F.lex.v(x) == 'L' and F.prs.v(x) not in no_prs])
    prep_lpr = 0
    lwn = len(compl_wnodes)
    for (n, wn) in enumerate(compl_wnodes):
        if F.lex.v(wn) == 'L':
            if n+1 < lwn:
                if F.sp.v(compl_wnodes[n+1]) == 'nmpr':
                    prep_lpr += 1
    topo = len([x for x in compl_wnodes if F.nametype.v(x) == 'topo'])
    loca = lex_locativity + topo + prep_b
    indi = prep_l + prep_lpr
    this_class = ''
    this_class += 'L' if loca else ''
    this_class += 'I' if indi else ''
    pclass[this_class] += 1
    tsvfile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}'.format(
        book,
        chapter,
        verse,
        order,
        verb_txt,
        obj_txt,
        lex_locativity,
        topo,
        prep_b,
        loca,
        prep_l,
        prep_lpr,
        indi,
        ' '.join(compl_lexemes),
        clause_atom,
        clause_txt,
    ).replace('\n', ' ')+'\n')
    # write a SHEBANQ note for this occurrence, with a status based on
    # how clearly the two heuristics diverge
    ntext = ntext_fmt.format(loca, indi, cmpl_txt)
    certainty = abs(loca - indi) * max((loca, indi))
    certs[certainty] += 1
    status = statsym[cert_status(certainty)]
    notefile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}'.format(
        version, book, chapter, verse, clause_atom, is_shared, is_published, status, kws, ntext,
    ).replace('\n', ' ')+'\n')
tsvfile.close()
notefile.close()
for order in sorted(orders):
    print("{:<5}: {:>3} results".format(order, orders[order]))
for cert in sorted(certs):
    print("{:>5} = {:<8}: {:>3} results".format(cert, cert_status(cert), certs[cert]))
for this_class in pclass:
    print("{:<2}: {:>3} results".format(this_class, pclass[this_class]))
print('Total: {:>3} results in {} clauses'.format(nresults, nclauses))
CP   :  22 results
CPO  :  59 results
OCP  :  17 results
OP   :  32 results
OPC  : 139 results
P    :  60 results
PC   : 351 results
PCO  : 372 results
PO   : 200 results
POC  : 364 results
    0 = error   : 790 results
    1 = warning : 736 results
    2 = good    :   1 results
    4 = good    :  57 results
    9 = good    :   9 results
   16 = special :   5 results
   25 = special :   1 results
   36 = special :   2 results
   49 = special :   1 results
  156 = special :   6 results
  169 = special :   5 results
  196 = special :   2 results
  676 = special :   1 results
I : 497 results
  : 781 results
L : 322 results
LI:  16 results
Total: 1616 results in 87900 clauses
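The rows where both scores are zero (the 'error' status, 790 cases) need manual inspection. A sketch of how one could filter them out of the TSV afterwards, using a small inline sample with simplified columns (a single header row and only the two score columns; the real ntn.csv has sixteen columns and two header rows):

```python
import csv, io

# Inline stand-in for (part of) ntn.csv: tab-separated, one header row.
sample = (
    'book\tchapter\tverse\tlocativity\tindirect object\n'
    'Genesis\t1\t17\t2\t0\n'
    'Genesis\t3\t6\t0\t0\n'
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter='\t'))
# keep rows where both heuristics came up empty: these are the undecided cases
ambiguous = [r for r in rows if r['locativity'] == '0' and r['indirect object'] == '0']
print(len(ambiguous))
```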
!head -n 10 {my_file('ntn.csv')}
book chapter verse order verb object loc loc loc loc ind ind ind complement text ca_num clause text
book chapter verse order verb object # loc lexemes # topo # prep_b locativity # prep_l # L prop indirect object complement text ca_num clause text
Genesis 1 17 POC יִּתֵּ֥ן אֹתָ֛ם 1 0 1 2 0 0 0 B RQJ</ H CMJM/ 67 וַיִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּרְקִ֣יעַ הַשָּׁמָ֑יִם
Genesis 1 29 PCO נָתַ֨תִּי אֶת־כָּל־עֵ֣שֶׂב ׀ וְאֶת־כָּל־הָעֵ֛ץ 0 0 0 0 1 0 1 L 121 הִנֵּה֩ נָתַ֨תִּי לָכֶ֜ם אֶת־כָּל־עֵ֣שֶׂב ׀ וְאֶת־כָּל־הָעֵ֛ץ
Genesis 3 6 PC תִּתֵּ֧ן 0 0 0 0 0 0 0 GM L >JC/ 258 וַתִּתֵּ֧ן גַּם־לְאִישָׁ֛הּ עִמָּ֖הּ
Genesis 3 12 PC נָתַ֣תָּה 0 0 0 0 0 0 0 <MD/ 285 אֲשֶׁ֣ר נָתַ֣תָּה עִמָּדִ֔י
Genesis 3 12 PC נָֽתְנָה 0 0 0 0 0 0 0 MN H <Y/ 286 הִ֛וא נָֽתְנָה־לִּ֥י מִן־הָעֵ֖ץ
Genesis 4 12 POC תֵּת כֹּחָ֖הּ 0 0 0 0 1 0 1 L 385 תֵּת־כֹּחָ֖הּ לָ֑ךְ
Genesis 9 2 CP נִתָּֽנוּ 0 0 1 1 0 0 0 B JD/ 762 בְּיֶדְכֶ֥ם נִתָּֽנוּ׃
!head -n 10 {my_file('ntn-note.csv')}
version book chapter verse clause_atom is_shared is_published status keywords ntext
4b Genesis 1 17 67 T + ntn-loca locative versus indirect object: L=2 I=0; בִּרְקִ֣יעַ הַשָּׁמָ֑יִם
4b Genesis 1 29 121 T ? ntn-loca locative versus indirect object: L=0 I=1; לָכֶ֜ם
4b Genesis 3 6 258 T - ntn-loca locative versus indirect object: L=0 I=0; גַּם־לְאִישָׁ֛הּ
4b Genesis 3 12 285 T - ntn-loca locative versus indirect object: L=0 I=0; עִמָּדִ֔י
4b Genesis 3 12 286 T - ntn-loca locative versus indirect object: L=0 I=0; מִן־הָעֵ֖ץ
4b Genesis 4 12 385 T ? ntn-loca locative versus indirect object: L=0 I=1; לָ֑ךְ
4b Genesis 9 2 762 T ? ntn-loca locative versus indirect object: L=1 I=0; בְּיֶדְכֶ֥ם
4b Genesis 9 3 766 T ? ntn-loca locative versus indirect object: L=0 I=1; לָכֶ֖ם
4b Genesis 9 13 797 T ? ntn-loca locative versus indirect object: L=1 I=0; בֶּֽעָנָ֑ן
Download the result files from my SURFdrive: the tab-separated file and a formatted OpenOffice spreadsheet.
We need per note: