#!/usr/bin/env python
# coding: utf-8
#
# # Monad Query Language
# The ETCBC Hebrew Text Database is available as a LAF resource, obtained by converting an EMDROS database into LAF.
# [EMDROS](http://emdros.org) is a text database system written by Ulrik Sandborg-Petersen, based on the PhD thesis of Crist-Jan Doedens: [Text Databases. One Database Model and Several Retrieval Languages](http://books.google.nl/books?id=9ggOBRz1dO4C&dq=editions%3AISBN9051837291&source=gbs_book_other_versions).
#
# The query language of this system, MQL, is a so-called *topographic* query language: the query instruction is at the same time a template for the query results. More formally, the structure of the query instruction corresponds to the structure of the query results, and this correspondence holds for both the sequential order and the embedding order.
#
# Put differently, MQL is a convenient language for querying the data for tree fragments.
#
# A specification of MQL can be found at the [Emdros docs page](http://emdros.org/docs.html).
#
# To run this notebook, you need the [EMDROS software](http://sourceforge.net/projects/emdros/files/) installed. It is open source, and binaries are available for Windows and Mac. The ETCBC database file is included in the laf-fabric-data working directory, which you can download from [DANS](http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71).
# # LAF-Fabric and MQL
# This notebook shows how you can integrate MQL with LAF-Fabric. This is what you can do:
#
# * write an MQL query in a code cell as a Python string
# * run that query against the EMDROS database containing the ETCBC data
# * get the results back as sets of LAF-Fabric nodes.
#
# Because the LAF data has been migrated from the EMDROS data, we have a mapping from EMDROS object identifiers to LAF nodes.
# We apply this mapping to the query results in order to translate them to nodesets.
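#
# The translation step can be pictured with a small, self-contained sketch. The dictionaries and the function name below are hypothetical stand-ins, not the actual data structures of LAF-Fabric or the *etcbc* package. The sketch only assumes that each node records the identifier of the EMDROS object it was converted from.

```python
# Hypothetical data: LAF node number -> EMDROS object identifier (oid).
oid_of_node = {10: 1001, 11: 1002, 12: 1003}

# Invert the table so that query results (oids) can be looked up quickly.
node_of_oid = {oid: node for node, oid in oid_of_node.items()}


def to_nodeset(oids):
    """Translate EMDROS object ids into a set of LAF nodes, skipping unknown ids."""
    return {node_of_oid[oid] for oid in oids if oid in node_of_oid}


# An oid that has no counterpart in the table (9999) is silently dropped.
nodes = to_nodeset([1003, 1001, 9999])
```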
# # Sheafs, straws, and grains
# An MQL query has a form like this:
#
#     select all objects
#     in {1-40}
#     where
#     [phrase
#         [word g_cons = 'H']
#         [word]
#     ]
#     ..
#     [phrase
#         [word]
#         [word]
#     ]
#
# After the ``where`` there is a sequence of objects, which in turn may contain objects.
# The result of running this query is a so-called *sheaf*. It is a list of results or *straws*, where each straw looks like the sequence of objects after the ``where``. These objects are the *matched objects* or *grains*.
# And here is the catch: a grain may itself contain a sheaf, because the objects inside an object may also match in multiple ways in the data.
#
# In other words: a sheaf is a recursive structure: it is a list of straws, which are lists of grains, which are monads (words) or objects containing a sheaf.
#
# The sheaf is a very economical representation of the set of tree fragments that are the result of an MQL query.
#
# Yet for some purposes it is necessary to have a list of ordinary results. We provide a method to generate results from a sheaf.
# What this method does can be thought of as making copies of the sheaf, and wherever there is a sheaf (a list of straws), it replaces the sheaf by choosing a single straw. The results correspond to all possible ways of making those choices.
#
# In other words: a ``result`` is a recursive structure: it is a list of grains, which are monads (words) or objects containing a result.
#
# Put otherwise: a ``result`` is a simplified sheaf, without the aggregating level of sheaf, leaving only straws and grains.
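#
# The expansion of a sheaf into results can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used by the *etcbc* package: it assumes a sheaf is held as nested lists, where a word grain is an integer and any other grain is a ``(node, subsheaf)`` tuple. The function names and the toy data are hypothetical.

```python
from itertools import product


def expand_sheaf(sheaf):
    """Yield every result encoded by a sheaf (a list of straws)."""
    for straw in sheaf:
        yield from expand_straw(straw)


def expand_straw(straw):
    """Combine one alternative per grain into complete results."""
    alternatives = [list(expand_grain(grain)) for grain in straw]
    for combination in product(*alternatives):
        yield list(combination)


def expand_grain(grain):
    """Yield the alternatives for a single grain."""
    if isinstance(grain, int):
        # A word grain has exactly one alternative: itself.
        yield grain
        return
    node, subsheaf = grain
    if not subsheaf:
        # An object without inner matches stays as it is.
        yield (node, [])
        return
    # Choosing a single straw of the inner sheaf gives one alternative each.
    for subresult in expand_sheaf(subsheaf):
        yield (node, subresult)


# Toy sheaf: the first straw has an object (100) with two inner straws,
# so it expands into two results; the second straw expands into one.
toy_sheaf = [
    [(100, [[1], [2]]), (200, [[3]])],
    [(300, [])],
]
results = list(expand_sheaf(toy_sheaf))
```

# In this sketch the number of results is the product of the numbers of alternatives of the grains in a straw, summed over all straws, which is why the sheaf is the more compact form.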
# # MQL API
# Inside the *etcbc* package there is a module *mql*.
# This module exposes two classes: ``MQL`` and ``Sheaf``.
#
#     from etcbc.mql import MQL
#
# You initialize the MQL object after loading LAF-Fabric by passing the API as a parameter:
#
#     Q = MQL(API)
#
# If you have a query, e.g. the example above as a string in a variable ``query``, you can say:
#
#     sheaf = Q.mql(query)
#
# Then you have the results of the query in ``sheaf``. It is a list of lists of tuples (corresponding to *sheaf*, *straw*, *grain*),
# where a grain is either an integer, which is the node corresponding to a monad (word object), or a tuple ``(node, subsheaf)``, where ``node`` corresponds to an object of another type, containing a sheaf of subobjects.
#
# ``sheaf`` is an object of class ``Sheaf``, and there are the following methods in this class:
#
# * ``render(callable)``: prints the sheaf in a pretty format; each word is rendered by applying ``callable`` to its node.
# * ``compact(callable)``: returns a compact representation of the sheaf as a string; ``callable`` has the same meaning as above.
# * ``results()``: a generator that yields the results represented by the sheaf.
# * ``compact_results(callable)``: returns the compact representations of the results of the sheaf.
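#
# The role of ``callable`` can be illustrated with a toy rendering function. This is only a sketch: the output format, the function name ``compact_sketch`` and the lookup table are hypothetical and do not reproduce what ``Sheaf.compact`` actually prints. It assumes the nested list representation described above.

```python
def compact_sketch(sheaf, render_word):
    """Return a one-line string for a nested sheaf (illustrative format only)."""

    def grain_text(grain):
        if isinstance(grain, int):
            # Word grains are rendered by the caller-supplied function.
            return render_word(grain)
        node, subsheaf = grain
        # Other grains show their node number followed by their inner sheaf.
        return "{}{}".format(node, compact_sketch(subsheaf, render_word))

    straws = ("[" + " ".join(grain_text(g) for g in straw) + "]" for straw in sheaf)
    return "{" + " ".join(straws) + "}"


# Hypothetical lookup table standing in for a feature such as g_cons.
words = {1: "H", 2: "MLK"}
toy_sheaf = [[(100, [[1, 2]])]]
text = compact_sketch(toy_sheaf, lambda node: words.get(node, "?"))
```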
# In[1]:
import sys
import collections
import subprocess
from lxml import etree
import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
from etcbc.mql import MQL
fabric = LafFabric()
# In[2]:
API = fabric.load('etcbc4', '--', 'mql', {
"xmlids": {"node": False, "edge": False},
"features": ('''
oid otype monads
g_word_utf8 g_cons lex function
book chapter verse label
''','''
functional_parent
'''),
"prepare": prepare,
}, verbose='DETAIL')
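# Make the members of the loaded LAF-Fabric API available as local names
exec(fabric.localnames.format(var='fabric'))
# Create the MQL query interface by passing it the loaded API
Q = MQL(API)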
# In[8]:
qu1 = '''
select all objects where
[subphrase
[word lex="M