#!/usr/bin/env python
# coding: utf-8

# # Monad Query Language
# The LAF resource, the ETCBC Hebrew Text Database, is the result of converting an EMDROS database into LAF.
# [EMDROS](http://emdros.org) is a text database system written by Ulrik Sandborg-Petersen, based on the PhD thesis of Crist-Jan Doedens: [Text Databases. One Database Model and Several Retrieval Languages](http://books.google.nl/books?id=9ggOBRz1dO4C&dq=editions%3AISBN9051837291&source=gbs_book_other_versions).
#
# The query language of this system, MQL, is a so-called *topographic* query language, meaning that the query instruction is at the same time a template for the query results. More formally, there is a correspondence between the structure of the query instruction and the structure of the query results, and this correspondence holds for both the sequential order and the embedding order.
#
# Put otherwise, MQL is a very convenient language for querying the data for tree fragments.
#
# A specification of MQL can be found at the [Emdros docs page](http://emdros.org/docs.html).
#
# In order to run this notebook, you need to have the [EMDROS software](http://sourceforge.net/projects/emdros/files/) installed. It is open source and there are binaries for Windows and Mac. The ETCBC database file is included in the laf-fabric-data working directory, which you can download from [DANS](http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71).

# # LAF Fabric and MQL
# This notebook shows how you can integrate MQL with LAF-Fabric. This is what you can do:
#
# * write an MQL query in a code cell as a Python string
# * fire that query at the EMDROS database containing the ETCBC data
# * get the results back in the form of sets of nodes of LAF-Fabric.
#
# Because the LAF data has been migrated from the EMDROS data, we have a mapping from EMDROS object identifiers to LAF nodes.
# We apply this mapping to the query results in order to translate them to nodesets.
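
# The translation step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code of the *etcbc* package: the function names ``build_oid_index`` and ``oids_to_nodes`` are hypothetical, the sample identifiers are invented, and the mapping is assumed to be available as ``(node, oid)`` pairs.

```python
def build_oid_index(node_oid_pairs):
    """Invert (node, oid) pairs into a dict from EMDROS object id to LAF node.

    Hypothetical helper: assumes every oid belongs to exactly one node.
    """
    return {oid: node for node, oid in node_oid_pairs}


def oids_to_nodes(oids, oid_to_node):
    """Translate EMDROS object ids into a set of LAF nodes.

    Ids that are absent from the mapping are skipped silently.
    """
    return {oid_to_node[oid] for oid in oids if oid in oid_to_node}


# Invented sample data: three LAF nodes with their EMDROS object ids.
oid_to_node = build_oid_index([(0, 1001), (1, 1002), (426581, 2001)])

# Object ids as they might come back from a query; 9999 is unknown on purpose.
nodes = oids_to_nodes([1001, 2001, 9999], oid_to_node)
print(sorted(nodes))
```

# A set is used for the result because the text above speaks of nodesets; a list would be the better choice if the order of the query results had to be preserved.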

# # Sheafs, straws, and grains
# An MQL query has a form like this:
#
#     select all objects
#     in {1-40}
#     where
#     [phrase
#         [word g_cons = 'H']
#         [word]
#     ]
#     ..
#     [phrase
#         [word]
#         [word]
#     ]
#
# After the ``where`` there is a sequence of objects, which in turn may contain objects.
# The result of firing this query is a so-called *sheaf*. It is a list of results or *straws*, where each straw looks like the sequence of objects after the ``where``. These objects are the *matched objects* or *grains*.
# And here is the catch: a grain may contain a sheaf itself, because the objects inside objects may also have multiple subresults in the data.
#
# In other words: a sheaf is a recursive structure: it is a list of straws, which are lists of grains, which are monads (words) or objects containing a sheaf.
#
# The sheaf is a very economical representation of the set of tree fragments that are the result of an MQL query.
#
# Yet for some purposes it is necessary to have a list of ordinary results. We provide a method to generate results from a sheaf.
# What this method does can be thought of as making copies of the sheaf and, wherever there is a sheaf (a list of straws), replacing that sheaf by a single straw chosen from it. The results correspond to all possible ways of making those choices.
#
# In other words: a ``result`` is a recursive structure: it is a list of grains, which are monads (words) or objects containing a result.
#
# Put otherwise: a ``result`` is a simplified sheaf, without the aggregating level of sheaf, leaving only straws and grains.

# # MQL API
# Inside the *etcbc* package there is a module *mql*.
# This module exposes two classes: ``MQL`` and ``Sheaf``.
#
#     from etcbc.mql import MQL
#
# You initialize the MQL object after loading LAF-Fabric by passing the API as a parameter:
#
#     Q = MQL(API)
#
# If you have a query, e.g.
# the example above as a string in a variable ``query``, you can say:
#
#     sheaf = Q.mql(query)
#
# Then you have the results of the query in ``sheaf``. It is a list of lists of tuples (corresponding to *sheaf*, *straw*, *grain*),
# where a grain is either an integer, which is the node corresponding to a monad (word object), or a tuple ``(node, subsheaf)``, where ``node`` corresponds to an object of another type, containing a sheaf of subobjects.
#
# ``sheaf`` is an object of class ``Sheaf``, which has the following methods:
#
# * ``render(callable)``: prints the sheaf in a pretty format; each word is rendered by applying ``callable`` to its node.
# * ``compact(callable)``: returns a compact representation of the sheaf as a string; ``callable`` has the same meaning as above.
# * ``results()``: yields (as a generator) the results represented by the sheaf.
# * ``compact_results(callable)``: returns the compact representations of the results of the sheaf.

# In[1]:


import sys
import collections
import subprocess

from lxml import etree

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
from etcbc.mql import MQL
fabric = LafFabric()


# In[2]:


API = fabric.load('etcbc4', '--', 'mql', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        oid otype monads
        g_word_utf8 g_cons lex
        function
        book chapter verse label
    ''','''
        functional_parent
    '''),
    "prepare": prepare,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))
Q = MQL(API)


# In[8]:


qu1 = '''
select all objects where
[subphrase
    [word lex="M