Treepace - Tree Pattern Replace¶

Welcome to Treepace tutorial. First, we import the library:

In [1]:

from treepace import *

Data structures¶

Nodes¶

The basic unit of all trees is a node.

In [2]:

Node("label")

Out[2]:

In Treepace, any object (not only a string) can become a label of the node.

In [3]:

from glob import glob
from IPython.display import display
with open(glob('*.ipynb')[0], 'rb') as file_handle:
    display(Node(file_handle))

Trees¶

A node has children, which can have other children...

In [4]:

root = Node('root',
            [Node('c1'), Node('c2',
                              [Node('subchild')])])

A tree is defined by the reference to the root node.

In [5]:

Tree(root)

Out[5]:

It is possible to load and save a tree to various formats like tab-indented / parenthesized text or XML.

In [6]:

print(Tree.load('root (element1 (sub-element) element2)').save(IndentedText))

root
    element1
        sub-element
    element2

Subtrees¶

A subtree is a connected part of the tree consisting of the selected nodes of the main tree (highlighted with blue).

In [7]:

Subtree([root, root.children[1]])

Out[7]:

As we will see later, searching methods return Match objects. Each match consists of groups (subtrees), where the group 0 represents the whole match – just like in a regex. In this tutorial, it will be highlighted with green color.

In [8]:

c2 = root.children[1]
Match([Subtree([c2, c2.parent]),
       Subtree([c2])
      ])

Out[8]:

Searching¶

To search for a pattern anywhere in the tree, use the search() method. The result is a list of matches.

One node patterns¶

The most basic pattern is a dot which matches one arbitrary node.

In [9]:

tree = Tree.load('a (b c)')
tree.search('.')

Out[9]:

,

A text literal matches the nodes whose string representation is equal to the given literal.

In [10]:

tree.search('a')

Out[10]:

A pattern can contain arbitrary Python code, enclosed in square brackets. The expression is evaluated for each relevant node (accessible in the expression via the variable node) and matches if its result equals True.

In [11]:

tree.search('[node.value != "c"]')

Out[11]:

,

An underscore is a shortcut for node.value.

In [12]:

tree.search('[_.upper() == "C"]')

Out[12]:

Relations¶

Multiple node patterns can be connected using relations. In the following example, we search for a node 'a' which has a child 'b'. The whole subtree is returned – not only the final component.

In [13]:

tree.search('a < b')

Out[13]:

Other availabe relations are: immediately following sibling (,), any sibling (&) and parent (>).

In [14]:

tree.search('a < b, c')[0]

Out[14]:

In [15]:

tree.search('a < c & b')[0]

Out[15]:

The 'parent' relationship is implicitly followed by a 'match any node' pattern. This is useful to form queries like this:

In [16]:

Tree.load('a (b (c) d (e))').search('a < b <c>, d')[0]

Out[16]:

Groups¶

To mark a part of the match as a group, use brackets. The groups are numbered from 1 and can be nested.

In [17]:

tree.search('{a < {b}, {c}}')[0]

Out[17]:

It is possible to back-reference saved groups by $n.

In [18]:

Tree.load('m (n (o) m (n))').search('{m < n}, $1')[0]

Out[18]:

More complicated relationship between the nodes in a match can be expressed using back-references in a predicate.

In [19]:

nums = Tree(Node(1, [Node(-1), Node(0.5)]))
match = nums.search('{[_ != 2]} < [abs(_) == $1]')
match[0].group(0)

Out[19]:

Other searching methods¶

To assert that the match must begin exactly at the root node, use the match() method.

In [20]:

Tree.load('node (node (node))').match('node < node')

Out[20]:

If the match must cover all nodes of the tree, the fullmatch() method can be called. This is useful for validation.

In [21]:

fruits = Tree.load('fruits (apple pear apple)')
display(fruits)
if fruits.fullmatch('fruits < apple & pear'):
    print('The stock contains at least one apple and pear, but no other fruit.')
else:
    print('The condition is not met.')

The stock contains at least one apple and pear, but no other fruit.

Replacing¶

Basic replacing¶

The replace() method substitutes all matches of the pattern with the given replacement. Although it is not necessary, we will first search for the pattern (for illustration):

In [22]:

shop = Tree.load('shop (item (bread) item (water) item (roll) item (water))')
pattern = '{item} < water'
display(shop.search(pattern))

,

The actual replacement is simple:

In [23]:

shop.replace(pattern, '$1 < juice')
display(shop)

Transformation¶

The transformation consists of one or more rules in the form: pattern -> replacement. Each rule is repeated until a match is found. In addition, the whole list of rules is repeatead while at least one rule finds a match. To illustrate this behavior, the following transformation is performed:

In [24]:

subject = Tree.load('a (b)')
print('Original:')
display(subject)

subject.transform('''x -> y
                     a -> x''')
print('Transformed:')
display(subject)

Original:

Transformed:

A more useful transformation follows. Here is a sample XML document:

In [25]:

text = '''<article>
              <heading>An example</heading>
              <content>
                  <calc>
                      <plus>
                          <elem>3</elem>
                          <elem>4</elem>
                      </plus>
                  </calc>
              </content>
          </article>'''
doc = Tree.load(text, XmlText)
doc

Out[25]:

We will replace a semantic document representation with its visual HTML form and solve a mathematical expression.

In [26]:

doc.transform('''
article -> html < body
heading -> h1
content -> p
calc < plus < elem<{.}>, elem<{.}> -> [text(num($1) + num($2))]
''')
display(doc)
print(doc.save(XmlText))

<html>
    <body>
        <h1>An example</h1>
        <p>7</p>
    </body>
</html>

This concludes the tutorial. You can install the library by running

py -m pip install treepace

on Windows or

pip install treepace

on Linux.