Notebook

TPC-H Query 5 – Tables¶

This IPython notebook is part of a series of tutorials that introduce how data algebra facilitates querying data from multiple sources and in different structures, on the example of a modified TPC-H query.

The tutorials assume basic familiarity with our library; we suggest working through our Hello_World tutorial first. They also assume basic knowledge of relational data, RDF and XML.

In some cases, later parts of the tutorial assume knowledge of concepts introduced in earlier parts, so it is best to work through them in the listed sequence:

1-Introduction: Introduces this series of tutorials, TPC-H, the TPC-H query 5 and our modifications to it.
2-Tables (this tutorial): Introduces our representation of tabular data, on the example of CSV.
3-Graphs: Introduces our representation of RDF-style tabular graph data, on the example of Turtle.
4-Hierarchies: Introduces our representation of hierarchical data, on the example of XML.
5-Query: Brings it all together and explains the whole query.

Tabular Data In the Query¶

In our modified query 5, the tables customer, orders and lineitem are provided as CSV data, the original tables region and nation are provided as XML data and the original table supplier is provided as RDF graph. In this tutorial, we present our representation of tabular data, how to import it from the CSV format (see also RFC 4180) and how to execute simple operations on it like project and restrict.

As examples, we use two subqueries from the modified query 5, one slightly modified. We treat them here as stand-alone queries, so that it is easier to understand what they do.

The `customers_joined_projected` Subquery¶

Our first example query loads the table customer and selects the columns custkey and nationkey for all customers in the nations given by the table nations. In the full query, the result of this subquery is then referenced as customers_joined_projected.

(
    SELECT 
        custkey, nationkey, nationname
    FROM 
        customer
    JOIN
        nations
    ON 
        customer.nationkey = nations.nationkey
) AS customers_joined_projected

This example is a simplified version of the following subquery in our modified query 5. To keep this example simple, we replaced the pseudo-subquery (in XQuery) that provides the result that is named nations with a hardcoded table nations. The translation of the XQuery subquery is shown later, in 4-Hierarchies.

(
    SELECT 
        custkey, nationkey, nationname
    FROM 
        customer
    JOIN (
        -- This is a pseudo-subquery in XQuery. It extracts a list of nations
        -- for the desired region, and for every nation it returns an XML 'row' 
        -- with the columns 'nationkey' and 'nationname' (renamed from 'name').
        -- This forms an XML pseudo-'table' named 'nations' that is then joined 
        -- with 'customer'.
        for $x in doc("regions.xml")/regions/region[name="MIDDLE EAST"]/nation
            return <nation>{$x/nationkey}<nationname>{data($x/name)}</nationname></nation>
    ) AS nations
    ON
        customer.nationkey = nations.nationkey
) AS customers_joined_projected

The `orders_restricted_projected` Subquery¶

Our second example query loads the table orders and selects the columns orderkey and custkey for all orders in the given date range. In the full query, the result of this subquery is then referenced as orders_restricted_projected.

(
    SELECT 
        orderkey, custkey
    FROM 
        orders
    WHERE 
        startdate <= orders.orderdate and orders.orderdate < enddate 
) AS orders_restricted_projected

This example is taken directly from the corresponding subquery in our modified query 5.

Representation of Tables¶

We represent tables, rows and columns as constructs built of sets and couplets.

Mathematical concepts:

A couplet is an ordered pair $a{\mapsto}b \in A \times A$ with $a, b \in A$ (using the Kuratowski definition $a{\mapsto}b := \{\{a\}, \{a, b\}\}$). We call $a$ the left part of the couplet, and $b$ the right part.
A relation is a set of couplets. If the relation is (left-)functional, it is a function from the left parts of the couplets to the right parts.
A clan is a set of relations.

Data model:

A table can be modeled as a clan. In detail:

A cell in a table is represented by a couplet, where the left part is the column (column name), and the right part is the cell value.
- In our data algebra notation, this could look like $custkey{\mapsto}15$.
A row is a relation (a set of couplets). Because every row has only a single value in a given column and no two column names are the same (by definition), row relations are always left-functional and are a function from column names to the cell values.
- In our data algebra notation, this could look like $\{custkey{\mapsto}15, phone{\mapsto}\text{33-687-542-7601}\}$.
A table is a set of rows/functions.
- In our data algebra notation, this could look like $\{\{custkey{\mapsto}1, phone{\mapsto}\text{25-989-741-2988}\}, \{custkey{\mapsto}15, phone{\mapsto}\text{33-687-542-7601}\}\}$.

All three constructs (couplets/cells, relations/rows, clans/tables) are associated with a number of operations. See also our Hello_World introduction.

The `customer` Table¶

In the first example query, we import the CSV table customer, select the customers in a given set of nations nationkeys and project the useful columns custkey, nationkey and nationname. The associated SQL query looks like this:

SELECT 
    custkey, nationkey, nationname
FROM 
    customer
JOIN
    nations
ON 
    customer.nationkey = nations.nationkey

Below follows example code that shows how to use our API to import the table customer from a CSV file into our MathObject representation.

algebraixlib.io.csv provides utilities for importing and exporting CSV data.
iprint_latex is a utility that prints our MathObjects in LaTeX format in IPython notebooks. The short=True argument tells it to create abbreviated output. It shortenes long strings, indicated by the appended ellipses '...'. It also only prints the first few elements of sets and replaces the remainder with an ellipsis, followed by a number in parentheses that indicates how many objects have been ommitted.
customer_types defines the types for columns that are not strings. (It does not select columns to be imported; all columns not listed here are imported as strings.) It is passed as argument to import_csv, which then processes the indicated CSV columns accordingly.

In [1]:

import algebraixlib.io.csv as csv
from algebraixlib.util.latexprinter import iprint_latex

customer_types = {'custkey': int, 'nationkey': int, 'acctbal': float}
customers = csv.import_csv('customer.csv', customer_types)
iprint_latex('customers', short=True)

$$customers = \left\{\begin{array}{l}\left\{\color{red}{\mbox{'acctbal'}}{\mapsto}{\color{blue}{\mbox{-234.12}}},\ \color{red}{\mbox{'address'}}{\mapsto}{\color{blue}{\mbox{',wSZXdVR ...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'x-ray fin...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{125}}},\ \text{...}(4)\right\},\\ \left\{\color{red}{\mbox{'acctbal'}}{\mapsto}{\color{blue}{\mbox{-272.6}}},\ \color{red}{\mbox{'address'}}{\mapsto}{\color{blue}{\mbox{'PkWS 3HlX...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'ckages. r...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{11}}},\ \text{...}(4)\right\},\\ \left\{\color{red}{\mbox{'acctbal'}}{\mapsto}{\color{blue}{\mbox{-362.86}}},\ \color{red}{\mbox{'address'}}{\mapsto}{\color{blue}{\mbox{'putjlmskx...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'ithely fi...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{72}}},\ \text{...}(4)\right\},\\ \left\{\color{red}{\mbox{'acctbal'}}{\mapsto}{\color{blue}{\mbox{-551.37}}},\ \color{red}{\mbox{'address'}}{\mapsto}{\color{blue}{\mbox{'7yiheXNSp...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'ages. fur...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{98}}},\ \text{...}(4)\right\},\\ \text{...}(146)\end{array}\right\}$$

Next we create a mockup of the result of the pseudo-subquery in XQuery (which is explained in 4-Hierarchies). nations is a clan (table) with the lefts (columns) nationkey and nationname and for our mockup we add three rows/nations.

A clan is a Set of Sets of Couplets (a set of relations).

algebraixlib.mathobjects provides the classes that represent our data.

In [2]:

from algebraixlib.mathobjects import Set, Couplet

nations = Set(Set(Couplet('nationkey', 4), Couplet('nationname', 'EGYPT')),
              Set(Couplet('nationkey', 10), Couplet('nationname', 'IRAN')),
              Set(Couplet('nationkey', 11), Couplet('nationname', 'IRAQ')))
iprint_latex('nations')

$$nations = \left\{\begin{array}{l}\left\{\color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{10}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'IRAN'}}}\right\},\\ \left\{\color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{11}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'IRAQ'}}}\right\},\\ \left\{\color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{4}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'EGYPT'}}}\right\}\end{array}\right\}$$

We now join the two tables. They have only one column with matching names, which is the column on which they are to be joined. (While this is a lucky coincidence, the principle of the operation wouldn't be affected if we had to rename a column or two.)

Mathematically, we represent such a join (where the column names align just right) as functional union:

$$ \begin{align*} \text{Functional union}:\ &R \underset{f}{\cup} Q &&:=\begin{cases} R \cup Q & \text{ if } R \cup Q \text{ is functional } \\ \text{undefined} & \text{ otherwise } \end{cases} & \text{ for } R, Q \in P(M \times M)\\ \text{Functional cross-union}:\ &\mathbb{C} \underset{f}{\blacktriangledown} \mathbb{D} &&:= \{R \underset{f}{\cup} Q\ : R \in \mathbb{C} \text{ and } Q \in \mathbb{D}\} & \text{ for } \mathbb{C}, \mathbb{D} \in P^2(M \times M) \end{align*} $$

With this we get:

$$ CustomersNations = Customer \underset{f}{\blacktriangledown} Nations $$

Since only a part of the relations in the customers clan have a couplet with a left part of nationkey and a right that matches a right in the nations clan`s nationkey lefts, the number of relations goes down from 150 (in the customers clan) to 19 (in the result of the join).

algebraixlib.algebras.clans contains the functions that are related to our algebra of clans.
clans.cross_functional_union executes the functional cross-union ($\underset{f}{\blacktriangledown}$).

In [3]:

import algebraixlib.algebras.clans as clans

customers_nations = clans.cross_functional_union(customers, nations)
iprint_latex('customers_nations', short=True)

$$customers\_nations = \left\{\begin{array}{l}\left\{\color{red}{\mbox{'acctbal'}}{\mapsto}{\color{blue}{\mbox{-588.38}}},\ \color{red}{\mbox{'address'}}{\mapsto}{\color{blue}{\mbox{'9mcCK L7r...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'rate care...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{104}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'acctbal'}}{\mapsto}{\color{blue}{\mbox{-986.96}}},\ \color{red}{\mbox{'address'}}{\mapsto}{\color{blue}{\mbox{'AmKUMlJf2...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'ing packa...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{128}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'acctbal'}}{\mapsto}{\color{blue}{\mbox{162.57}}},\ \color{red}{\mbox{'address'}}{\mapsto}{\color{blue}{\mbox{'QM5YabAsT...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'uickly ca...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{132}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'acctbal'}}{\mapsto}{\color{blue}{\mbox{2135.6}}},\ \color{red}{\mbox{'address'}}{\mapsto}{\color{blue}{\mbox{'BhSPlEWGv...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'ing to th...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{148}}},\ \text{...}(5)\right\},\\ \text{...}(15)\end{array}\right\}$$

We then compose the result to project the desired columns.

When composing a clan with a clan diagonal – a clan with a single relation that contains only couplets where left and right are the same – we remove all couplets with left parts not present in the clan diagonal from all relations, effectively projecting the columns present in the clan diagonal.

$$ CustomersNations_{Proj} = CustomersNations \circ \{\{custkey{\mapsto}custkey, nationkey{\mapsto}nationkey, nationname{\mapsto}nationname\}\} $$

Composition on couplets, relations and clans is defined as follows:

$$ \begin{align*} \text{Couplet composition}:\ &c{\mapsto}d \circ a{\mapsto}b &&:= \begin{cases} a{\mapsto}d & \text{if } b = c \\ \text{undefined} & \text{otherwise} \end{cases} &\text{for } a, b, c, d \in M \\ \text{Relation composition}:\ &Rel_2 \circ Rel_1 &&:= \{cp_2 \circ cp_1\ :\ cp_1 \in Rel_1,\ cp_2 \in Rel_2\} &\text{for } Rel_1,\ Rel_2 \in P(M \times M) \\ \text{Clan composition}:\ &Clan_2 \circ Clan_1 &&:= \{Rel_2 \circ Rel_1\ :\ Rel_1 \in Clan_1,\ Rel_2 \in Clan_2\} &\text{ for } Clan_1,\ Clan_2 \in P^2(M \times M) \\ \end{align*} $$

The number of relations in the clan (19) doesn't change, but the number of couplets in each relation is reduced to the ones that have the desired column names as lefts.

clans.compose executes the clan composition ($\circ$).

In [4]:

customers_nations_composed = clans.compose(customers_nations, 
    Set(Set(Couplet('custkey', 'custkey'), 
            Couplet('nationkey', 'nationkey'), 
            Couplet('nationname', 'nationname'))))
iprint_latex('customers_nations_composed', short=True)

$$customers\_nations\_composed = \left\{\begin{array}{l}\left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{104}}},\ \color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{10}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'IRAN'}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{105}}},\ \color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{10}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'IRAN'}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{110}}},\ \color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{10}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'IRAN'}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{128}}},\ \color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{4}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'EGYPT'}}}\right\},\\ \text{...}(15)\end{array}\right\}$$

This operation (a composition of a clan with a clan diagonal) is so common that we have a wrapper function for it in the clans algebra:

clans.project accepts a clan as argument, followed by one or more atoms. It creates a clan diagonal from these atoms and composes the clan with it, effectively projecting the columns passed in as arguments.

In [5]:

customers_nations_projected = clans.project(customers_nations, 'custkey', 'nationkey', 'nationname')
iprint_latex('customers_nations_projected', short=True)

$$customers\_nations\_projected = \left\{\begin{array}{l}\left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{104}}},\ \color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{10}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'IRAN'}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{105}}},\ \color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{10}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'IRAN'}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{110}}},\ \color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{10}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'IRAN'}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{128}}},\ \color{red}{\mbox{'nationkey'}}{\mapsto}{\color{blue}{\mbox{4}}},\ \color{red}{\mbox{'nationname'}}{\mapsto}{\color{blue}{\mbox{'EGYPT'}}}\right\},\\ \text{...}(15)\end{array}\right\}$$

The `orders` Table¶

In the second example query of this notebook, we import the CSV table orders, select the orders in a given date range (from startdate to enddate) and project the useful columns orderkey and custkey. The associated SQL query looks like this:

SELECT 
    orderkey, custkey
FROM 
    orders
WHERE 
    startdate <= orders.orderdate and orders.orderdate < enddate

First, we again import the table orders from a CSV file.

order_types again defines the types for columns that need special handling. All columns not listed here are imported as strings.
read_date shows how custom functions can be used instead of standard types to perform special conversions, on the example of the column orderdate. For each row, the function receives the CSV column value as string and returns what is needed (a datetime in this case).

In [6]:

print('The display of this data may take a while. 15 s is not unusual... \n', flush=True)

from datetime import datetime

def read_date(date_str: str) -> datetime:
    return datetime.strptime(date_str, '%Y-%m-%d').date()

order_types = {
    'orderkey': int, 'custkey': int, 'orderdate': read_date,
    'totalprice': float, 'shippriority': int
}
orders = csv.import_csv('orders.csv', order_types)
iprint_latex('orders', short=True)

The display of this data may take a while. 15 s is not unusual...

$$orders = \left\{\begin{array}{l}\left\{\color{red}{\mbox{'clerk'}}{\mapsto}{\color{blue}{\mbox{'Clerk#000...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'es haggle...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{32}}},\ \color{red}{\mbox{'orderdate'}}{\mapsto}{\color{blue}{\mbox{datetime.d...}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'clerk'}}{\mapsto}{\color{blue}{\mbox{'Clerk#000...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'n deposit...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{28}}},\ \color{red}{\mbox{'orderdate'}}{\mapsto}{\color{blue}{\mbox{datetime.d...}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'clerk'}}{\mapsto}{\color{blue}{\mbox{'Clerk#000...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{' accounts...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{131}}},\ \color{red}{\mbox{'orderdate'}}{\mapsto}{\color{blue}{\mbox{datetime.d...}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'clerk'}}{\mapsto}{\color{blue}{\mbox{'Clerk#000...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'ealthy pi...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{73}}},\ \color{red}{\mbox{'orderdate'}}{\mapsto}{\color{blue}{\mbox{datetime.d...}}},\ \text{...}(5)\right\},\\ \text{...}(1496)\end{array}\right\}$$

Then we restrict the clan orders using the function select_dates. It returns True if the right part associated with the left orderdate in the relation rel falls between startdate (inclusive) and enddate (exclusive).

As used here, our function sets.restrict implements a set builder of this form:

$$ Orders_{Restricted} = \{order\ :\ order \in Orders \text{ and } SelectDates(order) \text{ is true}\} \\ $$

The restriction reduces the number of relations in the clan from TODO to 16 (for the given date range).

In [7]:

import algebraixlib.algebras.sets as sets

startdate = datetime(1994, 1, 1).date()
enddate = datetime(1994, 2, 1).date()

def select_dates(rel) -> bool:
    orderdate = rel('orderdate').value
    return (startdate <= orderdate) and (orderdate < enddate)

orders_restricted = sets.restrict(orders, select_dates)
iprint_latex('orders_restricted', short=True)

$$orders\_restricted = \left\{\begin{array}{l}\left\{\color{red}{\mbox{'clerk'}}{\mapsto}{\color{blue}{\mbox{'Clerk#000...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'al ideas ...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{149}}},\ \color{red}{\mbox{'orderdate'}}{\mapsto}{\color{blue}{\mbox{datetime.d...}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'clerk'}}{\mapsto}{\color{blue}{\mbox{'Clerk#000...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'ts are. f...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{70}}},\ \color{red}{\mbox{'orderdate'}}{\mapsto}{\color{blue}{\mbox{datetime.d...}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'clerk'}}{\mapsto}{\color{blue}{\mbox{'Clerk#000...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'slyly. pe...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{136}}},\ \color{red}{\mbox{'orderdate'}}{\mapsto}{\color{blue}{\mbox{datetime.d...}}},\ \text{...}(5)\right\},\\ \left\{\color{red}{\mbox{'clerk'}}{\mapsto}{\color{blue}{\mbox{'Clerk#000...}}},\ \color{red}{\mbox{'comment'}}{\mapsto}{\color{blue}{\mbox{'ng accoun...}}},\ \color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{4}}},\ \color{red}{\mbox{'orderdate'}}{\mapsto}{\color{blue}{\mbox{datetime.d...}}},\ \text{...}(5)\right\},\\ \text{...}(12)\end{array}\right\}$$

Finally we again project the desired columns orderkey and custkey, using the function clans.project:

$$ Orders_{Restricted,Projected} = Orders_{Restricted} \circ \{\{orderkey{\mapsto}orderkey, custkey{\mapsto}custkey\}\}) $$

This again doesn't change the number of relations in the clan (16), but the number of couplets in each relation is reduced to the ones that have the desired column name as left.

In [8]:

orders_restricted_projected = clans.project(orders_restricted, 'orderkey', 'custkey')
iprint_latex('orders_restricted_projected', short=True)

$$orders\_restricted\_projected = \left\{\begin{array}{l}\left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{100}}},\ \color{red}{\mbox{'orderkey'}}{\mapsto}{\color{blue}{\mbox{5316}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{104}}},\ \color{red}{\mbox{'orderkey'}}{\mapsto}{\color{blue}{\mbox{4323}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{109}}},\ \color{red}{\mbox{'orderkey'}}{\mapsto}{\color{blue}{\mbox{5638}}}\right\},\\ \left\{\color{red}{\mbox{'custkey'}}{\mapsto}{\color{blue}{\mbox{118}}},\ \color{red}{\mbox{'orderkey'}}{\mapsto}{\color{blue}{\mbox{290}}}\right\},\\ \text{...}(12)\end{array}\right\}$$

Next Step¶

Continue with 3-Graphs; it introduces our representation of RDF-style tabular graph data, on the example of Turtle.

This file is part of algebraixlib .

algebraixlib is free software: you can redistribute it and/or modify it under the terms of version 3 of the GNU Lesser General Public License as published by the Free Software Foundation.

algebraixlib is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with algebraixlib. If not, see GNU licenses.

TPC-H Query 5 – Tables¶

Tabular Data In the Query¶

The customers_joined_projected Subquery¶

The orders_restricted_projected Subquery¶