The following tutorial gives a quick overview of both data algebra and the algebraixlib Python package.
Data in an algebraixlib program is represented as MathObjects. MathObjects come in four types: Atom, Couplet, Set, and Multiset. Multisets will not be covered in this tutorial. Values that aren't themselves modeled by data algebra, such as strings and numbers, are represented by Atoms.
from algebraixlib.mathobjects import Atom
peanut_butter = Atom("peanut butter")
jelly = Atom("jelly")
Every MathObject can be pretty-printed to the console using print().
print(peanut_butter)
'peanut butter'
The underlying non-MathObject value of an Atom can be accessed through its value property.
try:
    one = Atom(1)
    two = Atom(2)
    print("1 + 2 = {}".format(one.value + two.value))
    print("Will throw", one + two)
except TypeError as e:
    print("Error:", e)
1 + 2 = 3
Error: unsupported operand type(s) for +: 'Atom' and 'Atom'
Couplets relate two pieces of information together. Those pieces of information must be represented as MathObjects; our two Atoms from earlier qualify.
from algebraixlib.mathobjects import Couplet
from algebraixlib.util.latexprinter import iprint_latex
from IPython.display import Math, display
import algebraixlib.util.latexprinter
algebraixlib.util.latexprinter.Config.colorize_output = False
together = Couplet(peanut_butter, jelly)
iprint_latex("together")
iprint_latex("nested", Couplet(together, together))
MathObject initializers will coerce their arguments to Atoms if non-MathObjects are passed.
coerced = Couplet("this", "that")
print(repr(coerced))
Couplet(left=Atom('this'), right=Atom('that'))
The components of a Couplet are known as its left and right. Sometimes initializing a Couplet with named arguments can add clarity.
up_down = Couplet(left="up", right="down")
print("left is {}, right is {}".format(up_down.left, up_down.right))
left is 'up', right is 'down'
A Couplet's components can be swapped by evaluating the unary operation $transpose$.
import algebraixlib.algebras.couplets as couplets
one_two = Couplet(1, 2)
transpose_result = couplets.transpose(one_two)
print("A couplet {} and its transpose {}".format(one_two, transpose_result))
iprint_latex("one_two")
iprint_latex("transpose_result")
A couplet (1->2) and its transpose (2->1)
When an expression is undefined in algebraixlib, it returns a special value, the singleton Undef. Undef cannot be used as a value in a MathObject and cannot be compared to any value (even itself). Use the is and is not operators to test if a value is undefined.
from algebraixlib.undef import Undef
print(Undef() is Undef())
print(Undef() is not Undef())
print(None is not Undef())
True
False
True
The binary operation $composition(a{\mapsto}b, c{\mapsto}d)$ evaluates to $c{\mapsto}b$ when $a = d$, otherwise it is undefined. Composition is often written with the infix operator $\circ$.
a_to_b = Couplet('a', 'b') # a->b
b_to_c = Couplet('b', 'c') # b->c
iprint_latex(r"b{\mapsto}c \circ a{\mapsto}b", couplets.compose(b_to_c, a_to_b))  # b->c * a->b = a->c
iprint_latex(r"a{\mapsto}b \circ b{\mapsto}c", couplets.compose(a_to_b, b_to_c))  # undef, composition is not commutative
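The rule for composition can also be traced by hand. The following is a minimal sketch in plain Python, assuming a couplet is modeled as a (left, right) tuple and an undefined result as None; the name compose_couplets is hypothetical and is not part of algebraixlib.

```python
# Plain-Python model, not the algebraixlib API: a couplet l->r is a (left, right) tuple.
def compose_couplets(first, second):
    """Return the composition of first with second, or None when it is undefined."""
    a, b = first    # first  is a->b
    c, d = second   # second is c->d
    # Defined only when the left of `first` equals the right of `second`.
    return (c, b) if a == d else None


a_to_b = ("a", "b")
b_to_c = ("b", "c")

defined_result = compose_couplets(b_to_c, a_to_b)    # ('a', 'c')
undefined_result = compose_couplets(a_to_b, b_to_c)  # None: 'a' != 'c'
```

Swapping the arguments changes the result from a couplet to undefined, which is the non-commutativity noted in the comment above.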
Sets are used to create unordered collections of unique MathObjects. Note that this is a different class than Python's built-in set container. Non-MathObjects will be coerced into Atoms by Set's initializer.
from algebraixlib.mathobjects import Set
many = Set(Atom("hello"), "world", Couplet("hola", "mundo"), "duplicate", "duplicate")
iprint_latex("many")
print("repr = ", repr(many))
repr = Set(Atom('duplicate'), Atom('hello'), Couplet(left=Atom('hola'), right=Atom('mundo')), Atom('world'))
Sets support for...in syntax for iteration and in and not in syntax for membership tests. Because sets are unordered, they do not support random access (no bracket operator).
nums = Set(1, 2, 3, 4, 5)
for elem in nums:
    print(elem)
print(1 in nums)
print(7 in nums)
1
2
3
4
5
True
False
Sets support union, intersection, and set difference. Relations such as is_subset_of and is_superset_of are also defined.
a = Set(1, 2)
b = Set(2, 3)
import algebraixlib.algebras.sets as sets
print("union(a, b) =", sets.union(a, b))
print("intersect(a, b) =", sets.intersect(a, b))
print("minus(a, b) =", sets.minus(a, b))
print("is_subset(a, b) =", sets.is_subset_of(a, b))
print("is_superset(a, {1}) =", sets.is_superset_of(a, Set(1)))
union(a, b) = {1, 2, 3}
intersect(a, b) = {2}
minus(a, b) = {1}
is_subset(a, b) = False
is_superset(a, {1}) = True
We can use a Couplet to model a single truth, such as ${sky}{\mapsto}{blue}$ or ${name}{\mapsto}{jeff}$. By collecting multiple Couplets together in a Set, we form a mathematical model of a data record. This data structure, called a binary relation (abbreviated from here on as simply "relation"), is the fundamental set theory construct in a data algebra program.
record_relation = Set(Couplet('id', 123), Couplet('name', 'jeff'), Couplet('loves', 'math'),
                      Couplet('loves', 'code'))
iprint_latex("record_relation")
Some relations specify a function from each couplet's left component to its right. This is the case when every left value maps to exactly one right value. Such a relation is called "left functional". Likewise, a relation can be said to be "right functional" when every right value maps to exactly one left value.
import algebraixlib.algebras.relations as relations
functional_relation = Set(Couplet('subject', 123), Couplet('name', 'james'), Couplet('level', 10))
print(relations.get_right(functional_relation, 'subject'))
print(relations.get_left(functional_relation, 123))
print(relations.get_right(record_relation, 'loves'))  # See non-functional record_relation above.
123
'subject'
undef
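To make the two definitions concrete, here is a small sketch in plain Python, assuming a relation is modeled as a set of (left, right) tuples. The helper names is_left_functional and is_right_functional are illustrative and are not claimed to match algebraixlib's own functions.

```python
# Plain-Python model (not the algebraixlib API): a relation is a set of (left, right) tuples.
def is_left_functional(relation):
    """True when no left component appears in more than one couplet."""
    lefts = [left for left, _ in relation]
    return len(lefts) == len(set(lefts))


def is_right_functional(relation):
    """True when no right component appears in more than one couplet."""
    rights = [right for _, right in relation]
    return len(rights) == len(set(rights))


record_relation = {("id", 123), ("name", "jeff"), ("loves", "math"), ("loves", "code")}
functional_relation = {("subject", 123), ("name", "james"), ("level", 10)}

record_left = is_left_functional(record_relation)          # False: 'loves' appears twice
record_right = is_right_functional(record_relation)        # True: every right is unique
functional_left = is_left_functional(functional_relation)  # True
```

This is why looking up 'loves' in record_relation is undefined: there is no single right value to return.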
Function evaluation syntax makes this more concise for left functional relations.
subject = functional_relation('subject')
one_two_three = functional_relation(123)
iprint_latex(r"functional_relation(\mbox{'subject'})", subject)
iprint_latex("functional_relation(123)", one_two_three)
The power set of a set $S$, which we'll denote as $P(S)$, is the set of all subsets of $S$. Note how in the example below, the elements of set_s are numbers, and the elements of powerset_s are sets of numbers.
set_s = Set(1, 2, 3)
powerset_s = sets.power_set(set_s)
iprint_latex("S", set_s)
iprint_latex("P(S)", powerset_s)
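For comparison, the same construction can be sketched with the Python standard library alone. This assumes built-in sets and frozensets stand in for algebraixlib Sets; the function name power_set here is a local helper, not the library's implementation.

```python
from itertools import chain, combinations


def power_set(s):
    """Return every subset of s as a frozenset, collected in a set."""
    items = list(s)
    # Take all combinations of size 0, 1, ..., len(items).
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return {frozenset(subset) for subset in subsets}


set_s = {1, 2, 3}
powerset_s = power_set(set_s)
subset_count = len(powerset_s)  # 2 ** 3 == 8
```

A set with $n$ elements has $2^n$ subsets, including the empty set and the set itself.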
Consider that if $C$ is the set of all couplets, then the set of all relations $R$ can be defined as $P(C)$, that is, every relation is an element of the power set of all couplets. It turns out that we can exploit this relationship by "extending" operations on couplets to relations and make them useful there. To extend a unary operation such as couplets.transpose, we apply it to every Couplet in a relation, which results in another relation.
import algebraixlib.extension as ext
first_relation = Set(Couplet('a', 1), Couplet('b', 2), Couplet('c', 3))
transposed_relation = ext.unary_extend(first_relation, couplets.transpose)
iprint_latex("first_relation")
iprint_latex("transposed_relation")
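The idea behind unary extension can be sketched in a few lines of plain Python. This is an illustration under the assumption that couplets are (left, right) tuples and undefined results are None; the helpers below only mimic the argument order of ext.unary_extend and are not the library's code.

```python
# Plain-Python model (not the algebraixlib API).
def transpose(couplet):
    """Swap the left and right components of a couplet."""
    left, right = couplet
    return (right, left)


def unary_extend(relation, op):
    """Apply op to every element and collect the defined (non-None) results."""
    results = (op(element) for element in relation)
    return frozenset(result for result in results if result is not None)


first_relation = frozenset({("a", 1), ("b", 2), ("c", 3)})
transposed_relation = unary_extend(first_relation, transpose)
# transposed_relation contains (1, 'a'), (2, 'b') and (3, 'c')
```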
Similarly, a binary operation like couplets.compose can be extended by evaluating it for every element of the cross product of two relations. Notice that couplets.compose is a partial binary operation (given two legitimate Couplets, it may be undefined). When couplets.compose(a, b) is not defined, the result simply isn't included in the resulting relation. By extending, we have turned $composition$ into a full binary operation in the power set algebra.
second_relation = Set(Couplet('one', 'a'), Couplet('won', 'a'), Couplet('four', 'd'))
composed_relation = ext.binary_extend(first_relation, second_relation, couplets.compose)
empty_relation = ext.binary_extend(second_relation, first_relation,
                                   couplets.compose)  # empty relation; still not commutative
iprint_latex("second_relation")
iprint_latex("composed_relation")
iprint_latex("empty_relation")
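Binary extension can be traced the same way. The sketch below is plain Python under the assumption that couplets are (left, right) tuples and undefined results are None; compose_couplets and binary_extend are hypothetical stand-ins, not the algebraixlib functions.

```python
from itertools import product


# Plain-Python model (not the algebraixlib API).
def compose_couplets(first, second):
    """Compose a->b with c->d into c->b when a == d, else None (undefined)."""
    a, b = first
    c, d = second
    return (c, b) if a == d else None


def binary_extend(rel1, rel2, op):
    """Evaluate op over the cross product and keep only the defined results."""
    results = (op(x, y) for x, y in product(rel1, rel2))
    return frozenset(result for result in results if result is not None)


first_relation = {("a", 1), ("b", 2), ("c", 3)}
second_relation = {("one", "a"), ("won", "a"), ("four", "d")}

composed_relation = binary_extend(first_relation, second_relation, compose_couplets)
empty_relation = binary_extend(second_relation, first_relation, compose_couplets)
```

Only the pairs whose inner components match contribute a couplet, so composed_relation holds ('one', 1) and ('won', 1), while the reversed call finds no matches and yields an empty relation rather than an undefined value.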
These extended operations are defined as functions in the relations module.
transpose_is_same = transposed_relation == relations.transpose(first_relation)
compose_is_same = composed_relation == relations.compose(first_relation, second_relation)
print("transpose_is_same:", transpose_is_same)
print("compose_is_same:", compose_is_same)
transpose_is_same: True
compose_is_same: True
The following multi-line string specifies a CSV table of words in various languages, with their meaning normalized to English.
vocab_csv = """word,language,meaning
hello,English,salutation
what's up,English,salutation
hola,Spanish,salutation
world,English,earth
mundo,Spanish,earth
gallo,Spanish,rooster
Duniyā,Hindi,earth
Kon'nichiwa,Japanese,salutation
hallo,German,salutation
nuqneH,Klingon,salutation
sekai,Japanese,earth
schmetterling,German,butterfly
mariposa,Spanish,butterfly
"""
Tables can be modeled as sets of binary relations, which we call "clans". In the case of tables, we can further specify that each relation will be a function from left to right (since each (row, header) coordinate corresponds to exactly one element).
from io import StringIO
from algebraixlib.io.csv import import_csv
file = StringIO(vocab_csv)
vocab_clan = import_csv(file)
iprint_latex("vocab_clan")
$superstrict(A, B)$ is a partial binary operation on sets. It is defined as $A$ if $A$ is a superset of $B$, otherwise it is undefined. We use the infix operator $\vartriangleright$ for superstriction.
hello_relation = Set(Couplet('word', 'hello'), Couplet('language', 'English'),
                     Couplet('meaning', 'salutation'))
super_pos = sets.superstrict(hello_relation, Set(Couplet('language', 'English')))
super_neg = sets.superstrict(hello_relation, Set(Couplet('language', 'Mandarin')))
iprint_latex("hello_relation", hello_relation)
iprint_latex(r"hello_relation \vartriangleright \{ \mbox{'language'}{\mapsto}\mbox{'English'} \}", super_pos)
iprint_latex(r"hello_relation \vartriangleright \{ \mbox{'language'}{\mapsto}\mbox{'Mandarin'} \}", super_neg)
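The definition of superstriction is short enough to write out directly. The following is a sketch in plain Python, assuming relations are frozensets of (left, right) tuples and an undefined result is None; it is not the implementation used by sets.superstrict.

```python
# Plain-Python model (not the algebraixlib API).
def superstrict(a, b):
    """Return a when a is a superset of b, otherwise None (undefined)."""
    return a if a >= b else None


hello_relation = frozenset({
    ("word", "hello"), ("language", "English"), ("meaning", "salutation"),
})

super_pos = superstrict(hello_relation, frozenset({("language", "English")}))
super_neg = superstrict(hello_relation, frozenset({("language", "Mandarin")}))
```

Here super_pos is hello_relation itself, and super_neg is None because the relation does not contain the 'Mandarin' couplet.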
By extending the $superstrict$ operation to clans, which are sets of sets (of couplets), we can define a helpful mechanism to restrict vocab_clan to only those relations that contain particular values.
import algebraixlib.algebras.clans as clans
salutation_records_clan = clans.superstrict(vocab_clan, Set(Set(Couplet('meaning', 'salutation'))))
earth_records_clan = clans.superstrict(vocab_clan, Set(Set(Couplet('meaning', 'earth'))))
iprint_latex("salutation_records_clan")
iprint_latex("earth_records_clan")
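Seen from the outside, extended superstriction behaves like a row filter. The sketch below shows this in plain Python with a three-record sample, assuming clans are frozensets of frozensets of (left, right) tuples; to_relation and clan_superstrict are hypothetical helper names.

```python
# Plain-Python model (not the algebraixlib API).
def to_relation(record):
    """Turn a dict such as {'word': 'hola'} into a frozenset of (left, right) tuples."""
    return frozenset(record.items())


def clan_superstrict(clan, filter_clan):
    """Keep each relation that is a superset of at least one relation in filter_clan."""
    return frozenset(rel for rel in clan for wanted in filter_clan if rel >= wanted)


sample_clan = frozenset(to_relation(record) for record in [
    {"word": "hello", "language": "English", "meaning": "salutation"},
    {"word": "hola", "language": "Spanish", "meaning": "salutation"},
    {"word": "mundo", "language": "Spanish", "meaning": "earth"},
])

salutation_filter = {frozenset({("meaning", "salutation")})}
salutation_records = clan_superstrict(sample_clan, salutation_filter)
```

The two salutation records pass the filter; the 'mundo' record drops out because superstriction is undefined for it.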
Our extended relations.compose operation from earlier can be extended again to work with clans. By choosing an appropriate right-hand argument, clan composition can model the relational algebra notion of projection.
words_langs_clan = Set(Set(Couplet('word', 'word'), Couplet('language', 'language')))
iprint_latex("words_langs_clan")
The relations.diag and clans.diag utility functions create a "diagonal" relation or clan, respectively, with simpler syntax.
assert words_langs_clan == clans.diag('word', 'language')
Since the meaning ('salutation') is the same in every relation of salutation_records_clan, we can drop those Couplets. Note that the cardinality of the resulting clan is the same, but each relation now contains only two Couplets.
salutation_words_n_langs_clan = clans.compose(salutation_records_clan, words_langs_clan)
iprint_latex("salutation_words_n_langs_clan")
However, we can take this one step further and "rename" the 'word' attribute to something more specific by replacing the value 'word' with 'salutation' everywhere we find it as the left of a Couplet. By doing this, we both compress the information in each relation and also set our data up for later processing.
salutations_n_langs_clan = clans.compose(salutation_words_n_langs_clan,
                                         Set(Set(Couplet("salutation", "word"),
                                                 Couplet("language", "language"))))
iprint_latex("salutations_n_langs_clan")
We'll do the same for earth_records_clan, but do the projection and "rename" all in one composition operation.
earths_n_langs_clan = clans.compose(earth_records_clan,
                                    Set(Set(Couplet("earth", "word"),
                                            Couplet("language", "language"))))
iprint_latex("earths_n_langs_clan")
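To see why a single composition both projects and renames, it helps to trace it on two records. This is a sketch in plain Python, assuming relations are frozensets of (left, right) tuples; compose_relations and compose_clans are hypothetical helpers that follow the composition rule given earlier, not the algebraixlib functions.

```python
# Plain-Python model (not the algebraixlib API).
def compose_relations(rel1, rel2):
    """Collect c->b for every a->b in rel1 and c->d in rel2 with a == d."""
    return frozenset((c, b) for a, b in rel1 for c, d in rel2 if a == d)


def compose_clans(clan1, clan2):
    """Compose every relation of clan1 with every relation of clan2."""
    return frozenset(compose_relations(r1, r2) for r1 in clan1 for r2 in clan2)


earth_records = frozenset({
    frozenset({("word", "world"), ("language", "English"), ("meaning", "earth")}),
    frozenset({("word", "mundo"), ("language", "Spanish"), ("meaning", "earth")}),
})

# 'meaning' has no matching right component, so it is projected away;
# 'word' is matched by earth->word, so its couplets come out with left 'earth'.
project_and_rename = frozenset({frozenset({("earth", "word"), ("language", "language")})})

earths_n_langs = compose_clans(earth_records, project_and_rename)
```

Each resulting relation keeps two couplets: one with left 'earth' and one with left 'language'.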
Our next task will be to relate these clans to each other in a way that preserves the functional characteristic of every relation. We can define a partial binary operation $functional\_union(A, B)$ on relations to be $union(A, B)$ if $union(A, B)$ is left functional, and undefined otherwise.
func_union_pos = relations.functional_union(hello_relation,
                                            Set(Couplet('language', 'English'),
                                                Couplet('more', 'info')))
func_union_neg = relations.functional_union(hello_relation,
                                            Set(Couplet('language', 'Spanish'),
                                                Couplet('more', 'info')))
iprint_latex("func_union_pos")
iprint_latex("func_union_neg")
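The definition translates almost word for word into plain Python. The sketch below assumes relations are frozensets of (left, right) tuples and that None stands for undefined; the helper names are illustrative and not the library's implementation.

```python
# Plain-Python model (not the algebraixlib API).
def is_left_functional(relation):
    """True when no left component appears in more than one couplet."""
    lefts = [left for left, _ in relation]
    return len(lefts) == len(set(lefts))


def functional_union(a, b):
    """Return the union of a and b if it is left functional, else None (undefined)."""
    union = a | b
    return union if is_left_functional(union) else None


hello_relation = frozenset({
    ("word", "hello"), ("language", "English"), ("meaning", "salutation"),
})

func_union_pos = functional_union(
    hello_relation, frozenset({("language", "English"), ("more", "info")}))
func_union_neg = functional_union(
    hello_relation, frozenset({("language", "Spanish"), ("more", "info")}))
```

The first union succeeds because the shared 'language' couplet is identical in both operands; the second is undefined because 'language' would map to both 'English' and 'Spanish'.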
Extending this operation to clans models natural join-like behavior.
salutations_words_langs_clan = clans.cross_functional_union(salutations_n_langs_clan,
                                                            earths_n_langs_clan)
iprint_latex("salutations_words_langs_clan")
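The join-like behavior comes from the undefined cases dropping out. Here is a sketch in plain Python on two small clans, assuming relations are frozensets of (left, right) tuples; the helper cross_functional_union below only borrows the name used above and is not the algebraixlib implementation.

```python
from itertools import product


# Plain-Python model (not the algebraixlib API).
def functional_union(a, b):
    """Return the union of a and b if it is left functional, else None (undefined)."""
    union = a | b
    lefts = [left for left, _ in union]
    return union if len(lefts) == len(set(lefts)) else None


def cross_functional_union(clan1, clan2):
    """Functionally union every pair of relations and keep the defined results."""
    results = (functional_union(r1, r2) for r1, r2 in product(clan1, clan2))
    return frozenset(result for result in results if result is not None)


salutations = frozenset({
    frozenset({("salutation", "hello"), ("language", "English")}),
    frozenset({("salutation", "hola"), ("language", "Spanish")}),
})
earths = frozenset({
    frozenset({("earth", "world"), ("language", "English")}),
    frozenset({("earth", "mundo"), ("language", "Spanish")}),
})

joined = cross_functional_union(salutations, earths)
```

Of the four possible pairings, only the two that agree on 'language' survive, which is the same outcome a natural join on the language column would produce.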
Now that the clans have been related to each other through their language attributes, we can do another projection. Notice how the "renaming" of 'word' to 'salutation' and 'earth' allows us to tell the meaning of each word apart after joining the clans.
salutations_n_words_clan = clans.compose(salutations_words_langs_clan,
                                         clans.diag('salutation', 'earth'))
iprint_latex("salutations_n_words_clan")
Finally, we will distill this data down to a single relation describing "Hello, World" phrases.
greeting_relation = Set(Couplet(rel('salutation'), rel('earth'))
                        for rel in salutations_n_words_clan)
iprint_latex("Greetings!!!", greeting_relation)
© Copyright Permission.io, Inc. (formerly known as Algebraix Data Corporation), Copyright (c) 2022.
This file is part of algebraixlib.
algebraixlib is free software: you can redistribute it and/or modify it under the terms of version 3 of the GNU Lesser General Public License as published by the Free Software Foundation.
algebraixlib is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with algebraixlib. If not, see GNU licenses.