Joins
Joins enable cominbing multiple items in a single FROM clause.
The FROM clause expressions:
l, r
\$<=>\$
l CROSS JOIN r
\$<=>\$
l JOIN r ON TRUE
have the same semantics. They combine the bag of bindings produced
from the FROM item \$l\$ with the bag of binding tuples produced
by the FROM item \$r\$, whereas the expression \$r\$ may
utilize variables defined by \$l\$. Again, the term “the semantics
of l CROSSJOIN r” is equivalent to the term “the semantics of
FROM l CROSSJOIN r”. In both cases, the semantics specify a bag of
binding tuples.
Join Associativity
We explain the CROSS JOIN and , as if they are left associative
binary operators, despite the fact that one can write more than two
FROM items without specifying grouping with parenthesis. Since the
, and CROSS JOIN operators are associative, we may write (as is
common in SQL):
f1, f2, f3
\$<=>\$
f1 CROSS JOIN f2 CROSS JOIN f3
\$<=>\$
f1 JOIN f2 ON TRUE JOIN f3 ON TRUE
\$<=>\$
(f1, f2), f3
\$<=>\$
(f1 CROSS JOIN f2) CROSS JOIN f3
\$<=>\$
(f1 JOIN f2 ON TRUE) JOIN f3 ON TRUE
CROSS JOIN
Consider the following:
l CROSS JOIN r
unlike SQL, the rhs \$r\$ of the expression may use variables defined by the lhs item \$l\$. The result of this expression for a database environment \$p_0\$ and variables environment \$p\$ is the bag of binding tuples produced by the following pseudo-code. The pseudo-code uses the function \$"eval"(p_0,p,e)\$ that evaluates the expression \$e\$ within the environments \$p_0\$ and \$p\$, i.e., \$p_0,p |-- e -> "eval"(p_0,p,e)\$.
for each binding tuple b_l in eval(p0,p,l)
for each binding tuple b_r in eval(p0, (p || b_l), r)
add (b_l || b_r) to the output bag
In other words, the l CROSS JION r outputs all binding tuples
\$b = b^l || b^r\$, where \$b^l in "eval"(p_0,p,l)\$ and
\$b^r in "eval"(p_0, (p || b^l), r)\$. The key extension to SQL
is that \$r\$ is evaluated in the variables environment \$p ||
b^l\$, i.e., it can use the variables that were defined by
\$l\$. The details of the variable scoping aspects are described in
scoping.
This example simply reminds the tuple calculus explanation of the FROM SQL
semantics. It does not yet endeavor into special aspects of PartiQL.
Consider the following database, which is conventional SQL:
<
customers: [
{ 'id': 5, 'name': 'Joe' },
{ 'id': 7, 'name': 'Mary' }
],
orders: [
{ 'custId': 7, 'productId': 101 },
{ 'custId': 7, 'productId': 523 }
]
>
Then consider the following FROM clause, which could be coming from a
conventional SQL query:
FROM customers AS c, orders AS o
Note that in PartiQL this could also be written using the CROSS JOIN
keyword, and presumably, one would put the sensible equality condition
c.id = o.custId in the WHERE clause. At any rate, this FROM
clause outputs the bag of binding tuples:
\$B_"FROM"^"out" = <<<<\$
\$" " << c: {"'id'": 5, "'name'": "'Joe'"}, o: {"'custId'": 7, "'productId'": 101} >> \$
\$" " << c: {"'id'": 5, "'name'": "'Joe'"}, o: {"'custId'": 7, "'productId'": 523} >> \$
\$" " << c: {"'id'": 7, "'name'": "'Mary'"}, o: {"'custId'": 7, "'productId'": 101} >> \$
\$" " << c: {"'id'": 7, "'name'": "'Mary'"}, o: {"'custId'": 7, "'productId'": 523} >> \$
\$>>>>\$
Due to scoping rules defined in scoping, when the rhs of a CROSS JOIN is a path or a function that uses a variable named n, such variable must be
referred as @n.
Consider the database:
<
"sensors": [
{ 'readings': [ {'v': 1.3}, {'v': 2} ] },
{ 'readings': [ {'v': 0.7}, {'v': 0.8}, {'v': 0.9} ] },
]
>
Intuitively, the following FROM clause unnests the tuples that are nested
within the readings.
FROM sensors AS s, s.readings AS r
\$B_"FROM"^"out" = <<<<\$
\$" " << s: {"'readings'": \[{"'v'": 1.3}, {"'v'": 2}]}, r: {v:1.3} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 1.3}, {"'v'": 2}]}, r: {v:2} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.7} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.8} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.9} >>\$
\$>>>>\$
LEFT JOIN
The FROM clause expression:
l LEFT CROSS JOIN r
\$<=>\$
l LEFT JOIN r ON TRUE
replicates SQL’s LEFT JOIN functionality and, in addition, it also
works for the case where the lhs of \$r\$ uses variables defined
from \$l\$.
Let’s assume that the variables defined by \$r\$ are
\$v_1^r, ..., v_n^r\$. The result of evaluating l LEFT CROSS JOIN r in
environments \$p_0\$ and \$p\$ is the bag of binding tuples
produced by the following pseudocode, which also uses the
\$"eval"\$ function.
for each binding bl in eval(p0, p, l)
Br = eval(p0,(p||bl),r)
if Br is the empty bag
add (bl || ⟨v1r : NULL . . . vnr : NULL⟩) to the output bag
else
for each binding br in Br
add (bl || br) to the output bag
Consider the database:
<
"sensors": [
{ 'readings': [ {'v': 1.3}, {'v': 2} ] },
{ 'readings': [ {'v': 0.7}, {'v': 0.8}, {'v': 0.9} ] },
{ 'readings': [] }
]
>
Notice that the value of the last tuple’s reading attribute is the
empty array. The following FROM clause unnests the tuples that are
nested within the readings but will also keep around the tuple with
the empty readings. (See the last binding tuple.)
FROM sensors AS s LEFT CROSS JOIN s.readings AS r
\$B_"FROM"^"out" = <<<<\$
\$" "<< s: {"'readings'": \[{"'v'":1.3}, {"'v'":2}]}, r: {"'v'":1.3} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":1.3}, {"'v'":2}]}, r: {"'v'":2} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.7} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.8} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.9} >>,\$
\$" "<< s: {"'readings'": \[]}, r: "NULL" >>\$
\$>>>>\$
FULL JOIN
The clause expression:
l FULL JOIN r ON c
replicates SQL’s FULL JOIN functionality. It assumes that (alike
SQL) the lhs of \$r\$ does not use variables defined from
\$l\$. Thus, we do not discuss further.
Join Conditions
In compliance to SQL, the and have an optional clause. The semantics of can be explained as syntactic sugar over the core PartiQL. They can also be explained by a simple extension of the semantics of CROSS JOIN, LEFT JOIN, and FULL JOIN. The semantics of:
l JOIN r ON c
are the following modification of the pseudocode of CROSS JOIN.
for each binding tuple bl in eval(p0,p,l)
for each binding tuple br in eval(p0, (p || bl), r)
if eval(p0, (p || bl || br), c) is true
add (bl || br) to the output bag
The semantics of:
l LEFT JOIN r ON c
are the following modification of the pseudocode of CROSS JOIN. In essence, the outputs a tuple padded with whenever there is no binding of \$r\$ that satisfies the condition \$c\$.
for each binding bl in eval(p0, p, l)
Br = eval(p0,(p||bl),r)
Qr = << >>
for each binding br in Br
if eval(p0, (p || bl || br), c) is true
add br in Qr
if Qr is the empty bag
add (bl || ⟨v1r : NULL . . . vnr : NULL⟩) to the output bag
else
for each binding br in Qr
add (bl || br) to the output bag