Joins
Joins enable cominbing multiple items in a single FROM clause.
The FROM
clause expressions:
l, r
\$<=>\$
l CROSS JOIN r
\$<=>\$
l JOIN r ON TRUE
have the same semantics. They combine the bag of bindings produced
from the FROM
item \$l\$ with the bag of binding tuples produced
by the FROM
item \$r\$, whereas the expression \$r\$ may
utilize variables defined by \$l\$. Again, the term “the semantics
of l CROSSJOIN r
” is equivalent to the term “the semantics of
FROM l CROSSJOIN r
”. In both cases, the semantics specify a bag of
binding tuples.
Join Associativity
We explain the CROSS JOIN
and ,
as if they are left associative
binary operators, despite the fact that one can write more than two
FROM
items without specifying grouping with parenthesis. Since the
,
and CROSS JOIN
operators are associative, we may write (as is
common in SQL):
f1, f2, f3
\$<=>\$
f1 CROSS JOIN f2 CROSS JOIN f3
\$<=>\$
f1 JOIN f2 ON TRUE JOIN f3 ON TRUE
\$<=>\$
(f1, f2), f3
\$<=>\$
(f1 CROSS JOIN f2) CROSS JOIN f3
\$<=>\$
(f1 JOIN f2 ON TRUE) JOIN f3 ON TRUE
CROSS JOIN
Consider the following:
l CROSS JOIN r
unlike SQL, the rhs \$r\$ of the expression may use variables defined by the lhs item \$l\$. The result of this expression for a database environment \$p_0\$ and variables environment \$p\$ is the bag of binding tuples produced by the following pseudo-code. The pseudo-code uses the function \$"eval"(p_0,p,e)\$ that evaluates the expression \$e\$ within the environments \$p_0\$ and \$p\$, i.e., \$p_0,p |-- e -> "eval"(p_0,p,e)\$.
for each binding tuple b_l in eval(p0,p,l)
for each binding tuple b_r in eval(p0, (p || b_l), r)
add (b_l || b_r) to the output bag
In other words, the l CROSS JION r
outputs all binding tuples
\$b = b^l || b^r\$, where \$b^l in "eval"(p_0,p,l)\$ and
\$b^r in "eval"(p_0, (p || b^l), r)\$. The key extension to SQL
is that \$r\$ is evaluated in the variables environment \$p ||
b^l\$, i.e., it can use the variables that were defined by
\$l\$. The details of the variable scoping aspects are described in
scoping.
This example simply reminds the tuple calculus explanation of the FROM
SQL
semantics. It does not yet endeavor into special aspects of PartiQL.
Consider the following database, which is conventional SQL:
<
customers: [
{ 'id': 5, 'name': 'Joe' },
{ 'id': 7, 'name': 'Mary' }
],
orders: [
{ 'custId': 7, 'productId': 101 },
{ 'custId': 7, 'productId': 523 }
]
>
Then consider the following FROM
clause, which could be coming from a
conventional SQL query:
FROM customers AS c, orders AS o
Note that in PartiQL this could also be written using the CROSS JOIN
keyword, and presumably, one would put the sensible equality condition
c.id = o.custId
in the WHERE
clause. At any rate, this FROM
clause outputs the bag of binding tuples:
\$B_"FROM"^"out" = <<<<\$
\$" " << c: {"'id'": 5, "'name'": "'Joe'"}, o: {"'custId'": 7, "'productId'": 101} >> \$
\$" " << c: {"'id'": 5, "'name'": "'Joe'"}, o: {"'custId'": 7, "'productId'": 523} >> \$
\$" " << c: {"'id'": 7, "'name'": "'Mary'"}, o: {"'custId'": 7, "'productId'": 101} >> \$
\$" " << c: {"'id'": 7, "'name'": "'Mary'"}, o: {"'custId'": 7, "'productId'": 523} >> \$
\$>>>>\$
Due to scoping rules defined in scoping, when the rhs of a CROSS JOIN
is a path or a function that uses a variable named n, such variable must be
referred as @n.
Consider the database:
<
"sensors": [
{ 'readings': [ {'v': 1.3}, {'v': 2} ] },
{ 'readings': [ {'v': 0.7}, {'v': 0.8}, {'v': 0.9} ] },
]
>
Intuitively, the following FROM
clause unnests the tuples that are nested
within the readings
.
FROM sensors AS s, s.readings AS r
\$B_"FROM"^"out" = <<<<\$
\$" " << s: {"'readings'": \[{"'v'": 1.3}, {"'v'": 2}]}, r: {v:1.3} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 1.3}, {"'v'": 2}]}, r: {v:2} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.7} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.8} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.9} >>\$
\$>>>>\$
LEFT JOIN
The FROM
clause expression:
l LEFT CROSS JOIN r
\$<=>\$
l LEFT JOIN r ON TRUE
replicates SQL’s LEFT JOIN
functionality and, in addition, it also
works for the case where the lhs of \$r\$ uses variables defined
from \$l\$.
Let’s assume that the variables defined by \$r\$ are
\$v_1^r, ..., v_n^r\$. The result of evaluating l LEFT CROSS JOIN r
in
environments \$p_0\$ and \$p\$ is the bag of binding tuples
produced by the following pseudocode, which also uses the
\$"eval"\$ function.
for each binding bl in eval(p0, p, l)
Br = eval(p0,(p||bl),r)
if Br is the empty bag
add (bl || ⟨v1r : NULL . . . vnr : NULL⟩) to the output bag
else
for each binding br in Br
add (bl || br) to the output bag
Consider the database:
<
"sensors": [
{ 'readings': [ {'v': 1.3}, {'v': 2} ] },
{ 'readings': [ {'v': 0.7}, {'v': 0.8}, {'v': 0.9} ] },
{ 'readings': [] }
]
>
Notice that the value of the last tuple’s reading
attribute is the
empty array. The following FROM
clause unnests the tuples that are
nested within the readings
but will also keep around the tuple with
the empty readings
. (See the last binding tuple.)
FROM sensors AS s LEFT CROSS JOIN s.readings AS r
\$B_"FROM"^"out" = <<<<\$
\$" "<< s: {"'readings'": \[{"'v'":1.3}, {"'v'":2}]}, r: {"'v'":1.3} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":1.3}, {"'v'":2}]}, r: {"'v'":2} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.7} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.8} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.9} >>,\$
\$" "<< s: {"'readings'": \[]}, r: "NULL" >>\$
\$>>>>\$
FULL JOIN
The clause expression:
l FULL JOIN r ON c
replicates SQL’s FULL JOIN
functionality. It assumes that (alike
SQL) the lhs of \$r\$ does not use variables defined from
\$l\$. Thus, we do not discuss further.
Join Conditions
In compliance to SQL, the and have an optional clause. The semantics of can be explained as syntactic sugar over the core PartiQL. They can also be explained by a simple extension of the semantics of CROSS JOIN, LEFT JOIN, and FULL JOIN. The semantics of:
l JOIN r ON c
are the following modification of the pseudocode of CROSS JOIN.
for each binding tuple bl in eval(p0,p,l)
for each binding tuple br in eval(p0, (p || bl), r)
if eval(p0, (p || bl || br), c) is true
add (bl || br) to the output bag
The semantics of:
l LEFT JOIN r ON c
are the following modification of the pseudocode of CROSS JOIN. In essence, the outputs a tuple padded with whenever there is no binding of \$r\$ that satisfies the condition \$c\$.
for each binding bl in eval(p0, p, l)
Br = eval(p0,(p||bl),r)
Qr = << >>
for each binding br in Br
if eval(p0, (p || bl || br), c) is true
add br in Qr
if Qr is the empty bag
add (bl || ⟨v1r : NULL . . . vnr : NULL⟩) to the output bag
else
for each binding br in Qr
add (bl || br) to the output bag