Joins

Joins enable cominbing multiple items in a single FROM clause.

The FROM clause expressions:

l, r

\$<=>\$

l CROSS JOIN r

\$<=>\$

l JOIN r ON TRUE

have the same semantics. They combine the bag of bindings produced from the FROM item \$l\$ with the bag of binding tuples produced by the FROM item \$r\$, whereas the expression \$r\$ may utilize variables defined by \$l\$. Again, the term “the semantics of l CROSSJOIN r” is equivalent to the term “the semantics of FROM l CROSSJOIN r”. In both cases, the semantics specify a bag of binding tuples.

Join Associativity

We explain the CROSS JOIN and , as if they are left associative binary operators, despite the fact that one can write more than two FROM items without specifying grouping with parenthesis. Since the , and CROSS JOIN operators are associative, we may write (as is common in SQL):

f1, f2, f3

\$<=>\$

f1 CROSS JOIN f2 CROSS JOIN f3

\$<=>\$

f1 JOIN f2 ON TRUE JOIN f3 ON TRUE

\$<=>\$

(f1, f2), f3

\$<=>\$

(f1 CROSS JOIN f2) CROSS JOIN f3

\$<=>\$

(f1 JOIN f2 ON TRUE) JOIN f3 ON TRUE

CROSS JOIN

Consider the following:

l CROSS JOIN r

unlike SQL, the rhs \$r\$ of the expression may use variables defined by the lhs item \$l\$. The result of this expression for a database environment \$p_0\$ and variables environment \$p\$ is the bag of binding tuples produced by the following pseudo-code. The pseudo-code uses the function \$"eval"(p_0,p,e)\$ that evaluates the expression \$e\$ within the environments \$p_0\$ and \$p\$, i.e., \$p_0,p |-- e -> "eval"(p_0,p,e)\$.

for each binding tuple b_l in eval(p0,p,l)
    for each binding tuple b_r in eval(p0, (p || b_l), r)
    	add (b_l || b_r) to the output bag

In other words, the l CROSS JION r outputs all binding tuples \$b = b^l || b^r\$, where \$b^l in "eval"(p_0,p,l)\$ and \$b^r in "eval"(p_0, (p || b^l), r)\$. The key extension to SQL is that \$r\$ is evaluated in the variables environment \$p || b^l\$, i.e., it can use the variables that were defined by \$l\$. The details of the variable scoping aspects are described in scoping.

Example 1.  

This example simply reminds the tuple calculus explanation of the FROM SQL semantics. It does not yet endeavor into special aspects of PartiQL. Consider the following database, which is conventional SQL:

<
  customers: [
    { 'id': 5, 'name': 'Joe' },
    { 'id': 7, 'name': 'Mary' }
  ],
  orders: [
    { 'custId': 7, 'productId': 101 },
    { 'custId': 7, 'productId': 523 }
  ]
>

Then consider the following FROM clause, which could be coming from a conventional SQL query:

FROM customers AS c, orders AS o

Note that in PartiQL this could also be written using the CROSS JOIN keyword, and presumably, one would put the sensible equality condition c.id = o.custId in the WHERE clause. At any rate, this FROM clause outputs the bag of binding tuples:

\$B_"FROM"^"out" = <<<<\$
\$" " << c: {"'id'": 5, "'name'": "'Joe'"}, o: {"'custId'": 7, "'productId'": 101} >> \$
\$" " << c: {"'id'": 5, "'name'": "'Joe'"}, o: {"'custId'": 7, "'productId'": 523} >> \$
\$" " << c: {"'id'": 7, "'name'": "'Mary'"}, o: {"'custId'": 7, "'productId'": 101} >> \$
\$" " << c: {"'id'": 7, "'name'": "'Mary'"}, o: {"'custId'": 7, "'productId'": 523} >> \$
\$>>>>\$

Due to scoping rules defined in scoping, when the rhs of a CROSS JOIN is a path or a function that uses a variable named n, such variable must be referred as @n.

Example 2.  

Consider the database:

<
  "sensors": [
    { 'readings': [ {'v': 1.3}, {'v': 2} ] },
    { 'readings': [ {'v': 0.7}, {'v': 0.8}, {'v': 0.9} ] },
  ]
>

Intuitively, the following FROM clause unnests the tuples that are nested within the readings.

FROM sensors AS s, s.readings AS r

\$B_"FROM"^"out" = <<<<\$
\$" " << s: {"'readings'": \[{"'v'": 1.3}, {"'v'": 2}]}, r: {v:1.3} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 1.3}, {"'v'": 2}]}, r: {v:2} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.7} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.8} >>,\$
\$" " << s: {"'readings'": \[{"'v'": 0.7}, {"'v'": 0.8}, {"'v'": 0.9}]}, r: {"'v'":0.9} >>\$
\$>>>>\$

LEFT JOIN

The FROM clause expression:

l LEFT CROSS JOIN r

\$<=>\$

l LEFT JOIN r ON TRUE

replicates SQL’s LEFT JOIN functionality and, in addition, it also works for the case where the lhs of \$r\$ uses variables defined from \$l\$.

Let’s assume that the variables defined by \$r\$ are \$v_1^r, ..., v_n^r\$. The result of evaluating l LEFT CROSS JOIN r in environments \$p_0\$ and \$p\$ is the bag of binding tuples produced by the following pseudocode, which also uses the \$"eval"\$ function.

for each binding bl in eval(p0, p, l)
    Br = eval(p0,(p||bl),r)
    if Br is the empty bag
       add (bl || ⟨v1r : NULL . . . vnr : NULL⟩) to the output bag
    else
       for each binding br in Br
           add (bl || br) to the output bag
Example 3.  

Consider the database:

<
  "sensors": [
    { 'readings': [ {'v': 1.3}, {'v': 2} ] },
    { 'readings': [ {'v': 0.7}, {'v': 0.8}, {'v': 0.9} ] },
    { 'readings': [] }
  ]
>

Notice that the value of the last tuple’s reading attribute is the empty array. The following FROM clause unnests the tuples that are nested within the readings but will also keep around the tuple with the empty readings. (See the last binding tuple.)

FROM sensors AS s LEFT CROSS JOIN s.readings AS r

\$B_"FROM"^"out" = <<<<\$
\$" "<< s: {"'readings'": \[{"'v'":1.3}, {"'v'":2}]}, r: {"'v'":1.3} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":1.3}, {"'v'":2}]}, r: {"'v'":2} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.7} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.8} >>,\$
\$" "<< s: {"'readings'": \[{"'v'":0.7}, {"'v'":0.8}, {"'v'":0.9}]}, r: {"'v'":0.9} >>,\$
\$" "<< s: {"'readings'": \[]}, r: "NULL" >>\$
\$>>>>\$

FULL JOIN

The clause expression:

l FULL JOIN r ON c

replicates SQL’s FULL JOIN functionality. It assumes that (alike SQL) the lhs of \$r\$ does not use variables defined from \$l\$. Thus, we do not discuss further.

Join Conditions

In compliance to SQL, the and have an optional clause. The semantics of can be explained as syntactic sugar over the core PartiQL. They can also be explained by a simple extension of the semantics of CROSS JOIN, LEFT JOIN, and FULL JOIN. The semantics of:

l JOIN r ON c

are the following modification of the pseudocode of CROSS JOIN.

for each binding tuple bl in eval(p0,p,l)
    for each binding tuple br in eval(p0, (p || bl), r)
    	if eval(p0, (p || bl || br), c) is true
	        add (bl || br) to the output bag

The semantics of:

l LEFT JOIN r ON c

are the following modification of the pseudocode of CROSS JOIN. In essence, the outputs a tuple padded with whenever there is no binding of \$r\$ that satisfies the condition \$c\$.

for each binding bl in eval(p0, p, l)
    Br = eval(p0,(p||bl),r)
    Qr = << >>
    for each binding br in Br
        if eval(p0, (p || bl || br), c) is true
	        add br in Qr
    if Qr is the empty bag
       add (bl || ⟨v1r : NULL . . . vnr : NULL⟩) to the output bag
    else
       for each binding br in Qr
           add (bl || br) to the output bag

LATERAL

SQL 2003 used the LATERAL keyword to correlate clause items. In the interest of compatibility with SQL, PartiQL also allows the use of the keyword LATERAL, though it does not do anything more than the comma itself would do. That is l, LATERAL r is equivalent l, r.