Environments & Binding Tuples

Environments

EBNF Grammar for PartiQL Names

bind-name ::= global-name
            | variable

qualified-name ::= identifier '.' identifier  ('.' identifier)*

variable-name ::= identifier

identifier ::= ('$'|'_'|letter) ('$'|'_'|letter|digit)*
             | '"' quoted-identifier-body '"'

Each PartiQL (sub-)query and PartiQL (sub-)expression \$q\$ is evaluated within the database environment \$p_0\$ created by the database names and the variables environment \$p\$ created by the defined query variables. The pair of these environments, \$(p_0, p)\$ is collectively called the bindings environment.

In either case, an environment is a binding tuple \$<<x_1:v_1,...,x_n:v_n>>\$, where each \$x_i\$ is a bind name that is unique and binds to the PartiQL value \$v_i\$. The two distinct environments may also be thought of as global (the database object names) and local (the lexically defined variables in a particular scope of the query).

Similarly, for a given \$q\$ at compile (i.e. planning) time, a database type environment, \$Gamma_0\$, and variables type environment, \$Gamma\$ are defined. The type environment is a binding tuple \$<<x_1:tau_1,...,x_n:tau_n>>\$ , where each \$x_i\$ is a name that is unique and binds to the PartiQL type \$tau_i\$. For schema-less values, \$tau\$ can be considered the union of any possible type (for which all operations are potentially applicable).

Qualified names only ever appear in the database environment. Lexically defined variable names are always just simple identifiers. For example, a relational database might define a compound name mydb.log, where mydb is the schema (and not actually a value) and log could be a table name within that schema. Note, that a qualified name is distinct from a quoted identifier that contains a dot. Thus, the qualified name mydb.log is distinct from the bind name "mydb.log".

Example 1.

Let us assume that we evaluate the following query on the database this example value, whose top-level value is named mydb.log.

SELECT x.resourceId
FROM mydb.log.configurationItems x

The query is evaluated within the database environment \$p_0 = <<"mydb.log : { 'fileversion': '1.0', 'configurationItems': ... }">>\$ and the variables environment \$p_1 = <<>>\$. Notice the database environment \$p_0\$ has a single name/value pair, which corresponds to the only name (mydb.log) of the database of example value. The variables environment has no name/value pair because the above query is not a subquery of a larger query.

Next, consider the subexpression x.resourceId of the example’s query. This subexpression will, generally, be evaluated many times - once for each x. Technically, each time it is evaluated within the same database environment \$p_0\$ and within a variables environment \$p_2 = (: x : ... :)\$, i.e., a variables environment that defines the variable x.

Remark on relationship of binding tuples to PartiQL tuples A binding tuple is similar to a PartiQL tuple, if you think of the bind names as attribute names. The characterization “binding” pertains to its use in the semantics (e.g. an association of names to types) and the fact that qualified names are not reified in the PartiQL data model, and we have a representation. As we will see collections of binding tuples will be homogenous, i.e., they will all have the same “attribute” names. Also important, is that when we represent binding tuples we explicitly represent a variable with a MISING value, as opposed to omitting it because the lack of a variable name is distinct from a variable whose value is MISSING. For example, we write \$<<x : 1, y: "MISSING">>\$ instead of \$<<x : 1>>\$.

Evaluation in environment The notation \$(p_0,p) |-- q -> v\$ denotes that the PartiQL query \$q\$ evaluates to the value \$v\$ when evaluated within the database environment \$p_0\$ and the variables environment \$p\$, i.e. when every variable of \$q\$ is instantiated by its binding in \$p\$ and each database name is instantiated to its value in \$p_0\$. For example, consider the query x + y / 2, the database environment \$p_0 = <<x:5>>\$ and the variables environment \$p = <<y:3>>\$. Then \$(p_0,p) |-- "(x+y)/2" -> "5+3/2" -> 4\$.

PartiQL Semantics

The semantics of PartiQL are shorter than the semantics of SQL itself—despite being backwards compatible with SQL. The key reason is that the semantics of each clause of an SFW query in the PartiQL core can be understood in isolation from the other clauses. A clause is simply a function that inputs and outputs binding tuples. Thus the specifics of how the binding tuples of a query and of its subqueries are produced are a central part of the semantics. At a high level (which will be elaborated upon later) the construction of binding environments proceeds as follows:

When a query is submitted to a database, it is evaluated in an empty variables environment \$p = <<>>\$
The FROM clause of a SFW query produces new environments by concatenating bindings of the FROM variables to the environment of its SFW query, as explained below.

The subqueries that appear in the WHERE, SELECT, etc clauses are evaluated in these new environments. The optional GROUP BY clause also produces additional variable bindings, as explained in GROUP BY.

Example 2. An Example SFW Query with Flow of Binding Tuples

\$p_0: << "mydb.r" : \[ 3, 'x'], "mydb.s" : < < {'a':1, 'b': 2}, {'a': 3} > > >> \$
\$p = <<>>\$

FROM mydb.r AS x, mydb.s AS y
\$B_"FROM"^"out" = B_"WHERE"^"in" = \$
\$< <\$
\$ <<x:3, y:{'a':1, 'b':2}>>\$
\$ <<x:3, y:{'a':3}>>\$
\$ <<x:'x', y:{'a':1, 'b':2}>>\$
\$ <<x:'x', y:{'a':3}>>\$
\$> >\$

WHERE x > y.b
\$B_"WHERE"^"out" = B_"SELECT"^"in" = \$
\$< <\$ \$<<x:3, y:{'a':1, 'b':2}>>\$ \$> >\$

SELECT x AS foo, y.a AS bar
\$"Result" = < < <<{"foo":3, "bar":1}>> > >\$

PartiQL Clauses as Operators

Similar to SQL semantics, the clauses of an PartiQL SFW query are evaluated in the following order: WITH, FROM, LET, WHERE, GROUP BY, HAVING, LETTING (which is special to PartiQL), ORDER BY, LIMIT/OFFSET, and SELECT (or SELECT VALUE or PIVOT, which are both special to ion PartiQL). ^[1]

Using the example, we illustrate how the clauses of an SFW query input and output binding tuples. In this query, the FROM, WHERE, and SELECT clauses are displayed apart from each other so that the example can also illustrate the binding tuples that flow from the one clause to the next.

The query is evaluated within the bindings environment \$(p_0, p)\$ shown at the top of the <<example-sfw-bindings,example>. Consequently, the FROM clause is evaluated in the same environment. Thereafter the FROM clause outputs the bag of binding tuples \$B_"FROM"^"out"\$, which has four binding tuples in the example. In each binding tuple of \$B_"FROM"^"out"\$ , each variable of the FROM clause is bound to a value. There are no restrictions that a variable binds to homogenous values across binding tuples. In the example, x binds to two values that are heterogeneous: some bindings of x bind to a number, while others to a string. It would also be possible that a variable binds to, say, a scalar in one binding, while the same variable binds to a complex value in another binding.

Each subsequent clause inputs a bag of binding tuples, evaluates the constituent expressions of the clause (which may themselves contain nested SFW queries), and outputs a bag of binding tuples that is in turn input by the next clause. For instance, the WHERE clause inputs the bag of binding tuples that have been output by the FROM clause (\$B_"FROM"^"out" = B_"WHERE"^"in"\$), and outputs the subset thereof that satisfies the condition expression of the WHERE clause. This subset is the \$B_"WHERE"^"out" = B_"SELECT"^"in"\$.

In particular, the WHERE’s condition is evaluated once for each input binding tuple \$b\$ in \$B_"WHERE"^"in"\$. In general, each evaluation is done within the bindings environment \$(p_0,p || b)\$ , i.e., the concatenation of the binding tuple \$p\$ (where \$p\$ is the binding environment of the SFW query) with the binding tuple \$b\$ that has the variables of the clause. In the particular example \$p || b\$ is simply \$b\$ since \$p=<<>>\$. The condition is evaluated once for each of the four input binding tuples of \$B_"WHERE"^"in"\$. The variables environment of the first evaluation is:

\$p = <<x:3, y: { 'a':1, 'b':2 } >>\$

The condition evaluates to for the first binding tuple of \$B_"WHERE"^"in"\$, since

\$(p_0,p) |-- x > y.b -> 3 > { 'a':1, 'b':3}.b -> true \$

Thus the first binding tuple of \$B_"WHERE"^"in"\$ is output from the WHERE and is input to SELECT.

The pattern of “input bag of binding tuples, evaluate constituent expressions, output bag of binding tuples” has a few exceptions: First, the ORDER BY clause inputs a bag of binding tuples and outputs an array of binding tuples. Second, a LIMIT/OFFSET clause need not evaluate its constituent expression for each input binding tuple. For example a LIMIT 10 clause that inputs an array with 100 binding tuples need not access binding tuples 11-100.

Finally, the SELECT clause is responsible for converting from binding tuples to collections of arbitrary PartiQL elements. The SELECT inputs a bag (or array, if ORDER BY is present) of binding tuples, and outputs the SFW query’s result, which is a bag (resp. array) with exactly one element for each input binding tuple. In the example, the SELECT expressions x and y.a are evaluated once for each of the input binding tuples of \$B_"SELECT"^"in"\$, which in this example happen to be just one binding tuple.

Finally, notice that the above discussion of SFW queries did not capture the set operators UNION, INTERSECT, and EXCEPT. As is the case with SQL semantics too, the coordination of with the set operators requires attention.

In summary, each clause of PartiQL is an operator that inputs/outputs binding tuples. As such, we can (and will) present the semantics of each clause separately from the semantics of the other clauses. This is not the case in SQL: Notably, in the presence of aggregation functions the SELECT, HAVING, and WHERE cannot be interpreted in isolation; they can only be interpreted along with the GROUP BY clause.

Scoping Rules of Variables

As in any programming language, the PartiQL semantics have to deal with issues of variable scope. For example, how are references to x resolved in the following query:

SELECT x.a AS a
FROM db1 AS x
WHERE x.b IN (SELECT x.c FROM db2 AS x)

Since this is an SQL query and PartiQL is backwards compatible to SQL, it is easy to tell that the x in x.c resolves to the variable defined by the inner query’s FROM clause.

Technically, this scoping rule is captured by the following handling of binding tuples. The inner FROM clause is evaluated with a variables environment \$p = <<x:...>>\$; its x is the one defined by the outer FROM. Then the inner FROM clause outputs a binding \$b = <<x..>>\$; this x is defined by thinner FROM. Then the x.c is evaluated in the concatenation \$p||b\$ and because x appears in both \$p\$ and \$b\$, the concatenation keeps only the x of its right argument. Essentially by putting \$b\$ as the right argument of the concatenation, the semantics indicate that the variables of \$b\$ have precedence over synonymous variables in the left argument (which was the \$p\$).

Generally, given two binding tuples \$b\$ and \$b'\$, their concatenation is a binding tuple, denoted as \$b||b'\$, that has the variable bindings of both \$b\$ and \$b'\$. This creates the possibility that both \$b\$ and \$b'\$ have the same variable \$x\$. In this case, the concatenation \$b||b'\$ will have the \$b'.x\$ and its value; it will not have the \$b.x\$ and its value.

Note, the above does not resolve scoping issues resulting from conflicts between the database environment and the variables environment. We resolve these conflicts by explicit rules.

1. PartiQL also supports a syntax improvement where SELECT is optionally written as the last clause since, anyway, that’s the proper way to read an SQL query.