Environments & Binding Tuples
Environments
bind-name ::= global-name
| variable
qualified-name ::= identifier '.' identifier ('.' identifier)*
variable-name ::= identifier
identifier ::= ('$'|'_'|letter) ('$'|'_'|letter|digit)*
| '"' quoted-identifier-body '"'
Each PartiQL (sub-)query and PartiQL (sub-)expression \$q\$ is evaluated within the database environment \$p_0\$ created by the database names and the variables environment \$p\$ created by the defined query variables. The pair of these environments, \$(p_0, p)\$ is collectively called the bindings environment.
In either case, an environment is a binding tuple \$<<x_1:v_1,...,x_n:v_n>>\$, where each \$x_i\$ is a bind name that is unique and binds to the PartiQL value \$v_i\$. The two distinct environments may also be thought of as global (the database object names) and local (the lexically defined variables in a particular scope of the query).
Similarly, for a given \$q\$ at compile (i.e. planning) time, a database type environment, \$Gamma_0\$, and variables type environment, \$Gamma\$ are defined. The type environment is a binding tuple \$<<x_1:tau_1,...,x_n:tau_n>>\$ , where each \$x_i\$ is a name that is unique and binds to the PartiQL type \$tau_i\$. For schema-less values, \$tau\$ can be considered the union of any possible type (for which all operations are potentially applicable).
Qualified names only ever appear in
the database environment. Lexically defined variable names
are always just simple identifiers. For example, a relational database
might define a compound name mydb.log
, where mydb
is the schema
(and not actually a value) and log
could be a table name within that
schema. Note, that a qualified name is distinct from a quoted
identifier that contains a dot. Thus, the qualified name mydb.log
is
distinct from the bind name "mydb.log"
.
Let us assume that we evaluate the following query on the database this
example value, whose top-level value is named
mydb.log
.
SELECT x.resourceId
FROM mydb.log.configurationItems x
The query is evaluated within the database environment \$p_0 =
<<"mydb.log : { 'fileversion': '1.0', 'configurationItems': ... }">>\$
and the variables environment \$p_1 = <<>>\$. Notice the database
environment \$p_0\$ has a single name/value pair, which corresponds
to the only name (mydb.log
) of the database of
example value. The variables environment has no
name/value pair because the above query is not a subquery of a larger
query.
Next, consider the subexpression x.resourceId
of the example’s
query. This subexpression will, generally, be evaluated many times -
once for each x
. Technically, each time it is evaluated within the
same database environment \$p_0\$ and within a variables
environment \$p_2 = (: x : ... :)\$, i.e., a variables environment
that defines the variable x
.
Remark on relationship of binding tuples to PartiQL tuples A binding
tuple is similar to a PartiQL tuple, if you think of the bind names as
attribute names. The characterization “binding” pertains to its use in
the semantics (e.g. an association of names to types) and the fact that
qualified names are not reified in the PartiQL data model, and we have a
representation. As we will see collections of binding tuples will be
homogenous, i.e., they will all have the same “attribute” names. Also
important, is that when we represent binding tuples we explicitly
represent a variable with a MISING
value, as opposed to omitting it because
the lack of a variable name is distinct from a variable whose value is MISSING
.
For example, we write \$<<x : 1, y: "MISSING">>\$ instead of \$<<x : 1>>\$.
Evaluation in environment The notation \$(p_0,p) |-- q -> v\$
denotes that the PartiQL query \$q\$ evaluates to the value
\$v\$ when evaluated within the database environment \$p_0\$ and
the variables environment \$p\$, i.e. when every variable of
\$q\$ is instantiated by its binding in \$p\$ and each database
name is instantiated to its value in \$p_0\$. For example, consider
the query x + y / 2
, the database environment \$p_0 = <<x:5>>\$
and the variables environment \$p = <<y:3>>\$. Then
\$(p_0,p) |-- "(x+y)/2" -> "5+3/2" -> 4\$.
PartiQL Semantics
The semantics of PartiQL are shorter than the semantics of SQL itself—despite being backwards compatible with SQL. The key reason is that the semantics of each clause of an SFW query in the PartiQL core can be understood in isolation from the other clauses. A clause is simply a function that inputs and outputs binding tuples. Thus the specifics of how the binding tuples of a query and of its subqueries are produced are a central part of the semantics. At a high level (which will be elaborated upon later) the construction of binding environments proceeds as follows:
-
When a query is submitted to a database, it is evaluated in an empty variables environment \$p = <<>>\$
-
The
FROM
clause of a SFW query produces new environments by concatenating bindings of theFROM
variables to the environment of its SFW query, as explained below.The subqueries that appear in the
WHERE
,SELECT
, etc clauses are evaluated in these new environments. The optionalGROUP BY
clause also produces additional variable bindings, as explained in GROUP BY.
\$p_0: << "mydb.r" : \[ 3, 'x'], "mydb.s" : < < {'a':1, 'b': 2}, {'a': 3} > > >> \$
\$p = <<>>\$
FROM mydb.r AS x, mydb.s AS y
\$B_"FROM"^"out" = B_"WHERE"^"in" = \$
\$< <\$
\$ <<x:3, y:{'a':1, 'b':2}>>\$
\$ <<x:3, y:{'a':3}>>\$
\$ <<x:'x', y:{'a':1, 'b':2}>>\$
\$ <<x:'x', y:{'a':3}>>\$
\$> >\$
WHERE x > y.b
\$B_"WHERE"^"out" = B_"SELECT"^"in" = \$
\$< <\$ \$<<x:3, y:{'a':1, 'b':2}>>\$ \$> >\$
SELECT x AS foo, y.a AS bar
\$"Result" = < < <<{"foo":3, "bar":1}>> > >\$
PartiQL Clauses as Operators
Similar to SQL semantics, the clauses of an PartiQL SFW query are
evaluated in the following order: WITH
, FROM
, LET
, WHERE
,
GROUP BY
, HAVING
, LETTING
(which is special to PartiQL), ORDER
BY
, LIMIT
/OFFSET
, and SELECT
(or SELECT VALUE
or PIVOT
,
which are both special to ion PartiQL). [1]
Using the example, we illustrate how the
clauses of an SFW query input and output binding tuples. In this query, the FROM
, WHERE
, and SELECT
clauses are displayed apart from each other so that the
example can also illustrate the binding tuples that flow from the one
clause to the next.
The query is evaluated within the bindings environment \$(p_0, p)\$
shown at the top of the <<example-sfw-bindings,example>. Consequently, the
FROM
clause is evaluated in the same environment. Thereafter the
FROM
clause outputs the bag of binding tuples \$B_"FROM"^"out"\$,
which has four binding tuples in the example. In each binding tuple of
\$B_"FROM"^"out"\$ , each variable of the FROM
clause is bound to
a value. There are no restrictions that a variable binds to homogenous
values across binding tuples. In the example, x
binds to two values
that are heterogeneous: some bindings of x
bind to a number, while
others to a string. It would also be possible that a variable binds
to, say, a scalar in one binding, while the same variable binds to a
complex value in another binding.
Each subsequent clause inputs a bag of binding tuples, evaluates the
constituent expressions of the clause (which may themselves contain
nested SFW queries), and outputs a bag of binding tuples that is in
turn input by the next clause. For instance, the WHERE
clause inputs
the bag of binding tuples that have been output by the FROM
clause
(\$B_"FROM"^"out" = B_"WHERE"^"in"\$), and outputs the subset
thereof that satisfies the condition expression of the WHERE
clause. This subset is the \$B_"WHERE"^"out" = B_"SELECT"^"in"\$.
In particular, the WHERE
’s condition is evaluated once for each
input binding tuple \$b\$ in \$B_"WHERE"^"in"\$. In general,
each evaluation is done within the bindings environment
\$(p_0,p || b)\$ , i.e., the concatenation of the binding tuple
\$p\$ (where \$p\$ is the binding environment of the SFW query)
with the binding tuple \$b\$ that has the variables of the
clause. In the particular example \$p || b\$ is simply \$b\$
since \$p=<<>>\$. The condition is evaluated once for each of the
four input binding tuples of \$B_"WHERE"^"in"\$. The variables
environment of the first evaluation is:
\$p = <<x:3, y: { 'a':1, 'b':2 } >>\$
The condition evaluates to for the first binding tuple of \$B_"WHERE"^"in"\$, since
\$(p_0,p) |-- x > y.b -> 3 > { 'a':1, 'b':3}.b -> true \$
Thus the first binding tuple of \$B_"WHERE"^"in"\$ is
output from the WHERE
and is input to SELECT
.
The pattern of “input bag of binding tuples, evaluate constituent
expressions, output bag of binding tuples” has a few exceptions:
First, the ORDER BY
clause inputs a bag of binding tuples and
outputs an array of binding tuples. Second, a LIMIT
/OFFSET
clause
need not evaluate its constituent expression for each input binding
tuple. For example a LIMIT 10
clause that inputs an array with 100
binding tuples need not access binding tuples 11-100.
Finally, the SELECT
clause is responsible for converting from
binding tuples to collections of arbitrary PartiQL elements. The
SELECT
inputs a bag (or array, if ORDER BY
is present) of binding
tuples, and outputs the SFW query’s result, which is a bag
(resp. array) with exactly one element for each input binding
tuple. In the example, the SELECT
expressions x
and y.a
are
evaluated once for each of the input binding tuples of
\$B_"SELECT"^"in"\$, which in this example happen to be just one
binding tuple.
Finally, notice that the above discussion of SFW queries did not
capture the set operators UNION
, INTERSECT
, and EXCEPT
. As is
the case with SQL semantics too, the coordination of with the set
operators requires attention.
In summary, each clause of PartiQL is an operator that inputs/outputs
binding tuples. As such, we can (and will) present the semantics of
each clause separately from the semantics of the other clauses. This
is not the case in SQL: Notably, in the presence of aggregation
functions the SELECT
, HAVING
, and WHERE
cannot be interpreted in
isolation; they can only be interpreted along with the GROUP BY
clause.
Scoping Rules of Variables
As in any programming language, the PartiQL semantics have to deal
with issues of variable scope. For example, how are references to x
resolved in the following query:
SELECT x.a AS a
FROM db1 AS x
WHERE x.b IN (SELECT x.c FROM db2 AS x)
Since this is an SQL query and PartiQL is backwards compatible to SQL,
it is easy to tell that the x
in x.c
resolves to the variable
defined by the inner query’s FROM
clause.
Technically, this scoping rule is captured by the following handling
of binding tuples. The inner FROM
clause is evaluated with a
variables environment \$p = <<x:...>>\$; its x
is the one defined
by the outer FROM
. Then the inner FROM
clause outputs a binding
\$b = <<x..>>\$; this x
is defined by thinner FROM
. Then the
x.c
is evaluated in the concatenation \$p||b\$ and because x
appears in both \$p\$ and \$b\$, the concatenation keeps only
the x
of its right argument. Essentially by putting \$b\$ as the
right argument of the concatenation, the semantics indicate that the
variables of \$b\$ have precedence over synonymous variables in the
left argument (which was the \$p\$).
Generally, given two binding tuples \$b\$ and \$b'\$, their concatenation is a binding tuple, denoted as \$b||b'\$, that has the variable bindings of both \$b\$ and \$b'\$. This creates the possibility that both \$b\$ and \$b'\$ have the same variable \$x\$. In this case, the concatenation \$b||b'\$ will have the \$b'.x\$ and its value; it will not have the \$b.x\$ and its value.
Note, the above does not resolve scoping issues resulting from conflicts between the database environment and the variables environment. We resolve these conflicts by explicit rules.
SELECT
is optionally written as the last clause since, anyway, that’s the proper way to read an SQL query.