Path Navigation
Tuple Path Navigation
A tuple path navigation \$t.a\$ from the tuple \$t\$ to its
attribute \$a\$ returns the value
of the attribute \$a\$. (We discuss below the corner case where a
tuple has multiple attributes a
.) \$t\$ is an expression but
\$a\$ is always an identifier. For example:
{'a': 1, 'b':2}.a
\$<=>\$
{'a': 1, 'b': 2}."a"
\$->\$
1
Even if there were a variable a
, bound to 'b'
, the result of the
above expression would still be 1
, because the identifier a
(or
"a"
) is interpreted as the "`look for the attribute named a
`"
when it follows the dot in a tuple path navigation. The semantics of
tuple path navigation do not depend on whether the tuple is ordered or
unordered by schema.
Array Path Navigation
An array navigation \$a\[i]\$ returns the \$i\$-th element when it is applied on an array \$a\$ and \$i\$ is an expression that evaluates into an integer. Both \$a\$ and \$i\$ are expressions. For example:
[2, 4, 6][1+1]
\$->\$
6
Tuple Navigation with Array Notation
The expression \$a\[s]\$ is a shorthand for the tuple path
navigation \$a.s\$ when the expression \$s\$ is either (a) a
string literal or (b) an expression that is explicitly CAST
into a
string. For example:
{'a': 1, 'b': 2}['a']
\$<=>\$
{'a': 1, 'b': 2}.'a'
\$->\$
1
Similarly:
{'attr': 1, 'b':2}[CAST('at' || 'tr' AS STRING)]
\$->\$
1
If \$s\$ is not a string literal or an expression that is cast into
a string, then \$a\[s]\$ is evaluated as an array path
navigation. Notice that in the absence of an explicit cast, the
navigation \$a\[e]\$ evaluates as an array navigation, even if
\$e\$ ends up evaluating to a string. For example, let us assume
that the variable v
is bound to at
and the variable w
is bound
to tr
. Still, the expression:
{'attr': 1, 'b':2}[v || w]
does not evaluate to 1
. It is treated as an array navigation with wrongly
typed index and it will return MISSING
, for reasons explained below.
Composition of Navigations
Notice that consecutive tuple/array navigations (e.g. r.no[1]
)
navigate deeply into complex values. Notice further that paths
consisting of plain tuple and array path navigations evaluate to a
unique value.
Tuple Navigation in Tuples with Duplicate Attributes
When the tuple t
has multiple attributes a
, the tuple path navigation t.a
will
return the first instance of a
. Note that for tuples whose order is
defined by schema, this is well-defined, for unordered tuples, it is
implementation defined which attribute is returned in permissive mode
or an error in type checking mode.
If one wants to access all instances of a
, she should use the
UNPIVOT
feature instead (see UNPIVOT). For example, the
following query returns the list of all a
values in a tuple t
.
SELECT VALUE v
FROM UNPIVOT t AS v AT attr
WHERE attr = 'a'
Tuple Path Evaluation on Wrongly Typed Data
In the case of tuple paths, since PartiQL does not assume a schema, the semantics must also specify the return value when:
-
\$t\$ is not a tuple (i.e., when the expression \$t\$ does not evaluate into a tuple), or
-
\$t\$ is a tuple that does not have an \$a\$ attribute.
Permissive Mode
PartiQL can operate in a permissive mode or in a conventional type checking mode, where the query fails once typing errors (such as the above mentioned ones) happen. In the permissive mode, typing errors are typically neglected by using the semantics outlined next.
In all of the above cases PartiQL returns the special value
MISSING
. Recall, the MISSING
is different from NULL
. The
distinction enables PartiQL to be able to distinguish between a tuple
(JSON object) that lacked an attribute a
and a tuple (JSON object)
whose a
attribute was NULL
. This distinction, coupled with
appropriate features on how result tuples are constructed (see
SELECT
clause in SELECT VALUE), enables PartiQL to easily
preserve (when needed) the distinction between absent attribute and
null-valued attribute.
For example, the expression 'not a tuple'.a
and the expression
{'a':1, 'b':2}
evaluate to MISSING
.
The above semantics apply regardless of whether the tuple navigation
is accomplished via the dot notation or via the array notation. For
example, the expression {'a':1, 'b':2}['noSuchAttribute']
will also
evaluate to MISSING
.
Type Checking Mode
In the type checking mode and in the absence of schema, PartiQL will fail when tuple path navigation is applied on wrongly typed data.
Role of Schema in Type Checking
In the presence of schema, PartiQL may return a compile-time error when
the query processor can prove that the path expression is guaranteed to
always produce MISSING
. The extent of error detection is
implementation-specific.
For example, in the presence of schema validation, an PartiQL query
processor can throw a compile-time error when given the path expression
{a:1, b:2}.c
. In a more important and common case, an PartiQL implementation can
utilize the input data schema to prove that a path expression always
returns MISSING
and thus throw a compile-time error. For example, assume that sometable
is
an SQL table whose schema does not include an attribute c
. Then, an
PartiQL implementation may throw a compile-time error when evaluating
the query:
SELECT t.a, t.c FROM sometable AS t
Apparently, such an PartiQL implementation is fully compatible with the behavior of an SQL processor. Generally, if a rigid schema is explicitly present, a tuple path navigation error can be caught during compilation time; this is the case in SQL itself, where referring to a non-existent attribute leads to a compilation error for the query.
Notice that operating with schema validation may not prevent all tuple
path navigations from being applied to wrongly typed data. The choice
between permissive mode versus type checking mode dictates what happens
next in these cases: If permissive, the tuple path navigation evaluates
into MISSING
. If in type checking mode, the query fails.
Array Navigation Evaluation on Wrongly Typed Data
In the permissive mode, an array navigation evaluation
\$a\[i]\$ will result into MISSING
in each of the following cases:
-
\$a\$ does not evaluate into an array, or
-
\$i\$ does not evaluate into a positive integer within the array’s bounds.
For example, [1,2,3][1.0]
evaluates to MISSING
since 1.0
is not
an integer - even though it is coercible to an integer.
In type checking mode, the query will fail in each one of the cases above.
Additional Path Syntax
The following additional path functionalities are explained by reduction to the basic tuple navigation and array navigation.
Wildcard Steps
The expression \$e\[**]\$ reduces to (i.e., is equivalent to):
SELECT VALUE v FROM e AS v
where \$v\$ is a fresh variable, i.e., a variable that does not
already appear in the query. Similarly, when the expression \$e.**\$
is not a SELECT
clause item of the form \$t.*\$, where \$t\$ is a
variable, it reduces to:
SELECT VALUE v FROM UNPIVOT e AS v
where \$v\$ is a fresh variable. An expression \$t.**\$, where
\$t\$ is a variable and the expression appears as a SELECT
clause
item, is interpreted according to the SELECT
clause semantics (see xref:dql/select.adoc#sql-star,SELECT *>>).
The expression:
[1,2,3][*]
\$<=>\$
SELECT VALUE v FROM [1, 2, 3] AS v
\$->\$
<< 1,2,3 >>
The expression:
{'a':1, 'b':2}.*
\$<=>\$
SELECT VALUE v FROM UNPIVOT {'a':1, 'b':2} AS v
\$->\$
<<1,2>>
Whereas the following query:
SELECT t.* FROM <<{'a':1, 'b':1}, {'a':2, 'b':2}>> AS t
\$->\$
<< {'a':1, 'b':1}, {'a':2, 'b':2} >>
does not do the transformation with UNPIVOT
. If one does not want
this behavior, SELECT VALUE
can be used (see SELECT VALUE).
Path Expressions with Wildcards
PartiQL also provides multi-step path expressions, called path collection expressions. Their semantics is a generalization of the semantics of a path expression with a single \$\[**]\$ or \$.**\$. Consider the path collection expression:
where \$e\$ is any expression; \$n>0\$; each wildcard step \$w_i\$ is either \$\[**]\$ or \$.**\$; each series of plain path steps \$p_i\$ is a sequence of zero or more tuple path navigations or array navigations (potentially mixed).
Then the path collection expression is equivalent to the SFW query
SELECT VALUE vn.pn
FROM
u1 e AS v1,
u2 v1.p1 AS v1,
. . .
un @vn-1.pn-1 AS vn
where each \$v_i\$ is a fresh variable and each \$u_i\$ is
UNPIVOT
if \$w_i\$ is a \$.**\$ and it is nothing if
\$w_i\$ is a \$\[**]\$. Intuitively \$v_i\$ corresponds to
the \$i\$-th star.
According to the above, consider the following query:
SELECT VALUE foo FROM e.* AS foo
reduces to
SELECT VALUE foo FROM (SELECT VALUE v FROM UNPIVOT e AS v) AS foo
which is equivalent to
SELECT VALUE foo FROM UNPIVOT e AS foo
Next, consider the path collection expression:
tables.items[*].product.*.nest
This expression reduces to
SELECT
VALUE v2.nest
FROM
tables.items AS v1,
UNPIVOT @v1.product AS v2