Path Navigation

Tuple Path Navigation

A tuple path navigation \$t.a\$ from the tuple \$t\$ to its attribute \$a\$ returns the value of the attribute \$a\$. (We discuss below the corner case where a tuple has multiple attributes a.) \$t\$ is an expression but \$a\$ is always an identifier. For example:

{'a': 1, 'b':2}.a

\$<=>\$

{'a': 1, 'b': 2}."a"

\$->\$

1

Even if there were a variable a, bound to 'b', the result of the above expression would still be 1, because the identifier a (or "a") is interpreted as the "`look for the attribute named a `" when it follows the dot in a tuple path navigation. The semantics of tuple path navigation do not depend on whether the tuple is ordered or unordered by schema.

Array Path Navigation

An array navigation \$a\[i]\$ returns the \$i\$-th element when it is applied on an array \$a\$ and \$i\$ is an expression that evaluates into an integer. Both \$a\$ and \$i\$ are expressions. For example:

[2, 4, 6][1+1]

\$->\$

6

Tuple Navigation with Array Notation

The expression \$a\[s]\$ is a shorthand for the tuple path navigation \$a.s\$ when the expression \$s\$ is either (a) a string literal or (b) an expression that is explicitly CAST into a string. For example:

{'a': 1, 'b': 2}['a']

\$<=>\$

{'a': 1, 'b': 2}.'a'

\$->\$

1

Similarly:

{'attr': 1, 'b':2}[CAST('at' || 'tr' AS STRING)]

\$->\$

1

If \$s\$ is not a string literal or an expression that is cast into a string, then \$a\[s]\$ is evaluated as an array path navigation. Notice that in the absence of an explicit cast, the navigation \$a\[e]\$ evaluates as an array navigation, even if \$e\$ ends up evaluating to a string. For example, let us assume that the variable v is bound to at and the variable w is bound to tr. Still, the expression:

{'attr': 1, 'b':2}[v || w]

does not evaluate to 1. It is treated as an array navigation with wrongly typed index and it will return MISSING, for reasons explained below.

Composition of Navigations

Notice that consecutive tuple/array navigations (e.g. r.no[1]) navigate deeply into complex values. Notice further that paths consisting of plain tuple and array path navigations evaluate to a unique value.

Tuple Navigation in Tuples with Duplicate Attributes

When the tuple t has multiple attributes a, the tuple path navigation t.a will return the first instance of a. Note that for tuples whose order is defined by schema, this is well-defined, for unordered tuples, it is implementation defined which attribute is returned in permissive mode or an error in type checking mode.

If one wants to access all instances of a, she should use the UNPIVOT feature instead (see UNPIVOT). For example, the following query returns the list of all a values in a tuple t.

SELECT VALUE v
FROM UNPIVOT t AS v AT attr
WHERE attr = 'a'

Tuple Path Evaluation on Wrongly Typed Data

In the case of tuple paths, since PartiQL does not assume a schema, the semantics must also specify the return value when:

  1. \$t\$ is not a tuple (i.e., when the expression \$t\$ does not evaluate into a tuple), or

  2. \$t\$ is a tuple that does not have an \$a\$ attribute.

Permissive Mode

PartiQL can operate in a permissive mode or in a conventional type checking mode, where the query fails once typing errors (such as the above mentioned ones) happen. In the permissive mode, typing errors are typically neglected by using the semantics outlined next.

In all of the above cases PartiQL returns the special value MISSING. Recall, the MISSING is different from NULL. The distinction enables PartiQL to be able to distinguish between a tuple (JSON object) that lacked an attribute a and a tuple (JSON object) whose a attribute was NULL. This distinction, coupled with appropriate features on how result tuples are constructed (see SELECT clause in SELECT VALUE), enables PartiQL to easily preserve (when needed) the distinction between absent attribute and null-valued attribute.

For example, the expression 'not a tuple'.a and the expression {'a':1, 'b':2} evaluate to MISSING.

The above semantics apply regardless of whether the tuple navigation is accomplished via the dot notation or via the array notation. For example, the expression {'a':1, 'b':2}['noSuchAttribute'] will also evaluate to MISSING.

Type Checking Mode

In the type checking mode and in the absence of schema, PartiQL will fail when tuple path navigation is applied on wrongly typed data.

Role of Schema in Type Checking

In the presence of schema, PartiQL may return a compile-time error when the query processor can prove that the path expression is guaranteed to always produce MISSING. The extent of error detection is implementation-specific.

For example, in the presence of schema validation, an PartiQL query processor can throw a compile-time error when given the path expression {a:1, b:2}.c. In a more important and common case, an PartiQL implementation can utilize the input data schema to prove that a path expression always returns MISSING and thus throw a compile-time error. For example, assume that sometable is an SQL table whose schema does not include an attribute c. Then, an PartiQL implementation may throw a compile-time error when evaluating the query:

SELECT t.a, t.c FROM sometable AS t

Apparently, such an PartiQL implementation is fully compatible with the behavior of an SQL processor. Generally, if a rigid schema is explicitly present, a tuple path navigation error can be caught during compilation time; this is the case in SQL itself, where referring to a non-existent attribute leads to a compilation error for the query.

Notice that operating with schema validation may not prevent all tuple path navigations from being applied to wrongly typed data. The choice between permissive mode versus type checking mode dictates what happens next in these cases: If permissive, the tuple path navigation evaluates into MISSING. If in type checking mode, the query fails.

Array Navigation Evaluation on Wrongly Typed Data

In the permissive mode, an array navigation evaluation \$a\[i]\$ will result into MISSING in each of the following cases:

  • \$a\$ does not evaluate into an array, or

  • \$i\$ does not evaluate into a positive integer within the array’s bounds.

For example, [1,2,3][1.0] evaluates to MISSING since 1.0 is not an integer - even though it is coercible to an integer.

In type checking mode, the query will fail in each one of the cases above.

Additional Path Syntax

The following additional path functionalities are explained by reduction to the basic tuple navigation and array navigation.

Wildcard Steps

The expression \$e\[**]\$ reduces to (i.e., is equivalent to):

SELECT VALUE v FROM e AS v

where \$v\$ is a fresh variable, i.e., a variable that does not already appear in the query. Similarly, when the expression \$e.**\$ is not a SELECT clause item of the form \$t.*\$, where \$t\$ is a variable, it reduces to:

SELECT VALUE v FROM UNPIVOT e AS v

where \$v\$ is a fresh variable. An expression \$t.**\$, where \$t\$ is a variable and the expression appears as a SELECT clause item, is interpreted according to the SELECT clause semantics (see xref:dql/select.adoc#sql-star,SELECT *>>).

Example 1.  

The expression:

[1,2,3][*]

\$<=>\$

SELECT VALUE v FROM [1, 2, 3] AS v

\$->\$

<< 1,2,3 >>

The expression:

{'a':1, 'b':2}.*

\$<=>\$

SELECT VALUE v FROM UNPIVOT {'a':1, 'b':2} AS v

\$->\$

<<1,2>>

Whereas the following query:

SELECT t.* FROM <<{'a':1, 'b':1}, {'a':2, 'b':2}>> AS t

\$->\$

<< {'a':1, 'b':1}, {'a':2, 'b':2} >>

does not do the transformation with UNPIVOT. If one does not want this behavior, SELECT VALUE can be used (see SELECT VALUE).

Path Expressions with Wildcards

PartiQL also provides multi-step path expressions, called path collection expressions. Their semantics is a generalization of the semantics of a path expression with a single \$\[**]\$ or \$.**\$. Consider the path collection expression:

\$e w_1 p_1...w_n p_n\$

where \$e\$ is any expression; \$n>0\$; each wildcard step \$w_i\$ is either \$\[**]\$ or \$.**\$; each series of plain path steps \$p_i\$ is a sequence of zero or more tuple path navigations or array navigations (potentially mixed).

Then the path collection expression is equivalent to the SFW query

SELECT VALUE vn.pn
FROM
    u1 e AS v1,
    u2 v1.p1 AS v1,
    . . .
    un @vn-1.pn-1 AS vn

where each \$v_i\$ is a fresh variable and each \$u_i\$ is UNPIVOT if \$w_i\$ is a \$.**\$ and it is nothing if \$w_i\$ is a \$\[**]\$. Intuitively \$v_i\$ corresponds to the \$i\$-th star.

Example 2.  

According to the above, consider the following query:

SELECT VALUE foo FROM e.* AS foo

reduces to

SELECT VALUE foo FROM (SELECT VALUE v FROM UNPIVOT e AS v) AS foo

which is equivalent to

SELECT VALUE foo FROM UNPIVOT e AS foo

Next, consider the path collection expression:

tables.items[*].product.*.nest

This expression reduces to

SELECT
  VALUE v2.nest
FROM
  tables.items AS v1,
  UNPIVOT @v1.product AS v2