Variable Resolution

Resolution Rules

A PartiQL query is compiled within a static environment that associates global identifiers with their PartiQL types. During variable resolution, these types are used to determine whether an identifier path could possibly refer to a given value. The word "possibly" is key to understanding how PartiQL resolves variables when only a partial schema is known. The PartiQL Planner resolves variables according to the following rules.

1. The path @i₁.i₂.….iₙ always refers to a bound variable named i₁. If there is no such possible variable and i₁.i₂.….iₘ (m ≤ n) is a database name, then i₁.i₂.….iₘ refers to that database name; if there is a choice, choose the largest m. If both the resolution to a variable and the resolution to a database name fail, return MISSING or fail execution.
2. If i₁.i₂.….iₙ is a FROM path and i₁.i₂.….iₘ (m ≤ n) is a possible database name, then i₁.i₂.….iₘ refers to that database name and iₘ₊₁.….iₙ is a series of tuple path navigations starting from that database name; if there is a choice, choose the largest m. If there is no such possible database name, then i₁.i₂.….iₘ refers to a variable (again matching the largest m). If both the resolution to a database name and the resolution to a variable fail, compilation fails because the identifier path cannot be resolved.
3. If i₁.i₂.….iₙ is a non-FROM clause expression and i₁ is an environment variable, then i₁ refers to that variable. If there is no such variable and i₁.i₂.….iₘ (m ≤ n) is a database name, then i₁.i₂.….iₘ refers to that database name; if there is a choice, choose the largest m. If both the resolution to a variable and the resolution to a database name fail, return MISSING or fail execution.
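The three rules differ mainly in which lookup is tried first and in what happens when both lookups fail. The following Python sketch is a simplified model for illustration, not the PartiQL Planner's implementation. It assumes that database names are given as a set of dotted strings, that variables are single identifiers, and that a FROM path falling back to a variable binds i₁ and navigates the remaining steps. The names `resolve`, `longest_database_prefix`, `at_prefixed`, and `in_from` are hypothetical. A result of `None` stands for an unresolved path.

```python
from typing import Optional, Sequence, Set, Tuple

# (kind, resolved name, remaining tuple-path navigations); kind is "variable" or "database".
Resolution = Tuple[str, str, Tuple[str, ...]]


def longest_database_prefix(path: Sequence[str], db_names: Set[str]) -> Optional[int]:
    """Return the largest m such that i1.i2...im is a database name, or None."""
    for m in range(len(path), 0, -1):
        if ".".join(path[:m]) in db_names:
            return m
    return None


def resolve(
    path: Sequence[str],
    *,
    at_prefixed: bool,
    in_from: bool,
    variables: Set[str],
    db_names: Set[str],
) -> Optional[Resolution]:
    """Hypothetical model of rules 1-3 for the identifier path i1.i2...in."""
    m = longest_database_prefix(path, db_names)
    as_database = None if m is None else ("database", ".".join(path[:m]), tuple(path[m:]))
    as_variable = ("variable", path[0], tuple(path[1:])) if path[0] in variables else None

    if at_prefixed or not in_from:
        # Rules 1 and 3: the variable i1 takes priority; fall back to the database name.
        return as_variable or as_database
    # Rule 2: a FROM path prefers the longest matching database name.
    return as_database or as_variable


# Database names taken from Example 2 below: coll, v.foo, w.
db = {"coll", "v.foo", "w"}
print(resolve(["v", "foo"], at_prefixed=True, in_from=True, variables={"v"}, db_names=db))
# ('variable', 'v', ('foo',))
print(resolve(["v", "foo"], at_prefixed=False, in_from=True, variables={"v"}, db_names=db))
# ('database', 'v.foo', ())
```

Returning `None` leaves the consequence to the caller: under the rules, an unresolved FROM path fails compilation, while an unresolved path elsewhere yields MISSING or an execution failure.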

Static Environment

For example, let’s see how x and T are resolved given different static environments.

SELECT x FROM T

Assume all static environments are closed.

env₀ :: {
  T : << closed :: { x: any } >>    — bag of closed structs, each with known field x
}

env₁ :: {
  T : << closed :: { y: any } >>    — bag of closed structs, each with known field y
}

env₂ :: {
  T : << open :: { y: any } >>      — bag of open structs, each with known field y
}

env₃ :: {
  T : << closed :: { x: any } >>,   — bag of closed structs, each with known field x
  x : int                           — int
}

env₄ :: {
  T : << closed :: { y: any } >>,   — bag of closed structs, each with known field y
  x : int                           — int
}

env₅ :: {
  T : << open :: { y: any } >>,     — bag of open structs, each with known field y
  x : int                           — int
}

In every environment there is a known global named T. In the example query, the identifier path T appears in the FROM clause (i.e., it is a FROM path), so we apply rule 2. In all cases, the FROM path T resolves to the known global T of the static environment.

Now let’s look at how x is resolved in each environment.

Env₀

Because T is the only FROM source, the identifier path x can only be bound through T. This lets us attempt to resolve the query as

SELECT t.x FROM T AS t

We apply rule 3 and find that t.x is possible because the static environment tells us the structs of T have a known field x, so x resolves to t.x.

Env₁

Unlike in env₀, we cannot resolve the path t.x via rule 3, because the static environment tells us that it is impossible for a struct of T to have a field x. Rule 3 then tells us to check for a global variable x, but again we find that none is possible. Since x cannot be resolved, compilation fails.

Env₂

Env₂ differs from Env₁ in that the structs within bag T have an open schema. Although x is not a known field of the structs in T, the open schema means such a field may exist. In this case, we resolve the variable x to the path t.x.

Env₃

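The case for env₃ is the same as for env₀: the static environment tells us the structs of T have a known field x, so x resolves to t.x. Because rule 3 tries the variable before the database name, the global x is not used.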

Env₄

As in env₁, the static environment tells us that it is impossible for a struct of T to have a field x. Rule 3 then tells us to check for a global variable x, and, unlike in env₁, we find that the static environment defines a global x. We resolve x to that global.

Env₅

Env₅ is like Env₂, except that the static environment also defines a global x. The resolution is nevertheless the same as in Env₂: the identifier path x can only be bound through T, and t.x is possible in the static environment, so we resolve x to the path t.x.
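The six outcomes above can be reproduced with a small Python model of the static environment. This is a sketch under stated assumptions, not the planner's code: `StructType`, `BagType`, `closed_struct`, `open_struct`, and `resolve_unqualified` are hypothetical names, each environment maps a global name to either a bag type or a scalar type name, and rule 3 is modeled as trying the FROM variable path t.x before the global x. A field counts as "possible" when it is a known field or when the struct is open.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Union


@dataclass(frozen=True)
class StructType:
    fields: FrozenSet[str]  # known field names
    is_open: bool           # open structs may also carry unknown fields

    def may_have(self, name: str) -> bool:
        """A field is possible if it is known or if the struct is open."""
        return name in self.fields or self.is_open


@dataclass(frozen=True)
class BagType:
    element: StructType


def closed_struct(*names: str) -> StructType:
    return StructType(frozenset(names), is_open=False)


def open_struct(*names: str) -> StructType:
    return StructType(frozenset(names), is_open=True)


Env = Dict[str, Union[BagType, str]]  # global name -> bag type or scalar type name


def resolve_unqualified(name: str, source: str, alias: str, env: Env) -> Optional[str]:
    """Resolve `name` in SELECT name FROM source AS alias; None means compilation fails."""
    bag = env[source]  # rule 2: the FROM path resolves to the known global
    if not isinstance(bag, BagType):
        raise TypeError(f"{source} is not a bag")
    if bag.element.may_have(name):
        return f"{alias}.{name}"  # rule 3: the variable path is tried first
    if name in env:
        return name               # rule 3 fallback: the global of that name
    return None


ENVS: Dict[str, Env] = {
    "env0": {"T": BagType(closed_struct("x"))},
    "env1": {"T": BagType(closed_struct("y"))},
    "env2": {"T": BagType(open_struct("y"))},
    "env3": {"T": BagType(closed_struct("x")), "x": "int"},
    "env4": {"T": BagType(closed_struct("y")), "x": "int"},
    "env5": {"T": BagType(open_struct("y")), "x": "int"},
}

for env_name, env in ENVS.items():
    print(env_name, "->", resolve_unqualified("x", "T", "t", env) or "unresolved")
# env0 -> t.x
# env1 -> unresolved
# env2 -> t.x
# env3 -> t.x
# env4 -> x
# env5 -> t.x
```

In this model, env₃ and env₅ resolve to t.x even though a global x exists, because the possible field is checked before the globals.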

Resolution with Multiple FROM sources

Next, let’s see how x is resolved when the query has more than one FROM source.

SELECT x FROM T, S

Assume all static environments are closed.

env₀ :: {
  T : << closed :: { x: any } >>,   — bag of closed structs, each with known field x
  S : << closed :: { y: any } >>    — bag of closed structs, each with known field y
}

env₁ :: {
  T : << closed :: { z: any } >>,   — bag of closed structs, each with known field z
  S : << closed :: { z: any } >>,   — bag of closed structs, each with known field z
  x : int                           — Global x of type int
}

Env₀

The output binding tuples of T CROSS JOIN S have the schema:

<<
  closed :: {
    t: closed :: { x: any },
    s: closed :: { y: any }
  }
>>
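A binding tuple schema of this kind can be derived mechanically: in a CROSS JOIN, each FROM variable binds to one element of its source bag, so each alias gets the bag's element type. The minimal Python sketch below assumes hypothetical `StructType` and `BagType` classes and a hypothetical `binding_tuple_schema` function, with T and S aliased as t and s. It only derives the schema; it does not perform resolution.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class StructType:
    fields: FrozenSet[str]
    is_open: bool = False

    def __str__(self) -> str:
        kind = "open" if self.is_open else "closed"
        body = ", ".join(f"{name}: any" for name in sorted(self.fields))
        return f"{kind} :: {{ {body} }}"


@dataclass(frozen=True)
class BagType:
    element: StructType


def binding_tuple_schema(
    from_items: Sequence[Tuple[str, str]], env: Dict[str, BagType]
) -> Dict[str, StructType]:
    """Map each FROM alias to the element type of its source bag.

    Every binding tuple binds each alias to one element of its source,
    so the alias has the bag's element type, not the bag type itself.
    """
    return {alias: env[source].element for source, alias in from_items}


env0 = {
    "T": BagType(StructType(frozenset({"x"}))),
    "S": BagType(StructType(frozenset({"y"}))),
}
schema = binding_tuple_schema([("T", "t"), ("S", "s")], env0)
for alias, struct in schema.items():
    print(f"{alias}: {struct}")
# t: closed :: { x: any }
# s: closed :: { y: any }
```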

We know that x does not exist, so we check the globals per rule 3. No global x exists either, so x is unresolved and compilation fails.

Env₁

This is the same situation as Env₀, except that we find x as a global / database name. We resolve x to that database name, even though the resulting query is unusual.

Variable Scopes Example

The scoping rules discussed in this section resolve naming conflicts between names defined in the database environment and variables defined in the variables environment. Such conflicts are possible because PartiQL data is nested, as illustrated next.

Note that there are a few more naming conventions, pertaining to the use of attribute names defined in the SELECT clause within the GROUP BY and ORDER BY clauses. These conventions are explained with the semantics of the respective clauses (see GROUP BY Clause and ORDER BY Clause).

Example 1.

The following example illustrates how SQL compatibility and the need to navigate into nested data must be carefully reconciled. Consider the following database, which has a table t.c (i.e., a collection of tuples) and also the named data x.n and y.

t.c: <<
    {'a':1, 'n':[{'b':11, 'c':12}]},
    {'a':2, 'n':[{'b':21, 'c':22}]}
>>
x.n : << {'b':3} >>
y: {'a':1, 'b':2}

Then consider the query

SELECT x.a
FROM t.c AS x
WHERE x.a IN (SELECT y.b FROM x.n AS y)

This query poses many scoping issues:

Does x.n refer to the named value x.n or to the n attribute of the variable x? For SQL compatibility purposes, it refers to the named value x.n. See Example 2 below for how to refer to the variable x instead.
Does y.b refer to the b attribute of the variable y or to the b attribute of the named value y? For SQL compatibility purposes, it refers to the b attribute of the variable y.

Notice how SQL compatibility requires the database environment to take priority over the variables environment in the FROM clause and, conversely, the variables environment to take priority over the database environment in the SELECT clause.

Example 2.

Assume the database names coll, v.foo, and w. Then, in the query

SELECT v.foo
FROM coll AS v, @v.foo AS w,
     (SELECT w.a, u.b FROM @w.bar AS u)
         AS x

coll refers to the database name. The v in @v.foo refers to the variable v; if the @ were not there, v.foo would refer to the database name v.foo. The w in w.a refers to the variable w defined in line 2, not to the database name w.

Note that the expressions coll and @v.foo are FROM clause expressions because they appear in the FROM clause of the sfw_query of lines 1-4, in which they are immediately nested. Similarly, the expression @w.bar is a FROM clause expression because it appears in the FROM clause of the sfw_query of line 3, in which it is immediately nested. In contrast, the expressions w.a and u.b are not FROM clause expressions: although they are nested in the FROM clause of the query of lines 1-4, they are not immediately nested in that query.
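The "immediately nested" criterion can be checked with a walk over a query tree that restarts the FROM context at every nested query. The sketch below is hypothetical: `Path`, `Query`, and `classify` are illustrative names, and the tree keeps only SELECT items and FROM items, ignoring aliases (AS w, AS u, AS x) and any other clauses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Path:
    text: str  # identifier path as written, e.g. "@v.foo"


@dataclass
class Query:
    select: List["Expr"]
    from_: List["Expr"]


Expr = Union[Path, Query]


def classify(query: Query, out: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Map each path to True if it is a FROM clause expression of its enclosing query."""
    out = {} if out is None else out
    items = [(e, False) for e in query.select] + [(e, True) for e in query.from_]
    for expr, immediately_in_from in items:
        if isinstance(expr, Path):
            out[expr.text] = immediately_in_from
        else:
            # A nested query starts a new context: only paths immediately
            # nested in its own FROM clause count as FROM clause expressions.
            classify(expr, out)
    return out


# The query of Example 2, with aliases omitted.
inner = Query(select=[Path("w.a"), Path("u.b")], from_=[Path("@w.bar")])
outer = Query(select=[Path("v.foo")], from_=[Path("coll"), Path("@v.foo"), inner])
print(classify(outer))
# {'v.foo': False, 'coll': True, '@v.foo': True, 'w.a': False, 'u.b': False, '@w.bar': True}
```

Although the subquery itself sits in the outer FROM clause, w.a and u.b come out as non-FROM expressions because classification restarts inside the subquery.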
