Variable Resolution

Resolution Rules

A PartiQL query is compiled within a static environment that associates global identifiers with their PartiQL types. These types are used during variable resolution to determine whether an identifier path can possibly refer to a given value. The word "possibly" is key to understanding how PartiQL is able to resolve variables given only a partial schema. The PartiQL Planner follows the rules below to resolve variables.

  1. The path $@i_1.i_2.\ldots.i_n$ always refers to a bound variable named $i_1$; if there is no such possible variable and $i_1.i_2.\ldots.i_m$, $m \le n$, is a database name, then $i_1.i_2.\ldots.i_m$ refers to that database name. If there is a choice, choose the largest $m$. If both the resolution to a variable and the resolution to a database name fail, return MISSING or fail execution.

  2. If $i_1.i_2.\ldots.i_n$ is a FROM path and $i_1.i_2.\ldots.i_m$, $m \le n$, is a possible database name, then $i_1.i_2.\ldots.i_m$ refers to that database name and $i_{m+1}.\ldots.i_n$ is a series of tuple path navigations starting from the database name $i_1.i_2.\ldots.i_m$. If there is a choice, choose the largest $m$. If there is no such possible database name, then $i_1.i_2.\ldots.i_m$ refers to a variable (matching the largest $m$). If both the resolution to a database name and the resolution to a variable fail, then fail compilation, as this identifier path cannot be resolved.

  3. If $i_1.i_2.\ldots.i_n$ is a non-FROM clause expression and $i_1$ is an environment variable, then $i_1$ refers to that variable; if there is no such variable and $i_1.i_2.\ldots.i_m$, $m \le n$, is a database name, then $i_1.i_2.\ldots.i_m$ refers to that database name. If there is a choice, choose the largest $m$. If both the resolution to a variable and the resolution to a database name fail, return MISSING or fail execution.
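The three rules can be read as one lookup with two possible orderings. The following is a minimal Python sketch of that reading, not the planner's actual implementation. The names `resolve` and `longest_database_prefix`, the tuple-based result, and the sample identifiers in the usage note are all hypothetical. It also assumes that a variable is always a single identifier (the head of the path) and it ignores types and the notion of "possible" names entirely.

```python
from typing import Optional, Sequence, Set, Tuple

# (kind, resolved prefix, remaining path steps); hypothetical result shape.
Resolution = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def longest_database_prefix(
    path: Sequence[str], database_names: Set[Tuple[str, ...]]
) -> Optional[Tuple[str, ...]]:
    """Return the longest prefix i1..im of `path` that is a database name."""
    for m in range(len(path), 0, -1):  # try the largest m first
        prefix = tuple(path[:m])
        if prefix in database_names:
            return prefix
    return None


def resolve(
    path: Sequence[str],
    *,
    in_from: bool,
    at_prefixed: bool,
    variables: Set[str],
    database_names: Set[Tuple[str, ...]],
) -> Optional[Resolution]:
    """Resolve an identifier path; None means both lookups failed."""
    steps = tuple(path)

    def as_variable() -> Optional[Resolution]:
        # Assumption: only the head identifier i1 can name a variable.
        if steps[0] in variables:
            return ("variable", steps[:1], steps[1:])
        return None

    def as_database() -> Optional[Resolution]:
        prefix = longest_database_prefix(steps, database_names)
        if prefix is not None:
            return ("database", prefix, steps[len(prefix):])
        return None

    # Rule 2 (plain FROM path): database names are tried first.
    # Rules 1 and 3 (@-prefixed path or non-FROM expression): variables first.
    if in_from and not at_prefixed:
        attempts = (as_database, as_variable)
    else:
        attempts = (as_variable, as_database)

    for attempt in attempts:
        result = attempt()
        if result is not None:
            return result
    return None  # the caller fails compilation or returns MISSING
```

With hypothetical database names `{("coll",), ("v", "foo")}` and a variable `v`, the path `v.foo` resolves to the database name `v.foo` when it is a plain FROM path, and to the variable `v` followed by the navigation step `foo` when it is `@`-prefixed or appears outside a FROM clause.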

Static Environment

For example, let’s see how x and T are resolved given different static environments.

SELECT x FROM T

Assume all static environments are closed.

env0 :: {
  T : << closed :: { x: any } >>    -- bag of closed structs, each with known field x
}

env1 :: {
  T : << closed :: { y: any } >>    -- bag of closed structs, each with known field y
}

env2 :: {
  T : << open :: { y: any } >>      -- bag of open structs, each with known field y
}

env3 :: {
  T : << closed :: { x: any } >>,   -- bag of closed structs, each with known field x
  x : int                           -- int
}

env4 :: {
  T : << closed :: { y: any } >>,   -- bag of closed structs, each with known field y
  x : int                           -- int
}

env5 :: {
  T : << open :: { y: any } >>,     -- bag of open structs, each with known field y
  x : int                           -- int
}

For all environments we have a known global named T. In the example query, the identifier path T is on the right-hand side of a FROM clause (i.e. it is a FROM path), so we apply resolution rule 2. In all cases, this FROM path T is resolved to the known global T of the static environment.

Now let’s look at how we resolve x in the various environments.

Env0

In the query, T is the only FROM source, so the identifier path x is unambiguously bound to T. This enables us to attempt to resolve the query as

SELECT t.x FROM T as t

We apply rule 3 and find that t.x is possible because the static environment tells us the structs of T have a known field x.

Env1

Unlike env0, we cannot resolve the path t.x via Rule 3 because the static environment tells us that it is impossible for a struct of T to have a field x. Next, Rule 3 tells us to check for a global variable x, but again we find that it is not possible. The variable cannot be resolved, and compilation fails.

Env2

Env2 differs from Env1 because the structs within bag T are defined as having an open schema. While x is not a known field of the structs in T, we do know that it is possible for such a field to exist. In this case, we resolve the variable x to the path t.x.

Env3

The case for env3 is the same as for env0. The static environment tells us the structs of T have a known field x, so we resolve x to the path t.x even though a global x is also defined.

Env4

Like env1, the static environment tells us that it is impossible for a struct of T to have a field x. Next, Rule 3 tells us to check for a global variable x, and (unlike env1) we find the static environment defines a global x. We resolve x as that global.

Env5

Env5 is just like Env2, but the static environment defines a global x. However, the resolution is exactly the same as in Env2. The identifier path x is unambiguously bound to T, and we see that t.x is possible in the static environment. We resolve x to the path t.x.
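The six cases above can be replayed with a small Python sketch. It is only an illustration of the walkthrough, under the assumption that a field is "possible" when it is either a known field or the struct is open. `StructType`, `may_have`, `resolve_unqualified`, and the string results are hypothetical names, not part of the PartiQL Planner.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class StructType:
    """Element type of a bag such as T: known fields plus an open/closed flag."""

    fields: FrozenSet[str]
    is_open: bool

    def may_have(self, name: str) -> bool:
        # A known field is possible; on an open struct any field is possible.
        return name in self.fields or self.is_open


def resolve_unqualified(
    name: str, alias: str, element: StructType, global_names: Set[str]
) -> Optional[str]:
    """Resolve `name` in `SELECT name FROM T AS alias`; None means compilation fails."""
    if element.may_have(name):
        return f"{alias}.{name}"  # bound to the single FROM source
    if name in global_names:
        return f"global {name}"  # fall back to the globals
    return None


# (element type of T, names of the globals) for env0 through env5.
ENVS = {
    "env0": (StructType(frozenset({"x"}), is_open=False), {"T"}),
    "env1": (StructType(frozenset({"y"}), is_open=False), {"T"}),
    "env2": (StructType(frozenset({"y"}), is_open=True), {"T"}),
    "env3": (StructType(frozenset({"x"}), is_open=False), {"T", "x"}),
    "env4": (StructType(frozenset({"y"}), is_open=False), {"T", "x"}),
    "env5": (StructType(frozenset({"y"}), is_open=True), {"T", "x"}),
}

RESULTS = {
    env: resolve_unqualified("x", "t", element, global_names)
    for env, (element, global_names) in ENVS.items()
}
```

Under these assumptions, `RESULTS` maps env0, env2, env3, and env5 to `t.x`, env4 to the global x, and env1 to `None` (compilation failure), matching the cases described above.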

Resolution with Multiple FROM sources

For example, let’s see how x is resolved when the query has more than one FROM source.

SELECT x FROM T, S

Assume all static environments are closed.

env0 :: {
  T : << closed :: { x: any } >>,   -- bag of closed structs, each with known field x
  S : << closed :: { y: any } >>    -- bag of closed structs, each with known field y
}

env1 :: {
  T : << closed :: { z: any } >>,   -- bag of closed structs, each with known field z
  S : << closed :: { z: any } >>,   -- bag of closed structs, each with known field z
  x : int                           -- global x of type int
}

Env0

The output binding tuples of T CROSS JOIN S have the schema:

<<
  closed :: {
    t: << closed :: { x: any } >>,
    s: << closed :: { y: any } >>,
  }
>>

No variable x exists in these binding tuples (they bind only t and s), so we check the globals per resolution rule 3. This fails, so the variable x is unresolved and we fail compilation.

Env1

Same situation as Env0, except that we find x to be a global / database name. We resolve x to the database name, even though this makes for a strange query.
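The two cases can be sketched in Python as a lookup against the variable names of the binding tuples, followed by a lookup in the globals. This is a simplified illustration that follows the text literally: it checks only the variable names t and s, not the fields of the values they are bound to. The function name and result shape are hypothetical.

```python
from typing import Optional, Set, Tuple


def resolve_against_bindings(
    name: str, binding_variables: Set[str], global_names: Set[str]
) -> Optional[Tuple[str, str]]:
    """Look `name` up among the binding-tuple variables, then among the globals."""
    if name in binding_variables:
        return ("variable", name)
    if name in global_names:
        return ("global", name)
    return None  # unresolved: compilation fails


# T CROSS JOIN S binds exactly the variables t and s.
BINDING_VARIABLES = {"t", "s"}
ENV0_GLOBALS = {"T", "S"}
ENV1_GLOBALS = {"T", "S", "x"}

env0_result = resolve_against_bindings("x", BINDING_VARIABLES, ENV0_GLOBALS)
env1_result = resolve_against_bindings("x", BINDING_VARIABLES, ENV1_GLOBALS)
```

Here `env0_result` is `None` (compilation fails) and `env1_result` is `("global", "x")`.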

Variable Scopes Example

The scoping rules discussed in the present section govern the resolution of naming conflicts between names defined in the database environment and the variables of the variables environment. The potential for such naming conflicts is driven by the nested data of PartiQL, as illustrated next.

Note that there are a few more naming conventions, pertaining to the use of attribute names defined in the SELECT clause within the GROUP BY and ORDER BY clauses. These conventions are explained with the semantics of the respective clauses (see GROUP BY Clause and ORDER BY Clause).

Example 1.  

The following example illustrates how SQL compatibility and the need to navigate into nested data must be carefully reconciled. Consider the following database that has a table t.c, i.e. a collection of tuples, and also the named data x.n and y.

t.c: <<
    {'a':1, 'n':[{'b':11, 'c':12}]},
    {'a':2, 'n':[{'b':21, 'c':22}]}
>>
x.n : << {'b':3} >>
y: {'a':1, 'b':2}

Then consider the query

SELECT t.a
FROM t.c AS x
WHERE x.a IN (SELECT y.b FROM x.n AS y)

This query poses many scoping issues:

  1. Does x.n refer to the named value x.n or to the n attribute of the variable x? For SQL compatibility purposes it refers to the named value x.n. Example 2 below shows how to refer to the variable x instead.

  2. Does y.b refer to the b attribute of the variable y or to the b attribute of the named value y? For SQL compatibility purposes it refers to the b attribute of the variable y.

Notice how SQL compatibility requires the database environment to take priority over the variables environment in the FROM clause and, vice versa, the variables environment to take priority over the database environment in the SELECT clause.

Example 2.  

Assume database names coll, v.foo, w. Then in the query

SELECT v.foo
FROM coll AS v, @v.foo AS w,
     (SELECT w.a, u.b FROM @w.bar AS u)
         AS x

coll refers to the database name. The v in @v.foo refers to the variable v. If the @ were not there, v.foo would refer to the database name v.foo. The w in w.a refers to the variable defined in line 2.

Note that the expressions coll and @v.foo are FROM clause expressions because they appear in the FROM clause of the sfw_query of lines 1-4, in which they are immediately nested. Similarly, the expression @w.bar is a FROM clause expression because it appears in the FROM clause of the sfw_query of line 3, in which it is immediately nested. In contrast, the expressions w.a and u.b are not FROM clause expressions. Though they are nested in the FROM clause of the query of lines 1-4, they are not immediately nested in the query of lines 1-4.