Names and Identifiers

This pages defines how names and identifiers are implemented in partiql-lang-kotlin — it is backwards-compatible with existing PartiQL implementations and is compatible with SQL. This document is not a definitive specification for PartiQL names and identifiers, but simply a reference for the partiql-lang-kotlin implementation.

TL;DR

  • A name is some label associated with a reference-able entity.

  • An identifier is a reference to some named entity.

  • To resolve an identifier (a reference), it is matched against the in-scope names e.g. in-scope, reference-able entities such as local variables and tables (global variables).

  • Case-normalization is a syntactic rewrite which enables PartiQL to accommodate semi-structured data while maintaining compatibility with SQL systems.

  • Case-normalization modes are summarized in the Reduced Example Matrix.

Definitions

These definitions are independent of the syntax rules
  • Name — A name is some label associated with a reference-able entity e.g. a local variable binding or a catalog object (global variable binding).

  • Identifier — An identifier is a reference to some named entity.

SQL Definitions — SQL-99 Part 1, page 5

  • Object (as in "x object") — Any thing. An x object is a component of, or is otherwise associated with, some x, and cannot exist independently of that x. For example, an SQL object is an object that exists only in the context of SQL; an SQL-schema object is an object that exists in some SQL-schema.

  • Identifier — A means by which something is identified.

  • Identify — To properly reference something without ambiguity.

Syntax

The syntax rules govern the usage of identifiers and case-normalization behavior.

<identifier> ::= <delimited identifier> | <regular identifier>

<qualified identifier> ::= (<identifier> <period>)+ <identifier>

<delimited identifier> ::= <double quote> <delimited identifier body> <double quote>

<simple identifier> ::= <identifier body>

<name> ::= <identifier>

<qualified name> ::= <qualified identifier>
Notes
  • These are syntax rules and should not be confused with the definitions from part 1.

  • Names and identifiers share syntax rules, but that doesn’t make them the same!

  • Names have their own syntax rule because their case-normalization behavior differs from identifier case-normalization.

  • You assign names and you reference with identifiers.

  • See SQL-99 Names and Identifiers.

  • We must accommodate for arbitrarily long qualified identifiers (not just catalog-qualified and schema-qualified).

  • For now, assume SQL definition for <delimited identifier body> and <identifier body> (SQL-99 Part 2, page 113).

Syntax Rules
  • Identifier — An identifier is a single delimited or regular identifier.

  • Delimited Identifier — A delimited-identifier is a simple-identifier which is enclosed by double-quotes and is NOT case-normalized regardless of casing mode.

  • Regular Identifier — A regular-identifier is a simple-identifier which is NOT enclosed by double-quotes and MAY be case-normalized.

  • Qualified Identifier — A qualified-identifier is composed of two or more identifiers joined by a period.

  • Name — The name syntax rule is an identifier which MAY be case-normalized depending on the mode.

  • Qualified Name — A qualified-identifier which MAY be case-normalized

These are syntax rules where a name is an identifier used in certain contexts. The distinction is critical for defining the case-normalization rewrite.

Syntax Examples

Let’s look at some examples of identifiers and names in the context of PartiQL queries.

Example 1regular-identifier in a local-first-scoped expression

This query references some matching (local-first) in-scope variable.

SELECT abc FROM ..

-- `abc` is a regular identifier

Example 2delimited-identifier

Likewise, this query references some matching (local-first) in-scope variable.

SELECT "abc" FROM ..

-- `"abc"` is a delimited identifier

Example 3regular-identifier in a global-first-scoped expression

This query references some matching (global-first) in-scope variable.

SELECT .. FROM T

-- `T` is a regular identifier

Example 4qualified-identifier in a global-first-scoped expression

We are referencing the named entity with a qualified-identifier.

SELECT .. FROM my_cat.my_schema.T

-- `my_cat.my_schema.T` is qualified identifier

Example 5name used for variable bindings

We are assigning some name to the given expressions (int literals)

SELECT 1 AS x, 2 as "y" FROM ..

-- `x` is a name with a regular identifier
-- `"y"` is a name with with a delimited identifier

Example 6named used for table definition

We are assigning some name to the given table definition.

CREATE TABLE T (..)

-- `T` is a name using a regular identifier

Example 7qualified-name used for table definition

We are assigning some name to the given table definition.

CREATE TABLE my_schema.T (..)

-- `my_schema.T` is a qualified name.

Identifier Match Semantics

This section covers how we compare identifiers to names, see scoping and paths for resolving naming conflicts.

Recall that identifiers reference named entities such as local variables, tables, views, and functions. To resolve an identifier (a reference), it is matched against the in-scope names e.g. in-scope variables, tables, etc. What does it mean for an identifier to match a name?

Let’s clarify what is meant by identifier and name in this context.

  • Here, an identifier comes from the syntax rule; it is either regular or delimited.

  • Here, a name is a label for some in-scope variable (local, table, etc.) — a name can be thought of as an exact-case string.

An identifier is matched against a name following the two rules

  1. If the identifier is regular, match case-insensitively to the name.

  2. If the identifier is delimited, match case-sensitively to the name.

Note that this is not the complete picture…​

  • Qualified identifiers may reference nested database entities (think catalog.schema.table).

  • The details of the resolution semantics is dependent upon session state which is being covered by the PartiQL-Environment proposal.

  • The PartiQL-Environment proposal was implemented in partiql-lang-kotlin #1496.

Name and Identifier Normalization

PartiQL is able to accommodate partially typed, semi-structured data while maintaining compatibility with SQL systems through its normalization modes. Normalization is the process by which names and identifiers are re-written to new names and identifiers — this normalization is a syntactic rewrite and does NOT change identifier match semantics describe above. Notice that the identifier match semantics are describe independently of normalization.

PartiQL defines four casing modes which are described by their case-normalization function (CNF).

  • Let CNF denote a piecewise-defined function on names and identifiers.

  • Let regular(body) denoted a regular-identifier with the given body text.

  • Let delimited(body) denote a delimited-identifier with the given body text.

  • Let name(id) denote a name with identifier id.

Note that delimited identifiers are NEVER normalized; that is CNF(delimited) = delimited in all modes.

Summary

Reduced Example Matrix
MODE IDENTIFIERS NAMES

CNF

aBc

xYz

DEFAULT

aBc

"xYz"

UPPER

"ABC"

“XYZ”

LOWER

“abc”

“xyz”

EXACT

“aBc”

“xYz”

DEFAULT is the ONLY case in which identifiers in references (rvalues) are handled differently than identifiers in names (lvalues) — this is required for backwards compatibility.

Mode 1 — CNF Default

This mode is backwards compatible and is consistent with the existing PartiQL Specification.

Summary
  • Regular identifiers are NOT rewritten.

  • Names are rewritten to case-preserved delimited identifiers.

Definition
CNF(id) → id
CNF(regular(body)) → regular(body)
CNF(delimited(body)) → delimited(body)

CNF(name) → ...
CNF(name(regular(body))) → delimited(body)
CNF(name(delimited(body))) → delimited(body)
Example
-- input

SELECT aBc AS xYz FROM My_Table

-- output

SELECT aBc AS "xYz" FROM My_Table

Mode 2 — CNF Upper

This mode is compatible with SQL and identifier normalization is consistent across references and names.

Summary
  • Regular identifiers are case-normalized to upper case.

  • Names are case-normalized based upon their identifiers.

  • AKA “case-normalize upper and quote everything for the customer”

  • CNF(name(id)) ↔ CNF(id)

Definition
CNF(id) → ...
CNF(regular(body)) → delimited(upper(body))
CNF(delimited(body)) → delimited(body)

CNF(name(id)) → CNF(id)
CNF(name(regular(body))) → delimited(upper(body))
CNF(name(delimited(body))) → delimited(body)
Note that this function always returns delimited (exact-case) identifiers.
Example
-- input

SELECT aBc AS xYz FROM My_Table

-- output

SELECT "ABC" AS "XYZ" FROM "MY_TABLE"

Mode 3 — CNF Lower

This mode is compatible with Postgres and identifier normalization is consistent across references and names.

Summary
  • Regular identifiers are case-normalized to lower case.

  • Names are case-normalized based upon their identifiers.

  • AKA “case-normalize lower and quote everything for the customer”

  • CNF(name(id)) ↔ CNF(id)

Definition
CNF(id) → ..
CNF(regular(body)) → delimited(lower(body))
CNF(delimited(body)) → delimited(body)

CNF(name(id)) → CNF(id)
CNF(name(regular(body))) → delimited(lower(body))
CNF(name(delimited(body))) → delimited(body)
This function always returns delimited (exact-case) identifiers.
Example
-- input

SELECT aBc AS xYz FROM My_Table

-- output

SELECT "abc" AS "xyz" FROM "my_table"

Mode 4 — CNF Exact

This mode is compatible with document databases (DynamoDB) and other SQL++ implementations Asterix, N1QL — identifier normalization is consistent across references and names.

Summary
  • Regular identifiers are rewritten to case-preserved delimited identifiers.

  • Names are rewritten to case-preserved delimited identifiers.

  • AKA “quote everything for the customer”

  • CNF(name(id)) ↔ CNF(id)

Definition
CNF(id) → id
CNF(regular(body)) → delimited(body)
CNF(delimited(body)) → delimited(body)

CNF(name(id)) → CNF(id)
CNF(name(regular(body))) → delimited(body)
CNF(name(delimited(body))) → delimited(body)
Example
-- input

SELECT aBc AS xYz FROM My_Table

-- output

SELECT "aBc" AS "xYz" FROM "My_Table"
This function always returns delimited (exact-case) identifiers.

Mode Matrices

This section uses tables to summarize the case-normalization modes.

Reduced Matrix

This matrix omits delimited identifiers.

MODE IDENTIFIERS NAMES

CNF

regular(body)

name(regular(body))

DEFAULT

regular(body)

delimited(body)

UPPER

delimited(upper(body))

delimited(upper(body))

LOWER

delimited(lower(body))

delimited(lower(body))

EXACT

delimited(body)

delimited(body)

Reduced Example Matrix

MODE IDENTIFIERS NAMES

CNF

aBc

xYz

DEFAULT

aBc

“xYz”

UPPER

"ABC"

“XYZ”

LOWER

“abc”

“xyz”

EXACT

“aBc”

“xYz”

Complete Matrix

MODE IDENTIFIERS IDENTIFIERS NAMES NAMES

CNF

regular(body)

delimited(body)

name(regular(body))

name(delimited(body))

DEFAULT

regular(body)

delimited(body)

delimited(body)

delimited(body)

UPPER

delimited(upper(body))

delimited(body)

delimited(upper(body))

delimited(body)

LOWER

delimited(lower(body))

delimited(body)

delimited(lower(body))

delimited(body)

EXACT

delimited(body)

delimited(body)

delimited(body)

delimited(body)

Complete Example Matrix

MODE IDENTIFIERS IDENTIFIERS NAMES NAMES

CNF

aBc

"aBc"

xYz

“xYz”

DEFAULT

aBc

“aBc"

xYz

“xYz”

UPPER

"ABC"

“aBc"

“XYZ”

“xYz”

LOWER

“abc”

“aBc"

“xyz”

“xYz”

EXACT

“aBc”

“aBc"

“xYz”

“xYz”