SELECT Clauses

Core PartiQL SFW queries have a SELECT VALUE clause (in lieu of SQL’s SELECT clause) that can create outputs that are collections of anything (e.g., collections of tuples, collections of scalars, collections of arrays, collections of mixed type elements, etc.) SELECT VALUE describes the SELECT VALUE clause.

SQL’s well-known SELECT clause can be used as a mere syntactic sugar over SELECT VALUE, when we consider the top-level query. In particular, SQL SELECT shows that SQL’s SELECT is the special case where the SELECT VALUE produces collections of tuples. Furthermore, when SELECT is used as a subquery it is coerced into a scalar or a tuple, in the ways that SQL coerces the results of subqueries.

PIVOT can be used instead of SELECT VALUE. PIVOT creates a tuple, with a data dependent number of attribute/value pairs, where not only the values but the attributes as well could be originating from the data found in the binding tuples.

SELECT VALUE

The SELECT VALUE clause inputs a bag of binding tuples or an array of binding tuples (from the other clauses of the SQL query) and outputs a bag or an array. For example, if the query only has SELECT VALUE, FROM, and WHERE clauses, then the bindings that are output by the WHERE clause are input by SELECT VALUE the clause. Unlike SQL, the output of a SELECT VALUE clause need not be a bag or array of tuples. It is a bag or array of any kind of PartiQL values. For example, it may be a bag of integers, or a bag of arrays, etc. Indeed, the values may be heterogeneous. For example, the output may even be a bag that has both integers and arrays.

The core PartiQL clause:

SELECT VALUE e

inputs a bag or an array (depending on the presence or non-presence of ORDER BY) of binding tuples and outputs respectively a bag or an array of values. Let \$p_0\$ and \$p\$ be the environments of the SFW query. For each input binding tuple \$b in B_"SELECT"^"in"\$, outputs a value \$v\$, where \$p_0, (p || b) |-- e -> v\$. PartiQL expressions \$e\$ will typically be tuple or array or bag constructors, which enable the construction of respective results. In general \$e\$ can be any expression.

Example 1.

This example illustrates a SELECT VALUE that creates a collection of numbers.

SELECT VALUE 2*x.a
FROM [{'a':1}, {'a':2}, {'a':3}] as x

The result is

<< 2, 4, 6 >>

Tuple Constructors

A tuple constructor is of the form

{a1:e1, ..., an:en}

whereas \$a_1...a_n, e_1...e_n\$ are expressions, potentially being themselves constructors.

Example 2.

The query:

SELECT VALUE {'a':v.a, 'b':v.b}
FROM [{'a':1, 'b':1}, {'a':2, 'b':2}] AS v

results into

<<{'a':1, 'b':1}, {'a':2, 'b':2}>>

Mistyped Attribute Names

It is possible that an expression \$a_i\$ that computes an attribute name results into a non-string, i.e., a value that is not a legitimate attribute name. In such cases, under the permissive mode the attribute-value pair will be dismissed. Under the type checking mode the query will fail.

Example 3.

In the permissive mode, the query:

SELECT VALUE {v.a: v.b}
FROM [{'a':'legit', 'b':1}, {'a':400, 'b':2}] AS v

results in

<<{'legit':1}, {}>>

Notice that the attempt to create an attribute named 400 failed, thus leading to a tuple with no attributes.

Duplicate Attribute Names

It is possible that the constructed tuples contain twice or more the same attribute name.

Example 4.

The query:

SELECT VALUE {v.a: v.b,  v.c: v.d}
FROM [{'a':'same', 'b':1, 'c':'same', 'd':2}] AS v

results into

<<{'same':1, 'same':2}>>

Recall, a same path will only pick one of the two values.

Array Constructors

An array constructor has the form:

[e1, ..., en]

where \$e_1...e_n\$ are expressions. Notice that the arrays produced by such constructor will always have size \$n+1\$.

Example 5.

The query:

SELECT VALUE [v.a, v.b]
FROM [{'a':1, 'b':1}, {'a':2, 'b':2}] AS V

results in

<<[1, 1], [2, 2]>>

In the interest of compatibility to SQL, PartiQL also allows array constructors to be denoted with parentheses instead of brackets, when there are at least two elements in the array, i.e., \$(n+1) >= 2\$:

(e1, ..., en)

See Subuquery to Array Coercion for uses of this feature in SQL compatibility.

Bag Constructors

A bag constructor has the form:

<<e1, ..., en>>

where \$e_1..e_n\$ are expressions.

Example 6.

The query:

SELECT VALUE <<v.a, v.b>>
FROM [{'a':1, 'b':1}, {'a':2, 'b':2}] AS v

results into

<< <<1, 1>>, <<2, 2>> >>

MISSING in Constructors

MISSING may behave differently from NULL and differently from scalars. The following itemizes the behavior of MISSING in a number of cases:

Constructing Tuples

Whenever during tuple construction an attribute value evaluates to MISSING, then the particular attribute/value is omitted from the constructed tuple.

Example 7.

The query

SELECT VALUE {'a':v.a, 'b':v.b}
FROM [{'a':1, 'b':1}, {'a':2}]

results into

<<{'a':1, 'b':1}, {'a':2}>>

Constructing Arrays

Whenever an array element evaluates to MISSING, the resulting array will contain a MISSING.

Example 8.

The query

SELECT VALUE [v.a, v.b]
FROM [{'a':1, 'b':1}, {'a':2}]

results in

<<[1, 1], [2, MISSING]>>

Upon output serialization the will convert to the symbol that the serialization has chosen for serializing .

Constructing Bags

Whenever an element of a bag evaluates to MISSING, the resulting bag will contain a corresponding MISSING.

Example 9.

The query

SELECT VALUE v.b
FROM [{'a':1, 'b':1}, {'a':2}]

results in

<<1, MISSING>>

because {'a':2}.b evaluated to MISSING.

Example 10.

The query

SELECT VALUE <<v.a, v.b>>
FROM [{'a':1, 'b':1}, {'a':2}]

results in

<< <<1, 1>>, <<2, MISSING>> >>

SQL SELECT

SQL SELECT can be thought of a syntactic sugar for SELECT VALUE.

SELECT X, Y, Z

The SQL syntax:

SELECT e1 AS a1, ..., en AS an

is syntactic sugar for:

SELECT VALUE {'a1':e1, ...., 'an':en}

whereas if the attribute name \$a_i\$ is written as an identifier (e.g., a or "a") it is replaced by a single-quoted form \$a_1'\$ (e.g., 'a').

When the expression \$e_i\$ is of the form \$e_i`.n\$ (i.e. a path that navigates into tuple attribute \$n\$), PartiQL follows SQL in allowing the attribute name to be optional. In this case,

SELECT ...ei.n...

is equivalent to

SELECT ...ei.n AS n...

In the case that the expression \$e_i\$ is not of the form \$e_i`.n\$ the clause:

SELECT ...ei...

is equivalent to

SELECT ...e_i AS a_i...

where \$a_i\$ is a system-generated name. SQL and PartiQL do not provide a standard convention.

SELECT *

Consider a query whose FROM defines a variable x that has no schema and the SELECT clause includes at least one x.*. Let us first consider the simpler case where the SELECT clause is a single item x. Then the clause

SELECT x.*

reduces to

SELECT VALUE CASE WHEN NOT x IS TUPLE THEN {'_1': x} ELSE x END

Notice that PartiQL extends the .* to also operate on x bindings that are not tuples. These are converted to singleton tuples with a synthetic name.

Example 11.

The query

SELECT x.*
FROM [{'a':1, 'b':1}, {'a':2}, 'foo'] AS x

results into

<< {'a':1, 'b':1}, {'a':2}, {'_1':'foo'} >>

Notice that the input has a non-tuple that was converted to a tuple with a synthetic attribute name _1, this is because the result of a traditional SELECT is always a container of tuples.

We generalize the semantics of a SELECT list, where at least one of the items is a .* item, we use the function TUPLEUNION. When all of \$t_1, t_2,...,t_n\$ are tuples \$"TUPLEUNION"(t_1, t_2,...t_n)\$ outputs a tuple \$t\$ such that for each attribute name/value pair \$n:v\$ of any \$t_i\$, the tuple \$t\$ has a respective \$n:v\$. Notice the possibility that the output \$t\$ has duplicate attribute names because either (i) two different inputs \$t_i\$ and \$t_j\$ had the same attribute name, or (ii) because an input \$t_i\$ already had a duplicate attribute name.

Using TUPLEUNION, we rewrite the SELECT clause as illustrated by the following example, which has two .* items and one conventional item. The generalization to more items, of either kind should be obvious. Notice that if \$v_1\$ (resp. \$v_3\$) is bound to a non-tuple value \$v\$, then it is treated as if it were the tuple \${"'_1'":v_1}\$ (resp. \${"'_2'":v_3}\$.

SELECT v1.*, e2 AS a, v3.*

is equivalent to

SELECT VALUE TUPLEUNION(
       CASE WHEN v1 IS TUPLE THEN v1 ELSE {'_1': v1} END,
       {'a':e2 },
       CASE WHEN v3 IS TUPLE THEN v3 ELSE {'_2': v3} END
)

Notice that the attribute names '_1', '_2' have been invented.

PIVOT

The PIVOT clause may appear in lieu of SELECT VALUE. The PIVOT clause outputs a tuple; in contrast, a SELECT VALUE outputs a collection (bag or array). The syntax is

PIVOT e_v AT e_a
...

where the other clauses, …, are the usual FROM, WHERE, etc. The semantics are similar to SELECT VALUE. Let \$p_0\$ and \$p\$ be the environments of the SFW query. For each input binding tuple \$b in B_"PIVOT"^"in"\$ PIVOT, outputs an attribute name/value pair \$a,v\$, where the name \$a\$ is the result of \$e_a\$ and the value \$v\$ is the result of \$e_v\$. (Technically, \$p_0, (p || b) |-- e_a |-> a\$ and \$p_0, (p || b) |-- e_v |-> v\$.) Regardless of whether \$B_"PIVOT"^"in"\$ is a bag (i.e., the SFW query did not have an ) or an array (i.e., the SFW query had an ORDER BY), the output tuple is unordered. Schema may be applied extantly to obtain an ordered tuple.

Example 12.

The query:

PIVOT t.price AT t.symbol
FROM [{'symbol':'tdc', 'price': 31.52}, {'symbol': 'amzn', 'price': 840.05}] AS t

results in the tuple

{'tdc':31.52, 'amzn':840.05}

The treatment of MISSING is same to the treatment of MISSING by SELECT VALUE (Tuple Constructors). Namely, whenever an attribute name or attribute value evaluates to MISSING, the corresponding attribute name/value pair will not appear in the tuple.

Example 13.

The query

PIVOT t.price AT t.symbol
FROM [{'symbol':25, 'price':31.52}, {'symbol':'amzn', 'price':840.05}] AS t

results in the tuple

{'amzn': 840.05}

since 25 is not a legitimate attribute name.

Examples

Example 14.

A SFW subquery may appear in the clause of a query, enabling the creation of nested results.

Consider the database

sensors :  [
  {'sensor':1},
  {'sensor':2}
]
logs:  [
  {'sensor':1, 'co':0.4},
  {'sensor':1, 'co':0.2},
  {'sensor':2, 'co':0.3}
]

The query

SELECT VALUE {
  'sensor':  s.sensor,
  'readings': (SELECT VALUE l.co FROM logs AS l WHERE l.sensor = s.sensor)
}
FROM sensors AS s

results into

<<
  {'sensor':1, 'readings':<<0.4, 0.2>>},
  {'sensor':2, 'readings':<<0.3>>}
>>

Notice that each tuple of the result has a nested array, which has been created by the inner SELECT VALUE.

The query could also have been written using SELECT (instead of SELECT VALUE) for the outer query, as follows:

SELECT s.sensor AS sensor,
       ( SELECT VALUE l.co
         FROM logs AS l
         WHERE l.sensor = s.sensor) AS readings
FROM sensors AS s

Furthermore, the AS sensor could be ommitted (as in SQL).

Example 15.

This example shows how the combined action of UNPIVOT and PIVOT enables to analyze the attribute names. Consider the following database that has a sequence of measurements of various gases.

sensors : [
  {'no2':0.6, 'co':0.7, 'co2':0.5},
  {'no2':0.5, 'co':0.4, 'co2':1.3}
]

The following query keeps only the carbon oxides. ^[1]

SELECT VALUE (PIVOT v AT g
              FROM UNPIVOT r AS v AT g
              WHERE g LIKE 'co%')
FROM sensors AS r

The result is

[
  {'co':0.7, 'co2':0.5},
  {'co':0.4, 'co2':1.3}
]

Intuitively, the UNPIVOT turns every instance of the tuple t into a collection. The WHERE filters the collections. The PIVOT pivots the filtered collections back into tuples.

1. The query author is pretty weak in chemistry and cannot enumerate the carbon oxides explicitly in their query.