SELECT Clauses
Core PartiQL SFW queries have a SELECT VALUE
clause (in lieu of
SQL’s SELECT
clause) that can create outputs that are collections of
anything (e.g., collections of tuples, collections of scalars,
collections of arrays, collections of mixed type elements, etc.)
SELECT VALUE describes the SELECT VALUE
clause.
SQL’s well-known SELECT
clause can be used as a mere syntactic sugar
over SELECT VALUE
, when we consider the top-level query. In
particular, SQL SELECT shows that SQL’s SELECT
is the
special case where the SELECT VALUE
produces collections of
tuples. Furthermore, when SELECT
is used as a subquery it is coerced
into a scalar or a tuple, in the ways that SQL coerces the results of
subqueries.
PIVOT can be used instead of SELECT
VALUE
. PIVOT
creates a tuple, with a data dependent number of
attribute/value pairs, where not only the values but the attributes as
well could be originating from the data found in the binding tuples.
SELECT VALUE
The SELECT VALUE
clause inputs a bag of binding tuples or an array
of binding tuples (from the other clauses of the SQL query) and
outputs a bag or an array. For example, if the query only has SELECT
VALUE
, FROM
, and WHERE
clauses, then the bindings that are output
by the WHERE
clause are input by SELECT VALUE
the clause. Unlike
SQL, the output of a SELECT VALUE
clause need not be a bag or array
of tuples. It is a bag or array of any kind of PartiQL values. For
example, it may be a bag of integers, or a bag of arrays, etc. Indeed,
the values may be heterogeneous. For example, the output may even be a
bag that has both integers and arrays.
The core PartiQL clause:
SELECT VALUE e
inputs a bag or an array (depending on the presence or non-presence of
ORDER BY
) of binding tuples and outputs respectively a bag or an
array of values. Let \$p_0\$ and \$p\$ be the environments of
the SFW query. For each input binding tuple \$b in
B_"SELECT"^"in"\$, outputs a value \$v\$, where \$p_0, (p ||
b) |-- e -> v\$. PartiQL expressions \$e\$ will typically be tuple or array or
bag constructors, which enable the construction of respective results. In general \$e\$ can be any expression.
This example illustrates a SELECT VALUE
that creates a collection of numbers.
SELECT VALUE 2*x.a
FROM [{'a':1}, {'a':2}, {'a':3}] as x
The result is
<< 2, 4, 6 >>
Tuple Constructors
A tuple constructor is of the form
{a1:e1, ..., an:en}
whereas \$a_1...a_n, e_1...e_n\$ are expressions, potentially being themselves constructors.
The query:
SELECT VALUE {'a':v.a, 'b':v.b}
FROM [{'a':1, 'b':1}, {'a':2, 'b':2}] AS v
results into
<<{'a':1, 'b':1}, {'a':2, 'b':2}>>
Mistyped Attribute Names
It is possible that an expression \$a_i\$ that computes an attribute name results into a non-string, i.e., a value that is not a legitimate attribute name. In such cases, under the permissive mode the attribute-value pair will be dismissed. Under the type checking mode the query will fail.
In the permissive mode, the query:
SELECT VALUE {v.a: v.b}
FROM [{'a':'legit', 'b':1}, {'a':400, 'b':2}] AS v
results in
<<{'legit':1}, {}>>
Notice that the attempt to create an
attribute named 400
failed, thus leading to a tuple with no attributes.
Duplicate Attribute Names
It is possible that the constructed tuples contain twice or more the same attribute name.
The query:
SELECT VALUE {v.a: v.b, v.c: v.d}
FROM [{'a':'same', 'b':1, 'c':'same', 'd':2}] AS v
results into
<<{'same':1, 'same':2}>>
Recall, a same
path will only pick one of the two values.
Array Constructors
An array constructor has the form:
[e1, ..., en]
where \$e_1...e_n\$ are expressions. Notice that the arrays produced by such constructor will always have size \$n+1\$.
The query:
SELECT VALUE [v.a, v.b]
FROM [{'a':1, 'b':1}, {'a':2, 'b':2}] AS V
results in
<<[1, 1], [2, 2]>>
In the interest of compatibility to SQL, PartiQL also allows array constructors to be denoted with parentheses instead of brackets, when there are at least two elements in the array, i.e., \$(n+1) >= 2\$:
(e1, ..., en)
See Subuquery to Array Coercion for uses of this feature in SQL compatibility.
Bag Constructors
A bag constructor has the form:
<<e1, ..., en>>
where \$e_1..e_n\$ are expressions.
The query:
SELECT VALUE <<v.a, v.b>>
FROM [{'a':1, 'b':1}, {'a':2, 'b':2}] AS v
results into
<< <<1, 1>>, <<2, 2>> >>
MISSING in Constructors
MISSING
may behave differently from NULL
and differently from scalars. The following
itemizes the behavior of MISSING
in a number of cases:
Constructing Tuples
Whenever during tuple construction an attribute value evaluates to
MISSING
, then the particular attribute/value is omitted from the
constructed tuple.
The query
SELECT VALUE {'a':v.a, 'b':v.b}
FROM [{'a':1, 'b':1}, {'a':2}]
results into
<<{'a':1, 'b':1}, {'a':2}>>
Constructing Arrays
Whenever an array element evaluates to MISSING
, the resulting array
will contain a MISSING
.
The query
SELECT VALUE [v.a, v.b]
FROM [{'a':1, 'b':1}, {'a':2}]
results in
<<[1, 1], [2, MISSING]>>
Upon output serialization the will convert to the symbol that the serialization has chosen for serializing .
Constructing Bags
Whenever an element of a bag evaluates to MISSING
, the resulting bag
will contain a corresponding MISSING
.
The query
SELECT VALUE v.b
FROM [{'a':1, 'b':1}, {'a':2}]
results in
<<1, MISSING>>
because {'a':2}.b
evaluated to MISSING
.
The query
SELECT VALUE <<v.a, v.b>>
FROM [{'a':1, 'b':1}, {'a':2}]
results in
<< <<1, 1>>, <<2, MISSING>> >>
SQL SELECT
SQL SELECT can be thought of a syntactic sugar for SELECT VALUE.
SELECT X, Y, Z
The SQL syntax:
SELECT e1 AS a1, ..., en AS an
is syntactic sugar for:
SELECT VALUE {'a1':e1, ...., 'an':en}
whereas if the attribute name \$a_i\$ is written as an identifier
(e.g., a
or "a"
) it is replaced by a single-quoted form
\$a_1'\$ (e.g., 'a'
).
When the expression \$e_i\$ is of the form \$e_i`.n\$ (i.e. a path that navigates into tuple attribute \$n\$), PartiQL follows SQL in allowing the attribute name to be optional. In this case,
SELECT ...ei.n...
is equivalent to
SELECT ...ei.n AS n...
In the case that the expression \$e_i\$ is not of the form \$e_i`.n\$ the clause:
SELECT ...ei...
is equivalent to
SELECT ...e_i AS a_i...
where \$a_i\$ is a system-generated name. SQL and PartiQL do not provide a standard convention.
SELECT *
Consider a query whose FROM
defines a variable x
that has no
schema and the SELECT
clause includes at least one x.*
. Let us
first consider the simpler case where the SELECT
clause is a single
item x
. Then the clause
SELECT x.*
reduces to
SELECT VALUE CASE WHEN NOT x IS TUPLE THEN {'_1': x} ELSE x END
Notice that PartiQL extends the .*
to also operate on x
bindings
that are not tuples. These are converted to singleton tuples with a
synthetic name.
The query
SELECT x.*
FROM [{'a':1, 'b':1}, {'a':2}, 'foo'] AS x
results into
<< {'a':1, 'b':1}, {'a':2}, {'_1':'foo'} >>
Notice that the input has a non-tuple that was converted to a tuple
with a synthetic attribute name _1
, this is because the result of a
traditional SELECT
is always a container of tuples.
We generalize the semantics of a SELECT
list, where at least one of
the items is a .*
item, we use the function TUPLEUNION
. When all
of \$t_1, t_2,...,t_n\$ are tuples \$"TUPLEUNION"(t_1,
t_2,...t_n)\$ outputs a tuple \$t\$ such that for each attribute
name/value pair \$n:v\$ of any \$t_i\$, the tuple \$t\$ has a
respective \$n:v\$. Notice the possibility that the output \$t\$
has duplicate attribute names because either (i) two different inputs
\$t_i\$ and \$t_j\$ had the same attribute name, or (ii) because
an input \$t_i\$ already had a duplicate attribute name.
Using TUPLEUNION
, we rewrite the SELECT
clause as illustrated by
the following example, which has two .*
items and one conventional
item. The generalization to more items, of either kind should be
obvious. Notice that if \$v_1\$ (resp. \$v_3\$) is bound to a
non-tuple value \$v\$, then it is treated as if it were the tuple
\${"'_1'":v_1}\$ (resp. \${"'_2'":v_3}\$.
SELECT v1.*, e2 AS a, v3.*
is equivalent to
SELECT VALUE TUPLEUNION(
CASE WHEN v1 IS TUPLE THEN v1 ELSE {'_1': v1} END,
{'a':e2 },
CASE WHEN v3 IS TUPLE THEN v3 ELSE {'_2': v3} END
)
Notice that the attribute names '_1'
, '_2'
have been invented.
PIVOT
The PIVOT
clause may appear in lieu of SELECT VALUE
. The PIVOT
clause outputs a tuple; in
contrast, a SELECT VALUE
outputs a collection (bag or array). The syntax is
PIVOT e_v AT e_a
...
where the other clauses, …
, are the usual FROM
,
WHERE
, etc. The semantics are similar to SELECT VALUE
. Let
\$p_0\$ and \$p\$ be the environments of the SFW query. For each
input binding tuple \$b in B_"PIVOT"^"in"\$ PIVOT
, outputs an
attribute name/value pair \$a,v\$, where the name \$a\$ is the
result of \$e_a\$ and the value \$v\$ is the result of
\$e_v\$. (Technically, \$p_0, (p || b) |-- e_a |-> a\$ and
\$p_0, (p || b) |-- e_v |-> v\$.) Regardless of whether
\$B_"PIVOT"^"in"\$ is a bag (i.e., the SFW query did not have an )
or an array (i.e., the SFW query had an ORDER BY
), the output tuple
is unordered. Schema may be applied extantly to obtain an ordered
tuple.
The query:
PIVOT t.price AT t.symbol
FROM [{'symbol':'tdc', 'price': 31.52}, {'symbol': 'amzn', 'price': 840.05}] AS t
results in the tuple
{'tdc':31.52, 'amzn':840.05}
The treatment of MISSING
is same to the treatment of MISSING
by
SELECT VALUE
(Tuple Constructors). Namely, whenever an
attribute name or attribute value evaluates to MISSING
, the
corresponding attribute name/value pair will not appear in the tuple.
The query
PIVOT t.price AT t.symbol
FROM [{'symbol':25, 'price':31.52}, {'symbol':'amzn', 'price':840.05}] AS t
results in the tuple
{'amzn': 840.05}
since 25
is not a legitimate attribute name.
Examples
A SFW subquery may appear in the clause of a query, enabling the creation of nested results.
Consider the database
sensors : [
{'sensor':1},
{'sensor':2}
]
logs: [
{'sensor':1, 'co':0.4},
{'sensor':1, 'co':0.2},
{'sensor':2, 'co':0.3}
]
The query
SELECT VALUE {
'sensor': s.sensor,
'readings': (SELECT VALUE l.co FROM logs AS l WHERE l.sensor = s.sensor)
}
FROM sensors AS s
results into
<<
{'sensor':1, 'readings':<<0.4, 0.2>>},
{'sensor':2, 'readings':<<0.3>>}
>>
Notice that each tuple of the result has a nested array, which has been
created by the inner SELECT VALUE
.
The query could also have been written using SELECT
(instead of SELECT VALUE
) for the outer
query, as follows:
SELECT s.sensor AS sensor,
( SELECT VALUE l.co
FROM logs AS l
WHERE l.sensor = s.sensor) AS readings
FROM sensors AS s
Furthermore, the AS sensor
could be ommitted (as in SQL).
This example shows how the combined action of UNPIVOT
and PIVOT
enables to analyze the
attribute names. Consider the following database that has a sequence of
measurements of various gases.
sensors : [
{'no2':0.6, 'co':0.7, 'co2':0.5},
{'no2':0.5, 'co':0.4, 'co2':1.3}
]
The following query keeps only the carbon oxides. [1]
SELECT VALUE (PIVOT v AT g
FROM UNPIVOT r AS v AT g
WHERE g LIKE 'co%')
FROM sensors AS r
The result is
[
{'co':0.7, 'co2':0.5},
{'co':0.4, 'co2':1.3}
]
Intuitively, the UNPIVOT
turns every instance of the tuple t
into
a collection. The WHERE
filters the collections. The PIVOT
pivots
the filtered collections back into tuples.