Using Datum
If you find inaccuracies in this documentation, please submit a GitHub issue.
Who is this for?
This usage guide is aimed at developers who want to know how to use the Datum API. If you are reading the output of a PartiQL statement, implementing your own tables (see Implementing a Table), or writing your own scalar and aggregate functions (see Implementing Scalar Functions), you’ll want to learn how to use Datum.
Prerequisites
To get the APIs discussed in this usage guide, please take a dependency on the SPI package.
dependencies {
    implementation("org.partiql:partiql-spi:1.+")
}
What is Datum?
Datum is a representation of a runtime value.
Type References (Gradual Typing)
PartiQL is a gradually typed query language: operations may be resolved either statically or dynamically. To guarantee that dynamically resolved operations behave the same as their statically resolved counterparts, runtime values must hold a reference to their associated types.
Lazy Materialization (Large, Semi-Structured Data)
PartiQL treats semi-structured data, such as collections and tuples, as first-class types.
As such, the output of a scalar expression may not fit into memory, so a Datum may be lazily materialized.
Consumers of the Datum API should account for this to avoid inaccurate or excessive computation.
If you execute a PartiQL query, the returned datum’s underlying values will likely be lazily materialized. Because of this, it is important to:
- Invoke Datum’s APIs in an orderly fashion.
- Memoize the data (if memory/storage constraints permit), especially when you intend to reference the value more than once.
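To make the cost of repeated reads concrete, here is a minimal, self-contained Java sketch. LazyNumbers and LazyDemo are hypothetical stand-ins, not partiql-spi classes. They only imitate a value whose elements are recomputed on every pass, and they make no claim about how the PartiQL engine is implemented.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a lazily materialized collection (not part of partiql-spi).
final class LazyNumbers implements Iterable<Integer> {
    private final int size;
    // Counts how many elements have been computed across all passes.
    final AtomicInteger computed = new AtomicInteger();

    LazyNumbers(int size) {
        this.size = size;
    }

    @Override
    public Iterator<Integer> iterator() {
        // Every call restarts the computation from the first element.
        return new Iterator<Integer>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < size;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                computed.incrementAndGet(); // simulates the per-element work
                return next++;
            }
        };
    }
}

final class LazyDemo {
    // Reads the lazy source twice, so every element is computed twice.
    static int sumTwiceWithoutMemo(LazyNumbers source) {
        int total = 0;
        for (int v : source) total += v;
        for (int v : source) total += v;
        return total;
    }

    // Copies the elements into a list once; later passes read the list.
    static int sumTwiceWithMemo(LazyNumbers source) {
        List<Integer> memo = new ArrayList<>();
        for (int v : source) memo.add(v);
        int total = 0;
        for (int v : memo) total += v;
        for (int v : memo) total += v;
        return total;
    }

    public static void main(String[] args) {
        LazyNumbers a = new LazyNumbers(3);
        System.out.println(sumTwiceWithoutMemo(a) + " computed=" + a.computed.get()); // 6 computed=6
        LazyNumbers b = new LazyNumbers(3);
        System.out.println(sumTwiceWithMemo(b) + " computed=" + b.computed.get()); // 6 computed=3
    }
}
```

Both methods return the same sum, but the memoized version computes each element only once. The trade-off is the memory held by the copy.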
Usage Guides
Please see Basic Usage, Orderly Invocation of Datum’s APIs, and Datum Memoization to learn how to properly handle Datum’s APIs.
Basic Usage
Datum has several methods, including getType(), isNull(), isMissing(), and many getters (e.g. getInt(), getBoolean(), etc.).
Unless otherwise indicated (see Usage Caveats), to safely retrieve the data backing a Datum, you must first check the results of getType(), isNull(), and isMissing().
- isNull() and isMissing() indicate whether the data is null or missing, respectively.
- getType() dictates which getter can be invoked. Invoking a getter that is not associated with the corresponding type will produce a runtime exception. To find out which getter is associated with each type, please see the Javadocs. For example, given a PartiQL value of the integer type, you may invoke getInt() (see the getInt() Javadocs). If the value is null or missing, the getter associated with the type will produce a runtime exception.
Below is an example of how you may grab the underlying data of a Datum.
final class DatumExample {
    public String getDataAsString(@NotNull Datum value) {
        // Retrieve the type information
        PType type = value.getType();
        int typeCode = type.code();
        // Check if null
        if (value.isNull()) {
            return "null";
        }
        // Check if missing
        if (value.isMissing()) {
            return "missing";
        }
        // Check the type of the value and call the matching getter.
        // The numeric getters return primitives, so convert them to String.
        switch (typeCode) {
            case PType.INT:
                return String.valueOf(value.getInt());
            case PType.BIGINT:
                return String.valueOf(value.getLong());
            case PType.SMALLINT:
                return String.valueOf(value.getShort());
            case PType.STRING:
                return value.getString();
            default:
                throw new RuntimeException("This code doesn't handle the input type yet: " + type);
        }
    }
}
Orderly Invocation of Datum’s APIs
If the datum at hand is backed by a stateful process (such as a compiled PartiQL query), and the datum’s type is a semi-structured type (bag, array, struct, row), it is important to fully materialize its first element before materializing the second element (and so on, and so forth).
public fun materializeInOrder(d: Datum) {
    val type = d.type
    val typeCode = type.code()
    when (typeCode) {
        PType.ARRAY, PType.BAG -> materializeCollection(d)
        PType.ROW, PType.STRUCT -> materializeStructure(d)
        else -> materializeScalar(d)
    }
}

public fun materializeCollection(d: Datum) {
    // Fully materialize each element before moving on to the next one
    d.forEach { element ->
        materializeInOrder(element)
    }
}

public fun materializeStructure(d: Datum) {
    // Fully materialize each field's value before moving on to the next field
    d.fields.forEach { field ->
        materializeInOrder(field.value)
    }
}

public fun materializeScalar(d: Datum) {
    // Check if null/missing
    if (d.isNull() || d.isMissing()) {
        return
    }
    // Materialize the scalar values by reading them from the datum (not from its type)
    val type = d.type
    val typeCode = type.code()
    when (typeCode) {
        PType.INT -> d.int
        PType.SMALLINT -> d.short
        // the rest of the scalar types
        else -> error("This doesn't support values of type: $type")
    }
}
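The following self-contained Java sketch illustrates one way that out-of-order access can go wrong when values are backed by a stateful process. SharedCursorRows and OrderDemo are hypothetical stand-ins, not partiql-spi classes. The sketch assumes, purely for illustration, that every row reads its fields from a single shared, forward-only cursor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a stateful source (not part of partiql-spi):
// every row reads its fields from one shared, forward-only cursor.
final class SharedCursorRows {
    private final Iterator<String> cursor;
    private final int fieldsPerRow;

    SharedCursorRows(List<String> cells, int fieldsPerRow) {
        this.cursor = cells.iterator();
        this.fieldsPerRow = fieldsPerRow;
    }

    // A row does no work until readFields() is called.
    final class Row {
        List<String> readFields() {
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < fieldsPerRow && cursor.hasNext(); i++) {
                fields.add(cursor.next());
            }
            return fields;
        }
    }

    Row nextRow() {
        return new Row();
    }
}

final class OrderDemo {
    static final List<String> CELLS = Arrays.asList("a1", "a2", "b1", "b2");

    // Fully materializes each row before asking for the next one.
    static List<List<String>> inOrder() {
        SharedCursorRows rows = new SharedCursorRows(CELLS, 2);
        List<List<String>> out = new ArrayList<>();
        out.add(rows.nextRow().readFields());
        out.add(rows.nextRow().readFields());
        return out;
    }

    // Obtains both rows first, then reads the second row before the first.
    static List<List<String>> outOfOrder() {
        SharedCursorRows rows = new SharedCursorRows(CELLS, 2);
        SharedCursorRows.Row first = rows.nextRow();
        SharedCursorRows.Row second = rows.nextRow();
        List<String> secondFields = second.readFields(); // consumes a1, a2
        List<String> firstFields = first.readFields();   // consumes b1, b2
        List<List<String>> out = new ArrayList<>();
        out.add(firstFields);
        out.add(secondFields);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(inOrder());    // [[a1, a2], [b1, b2]]
        System.out.println(outOfOrder()); // [[b1, b2], [a1, a2]]
    }
}
```

In this stand-in, reading the second row first silently hands it the first row's cells. Materializing each element completely before requesting the next one avoids this class of problem.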
Datum Memoization
If you intend to iterate over a Datum multiple times or access its elements out of order, it is strongly recommended to memoize the data into structures that your application controls.
Note that the structures you choose may be in-memory, on disk, or over the wire.
We will modify the above code snippet to instead return a Datum, utilizing in-memory data structures.
public fun materializeInOrder(d: Datum): Datum {
    val type = d.type
    val typeCode = type.code()
    return when (typeCode) {
        PType.BAG -> materializeBag(d)
        PType.ARRAY -> materializeArray(d) // Not implemented in this file, for brevity
        PType.STRUCT -> materializeStruct(d)
        PType.ROW -> materializeRow(d) // Not implemented in this file, for brevity
        else -> materializeScalar(d)
    }
}

// Back the bag's elements with a List
public fun materializeBag(d: Datum): Datum {
    val elements: List<Datum> = d.map { element ->
        materializeInOrder(element)
    }
    return Datum.bag(elements)
}

// Back the struct's fields with a List.
// Assumes the value-level Field type (name plus Datum value); check the Javadocs.
public fun materializeStruct(d: Datum): Datum {
    val fields: List<Field> = d.fields.asSequence().map { field ->
        val newValue: Datum = materializeInOrder(field.value)
        Field.of(field.name, newValue)
    }.toList()
    return Datum.struct(fields)
}

public fun materializeScalar(d: Datum): Datum {
    // Check if null ("null" is a reserved word, so the factory is assumed to be nullValue)
    if (d.isNull()) {
        return Datum.nullValue(d.type)
    }
    // Check if missing
    if (d.isMissing()) {
        return Datum.missing(d.type)
    }
    // Memoize the scalar values by reading them from the datum (not from its type)
    val type = d.type
    val typeCode = type.code()
    return when (typeCode) {
        PType.INT -> Datum.integer(d.int)
        PType.SMALLINT -> Datum.smallint(d.short)
        // the rest of the scalar types
        else -> error("This doesn't support values of type: $type")
    }
}
Now, with the Datum returned by materializeInOrder(), you can ensure that the data is backed by your own in-memory data structures and not by the PartiQL engine.
This avoids re-evaluating PartiQL queries when calling the Datum’s APIs.
Usage Caveats
There are times when invoking isNull(), isMissing(), and getType() is redundant, specifically:
- When the datum is an argument to a scalar function whose signature specifies the parameter types and already handles null and missing values.
In this case, the compiler has provided enough compile-time and runtime checks to ensure that the implementations of the scalar functions will never receive an unexpected value. Therefore, invoking any of the aforementioned APIs will lead to redundant computation.
Please see the Implementing Scalar Functions usage guide for more information.