Database Overview

Beamline can generate data and schemas locally. This lets you create a copy of the generated data, together with its inferred schemas, in a local catalog directory.

What is BeamlineLite?

BeamlineLite is Beamline’s local database generation capability that creates filesystem-based databases containing:

  • Generated data in Ion format
  • Inferred schemas in both Ion and SQL DDL formats
  • Metadata about generation parameters
  • Original scripts for reproducibility

Database vs Data Generation

Data Generation (gen data)

beamline gen data \
  --seed 42 \
  --start-auto \
  --script-path sensors.ion \
  --sample-count 1000 \
  --output-format ion-pretty

Output: Stream of data records to stdout or file

Use cases: Data processing pipelines, API testing, analysis

Database Generation (gen db)

beamline gen db beamline-lite \
  --seed 42 \
  --start-auto \
  --script-path sensors.ion \
  --sample-count 1000

Output: Complete database directory with data + schemas

Use cases: Local development databases, testing environments, demos

BeamlineLite Database Structure

Catalog Directory Layout

Generating a BeamlineLite database creates a catalog directory with this structure:

beamline-catalog/
├── .beamline-manifest          # Generation metadata (JSON)
├── .beamline-script           # Original Ion script
├── <dataset>.ion              # Data files (one per dataset)
├── <dataset>.shape.ion        # Ion format schemas
└── <dataset>.shape.sql        # SQL DDL schemas
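
The naming convention above is enough to discover the datasets in a catalog without opening any file. The following is a minimal Python sketch under that assumption. The function name `list_datasets` and the keys `data`, `ion_schema`, and `sql_schema` are hypothetical and not part of Beamline.

```python
from pathlib import Path


def list_datasets(catalog_dir):
    """Map each dataset name to the catalog files that belong to it.

    Assumes the layout shown above: <dataset>.ion, <dataset>.shape.ion,
    and <dataset>.shape.sql. Dot-files (manifest, script) are skipped.
    """
    datasets = {}
    for path in sorted(Path(catalog_dir).iterdir()):
        name = path.name
        if name.startswith(".") or not path.is_file():
            continue
        # Check the longer suffixes first: "x.shape.ion" also ends in ".ion".
        for suffix, kind in ((".shape.ion", "ion_schema"),
                             (".shape.sql", "sql_schema"),
                             (".ion", "data")):
            if name.endswith(suffix):
                dataset = name[: -len(suffix)]
                datasets.setdefault(dataset, {})[kind] = path
                break
    return datasets
```

Calling `list_datasets("beamline-catalog")` on a catalog like the one described here would return one entry per dataset, each pointing at its data file and its two schema files.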

Real Example from client-service.ion

$ beamline gen db beamline-lite \
    --seed-auto \
    --start-auto \
    --script-path client-service.ion \
    --sample-count 1000

writing manifest file ./beamline-catalog/.beamline-manifest ...[COMPLETED]
writing script file ./beamline-catalog/.beamline-script ...[COMPLETED]
writing shape file(s)...[COMPLETED]
writing data file(s)...[COMPLETED]
done!

$ tree beamline-catalog/
beamline-catalog/
├── .beamline-manifest
├── .beamline-script
├── service.ion
├── service.shape.ion
├── service.shape.sql
├── client_0.ion
├── client_0.shape.ion
├── client_0.shape.sql
├── client_1.ion
├── client_1.shape.ion
├── client_1.shape.sql
└── ... (more client datasets)

Database Files Deep Dive

Manifest File (.beamline-manifest)

Contains generation metadata in JSON format:

$ cat beamline-catalog/.beamline-manifest
{"seed": "949665520117506306", "start": "2023-02-06T12:52:29.000000000Z", "ddl_syntax.version": "partiql_datatype_syntax.0.1"}

Contents:

  • seed: Random seed used for generation (for reproducibility)
  • start: Simulation start timestamp
  • ddl_syntax.version: SQL DDL syntax version used in .shape.sql files
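
Because the manifest is plain JSON, it can be read with any JSON parser. The sketch below uses only the Python standard library and the sample manifest shown above; `parse_manifest` and the key `ddl_syntax_version` are hypothetical names. Note two details visible in the sample: the seed is stored as a JSON string, and the timestamp carries nanosecond digits, which Python's `datetime` cannot represent, so this sketch keeps whole seconds only.

```python
import json
from datetime import datetime, timezone


def parse_manifest(text):
    """Parse the manifest JSON shown above into Python values."""
    raw = json.loads(text)
    # The seed is stored as a JSON string; convert it to an int.
    seed = int(raw["seed"])
    # Keep only whole seconds (assumes a trailing "Z", i.e. UTC).
    whole_seconds = raw["start"].rstrip("Z").split(".")[0]
    start = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    start = start.replace(tzinfo=timezone.utc)
    return {
        "seed": seed,
        "start": start,
        "ddl_syntax_version": raw["ddl_syntax.version"],
    }


manifest = parse_manifest(
    '{"seed": "949665520117506306", '
    '"start": "2023-02-06T12:52:29.000000000Z", '
    '"ddl_syntax.version": "partiql_datatype_syntax.0.1"}'
)
print(manifest["seed"], manifest["start"].isoformat())
# prints: 949665520117506306 2023-02-06T12:52:29+00:00
```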

Script File (.beamline-script)

Preserved copy of the original Ion script:

$ cat beamline-catalog/.beamline-script
rand_processes::{
    // generate between 5 & 20 customers
    $n: UniformU8::{ low: 5, high: 20 },
    
    // A generator for client ids
    $id_gen: UUID,
    
    // ... rest of original script
}

Purpose:

  • Reproducibility: Regenerate an identical database later
  • Documentation: Records which script created this database
  • Version control: Track script changes over time

Data Files (<dataset>.ion)

Contains generated data in compact Ion format:

$ cat beamline-catalog/client_0.ion
{success: true, id: "7dbd12cf-b506-22ad-2d81-b0a1cd259697", request_id: "0de35d1e-a87c-e540-734d-6f2a4fa410c3", request_time: 2021-01-05T03:55:01.035000000+00:00}
{success: true, id: "7dbd12cf-b506-22ad-2d81-b0a1cd259697", request_id: "3539cdf0-6f7e-6bdc-c25a-4e0b7d8f8bac", request_time: 2021-01-05T03:55:01.182000000+00:00}

Characteristics:

  • One record per line: Newline-delimited Ion format
  • Complete type information: All Ion types preserved
  • Temporal ordering: Records ordered by generation time
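
The one-record-per-line layout makes quick sanity checks possible without a full Ion parser. The sketch below is a rough illustration, not a substitute for a real Ion library: it assumes every record has a `request_time` field formatted exactly as in the `client_0.ion` sample above, and it assumes that field reflects generation time. The names `request_times` and `is_time_ordered` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches Ion timestamps of the form shown above, e.g.
# 2021-01-05T03:55:01.035000000+00:00 (fraction and numeric offset required).
_TS = re.compile(
    r"request_time:\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{2}:\d{2})"
)


def request_times(lines):
    """Yield one datetime per non-empty line (one record per line)."""
    for line in lines:
        if not line.strip():
            continue
        match = _TS.search(line)
        if match is None:
            raise ValueError(f"no request_time found in: {line!r}")
        base, fraction, offset = match.groups()
        moment = datetime.strptime(base + offset, "%Y-%m-%dT%H:%M:%S%z")
        # datetime holds microseconds only, so truncate the fraction to 6 digits.
        micros = int(fraction[:6].ljust(6, "0"))
        yield moment + timedelta(microseconds=micros)


def is_time_ordered(lines):
    """Return True if request_time never decreases from record to record."""
    times = list(request_times(lines))
    return all(a <= b for a, b in zip(times, times[1:]))
```

For example, `is_time_ordered(open("beamline-catalog/client_0.ion"))` would check a whole data file line by line.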

Schema Files (<dataset>.shape.ion)

Ion format schema definitions:

$ cat beamline-catalog/client_0.shape.ion
{
  type: "bag",
  items: {
    type: "struct",
    constraints: [ordered, closed],
    fields: [
      { name: "id", type: "string" },
      { name: "request_id", type: "string" },
      { name: "request_time", type: "datetime" },
      { name: "success", type: "bool" }
    ]
  }
}

Use cases:

  • PartiQL validation: Validate queries against schema
  • Type checking: Ensure data types match expectations
  • Tool integration: Tools can use schema information

Schema Files (<dataset>.shape.sql)

SQL DDL format schemas:

$ cat beamline-catalog/service.shape.sql
"Account" VARCHAR,
"Distance" DECIMAL(2, 0),
"Operation" VARCHAR,
"Program" VARCHAR,
"Request" VARCHAR,
"StartTime" TIMESTAMP,
"Weight" DECIMAL(5, 4),
"client" VARCHAR,
"success" BOOL

Use cases:

  • Database creation: Create tables in SQL databases
  • Schema documentation: Human-readable schema reference
  • Migration scripts: Database schema evolution
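
As the sample shows, a `.shape.sql` file holds only a column list, so creating a table means wrapping it in a `CREATE TABLE` statement. The following is a minimal sketch with a hypothetical helper, `create_table_statement`. The column types follow the DDL syntax version named in the manifest, so a given SQL engine may need some types adjusted before it accepts the statement.

```python
def create_table_statement(table_name, column_list):
    """Wrap a .shape.sql column list in a CREATE TABLE statement."""
    # Double any embedded quotes so the table name is a valid quoted identifier.
    quoted = '"' + table_name.replace('"', '""') + '"'
    # Drop blank lines and surrounding whitespace; keep each column as written.
    columns = [line.strip() for line in column_list.strip().splitlines()
               if line.strip()]
    body = "\n".join("  " + column for column in columns)
    return f"CREATE TABLE {quoted} (\n{body}\n);"
```

For instance, passing `"service"` and the text of `service.shape.sql` yields a statement that begins with `CREATE TABLE "service" (` and lists the nine columns shown above.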