Core Concepts

Before diving deeper into Beamline’s advanced features, it’s essential to understand the fundamental concepts that power its data generation capabilities. This chapter will introduce you to the mathematical and computational foundations that make Beamline both powerful and reliable.

Stochastic Processes

At the heart of Beamline lies the concept of stochastic processes — mathematical models that describe systems appearing to vary randomly over time.

What is a Stochastic Process?

A stochastic process is a collection of random variables indexed by time or space. In simpler terms, it is a way to model how things change randomly over time while still following certain patterns or rules.

Real-world examples:

  • Stock prices over time
  • Sensor readings from IoT devices
  • User activity on a website
  • Network traffic patterns
  • Temperature measurements

Why Stochastic Processes Matter

Traditional random data generators often produce data that looks random but lacks the realistic patterns found in real-world data. Stochastic processes allow Beamline to:

  1. Model Temporal Relationships: Data points aren’t just random — they follow realistic time-based patterns
  2. Create Correlations: Different data elements can be related in meaningful ways
  3. Simulate Real Patterns: Generate data that behaves like real-world systems
  4. Maintain Consistency: Ensure generated data follows logical rules and constraints

Example: Sensor Data

Consider a temperature sensor:

  • Simple Random: Each reading is completely independent
  • Stochastic Process: Readings follow realistic patterns (gradual changes, daily cycles, seasonal trends)
// Simple random (unrealistic)
temperature: UniformF64::{ low: -10.0, high: 40.0 }

// Stochastic process (realistic)
temperature: NormalF64::{ mean: 22.0, std_dev: 5.0 }
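
The difference can be sketched outside Beamline in a few lines of plain Python (illustrative only; the AR(1) update and its parameters are assumptions for the sketch, not Beamline internals):

```python
import random

random.seed(0)

# Simple random: each reading is drawn independently of the last.
independent = [random.uniform(-10.0, 40.0) for _ in range(24)]

# Autocorrelated sketch (AR(1)): each reading drifts from the previous
# one toward a mean of 22 degrees, plus a little noise, so consecutive
# values stay close together the way real sensor readings do.
mean, phi = 22.0, 0.9
readings = [mean]
for _ in range(23):
    prev = readings[-1]
    readings.append(mean + phi * (prev - mean) + random.gauss(0.0, 1.0))

# Largest jump between consecutive autocorrelated readings.
max_jump = max(abs(b - a) for a, b in zip(readings, readings[1:]))
```

With phi close to 1 the series changes gradually, while the independent series can jump across the whole range between any two readings.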

Random Processes in Beamline

Beamline implements stochastic processes through random processes defined in scripts written in the Amazon Ion format.

Anatomy of a Random Process

rand_process::{
    $arrival: HomogeneousPoisson::{ interarrival: minutes::5 },
    $data: {
        // Data structure definition
    }
}

Every random process has two key components:

  1. Arrival Process ($arrival): Defines when new data arrives, as a statistical pattern of arrival times
  2. Data Structure ($data): Defines what data is generated
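
Conceptually, a random process is a loop: sample a gap from the arrival process, advance the simulated clock, then evaluate the data generators. A hypothetical Python sketch of that loop (not Beamline's actual implementation; all names here are illustrative):

```python
import random

random.seed(42)

RATE_PER_MINUTE = 1 / 5.0  # one arrival every 5 minutes on average


def generate(n_events):
    """Sketch of an arrival-then-data loop; purely illustrative."""
    clock = 0.0  # simulated minutes since the start
    events = []
    for _ in range(n_events):
        # $arrival: exponential interarrival gap (Poisson process)
        clock += random.expovariate(RATE_PER_MINUTE)
        # $data: evaluate each generator for this event
        events.append({"timestamp": clock,
                       "temperature": random.gauss(22.0, 5.0)})
    return events


events = generate(100)
```

Each event carries a timestamp from the arrival process and values from the data generators, which is exactly the split the two `$`-components express.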

Arrival Processes

Arrival processes control the timing of data generation. At the moment, Beamline supports only the homogeneous Poisson process:

Homogeneous Poisson Process

The most common arrival process, modeling events that occur at a constant average rate:

$arrival: HomogeneousPoisson::{ interarrival: minutes::5 }

Characteristics:

  • Events occur independently
  • Average rate is constant over time
  • Time between events follows an exponential distribution
  • Models many real-world phenomena (customer arrivals, system events, etc.)
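
The exponential-interarrival property is easy to check numerically; this Python snippet is purely illustrative and independent of Beamline:

```python
import random

random.seed(7)

# With interarrival: minutes::5, the gap between consecutive events is
# exponentially distributed with a mean of 5 minutes.
mean_gap = 5.0
gaps = [random.expovariate(1.0 / mean_gap) for _ in range(10_000)]
sample_mean = sum(gaps) / len(gaps)
```

Over many samples the observed mean converges to the configured 5 minutes, while individual gaps vary widely, which is what makes Poisson arrivals look irregular yet average out to a steady rate.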

Use cases:

  • Web server requests
  • Sensor readings
  • User logins
  • System alerts

Time Units

Beamline supports various time units for arrival processes:

// Different time units
milliseconds::100 // 100 milliseconds
seconds::30       // 30 seconds
minutes::5        // 5 minutes
hours::2          // 2 hours
days::1           // 1 day

Data Generators

Data generators define the structure and content of generated data. They use probability distributions to create realistic values.

Probability Distributions

Beamline supports many probability distributions, each suited for different types of data:

Uniform Distributions

Generate values where each value in a range is equally likely:

// Discrete uniform (integers)
age: UniformU8::{ low: 18, high: 65 }

// Continuous uniform (floats)
temperature: UniformF64::{ low: 20.0, high: 30.0 }

// Uniform choice from literals
status: Uniform::{ choices: ["active", "inactive", "pending"] }

Use cases:

  • IDs, categories, discrete choices
  • Baseline random values
  • Testing edge cases

Normal (Gaussian) Distributions

Generate values that cluster around a mean with a bell-curve distribution:

height: NormalF64::{ mean: 170.0, std_dev: 10.0 }

Characteristics:

  • Most values near the mean
  • Symmetric distribution
  • Models many natural phenomena

Use cases:

  • Physical measurements (height, weight)
  • Performance metrics
  • Error values
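
A quick numerical check of the bell-curve behaviour, using Python's standard library as a stand-in rather than Beamline itself:

```python
import random
import statistics

random.seed(1)

# Mirrors NormalF64::{ mean: 170.0, std_dev: 10.0 }
heights = [random.gauss(170.0, 10.0) for _ in range(10_000)]

sample_mean = statistics.fmean(heights)
sample_stdev = statistics.stdev(heights)

# Fraction of values within one standard deviation of the mean
# (about 68% for a normal distribution).
within_one_sd = sum(160.0 <= h <= 180.0 for h in heights) / len(heights)
```

The sample mean and standard deviation land close to the configured parameters, and roughly two thirds of the values fall within one standard deviation of the mean.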

Other Distributions

// Exponential (for modeling wait times)
response_time: ExpF64::{ rate: 0.1 }

// Log-normal (for modeling sizes, prices)
file_size: LogNormalF64::{ location: 10.0, scale: 1.0 }

// Weibull (for modeling lifetimes, reliability)
device_lifetime: WeibullF64::{ shape: 2.0, scale: 1000.0 }
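
Python's random module ships counterparts of these distributions, which can help when sanity-checking parameter choices (note that the parameter names and their order differ from Beamline's):

```python
import random

random.seed(3)

# Exponential: expovariate takes the rate (lambda).
response_time = random.expovariate(0.1)            # mean wait = 1 / 0.1 = 10

# Log-normal: lognormvariate takes mu (location) and sigma (scale).
file_size = random.lognormvariate(10.0, 1.0)

# Weibull: weibullvariate takes alpha (scale) first, then beta (shape).
device_lifetime = random.weibullvariate(1000.0, 2.0)
```

All three distributions produce strictly positive values, which is why they suit wait times, sizes, and lifetimes.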

Data Types

Beamline supports the following data types:

Scalar Types

// Numbers
integer_val: UniformI32::{ low: 1, high: 1000 }
float_val: UniformF64::{ low: 0.0, high: 1.0 }
decimal_val: UniformDecimal::{ low: 1.99, high: 999.99 }

// Text
name: LoremIpsumTitle
description: LoremIpsum::{ min_words: 10, max_words: 50 }
pattern_text: Regex::{ pattern: "[A-Z]{2}[0-9]{4}" }

// Boolean
active: Bool::{ p: 0.8 }  // 80% chance of true

// Temporal
created_at: Instant
birth_date: Date

// Identifiers
user_id: UUID

Complex Types

// Structures
user: {
    id: UUID,
    name: LoremIpsumTitle,
    age: UniformU8::{ low: 18, high: 65 },
    preferences: {
        theme: Uniform::{ choices: ["light", "dark"] },
        notifications: Bool::{ p: 0.7 }
    }
}

// Arrays
tags: UniformArray::{ 
    min_size: 1, 
    max_size: 5, 
    element_type: LoremIpsumTitle 
}

// Union types
value: UniformAnyOf::{ types: [
    UniformI32::{ low: 1, high: 100 },
    LoremIpsumTitle,
    Bool
]}

Variables and References

Beamline supports variables for creating relationships and reusing values:

Variable Definition

rand_processes::{
    $n: UniformU8::{ low: 2, high: 10 },

    sensors: $n::[
        rand_process::{
            $r: Uniform::{ choices: [5,10] },
            $arrival: HomogeneousPoisson::{ interarrival: minutes::$r },
            $weight: UniformDecimal::{ nullable: 0.75, low: 1.995, high: 4.9999, optional: true },
            $anyof: UniformAnyOf::{ types: [Tick, UniformF64, UUID, UniformDecimal::{ low: 1.995, high: 4.9999, nullable: false }] },
            $array: UniformArray::{
                min_size: 3,
                max_size: 3,
                element_type: UniformDecimal::{ low: 0.5, high: 1.5 }
            },
            $data: {
                tick: Tick,
                i8: UniformI8,
                f: UniformF64,
                w: $weight,
                d: UniformDecimal::{ low: 0d0, high: 4.2d1, nullable: false },
                a: $anyof,
                ar1: $array,
                ar2: UniformArray::{ min_size: 2, max_size: 4, element_type: UUID },
                ar3: UniformArray::{ min_size: 2, max_size: 4, element_type: $weight },
                ar4: UniformArray::{ min_size: 2, max_size: 4, element_type: UniformI8::{ low: 2, high: 10 } },
                ar5: UniformArray::{ min_size: 1, max_size: 1, element_type: $anyof }
            }
        }
    ],
}

Variable Types

Generator Variables

Store data generators for reuse:

$temperature_sensor: NormalF64::{ mean: 22.0, std_dev: 3.0 }
$id_gen: UUID

Value Variables

Store computed values:

$success_rate: UniformF64::{ low: 0.95, high: 1.0 },
$is_successful: Bool::{ p: $success_rate }

Evaluation Control

Control when variables are evaluated:

// Evaluate once at script read time
$user_id: $id_gen::()

// Evaluate each time it's used
$request_id: $id_gen
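
The two evaluation modes behave like calling a function at definition time versus storing the function itself; a Python analogy (not Beamline syntax):

```python
import uuid


def id_gen():
    """Stand-in for an ID generator such as UUID."""
    return str(uuid.uuid4())


# "Evaluate once": call the generator now; the value is fixed afterwards.
user_id = id_gen()

# "Evaluate each use": keep the generator; every use yields a fresh value.
request_id = id_gen

first, second = request_id(), request_id()
```

user_id stays the same wherever it is referenced, while each request_id() call produces a new identifier.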

Datasets and Collections

Beamline organizes generated data into datasets, which represent collections of related data.

Single Dataset

rand_processes::{
    sensors: rand_process::{
        $data: { /* sensor data */ }
    }
}

Multiple Datasets

rand_processes::{
    users: rand_process::{
        $data: { /* user data */ }
    },
    
    orders: rand_process::{
        $data: { /* order data */ }
    }
}

Dynamic Datasets

Create multiple related datasets:

rand_processes::{
    $n: UniformU8::{ low: 3, high: 8 },
    
    // Creates client_1, client_2, ..., client_n datasets
    clients: $n::[
        'client_{ $@n }': rand_process::{
            $data: {
                client_id: '$@n',
                // ... other fields
            }
        }
    ]
}

Reproducibility and Determinism

One of Beamline’s key strengths is its ability to generate reproducible data.

Seeds

Seeds control the random number generation:

# Same seed = same data
beamline gen data --seed 42 --start-auto --script-path my-script.ion
beamline gen data --seed 42 --start-auto --script-path my-script.ion  # Identical output

Timestamps

Control the simulation start time:

# Same timestamp = same temporal patterns
beamline gen data --seed 42 --start-iso "2024-01-01T00:00:00Z" --script-path my-script.ion

Deterministic Behavior

Beamline ensures that:

  • Same inputs always produce same outputs
  • Random sequences are predictable and reproducible
  • Debugging is possible with consistent data
  • Tests can be reliable and repeatable
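
The guarantee can be illustrated with Python's seedable generator (an analogy for --seed, not Beamline's own RNG):

```python
import random


def sample(seed, n=5):
    # An independent, explicitly seeded generator, like running
    # a simulation with a fixed --seed value.
    rng = random.Random(seed)
    return [rng.gauss(22.0, 5.0) for _ in range(n)]


run_a = sample(42)
run_b = sample(42)  # same seed: identical output
run_c = sample(7)   # different seed: different output
```

Seeding makes the "random" sequence a pure function of its inputs, which is what makes failing test cases replayable.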

Static vs. Dynamic Data

Beamline supports both static and dynamic data generation:

Dynamic Data (Default)

Generated during simulation with temporal patterns:

rand_process::{
    $arrival: HomogeneousPoisson::{ interarrival: minutes::5 },
    $data: {
        timestamp: Instant,
        value: UniformF64
    }
}

Static Data

Generated once at the start of the simulation:

static_data::{
    $data: {
        id: UUID,
        created_at: Instant,  // Will be simulation start time
        config: LoremIpsum
    }
}

Use cases for static data:

  • Reference tables
  • Configuration data
  • Lookup tables
  • Master data

Summary

Understanding these core concepts is crucial for effectively using Beamline:

  1. Stochastic Processes: Mathematical foundation for realistic data patterns
  2. Random Processes: Implementation of stochastic processes in Beamline
  3. Arrival Processes: Control timing of data generation
  4. Data Generators: Create realistic values using probability distributions
  5. Variables: Enable relationships and reuse in data generation
  6. Datasets: Organize generated data into meaningful collections
  7. Reproducibility: Ensure consistent, debuggable data generation
  8. Static vs. Dynamic: Choose appropriate data generation patterns

In the next chapter, we’ll dive deeper into scripts and random processes, exploring how to create more sophisticated data generation patterns and relationships.