Scripts and Random Processes
Beamline uses Ion-based scripts to define what data to generate, and stochastic processes to model when that data arrives over simulated time. Together, they control both the content of each record and the timing of its arrival.
Ion Scripts Overview
What are Ion Scripts?
Ion scripts are configuration files written in Amazon Ion format that define:
- What data to generate (data types, structures, values)
- How data arrives (temporal patterns, frequencies)
- How data relates (cross-field dependencies, correlations)
- How much data (counts, durations, stopping conditions)
Basic Script Structure
Every Beamline script follows this structure:
rand_processes::{
// Variable definitions (optional)
$variable_name: GeneratorType::{ configuration },
// Dataset definitions (required)
dataset_name: rand_process::{
$arrival: ArrivalProcess::{ configuration },
$data: {
field_name: GeneratorType::{ configuration },
// ... more fields
}
}
}
Real Example from Test Suite
From the sensors.ion test script:
rand_processes::{
$n: UniformU8::{ low: 2, high: 10 },
sensors: $n::[
rand_process::{
$r: Uniform::{ choices: [5,10] },
$arrival: HomogeneousPoisson:: { interarrival: minutes::$r },
$weight: UniformDecimal::{ nullable: 0.75, low: 1.995, high: 4.9999, optional: true },
$data: {
tick: Tick,
i8: UniformI8,
f: UniformF64,
w: $weight,
d: UniformDecimal::{ low: 0d0, high: 4.2d1, nullable: false }
}
}
]
}
Ion Format Benefits
Ion provides several advantages for configuration:
- Type Safety: Native support for numbers (including exact decimals such as 1.99 or 0d0, distinct from floats such as 1.0e0), strings, booleans, and timestamps
- Comments: Document your configuration inline with // or /* */
- Annotations: Add type annotations such as minutes::$r
- Nested Structures: Define complex object hierarchies naturally
- Variable References: Use $variable names for reusable components
Stochastic Processes
What are Stochastic Processes?
Stochastic processes are mathematical models that describe how events occur over time in a seemingly random but statistically predictable way. In Beamline, they’re defined using the $arrival field in rand_process blocks.
Arrival Process Types
1. Homogeneous Poisson Process
Models events that occur at a constant average rate with random intervals:
rand_processes::{
sensor_readings: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: minutes::5 },
$data: {
sensor_id: UUID,
reading: UniformF64::{ low: 0.0, high: 100.0 },
timestamp: Instant
}
}
}
Time Units:
- milliseconds::N - on average N milliseconds between events
- seconds::N - on average N seconds between events
- minutes::N - on average N minutes between events
- hours::N - on average N hours between events
- days::N - on average N days between events
Use Cases:
- User logins to a website
- Network packet arrivals
- Customer service calls
- Sensor readings
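A homogeneous Poisson process with a constant average rate is equivalent to drawing independent, exponentially distributed gaps between events. The Python sketch below models that idea on its own; it is not Beamline's implementation, and it assumes that interarrival: minutes::5 sets the mean gap (a rate of 1/5 events per minute).

```python
import random

def poisson_arrivals(mean_interarrival: float, count: int, seed: int = 42) -> list:
    """Arrival times of a homogeneous Poisson process.

    Gaps between events are exponential with mean `mean_interarrival`,
    i.e. rate = 1 / mean_interarrival.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean_interarrival
    t = 0.0
    arrivals = []
    for _ in range(count):
        t += rng.expovariate(rate)  # one exponential gap
        arrivals.append(t)
    return arrivals

# Roughly what `interarrival: minutes::5` asks for: a 5-minute mean gap.
times = poisson_arrivals(mean_interarrival=5.0, count=10_000)
gaps = [b - a for a, b in zip([0.0] + times, times)]
print(f"mean gap ~ {sum(gaps) / len(gaps):.2f} minutes")
```

Individual gaps vary widely (many are much shorter than 5 minutes, a few much longer), but their average converges to the configured interarrival value.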
2. Variable Arrival Rates
Use variables to create dynamic arrival patterns:
rand_processes::{
user_events: rand_process::{
$r: Uniform::{ choices: [2, 5, 10] }, // Variable rate
$arrival: HomogeneousPoisson:: { interarrival: minutes::$r },
$data: {
event_type: Uniform::{ choices: ["login", "logout", "action"] },
user_id: UUID
}
}
}
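One way to picture this pattern is sketched below in Python. It assumes $r is drawn once when a process instance is created and then held fixed; if Beamline instead redraws the value per event, the rng.choice call would move inside the loop. The function name variable_rate_process is hypothetical.

```python
import random

def variable_rate_process(choices, count, rng):
    """Hypothetical model of interarrival: minutes::$r with
    $r: Uniform::{ choices: [...] }.

    Assumption: $r is drawn once per process instance and then held fixed,
    so each instance is a homogeneous Poisson process with that mean gap.
    """
    mean_gap = rng.choice(choices)  # e.g. 2, 5 or 10 minutes
    t, arrivals = 0.0, []
    for _ in range(count):
        t += rng.expovariate(1.0 / mean_gap)  # exponential gap with mean mean_gap
        arrivals.append(t)
    return mean_gap, arrivals

rng = random.Random(42)
for instance in range(3):
    mean_gap, arrivals = variable_rate_process([2, 5, 10], 5, rng)
    print(instance, mean_gap, [round(a, 1) for a in arrivals])
```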
Data Generators
Basic Generator Types
From the actual implementation:
Numeric Generators
rand_processes::{
numeric_data: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::1 },
$data: {
// Integer generators
small_int: UniformI8, // Full i8 range, -128 to 127
medium_int: UniformI16::{ low: 100, high: 1000 }, // Custom range
large_int: UniformU32::{ low: 1, high: 1000000 }, // Unsigned
// Decimal and floating-point generators
decimal_value: UniformDecimal::{ low: 1.99, high: 99.99 }, // Exact decimal
float_value: UniformF64::{ low: 0.0, high: 1.0 }, // Float
// Statistical distributions
normal_score: NormalF64::{ mean: 100.0, std_dev: 15.0 },
exponential_wait: ExpF64::{ rate: 0.1 },
weibull_lifetime: WeibullF64::{ shape: 2.0, scale: 1000.0 }
}
}
}
String Generators
rand_processes::{
text_data: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::2 },
$data: {
// UUID generator
id: UUID,
// Lorem Ipsum text
description: LoremIpsum::{ min_words: 5, max_words: 20 },
title: LoremIpsumTitle, // 3-8 title-cased words
// Regular expressions
country_code: Regex::{ pattern: "[A-Z]{2}" },
phone: Regex::{ pattern: "[0-9]{3}-[0-9]{3}-[0-9]{4}" },
// Format strings with variables
formatted_name: Format::{ pattern: "User #{UUID}" }
}
}
}
System Generators
rand_processes::{
system_data: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::1 },
$data: {
// System state generators
current_time: Instant, // Current simulation time
current_date: Date, // Current simulation date
event_tick: Tick, // Current tick counter
// Boolean generator
active: Bool, // 50% true by default
premium: Bool::{ p: 0.1 } // 10% true
}
}
}
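Bool::{ p: ... } describes a Bernoulli draw. A minimal Python sketch, assuming p is simply the probability of producing true:

```python
import random

def bernoulli(p: float, rng: random.Random) -> bool:
    """True with probability p, like Bool::{ p: 0.1 }; p = 0.5 matches plain Bool."""
    return rng.random() < p

rng = random.Random(1)
samples = [bernoulli(0.1, rng) for _ in range(100_000)]
share_true = sum(samples) / len(samples)
print(f"share of true values ~ {share_true:.3f}")
```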
Complex Type Generators
rand_processes::{
complex_data: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::5 },
$data: {
// Array generator
measurements: UniformArray::{
min_size: 3,
max_size: 8,
element_type: UniformF64::{ low: 0.0, high: 100.0 }
},
// Union type generator (any of several types)
mixed_value: UniformAnyOf::{
types: [
UUID,
UniformI32::{ low: 1, high: 1000 },
LoremIpsumTitle
]
},
// Choice from literals
status: Uniform::{ choices: ["active", "inactive", "pending"] }
}
}
}
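The Python sketch below models the two composite generators under stated assumptions: array length is drawn uniformly from the inclusive range [min_size, max_size], and UniformAnyOf picks one of its listed generators with equal probability. The helper functions (f64, uuid_value, i32, title) are hypothetical stand-ins for the Beamline generators, not their actual behavior.

```python
import random
import uuid

def uniform_array(rng, min_size, max_size, element):
    """List with a length drawn uniformly from [min_size, max_size] (inclusive,
    assumed); every element comes from the `element` generator."""
    size = rng.randint(min_size, max_size)
    return [element(rng) for _ in range(size)]

def uniform_any_of(rng, generators):
    """Choose one generator uniformly at random and return its value."""
    return rng.choice(generators)(rng)

# Hypothetical stand-ins for UniformF64, UUID, UniformI32 and LoremIpsumTitle.
def f64(rng):
    return rng.uniform(0.0, 100.0)

def uuid_value(rng):
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))

def i32(rng):
    return rng.randint(1, 1000)

def title(rng):
    words = ["lorem", "ipsum", "dolor", "sit", "amet"]
    return " ".join(w.title() for w in rng.sample(words, 3))

rng = random.Random(3)
measurements = uniform_array(rng, 3, 8, f64)
mixed_value = uniform_any_of(rng, [uuid_value, i32, title])
print(measurements)
print(mixed_value)
```

Because UniformAnyOf can return values of different types, a column such as mixed_value is heterogeneous; downstream consumers have to handle every listed type.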
Advanced Script Features
Variable Definitions and References
From the real client-service.ion script:
rand_processes::{
// Define reusable generators
$n: UniformU8::{ low: 5, high: 20 },
$id_gen: UUID,
$rid_gen: UUID,
requests: $n::[
{
// Force evaluation at script read time
$id: $id_gen::(),
$rate: UniformF64::{ low: 0.995e0, high: 1.0e0 },
$success: Bool::{ p: $rate },
service: rand_process::{
$r: UniformU8::{ low: 20, high: 150 },
$arrival: HomogeneousPoisson:: { interarrival: milliseconds::$r },
$data: {
Request: $rid_gen,
Account: $id,
client: Format::{ pattern: "customer #{ $@n }" },
success: $success
}
}
}
]
}
Key concepts:
- Variables: $variable_name names a reusable generator
- Forced evaluation: $id_gen::() evaluates once at script read time (once per loop instance), fixing the value instead of generating a new one for each record
- Loop arrays: $n::[...] creates N instances
- Loop index: $@n accesses the current iteration index
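The difference between a plain generator reference and forced evaluation can be sketched in Python. This is a conceptual model only: uuid_gen stands in for the UUID generator, the loop index is assumed to start at 0, and the UniformU8 bounds are assumed inclusive.

```python
import random
import uuid

rng = random.Random(5)

def uuid_gen():
    """Stand-in for the UUID generator: a fresh value on every call."""
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))

def build_instances(n):
    """Model of `requests: $n::[ { $id: $id_gen::(), ... } ]`."""
    instances = []
    for index in range(n):  # loop array; index plays the role of $@n (0-based assumed)
        fixed_id = uuid_gen()  # forced evaluation: drawn once for this instance
        instances.append({"index": index, "id": fixed_id})
    return instances

def emit_record(instance):
    """One service record for an instance."""
    return {
        "Request": uuid_gen(),      # plain reference ($rid_gen): new value per record
        "Account": instance["id"],  # forced value ($id): identical on every record
        "client": f"customer #{instance['index']}",
    }

n = rng.randint(5, 20)  # $n: UniformU8::{ low: 5, high: 20 }, bounds assumed inclusive
instances = build_instances(n)
records = [emit_record(instances[0]) for _ in range(3)]
print(n, records)
```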
Static Data
From the orders.ion test script:
rand_processes::{
$n: UniformU8::{ low: 5, high: 20 },
$id_gen: UUID,
customers: $n::[
{
$id: $id_gen::(),
// Static data - generated once at simulation start
customer_table: static_data::{
$data: {
id: $id,
address: Format::{ pattern: "{ $@n } Foo Bar Ave" }
}
},
// Dynamic data - generated over time
orders: rand_process::{
$r: UniformU8::{ low: 1, high: 150 },
$arrival: HomogeneousPoisson:: { interarrival: days::$r },
$data: {
Order: UUID,
Customer: $id,
Time: Instant
}
}
}
]
}
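The split between static and dynamic data can be modeled as one row emitted at simulation start plus a Poisson stream of orders. In this Python sketch, the 365-day horizon is a hypothetical stopping rule (the CLI controls output size with --sample-count instead), and the mean gap is assumed to be drawn once per customer.

```python
import random

def simulate_customer(index, customer_id, horizon_days, rng):
    """Static row once at t=0, then orders as a Poisson stream until the horizon."""
    static_row = {"id": customer_id, "address": f"{index} Foo Bar Ave"}
    mean_gap = rng.randint(1, 150)  # assumed: drawn once per customer, in days
    orders = []
    t = rng.expovariate(1.0 / mean_gap)
    while t < horizon_days:
        orders.append({"Customer": customer_id, "Time": t})
        t += rng.expovariate(1.0 / mean_gap)
    return static_row, orders

rng = random.Random(11)
static_row, orders = simulate_customer(0, "customer-0", horizon_days=365, rng=rng)
print(static_row, len(orders))
```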
Nullability and Optionality
Real syntax from test scripts:
rand_processes::{
nullable_data: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::1 },
$data: {
// 75% chance of NULL, can also be MISSING
weight: UniformDecimal::{
nullable: 0.75,
optional: true,
low: 1.995,
high: 4.9999
},
// Never NULL
id: UUID::{ nullable: false },
// 10% chance of MISSING (field won't appear)
optional_field: UniformI32::{ optional: 0.1, low: 1, high: 100 }
}
}
}
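The Python sketch below shows one plausible reading of nullable and optional as probabilities. The order of the checks is an assumption: the field is dropped (MISSING) with probability optional, and only otherwise set to NULL with probability nullable. Under that ordering, nullable: 0.2 with optional: 0.1 yields NULL in 20% of the records that contain the field, or about 18% overall. Boolean settings such as optional: true are not modeled because their probability is not stated here.

```python
import random

MISSING = object()  # sentinel: the field is left out of the record

def maybe_null_or_missing(value_gen, rng, nullable=0.0, optional=0.0):
    """Assumed order: MISSING with probability `optional`; otherwise NULL
    (None) with probability `nullable`; otherwise a generated value."""
    if rng.random() < optional:
        return MISSING
    if rng.random() < nullable:
        return None
    return value_gen(rng)

def make_record(rng):
    record = {"id": "always-present"}  # like UUID::{ nullable: false }
    value = maybe_null_or_missing(lambda r: r.randint(1, 100), rng,
                                  nullable=0.2, optional=0.1)
    if value is not MISSING:
        record["nullable_field"] = value
    return record

rng = random.Random(9)
records = [make_record(rng) for _ in range(100_000)]
missing = sum("nullable_field" not in r for r in records) / len(records)
nulls = sum(r.get("nullable_field", 0) is None for r in records) / len(records)
print(f"missing ~ {missing:.3f}, null ~ {nulls:.3f}")
```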
Real Script Examples
Simple Sensor Script
Based on the actual sensors.ion test:
rand_processes::{
$n: UniformU8::{ low: 2, high: 10 },
sensors: $n::[
rand_process::{
$r: Uniform::{ choices: [5,10] },
$arrival: HomogeneousPoisson:: { interarrival: minutes::$r },
$weight: UniformDecimal::{ nullable: 0.75, low: 1.995, high: 4.9999, optional: true },
$data: {
tick: Tick,
i8: UniformI8,
f: UniformF64,
w: $weight,
d: UniformDecimal::{ low: 0d0, high: 4.2d1, nullable: false }
}
}
]
}
Test this script:
target/release/beamline gen data \
--seed 42 \
--start-auto \
--script-path partiql-beamline-sim/tests/scripts/sensors.ion \
--sample-count 10 \
--output-format ion-pretty
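Each of the $n loop instances in this script is an independent process with its own rate. A Python sketch of how such streams could be interleaved by simulated time, assuming the output is time-ordered across sensors (not confirmed here); sensor_stream is a hypothetical helper:

```python
import heapq
import random

def sensor_stream(sensor, mean_gap, seed, count):
    """Yield (time, sensor, seq) for one sensor's homogeneous Poisson process."""
    rng = random.Random(seed)  # own RNG per stream, so merge order can't affect values
    t = 0.0
    for seq in range(count):
        t += rng.expovariate(1.0 / mean_gap)
        yield (t, sensor, seq)

top = random.Random(42)
n = top.randint(2, 10)  # like $n: UniformU8::{ low: 2, high: 10 }
streams = [sensor_stream(s, top.choice([5, 10]), seed=s, count=20) for s in range(n)]
merged = list(heapq.merge(*streams))  # interleave every sensor by simulated time
for t, sensor, seq in merged[:5]:
    print(f"{t:7.2f} min  sensor {sensor}  event {seq}")
```

Giving each stream its own seeded RNG keeps a sensor's values independent of how heapq.merge interleaves the streams, which is one way a fixed --seed can yield reproducible output.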
Client-Service System
Based on the client-service.ion test:
rand_processes::{
// Generate between 5 & 20 customers
$n: UniformU8::{ low: 5, high: 20 },
// Shared generators
$id_gen: UUID,
$rid_gen: UUID,
requests: $n::[
{
// Each customer gets unique ID
$id: $id_gen::(),
$rate: UniformF64::{ low: 0.995e0, high: 1.0e0 },
$success: Bool::{ p: $rate },
// Request-rate generator visible to both datasets below
$r: UniformU8::{ low: 20, high: 150 },
// Service dataset
service: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: milliseconds::$r },
$data: {
Request: $rid_gen,
StartTime: Instant,
Program: "FancyService",
Operation: "GetMyData",
Account: $id,
client: Format::{ pattern: "customer #{ $@n }" },
success: $success
}
},
// Individual client datasets
'client_{ $@n }': rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: milliseconds::$r },
$data: {
id: $id,
request_time: Instant,
request_id: $rid_gen,
success: $success
}
}
}
]
}
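Both Format patterns such as "customer #{ $@n }" and quoted dataset names such as 'client_{ $@n }' substitute the loop index, which is how each iteration gets its own client dataset. A small Python model of that substitution, assuming the index is a 0-based integer rendered in decimal:

```python
import re

LOOP_INDEX = re.compile(r"\{\s*\$@n\s*\}")

def expand(pattern: str, index: int) -> str:
    """Replace each `{ $@n }` placeholder with the loop index (0-based assumed)."""
    return LOOP_INDEX.sub(str(index), pattern)

names = [expand("client_{ $@n }", i) for i in range(3)]
print(names)  # ['client_0', 'client_1', 'client_2']
print(expand("customer #{ $@n }", 4))  # customer #4
```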
Transaction Data Script
Based on the simple_transactions.ion test:
rand_processes::{
test_data: rand_process::{
$r: Uniform::{ choices: [5,10] },
$arrival: HomogeneousPoisson:: { interarrival: milliseconds::$r },
$data: {
transaction_id: UUID::{ nullable: false },
marketplace_id: UniformU8::{ nullable: false },
country_code: Regex::{ pattern: "[A-Z]{2}" },
created_at: Instant,
completed: Bool,
description: LoremIpsum::{ min_words:10, max_words:200 },
price: UniformDecimal::{ low: 2.99, high: 99999.99, optional: true }
}
}
}
Advanced Script Patterns
Complex Statistical Distributions
From the numbers.ion test script:
rand_processes::{
test_data: rand_process::{
$r: Uniform::{ choices: [5,10] },
$arrival: HomogeneousPoisson:: { interarrival: milliseconds::$r },
$data: {
uniform: {
// Uniform distributions
uniform_u8: UniformU8::{ low: 13, high: 42 },
uniform_f64: UniformF64::{ low: -13.0, high: 42.0 },
uniform_decimal: UniformDecimal::{ low: 0.995, high: 499.9999 }
},
statistical: {
// Statistical distributions
normal: NormalF64::{ mean: 14.3, std_dev: 3.0 },
lognormal: LogNormalF64::{ location: 14.3, scale: 3.0 },
weibull: WeibullF64::{ shape: 14.3, scale: 3.0 },
exponential: ExpF64::{ rate: 3.0 }
},
// With nullability and optionality
nullable_field: UniformI32::{
nullable: 0.2, // 20% NULL
optional: 0.1, // 10% MISSING
low: 1,
high: 100
}
}
}
}
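The four statistical distributions map onto functions in Python's random module. The parameter mapping is an assumption: LogNormalF64's location and scale are taken as the mean and standard deviation of the underlying normal, and ExpF64's rate as events per unit (mean 1/rate). Note that random.weibullvariate takes the scale first and the shape second.

```python
import random

def sample_statistical(rng):
    """One 'statistical' row; parameter mapping to Python is assumed."""
    return {
        # NormalF64::{ mean: 14.3, std_dev: 3.0 }
        "normal": rng.normalvariate(14.3, 3.0),
        # LogNormalF64::{ location: 14.3, scale: 3.0 } -> exp(Normal(14.3, 3.0))
        "lognormal": rng.lognormvariate(14.3, 3.0),
        # WeibullF64::{ shape: 14.3, scale: 3.0 }; Python's order is (scale, shape)
        "weibull": rng.weibullvariate(3.0, 14.3),
        # ExpF64::{ rate: 3.0 } -> mean 1 / 3.0
        "exponential": rng.expovariate(3.0),
    }

def column_mean(rows, key):
    return sum(row[key] for row in rows) / len(rows)

rng = random.Random(2024)
rows = [sample_statistical(rng) for _ in range(50_000)]
for key in ("normal", "weibull", "exponential"):
    print(key, round(column_mean(rows, key), 3))
```

Under these assumptions the Weibull column clusters just below its scale of 3.0 because the shape is large, while the log-normal column spans many orders of magnitude.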
Multiple Datasets with Relationships
A simplified pattern based on client-service.ion:
rand_processes::{
$n: UniformU8::{ low: 5, high: 20 },
$id_gen: UUID,
requests: $n::[
{
$id: $id_gen::(), // One ID per customer
$rid_gen: UUID, // Separate request ID generator per customer
// Shared service dataset
service: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: milliseconds::50 },
$data: {
Request: $rid_gen,
StartTime: Instant,
Account: $id,
client: Format::{ pattern: "customer #{ $@n }" }
}
},
// Individual client dataset for this customer
'client_{ $@n }': rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: milliseconds::50 },
$data: {
id: $id,
request_time: Instant,
request_id: $rid_gen
}
}
}
]
}
Static Data with Dynamic References
From the orders.ion test script:
rand_processes::{
$n: UniformU8::{ low: 5, high: 20 },
$id_gen: UUID,
$oid_gen: UUID,
customers: $n::[
{
$id: $id_gen::(),
// Static customer data (generated once)
customer_table: static_data::{
$data: {
id: $id,
address: Format::{ pattern: "{ $@n } Foo Bar Ave" }
}
},
// Dynamic orders (generated over time)
orders: rand_process::{
$r: UniformU8::{ low: 1, high: 150 },
$arrival: HomogeneousPoisson:: { interarrival: days::$r },
$data: {
Order: $oid_gen,
Time: Instant,
Customer: $id // Links to customer_table
}
}
}
]
}
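Because Customer in each order reuses the forced value $id that also populates customer_table, every order should join to exactly one customer row. A Python sketch of that check over already-parsed rows; the rows here are hand-written stand-ins, and loading Beamline's actual output (for example ion-pretty) would require an Ion reader, which is not shown.

```python
def orphan_orders(customer_table, orders):
    """Orders whose Customer value matches no id in customer_table."""
    known_ids = {row["id"] for row in customer_table}
    return [order for order in orders if order["Customer"] not in known_ids]

# Hand-written stand-in rows; real rows would come from Beamline's output.
customer_table = [
    {"id": "c-1", "address": "0 Foo Bar Ave"},
    {"id": "c-2", "address": "1 Foo Bar Ave"},
]
orders = [
    {"Order": "o-1", "Customer": "c-1", "Time": "2024-01-03T10:00:00Z"},
    {"Order": "o-2", "Customer": "c-2", "Time": "2024-02-11T08:30:00Z"},
    {"Order": "o-3", "Customer": "c-9", "Time": "2024-03-05T12:15:00Z"},  # deliberately broken
]
print(orphan_orders(customer_table, orders))  # only the o-3 row
```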
Script Testing and Validation
Testing Script Syntax
# Test script with minimal data generation
target/release/beamline gen data \
--seed 1 \
--start-auto \
--script-path new_script.ion \
--sample-count 1
# Check inferred schema
target/release/beamline infer-shape \
--seed 1 \
--start-auto \
--script-path new_script.ion \
--output-format basic-ddl
Testing with Small Samples
# Test each dataset individually
target/release/beamline gen data \
--seed 42 \
--start-auto \
--script-path complex_script.ion \
--sample-count 5 \
--dataset specific_dataset
# Test all datasets with small sample
target/release/beamline gen data \
--seed 42 \
--start-auto \
--script-path complex_script.ion \
--sample-count 5 \
--output-format text
Best Practices
1. Use Real Test Script Patterns
// Good - follows actual Beamline syntax
rand_processes::{
$arrival_rate: Uniform::{ choices: [5, 10] },
events: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: minutes::$arrival_rate },
$data: {
event_id: UUID,
timestamp: Instant,
value: UniformF64::{ low: 0.0, high: 100.0 }
}
}
}
2. Test Scripts Incrementally
# Start with basic structure
echo 'rand_processes::{ test: rand_process::{ $arrival: HomogeneousPoisson:: { interarrival: seconds::1 }, $data: { id: UUID } } }' > minimal.ion
# Test basic structure
target/release/beamline gen data --seed 1 --start-auto --script-path minimal.ion --sample-count 3
3. Use Meaningful Variable Names
rand_processes::{
// Clear variable names
$customer_count: UniformU8::{ low: 10, high: 50 },
$order_frequency: Uniform::{ choices: [1, 3, 7] }, // Days
$order_id_generator: UUID, // New UUID on every use
orders: $customer_count::[
rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: days::$order_frequency },
$data: {
order_id: $order_id_generator,
order_time: Instant
}
}
]
}
4. Document Complex Patterns
rand_processes::{
// === Customer Simulation Configuration ===
// Generate 10-50 customers, each placing orders on average every 1-30 days
$customer_count: UniformU8::{ low: 10, high: 50 },
$shared_customer_id: UUID,
customer_orders: $customer_count::[
{
// Each customer gets unique ID for all their orders
$id: $shared_customer_id::(),
// Customer places orders with variable frequency
orders: rand_process::{
$days_between_orders: UniformU8::{ low: 1, high: 30 },
$arrival: HomogeneousPoisson:: { interarrival: days::$days_between_orders },
$data: {
customer_id: $id,
order_id: UUID,
order_time: Instant,
amount: UniformDecimal::{ low: 10.00, high: 500.00 }
}
}
}
]
}
Common Script Errors and Solutions
Error: Invalid Ion Syntax
// Wrong - missing closing braces
rand_processes::{
test: rand_process::{
$data: { id: UUID }
// Missing closing braces for rand_process and rand_processes
Error: Missing Required Fields
// Wrong - missing $arrival
rand_processes::{
test: rand_process::{
$data: { id: UUID } // Missing $arrival definition
}
}
// Correct
rand_processes::{
test: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::1 },
$data: { id: UUID }
}
}
Error: Invalid Generator Configuration
// Wrong - low > high
rand_processes::{
test: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::1 },
$data: {
bad_range: UniformI32::{ low: 100, high: 50 } // Invalid
}
}
}
// Correct
rand_processes::{
test: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::1 },
$data: {
good_range: UniformI32::{ low: 50, high: 100 }
}
}
}
Performance Optimization
Efficient Generator Usage
rand_processes::{
// Efficient - reuse expensive generators
$expensive_distribution: NormalF64::{ mean: 100.0, std_dev: 15.0 },
$simple_uuid: UUID,
efficient_data: rand_process::{
$arrival: HomogeneousPoisson:: { interarrival: seconds::1 },
$data: {
// Reuse expensive distribution
score1: $expensive_distribution,
score2: $expensive_distribution,
score3: $expensive_distribution,
// Simple generators are fast
id: $simple_uuid,
active: Bool,
count: UniformI32::{ low: 1, high: 1000 }
}
}
}
Testing Commands
# Test with small samples first
target/release/beamline gen data \
--seed 1 \
--start-auto \
--script-path new_script.ion \
--sample-count 5 \
--output-format text
# Scale up after validation
target/release/beamline gen data \
--seed 1 \
--start-auto \
--script-path new_script.ion \
--sample-count 10000 \
--output-format ion-binary
Next Steps
Now that you understand how Ion scripts and stochastic processes work together, you’re ready to dive deeper into the Data Generation section, where you’ll learn about specific generator types, output formats, and advanced data modeling techniques using Beamline syntax.