Output Formats

Beamline supports multiple output formats for generated data, each optimized for different use cases. Understanding these formats helps you choose the right one for your workflow.

Available Formats

The CLI supports four main output formats via --output-format:

Format	Description	Use Case	Performance
`text`	Human-readable timestamped format	Debugging, inspection	Moderate
`ion`	Compact Ion text format	Data processing	Fast
`ion-pretty`	Pretty-printed Ion with metadata	Configuration, documentation	Slower
`ion-binary`	Binary Ion format	High-performance storage	Fastest

Text Format (Default)

Characteristics

Human-readable: Easy to read and debug
Timestamped: Each record includes generation timestamp and dataset name
Streaming: Records appear as they’re generated
Metadata: Shows seed and start time

Output Structure

Seed: <seed_value>
Start: <start_timestamp>
[<timestamp>] : "<dataset_name>" { <ion_data> }
[<timestamp>] : "<dataset_name>" { <ion_data> }
...

Example Output

$ beamline gen data \
    --seed 1234 \
    --start-auto \
    --script-path sensors.ion \
    --sample-count 5 \
    --output-format text

Seed: 1234
Start: 2024-05-10T04:04:53.000000000Z
[2024-05-10 4:06:07.274 +00:00:00] : DataSetName("sensors") { 'tick': 74274, 'i8': -86, 'f': 48.07286740416876, 'w': NULL, 'd': 23, 'a': 3.1640, 'ar1': [0.8, 1.1, 1.1], 'ar2': ['e8b12a6c-7cf1-45b6-a8a4-89cd6a418660', 'ba408184-3b94-41e7-860f-6042708bb4be'], 'ar3': [NULL, NULL], 'ar4': [6, 4], 'ar5': [3.1640] }
[2024-05-10 4:08:15.65 +00:00:00] : DataSetName("sensors") { 'tick': 202650, 'i8': 6, 'f': 45.56429323253781, 'w': NULL, 'd': 26, 'a': '613de2a3-195c-410f-8dac-56237f53aa99', 'ar1': [1.1, 0.9, 0.7], 'ar2': ['e0c6700e-f429-429a-a461-c018820fbafe', '9fce83a7-45ef-4210-affe-b87b45e3ac73'], 'ar3': [NULL, 2.4409], 'ar4': [4, 8], 'ar5': ['613de2a3-195c-410f-8dac-56237f53aa99'] }
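
Because each text-format record occupies one line, ordinary line-oriented tools can pull values out of it. The sketch below is a hypothetical helper (`numeric_field` is not part of Beamline) that prints the numeric value of one top-level field per record. It assumes the `'name': value` layout shown in the example above and skips records where the field is NULL, a string, or an array.

```shell
# Hypothetical helper, not part of Beamline: print the numeric value of
# field $1 from each text-format record read on stdin.
# Assumes the 'name': value layout shown above; non-numeric values are skipped.
numeric_field() {
  sed -n "s/.*'$1': \(-\{0,1\}[0-9][0-9.eE+-]*\).*/\1/p"
}
```

Piping generated text output through `numeric_field tick`, for example, would list one tick value per line, ready for `sort` or `awk`. The field name is inserted into a regular expression unescaped, so names containing regex metacharacters would need extra care.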

Use Cases

Development and debugging: Easy to read individual records
Log file analysis: Timestamped records for event correlation
Quick inspection: Rapid visual validation of generated data
Educational: Learning how data generation works

Ion Format

Characteristics

Compact: No pretty-printing or extra whitespace
Fast: Minimal formatting overhead
Ion text: Preserves all Ion type information
Processable: Easy to parse with Ion libraries

Output Structure

{seed:<seed>,start:"<timestamp>",data:{<dataset_name>:[{<record>},{<record>}...]}}

Example Output

$ beamline gen data \
    --seed 42 \
    --start-auto \
    --script-path simple.ion \
    --sample-count 3 \
    --output-format ion

{seed:42,start:"2024-01-01T00:00:00Z",data:{sensors:[{f:-2.543639,i8:4,tick:125532},{f:-63.493088,i8:4,tick:218756},{f:12.345679,i8:-12,tick:253123}]}}

Use Cases

Data processing pipelines: Efficient parsing and processing
API responses: Compact data transmission
Intermediate storage: Balance between readability and efficiency
Configuration files: Structured data that’s still readable

Ion Pretty Format

Characteristics

Human-readable: Well-formatted with indentation
Complete metadata: Includes seed, start time, and full data structure
Ion text format: Preserves all type information
Structured: Clear hierarchical organization

Output Structure

{
  seed: <seed>,
  start: "<timestamp>", 
  data: {
    <dataset_name>: [
      {
        <field>: <value>,
        <field>: <value>
      },
      {
        <field>: <value>,
        <field>: <value>
      }
    ]
  }
}

Example Output

$ beamline gen data \
    --seed 123 \
    --start-auto \
    --script-path sensors.ion \
    --sample-count 2 \
    --output-format ion-pretty

{
  seed: 123,
  start: "2024-01-20T10:30:00.000000000Z",
  data: {
    sensors: [
      {
        f: -2.5436390152455175e0,
        i8: 4,
        tick: 125532
      },
      {
        f: -63.49308817145054e0,
        i8: 4,
        tick: 218756
      }
    ]
  }
}
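
The seed appears in all three text-based outputs shown so far: as a `Seed:` header line in the text format and as the first `seed` field in both Ion text formats. Recovering it lets you repeat a run with the same `--seed`. The following is a minimal sketch under that assumption; `extract_seed` is a hypothetical name, and the helper does not read `ion-binary` output.

```shell
# Hypothetical helper, not part of Beamline: print the seed recorded in
# text, ion, or ion-pretty output read on stdin.
# Assumes the header layouts shown in this section.
extract_seed() {
  sed -n \
    -e 's/^Seed: *\([0-9][0-9]*\).*/\1/p' \
    -e 's/^[{ ]*seed: *\([0-9][0-9]*\),.*/\1/p' |
    head -n 1
}
```

Only the first match is printed, so a data field that also happens to be called `seed` later in the output is ignored.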

Use Cases

Configuration files: Readable but structured data
Documentation: Examples and samples in documentation
Data inspection: Understanding complex nested structures
Archive storage: Long-term storage with metadata

Ion Binary Format

Characteristics

Most compact: Smallest file size
Fastest: Highest performance for generation and parsing
Type preservation: All Ion types preserved exactly
Not human-readable: Requires Ion tools to read

Example Usage

$ beamline gen data \
    --seed 999 \
    --start-auto \
    --script-path large_dataset.ion \
    --sample-count 1000000 \
    --output-format ion-binary > data.ion

Use Cases

Large datasets: Maximum efficiency for big data generation
High-performance applications: Minimal parsing overhead
Storage optimization: Smallest possible file sizes
Data transmission: Efficient network transfer

Format Comparison

Size Comparison

For the same dataset with 1000 records:

# Generate in all formats for comparison
# (outputs use an "out" prefix so the data.ion script is never overwritten by a redirect)
beamline gen data --seed 1 --start-auto --script-path data.ion --sample-count 1000 --output-format text > out.txt
beamline gen data --seed 1 --start-auto --script-path data.ion --sample-count 1000 --output-format ion > out.ion
beamline gen data --seed 1 --start-auto --script-path data.ion --sample-count 1000 --output-format ion-pretty > out_pretty.ion
beamline gen data --seed 1 --start-auto --script-path data.ion --sample-count 1000 --output-format ion-binary > out.bin

# Compare sizes
ls -lh out*
# Example results:
# -rw-r--r-- 1 user user 245K out.txt        (text - largest)
# -rw-r--r-- 1 user user 156K out.ion        (ion - medium)
# -rw-r--r-- 1 user user 189K out_pretty.ion (pretty - larger than ion due to formatting)
# -rw-r--r-- 1 user user 98K  out.bin        (binary - smallest)
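
To compare sizes as ratios rather than reading them off a listing, a small helper can sort the generated files and express each size relative to the smallest. This is a sketch only: `size_report` is a hypothetical name, and it assumes file names without whitespace.

```shell
# Hypothetical helper, not part of Beamline: list files from smallest to
# largest with each size expressed as a multiple of the smallest.
# File names containing whitespace are not handled.
size_report() {
  for f in "$@"; do
    printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done |
    sort -n |
    awk 'NR == 1 { base = $1 }
         { printf "%s %d bytes (%.2fx smallest)\n", $2, $1, (base > 0 ? $1 / base : 0) }'
}
```

Pass the four generated files as arguments; the binary output would normally be expected on the first line.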

Performance Comparison

For generation of 100,000 records:

ion-binary: Fastest (baseline)
ion: ~10% slower than binary
text: ~25% slower than binary
ion-pretty: ~40% slower than binary (due to formatting)
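
These percentages are approximate and will vary with the script and the machine, so it can be worth timing your own workload. The helper below is a hypothetical sketch (`elapsed_seconds` is not part of Beamline); it relies on `date +%s`, which GNU and BSD `date` provide, and measures only whole seconds, so it suits large sample counts.

```shell
# Hypothetical helper, not part of Beamline: run a command, discard its
# standard output, and print the elapsed wall-clock time in whole seconds.
# The body runs in a subshell so the variables do not leak.
elapsed_seconds() (
  start=$(date +%s)
  "$@" > /dev/null
  end=$(date +%s)
  echo $((end - start))
)

# Example (commented out because it needs beamline and a script file):
# elapsed_seconds beamline gen data --seed 1 --start-auto \
#   --script-path data.ion --sample-count 100000 --output-format ion
```

Running it once per format with the same seed and sample count gives numbers that can be compared directly.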

Format-Specific Features

Text Format Features

Timestamp visibility: See exactly when each event occurred in simulation time

[2024-01-01 08:15:23.456 +00:00] : "orders" { 'order_id': '123e4567', 'amount': 99.99 }
[2024-01-01 08:20:45.789 +00:00] : "orders" { 'order_id': '987fcdeb', 'amount': 149.50 }

Dataset identification: Clear dataset labels for multi-dataset scripts

Ion Format Features

Type preservation: All Ion types are preserved exactly

{
  decimal_field: 123.45,           // Exact decimal
  float_field: 123.45e0,           // Float with exponent notation
  timestamp: 2024-01-01T00:00:00Z, // Full timestamp precision
  uuid: "123e4567-e89b-12d3-a456-426614174000"
}

Structured data: Complex nested structures preserved

{
  user: {
    profile: {
      preferences: ["dark_mode", "notifications"]
    }
  }
}

NULL and MISSING Representation

Different formats handle absent values differently:

Text Format

[timestamp] : "dataset" { 'present': 42, 'null_field': NULL }  // MISSING fields omitted

Ion Formats

{
  present: 42,
  null_field: null
  // missing_field is omitted entirely
}
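
The difference between the two cases can be checked on text-format output with `grep`: a NULL field is present with a null value, while a MISSING field never appears in the record. The helpers below are hypothetical sketches (neither is part of Beamline). They look only at top-level `'name':` occurrences on each record line and match either spelling of the null value.

```shell
# Hypothetical helpers, not part of Beamline. Both read text-format output
# on stdin and take a field name as $1.
# "|| true" keeps the exit status clean, since grep exits non-zero on a zero count.

# Records where the field is present with a null value (NULL or null).
count_null() {
  grep -ciF -- "'$1': null" || true
}

# Records where the field does not appear at all (MISSING).
count_missing() {
  grep '^\[' | grep -cvF -- "'$1':" || true
}
```

A string value that happens to contain the same `'name':` text would be miscounted, so treat the results as a quick check rather than a precise parse.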

Multiple Dataset Output

Text Format with Multiple Datasets

$ beamline gen data \
    --seed 100 \
    --start-auto \
    --script-path client_service.ion \
    --sample-count 10 \
    --output-format text

Seed: 100
Start: 2024-01-01T00:00:00Z
[2024-01-01 00:00:00.000 +00:00] : "customer_table" { 'id': 'abc-123', 'address': '0 Main St' }
[2024-01-01 00:00:00.000 +00:00] : "customer_table" { 'id': 'def-456', 'address': '1 Main St' }  
[2024-01-01 00:05:30.123 +00:00] : "service" { 'Request': 'req-001', 'Account': 'abc-123' }
[2024-01-01 00:05:30.124 +00:00] : "client_1" { 'id': 'abc-123', 'request_id': 'req-001' }
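
With several datasets interleaved in one stream, a per-dataset record count is a quick sanity check. The sketch below uses a hypothetical helper, `count_by_dataset`, which takes the dataset name to be the first double-quoted string after the timestamp. That assumption covers both label styles that appear in this document (`"sensors"` and `DataSetName("sensors")`).

```shell
# Hypothetical helper, not part of Beamline: count text-format records per
# dataset. The dataset name is taken to be the first double-quoted string
# after the timestamp; header lines (Seed, Start) are ignored.
count_by_dataset() {
  sed -n 's/^\[[^]]*] : [^"]*"\([^"]*\)".*/\1/p' |
    awk '{ n[$0]++ } END { for (d in n) print d, n[d] }' |
    sort
}
```

For the four records shown above, this would report two `customer_table` records and one each for `service` and `client_1`.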

Ion Pretty Format with Multiple Datasets

{
  seed: 100,
  start: "2024-01-01T00:00:00Z",
  data: {
    customer_table: [
      {
        id: "abc-123",
        address: "0 Main St"
      },
      {
        id: "def-456", 
        address: "1 Main St"
      }
    ],
    service: [
      {
        Request: "req-001",
        Account: "abc-123",
        StartTime: 2024-01-01T00:05:30.123Z
      }
    ],
    client_1: [
      {
        id: "abc-123",
        request_id: "req-001",
        request_time: 2024-01-01T00:05:30.124Z
      }
    ]
  }
}

Choosing the Right Format

Development and Testing

# Use text for quick debugging
beamline gen data --script-path debug.ion --sample-count 5 --output-format text

# Use ion-pretty for understanding structure  
beamline gen data --script-path complex.ion --sample-count 10 --output-format ion-pretty

Production and Performance

# Use ion-binary for large datasets
beamline gen data --script-path production.ion --sample-count 1000000 --output-format ion-binary

# Use ion for balance of efficiency and readability
beamline gen data --script-path data.ion --sample-count 100000 --output-format ion

Integration Workflows

# Generate for different consumers
beamline gen data --seed 42 --start-auto --script-path data.ion --sample-count 10000 --output-format ion-binary > high_perf.ion
beamline gen data --seed 42 --start-auto --script-path data.ion --sample-count 100 --output-format ion-pretty > documentation.ion
beamline gen data --seed 42 --start-auto --script-path data.ion --sample-count 1000 --output-format text > debug.txt

Format-Specific Processing

Processing Text Format

# Extract specific datasets
beamline gen data --script-path multi.ion --output-format text | \
  grep '"sensors"' | \
  head -10

# Analyze timestamps (record lines only; the Seed and Start header lines are skipped)
beamline gen data --script-path temporal.ion --output-format text | \
  awk -F'[][]' '/^\[/ {print $2}' | \
  head -20

Processing Ion Formats

# Use Ion tools for processing
beamline gen data --script-path data.ion --output-format ion-binary | \
  ion-cli query "SELECT * FROM data.sensors WHERE f > 0"

# Convert between formats
beamline gen data --script-path data.ion --output-format ion | \
  ion-cli pretty > formatted.ion

Pipeline Integration

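# Generate and immediately process.
# Note: jq parses JSON only. Ion text (unquoted field names, timestamps,
# e0 float notation) is not valid JSON, so convert the output to JSON
# before the jq step; the conversion command depends on your Ion tooling.
beamline gen data \
  --seed 123 \
  --start-auto \
  --script-path metrics.ion \
  --sample-count 10000 \
  --output-format ion-pretty | \
  jq '.data.metrics[] | select(.temperature > 25)' | \
  head -10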

Database Generation Formats

Database generation creates multiple file formats automatically:

$ beamline gen db beamline-lite \
    --seed 42 \
    --start-auto \
    --script-path data.ion \
    --sample-count 1000

$ ls -la beamline-catalog/
-rw-r--r-- 1 user user   145 .beamline-manifest    # JSON metadata
-rw-r--r-- 1 user user  2.1K .beamline-script      # Ion script
-rw-r--r-- 1 user user   89K sensors.ion           # Data in Ion format
-rw-r--r-- 1 user user   412 sensors.shape.ion     # Schema in Ion format  
-rw-r--r-- 1 user user   298 sensors.shape.sql     # Schema in SQL DDL format

Data Files (Ion Format)

$ head -3 beamline-catalog/sensors.ion
{f: -2.5436390152455175e0, i8: 4, tick: 125532}
{f: -63.49308817145054e0, i8: 4, tick: 218756}
{f: 12.34567890123456e0, i8: -12, tick: 253123}

Schema Files (Ion Format)

$ cat beamline-catalog/sensors.shape.ion
{
  type: "bag",
  items: {
    type: "struct",
    constraints: [ordered, closed],
    fields: [
      { name: "f", type: "double" },
      { name: "i8", type: "int8" },
      { name: "tick", type: "int8" }
    ]
  }
}

Schema Files (SQL DDL Format)

$ cat beamline-catalog/sensors.shape.sql
"f" DOUBLE,
"i8" INT8,
"tick" INT8

Format Selection Guidelines

By Use Case

Use Case	Recommended Format	Rationale
Quick debugging	`text`	Timestamps and human readability
Data inspection	`ion-pretty`	Structure visibility with metadata
Large dataset generation	`ion-binary`	Maximum performance and compactness
Data processing	`ion`	Good balance of efficiency and readability
Documentation	`ion-pretty`	Clear structure for examples
Long-term storage	`ion-binary`	Most compact and preserves all types

By Dataset Size

Dataset Size	Recommended Format	Notes
< 100 records	`text` or `ion-pretty`	For inspection
100 - 10K records	`ion` or `ion-pretty`	Based on use case
10K - 100K records	`ion` or `ion-binary`	For efficiency
> 100K records	`ion-binary`	Maximum performance
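
The size thresholds above can be encoded in a script that picks a format before generating. The following is a minimal sketch with a hypothetical function name, `recommend_format`. The table does not say which row owns the boundary values, so treating each upper bound as inclusive is an assumption.

```shell
# Hypothetical helper, not part of Beamline: map a record count to the
# recommendation in the table above. Treating each upper bound as inclusive
# is an assumption; the table leaves the boundaries open.
recommend_format() {
  if [ "$1" -lt 100 ]; then
    echo "text or ion-pretty"
  elif [ "$1" -le 10000 ]; then
    echo "ion or ion-pretty"
  elif [ "$1" -le 100000 ]; then
    echo "ion or ion-binary"
  else
    echo "ion-binary"
  fi
}
```

The function returns the table's wording unchanged, so the final choice between two listed formats still depends on the use case.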

By Integration Target

Target System	Recommended Format	Notes
Ion-aware tools	`ion-binary`	Native format
JSON processors	`ion` + conversion	Ion can be down-converted to JSON; Ion-only type information is lost (for example, timestamps become strings)
SQL databases	Use `gen db`	Creates SQL schemas automatically
Log analysis	`text`	Timestamped format
Documentation	`ion-pretty`	Human-readable structure

Format Conversion Patterns

Manual Conversion

# Generate in efficient format, convert for specific use
beamline gen data --script-path data.ion --sample-count 10000 --output-format ion-binary > efficient.ion

# Convert to pretty format for inspection
ion-cli pretty < efficient.ion > readable.ion

# Extract specific fields
# (single quotes keep the shell from treating the backticks as command substitution)
ion-cli query 'SELECT data.sensors[*].temperature FROM `efficient.ion`' > temperatures.ion

Multi-Format Generation

#!/bin/bash
# generate-multi-format.sh
# Usage: generate-multi-format.sh <script-path> <seed> <sample-count>
set -euo pipefail

# Stop early if any argument is missing
if [ "$#" -ne 3 ]; then
  echo "Usage: $0 <script-path> <seed> <sample-count>" >&2
  exit 1
fi

SCRIPT="$1"
SEED="$2"
COUNT="$3"

# Generate in multiple formats (variables are quoted so paths with spaces work)
beamline gen data --seed "$SEED" --start-auto --script-path "$SCRIPT" --sample-count "$COUNT" --output-format ion-binary > data.bin
beamline gen data --seed "$SEED" --start-auto --script-path "$SCRIPT" --sample-count 100 --output-format ion-pretty > sample.ion
beamline gen data --seed "$SEED" --start-auto --script-path "$SCRIPT" --sample-count 10 --output-format text > debug.txt

echo "Generated:"
echo "- data.bin (binary, $COUNT records)"
echo "- sample.ion (pretty, 100 records)"
echo "- debug.txt (text, 10 records)"

Integration Examples

Web API Integration

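# Generate data for API testing
# Note: jq parses JSON only; convert the Ion output to JSON before the jq step
# (the conversion command depends on your Ion tooling).
beamline gen data \
  --seed 42 \
  --start-auto \
  --script-path api_test_data.ion \
  --sample-count 1000 \
  --output-format ion-pretty | \
  jq '.data' > api_test_payload.json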

Database Loading

# Generate data and schema for database
beamline gen db beamline-lite \
  --seed 100 \
  --start-auto \
  --script-path warehouse_data.ion \
  --sample-count 50000

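# Use generated SQL schema
# (if the file holds only column definitions, as in the sensors.shape.sql example,
# wrap it in a CREATE TABLE statement first)
psql -d warehouse -f beamline-catalog/orders.shape.sql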

# Convert data for loading (would need custom conversion)
# partiql-to-csv beamline-catalog/orders.ion > orders.csv
# COPY orders FROM 'orders.csv' WITH CSV HEADER;
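
The `sensors.shape.sql` file shown earlier holds only a column list, so a database client would presumably need a surrounding `CREATE TABLE` statement before it can execute the schema. The sketch below adds one. `wrap_create_table` is a hypothetical helper, and it assumes the shape file contains nothing but column definitions. Type names may also need adjusting for the target engine (PostgreSQL, for example, spells the double type `DOUBLE PRECISION`).

```shell
# Hypothetical helper, not part of Beamline: wrap a column list read on
# stdin in a CREATE TABLE statement for the table named in $1.
wrap_create_table() {
  printf 'CREATE TABLE "%s" (\n' "$1"
  sed 's/^/  /'
  printf ');\n'
}

# Example (commented out because it needs a generated catalog):
# wrap_create_table orders < beamline-catalog/orders.shape.sql > orders.sql
```

The table name is inserted as given, so it should be a plain identifier without double quotes.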

Analytics Pipeline

#!/bin/bash
# analytics-pipeline.sh

# Generate raw data efficiently  
beamline gen data \
  --seed 202401 \
  --start-iso "2024-01-01T00:00:00Z" \
  --script-path analytics.ion \
  --sample-count 1000000 \
  --output-format ion-binary > raw_data.ion

# Generate sample for validation
beamline gen data \
  --seed 202401 \
  --start-iso "2024-01-01T00:00:00Z" \
  --script-path analytics.ion \
  --sample-count 100 \
  --output-format ion-pretty > sample_validation.ion

echo "Analytics data generated:"
echo "- Raw data: $(wc -c < raw_data.ion) bytes in binary format (1000000 records requested)"
echo "- Validation sample: 100 records in pretty format"

Best Practices

1. Match Format to Purpose

# Debugging - use text
beamline gen data --script-path new_script.ion --sample-count 5 --output-format text

# Production - use binary
beamline gen data --script-path prod_data.ion --sample-count 1000000 --output-format ion-binary

# Documentation - use pretty
beamline gen data --script-path examples.ion --sample-count 10 --output-format ion-pretty

2. Consider File Size for Large Datasets

# Check estimated size first
beamline gen data --script-path large.ion --sample-count 1000 --output-format ion-binary | wc -c
# If 1000 records = 50KB, then 1M records ≈ 50MB
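
The estimate above is a linear extrapolation, which can be scripted. This is a minimal sketch with a hypothetical function name, `estimate_bytes`. It assumes the average record size stays constant as the sample count grows and ignores any fixed header overhead.

```shell
# Hypothetical helper, not part of Beamline: linearly extrapolate output size.
# $1 = bytes measured for the sample, $2 = records in the sample,
# $3 = records planned for the full run.
# Multiplying before dividing keeps precision under integer arithmetic.
estimate_bytes() {
  echo $(( $1 * $3 / $2 ))
}
```

With a 1000-record sample measuring 50000 bytes, `estimate_bytes 50000 1000 1000000` prints 50000000, which is about 50 MB for one million records.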

3. Use Appropriate Format for Storage

# Long-term storage
beamline gen data --script-path archive.ion --sample-count 100000 --output-format ion-binary

# Working files  
beamline gen data --script-path working.ion --sample-count 1000 --output-format ion-pretty

# Quick inspection
beamline gen data --script-path inspect.ion --sample-count 20 --output-format text

4. Document Format Choices

# Document why you chose specific formats
echo "# Data Formats Used
- raw_data.bin: ion-binary for maximum efficiency (1M+ records)
- sample.ion: ion-pretty for human inspection (100 records)  
- debug.txt: text format for timestamp analysis (50 records)
" > FORMAT_NOTES.md

Next Steps

Scripts - Advanced Ion scripting techniques
Datasets - Working with multiple datasets and relationships
CLI Data Commands - Complete CLI format options reference