Terraform

Interlock ships as a reusable Terraform module in deploy/terraform/. The module creates all required AWS infrastructure: DynamoDB tables, Lambda functions, Step Functions, EventBridge, and IAM roles. Your project consumes the module and provides pipeline YAML files – no framework code runs in your repository.

Prerequisites

  • Terraform >= 1.5
  • AWS CLI v2 configured with appropriate credentials
  • AWS provider >= 5.0
  • Go 1.24+ (to build Lambda binaries)

Quick Start

1. Build Lambda Binaries

./deploy/build.sh

This cross-compiles the 6 Lambda handlers (stream-router, orchestrator, sla-monitor, watchdog, event-sink, alert-dispatcher) for linux/arm64 and outputs them to deploy/dist/.

2. Write Pipeline Configs

Create YAML files in a pipelines/ directory. Each file defines one pipeline:

# pipelines/gold-revenue.yaml
pipeline:
  id: gold-revenue
  owner: analytics-team
  description: Gold-tier revenue aggregation pipeline

schedule:
  cron: "0 8 * * *"
  timezone: UTC
  evaluation:
    window: 1h
    interval: 5m

sla:
  deadline: "10:00"
  expectedDuration: 30m

validation:
  trigger: "ALL"
  rules:
    - key: upstream-complete
      check: equals
      field: status
      value: ready
    - key: row-count
      check: gte
      field: count
      value: 1000

job:
  type: glue
  config:
    jobName: gold-revenue-etl
  maxRetries: 2
  jobPollWindowSeconds: 3600

See Pipeline Configuration for the full YAML schema, including per-source rerun limits and failure classification.

3. Use the Module

Reference the Interlock module from your Terraform configuration:

module "interlock" {
  source = "path/to/interlock/deploy/terraform"

  environment    = "production"
  dist_path      = "path/to/interlock/deploy/dist"
  pipelines_path = "${path.module}/pipelines"
  calendars_path = "${path.module}/calendars"

  tags = {
    Project     = "interlock"
    Environment = "production"
  }
}

4. Deploy

terraform init
terraform plan
terraform apply

Module Variables

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| environment | string | | Environment name (e.g., staging, production). Prefixes all resource names |
| dist_path | string | | Path to the directory containing Lambda zip files |
| pipelines_path | string | | Path to directory containing pipeline YAML configuration files |
| calendars_path | string | "" | Path to directory containing calendar YAML files (optional) |
| tags | map(string) | {} | Tags to apply to all resources |
| lambda_memory_size | number | 128 | Memory size for Lambda functions (MB) |
| log_retention_days | number | 30 | CloudWatch log retention in days |
| watchdog_schedule | string | rate(5 minutes) | EventBridge schedule expression for the watchdog Lambda |
| sfn_timeout_seconds | number | 14400 | Step Functions global execution timeout in seconds (default 4h). Set this to accommodate your longest pipeline’s total window (evaluation + job poll + post-run) |
| trigger_max_attempts | number | 3 | Max infrastructure retry attempts for the Trigger state (exponential backoff: 30s, 60s, 120s, 240s) |
| enable_glue_trigger | bool | false | Grant orchestrator Lambda permission to start Glue jobs |
| enable_emr_trigger | bool | false | Grant orchestrator Lambda permission to submit EMR steps |
| enable_emr_serverless_trigger | bool | false | Grant orchestrator Lambda permission to start EMR Serverless jobs |
| enable_sfn_trigger | bool | false | Grant orchestrator Lambda permission to start Step Functions executions |
| enable_lambda_trigger | bool | false | Grant orchestrator Lambda permission to invoke Lambda functions |
| lambda_trigger_arns | list(string) | ["*"] | Lambda function ARNs the orchestrator may invoke (scoped IAM). Only used when enable_lambda_trigger is true; see the example after this table |
| slack_bot_token | string (sensitive) | "" | Slack Bot API token with chat:write scope for alert-dispatcher |
| slack_channel_id | string | "" | Slack channel ID for alert notifications |
| slack_secret_arn | string | "" | AWS Secrets Manager ARN containing the Slack bot token. When set, alert-dispatcher reads the token from Secrets Manager instead of the slack_bot_token environment variable |
| events_table_ttl_days | number | 90 | TTL in days for events table records |
| lambda_concurrency | object | See below | Reserved concurrent executions per Lambda function |
| sns_alarm_topic_arn | string | "" | SNS topic ARN for CloudWatch alarm notifications. When set, all alarms send to this topic |
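
For example, to let the orchestrator invoke only a specific downstream function rather than any Lambda, enable the trigger and scope the ARNs (the ARN below is a placeholder):

module "interlock" {
  source = "path/to/interlock/deploy/terraform"

  enable_lambda_trigger = true
  lambda_trigger_arns = [
    "arn:aws:lambda:us-east-1:123456789012:function:gold-revenue-loader",
  ]
  # ...
}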

Lambda Concurrency Defaults

The lambda_concurrency variable is an object with per-function keys:

lambda_concurrency = {
  stream_router    = 10
  orchestrator     = 10
  sla_monitor      = 5
  watchdog         = 2
  event_sink       = 5
  alert_dispatcher = 3
}

Set any key to -1 to use unreserved concurrency (Lambda default).
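
For example, to run the watchdog with unreserved concurrency while keeping the documented defaults for the other functions, pass the full object with that key set to -1 (a sketch):

module "interlock" {
  # ...
  lambda_concurrency = {
    stream_router    = 10
    orchestrator     = 10
    sla_monitor      = 5
    watchdog         = -1  # unreserved concurrency (Lambda default)
    event_sink       = 5
    alert_dispatcher = 3
  }
}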

Secrets Manager

By default, alert-dispatcher reads the Slack bot token from the SLACK_BOT_TOKEN environment variable. For production deployments, store the token in AWS Secrets Manager and pass the ARN:

module "interlock" {
  source = "path/to/interlock/deploy/terraform"

  slack_secret_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:interlock/slack-token-AbCdEf"
  slack_channel_id = "C0123456789"
  # ...
}

When slack_secret_arn is set, the module automatically grants secretsmanager:GetSecretValue to the alert-dispatcher Lambda role. The token must have the xoxb- prefix (Bot token).
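
If the secret lives in the same Terraform configuration, a minimal sketch (the secret name and variable are placeholders) is to create it alongside the module and reference its ARN:

resource "aws_secretsmanager_secret" "slack_token" {
  name = "interlock/slack-token" # placeholder name
}

resource "aws_secretsmanager_secret_version" "slack_token" {
  secret_id     = aws_secretsmanager_secret.slack_token.id
  secret_string = var.slack_bot_token # xoxb-... value, supplied via a variable or CI secret
}

Then pass slack_secret_arn = aws_secretsmanager_secret.slack_token.arn to the module. Note that aws_secretsmanager_secret_version stores the token value in Terraform state, so restrict access to the state backend accordingly.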

CloudWatch Alarms

The module creates CloudWatch alarms across four categories. When sns_alarm_topic_arn is set, every alarm sends its notifications to that SNS topic.

| Category | Alarms | Threshold |
| --- | --- | --- |
| Lambda errors | One per function (6 total) | Errors >= 1 per 5-minute period |
| Step Functions failures | One for the pipeline state machine | ExecutionsFailed >= 1 per 5-minute period |
| DLQ depth | One per dead-letter queue (3 total) | ApproximateNumberOfMessagesVisible >= 1 |
| Stream iterator age | One per DynamoDB Stream mapping (2 total) | IteratorAge >= 300,000 ms (5 minutes) |

CloudWatch alarm state changes are reshaped via EventBridge input transformers into INFRA_ALARM events and routed to both event-sink and alert-dispatcher. This means infrastructure alerts appear in the events table and Slack alongside pipeline lifecycle events — no additional Go code required.

module "interlock" {
  source = "path/to/interlock/deploy/terraform"

  sns_alarm_topic_arn = aws_sns_topic.ops_alerts.arn
  # ...
}

What Gets Created

DynamoDB Tables

The module creates 4 tables, all using on-demand billing:

| Table | Name Pattern | Purpose |
| --- | --- | --- |
| Control | {env}-interlock-control | Pipeline configs, sensor data, locks, evaluation state |
| Job Log | {env}-interlock-joblog | Job execution history and run status |
| Rerun | {env}-interlock-rerun | Rerun requests and tracking |
| Events | {env}-interlock-events | Centralized event log for dashboards and audit (GSI1: eventType → timestamp) |

All tables have TTL enabled on the ttl attribute. The control and joblog tables have DynamoDB Streams enabled (NEW_IMAGE) to drive the stream-router Lambda.

Lambda Functions

| Function | Name Pattern | Purpose |
| --- | --- | --- |
| stream-router | {env}-interlock-stream-router | Processes DynamoDB stream events, starts Step Functions executions |
| orchestrator | {env}-interlock-orchestrator | Evaluates validation rules, triggers jobs, polls job status |
| sla-monitor | {env}-interlock-sla-monitor | Checks SLA deadlines, emits breach events |
| watchdog | {env}-interlock-watchdog | Scans for missed schedules and stale triggers |
| event-sink | {env}-interlock-event-sink | Captures all EventBridge events to the events table |
| alert-dispatcher | {env}-interlock-alert-dispatcher | Formats and delivers Slack notifications from the alert queue |

All Lambdas run on provided.al2023 (custom runtime) with arm64 architecture.

Pipeline Config Loader

The module reads all *.yaml and *.yml files from pipelines_path and writes them as CONFIG rows to the control table. Each pipeline’s YAML content is stored as a string attribute, keyed by PIPELINE#{id} / CONFIG.

This means pipeline configuration is managed entirely through Terraform – update a YAML file, run terraform apply, and the new config is live.

Step Functions

A single state machine ({env}-interlock-pipeline) orchestrates the full pipeline lifecycle:

  1. Evaluate – orchestrator Lambda evaluates validation rules
  2. Wait – if not ready, wait for the configured interval
  3. Re-evaluate – loop until ready or window expires
  4. Trigger – orchestrator starts the configured job
  5. Poll – orchestrator checks job status at intervals
  6. Complete – emit success/failure event to EventBridge

SLA monitoring uses EventBridge Scheduler — one-time schedule entries fire warning and breach alerts at exact timestamps, independent of the main evaluation loop.

The ASL definition is at deploy/statemachine.asl.json and uses templatefile() to substitute Lambda ARNs.

EventBridge

| Resource | Name Pattern | Purpose |
| --- | --- | --- |
| Custom event bus | {env}-interlock-events | All Interlock events (SLA breaches, completions, failures, missed schedules) |
| Watchdog schedule | {env}-interlock-watchdog | Invokes the watchdog Lambda on a recurring schedule |

Four of the Lambda functions (stream-router, orchestrator, sla-monitor, and watchdog) publish events to the custom bus. Create EventBridge rules on this bus to route events to your alerting systems (SNS, Slack, PagerDuty, etc.).
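
As a sketch, the following catch-all rule forwards every event on the bus to the ops_alerts SNS topic from the alarm example above (the rule name is a placeholder, and the topic's resource policy must allow events.amazonaws.com to publish):

data "aws_caller_identity" "current" {}

resource "aws_cloudwatch_event_rule" "interlock_all_events" {
  name           = "interlock-all-events"
  event_bus_name = module.interlock.event_bus_name

  # Catch-all: match every event published to the bus for this account.
  event_pattern = jsonencode({
    account = [data.aws_caller_identity.current.account_id]
  })
}

resource "aws_cloudwatch_event_target" "interlock_all_events_sns" {
  rule           = aws_cloudwatch_event_rule.interlock_all_events.name
  event_bus_name = module.interlock.event_bus_name
  arn            = aws_sns_topic.ops_alerts.arn
}

In practice you would narrow the event_pattern to specific detail-type values; check the events table (or event-sink logs) for the exact names Interlock emits.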

IAM

Each Lambda gets its own execution role with least-privilege policies:

| Role | Permissions |
| --- | --- |
| stream-router | DynamoDB read/write (all 3 tables), DynamoDB Streams (control + joblog), Step Functions StartExecution, EventBridge PutEvents, SQS SendMessage (DLQs) |
| orchestrator | DynamoDB read/write (all 3 tables), EventBridge PutEvents, conditional trigger permissions |
| sla-monitor | EventBridge PutEvents |
| watchdog | DynamoDB read (all 3 tables), DynamoDB write (control only), EventBridge PutEvents |

The Step Functions execution role has permission to invoke the orchestrator and sla-monitor Lambdas.

Dead-Letter Queues

Two SQS DLQs capture failed DynamoDB stream processing:

| Queue | Source |
| --- | --- |
| {env}-interlock-sr-control-dlq | Control table stream failures |
| {env}-interlock-sr-joblog-dlq | Job log table stream failures |

Stream event source mappings use bisect_batch_on_function_error with 3 retry attempts before routing to the DLQ.

Module Outputs

| Output | Description |
| --- | --- |
| control_table_name | Name of the control DynamoDB table |
| control_table_arn | ARN of the control DynamoDB table |
| joblog_table_name | Name of the job log DynamoDB table |
| rerun_table_name | Name of the rerun DynamoDB table |
| event_bus_name | Name of the EventBridge event bus |
| event_bus_arn | ARN of the EventBridge event bus |
| sfn_arn | ARN of the pipeline state machine |
| sfn_name | Name of the pipeline state machine |

Use these outputs to integrate Interlock with your existing infrastructure (e.g., create EventBridge rules, grant external services write access to the control table).
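
For example, a sketch of granting an external producer role permission to write sensor data to the control table (the role reference is a placeholder for your own service's role):

resource "aws_iam_role_policy" "producer_control_write" {
  name = "interlock-control-write"
  role = aws_iam_role.sensor_producer.id # placeholder: the external service's execution role

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:PutItem", "dynamodb:UpdateItem"]
      Resource = module.interlock.control_table_arn
    }]
  })
}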

Tear Down

terraform destroy

This removes all resources created by the module. DynamoDB tables with data will be deleted – ensure you have backups if needed.