Spark Pipeline Framework|DocsComponentsHooks

Packages

  • package root
    Definition Classes
    root
  • package io
    Definition Classes
    root
  • package github
    Definition Classes
    io
  • package dwsmith1983
    Definition Classes
    github
  • package spark
    Definition Classes
    dwsmith1983
  • package pipeline
    Definition Classes
    spark
  • package runner

    Runner package for spark-pipeline-framework.

    Runner package for spark-pipeline-framework.

    This package provides the entry point for executing pipelines:

    - runner.PipelineRunner - Trait defining the pipeline execution interface - runner.SimplePipelineRunner - Main class for spark-submit (sequential execution)

    Lifecycle Hooks

    Use PipelineHooks from the config package to monitor or customize pipeline execution:

    import io.github.dwsmith1983.spark.pipeline.config._
    
    val hooks = new PipelineHooks {
      override def afterComponent(config: ComponentConfig, index: Int, total: Int, durationMs: Long): Unit =
        println(s"Component $${config.instanceName} completed in $${durationMs}ms")
    }
    
    SimplePipelineRunner.run(config, hooks)

    Deployment

    The runner JAR should be used as the main artifact for spark-submit, with user pipeline JARs added via --jars:

    spark-submit \
      --class io.github.dwsmith1983.spark.pipeline.runner.SimplePipelineRunner \
      --jars /path/to/user-pipeline.jar \
      /path/to/spark-pipeline-runner-spark3_2.12.jar \
      -Dconfig.file=/path/to/pipeline.conf

    Configuration Loading

    Configuration is loaded via Typesafe Config in this order: 1. System properties (-Dconfig.file, -Dconfig.resource, -Dconfig.url) 2. application.conf in classpath 3. reference.conf in classpath

    Definition Classes
    pipeline
  • PipelineRunner
  • SchemaContractViolationException
  • SimplePipelineRunner

trait PipelineRunner extends AnyRef

Trait defining the interface for pipeline execution.

Implementations of this trait are responsible for: - Loading and parsing pipeline configuration - Configuring the SparkSession - Instantiating and executing pipeline components - Invoking lifecycle hooks at appropriate points

Standard Implementation

The framework provides SimplePipelineRunner as the standard implementation, which executes components sequentially in the order they appear in the configuration.

Custom Implementations

You can create custom runners for specialized behavior: - Parallel execution of independent components - Dry-run mode that validates without executing - Conditional execution based on external state

Example

// Using the standard runner with custom hooks
val hooks = new PipelineHooks {
  override def afterComponent(config: ComponentConfig, index: Int, total: Int, durationMs: Long): Unit =
    println(s"Component $${config.instanceName} completed in $${durationMs}ms")
}

SimplePipelineRunner.run(config, hooks)
Linear Supertypes
AnyRef, Any
Known Subclasses
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. PipelineRunner
  2. AnyRef
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Abstract Value Members

  1. abstract def dryRun(config: Config, hooks: PipelineHooks): DryRunResult

    Validates the pipeline configuration without executing components.

    Validates the pipeline configuration without executing components.

    This method parses the configuration and instantiates all components to verify they can be created, but does not call their run() methods. Lifecycle hooks are invoked for beforePipeline and afterPipeline only.

    config

    The loaded HOCON configuration containing pipeline and optionally spark blocks

    hooks

    Lifecycle hooks to invoke (only beforePipeline/afterPipeline are called)

    returns

    Validation result indicating success or listing errors

  2. abstract def run(config: Config, hooks: PipelineHooks): Unit

    Runs the pipeline defined in the given configuration with lifecycle hooks.

    Runs the pipeline defined in the given configuration with lifecycle hooks.

    May throw ComponentInstantiationException if a component cannot be instantiated, or ConfigurationException if the configuration is invalid.

    config

    The loaded HOCON configuration containing pipeline and optionally spark blocks

    hooks

    Lifecycle hooks to invoke during execution

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  6. def dryRun(config: Config): DryRunResult

    Validates the pipeline configuration without executing components.

    Validates the pipeline configuration without executing components.

    This method parses the configuration and instantiates all components to verify they can be created, but does not call their run() methods. Useful for CI/CD pipelines to validate configurations before deployment.

    This is a convenience method that uses PipelineHooks.NoOp.

    config

    The loaded HOCON configuration containing pipeline and optionally spark blocks

    returns

    Validation result indicating success or listing errors

  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  9. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  15. def run(config: Config): Unit

    Runs the pipeline defined in the given configuration.

    Runs the pipeline defined in the given configuration.

    This is a convenience method that uses PipelineHooks.NoOp.

    May throw ComponentInstantiationException if a component cannot be instantiated, or ConfigurationException if the configuration is invalid.

    config

    The loaded HOCON configuration containing pipeline and optionally spark blocks

  16. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  20. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

Inherited from AnyRef

Inherited from Any

Ungrouped