Datapipe CLI

Datapipe provides the datapipe CLI tool, which is useful for inspecting the pipeline and its tables and for running steps.

The datapipe CLI is built with click and provides several levels of commands and subcommands, each of which can have its own parameters. click parameters are level-specific, i.e. global-level arguments must be specified at the global level only:

datapipe --debug run, but NOT datapipe run --debug
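The same level-specific parsing can be reproduced with the standard library's argparse, used here instead of click only so the sketch runs without extra dependencies. The prog name and the single run subcommand are illustrative and do not mirror datapipe's actual command definitions.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # Global level: --debug is declared on the top-level parser only.
    parser = argparse.ArgumentParser(prog="datapipe")
    parser.add_argument("--debug", action="store_true")
    subcommands = parser.add_subparsers(dest="command")
    # Subcommand level: "run" declares no options of its own.
    subcommands.add_parser("run")
    return parser


def try_parse(argv: list[str]) -> argparse.Namespace | None:
    """Return the parsed arguments, or None if the parser rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except (SystemExit, argparse.ArgumentError):
        return None


print(try_parse(["--debug", "run"]))  # accepted: a Namespace with debug=True
print(try_parse(["run", "--debug"]))  # rejected: prints None, argparse reports the error on stderr
```

click behaves the same way in this respect: an option is parsed only by the command or group it is declared on, so a global flag placed after a subcommand name is reported as an unknown option.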

Global arguments

--pipeline

By default, datapipe looks for a file app.py in the working directory and expects it to contain an object named app of type DatapipeApp. The --pipeline argument lets the user specify a different location for the DatapipeApp object.

Format: <module.import.path>:<symbol>

This format is similar to the one used by other tools, such as uvicorn.

Example: datapipe --pipeline my_project.pipeline:app imports the module my_project.pipeline, looks up the object app in it, and expects that object to be of type DatapipeApp.
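A minimal sketch of how a <module.import.path>:<symbol> string can be resolved with importlib. The load_symbol helper is hypothetical and is not datapipe's own loader; it imports the module, looks up the attribute, and optionally checks the object's type, mirroring the DatapipeApp check described above. A standard-library target stands in for a real project so the sketch runs anywhere.

```python
import importlib
from typing import Any, Optional


def load_symbol(spec: str, expected_type: Optional[type] = None) -> Any:
    """Resolve a '<module.import.path>:<symbol>' string to a Python object."""
    module_path, sep, symbol = spec.partition(":")
    if not sep or not module_path or not symbol:
        raise ValueError(f"expected '<module.import.path>:<symbol>', got {spec!r}")
    module = importlib.import_module(module_path)
    try:
        obj = getattr(module, symbol)
    except AttributeError as exc:
        raise ValueError(f"module {module_path!r} has no attribute {symbol!r}") from exc
    # datapipe expects a DatapipeApp here; the sketch checks against any given type.
    if expected_type is not None and not isinstance(obj, expected_type):
        raise TypeError(f"{spec} is {type(obj).__name__}, expected {expected_type.__name__}")
    return obj


# Stand-in target from the standard library, so no project is needed.
dumps = load_symbol("json:dumps")
print(dumps({"answer": 42}))  # {"answer": 42}
```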

--executor

Possible values:

  • SingleThreadExecutor
  • RayExecutor

TODO: add a separate section that describes Executor

--debug, --debug-sql

--debug turns on debug logging in most places and shows the internals of datapipe processing.

--debug-sql additionally turns on logging of all SQL queries, which can be quite verbose but gives insight into how datapipe interacts with the database.
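As a rough mental model, the two flags correspond to raising the levels of two groups of loggers. This sketch assumes that datapipe logs under the datapipe logger name and that its database access goes through SQLAlchemy, whose sqlalchemy.engine logger emits each SQL statement at INFO level; configure_logging is a hypothetical helper, not the CLI's implementation.

```python
import logging


def configure_logging(debug: bool, debug_sql: bool) -> None:
    """Approximate the effect of --debug / --debug-sql (logger names are assumptions)."""
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    # Assumption: datapipe's own messages go to loggers under "datapipe".
    logging.getLogger("datapipe").setLevel(logging.DEBUG if debug else logging.INFO)
    # SQLAlchemy logs every emitted SQL statement on "sqlalchemy.engine" at INFO.
    logging.getLogger("sqlalchemy.engine").setLevel(
        logging.INFO if debug_sql else logging.WARNING
    )


configure_logging(debug=True, debug_sql=True)
```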

--trace-*

  • --trace-stdout
  • --trace-jaeger
  • --trace-jaeger-host HOST
  • --trace-jaeger-port PORT
  • --trace-gcp

These flags turn on different OpenTelemetry exporters: stdout, Jaeger (with a configurable host and port), and GCP.

db

create-all

datapipe db create-all is a handy shortcut for local development. It makes datapipe create all known SQL tables in the configured database.

lint

Runs checks on the current state of the database. Can detect and fix common issues.

run

step

  • --name filters steps by name, using prefix matching. Example: datapipe step --name=my_step_name run.
  • --labels filters steps by their labels. Example: datapipe step --labels=my_label_name=my_label_value run.
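The exact matching rules live in datapipe's source; the sketch below only illustrates the semantics described above: a name filter that keeps steps whose name starts with the given prefix, and a label filter given as key=value. The Step class and the filter_steps and parse_label helpers are hypothetical, and treating several labels as an AND condition is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    # Hypothetical stand-in for a pipeline step: a name plus (key, value) labels.
    name: str
    labels: list[tuple[str, str]] = field(default_factory=list)


def parse_label(spec: str) -> tuple[str, str]:
    """Split 'my_label_name=my_label_value' into (key, value) at the first '='."""
    key, sep, value = spec.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {spec!r}")
    return key, value


def filter_steps(
    steps: list[Step], name: str | None = None, labels: list[str] | None = None
) -> list[Step]:
    """Keep steps whose name starts with `name` and that carry every requested label."""
    wanted = [parse_label(spec) for spec in labels or []]
    return [
        step
        for step in steps
        if (name is None or step.name.startswith(name))
        and all(label in step.labels for label in wanted)
    ]


steps = [
    Step("load_images", [("stage", "ingest")]),
    Step("load_texts", [("stage", "ingest")]),
    Step("train_model", [("stage", "train")]),
]
print([s.name for s in filter_steps(steps, name="load")])  # ['load_images', 'load_texts']
print([s.name for s in filter_steps(steps, labels=["stage=train"])])  # ['train_model']
```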

run

Runs steps. Can be used with the --name and --labels options to filter steps.

list

Shows the steps in the data pipeline. Can be used with the --name and --labels options to filter steps.

  • --status adds information about the indexes that remain to be processed.

reset-metadata

Marks data as unprocessed. Can be used with the --name and --labels options to filter steps.

table