There are several ways to execute jobs. This page explains how to do one-off execution of a job using Dagit, the Dagster CLI, or Python APIs.
You can also launch jobs in other ways:
- Schedules can be used to launch runs on a fixed interval.
- Sensors allow you to launch runs based on external state changes.
```python
from dagster import job, op

@op
def return_one():
    return 1

@op
def add_two(i: int):
    return i + 2

@op
def multi_three(i: int):
    return i * 3

@job
def my_job():
    multi_three(add_two(return_one()))
```
Click on the "Launchpad" tab, then press the "Launch Run" button to execute the job. You will then see Dagit launch a job run:
By default, Dagit runs the job using the multiprocess executor: each step in the job runs in its own process, and steps that don't depend on each other can run in parallel.
The Dagit Launchpad also offers a configuration editor that lets you interactively build up run configuration. See Dagit for more details.
A job's executor_def property can be set to allow for different types of isolation and parallelism, ranging from executing all the ops in the same process to executing each op in its own Kubernetes pod. See Executors for more details.
The default job executor uses multiprocess execution. It also allows you to toggle between in-process and multiprocess execution via config.
Here is an example of run config, as YAML you could provide in the Dagit Launchpad, that selects in-process execution:

```yaml
execution:
  config:
    in_process:
```
Additional config options are available for multiprocess execution that can help with performance, including limiting the maximum number of concurrent subprocesses and controlling how those subprocesses are spawned.
The example below sets the run config directly on the job to explicitly set the max concurrent subprocesses to 4, and change the subprocess start method to use a forkserver.
Using forkserver is a great way to reduce per-process overhead during multiprocess execution, but it can cause issues with certain libraries. See the Python multiprocessing documentation on start methods for details.