Dask-based executor.

See also the Dask deployment guide.

The ‘cluster’ config can be one of the following:

- existing: Connect to an existing scheduler (see the sketch after this list).
- local: Local cluster configuration.
- yarn: YARN cluster configuration.
- ssh: SSH cluster configuration.
- pbs: PBS cluster configuration.
- moab: Moab cluster configuration.
- sge: SGE cluster configuration.
- lsf: LSF cluster configuration.
- slurm: SLURM cluster configuration.
- oar: OAR cluster configuration.
- kube: Kubernetes cluster configuration.
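For example, the ‘existing’ option points the executor at a scheduler that is already running. The sketch below is an assumption about the field layout: the address key is not shown in this section, so verify it against the executor's config schema.

# Hypothetical sketch: select the "existing" cluster type to reuse a running
# Dask scheduler. The "address" key name is an assumption, and the scheduler
# URL is a placeholder.
existing_cluster_config = {
    "cluster": {
        "existing": {"address": "tcp://my-dask-scheduler:8786"},
    },
}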
If the Dask executor is used without providing executor-specific config, a local Dask cluster will be created (as when calling dask.distributed.Client() with dask.distributed.LocalCluster()).
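For reference, that default is roughly equivalent to the following Dask calls (a sketch only; the executor creates and manages the client itself):

from dask.distributed import Client, LocalCluster

# Roughly what the executor does when no cluster config is given:
# start a local cluster and connect a client to it.
cluster = LocalCluster()
client = Client(cluster)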
The Dask executor optionally takes the following config:
cluster:
    {
        local?: # takes distributed.LocalCluster parameters
            {
                timeout?: 5, # Timeout duration for initial connection to the scheduler
                n_workers?: 4, # Number of workers to start
                threads_per_worker?: 1 # Number of threads per each worker
            }
    }
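Expressed as a Python dictionary (the values below are just the defaults shown in the schema above), this config looks like:

# Executor config as a Python dict; keys mirror the schema above.
dask_config = {
    "cluster": {
        "local": {
            "timeout": 5,  # timeout for the initial connection to the scheduler
            "n_workers": 4,  # number of workers to start
            "threads_per_worker": 1,  # threads per worker
        }
    }
}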
To use the dask_executor, set it as the executor_def when defining a job:
from dagster import job
from dagster_dask import dask_executor


@job(executor_def=dask_executor)
def dask_enabled_job():
    pass
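To supply the cluster config from the previous section, one option is to pre-configure the executor before attaching it to the job. This is a sketch using Dagster's generic configured API; the values are illustrative:

from dagster import job
from dagster_dask import dask_executor

# Sketch: bind cluster config to the executor up front (values are illustrative).
local_dask_executor = dask_executor.configured(
    {"cluster": {"local": {"n_workers": 2, "threads_per_worker": 1}}}
)


@job(executor_def=local_dask_executor)
def configured_dask_job():
    pass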