Serverless deployment in Dagster Cloud#

This guide is applicable to Dagster Cloud.

Dagster Cloud Serverless is a fully managed version of Dagster Cloud, and is the easiest way to get started with Dagster. With Serverless, you can run your Dagster jobs without spinning up any infrastructure.

When to choose Serverless#

Serverless works best with workloads that primarily orchestrate other services or perform light computation. Most workloads fit into this category, especially those that orchestrate third-party SaaS products like cloud data warehouses and ETL tools.

If any of the following apply, choose a Hybrid deployment instead:

You require substantial computational resources. For example, training a large machine learning (ML) model in-process.
Your dataset is too large to fit in memory. For example, training an ML model in-process on a terabyte of data.
You need to distribute computation across many nodes for a single run. Serverless runs currently execute on a single node with 4 vCPUs.
You don't want to add Elementl as a data processor.

Limitations#

Serverless is currently in early access and is subject to the following limitations:

Maximum of 100 GB of bandwidth per day
Maximum of 4500 step-minutes per day
Runs receive 4 vCPU cores, 16 GB of RAM and 128 GB of ephemeral disk
Sensors receive 0.25 vCPU cores and 512 MB of RAM
All Serverless jobs run in the United States

Enterprise customers may request a quota increase by contacting Sales.
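
To get a feel for the step-minute quota, the following is a rough back-of-the-envelope sketch. It assumes a step-minute means one step running for one minute, summed across all steps in a run; the job size and average step duration are hypothetical values chosen only for illustration.

```shell
#!/bin/sh
# Rough estimate of how many runs fit in the daily step-minute quota.
# Assumption: step-minutes = (number of steps) x (minutes each step runs).

DAILY_STEP_MINUTES=4500   # quota listed in the limitations above
STEPS_PER_RUN=30          # hypothetical job size
AVG_MINUTES_PER_STEP=5    # hypothetical average step duration

# Step-minutes consumed by a single run of the hypothetical job
STEP_MINUTES_PER_RUN=$((STEPS_PER_RUN * AVG_MINUTES_PER_STEP))

# Whole runs that fit in one day's quota (integer division)
MAX_RUNS_PER_DAY=$((DAILY_STEP_MINUTES / STEP_MINUTES_PER_RUN))

echo "step-minutes per run: $STEP_MINUTES_PER_RUN"
echo "max runs per day: $MAX_RUNS_PER_DAY"
```

Under these assumptions, a 30-step job averaging 5 minutes per step uses 150 step-minutes per run, so about 30 such runs fit in a day.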

Getting started with Serverless#

With GitHub
Without GitHub (GitLab, Bitbucket, or local development)
Adding secrets

With GitHub#

If you are a GitHub user, our GitHub integration is the fastest way to get started. With a single click, it uses a GitHub app and GitHub Actions to set up a repo containing skeleton code and configuration that follow Dagster Cloud's best practices.

When you create a new Dagster Cloud organization, you'll be prompted to choose Serverless or Hybrid deployment. Once activated, our GitHub integration will scaffold a new git repo for you with Serverless and Branch Deployments already configured. Pushing to the main branch will deploy to your prod Serverless deployment. Pull requests will spin up ephemeral branch deployments using the Serverless agent.

Without GitHub (GitLab, Bitbucket, or local development)#

If you don't want to use our GitHub integration, we offer a powerful CLI that you can use in another CI environment or on your local machine.

First, create a new project with the Dagster open-source CLI.


pip install dagster
dagster project from-example \
  --name my-dagster-project \
  --example assets_modern_data_stack

Once scaffolded, add dagster-cloud as a dependency in your setup.py file.

Next, install the dagster-cloud CLI and log in to your org. Note: The CLI requires a recent version of Python 3 and Docker.


pip install dagster-cloud
dagster-cloud configure

You can also configure the dagster-cloud tool noninteractively; see the CLI docs for more information.

Finally, deploy your project with Dagster Cloud Serverless:


dagster-cloud serverless deploy \
  --location-name example \
  --package-name assets_modern_data_stack

Adding secrets#

Your jobs will often need secure access to secrets. Dagster Cloud supports several methods for adding secrets; refer to the Dagster Cloud environment variables and secrets documentation for more information.

Adding dependencies#

Any dependencies specified in either requirements.txt or setup.py will be installed for you automatically by the Dagster Cloud Serverless infrastructure.

Customizing the container#

Many apps will work fine with the default Dagster Cloud Serverless setup. However, some apps may need changes to the base image: a different OS, a different Python version, or additional native dependencies. You can customize your image through our Serverless lifecycle hooks, by changing the base image, or both.

Lifecycle hooks#

This method is the easiest to set up, and does not require setting up any additional infrastructure.

In the root of your repo, you can provide two optional shell scripts: dagster_cloud_pre_install.sh and dagster_cloud_post_install.sh. They run before and after Python dependencies are installed, respectively, and are useful for installing non-Python dependencies or otherwise configuring your environment.
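
As an illustration, the commands below create a minimal pre-install hook in the root of the repo. This is only a sketch under several assumptions: that the hook runs as root on a Debian-based image where apt-get is available, and that libpq-dev stands in for whatever native package your project needs (it is a placeholder, not a requirement). Marking the script executable is a precaution rather than a documented requirement.

```shell
# Write a hypothetical pre-install hook to the repo root.
# The quoted 'EOF' keeps the script body literal (no variable expansion here).
cat > dagster_cloud_pre_install.sh <<'EOF'
#!/bin/sh
# Runs before Python dependencies are installed.
# Assumes a Debian-based image and root privileges.
set -eu

apt-get update
# libpq-dev is a placeholder; list the native packages your code needs.
apt-get install -y --no-install-recommends libpq-dev
EOF

# Precaution: make the hook executable.
chmod +x dagster_cloud_pre_install.sh
```

A dagster_cloud_post_install.sh hook can be created the same way for steps that need to run after Python dependencies are installed.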

Changing the base image#

This method is the most flexible, but requires setting up a pipeline outside of Dagster to build a custom base image.

The default base image is debian:bullseye-slim, but it can be changed.

With GitHub: you can provide a base_image input parameter to the "build and deploy" step in your GitHub Actions configuration (usually at .github/workflows/deploy.yml):


- name: Build and deploy to Dagster Cloud serverless
  uses: dagster-io/dagster-cloud-action/actions/serverless_prod_deploy@v0.1
  with:
    dagster_cloud_api_token: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
    location: ${{ toJson(matrix.location) }}
    # Use a custom base image
    base_image: "my_base_image:latest"
    organization_id: ${{ secrets.ORGANIZATION_ID }}
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

With the CLI: add the --base-image CLI argument to the deploy command to specify the registry path to the desired base image.


dagster-cloud serverless deploy \
  --location-name=my_location \
  --base-image=my_base_image:latest

Transitioning to Hybrid#

If your organization begins to hit the limitations of Serverless, you should transition to a Hybrid deployment. Hybrid deployments allow you to run an agent in your own infrastructure and give you substantially more flexibility and control over the Dagster environment.

To switch to Hybrid, navigate to Status > Agents in your Dagster Cloud account. On this page, you can disable the Serverless agent and view instructions for enabling Hybrid.

Security and data protection#

Unlike Hybrid deployments, Serverless deployments on Dagster Cloud require direct access to your data, secrets and source code.

Dagster Cloud Serverless does not provide persistent storage. Ephemeral storage is deleted when a run concludes.
Secrets and source code are built into the image directly. Images are stored in a per-customer container registry with restricted access.
User code is securely sandboxed using modern container sandboxing techniques.
All production access is governed by industry-standard best practices which are regularly audited.