Environment variables, which are key-value pairs configured outside your source code, allow you to dynamically modify application behavior depending on the environment.
Using environment variables, you can define various configuration options for your Dagster application and securely set up secrets. For example, instead of hard-coding database credentials - which is bad practice and cumbersome for development - you can use environment variables to supply user details. This allows you to parameterize your pipeline without modifying code or insecurely storing sensitive data.
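As a minimal sketch of the idea above, the following plain-Python helper reads database credentials from the environment instead of hard-coding them. The variable names DB_USER and DB_PASSWORD and the helper load_db_credentials are hypothetical, chosen only for illustration; they are not part of Dagster.

```python
import os
from typing import Mapping


def load_db_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Read database credentials from environment variables.

    DB_USER and DB_PASSWORD are hypothetical names used for illustration.
    """
    required = ("DB_USER", "DB_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail early with a clear message instead of connecting with blanks.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"user": env["DB_USER"], "password": env["DB_PASSWORD"]}


# A plain dict stands in for os.environ so the example runs anywhere.
demo_env = {"DB_USER": "analyst", "DB_PASSWORD": "not-a-real-secret"}
credentials = load_db_credentials(demo_env)
```

Passing the environment in as a mapping keeps the helper easy to test; in real use you would call it with no arguments so it reads os.environ.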
Setting environment variables in Dagster Open Source depends on where Dagster is deployed. Refer to the deployment guide for your platform for more info.
Using environment variables to provide secrets ensures sensitive info won't be visible in your code or the launchpad in the UI. In Dagster, the best practice for handling secrets is to use configuration and resources.
A resource is typically used to connect to an external service or system, such as a database. Resources can be configured separately from the rest of your app, allowing you to define them once and reuse them as needed.
Let's take a look at an example from the Dagster Crash Course, which creates a GitHub resource and supplies it to assets in a Dagster repository. Let's start by looking at the resource:
This code creates a GitHub resource named github_api.
Using ConfigSchema, we've indicated that the resource can accept a single config parameter, access_token.
Using StringSource, we've indicated that the access_token config parameter can either be:
An environment variable, or
Provided directly in the configuration
As storing secrets in configuration is bad practice, we'll opt for using an environment variable. In this code, we're configuring the resource and supplying it to assets in the repository:
Using with_resources, we add the github_api resource to the assets in the repository. In the assets, we'll use the github_api resource key to reference the resource.
Using the configured method on the github_api resource, we can pass configuration info to the resource. In this example, we're telling Dagster to load the access_token from the GITHUB_ACCESS_TOKEN environment variable.
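To make the two accepted forms of access_token concrete, here is a stdlib-only sketch that mimics the resolution rule described above: a plain string is used as-is, while a mapping of the form {"env": "VAR_NAME"} is looked up in the environment. The function resolve_string_source is a hypothetical stand-in written for this illustration, not Dagster's implementation.

```python
import os
from typing import Mapping, Union

ConfigValue = Union[str, Mapping[str, str]]


def resolve_string_source(value: ConfigValue, env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical stand-in for resolving a StringSource-style config field."""
    if isinstance(value, str):
        # Form 1: the value is provided directly in the configuration.
        return value
    if isinstance(value, Mapping) and set(value) == {"env"}:
        # Form 2: the configuration names an environment variable to read.
        var_name = value["env"]
        if var_name not in env:
            raise KeyError(f"Environment variable {var_name!r} is not set")
        return env[var_name]
    raise TypeError("Expected a string or a mapping of the form {'env': 'VAR_NAME'}")


# The configuration names the variable; it never contains the secret itself.
resource_config = {"access_token": {"env": "GITHUB_ACCESS_TOKEN"}}

# A plain dict stands in for os.environ so the example runs anywhere.
demo_env = {"GITHUB_ACCESS_TOKEN": "placeholder-token"}
token = resolve_string_source(resource_config["access_token"], demo_env)
```

Because the configuration only carries the variable's name, it can be committed to source control or shown in a UI without exposing the token.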
In this example, we'll demonstrate how to use different I/O manager configurations for local and production environments using configuration (specifically the configured API) and resources.
Using resource_defs, we've created a list of resource definitions, named after our local and production environments, that are available to the repository. In this example, we're using a Snowflake I/O manager.
For both local and production, we used the configured method on snowflake_io_manager to provide environment-specific run configuration. Note the differences in configuration between local and production, specifically where environment variables were used.
Following the list of resources, we define the deployment_name variable, which determines the current executing environment. This variable defaults to local, ensuring that DAGSTER_DEPLOYMENT=PRODUCTION must be set to use the production configuration.
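The selection logic described above can be sketched without Dagster as follows. This assumes the per-environment settings live in a dictionary keyed by environment name; RESOURCE_CONFIGS, its contents, and the helper names are hypothetical. The sketch also lowercases DAGSTER_DEPLOYMENT so that PRODUCTION matches the production key, which is an assumption made for this example.

```python
import os
from typing import Mapping

# Hypothetical per-environment settings; every value here is a placeholder.
RESOURCE_CONFIGS = {
    "local": {
        "database": "LOCAL",
        "user": {"env": "DEV_DB_USER"},
        "password": {"env": "DEV_DB_PASSWORD"},
    },
    "production": {
        "database": "PRODUCTION",
        "user": "system-account",
        "password": {"env": "PROD_DB_PASSWORD"},
    },
}


def current_deployment(env: Mapping[str, str] = os.environ) -> str:
    """Return the deployment name, defaulting to "local" when unset."""
    # Lowercasing is an assumption so that PRODUCTION matches "production".
    return env.get("DAGSTER_DEPLOYMENT", "local").lower()


def select_config(env: Mapping[str, str] = os.environ) -> dict:
    """Pick the settings for the current deployment."""
    name = current_deployment(env)
    if name not in RESOURCE_CONFIGS:
        raise ValueError(f"Unknown deployment: {name!r}")
    return RESOURCE_CONFIGS[name]


# With nothing set, the local settings are chosen.
local_config = select_config({})
production_config = select_config({"DAGSTER_DEPLOYMENT": "PRODUCTION"})
```

Defaulting to local means a developer who forgets to set the variable can never pick up production settings by accident.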
Let's look at a function that determines the current deployment using the DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT environment variable:
import os

def get_current_env():
    raw_value = os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT")
    assert raw_value is not None  # env var must be set
    is_branch_depl = raw_value == "1"
    return "branch" if is_branch_depl else "prod"
This function checks the value of DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT and, if it equals 1, returns the string branch, indicating that the current deployment is a Branch Deployment. Otherwise, the deployment is a full deployment and the function returns prod.
Using this info, we can write code that executes differently when in a Branch Deployment or a full deployment.
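As one possible illustration, the sketch below picks a target database based on the deployment type. It restates the environment check with an injectable mapping so it can run outside Dagster Cloud; DATABASE_BY_ENV, its values, and target_database are hypothetical names introduced for this example.

```python
import os
from typing import Mapping


def get_current_env(env: Mapping[str, str] = os.environ) -> str:
    """Return "branch" for a Branch Deployment, otherwise "prod"."""
    raw_value = env.get("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT")
    if raw_value is None:
        raise RuntimeError("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT must be set")
    return "branch" if raw_value == "1" else "prod"


# Hypothetical mapping from deployment type to database name.
DATABASE_BY_ENV = {"branch": "ANALYTICS_CLONE", "prod": "ANALYTICS"}


def target_database(env: Mapping[str, str] = os.environ) -> str:
    """Choose the database to write to for the current deployment."""
    return DATABASE_BY_ENV[get_current_env(env)]


# Plain dicts stand in for os.environ so the example runs anywhere.
branch_db = target_database({"DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT": "1"})
prod_db = target_database({"DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT": "0"})
```

Keeping the branch-versus-full decision in one function means the rest of the code only has to deal with the two labels, branch and prod.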