Data applications are notoriously difficult to test and are therefore often un- or under-tested.
Creating testable and verifiable ops and jobs is one of the focuses of Dagster. We believe ensuring data quality is critical for managing the complexity of data systems. Here, we'll show how to write unit tests for Dagster jobs and ops.
Let's go back to the diamond job we wrote in the prior section, and ensure that it's working as expected by writing some unit tests.
We'll start by writing a test for the get_total_size op, which takes a dictionary of file sizes as input and returns their sum. To run an op, we can invoke it directly, as if it were a regular Python function:
def test_get_total_size():
    file_sizes = {"file1": 400, "file2": 50}
    result = get_total_size(file_sizes)
    assert result == 450
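For context, here is a minimal sketch of the logic get_total_size might implement, with Dagster's @op decorator omitted so the snippet stands alone (the real op from the prior section may differ):

```python
def get_total_size(file_sizes):
    # Stand-in for the op under test: sum the sizes of all files
    # in the mapping. In the actual job, this function would carry
    # Dagster's @op decorator.
    return sum(file_sizes.values())
```

Because a directly invoked op behaves like a plain Python callable, the test above needs no Dagster machinery at all.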
We'll also write a test for the entire job. The JobDefinition.execute_in_process method synchronously executes a job and returns an ExecuteInProcessResult, whose methods let us investigate, in detail, the success or failure of execution, the outputs produced by ops, and (as we'll see later) other events associated with execution.
def test_diamond():
    res = diamond.execute_in_process()
    assert res.success
    assert res.output_for_node("get_total_size") > 0
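Direct invocation also makes edge cases cheap to cover. A minimal sketch, again assuming get_total_size simply sums its input dictionary's values (the assumed logic is restated here so the snippet stands alone):

```python
def get_total_size(file_sizes):
    # Assumed op logic: sum the sizes of all files in the mapping.
    return sum(file_sizes.values())

def test_get_total_size_empty():
    # An empty file listing should produce a total of zero.
    assert get_total_size({}) == 0
```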
Now we can use pytest, or another test runner of choice, to run these unit tests.
pytest test_complex_job.py
Note that execute_in_process is intended to enable local tests like this; in production, jobs often execute in a parallel, streaming fashion that doesn't admit this kind of synchronous, in-process API.
Dagster is written to make testing easy in a domain where it has historically been very difficult. You can learn more about Testing in Dagster by reading the Testing page.