The dagster_pandas library provides utilities for using pandas with Dagster and for implementing validation on pandas DataFrames. A good place to start with dagster_pandas is the validation guide.
Constructs a custom pandas dataframe dagster type.
name (str) – Name of the dagster pandas type.
description (Optional[str]) – A markdown-formatted string, displayed in tooling.
columns (Optional[List[PandasColumn]]) – A list of PandasColumn objects which express dataframe column schemas and constraints.
event_metadata_fn (Optional[Callable[[], Union[Dict[str, Union[str, float, int, Dict, MetadataValue]], List[MetadataEntry]]]]) – A callable which takes your dataframe and returns a dict with string label keys and MetadataValue values. Can optionally return a List[MetadataEntry].
dataframe_constraints (Optional[List[DataFrameConstraint]]) – A list of objects that inherit from DataFrameConstraint. This allows you to express dataframe-level constraints.
loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader. If None, we will default to using dataframe_loader.
materializer (Optional[DagsterTypeMaterializer]) – An instance of a class that inherits from DagsterTypeMaterializer. If None, we will default to using dataframe_materializer.
A dataframe constraint that validates the expected count of rows.
num_allowed_rows (int) – The number of allowed rows in your dataframe.
error_tolerance (Optional[int]) – The maximum number of rows by which the actual row count may deviate from num_allowed_rows and still pass. Defaults to 0.
A dataframe constraint that validates column existence and ordering.
strict_column_list (List[str]) – The exact list of columns that your dataframe must have.
enforce_ordering (Optional[bool]) – If True, the order of the column names must also match. Defaults to False.
The main API for expressing column level schemas and constraints for your custom dataframe types.
name (str) – Name of the column. This must match up with the column name in the dataframe you expect to receive.
is_required (Optional[bool]) – Flag indicating whether the column must be present. If the column exists, the validate function will validate the column. Defaults to True.
constraints (Optional[List[Constraint]]) – List of constraint objects that indicate the validation rules for the pandas column.
Define a type in dagster. These can be used in the inputs and outputs of ops.
type_check_fn (Callable[[TypeCheckContext, Any], [Union[bool, TypeCheck]]]) – The function that defines the type check. It takes the value flowing through the input or output of the op. If it passes, return either True or a TypeCheck with success set to True. If it fails, return either False or a TypeCheck with success set to False. The first argument must be named context (or, if unused, _, _context, or context_). Use required_resource_keys for access to resources.
key (Optional[str]) – The unique key to identify types programmatically. The key property always has a value. If you omit the key argument to the init function, it instead receives the value of name. If neither key nor name is provided, a CheckError is thrown.
In the case of a generic type such as List or Optional, this is generated programmatically based on the type parameters.
For most use cases, name should be set and the key argument should not be specified.
name (Optional[str]) – A unique name given by a user. If key is None, key becomes this value. Name is not given in a case where the user does not specify a unique name for this type, such as a generic class.
description (Optional[str]) – A markdown-formatted string, displayed in tooling.
loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader and can map config data to a value of this type. Specify this argument if you will need to shim values of this type using the config machinery. As a rule, you should use the @dagster_type_loader decorator to construct these arguments.
materializer (Optional[DagsterTypeMaterializer]) – An instance of a class that inherits from DagsterTypeMaterializer and can persist values of this type. As a rule, you should use the @dagster_type_materializer decorator to construct these arguments.
required_resource_keys (Optional[Set[str]]) – Resource keys required by the type_check_fn.
is_builtin (bool) – Defaults to False. This is used by tools to display or filter built-in types (such as String, Int) to visually distinguish them from user-defined types. Meant for internal use.
kind (DagsterTypeKind) – Defaults to None. This is used to determine the kind of runtime type for InputDefinition and OutputDefinition type checking.
typing_type – Defaults to None. A valid python typing type (e.g. Optional[List[int]]) for the value contained within the DagsterType. Meant for internal use.