
openclimatefix/dagster-dags


Dagster Dags

Dagster definitions for OCF's archival datasets


Ubiquitous language

The following terms are used throughout the codebase and documentation. They are defined here to avoid ambiguity.

  • InitTime - The time at which a forecast is initialized. For example, a forecast initialized at 12:00 on 1st January.
  • TargetTime - The time at which a predicted value is valid. For example, a forecast with InitTime 12:00 on 1st January predicts that the temperature at TargetTime 12:00 on 2nd January at position x will be 10 degrees.
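The two terms can be illustrated with Python's standard datetime module, reusing the example above. The year (2024) and the UTC time zone are assumptions, since the example only gives day and month. The offset between TargetTime and InitTime is simply their difference.

```python
import datetime as dt

# InitTime from the example above: 12:00 on 1st January (year and UTC are assumed).
init_time = dt.datetime(2024, 1, 1, 12, 0, tzinfo=dt.timezone.utc)

# TargetTime: the time at which the predicted value (10 degrees at x) is valid.
target_time = dt.datetime(2024, 1, 2, 12, 0, tzinfo=dt.timezone.utc)

# How far ahead of its InitTime this prediction reaches.
offset = target_time - init_time
print(offset)  # 1 day, 0:00:00
```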

Repository structure

Produced by eza:

eza --tree --git-ignore -F -I "*init*|test*.*|build"
./
├── cloud_archives/ # Dagster definitions for cloud-stored archival datasets
│  └── nwp/ # Specifications for Numerical Weather Prediction data sources
│     └── icon/ 
├── constants.py # Values used across the project
├── dags_tests/ # Tests for the project
├── local_archives/ # Dagster definitions for locally-stored archival datasets
│  ├── nwp/ # Specifications for Numerical Weather Prediction data sources
│  │  ├── cams/
│  │  └── ecmwf/
│  └── sat/ # Specifications for Satellite image data sources
├── managers/ # IO Managers for use across the project
├── pyproject.toml # The build configuration for the service
└── README.md

Conventions

Data is stored automatically in locations determined by features of the data itself. The only configurable part of the storage is the Base Path: the root from which Dagster then handles the subpaths. The full storage path takes the following features into account:

  • The flavor of the data (NWP, Satellite, etc.)
  • The Provider of the data (CEDA, ECMWF, etc.)
  • The Region the data covers (UK, EU, etc.)
  • The InitTime the data refers to

Paths then take the form base/flavor/provider/region/inittime. See managers/ for an example implementation. For this to work, each asset must have an asset key prefix conforming to the structure [flavor, provider, region]. The Base Paths are defined in constants.py.
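The path convention can be sketched with the standard library alone. This is a minimal sketch, not the implementation in managers/: the function name storage_path, the BASE_PATH value and the InitTime folder format (%Y%m%dT%H%M) are all hypothetical. The function takes the full asset key path (the [flavor, provider, region] prefix followed by the asset name), so inside an IO manager the key could plausibly come from the output context's asset key.

```python
import datetime as dt
from pathlib import PurePosixPath

# Hypothetical stand-in for a Base Path defined in constants.py.
BASE_PATH = "/mnt/storage/archives"


def storage_path(base_path: str, key_path: list[str], init_time: dt.datetime) -> PurePosixPath:
    """Build base/flavor/provider/region/inittime for one asset (hypothetical helper).

    key_path is the full asset key path: the [flavor, provider, region]
    prefix followed by the asset's own name as the last element.
    """
    prefix = key_path[:-1]
    if len(prefix) != 3:
        raise ValueError(f"expected asset key prefix [flavor, provider, region], got {prefix}")
    flavor, provider, region = prefix
    # The InitTime folder format is an assumption; the real managers may differ.
    return PurePosixPath(base_path, flavor, provider, region, init_time.strftime("%Y%m%dT%H%M"))


path = storage_path(BASE_PATH, ["nwp", "ecmwf", "uk", "ecmwf_uk_forecast"], dt.datetime(2024, 1, 1, 12))
print(path)  # /mnt/storage/archives/nwp/ecmwf/uk/20240101T1200
```

Rejecting keys without exactly three prefix elements makes a misconfigured asset fail loudly instead of writing to an unexpected location.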

Local Development

First, install your Dagster code location as a Python package. The -e (--editable) flag makes pip install the package in "editable mode", so local code changes apply automatically as you develop.

pip install -e ".[dev]"

Then, start the Dagster UI web server:

dagster dev --module-name=local_archives

Open http://localhost:3000 with your browser to see the project.

Add your assets to the relevant code location (cloud_archives or local_archives). See Repository structure for details.

Useful links