Add classmethod to Hazard for reading raster-like data from NetCDF file #487

peanutfun · 2022-06-21T14:31:12Z

The Hazard class offers several options for instantiating it from data files, e.g. from_raster, from_excel, etc. The classmethod from_raster, in particular, uses rasterio to open datasets and read their metadata, coordinates, and data. In this issue, I want to discuss whether a general-purpose classmethod for reading data from a NetCDF file into a Hazard object might be useful, and what such a method could look like. A method implementing such functionality to some extent can be found at climada_petals/blob/feature/wildfire/climada_petals/hazard/wildfire.py#L2247.

What the method should do

Use a single NetCDF file to load data for a consistent instance of Hazard, meaning that if data is missing, it will be set to a sensible default.

The minimal (i.e., essential) data supplied as variables in the file should be

hazard intensity data (2D or 3D dataset)
coordinates (1D dataset each)
time (1D dataset, if applicable)

Optional data could include:

hazard fraction data (same dimensions as intensity)
event frequency (1D)
event names (1D)
event IDs (1D)
coordinate system information (attributes/metadata)

Method signature

from_netcdf should take the following arguments:

data (path-like or xarray.Dataset, required): The dataset. If a path is given, the method opens the file.
intensity_var (string, required): The name of the hazard intensity variable in the dataset
fraction_var (string, optional): The name of the hazard fraction variable in the dataset
coordinate_vars (dict, optional): A mapping from default coordinate names to the variables used as coords in the dataset, e.g. dict(longitude="lon", latitude="y")
tbd

Method outline

Suppose a netCDF file contains the following data:

intensity: 3D dataset (dims: "time", "longitude", "latitude")
1D coordinate dataset for each dimension

Then the following code creates a consistent Hazard instance from this data:

import numpy as np
import xarray as xr
from scipy.sparse import csr_matrix
from climada.hazard import Hazard
from climada.hazard.centroids.centr import Centroids

data = xr.open_dataset("...")
hazard = Hazard()

# Transpose the data so we flatten it with longitude running "fastest"
intensity = data["intensity"].transpose("time", "latitude", "longitude")
hazard.intensity = csr_matrix(intensity.values.reshape((data.sizes["time"], -1)))
hazard.intensity.eliminate_zeros()

# Build centroids
lat, lon = np.meshgrid(data["latitude"].values, data["longitude"].values, indexing="ij")
hazard.centroids = Centroids.from_lat_lon(lat.flatten(), lon.flatten())
hazard.centroids.set_lat_lon_to_meta()

# Consistent Hazard also needs
# hazard.fraction, hazard.event_id, hazard.event_name, hazard.frequency, hazard.date
# but these can be defaulted, e.g.
hazard.event_id = np.arange(1, data.sizes["time"] + 1)
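
The flattening order in the outline above can be checked on a toy grid. The following is a minimal sketch using NumPy only; the grid sizes and coordinate values are made up for illustration. It shows that after moving the dimensions to (time, latitude, longitude) and reshaping, each column of the flattened intensity lines up with the flattened meshgrid coordinates, i.e. longitude runs fastest.

```python
import numpy as np

# Hypothetical toy grid: 2 time steps, 3 longitudes, 2 latitudes
lon = np.array([10.0, 11.0, 12.0])
lat = np.array([45.0, 46.0])
n_time = 2

# Intensity stored as (time, longitude, latitude), as in the example file
intensity = np.arange(n_time * lon.size * lat.size, dtype=float).reshape(
    n_time, lon.size, lat.size
)

# Reorder to (time, latitude, longitude), then flatten the spatial dimensions
flat = intensity.transpose(0, 2, 1).reshape(n_time, -1)

# Flatten the coordinates the same way: with indexing="ij" both grids have
# shape (n_lat, n_lon), so longitude varies fastest after flattening
lat_grid, lon_grid = np.meshgrid(lat, lon, indexing="ij")
lat_flat, lon_flat = lat_grid.flatten(), lon_grid.flatten()

# Column k of `flat` must belong to the point (lat_flat[k], lon_flat[k])
for k in range(flat.shape[1]):
    i_lat = int(np.flatnonzero(lat == lat_flat[k])[0])
    i_lon = int(np.flatnonzero(lon == lon_flat[k])[0])
    assert np.array_equal(flat[:, k], intensity[:, i_lon, i_lat])
```

If the transpose were skipped, the reshape would still succeed but the columns would no longer match the centroid coordinates, which is why the explicit dimension order matters.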


chahank · 2022-06-22T12:48:00Z

Great idea. Three points:

we need some flexibility in the input variable naming
we should keep the Hazard object in vector format
it would be nice to potentially extract some metadata for the 'tag'

timschmi95 · 2022-06-22T13:51:47Z

Some points from my side:

Thomas Röösli already wrote the centroids_from_nc method, which gets the centroids from a NetCDF file and of which we could probably reuse some parts. In particular, the method should probably be able to read a NetCDF file with 1D arrays for lat/lon (in which case it creates a meshgrid), but also one with 2D arrays for lat/lon, in which case no meshgrid needs to be created.
I used the xr method .stack to flatten the dataset. Not sure if it's better or worse than the .transpose option in your suggestion.
@peanutfun I'll send you my netcdf file and reader method on slack to compare
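
To make the two points above concrete, here is a small sketch on a made-up dataset. It compares the .transpose plus reshape approach with xarray's .stack, and adds a hypothetical helper, flatten_coords, that accepts either 1D or 2D lat/lon arrays. Neither the helper nor the dimension name "centroid" comes from CLIMADA; they are assumptions for illustration only.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset with dims (time, longitude, latitude)
data = xr.Dataset(
    {
        "intensity": (
            ("time", "longitude", "latitude"),
            np.arange(12.0).reshape(2, 3, 2),
        )
    },
    coords={
        "time": [0, 1],
        "longitude": [10.0, 11.0, 12.0],
        "latitude": [45.0, 46.0],
    },
)

# Option 1: transpose, then reshape the underlying NumPy array
via_transpose = (
    data["intensity"]
    .transpose("time", "latitude", "longitude")
    .values.reshape(data.sizes["time"], -1)
)

# Option 2: stack the spatial dimensions into one new dimension.
# The order of the stacked dimensions decides which one varies fastest.
stacked = data["intensity"].stack(centroid=("latitude", "longitude"))
via_stack = stacked.transpose("time", "centroid").values

# Both options yield the same (time, centroid) matrix
assert np.array_equal(via_transpose, via_stack)

# With .stack, the flattened coordinates are aligned with the columns
lat_flat = stacked["latitude"].values
lon_flat = stacked["longitude"].values


def flatten_coords(lat, lon):
    """Return flat lat/lon arrays for 1D (grid axes) or 2D (per-cell) input.

    Hypothetical helper, not part of CLIMADA.
    """
    lat, lon = np.asarray(lat), np.asarray(lon)
    if lat.ndim == 1 and lon.ndim == 1:
        # 1D axes: build the full grid, longitude varying fastest
        lat, lon = np.meshgrid(lat, lon, indexing="ij")
    elif lat.shape != lon.shape:
        raise ValueError("lat and lon must both be 1D or share one shape")
    return lat.flatten(), lon.flatten()
```

In this toy case both options give identical results, so the choice is mostly one of convenience: .stack keeps the coordinates attached to the data, while .transpose plus reshape works on plain arrays. For 2D lat/lon input, the intensity has to be flattened in the same dimension order as the coordinate arrays.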

peanutfun · 2022-07-07T10:31:17Z

How should we handle optional information (i.e. information for which we can supply a default) in general? If we don't want users to always specify which data exactly to read from a file, we would need to use some kind of lookup. Consider the following:

The data file contains a coordinate "frequency". By default, the user probably expects this to be loaded as hazard event frequency, even without stating a frequency="frequency" parameter specifically. If such a coordinate does not exist, and the user did not specify a frequency parameter, then the method should choose a default value. Also, the user must be able to specify that the frequency coordinate should not be read.

So I see these use cases:

User does not specify a parameter. The method looks for the default coordinate name, and if found, uses this data. If not, it uses a "sensible" default value.
User specifies a coordinate name. The method should only use this coordinate and throw an error if it cannot find it.
User specifies "" as coordinate name. The method should fall back to the default value, even though the default coordinate is available in the data.

Examples in code:

# Signature (a classmethod, so the first argument is the class):
@classmethod
def from_raster_netcdf(cls, file, frequency=None, **kwargs):
    pass

Hazard.from_raster_netcdf(file)  # Load 'frequency' data if available, use default otherwise
Hazard.from_raster_netcdf(file, frequency="freq")  # Load 'freq' data or throw error
Hazard.from_raster_netcdf(file, frequency="")  # Ignore 'frequency' data, use default
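
The three use cases could be resolved by a small helper along these lines. This is only a sketch under stated assumptions: resolve_optional and its parameter names are hypothetical, and the dataset is assumed to support `in` and item access (a plain dict is used here, an xarray.Dataset behaves the same way for these two operations). The method that was eventually implemented may handle this differently.

```python
def resolve_optional(dataset, user_name, default_name, default_value):
    """Pick the data for one optional Hazard attribute.

    Hypothetical helper illustrating the three use cases:
    - user_name is None: use `default_name` if present, else the default
    - user_name is "":   always use the default value
    - otherwise:         use `user_name`, raise if it is missing
    """
    if user_name is None:
        # Case 1: nothing specified, look for the default name
        if default_name in dataset:
            return dataset[default_name]
        return default_value
    if user_name == "":
        # Case 3: explicitly ignore whatever the file contains
        return default_value
    # Case 2: an explicit name must exist in the data
    if user_name not in dataset:
        raise KeyError(f"'{user_name}' not found in dataset")
    return dataset[user_name]


# Stand-in for the file contents (made-up values)
file_contents = {"frequency": [0.1, 0.2], "freq": [0.5, 0.5]}
default = [1.0, 1.0]

from_default_name = resolve_optional(file_contents, None, "frequency", default)
from_user_name = resolve_optional(file_contents, "freq", "frequency", default)
ignored = resolve_optional(file_contents, "", "frequency", default)
```

Using None for "not specified" and "" for "ignore" keeps the two intents distinguishable in a single parameter, at the cost of giving the empty string a special meaning.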

peanutfun · 2022-07-14T11:39:44Z

@chahank @emanuel-schmid @timschmi95 Some input/opinions would be welcome here 🙏

How should we handle optional information (i.e. information for which we can supply a default) in general? If we don't want users to always specify which data exactly to read from a file, we would need to use some kind of lookup.

See #487 (comment)

chahank · 2022-07-15T12:51:36Z

User specifies "" as coordinate name. The method should fall back to the default value, even though the default coordinate is available in the data.

I like the general idea, but I am a bit confused by this use case. I would rather say that the method then does not set any frequency value and the user should define it?

peanutfun · 2022-07-15T13:13:54Z

Good point! I think the goal should be that the new method always returns a consistent Hazard object, meaning that it is ready to be used in computations. This is not the case if hazard.frequency is set to None. Therefore, I would go for the default even in this use case, because the user still has to do the same steps if they want to add custom frequency information. Consider:

# Load hazard, ignoring the 'frequency' data in the file
hazard = Hazard.from_raster_netcdf(file, frequency="")

# Default frequency is loaded, 'hazard' is consistent
np.testing.assert_array_equal(hazard.frequency, np.ones(hazard.event_id.size))

# Overwrite the frequency
# NOTE: Exactly the same if hazard.frequency is None or the default np.array
hazard.frequency = my_fancy_frequency

chahank · 2022-07-15T13:30:33Z

Good point, it should return a consistent object.

peanutfun · 2023-02-08T10:22:11Z

Fixed by #507, #578, #589

peanutfun added the feature request label Jun 21, 2022

peanutfun mentioned this issue Jul 5, 2022

Add Hazard classmethod for loading xarray Datasets #507

Merged

11 tasks

peanutfun mentioned this issue Jul 12, 2022

Store Impact objects into NetCDF and load them again using xarray #514

Closed

peanutfun linked a pull request Feb 8, 2023 that will close this issue

Add Hazard classmethod for loading xarray Datasets #507

Merged

11 tasks

peanutfun closed this as completed Feb 8, 2023
