-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parameterize Geoprocessing Rasters #3406
Conversation
Previously the layer names used for geoprocessing were hard-coded. As of #3399 we now have a number of NLCD layers, which may be used for modeling purposes. To replace the NLCD layer in all operations we switch to this tokenized scheme. Now the operation definitions only carry the tokens, with a base configuration defined with the previously used values. Both `run` and `multi`, the two geoprocessing functions, now take an optional `layer_overrides` dictionary, which is a mapping from token to layer name. At run time, these overrides are combined with the base configuration, which is in turn used to replace all tokens with real layer names in the operation.
Since the cached value now depends on the layer overrides provided, we now add them to the cache key. Any existing caches will have to be deleted if ever we make changes to the default layers in base settings.
Previously we had a separate JSON block for every NLCD year for land analysis, which differed by only one line. Now, using the new token feature, we reduce those to only one block, and replace the tokens according to the requested NLCD year. The schema validator is also updated to now use direct values, since the nlcd_XYZ_ara configs are no longer available.
A hitherto unnoticed bug was revealed when the Analyze Land endpoint switched to using tokenized instances of the same JSON config: the token replacements would change the default config itself, since dict.copy() is a shallow copy. This is fixed by using copy.deepcopy() instead. It is unlikely that this would have occurred in the past, as 1. We used a separate JSON block for every geoprocessing run, rather than the same blocks as we are now with tokenization 2. The contents of the JSON block were not changed other than adding the input polygon, which would be overwritten on every call
There's a failing test, taking a look at it now
|
It was a seemingly transient failure. Am not seeing that test fail on my local, and it now it passes on CI as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to run through the testing steps and see the same changes between 2001 and 2019 on both staging and this branch. I left a few comments not related to the functionality.
Thanks for all the great suggestions! They have all been recorded in future issues or addressed herein. Going to merge this now so it's ready for demo tomorrow, especially since the last commit is only documentation changes. |
Overview
Previously, the raster layer names were hardcoded in specific JSON blocks of configuration. Now, since we may use a variant for different types of raster layers (land, soil, etc), we parameterize the configuration, replacing a token with the real layer name at run time.
There is a default set of layers that will be used if no overrides are provided. Only those layers that are being overridden need to be provided, the rest will come from the default.
When using the default layers, the existing cache keys will be used. When using overridden layers, those layer names will be added to the cache key. If the default layers are ever changed, we'll have to clear the geoprocessing cache. This PR only tokenizes the layers, and does not change the default ones.
This will mainly be used for MapShed modeling, but we also refactor Land analysis to use this, which cuts down on lines of code and demos the feature.
Connects #3405
Testing Instructions
$ vagrant ssh services -c 'redis-cli -n 1 --raw KEYS ":1:geop_*" | xargs redis-cli -n 1 DEL'
$ vagrant ssh worker -c 'sudo service celeryd restart'