Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding nc_def_var_quantize()/nc_inq_var_quantize() - second attempt #2088

Merged
merged 57 commits into from
Oct 1, 2021

Conversation

edwardhartnett
Copy link
Contributor

@edwardhartnett edwardhartnett commented Aug 24, 2021

Adding nc_def_var_quantize()/nc_inq_var_quantize() as discussed in recent meeting.

This is a second try, after #2081. In this version I add the inq function directly to the dispatch table, as suggested by @DennisHeimbigner , instead of extending inq_var_all(). This leads to a cleaner implementation, with fewer changes in the other dispatch layers.

This is now complete and fully documented and tested. Comments and questions welcome.

Fixes #1548

Fortran changes can be found in Unidata/netcdf-fortran#304.

I have issue open to determine where this should be documented in the language-neutral docs: Unidata/netcdf#46.

Copy link
Contributor

@czender czender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the draft PR, which looks to be in really good shape. It always bemuses me that the netCDF API is so elegant, yet conceals so much fine detail under-the-hood. This is probably a good chance to comment while things are in flux.

int quantize_mode;           /**< Quantize mode. NC_NOQUANTIZE is 0, and means no quantization. */
int nsd;                     /**< Number of significant digits if quantization is used, 0 if not. */

Using nsd == 0 for no quantization is consistent with the existing API, e.g., DEFLATE level=0 indicates uncompressed data. However, nsd=0 seems inverted since the smaller the NSD the more quantization is performed. One alternative would be to use something like nsd=7 (for float) or nsd=15 (for double) to indicate no quantization. That would certainly be messier than nsd=0, though it would preserve the meaning of NSD. Thus I think your current implementation is better from a software engineering standpoint, though possibly more unclear to an end-user who might wonder how NSD can be 0.

Also I favor quantize_bitgroom_number_of_significant_digits or _quantize_bitgroom_number_of_significant_digits as Ed mentioned in an email to me, since those are more extensible to possible future quantization modes.

@DennisHeimbigner
Copy link
Collaborator

The mode and nsd arguments are integers, so that gives the option to use -1 (negative one)
as a signal flag. I think most users would recognize it as such.

@edwardhartnett
Copy link
Contributor Author

edwardhartnett commented Aug 26, 2021

@czender would nsd=0 be a valid setting if I were not using it to turn off quantize?

We already have an easy way to turn quantize off, which is to pass a mode flag of NC_NOQUANTIZE (which is 0). The only reason to have nsd = 0 turn off quantize was to be consisten with the deflate call. If we are not going to be consistent with deflate in this regard, then I would suggest that we use the quantize_mode to turn quantize off, and return NC_EINVAL for any invalid nsd value.

If we change the name of the attribute to _quantize_bitgroom_number_of_significant_digits will you propose that as your CF convention?

Will NCO switch to the new attribute name? Because, IIRC, you are now using a different name with NCO.

@czender
Copy link
Contributor

czender commented Aug 26, 2021

@edwardhartnett nsd=0 is only valid theoretically, e.g., in the sense that 7 = 5 to 0 significant digits. However, when given NSD=0, the NCO BG implementation will complain that NSD must be a positive integer.

As to your next question, I'm unsure what it means to "turn quantize off". That seems to be an implementation concept inherited from lossless compression. Once a number has been quantized, it always remains quantized. Help me understand how you envision the netCDF implementation working. Will the _quantize_bitgroom_number_of_significant_digits (or whatever) attribute only be added/reported for variables that are initially defined/written with quantization? Or will ncdump -h report that attribute also for variables that were not quantized? Is it permissible to "turn quantize off" for a variable that has been quantized? What would that do to the variable's values (nothing, I presume) and metadata?

If we change the name of the attribute to _quantize_bitgroom_number_of_significant_digits will you propose that as your CF convention?

Yes

Will NCO switch to the new attribute name? Because, IIRC, you are now using a different name with NCO.

Yes, NCO will adopt whatever convention CF agrees on in the end. For the time-being, I will change it to _quantize_bitgroom_number_of_significant_digits in the next release. On second thought, I will remove the leading underscore from the NCO implementation for the time-being, to distinguish NCO-quantized datasets from netCDF-quantized datasets. It might make development less confusing that way.

@edwardhartnett
Copy link
Contributor Author

edwardhartnett commented Aug 26, 2021

OK, so 0 is not a valid nsd value.

Quantize can only be turned off before any data are written or the variable created in the HDF5 file, just like deflate. So it can be turned on and off as many times as the user pleases, until the first enddef is called. At that point, it is written in stone, and can never be changed.

Once the variable is actually created as a dataset in the HDF5 file, attempts to change quantize will result in the appropriately named error code NC_ELATEDEF.

So quantize can not be changed on any var that has any data, or turned on after any data have been written. It can only be set at file create time, before anything is written to disk.

Quantize is on a variable by variable basis. So the quantization attribute will only be on the variables that have quantization. The attributes are variable attributes, not NC_GLOBAL. Does NCO use a global attribute and apply it to all variables?

@czender
Copy link
Contributor

czender commented Aug 26, 2021

NCO writes the quantization attribute on a variable-by-variable basis.

@WardF WardF added this to the 4.9.0 milestone Sep 9, 2021
@WardF WardF self-assigned this Sep 9, 2021
@WardF WardF merged commit 437060b into Unidata:main Oct 1, 2021
@edwardhartnett edwardhartnett deleted the ejh_quantize_2 branch October 2, 2021 12:02
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this pull request Nov 3, 2021
re: PR Unidata#2088

Primary changes:
* Add NCZarr-specific quantize function to the dispatch table.
* Copy quantize code from libhdf5
* Add quantize invocation to zvar.c
* Add support for _QuantizeBitgroomNumberOfSignificantDigits to ncgen.
* Copy quantize test from nc_test4 to nczarr_tests. Remove some parts that are not relevant to NCZarr.

Other Changes:
* Break zsync.c into zsync.c (writing) and zload.c (reading).
* Clean up the fill value handling (many changes)
* Disable atexit() under Windows
* Move ncjson to libdispatch
* Add documentation of differences between netcdf-4 and NCZarr, especially WRT fill value.
* Some mingw fixes
* Remove some cruft
* Cleanup the handling of scalars
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this pull request Jan 24, 2022
re: PR Unidata#2088
re: PR Unidata#2130
replaces: Unidata#2140

Changes:
* Add NCZarr-specific quantize functions to the dispatch table.
* Copy (modified) quantize code from libhdf5 to NCZarr
* Add quantize invocation to zvar.c
* Add support for _QuantizeBitgroomNumberOfSignificantDigits
and _QuantizeGranularBitgroomNumberOfSignificantDigits to ncgen.
* Modify nc_test4/tst_quantize.c to allow it to be used both for hdf5
  and for nczarr.
* Make dap4 properly handle quantize functions in dispatch table.
* Add quantize attribute support to ncgen.

Other changes:
* Caught and fixed some S3 problems
* Fixed some nczarr fillvalue problems.
* Fixed some nczarr cache problems.
* Cleanup some flaws in libdispatch/dinfermodel.c
* Allow byterange requests to S3 be readable by dinfermodel.c/check_file_type
* Remove the libnczarr ztracedispatch code (big change).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a way to do bit grooming before compression
5 participants