Adding nc_def_var_quantize()/nc_inq_var_quantize() - second attempt #2088

edwardhartnett · 2021-08-24T07:56:19Z

Adding nc_def_var_quantize()/nc_inq_var_quantize() as discussed in recent meeting.

This is a second try, after #2081. In this version I add the inq function directly to the dispatch table, as suggested by @DennisHeimbigner, instead of extending inq_var_all(). This leads to a cleaner implementation, with fewer changes in the other dispatch layers.
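A minimal sketch of the dispatch-table idea, for readers who have not seen the pattern. Every name here (`sk_dispatch`, `sk_hdf5`, `sk_classic`, the `SK_` codes) is hypothetical and chosen for illustration; the real netCDF-C dispatch struct has many more entries, and how each real backend answers these calls is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in return codes for this sketch only. */
#define SK_NOERR   0
#define SK_ENOTSUP (-1)

/* A pared-down dispatch table: one function pointer per operation. */
typedef struct sk_dispatch {
    const char *name;
    int (*def_var_quantize)(int ncid, int varid, int mode, int nsd);
    int (*inq_var_quantize)(int ncid, int varid, int *modep, int *nsdp);
} sk_dispatch;

/* Toy backend that supports quantization; it remembers one setting. */
static int stored_mode = 0;
static int stored_nsd = 0;

static int hdf5_def(int ncid, int varid, int mode, int nsd)
{
    (void)ncid; (void)varid;
    stored_mode = mode;
    stored_nsd = nsd;
    return SK_NOERR;
}

static int hdf5_inq(int ncid, int varid, int *modep, int *nsdp)
{
    (void)ncid; (void)varid;
    if (modep) *modep = stored_mode;
    if (nsdp) *nsdp = stored_nsd;
    return SK_NOERR;
}

/* Toy backend without quantization: def is rejected, inq reports "none". */
static int classic_def(int ncid, int varid, int mode, int nsd)
{
    (void)ncid; (void)varid; (void)mode; (void)nsd;
    return SK_ENOTSUP;
}

static int classic_inq(int ncid, int varid, int *modep, int *nsdp)
{
    (void)ncid; (void)varid;
    if (modep) *modep = 0;
    if (nsdp) *nsdp = 0;
    return SK_NOERR;
}

static const sk_dispatch sk_hdf5 = {"hdf5", hdf5_def, hdf5_inq};
static const sk_dispatch sk_classic = {"classic", classic_def, classic_inq};

/* The public entry points only forward through the table, so adding
 * a new operation means adding one slot per backend. */
int sk_def_var_quantize(const sk_dispatch *d, int ncid, int varid, int mode, int nsd)
{
    assert(d != NULL);
    return d->def_var_quantize(ncid, varid, mode, nsd);
}

int sk_inq_var_quantize(const sk_dispatch *d, int ncid, int varid, int *modep, int *nsdp)
{
    assert(d != NULL);
    return d->inq_var_quantize(ncid, varid, modep, nsdp);
}
```

Giving the inq call its own slot keeps each backend's change local to one small function, which is the "fewer changes in the other dispatch layers" point above.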

This is now complete and fully documented and tested. Comments and questions welcome.

Fixes #1548

Fortran changes can be found in Unidata/netcdf-fortran#304.

I have an issue open to determine where this should be documented in the language-neutral docs: Unidata/netcdf#46.

czender

I read the draft PR, which looks to be in really good shape. It always bemuses me that the netCDF API is so elegant, yet conceals so much fine detail under-the-hood. This is probably a good chance to comment while things are in flux.

int quantize_mode;           /**< Quantize mode. NC_NOQUANTIZE is 0, and means no quantization. */
int nsd;                     /**< Number of significant digits if quantization is used, 0 if not. */

Using nsd == 0 for no quantization is consistent with the existing API, e.g., DEFLATE level=0 indicates uncompressed data. However, nsd=0 seems inverted since the smaller the NSD the more quantization is performed. One alternative would be to use something like nsd=7 (for float) or nsd=15 (for double) to indicate no quantization. That would certainly be messier than nsd=0, though it would preserve the meaning of NSD. Thus I think your current implementation is better from a software engineering standpoint, though possibly more unclear to an end-user who might wonder how NSD can be 0.

Also I favor quantize_bitgroom_number_of_significant_digits or _quantize_bitgroom_number_of_significant_digits as Ed mentioned in an email to me, since those are more extensible to possible future quantization modes.
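To make "the smaller the NSD the more quantization is performed" concrete, here is a rough sketch of BitGroom-style masking on 32-bit floats. It assumes that `nsd` decimal digits need about `ceil(nsd * log2(10))` mantissa bits; that mapping, the function name `sk_bitgroom_float`, and the handling of zeros and non-finite values are assumptions of this sketch, not the code in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_FLT_MANTISSA_BITS 23

/* Mantissa bits kept for nsd = 1..7: ceil(nsd * log2(10)), capped at 23.
 * Index 0 is unused. This mapping is an assumption of the sketch. */
static const int sk_keep_bits[8] = {0, 4, 7, 10, 14, 17, 20, 23};

/* Quantize n floats in place. Even-indexed elements have their low
 * mantissa bits cleared ("shave"), odd-indexed elements have them set,
 * so that the rounding bias tends to cancel over the array. */
void sk_bitgroom_float(float *data, size_t n, int nsd)
{
    assert(data != NULL || n == 0);
    assert(nsd >= 1 && nsd <= 7);

    int drop = SK_FLT_MANTISSA_BITS - sk_keep_bits[nsd];
    if (drop == 0)
        return; /* nsd = 7 keeps every mantissa bit */

    uint32_t low = (UINT32_C(1) << drop) - 1; /* mask of dropped bits */

    for (size_t i = 0; i < n; i++) {
        uint32_t u;
        memcpy(&u, &data[i], sizeof u); /* type-pun without aliasing UB */

        if ((u & UINT32_C(0x7F800000)) == UINT32_C(0x7F800000))
            continue; /* leave Inf and NaN untouched */
        if ((u & UINT32_C(0x7FFFFFFF)) == 0)
            continue; /* leave +0 and -0 untouched */

        if (i % 2 == 0)
            u &= ~low; /* shave: clear the low bits */
        else
            u |= low;  /* set: fill the low bits with ones */

        memcpy(&data[i], &u, sizeof u);
    }
}
```

With `nsd = 3` the sketch keeps 10 of 23 mantissa bits, while `nsd = 7` changes nothing, which is why a value like 7 (float) or 15 (double) could stand for "no quantization" as suggested above.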

DennisHeimbigner · 2021-08-25T22:11:55Z

The mode and nsd arguments are integers, so that gives the option to use -1 (negative one)
as a signal flag. I think most users would recognize it as such.

edwardhartnett · 2021-08-26T09:07:47Z

@czender would nsd=0 be a valid setting if I were not using it to turn off quantize?

We already have an easy way to turn quantize off, which is to pass a mode flag of NC_NOQUANTIZE (which is 0). The only reason to have nsd = 0 turn off quantize was to be consistent with the deflate call. If we are not going to be consistent with deflate in this regard, then I would suggest that we use the quantize_mode to turn quantize off, and return NC_EINVAL for any invalid nsd value.

If we change the name of the attribute to _quantize_bitgroom_number_of_significant_digits will you propose that as your CF convention?

Will NCO switch to the new attribute name? Because, IIRC, you are now using a different name with NCO.
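The proposal above (mode turns quantize off, any invalid nsd is an error) can be sketched as a small argument check. The function `sk_check_quantize_args` and the `SK_` constants are hypothetical stand-ins; the upper bounds of 7 and 15 are taken from the float/double figures mentioned earlier in this thread, and the numeric value used for the error code is an assumption.

```c
#include <assert.h>

/* Stand-ins for the netCDF constants discussed in the thread. */
#define SK_NOQUANTIZE        0
#define SK_QUANTIZE_BITGROOM 1
#define SK_NOERR             0
#define SK_EINVAL            (-36) /* assumed value, verify against netcdf.h */

#define SK_MAX_NSD_FLOAT  7
#define SK_MAX_NSD_DOUBLE 15

/* Validate (mode, nsd) for a float (is_double == 0) or double
 * (is_double == 1) variable. */
int sk_check_quantize_args(int mode, int nsd, int is_double)
{
    assert(is_double == 0 || is_double == 1);

    /* Turning quantize off is done through the mode alone; nsd is ignored. */
    if (mode == SK_NOQUANTIZE)
        return SK_NOERR;

    /* Only one quantization mode exists in this sketch. */
    if (mode != SK_QUANTIZE_BITGROOM)
        return SK_EINVAL;

    /* nsd must be a positive integer within the precision of the type,
     * so 0 and -1 are both rejected rather than treated as flags. */
    int max_nsd = is_double ? SK_MAX_NSD_DOUBLE : SK_MAX_NSD_FLOAT;
    if (nsd < 1 || nsd > max_nsd)
        return SK_EINVAL;

    return SK_NOERR;
}
```

Under this scheme there is exactly one way to say "no quantization", and nsd never carries a second meaning.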

czender · 2021-08-26T15:43:46Z

@edwardhartnett nsd=0 is only valid theoretically, e.g., in the sense that 7 = 5 to 0 significant digits. However, when given NSD=0, the NCO BG implementation will complain that NSD must be a positive integer.

As to your next question, I'm unsure what it means to "turn quantize off". That seems to be an implementation concept inherited from lossless compression. Once a number has been quantized, it always remains quantized. Help me understand how you envision the netCDF implementation working. Will the _quantize_bitgroom_number_of_significant_digits (or whatever) attribute only be added/reported for variables that are initially defined/written with quantization? Or will ncdump -h report that attribute also for variables that were not quantized? Is it permissible to "turn quantize off" for a variable that has been quantized? What would that do to the variable's values (nothing, I presume) and metadata?

> If we change the name of the attribute to _quantize_bitgroom_number_of_significant_digits will you propose that as your CF convention?

Yes

> Will NCO switch to the new attribute name? Because, IIRC, you are now using a different name with NCO.

Yes, NCO will adopt whatever convention CF agrees on in the end. For the time being, I will change it to _quantize_bitgroom_number_of_significant_digits in the next release. On second thought, I will remove the leading underscore from the NCO implementation for the time being, to distinguish NCO-quantized datasets from netCDF-quantized datasets. It might make development less confusing that way.

edwardhartnett · 2021-08-26T16:19:13Z

OK, so 0 is not a valid nsd value.

Quantize can only be turned off before any data are written or the variable created in the HDF5 file, just like deflate. So it can be turned on and off as many times as the user pleases, until the first enddef is called. At that point, it is written in stone, and can never be changed.

Once the variable is actually created as a dataset in the HDF5 file, attempts to change quantize will result in the appropriately named error code NC_ELATEDEF.

So quantize cannot be changed on any var that has any data, or turned on after any data have been written. It can only be set at file create time, before anything is written to disk.
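The lifecycle described above can be modelled as a tiny state machine. This is only a sketch: `sk_var`, `sk_set_quantize` and `sk_enddef` are hypothetical names, the real library tracks far more state, and the numeric value given to the late-definition error is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NOERR    0
#define SK_ELATEDEF (-123) /* assumed value, verify against netcdf.h */

/* Minimal model of one variable's quantize settings. */
typedef struct sk_var {
    bool created;      /* true once the dataset exists in the file */
    int quantize_mode; /* 0 means no quantization */
    int nsd;           /* number of significant digits */
} sk_var;

/* Change the settings; allowed any number of times before enddef. */
int sk_set_quantize(sk_var *v, int mode, int nsd)
{
    assert(v != NULL);
    if (v->created)
        return SK_ELATEDEF; /* too late: the variable is already on disk */
    v->quantize_mode = mode;
    v->nsd = nsd;
    return SK_NOERR;
}

/* Leaving define mode creates the dataset and freezes the settings. */
void sk_enddef(sk_var *v)
{
    assert(v != NULL);
    v->created = true;
}
```

A caller can flip the settings back and forth while the variable is still being defined; after `sk_enddef` every further attempt fails and the stored settings stay as they were.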

Quantize is on a variable by variable basis. So the quantization attribute will only be on the variables that have quantization. The attributes are variable attributes, not NC_GLOBAL. Does NCO use a global attribute and apply it to all variables?

czender · 2021-08-26T16:40:01Z

NCO writes the quantization attribute on a variable-by-variable basis.

re: PR Unidata#2088

Primary changes:
* Add NCZarr-specific quantize function to the dispatch table.
* Copy quantize code from libhdf5
* Add quantize invocation to zvar.c
* Add support for _QuantizeBitgroomNumberOfSignificantDigits to ncgen.
* Copy quantize test from nc_test4 to nczarr_tests. Remove some parts that are not relevant to NCZarr.

Other Changes:
* Break zsync.c into zsync.c (writing) and zload.c (reading).
* Clean up the fill value handling (many changes)
* Disable atexit() under Windows
* Move ncjson to libdispatch
* Add documentation of differences between netcdf-4 and NCZarr, especially WRT fill value.
* Some mingw fixes
* Remove some cruft
* Cleanup the handling of scalars

re: PR Unidata#2088
re: PR Unidata#2130
replaces: Unidata#2140

Changes:
* Add NCZarr-specific quantize functions to the dispatch table.
* Copy (modified) quantize code from libhdf5 to NCZarr
* Add quantize invocation to zvar.c
* Add support for _QuantizeBitgroomNumberOfSignificantDigits and _QuantizeGranularBitgroomNumberOfSignificantDigits to ncgen.
* Modify nc_test4/tst_quantize.c to allow it to be used both for hdf5 and for nczarr.
* Make dap4 properly handle quantize functions in dispatch table.
* Add quantize attribute support to ncgen.

Other changes:
* Caught and fixed some S3 problems
* Fixed some nczarr fillvalue problems.
* Fixed some nczarr cache problems.
* Cleanup some flaws in libdispatch/dinfermodel.c
* Allow byterange requests to S3 be readable by dinfermodel.c/check_file_type
* Remove the libnczarr ztracedispatch code (big change).

edwardhartnett added 18 commits August 24, 2021 00:45

getting ready for next try at quantization code

9a18689

further preparation for try 2 at quantizing

dabe008

further preparation for try 2 at quantizing

d475e1f

adding quantize functions to all the dispatch tables

3202b8b

now qunatizing with inq function in dispatch table

d6d9825

fixed version numbers

74c4b9d

merged configure.ac and CMakeLists.txt with changes from master branch

f3435da

merged nc4hdf.c with changes from master branch

d053418

fixed comment

24ed2a4

further development

233ddfb

now reading quantize attribute to get settings

148706b

moving function

c9eca4b

perparing to apply bitgroom algorithm

0f26083

more testing of qunatize setting

a02faa0

more quantize testing

b2c0bb9

more quantize testing

4ac7fa9

more quantize testing

ee788d6

more quantize testing

c655488

czender reviewed Aug 25, 2021

edwardhartnett added 3 commits August 26, 2021 06:37

more tests for quantization

539578d

more quantize testing

c609a17

more quantize testing

0265953

edwardhartnett added 2 commits August 26, 2021 10:24

changed name of attribute to _quantize_bitgroom_number_of_significant_digits

2b3d2c1

improved doxygen documenation

d29436c

improved doxygen documenation

4f96fcc

edwardhartnett added 11 commits August 31, 2021 07:10

testing with fill values

e3c8be8

testing with fill values

4cd4aff

merged master

684f73c

more tests

3e056f4

merged

30448b4

turned off failing quantize test

ae3b083

code clean up

d2656ba

refactored quantize code

e2570c3

more tests for quantize

09defc5

added parallel I/O quantize test

f880a63

added parallel I/O quantize test

18aebd9

edwardhartnett marked this pull request as ready for review September 2, 2021 16:22

edwardhartnett requested a review from WardF as a code owner September 2, 2021 16:22

This was referenced Sep 4, 2021

Add quantize feature to F77 and F90 APIs, with tests and documentation Unidata/netcdf-fortran#304

Merged

Where should I document quantization feature? Unidata/netcdf#46

Open

edwardhartnett and others added 6 commits September 7, 2021 10:44

Merge branch 'main' into ejh_quantize_2

0ce4637

improving benchmark program

7943172

changed makefile to allow tst_gfs_data_1 to pick up libz from LD_LIBRARY_PATH first

db72457

changed makefile to make benchmark bm_file work properly with zlib-ng

9cc39fe

changed name of tst_gfs_data_1.c to tst_compress_par.c

e8587b5

tinker with data algorithm for tst_compress_par.c

7806ded

WardF added this to the 4.9.0 milestone Sep 9, 2021

WardF self-assigned this Sep 9, 2021

now nsd of 0 is NC_EINVAL for nc_def_var_quantize()

5200477

WardF merged commit 437060b into Unidata:main Oct 1, 2021

edwardhartnett deleted the ejh_quantize_2 branch October 2, 2021 12:02

DennisHeimbigner mentioned this pull request Nov 3, 2021

Add bitgroom support to NCZarr #2140

Closed

DennisHeimbigner mentioned this pull request Jan 24, 2022

Add complete bitgroom support to NCZarr #2197

Merged
