Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ncdump failed assert() when reading NCZarr file #2485

Closed
czender opened this issue Aug 22, 2022 · 6 comments
Closed

ncdump failed assert() when reading NCZarr file #2485

czender opened this issue Aug 22, 2022 · 6 comments
Assignees
Milestone

Comments

@czender
Copy link
Contributor

czender commented Aug 22, 2022

Hola Unidata,

Zarr'ification of NCO is proceeding in fits and starts. When I throw a non-trivial file (below) at the file:// scheme, I get the same breakage that I do not understand in both Unidata tools (ncdump) and NCO (ncks). I wonder if this might be an NCZarr library issue. The following demonstrates behavior with today's main tree of netcdf-c development on up-to-date MacOS 12.4, and the behavior seems replicable on other OS's.

The problem manifests as a failed assertion when reading a fairly detailed NCZarr file:

zender@spectral:~$ ncdump "file://${HOME}/in_zarr#mode=nczarr,file"
Assertion failed: (NCJlength(jdimrefs) == rank), function define_vars, file zsync.c, line 1727.
Abort trap: 6
zender@spectral:~$ ncks "file://${HOME}/in_zarr#mode=nczarr,file"
Assertion failed: (NCJlength(jdimrefs) == rank), function define_vars, file zsync.c, line 1727.
Abort trap: 6
zender@spectral:~$ 

The indicated zsync.c code is obscure to me, something to convert JSON dimension lists->netCDF. To create the input NCZarr file I used the latest ncgen on in_zarr.cdl (attached) as follows:

ncgen -lb -o "file://${HOME}/in_zarr#mode=nczarr,file" ${HOME}/in_zarr.cdl

The NCZarr production proceeds nominally. in_zarr.cdl.txt (attached) is the standard netCDF3-based (no netCDF4 atomic types, attributes, or features) test file in.cdl with the single modification that the time dimension has been changed from unlimited to fixed size = 10 (otherwise the conversion to NCZarr would fail). Manual inspection shows that the NCZarr file appears to be correctly created on disk, so I expect that both ncdump and ncks will print its contents just as they both do correctly for much simpler NCZarr files (e.g., one or a few variables). Instead the above error occurs. I hope it is easy for others to reproduce this error. I expect I might be asking NCZarr to handle a feature it does not yet support but I can't discern what that might be since the ncgen conversion works correctly. Any pointers or insight appreciated.

Thanks,
Charlie

@WardF WardF self-assigned this Aug 22, 2022
@WardF WardF added this to the 4.9.1 milestone Aug 22, 2022
@WardF
Copy link
Member

WardF commented Aug 22, 2022

I'm able to duplicate this on MacOS 12.5.1; any immediate idea, @DennisHeimbigner ? I'll keep digging in as well.

@DennisHeimbigner
Copy link
Collaborator

The problem appears to be the handling of scalars. Pure Zarr does not
support scalar values, but NCZarr does. So I have code to treat a scalar
as of shape "[1]" and then having hacks in various places to recognize
the situation.
Anyway, there is a bug in the handling of scalars.
A temporary work around is defining dimension named "scalar"
and convert the scalar variables such as lat_times_lon_nb
to lat_times_lon_nb[scalar].

@czender
Copy link
Contributor Author

czender commented Aug 22, 2022

Thanks for looking into this. I hadn't realized there was anything special about scalars in Zarr vs. NCZarr. I'll continue the Zarr-ification and just test it without scalars until a patch for scalars is merged.

@DennisHeimbigner
Copy link
Collaborator

I had better make sure that is mentioned in the documentation.

@czender
Copy link
Contributor Author

czender commented Aug 23, 2022

I can confirm that adding a degenerate (size=1) dimension to all the scalars in that test file results in the expected/desired behavior. Also that I do not see the special treatment of scalars mentioned in the NCZarr documentation.

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Aug 28, 2022
* re: Unidata#2278
* re: Unidata#2485
* re: Unidata#2474

This PR subsumes PR Unidata#2278.
Actually is a bit an omnibus covering several issues.

## PR Unidata#2278
Add support for the Zarr string type.
Zarr strings are restricted currently to be of fixed size.
The primary issue to be addressed is to provide a way for user to
specify the size of the fixed length strings. This is handled by providing
the following new attributes special:
1. **_nczarr_default_maxstrlen** —
This is an attribute of the root group. It specifies the default
maximum string length for string types. If not specified, then
it has the value of 64 characters.
2. **_nczarr_maxstrlen** —
This is a per-variable attribute. It specifies the maximum
string length for the string type associated with the variable.
If not specified, then it is assigned the value of
**_nczarr_default_maxstrlen**.

This PR also requires some hacking to handle the existing netcdf-c NC_CHAR
type, which does not exist in zarr. The goal was to choose numpy types for
both the netcdf-c NC_STRING type and the netcdf-c NC_CHAR type such that
if a pure zarr implementation read them, it would still work and an
NC_CHAR type would be handled by zarr as a string of length 1.

For writing variables and NCZarr attributes, the type mapping is as follows:
* "|S1" for NC_CHAR.
* ">S1" for NC_STRING && MAXSTRLEN==1
* ">Sn" for NC_STRING && MAXSTRLEN==n

Note that it is a bit of a hack to use endianness, but it should be ok since for
string/char, the endianness has no meaning.

For reading attributes with pure zarr (i.e. with no nczarr
atribute types defined), they will always be interpreted as of
type NC_CHAR.

## Issue: Unidata#2474
This PR partly fixes this issue because it provided more
comprehensive support for Zarr attributes that are JSON valued expressions.
This PR still does not address the problem in that issue where the
_ARRAY_DIMENSION attribute is incorrectly set. Than can only be
fixed by the creator of the datasets.

## Issue: Unidata#2485
This PR also fixes the scalar failure shown in this issue.
It generally cleans up scalar handling.
It also adds a note to the documentation describing that
NCZarr supports scalars while Zarr does not and also how
scalar interoperability is achieved.

## Misc. Other Changes
1. Convert the nczarr special attributes and keys to be all lower case. So "_NCZARR_ATTR" now used "_nczarr_attr. Support back compatibility for the upper case names.
2. Cleanup my too-clever-by-half handling of scalars in libnczarr.
@DennisHeimbigner
Copy link
Collaborator

Fixed(?) by #2492

@WardF WardF closed this as completed Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants