
NCZarr memory leak with NetCDF 4.9.0 #2733

Closed
uweschulzweida opened this issue Aug 9, 2023 · 5 comments · Fixed by #2737

@uweschulzweida
Contributor

I have a large ZARR data set. I want to read it time step by time step. This causes me to exceed the memory limit (1TB) on my machine. It looks like all read data is kept uncompressed in memory. Is this intentional or is it a memory leak?
I am using netCDF 4.9.0. In my application only one time step is stored at a time. When I read the same data set as NetCDF4, my application only needs 100MB.

@uweschulzweida
Contributor Author

The ZARR dataset is only spatially chunked and not in time.

@DennisHeimbigner
Collaborator

The data that is read is stored in a per-variable cache. If you have been accessing many variables, you may have a problem. It is also possible that the cache is not properly removing old entries.

@uweschulzweida
Contributor Author

Is it possible to limit the size of the cache to a user defined value?

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Aug 10, 2023

Yes. There is a function in netcdf.h with the following signature:

int nc_set_var_chunk_cache(int ncid, int varid, size_t size, size_t nelems, float preemption);

It sets parameters for the per-variable cache.
The "size" parameter specifies the max amount of space (in bytes) used by the cache.
The "nelems" parameter specifies the max number of pages stored in the cache.
The "preemption" parameter is unused by NCZarr.

So I would suggest calling the function for the given variable with the size set to the space you are willing to use, and with the nelems parameter set to some large number (say 1000) so that the size parameter is the only one that has an effect. The preemption parameter can be set to 0.5 since it is unused.
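As a minimal sketch of that suggestion, assuming ncid and varid come from earlier nc_open/nc_inq_varid calls (the 64 MiB cap and the helper name are illustrative, not taken from this thread):

  #include <stdio.h>
  #include <netcdf.h>

  /* Cap one variable's chunk cache; ncid/varid come from nc_open/nc_inq_varid. */
  static int limit_var_cache(int ncid, int varid)
  {
      size_t cacheSize = 64 * 1024 * 1024; /* max bytes held by the cache (illustrative) */
      size_t cacheNelems = 1000;           /* large, so cacheSize is the binding limit */
      float  preemption = 0.5f;            /* unused by NCZarr */
      int stat = nc_set_var_chunk_cache(ncid, varid, cacheSize, cacheNelems, preemption);
      if (stat != NC_NOERR)
          fprintf(stderr, "nc_set_var_chunk_cache: %s\n", nc_strerror(stat));
      return stat;
  }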

@WardF added this to the 4.9.3 milestone Aug 10, 2023
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Aug 10, 2023
re: Unidata#2733

When addressing the above issue, I noticed that there was a disconnect in NCZarr between nc_set_chunk_cache and nc_set_var_chunk_cache.
Specifically, calling nc_set_chunk_cache had no impact on the per-variable cache parameters when nc_set_var_chunk_cache was not used.

So, I modified the NCZarr code so that the per-variable cache parameters are set in this order (1 is the first choice; see the sketch after this list):
1. The values set by nc_set_var_chunk_cache
2. The values set by nc_set_chunk_cache
3. The defaults set by configure.ac
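Here is a short sketch of what that ordering means in practice, assuming the global call is made before the file is opened (the path, variable name, and sizes are illustrative):

  #include <netcdf.h>

  /* Precedence sketch: nc_set_chunk_cache() supplies the defaults (choice 2),
     nc_set_var_chunk_cache() overrides them for one variable (choice 1);
     anything left unset falls back to the configure-time defaults (choice 3). */
  static int open_with_cache_limits(const char *path, int *ncidp)
  {
      int stat = nc_set_chunk_cache(32 * 1024 * 1024, 1000, 0.5f); /* global defaults */
      if (stat != NC_NOERR) return stat;

      stat = nc_open(path, NC_NOWRITE, ncidp);
      if (stat != NC_NOERR) return stat;

      int varid;
      stat = nc_inq_varid(*ncidp, "var", &varid);
      if (stat != NC_NOERR) return stat;

      /* Per-variable setting wins over the global defaults for this variable. */
      return nc_set_var_chunk_cache(*ncidp, varid, 4 * 1024 * 1024, 1000, 0.5f);
  }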
@uweschulzweida
Contributor Author

I have now tested nc_set_var_chunk_cache() and nc_set_chunk_cache() with different parameters and could not see any difference in overall memory usage for my workflow.
I have a time series of a relatively large field, and I write and read this time series time step by time step. This should normally require only the memory of one time step. As expected, this works fine with NetCDF4. With NCZarr, memory is used for all time steps.
A complete example for writing and reading Zarr can be found at: https://github.com/uweschulzweida/nczarr_example/tree/main
With NetCDF4, 130 MB is needed; with Zarr, 18000 MB. The problem occurs both when reading and writing. I have now also tested it with NetCDF 4.9.2.
Here is a short example of how I read the data:

  const char *filename = "file://testdata.zarr#mode=zarr,file";
  constexpr size_t chunkSize = 262144; // 256k
  constexpr size_t numCells = 50 * chunkSize;
  constexpr size_t numSteps = 360;

  int ncId;
  nce(nc_open(filename, NC_NOWRITE, &ncId));

  int varId;
  nce(nc_inq_varid(ncId, "var", &varId));

  size_t size = 4 * numCells; // one float field (4 bytes per value) at one time step
  size_t nelems = 1000;
  float preemption = 0.5f;
  nce(nc_set_chunk_cache(size, nelems, preemption));
  nce(nc_set_var_chunk_cache(ncId, varId, size, nelems, preemption));

  {
    std::vector<float> var(numCells);
    for (size_t i = 0; i < numSteps; ++i)
      {
        size_t start[2] = {i, 0}, count[2] = {1, numCells};
        nce(nc_get_vara_float(ncId, varId, start, count, var.data()));
      }
  }

  nce(nc_close(ncId));

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Aug 17, 2023
re: PR Unidata#2734
re: Issue Unidata#2733

As a result of an investigation by https://github.com/uweschulzweida,
I discovered a significant bug in the NCZarr cache management.
This PR extends the above PR to fix that bug.

## Change Overview
* Inserted extra checks for cache overflow.
* Added test cases contingent on the --enable-large-file-tests option.
* The Columbia server is down, so it has been temporarily disabled.