
Questions of netcdf-4 read performance on GFS surface and atmosphere data files #1543

Closed
edwardhartnett opened this issue Nov 20, 2019 · 27 comments

@edwardhartnett
Contributor

edwardhartnett commented Nov 20, 2019

We have two sample restart files from the Global Forecasting System (GFS) from NOAA. They are slow to read.

What's up with that?

See comments and sample files from @junwang-noaa in #1520.

@edwardhartnett
Contributor Author

OK, I have been looking into this a bit.

I have been looking at file gfs.t00z.sfcf024.nc.

This file is 1087655544 bytes, or 1.09 GB. It is a netCDF-4 classic model file with 153 vars: 5 coordinate vars and 148 NC_FLOAT data vars. The data vars each have a size of 1 x 1536 x 3072. (That is 4718592 values, or 18874368 bytes ~ 19 MB per var uncompressed.)

Here's a typical example of a data var:

	float albdo_ave(time, grid_yt, grid_xt) ;
		albdo_ave:long_name = "surface albedo" ;
		albdo_ave:units = "%" ;
		albdo_ave:missing_value = 9.99000009817369e+20 ;
		albdo_ave:cell_methods = "time: point" ;
		albdo_ave:output_file = "sfc" ;
		albdo_ave:_Storage = "chunked" ;
		albdo_ave:_ChunkSizes = 1, 768, 1536 ;
		albdo_ave:_DeflateLevel = 1 ;
		albdo_ave:_Shuffle = "true" ;
		albdo_ave:_Endianness = "little" ;

Note that the shuffle filter is in use, and deflate level is set to 1.

@edwardhartnett
Contributor Author

@junwang-noaa in the other discussion you say:

I have two sample data files on the EMC ftp site at:

https://ftp.emc.ncep.noaa.gov/EIB/sample/gfsv16_compressed_netcdf/gfs.t00z.atmf024.nc
https://ftp.emc.ncep.noaa.gov/EIB/sample/gfsv16_compressed_netcdf/gfs.t00z.sfcf024.nc

The atmf file is compressed and the sfcf is not. The sequential read of the two files takes about 5 minutes, while the previous parallel read of the plain binary files takes 20s.

I have looked at the sfcf file, and the data are compressed (deflated). Do I have something incorrect here?

@edwardhartnett
Contributor Author

Some quick results from the surface data file:

  • Reading the original netCDF4 compressed file takes 8 s.
  • Reading a classic version of the same file (uncompressed) takes 4 s.
  • Reading a netCDF-4 uncompressed version of the file takes 1.5 s.

So compression is a significant factor here. However, it brings considerable benefit. Uncompressed, the surface file is 2.87 GB. Compressed it is only 1.09 GB.

@junwang-noaa

junwang-noaa commented Nov 20, 2019 via email

@edwardhartnett
Contributor Author

I find no indication of lossy compression in the file. Here's what I am seeing:

	float cld_amt(time, pfull, grid_yt, grid_xt) ;
		cld_amt:long_name = "cloud amount" ;
		cld_amt:units = "1" ;
		cld_amt:missing_value = -1.e+10f ;
		cld_amt:_FillValue = -1.e+10f ;
		cld_amt:cell_methods = "time: point" ;
		cld_amt:output_file = "atm" ;
		cld_amt:max_abs_compression_error = 3.057718e-05f ;
		cld_amt:nbits = 14 ;
		cld_amt:_Storage = "chunked" ;
		cld_amt:_ChunkSizes = 1, 22, 308, 615 ;
		cld_amt:_DeflateLevel = 1 ;
		cld_amt:_Endianness = "little" ;

@WardF WardF self-assigned this Nov 20, 2019
@junwang-noaa

junwang-noaa commented Nov 20, 2019 via email

@edwardhartnett
Contributor Author

edwardhartnett commented Nov 20, 2019

OK, what do you mean by nbits=14 for lossy compression? I see that the deflate level is 1, but the netCDF-4 deflation of vars has nothing to do with nbits... Did you apply some transformation to the data before you wrote it?

Each value is currently stored as a deflated 32-bit floating point. Is that what you intend?

@junwang-noaa

junwang-noaa commented Nov 20, 2019 via email

@edwardhartnett
Contributor Author

OK, thanks for that info.

Here's a chart showing the read rate (MB/s) for various combinations of chunksizes, deflation, and shuffle.

The thing to note here is the gap between the combinations with no deflation (the very high bars) and everything else. As we see, uncompressing the data is a major factor in read rate:

Read Rate for GFS Surface Restart File

@edwardhartnett
Contributor Author

Does the shuffle filter help when compressing these float data? Yes it does, across a wide range of chunksize choices.

Effect of Shuffle Filter on GFS Surface Restart File

I thought that the shuffle filter would make reads slower, but it has the opposite effect:

Read Rate vs  Shuffle Filter for GFS Surface Restart

@junwang-noaa

junwang-noaa commented Nov 20, 2019 via email

@edwardhartnett
Contributor Author

Yes, the surface file is faster and more compressed because you have used the shuffle filter.

I am going to take a look at the atmosphere file today.

So far it looks like netcdf-4 is a good deal faster than netcdf classic, until you turn on compression. There's no free lunch! When you compress the data, it takes extra time to uncompress.

However, you should not be seeing the slowdown that you are. It does not take me anything like 5 minutes to read these files.

Does the GFS do all output in netcdf-4 compressed now?

@edwardhartnett edwardhartnett changed the title Questions of netcdf-4 read performance on GFS restart file Questions of netcdf-4 read performance on GFS surface and atmosphere data files Nov 21, 2019
@edwardhartnett
Contributor Author

I wondered last night whether some buffering might be going on that was impacting the re-read time for these files. So this morning I changed the program to make a copy of the written file and re-read that copy; this will defeat any buffering that is going on.

But the results are the same.

When comparing netCDF-4 and netCDF classic, netCDF-4 is 2 or 3 times faster reading the file.

When compression is turned on, then netCDF-4 is much slower reading the file. But the file is much smaller, which makes it easier to store and also easier to transfer around.

I am continuing to experiment. I will keep this issue updated with my results...

@junwang-noaa

junwang-noaa commented Nov 21, 2019 via email

@edwardhartnett
Contributor Author

Yes, please send me your code.

Here's code that reads the surface file:

/*
  Copyright 2019, UCAR/Unidata
  See COPYRIGHT file for copying and redistribution conditions.

  This program benchmarks the reading of a GFS restart file in
  netCDF-4.

  Ed Hartnett 11/19/19
*/

#include <nc_tests.h>
#include <err_macros.h>
#include <time.h>
#include <sys/time.h> /* Extra high precision time info. */
#include <math.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define FILE_NAME "gfs.t00z.sfcf024.nc"
#define MILLION 1000000
#define NDIM3 3
#define DIMLEN_1 3072
#define DIMLEN_2 1536

/* Prototype from tst_utils.c. */
int nc4_timeval_subtract(struct timeval *result, struct timeval *x,
                         struct timeval *y);
int
main(int argc, char **argv)
{
    printf("Benchmarking GFS restart file.\n");
    printf("Reading a GFS restart file...\n");
    {
        int ncid;
        int ndims, nvars, ngatts, unlimdimid;
        char name[NC_MAX_NAME + 1];
        size_t dimlen[NDIM3];
        float *data;

        struct timeval start_time, end_time, diff_time;
        float read_us;

        int d, v;

        /* Start timer. */
        if (gettimeofday(&start_time, NULL)) ERR;

        /* if (nc_set_chunk_cache(DIMLEN_1 * DIMLEN_2, 10, .75)) ERR; */

        /* Open the file. */
        if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;

        if (nc_inq(ncid, &ndims, &nvars, &ngatts, &unlimdimid)) ERR;
        printf("ndims %d nvars %d ngatts %d unlimdimid %d\n", ndims, nvars,
               ngatts, unlimdimid);

        /* Check dims. */
        for (d = 0; d < ndims; d++)
        {
            if (nc_inq_dim(ncid, d, name, &dimlen[d])) ERR;
            printf("read dimid %d name %s len %ld\n", d, name, dimlen[d]);
        }

        /* Allocate storage for one timestep of a 3D var. */
        if (!(data = malloc(DIMLEN_1 * DIMLEN_2 * sizeof(float)))) ERR;

        /* Read var data. */
        for (v = 0; v < nvars; v++)
        {
            nc_type xtype;
            int natts;
            int dimids[NDIM3];
            int nvdims;

            if (nc_inq_var(ncid, v, name, &xtype, &nvdims, dimids, &natts)) ERR;

            /* Skip reading the coord vars. */
            if (nvdims != 3 || xtype != NC_FLOAT)
                continue;
            printf("reading var %s xtype %d nvdims %d dimids %d %d %d\n", name,
                   xtype, nvdims, dimids[0], dimids[1], dimids[2]);

            /* if (nc_set_var_chunk_cache(ncid, v, DIMLEN_1 * DIMLEN_2, 10, 0)) ERR; */
            if (nc_get_var_float(ncid, v, data)) ERR;
        }

        /* Free data storage. */
        free(data);

        /* Close the file. */
        if (nc_close(ncid)) ERR;

        /* Stop timer. */
        if (gettimeofday(&end_time, NULL)) ERR;
        if (nc4_timeval_subtract(&diff_time, &end_time, &start_time)) ERR;
        read_us = (float)diff_time.tv_sec + (float)diff_time.tv_usec / MILLION;
        printf("reading took %g seconds.\n", read_us);
    }
    SUMMARIZE_ERR;
    FINAL_RESULTS;
}

@junwang-noaa

junwang-noaa commented Nov 22, 2019 via email

@edwardhartnett
Contributor Author

Seems like everything is happening in subroutine read_netcdf_2d_scatter(), but that is not included with the code. Where is it?

@edwardhartnett
Contributor Author

Changing to Fortran should not and will not change the speed vs. the C library. Fortran is just a thin wrapper around the C functions. It just reverses the order of dimensions and converts the 0-based C indexes (like start) to Fortran's 1-based indexes.

@junwang-noaa

junwang-noaa commented Nov 22, 2019 via email

@edwardhartnett
Contributor Author

One good test would be to turn off compression. Can you easily do that?

If so, then we can see how much of your delays are caused by uncompressing the data.

@edwardhartnett
Contributor Author

Seems like you are reading the values, then copying them one by one into another array. Why not just read them to the destination array in one operation?

      do l=1,lm
        do j=1,jm
!         jj=jm-j+1
          jj=j
          do i=1,im
            dummy(i,j,l)=dummy2(i,jj,l)
            if(dummy(i,j,l)==spval_netcdf) dummy(i,j,l)=spval
          end do
        end do
      end do
      end if

@junwang-noaa

junwang-noaa commented Nov 22, 2019 via email

@edwardhartnett
Contributor Author

Jun,

I have not abandoned this, but I need to get my poster together for AGU! ;-)

If you are at AGU, I hope you come by and see me. I'll be in the poster section Wed morning.

@junwang-noaa

junwang-noaa commented Dec 3, 2019 via email

@edwardhartnett
Contributor Author

Yes, netcdf-c does work fine with HDF5-1.10.5, and you would be well advised to upgrade to that if you have not already. (1.10.4 has a parallel I/O bug.)

We cannot yet write compressed data in parallel, but I hope to get that working early in the next year (after AGU).

When you say you plan to switch to netCDF format for GFSv16, what was being used before that? (Or did you mean you are switching from netCDF to netCDF-4?)

Did you try writing without compression to see what kind of performance you see?

@edwardhartnett
Contributor Author

@junwang-noaa what is your NOAA email? Can you email me at Edward.Hartnett@noaa.gov?

@edwardhartnett
Contributor Author

Much progress has been made here, and we are working this issue on other GitHub projects. I will close this issue.
