
better detection of corrupted HDF5 file headers #405

Merged: 7 commits into v4.5.0-release-branch, Jul 27, 2017

Conversation

DennisHeimbigner
Collaborator

re e-support (UBS-599337)
Ncdump is aborting when reading a damaged HDF5 file.
The problem is that the file magic number test code
in both nc4file.c and dfile.c does not test the full
HDF5 magic header. It only looks for "XHDF",
where X is any character. The test should be for the full
eight bytes of "\211HDF\r\n\032\n".

Also, to avoid inconsistency, I have removed the check code from
nc4file.c and made it use NC_check_file_type in dfile.c.

Also added a test case.
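The fix described above amounts to comparing all eight signature bytes rather than just the middle four. A minimal sketch of such a check (the helper name here is hypothetical; the real logic lives in NC_check_file_type in libdispatch/dfile.c):

```c
#include <stddef.h>
#include <string.h>

/* The full 8-byte HDF5 signature: \211 H D F \r \n \032 \n.
 * The old code only matched bytes 1..4 ("XHDF"), so a file with a
 * damaged first, or trailing, signature byte slipped through. */
static const unsigned char HDF5_MAGIC[8] =
    {0x89, 'H', 'D', 'F', 0x0D, 0x0A, 0x1A, 0x0A};

/* Return 1 if buf begins with the complete HDF5 signature, else 0. */
static int looks_like_hdf5(const unsigned char *buf, size_t len)
{
    if (len < sizeof(HDF5_MAGIC)) return 0;
    return memcmp(buf, HDF5_MAGIC, sizeof(HDF5_MAGIC)) == 0;
}
```

With this check, a corrupted header such as "XHDF..." fails the comparison and the library can return NC_ENOTNC instead of crashing.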


In any case clean up the NC_check_file_type code to correctly
search for hdf5 magic number.

Misc. other changes.
1. Clean up the NAN definition code in ncgen
2. add -N flag to ncgen to force the netCDF dataset name in the output.
3. reduce size of ncdump/test_corrupt_magic.nc
4. remove calls to nc_finalize in various test programs in nc_test4
5. nc_test/run_diskless2 was not using test_common
@DennisHeimbigner
Collaborator Author

Further header changes:
The NC_check_file_type code does not do the forward search required by hdf5 files. It currently only
looks at file position 0 instead of 512, 1024, 2048,... Also, it turns out that the HDF4 magic number is
assumed to always be at the beginning of the file (unlike HDF5).
Change is localized to libdispatch/dfile.c
See https://support.hdfgroup.org/release4/doc/DSpec_html/DS.pdf
HatTip: cwardgar

This tests that hdf5 files with an offset are
properly recognized.
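The forward search can be sketched as a loop that probes offset 0 and then successive power-of-two multiples of 512, per the HDF5 format's user-block rules (the function name is hypothetical; this is not the actual dfile.c code):

```c
#include <stdio.h>
#include <string.h>

/* Search for the HDF5 superblock signature at offsets 0, 512, 1024,
 * 2048, ... (each subsequent offset doubles), since the HDF5 file
 * format allows a user block of any power-of-two size >= 512 before
 * the superblock. Returns the offset on success, -1 if not found. */
static long find_hdf5_superblock(FILE *fp)
{
    static const unsigned char magic[8] =
        {0x89, 'H', 'D', 'F', 0x0D, 0x0A, 0x1A, 0x0A};
    unsigned char buf[8];
    long offset = 0;
    for (;;) {
        if (fseek(fp, offset, SEEK_SET) != 0) return -1;
        if (fread(buf, 1, sizeof(buf), fp) != sizeof(buf))
            return -1;                 /* ran past EOF: not HDF5 */
        if (memcmp(buf, magic, sizeof(magic)) == 0)
            return offset;             /* signature found */
        offset = (offset == 0) ? 512 : offset * 2;
    }
}
```

HDF4, by contrast, keeps its magic number at offset 0 only, so no such loop is needed for it.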
@DennisHeimbigner
Collaborator Author

Hold this pull request until 4.5 is released

@WardF WardF added this to the 4.5 milestone Jul 6, 2017
@WardF WardF self-assigned this Jul 6, 2017
@WardF
Member

WardF commented Jul 6, 2017

Some conflicts, @DennisHeimbigner will resolve conflicts, merge into 4.5.0 release branch instead of master.

@DennisHeimbigner DennisHeimbigner changed the base branch from master to v4.5.0-release-branch July 6, 2017 21:53
@WardF
Member

WardF commented Jul 7, 2017

Working through the pull requests on the branch pr-mass-merge, I'm seeing an error with ncdump/tst_hdf5_offset. Details here:

http://cdash.unidata.ucar.edu/testDetails.php?test=66314&build=3580

@WardF
Member

WardF commented Jul 8, 2017

The Travis CI tests aren't catching this and I'm not sure why. More work on my end to narrow down what's going on. I've observed it on OSX and Linux so far.

@WardF
Member

WardF commented Jul 8, 2017

Ok, somehow on my system, the file L512.txt contains a literal "-n" in addition to the appropriate number of spaces. I have no idea why. I will sort this out.

@WardF
Member

WardF commented Jul 8, 2017

Ok, changing the shell to bash from sh fixed this issue on my end. Is there a reason not to use bash in these circumstances?

@DennisHeimbigner
Collaborator Author

Bash is the right choice. Probably should check other .sh files to see if they
are using sh instead of bash.
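For reference, the sh/bash divergence here is the classic `echo -n` portability trap: some `sh` implementations print `-n` literally instead of treating it as an option, which is exactly how a stray "-n" ends up inside a generated padding file. A `printf`-based sketch avoids it entirely (file name follows the one in the discussion):

```shell
# Some /bin/sh implementations treat "echo -n foo" as printing the
# literal string "-n foo" plus a newline; printf behaves the same
# under sh and bash, so it is the portable way to build padding.
printf '%512s' '' > L512.txt   # exactly 512 space characters, no newline
wc -c < L512.txt               # byte count should be 512
```

Switching the script's shebang to bash also works, but `printf` removes the dependency on which shell runs the script.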

@DennisHeimbigner
Collaborator Author

Perhaps that test case should not try to construct the l64 file but rather
include it as an EXTRA_DIST. I suspect that would fix the problem.

@DennisHeimbigner
Collaborator Author

Modified the offset test case so
that it uses a pre-constructed
binary file rather than creating
it on the fly with echo. This should
ensure that there are no extraneous
newlines or the like.

@WardF
Member

WardF commented Jul 27, 2017

@DennisHeimbigner Did you still want to take a look at these conflicts?

@DennisHeimbigner
Collaborator Author

yes.

@DennisHeimbigner
Collaborator Author

ok, conflicts resolved; not sure why so many spurious modifications, but oh well.

@WardF WardF merged commit 86c3d2b into v4.5.0-release-branch Jul 27, 2017
@WardF
Member

WardF commented Jul 28, 2017

This needs to be reopened and fixed, as it is causing issues with parallel tests, particularly when pnetcdf is enabled. I've reverted the merge locally and will push to github shortly.

See #445 for more info.

@WardF
Member

WardF commented Jul 28, 2017

Ok. Reverting this merge doesn't negate the merge. I think I see how to get back to where we're at so this branch (magic.dmh) can be fixed and re-merged. It needs to be rebased against v4.5.0-release-branch, perhaps? I think?

@WardF
Member

WardF commented Jul 28, 2017

Ok. I'm going to issue a new pull request since this one is closed, but I won't merge until we get these parallel failures sorted out.

DennisHeimbigner added a commit that referenced this pull request Oct 24, 2017
re pull request #405
re pull request #446

Notes:
1. This branch is a cleanup of the magic.dmh branch.
2. magic.dmh was originally merged, but caused problems with parallel IO.
   It was re-issued as pull request #446.
3. This branch and pull request replace any previous pull requests and the magic.dmh branch.

Given an otherwise valid netCDF file that has a corrupted header,
the netcdf library currently crashes. Instead, it should return
NC_ENOTNC.

Additionally, the NC_check_file_type code does not do the
forward search required by hdf5 files. It currently only looks
at file position 0 instead of 512, 1024, 2048,... Also, it turns
out that the HDF4 magic number is assumed to always be at the
beginning of the file (unlike HDF5).
The change is localized to libdispatch/dfile.c See
https://support.hdfgroup.org/release4/doc/DSpec_html/DS.pdf

Also, it turns out that the code in NC_check_file_type is duplicated
(mostly) in the function libsrc4/nc4file.c#nc_check_for_hdf.

This branch does the following.
1. Make NC_check_file_type return NC_ENOTNC instead of crashing.
2. Remove nc_check_for_hdf and centralize all file format checking
   NC_check_file_type.
3. Add proper forward search for HDF5 files (but not HDF4 files)
   to look for the magic number at offsets of 0, 512, 1024...
4. Add test tst_hdf5_offset.sh. This tests that hdf5 files with
   an offset are properly recognized. It does so by prefixing
   a legal file with some number of zero bytes: 512, 1024, etc.
5. Off-topic: Cleaned up handling of NAN and INFINITE in ncgen.
6. Off-topic: Added -N flag to ncdump to force a specific output dataset name.
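Item 4's prefixing step can be sketched with standard tools (file names here are illustrative; the real script is tst_hdf5_offset.sh, and the placeholder payload stands in for an actual netCDF-4 file):

```shell
# Build an "HDF5 with user block" test input by prefixing a file
# with 512 zero bytes, so its superblock lands at offset 512.
printf 'placeholder for a valid netCDF-4 file' > valid.nc  # stand-in payload
dd if=/dev/zero bs=512 count=1 2>/dev/null > offset512.nc
cat valid.nc >> offset512.nc
# ncdump should still recognize offset512.nc as HDF5, because the
# magic-number search now probes offsets 0, 512, 1024, ...
```

Repeating this with 1024-, 2048-, etc. byte prefixes exercises each offset the forward search is expected to probe.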
DennisHeimbigner added a commit that referenced this pull request Oct 24, 2017