Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address Windows and MacOS s3 issues #2759

Merged
merged 6 commits into from
Oct 16, 2023
Merged

Conversation

WardF
Copy link
Member

@WardF WardF commented Oct 2, 2023

A small mistake in logic was resulting in Amazon S3 SDK libraries not being properly linked against, even if found on the system. There are additional steps required for Windows (Visual Studio) builds, so this PR is not yet ready. See

We are also seeing strange hangs on MacOS and Windows.

Todo

  • Add changes required for Windows builds.
  • Update documentation as need be.
  • Update CI to include testing for AWS S3 to catch this sort of thing.

@WardF WardF added this to the 4.9.3 milestone Oct 2, 2023
@WardF
Copy link
Member Author

WardF commented Oct 2, 2023

@DennisHeimbigner I'm seeing an interesting error now on Windows.

I'm building without S3 support.

When running ctest, I see this:

217/220 Test #217: nczarr_test_run_quantize ..................***Failed    0.44 sec
WARN: Could not open file: /cygdrive/C/Users/wardf/.aws/config
WARN: Could not open file: /cygdrive/C/Users/wardf/.aws/credentials
WARN: AWS config file not loaded

*** Testing netcdf-4 variable quantization functions.
**** testing with NC_NETCDF4...
        **** testing quantization setting and error conditions...
                **** testing quantize algorithm 1...
Sorry! Unexpected result, C:\Users\wardf\Desktop\gh2759\netcdf-c\build\nczarr_test\test_quantize.c, line: 132

I don't have cygwin installed, so I'm not sure why it's checking /cygdrive/C. Any idea what's going on? Is the quantization failure related to these AWS warnings?

@DennisHeimbigner
Copy link
Collaborator

The aws warnings are just informational. But I am surprised you are seeing them.
I must have set some debug info and forgot to turn it off.
I will see if I can find it.

The path conversion code operates by converting the given path into a canonical path.
At the moment that canonical path is the cygwin representation. That is probably why
you are seeing a cygwin file path.
I can change it to some other canonical form with little effort.
But the proper change is to convert the canonical to the local platform form.

@WardF
Copy link
Member Author

WardF commented Oct 3, 2023

@DennisHeimbigner is there a way to have the path conversion only use cygwin representation if you are running under cygwin? I'm also seeing some new failures as a result of the PR we merged; for some reason, plugins aren't being found under MSYS, now. I think we should put a pause on other PR's until we get this sorted out, which hopefully we can do in short order.

@WardF
Copy link
Member Author

WardF commented Oct 3, 2023

If there is other code silently looking for /cygdrive/c, then that would explain pathing issues, since no such path exists in my environment.

@DennisHeimbigner
Copy link
Collaborator

That should not be the case. The use of the canonical path should only occur inside the path
conversions software; callers should never see it (except apparently for error messages generated
by the path conversion software).

@DennisHeimbigner
Copy link
Collaborator

The reason I use the canonical path idea is this.
Given an input path, I determine its format
and convert to the canonical form. Then I apply any necessary modifications
to the canonical form. Finally, when I am ready to produce and output path,
I figure out what the desired output format should be (usually the local
platform format) and convert from canonical form to the output format.
This is just another example of the old UNCOL concept.

@WardF
Copy link
Member Author

WardF commented Oct 3, 2023

@DennisHeimbigner that makes sense, but it feels like if it's determining the path should be /cygdrive/c on my machine, that is incorrect, as there exists no /cygdrive/ mount/mapping, nor any cygwin install.

@WardF
Copy link
Member Author

WardF commented Oct 3, 2023

@DennisHeimbigner I think I may have something figured out re: pathing on my system, so stand by while I see if I can narrow down that list of failures. The debug/diagnostic statements will need to be fixed but we can track that down easily enough.

@DennisHeimbigner
Copy link
Collaborator

@DennisHeimbigner that makes sense, but it feels like if it's determining the path should be /cygdrive/c on my machine, that is incorrect, as there exists no /cygdrive/ mount/mapping, nor any cygwin install.

The canonical path is supposed to never be used to actually access the file system.
Think of it this way. When I am given an input path, I have "decompose" it into two parts:
(1) a drive letter (so we can support running on a windows platform) and (2) the path relative to that drive.
For linux, the drive part does not exist, of course.
Given those two parts, we can then reassemble the input path into the output path by recombining
parts 1 and 2 into a complete path suitable for the platform on which we are running.
So for windows, we output :, for cygwin we output /cygdrive//,
for msys we output //, for linux, we output just /.
So, internally we need a way to represent the pair (drive,path) and be able to pass it around in the
path conversion code. There are two ways to do that.

  1. create a struct {char drive; char* path} type
  2. encode the drive + path in a canonical string.
    I chose the latter (possibly a mistake) and I chose to represent the drive+path as the string "/cygdrive//", which just happened to look like a cygwin path.
    I could have chosen many other canonical forms. For example, I could have used
    the form ":/path" or ";/path" or "{,/}", etc.
    Since the use of the cygwin form is confusing, I will refactor the code to use either the struct or
    some other string-based format. Do you have any preference?

@WardF
Copy link
Member Author

WardF commented Oct 3, 2023

@DennisHeimbigner I'm starting to figure out some of the differences for the tests that are failing and those that are passing. One difference is this:

Tests that are passing use a file path that looks like this:

ncdump 'file://C:/Users/wardf/Desktop/gh2759/netcdf-c/dap4_test/rawtestfiles/test_zerodim.nc?dap4.checksum=false#log&dap4'

Tests that fail use a file path that looks like this:

ncdump file:///c/Users/wardf/Desktop/gh2759/netcdf-c/dap4_test/rawtestfiles/test_zerodim.nc?dap4.checksum=false#log&dap4

Note the difference between file://C/ and file:///c/. I'm having a bit of trouble navigating the various nested shell scripts; where are the paths being constructed, specifically?

@DennisHeimbigner
Copy link
Collaborator

That info looks promising.
The "file://C/ " form is technically illegal, although as I recall, I make provision for it anyway.
The "file://C:/Users/wardf/Desktop/gh2759/netcdf-c/dap4_test/" prefix in the URL probably
comes from the value of something like ${abs_top_srcdir}.

@WardF
Copy link
Member Author

WardF commented Oct 3, 2023

@DennisHeimbigner A specific example. In netCDF-C 4.9.2, the test script ncdap_test/tst_ncdap3.sh calculates path names in the format of file://C:/Users/wardf/(...). This works.

In the current main branch, the test script ncdap_test/tst_ncdap3.sh calculates path names in the format of file:///c/users/wardf. So this suggests something has changed, especially since it's only in some of the test scripts. Most of the test scripts still work just fine.

@DennisHeimbigner
Copy link
Collaborator

I think the problem is that the dpathmgr.c code does not properly convert an msys path.
Try some experiments under msys with the program ncdump/ncpathcvt; it is like cygpath.
It usage is "ncpathcvt -X where -X is the desired output format.
ncpathcvt -h will show complete options.
ANyway try a couple of examples and see if the output looks right to you.

  1. ncpathcvt -c /d/xxx/yyy
  2. ncpathcvt -w /d/xxx/yyy
  3. ncpathcvt -m /d/xxx/yyy

@WardF
Copy link
Member Author

WardF commented Oct 4, 2023

I'm back to looking at this, is there any obvious reason why this behavior would have changed from v4.9.2 where it worked, for a small subset of tests? The path conversion is still being properly invoked in the majority of the scripts.

@DennisHeimbigner
Copy link
Collaborator

Not that I can think of.

@WardF
Copy link
Member Author

WardF commented Oct 4, 2023

So ncpathcvt -k is identifying my MSYS environment as win instead of msys. This feels like a starting point, perhaps the method being used to identify the environment has broken with later releases of MSYS? I'll investigate.

@DennisHeimbigner
Copy link
Collaborator

So ncpathcvt -k is identifying my MSYS environment as win instead of msys. This feels like a starting point, perhaps the method being used to identify the environment has broken with later releases of MSYS? I'll investigate.

Am I correct in thinking that MSYS accepts windows paths? I may have taken a shortcut to just always
treat MSYS as windows when outputting.

@WardF
Copy link
Member Author

WardF commented Oct 5, 2023 via email

@DennisHeimbigner
Copy link
Collaborator

Then the path conversion stuff should (famous last words) work for msys.
BTW, I made a mistake. The URL format "file://C:/..." is legal.

@DennisHeimbigner
Copy link
Collaborator

Does the same error occur in the MSYS github action?

@WardF
Copy link
Member Author

WardF commented Oct 10, 2023

The shell scripts are working, so great progress has been made. The canonical path code is still trying to pass cygwin-style paths to open the AWS credentials under MSYS2, which fails because MSYS requires Windows-style paths.

@WardF
Copy link
Member Author

WardF commented Oct 10, 2023

We are nearly there with the internal S3 API; as soon as I sort out the canonical path in MSYS trying to load credentials via a *nix style path instead of a Windows style path, we will have S3 functionality working using the internal API. We'll then have that in place while we sort out the befuddling issues around the aws s3 sdk.

@DennisHeimbigner
Copy link
Collaborator

Can you try the following?

  1. Edit the file libdispatch/dpathmgr.c
  2. Modify the function NCpathcvt by inserting something like this just before the final "return result".
    fprintf(stderr,">>> ncpathcvt: inpath=%s result=%s\n",inpath,result)

Then capture all the output from that fprintf and post it.

@WardF
Copy link
Member Author

WardF commented Oct 11, 2023

@DennisHeimbigner Yes. I'm using the same credentials which work under Linux and MacOS using the same options.

@DennisHeimbigner
Copy link
Collaborator

Ok, next step:

export NCTRACING=15

then rerun one of the failing tests and post the output.

@WardF
Copy link
Member Author

WardF commented Oct 11, 2023

@DennisHeimbigner here's the result of nczarr_test/run_notzarr:

@DennisHeimbigner
Copy link
Collaborator

Can you insert a "set -x" at the front of the script nczarr_test/run_notzarr.sh
and post that output

@WardF
Copy link
Member Author

WardF commented Oct 12, 2023

@DennisHeimbigner
Copy link
Collaborator

I see at least one problem:

215: warning: -k option does not start with '/': C:/msys64/netcdf-c/testset_1697062772/testdir_notzarr/notzarr.s3/notzarr.txt

The -k option should look like an S3 key, something like this:
netcdf-c/testset_1697062772/testdir_notzarr/notzarr.s3/notzarr.txt
Instead, the above warning appears to show the key has been passed thru ncpathcvt to convert it to a path, which will cause various kinds of failures including the auth failure that actually shows.
So now I need to figure out where it is being converted, and why it is not appearing in the debug output for the ncpathcvt function.

@WardF
Copy link
Member Author

WardF commented Oct 12, 2023

For what it's worth, and may bear repeating/keeping in mind. Working in the MSYS bash shell, *nix style paths are used. But when programs are working with the filesystem, Windows-style paths are Required. No idea if it's relevant, but just in case that's useful information, there you go. I'll also try to dig into it.

@WardF
Copy link
Member Author

WardF commented Oct 13, 2023

@DennisHeimbigner just a quick check-in, I'm running the run_notzarr on MacOS which reports the test passing. I'm generating the same verbose output, and I'm seeing failures in the test script itself. I'm inferring that these are expected failures? The script itself reports success. Does everything look as it should, here?

@WardF
Copy link
Member Author

WardF commented Oct 13, 2023

So it looks like prepending an additional / in the script when passing the argument to -k fixes the issue. At least, s3util does not complain about the lack of a leading / any more, and the run_notzarr.sh script returns success (I'm still uncertain about the output being generated re: failures being expected or not). The additional / does not cause any changes in behavior on linux or macos on my end. I'm wondering if this is a string parsing issue with the underlying bash/sh interpreter.

@WardF
Copy link
Member Author

WardF commented Oct 13, 2023

@DennisHeimbigner The failure for nczarr_test_run_interop looks to be different than the notzarr one. Does anything leap out at you? But and run using the same verbose output changes.

…RNAL and WITH_S3_TESTING on Windows and MSYS2.
@WardF
Copy link
Member Author

WardF commented Oct 13, 2023

Similar to run_notzarr.sh, I'm seeing a lot of errors in run_interops.sh on MacOS that do not prevent the test from returning passed. I'm a little concerned that the success may be a false positive, unless these are expected failures. The shell script itself doesn't indicate that there should be expected failures, so I'm uncertain.

@DennisHeimbigner
Copy link
Collaborator

If a failure is expected, then it should be noted in the script; usually by the phrase XFAIL.
Make sure that there is a "set -e" command near the front of the script.
Can you put a "set -x" into run_interop and post the output? I want to see if the errors
are similar to the one in run_notzarr.sh

Similar to run_notzarr.sh, I'm seeing a lot of errors in run_interops.sh on MacOS that do not prevent the test from returning passed. I'm a little concerned that the success may be a false positive, unless these are expected failures. The shell script itself doesn't indicate that there should be expected failures, so I'm uncertain.

@DennisHeimbigner
Copy link
Collaborator

BTW if the errors you are seeing are like these:

Exit: (11): NC_s3sdkinfo: err=(-139) 'NetCDF: Attempt to read empty NCZarr map key':len=0

then those are normal.
The code often probes the zarr dataset on s3 to see if an object exists or not.
So, it can tell by looking at the return error code.

@DennisHeimbigner
Copy link
Collaborator

216: C:\Users\wardf\Desktop\gh2759\netcdf-c\build\ncdump\ncdump.exe: C:/msys64/group_with_dims/var2D: No such variable

I think this is exactly the same error as with run_notzarr. The path "C:/msys64/group_with_dims/var2D"
is almost certainly supposed to be an S3 key path; it should be "/group_with_dims/var2D"

@DennisHeimbigner
Copy link
Collaborator

What I see is that the program s3util is called with:

...  -k /netcdf-c/testset_1697062772/testdir_notzarr/notzarr.s3/notzarr.txt ...

But the value seen by the program is in effect that -k value prefixed with "C:/msys64"
I have vague recollection that the msys shell sometimes modifies the arguments to a program
because it thinks it is file path and needs to be made an absolute windows path.
I believe there was an old issue whose solution turned on this "improvement" :-) by msys shell.
There must be some way to turn off this msys modification for a given argument: perhaps by
ensuring it is surrounded by double quotes or something.
Does this ring any bells?

@DennisHeimbigner
Copy link
Collaborator

I found this: msys2/MSYS2-packages#2316
you may have to insert

export MSYS2_ARG_CONV_EXCL='*'

into the file netcdf-c/test_common.in

@DennisHeimbigner
Copy link
Collaborator

@WardF
Copy link
Member Author

WardF commented Oct 13, 2023

That is good to know; thanks, @DennisHeimbigner. That may obviate a lot of these and future issues.

@WardF
Copy link
Member Author

WardF commented Oct 13, 2023

Interestingly enough, MSYS2_ARG_CONV_EXCL='*' causes 3 other tests which had been passing to fail. I'll keep digging into this but I'm hoping we're in a spot by next week to move on to the other pending PR's.

@WardF WardF marked this pull request as ready for review October 16, 2023 22:47
@WardF WardF merged commit 155ee93 into Unidata:main Oct 16, 2023
99 checks passed
@WardF WardF deleted the fix-s3-try2.wif branch October 16, 2023 22:48
@DennisHeimbigner
Copy link
Collaborator

Did you solve these three tests?

Interestingly enough, MSYS2_ARG_CONV_EXCL='*' causes 3 other tests which had been passing to fail. I'll keep digging into this but I'm hoping we're in a spot by next week to move on to the other pending PR's.

Copy link
Collaborator

@DennisHeimbigner DennisHeimbigner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised by the prefixing of "/" to S3ISOPATH.
What problem was it fixing?

@WardF
Copy link
Member Author

WardF commented Oct 17, 2023

@dennis the issue with the path conversion complaining about a lack of a leading '/'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants