Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_dir has unexpected behavior for "" #114149

Closed
justinlovinger opened this issue Jul 27, 2023 · 19 comments
Closed

read_dir has unexpected behavior for "" #114149

justinlovinger opened this issue Jul 27, 2023 · 19 comments
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@justinlovinger
Copy link

Path::parent returns Some("") for relative paths like foo.txt, but read_dir throws an error when given "", so code like if let Some(parent) = path.parent() { std::fs::read_dir(parent) } fails for files in the working directory.

Additionally, calling read_dir on a relative file without a ./ prefix returns file paths without ./ prefixes, but there is no way to achieve similar behavior for the working directory because read_dir("") throws an error. read_dir(".") returns file paths with ./ prefixes and is not a solution.

This was tested on Unix, and may be Unix-specific, but I lack a Windows machine to easily test against.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 27, 2023
@ilyvion
Copy link

ilyvion commented Jul 28, 2023

The behavior of Path::parent() you're describing is documented:

Returns the Path without its final component, if there is one.

This means it returns Some("") for relative paths with one component.

Returns None if the path terminates in a root or prefix, or if it's the empty string.

On (my) Windows, read_dir("") appears to read the current directory. This difference in behavior is also somewhat accounted for, under the "Platform-specific behavior" header:

This function currently corresponds to the opendir function on Unix and the FindFirstFile function on Windows.

Though if anything, I find the Windows behavior to be the one that's odd of the two. "" is not a meaningful path. I'm having a hard time imagining a scenario where having paths prefixed by ./ could be problematic when APIs like Path::file_name exist.

@ChrisDenton
Copy link
Member

Tbh, I would agree this looks like a bug in the Windows implementation of read_dir and should be a simple fix.

@ChrisDenton ChrisDenton added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 28, 2023
@justinlovinger
Copy link
Author

I'm having a hard time imagining a scenario where having paths prefixed by ./ could be problematic when APIs like Path::file_name exist.

Paths prefixed by ./ are a problem for a utility I am writing that does a lot of high-performance string parsing on paths. It would need extra logic to account for the . in ./ not being a directory, where every other string followed by / is a directory.

The behavior of Path::parent() you're describing is documented:

I do not consider the behavior of Path::parent() to be unexpected. It is the interaction with read_dir that I consider a problem. I expect to always be able to read the contents of the directory containing a file by calling read_dir on path.parent() for the path of that file.

Tbh, I would agree this looks like a bug in the Windows implementation of read_dir and should be a simple fix.

Do you mean a bug in the Unix implementation?

@ChrisDenton
Copy link
Member

No I meant the Windows implementation. Empty paths are always considered invalid. The fact that it isn't in this one implementation of this one function was not intentional.

@ChrisDenton ChrisDenton added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 28, 2023
@ChrisDenton
Copy link
Member

As this is a request to change the behaviour of read_dir to accept empty paths as meaning the current directory, I've tagged this as waiting on a libs-api decision.

@justinlovinger
Copy link
Author

Empty paths are always considered invalid.

If empty paths are always considered invalid, why does Path::parent sometimes return an invalid path?

@ChrisDenton
Copy link
Member

Sorry, I was imprecise. It's invalid at the point of use. OS APIs will reject this but due to a quirk in the Windows API we're using it happens to work in that case (the string we pass to the OS is never empty).

@ChrisDenton
Copy link
Member

This was discussed in a recent libs-api meeting. There was broad agreement that the failure of if let Some(parent) = path.parent() { std::fs::read_dir(parent) } is a problem.

However, there was no consensus on the proposed solution to interpret an empty path as the current directory for read_dir. It was decided that discussion should continue here.


There were also a few related things discussed:

  • Path::parent has some quite weird and somewhat inconsistent behaviour. Depending on the path, it can sometimes return . and other times return the empty path. Some felt it should return an enum rather than an Option but we can't change that now in any case.
  • DirEntry::path creates a path by joining the file name to the path that was given to read_dir. It is therefore possible to do the same manually using different logic.
  • There was discussion on if read_dir is special or if the proposed solution should be generalized to all filesystem APIs. I.e. the empty path becomes generally accepted as meaning the same as ..

@m-ou-se
Copy link
Member

m-ou-se commented Aug 15, 2023

I personally think that read_dir("") should list the current directory (on all platforms). It's true that the underlying opendir("") (on Unix) gives 'not found' for "", but our read_dir functionality is more than just a wrapper around the syscall: it also Path::join()s the returned file names to obtain a full path as part of DirEntry::path(), which is why I think we need to support "" to support unprefixed paths (i.e. that won't start with ./).

I agree that if we see read_dir(x) as "give me all files in directory x", it feels wrong for "" to refer to the current directory. But I think we should instead see read_dir(x) as "give me all files with parent path x", which comes down to pretty much the same thing, except it makes the situation with "", ".", "./." etc. clear:

  • read_dir("") gives "Cargo.toml", "src", etc.
  • read_dir(".") gives "./Cargo.toml", "./src", etc.
  • read_dir("./.") gives "././Cargo.toml", "././src", etc.

In all cases, the .parent() of the returned paths is exactly the input.

Perhaps it's just weird that Path::parent() returns Some("") for filenames without a parent, but that's not something we can really change. Assuming this will stay unchanged, I think it makes perfect sense for read_dir("") to give the files for which the parent is "": the files in the current directory.


In the team meeting, I was asked: "why not just use .file_name() instead of .path() if you don't want the ./ prefix?" My reasoning is that I might want to write code that works for both the current directory and a subdirectory, without adding unnecessary ./ prefixes:

for parent in ["", "src"] {
    for f in std::fs::read_dir(parent)? {
        println!("{}", f?.path().display());
    }
}

(Output: Cargo.toml, target, src, src/main.rs, etc.)

I could of course just use .file_name() and manually .join() that to the parent path, but that seems a bit odd given that we already have .path() on the DirEntry.

The follow-up question was why not just canonicalize or normalize the path before using it, to get rid of the ./ prefixes, to which I answered that I think there is value in preserving the exact parent path from the input, even if it contains redundant parts like ./ or a/../. For example, if I run a tool with --manifest-path a/../b/./Cargo.toml and it goes to list the files in the same directory as that Cargo.toml, I'd expect any messages/errors about other files to exactly start with a/../b/./, without any transformations or symlink resolutions. Similarly, the ./ should be preserved if the argument was ./Cargo.toml, and no prefix should be added the input was just Cargo.toml.


And last but not least, read_dir("") already works (lists the current directory) on Windows. Although that appears to have happened by accident, fixing that bug and breaking the behaviour on Windows might be more trouble than making the same work on other platforms.

That said, I'm curious to hear more about potential downsides. Will accepting "" for the current directory lead to any subtle bugs, for example?

@ChrisDenton
Copy link
Member

Btw, @BurntSushi should probably be specifically cc'd here, both as a libs-api member and as the maintainer of the popular walkdir crate. I think whatever is decided here should also be reflected in the wider ecosystem and it'd be nice to coordinate any changes if possible.

@justinlovinger
Copy link
Author

I would also like to note, I first became aware of this issue because read_dir("") worked in FakeFileSystem from the filesystem crate, so at least some parts of the ecosystem were already built under the assumption read_dir("") works.

@the8472
Copy link
Member

the8472 commented Sep 4, 2023

I personally think that read_dir("") should list the current directory (on all platforms). It's true that the underlying opendir("") (on Unix) gives 'not found' for "", but our read_dir functionality is more than just a wrapper around the syscall: it also Path::join()s the returned file names to obtain a full path as part of DirEntry::path(), which is why I think we need to support "" to support unprefixed paths (i.e. that won't start with ./).

That said, I'm curious to hear more about potential downsides. Will accepting "" for the current directory lead to any subtle bugs, for example?

All other APIs that take a path do error on "" and being able to read_dir() a path but not being able to call canonicalize, symlink_metadata or similar at the same time would be confusing and require special handling anyway.
Being able to pass the same path to more than just read_dir is important for anything that has more complex preconditions than "can I open it".

But I think we should instead see read_dir(x) as "give me all files with parent path x"

I think that's more the semantics that I'd ascribe to something like walkdir, not a shallow enumeration. ReadDir does not provide all files under this prefix.

read_dir("") gives "Cargo.toml", "src", etc.

Well, except that this errors on unix. So it never worked portably, so there shouldn't be any software relying on this behavior.

In the team meeting, I was asked: "why not just use .file_name() instead of .path() if you don't want the ./ prefix?" My reasoning is that I might want to write code that works for both the current directory and a subdirectory, without adding unnecessary ./ prefixes:

for parent in ["", "src"] {
   for f in std::fs::read_dir(parent)? {
       println!("{}", f?.path().display());
   }
}

This seems like a somewhat narrow case. Anything more complex and I'd already reach for walkdir.
And it is also a merely aesthetic issue, src and ./src both are correct as far as filesystem APIs are concerned.
So if prettyfication is needed that can be solved with a method that strips leading ./ when it is ok to do so.
Ideally strip_prefix would be less strict and act more like the relative_to method that some other language libraries provide then it cold be used for this purpose, alas...


Overall I think read_dir behaves consistently with all other FS APIs here (on unix) and windows is the odd one out.

While path.parent() returning Some("") is kind of weird but I assume this is meant to distinguish relative paths from absolute ones. I think it would have been better to return an enum that distinguishes the reached-absolute-root vs. reached-relative-root cases. But I'll file this under the general "Path is a pile of small mistakes" category, not as a reason to adjust read_dir.

@ChrisDenton
Copy link
Member

ChrisDenton commented Sep 5, 2023

For whatever it's worth, I went looking at other standard libraries and tools to see what they do and they mostly seem to treat "" as an invalid path for their read_dir equivalents (e.g. bash ls "" or C++ filesystem). There are some special cases though.

Python's pathlib.Path type automatically converts "" to . but this is not specific to listing a directory. It applies to constructing the Path, which has the weird effect of os.listdir("") being an error while os.listdir(Path("")) works.

Powershell will error for ls -LiteralPath "". But ls -Path "" defaults to the current directory. However, it takes a pattern (e.g. wildcards like *) so it's not quite the same as Rust's API. I'd assume (but don't know) that the reason for this was the same as the reason Rust-on-Windows accepts the empty path (it defaults to appending * before passing to the underlying OS API).

I can also think of a few examples in other contexts where the empty path is treated as the current directory. I.e. Solaris treats an empty symlink the same as . and an empty in the unix PATH environment variable is also often taken as the current directory. Though again, these aren't quite the same thing.

I've not fully surveyed every language so it's possible I'm missing some precedent.

@BurntSushi
Copy link
Member

I do personally find it odd to treat the empty string as equivalent to the current working directory. The empty string isn't a valid path, so it makes sense to me that it would return an error. The interaction with path.parent() seems unfortunate, but I don't think it's bad enough to warrant making read_dir("") work.

If we did end up deciding to make read_dir("") work despite me not signing off on it, I would still make the change to walkdir to ensure it's consistent. (That is, I won't hold inconsistency with walkdir as a hostage to avoid making the change in the first place.)

@Amanieu
Copy link
Member

Amanieu commented Oct 10, 2023

We discussed this in the libs-api meeting today and the team was still somewhat split on the issue. With that in mind, and taking @BurntSushi's comments into account, we decided to not add support for treating "" as the current directory.

As a consequence #116606 will make Windows match the current behavior on Unix.

@rfcbot fcp close

@rfcbot
Copy link

rfcbot commented Oct 10, 2023

Team member @Amanieu has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Oct 10, 2023
@rfcbot
Copy link

rfcbot commented Oct 12, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

harrysarson-signaloid added a commit to harrysarson-signaloid/this-week-in-rust that referenced this issue Oct 19, 2023
The issue rust-lang/rust#114149 has an FCP with disposition "close" (and not "merge"). Update the text to account for this
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 22, 2023
@rfcbot
Copy link

rfcbot commented Oct 22, 2023

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

bors added a commit to rust-lang-ci/rust that referenced this issue Oct 23, 2023
On Windows make `read_dir` error on the empty path

This makes Windows consistent with other platforms. Note that this should not be taken to imply any decision on rust-lang#114149 has been taken. However it was felt that while there is a lack of libs-api consensus, we should be consistent across platforms in the meantime.

This is a change in behaviour for Windows so will also need an fcp before merging.

r? libs-api
@dtolnay
Copy link
Member

dtolnay commented Oct 23, 2023

Fixed by #116606.

@dtolnay dtolnay closed this as completed Oct 23, 2023
U007D pushed a commit to rust-lang/this-week-in-rust that referenced this issue Oct 26, 2023
The issue rust-lang/rust#114149 has an FCP with disposition "close" (and not "merge"). Update the text to account for this
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests