Fix Unix FileStatus flag retrieval without perf regression #47348
…' cases, but not failure. Use _secondaryCache and _initializedSecondaryCache.
…yExistIndependentlyOfTarget unit test (link should exist after main file deleted).
Need to preserve the _isDirectory field. The order and the place in which it acquires a value matter. It should also be set to false when EnsureStatInitialized is called. The unit test that verifies this is System.IO.Tests.DirectoryInfo_Exists.SymLinksMayExistIndependentlyOfTarget.
… both usage cases.
….ToFileSystemInfo, since it's the only place where it is used. Move the Debug.Assert to the top with the ROS<char> instead of the string.
Which of the 28 commits here fixes the actual issue? Lots of these look like style changes.
Only a few were style. This is the commit that fixes the hidden bug: 62b77b8
The CI is failing on OSX in tests related to the hidden flag. I'm investigating. Strangely, they pass on my MacOS machine. Edit: I'm making it a draft while I address the failures.
…eSystemEntry.Initialize calls to refresh individual caches.
… hidden check in Initialize to prevent perf regression.
src/libraries/System.IO.FileSystem/tests/Enumeration/AttributeTests.cs
"The right way to fix it is to move all the stat and lstat calling inside FileStatus"
This is a very good idea. 👍
The changes look good. I would just improve the naming a little bit and encapsulate the caching logic within the FileStatus type.
As soon as you fix the CI errors I am going to perform a more detailed review.
"I get this error"
As mentioned offline, you get an error about missing .NET 6 SDK. Perf repo always requires latest SDK, so we can have benchmarks that test new APIs (https://github.com/dotnet/performance/blob/master/docs/prerequisites.md#net-core-sdk)
    if (useHiddenFlag)
    {
        fileTwo.Attributes |= FileAttributes.Hidden;
        fileFour.Attributes |= FileAttributes.Hidden;
    }
Just for my education: is it possible to set these attributes before creating the file and have FileInfo.Create create the file with the attributes provided?
No. The file needs to exist. The Attributes setter calls the FileStatus.SetAttributes method (you can see it in this PR), which sets each flag on the file via a P/Invoke.
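To illustrate the answer above, here is a minimal sketch of the create-then-set order. The class and method names (HiddenAttributeDemo, CreateThenHide) are hypothetical and not part of this PR; only File, FileInfo and FileAttributes are real .NET APIs. As reported elsewhere in this thread, on Linux the Hidden flag is expected to be silently ignored by the setter, so the returned value may not contain it there.

```csharp
using System.IO;

// Hypothetical helper, for illustration only (not code from this PR).
public static class HiddenAttributeDemo
{
    // Creates the file first, then applies the Hidden attribute,
    // and returns the attributes as read back from disk.
    public static FileAttributes CreateThenHide(string path)
    {
        // The file has to exist before its attributes can be changed.
        using (File.Create(path)) { }

        var info = new FileInfo(path);
        info.Attributes |= FileAttributes.Hidden;

        // Drop any cached state so the value is re-read from the file system.
        info.Refresh();
        return info.Attributes;
    }
}
```

Calling `HiddenAttributeDemo.CreateThenHide(somePath)` on a path inside a writable directory leaves the file in place, so the caller is responsible for deleting it.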

    [Fact]
    [PlatformSpecific(TestPlatforms.AnyUnix)]
    public void SkippingHiddenFiles_Unix()
nit: I think that the test name should include the information about using prefixes.
Suggested change:
- public void SkippingHiddenFiles_Unix()
+ public void SkippingFilesHiddenWithDotPrefixOnUnix()
@@ -21,25 +21,42 @@ protected virtual string[] GetPaths(string directory, EnumerationOptions options
    }

    [Fact]
    public void SkippingHiddenFiles()
    [PlatformSpecific(TestPlatforms.Windows | TestPlatforms.OSX)]
    public void SkippingHiddenFiles_Windows_OSX()
nit: I think that the test name should include the information about using attributes.
Suggested change:
- public void SkippingHiddenFiles_Windows_OSX()
+ public void SkippingFilesHiddenWithAttributesOnWindowsAndOSX()
What happens when the user sets the attribute on Linux? Is it just ignored, so the user gets a silent failure?
No error. It just gets ignored. I just tested it on Ubuntu.
"No error. It just gets ignored."
I wonder if we should address that somehow. Usually, we would just mark the enum member with UnsupportedOSPlatform:
    public enum FileAttributes
    {
        [UnsupportedOSPlatform("Linux")]
        Hidden
    }
But if I understand correctly, the enum value can be returned by File.GetAttributes for files that start with a dot on Linux, yet it has no effect when used with the corresponding setter. Should we at least mention it in the docs?
    IEnumerable<string> enumerable = new FileSystemEnumerable<string>(
        testDirectory.FullName,
        (ref FileSystemEntry entry) => entry.ToFullPath(),
        new EnumerationOptions() { AttributesToSkip = 0 })
Does setting AttributesToSkip to 0 here have any effect on the test behavior? Would removing it change anything?
Suggested change:
- new EnumerationOptions() { AttributesToSkip = 0 })
+ new EnumerationOptions())
Yes, the IsHiddenAttribute method starts failing if I don't explicitly set this to 0 (to signify "don't skip any attributes").
The reason is that, when unset, we skip the Hidden and the System attributes by default. That's how EnumerationOptions.AttributesToSkip is documented:
Skip entries with the given attributes. Default is FileAttributes.Hidden | FileAttributes.System.
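The effect of the documented default can be sketched with the public enumeration API. AttributesToSkipDemo and its two methods are hypothetical names used for illustration; Directory.EnumerateFiles and EnumerationOptions are the real .NET APIs. Which entries count as hidden depends on the platform (an attribute on Windows, and per this PR a dot prefix on Unix), so the sketch makes no claim about specific counts.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper, for illustration only (not code from this PR).
public static class AttributesToSkipDemo
{
    // Default options: entries with Hidden or System attributes are skipped.
    public static int CountWithDefaults(string directory) =>
        Directory.EnumerateFiles(directory, "*", new EnumerationOptions()).Count();

    // AttributesToSkip = 0: no entry is filtered out based on its attributes.
    public static int CountWithoutSkipping(string directory) =>
        Directory.EnumerateFiles(directory, "*", new EnumerationOptions { AttributesToSkip = 0 }).Count();
}
```

For a directory that contains hidden files, CountWithDefaults is expected to return a smaller number than CountWithoutSkipping, which is why a test that asserts on hidden entries has to opt out of the default.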
    entry._status = default;
    entry._status.Invalidate();
Would it be possible to encapsulate the logic that modifies the FileStatus below and move it into FileStatus itself? That way FileStatus could keep all the caching logic as a private detail and return only a valid FileStatus.
Suggested change:
- entry._status = default;
- entry._status.Invalidate();
+ entry._status = FileStatus.Create(entry, directoryEntry, out FileAttributes attributes);
Done.
    else if (_fileStatus.Gid == Interop.Sys.GetEGid())
    }

    internal bool HasSecondaryDirectoryFlag => HasDirectoryFlag(_secondaryCache);
HasSecondaryDirectoryFlag does not tell me much. Is it checking if the symlink is a directory?
Suggested change:
- internal bool HasSecondaryDirectoryFlag => HasDirectoryFlag(_secondaryCache);
+ internal bool IsSymlinkToDirectory => HasDirectoryFlag(_secondaryCache);
Yes, it is checking if the symbolic link's target is a directory.
Edit: I renamed it appropriately, based on the other related feedback.

    private bool IsValid =>
        _initializedMainCache == 0 && // Should always be successfully refreshed
        (_initializedSecondaryCache == -1 || _initializedSecondaryCache == 0); // Only refreshed when path is detected to be a symbolic link
Suggested change:
- (_initializedSecondaryCache == -1 || _initializedSecondaryCache == 0); // Only refreshed when path is detected to be a symbolic link
+ _initializedSecondaryCache <= 0; // Only refreshed when path is detected to be a symbolic link
0 means successful retrieval. -1 means uninitialized. Any other value means it's a Linux error number. Although I don't think error numbers can be negative, I wanted to be very specific about the two possible values I cared about.
I assume it has a slight performance improvement to do one check instead of two?
"I assume it has a slight performance improvement to do one check instead of two?"
If it has any, it's most probably very, very small. I was thinking more about simplifying the code. As long as everything is encapsulated within this type, we shouldn't need to worry about values that are negative but not equal to -1 (we control everything in this type, so that's impossible). But it's a minor nit.
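The encoding discussed in this thread (0 for success, -1 for uninitialized, any other value for an error number) can be sketched as follows. StatusCacheSketch and its members are hypothetical and only mirror the names used in the review; this is not the FileStatus implementation. One consequence of the encoding, assuming the state lives in a struct: a default-initialized instance has both fields at 0, which would read as "successfully initialized", so the sketch invalidates explicitly.

```csharp
// Hypothetical sketch of the tri-state cache flags discussed above.
public struct StatusCacheSketch
{
    // 0 = successfully refreshed, -1 = not initialized, other = error number.
    private int _initializedMainCache;
    private int _initializedSecondaryCache;

    // A default struct would have both fields at 0 ("success"),
    // so callers start from an explicitly invalidated instance.
    public static StatusCacheSketch CreateInvalidated()
    {
        var status = default(StatusCacheSketch);
        status.Invalidate();
        return status;
    }

    public void Invalidate()
    {
        _initializedMainCache = -1;
        _initializedSecondaryCache = -1;
    }

    // Stand-ins for the real refresh logic: record the result of a stat-like call.
    public void RecordMainResult(int errorNumber) => _initializedMainCache = errorNumber;
    public void RecordSecondaryResult(int errorNumber) => _initializedSecondaryCache = errorNumber;

    // The main cache must have been refreshed successfully; the secondary
    // cache may be either untouched (-1) or refreshed successfully (0).
    public bool HasValidCaches =>
        _initializedMainCache == 0 && _initializedSecondaryCache <= 0;
}
```

The `<= 0` form relies on the type never storing a negative value other than -1, which is the encapsulation argument made in the comment above.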

    internal bool IsSecondaryCacheValid => _initializedSecondaryCache == 0;

    private bool IsValid =>
nit: When I see FileStatus.IsValid, I think of the file being valid, not the file status cache. Would HasValidCache be a better name?
Suggested change:
- private bool IsValid =>
+ private bool HasValidCache =>
Sure. I'll use HasValidCaches (plural).
    }

    internal void SetLastWriteTime(string path, DateTimeOffset time) => SetAccessOrWriteTime(path, time, isAccessTime: false);
    internal void RefreshCaches(ReadOnlySpan<char> path)
As mentioned before, it would be great if we could move all the methods that need to call this particular method inside FileStatus and make it private, so the cache would become an implementation detail hidden from the consumers.
There is a difference between calling EnsureCachesInitialized and RefreshCaches: the former checks whether the caches are initialized and refreshes them only if they are not. The latter always refreshes them.
Maybe I can pass a boolean to indicate whether we want to force a refresh or not. That way, only one method is visible outside.
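The single-method idea proposed above could look roughly like this. It is a sketch under assumptions: LazyStatSketch, the forceRefresh parameter name, and the injected delegate standing in for the real stat call are all hypothetical, and the real FileStatus works on ReadOnlySpan<char> paths rather than strings.

```csharp
using System;

// Hypothetical sketch: one entry point that covers both "ensure" and "refresh".
public sealed class LazyStatSketch
{
    // Stand-in for the real stat call; returns 0 on success or an error number.
    private readonly Func<string, int> _stat;

    // -1 = not initialized, 0 = success, other = error number.
    private int _initialized = -1;

    // Counts how often the (expensive) stat stand-in was invoked.
    public int StatCalls { get; private set; }

    public LazyStatSketch(Func<string, int> stat) => _stat = stat;

    // Refreshes only when nothing is cached yet, unless the caller forces it.
    public void EnsureCachesInitialized(string path, bool forceRefresh = false)
    {
        if (forceRefresh || _initialized == -1)
        {
            _initialized = _stat(path);
            StatCalls++;
        }
    }
}
```

With this shape, repeated calls without forceRefresh cost one stat call in total, while setters that need fresh data pass forceRefresh: true.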

    public FileSystemInfo ToFileSystemInfo()
    {
        Debug.Assert(!PathInternal.IsPartiallyQualified(FullPath), "FullPath should be fully qualified when constructed from directory enumeration");
Debug.Assert in a public API? If it is an important check, it should be a runtime check.
Related to the other comment below, I decided to revert this change since it was unrelated to the purpose of this PR.
    FileSystemInfo info = IsDirectory
        ? new DirectoryInfo(fullPath, fileName: fileName, isNormalized: true)
        : new FileInfo(fullPath, fileName: fileName, isNormalized: true);

    info.Init(ref _status);
Why not create internal constructors that take this parameter?
I guess I can revert this change. Originally it was not a constructor but a factory method. I thought we could save one method call if I put all that logic in the only place where it was called. But it's unrelated to this change anyway.
    }
    private void ThrowOnCacheInitializationError(ReadOnlySpan<char> path)
Suggested change:
- }
- private void ThrowOnCacheInitializationError(ReadOnlySpan<char> path)
+ }
+
+ private void ThrowOnCacheInitializationError(ReadOnlySpan<char> path)

    // We track intent of creation to know whether or not we want to (1) create a
    // DirectoryInfo around this status struct or (2) actually are part of a DirectoryInfo.
    internal bool InitiallyDirectory { get; set; }
It seems to be assigned only in the ctor.
Suggested change:
- internal bool InitiallyDirectory { get; set; }
+ internal bool InitiallyDirectory { get; private set; }
"It seems to be assigned only in the ctor."
So it should be { get; } instead of { get; private set; }?
    // IMPORTANT: Attribute logic must match the logic in FileSystemEntry

    EnsureStatInitialized(path);
    EnsureCachesInitialized(path);
Do we need the comment? Is all logic in one place now?
            Interop.CheckIo(Interop.Sys.LChflags(path, (_fileStatus.UserFlags & ~(uint)Interop.Sys.UserFlags.UF_HIDDEN)), path, InitiallyDirectory);
        }
    }
    return DateTimeOffset.FromFileTime(0);
Can we return a constant?
    if (!IsMainCacheValid)
    {
        if (!TryRefreshMainCache(path))
        {
            if (!continueOnError)
            {
                ThrowOnCacheInitializationError(path);
            }
        }
    }
Can we use one if? Or move all the logic into TryRefreshMainCache()?
I can move it all to one if. I don't think we should throw from within a try method.
I mean the IsMainCacheValid check could be in TryRefreshMainCache().
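The "one if" option from this exchange can be sketched as a single short-circuiting condition, with the throw kept outside the Try method. MainCacheSketch, the injected delegate, and the IOException message are hypothetical stand-ins; the member names only echo those in the diff, and the real code takes a ReadOnlySpan<char> path.

```csharp
using System;
using System.IO;

// Hypothetical sketch of collapsing the three nested checks into one condition.
public sealed class MainCacheSketch
{
    // Stand-in for the real refresh (an lstat call); returns true on success.
    private readonly Func<string, bool> _refresh;

    public bool IsMainCacheValid { get; private set; }

    public MainCacheSketch(Func<string, bool> refresh) => _refresh = refresh;

    public void EnsureMainCache(string path, bool continueOnError)
    {
        // && short-circuits, so the refresh only runs when the cache is invalid,
        // and the throw only happens when the refresh failed and errors matter.
        if (!IsMainCacheValid && !TryRefreshMainCache(path) && !continueOnError)
        {
            throw new IOException($"Could not initialize the status cache for '{path}'.");
        }
    }

    // The Try method reports failure through its return value and never throws.
    private bool TryRefreshMainCache(string path)
    {
        IsMainCacheValid = _refresh(path);
        return IsMainCacheValid;
    }
}
```

Moving the IsMainCacheValid check into TryRefreshMainCache, as suggested in the last comment, would shorten the condition further at the cost of the Try method doing two jobs.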
    private unsafe void SetAccessOrWriteTime(string path, DateTimeOffset time, bool isAccessTime)
    {
        // force a refresh so that we have an up-to-date times for values not being overwritten
        _fileStatusInitialized = -1;
        EnsureStatInitialized(path);
        Invalidate();
Invalidate what? Can we use a more informative name?
Had a conversation with @adamsitnik about this. I am going to look into splitting the Unix code into Linux and OSX, because it may be possible to use an API (at least on Linux) that lets us retrieve the list of files inside a folder with the hidden flag specified. This will allow us to have separate performance results for those platforms. I will do the same investigation for the Windows API.
This could fix #31301 too.
Currently I am working on improving PowerShell's handling of OneDrive. It uses reparse points, and we need to analyze surrogates. This information is not present in FileSystemInfo.FileAttributes: there is only one flag, ReparsePoint, but we need more, so we have to call FindFirstFileEx and check dwReserved0. Thus, we need extended attributes not only on Unix but also on Windows. Perhaps we could introduce new …
Draft Pull Request was automatically closed for inactivity. It can be manually reopened in the next 30 days if the work resumes.
Fixes #37301
Addresses #41739
If you are reviewing this PR, please take a look at each commit individually (small incremental changes).
FileSystemEntry and FileSystemInfo were not reporting their hidden flags correctly on Linux and MacOS.
On Linux and MacOS, the hidden attribute can be set by prefixing the file or folder name with a dot.
On MacOS and Windows, the hidden attribute can be set by setting the Hidden flag on the file or folder.
The original PR introduced a perf regression due to a new stat call.
The FileStatus and FileSystemEntry initialization code had duplicate code when setting flags. The right way to fix it is to move all the stat and lstat calling inside FileStatus.
FileSystemEntry also has some special needs when detecting directories and symbolic links: since it has access to the InodeType, some info can be inferred from it, saving stat calls. So that logic had to be preserved in the FileSystemEntry initialization method, while deferring as much as possible to FileStatus.
Across FileStatus, I ensured the stat and lstat calls only get executed lazily (don't call again if already initialized), unless explicitly asking to refresh them (forcing the new stat call).
I also added the original unit tests to verify both the hidden flag and the readonly flag (we didn't have unit tests for the latter).
I made sure all unit tests are passing on MacOS. I also ran a temporary unit test with the code below to quickly verify that the performance regression is gone:
Temporary perf test
cc @iSazonov
Pending: I want to get official dotnet/performance results for all the IO.FileSystem performance tests. But I need to figure out how to run the micro benchmarks in .NET 6.0 on MacOS, because the Preventing regressions instructions are not working anymore. I get this error:
Error message
Edits: