Support nvidia and nvidia-frontend names when getting device major #330
Force-pushed from e294588 to 0f27d53.
Thanks for this @tariq1890.
If we were to do the check for nvidia or nvidia-frontend in the devices.Get method as below:
diff --git a/internal/info/proc/devices/devices.go b/internal/info/proc/devices/devices.go
index 95430428..65f3108f 100644
--- a/internal/info/proc/devices/devices.go
+++ b/internal/info/proc/devices/devices.go
@@ -32,8 +32,12 @@ const (
 	NVIDIACTLMinor     = 255
 	NVIDIAModesetMinor = 254
+	// NVIDIADeviceMajor is hardcoded as NV_MAJOR_DEVICE_NUMBER in nvidia-modprobe:
+	// https://github.com/NVIDIA/nvidia-modprobe/blob/d6bce304f30b6661c9ab6a993f49340eafca7a7e/modprobe-utils/nvidia-modprobe-utils.c#L85
+	NVIDIADeviceMajor = 195
+
 	NVIDIAFrontend = Name("nvidia-frontend")
-	NVIDIAGPU      = NVIDIAFrontend
+	NVIDIAGPU      = Name("nvidia")
 	NVIDIACaps     = Name("nvidia-caps")
 	NVIDIAUVM      = Name("nvidia-uvm")
@@ -61,12 +65,15 @@ var _ Devices = devices(nil)
 // Exists checks if a Device with a given name exists or not
 func (d devices) Exists(name Name) bool {
-	_, exists := d[name]
+	_, exists := d.Get(name)
 	return exists
 }
 // Get a Device from Devices
 func (d devices) Get(name Name) (Major, bool) {
+	if name == NVIDIAGPU || name == NVIDIAFrontend {
+		return NVIDIADeviceMajor, true
+	}
 	device, exists := d[name]
 	return device, exists
 }
The change set seems a little easier to reason about. This also doesn't change the interface to the lower level package at all.
The general cleanups that we can make can be made as additional commits on top.
Is there a way that we can verify the failure of the existing code and the fix in a unit test?
This seems cleaner and more consistent given that the rest of the devices are all looked up using their major device file.
Another option would be to inject the hardcoded name -> major mappings when we read the
This would prevent us from forgetting to use the
Thanks for the reviews @klueska @elezar. Agreed that this PR is more complex than it needs to be. My only concern is that we are returning the major number of the nvidia device at all times when passing in
This would not be an accurate representation of the
The fact that we're not maintaining behaviour for when the
where we add a
Thanks for creating #334. Feel free to integrate those changes as part of this PR so that it does not require additional changes once this is merged.
Force-pushed from ad7277b to 1024dae.
Thanks for your patience with this @elezar. I've made the changes and they hopefully address all of your concerns.
… device Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Force-pushed from 1024dae to f414ac2.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Thanks @tariq1890.
I have added a commit on top with some refactoring that should simplify testing this.
@@ -31,11 +31,25 @@ func TestCreateControlDevices(t *testing.T) {

	nvidiaDevices := &devices.DevicesMock{
Note that the use of a mock here means that the change to the Get function in proc.Devices is not tested here. I have added a simple commit on top of this that adds a devices.New() function that allows a devices struct to be constructed explicitly. This can then be used in tests to construct two sets of "proc files", and ideally all tests are repeated for each of these.
I think the various functions in proc/devices/devices.go would have to be cleaned up a bit to use the constructor correctly, but this can be considered out of scope for this PR.
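As a rough illustration of the idea, here is a minimal sketch of an explicit constructor being used to run the same check against both naming schemes. The helper names (newDevices, withEntry, gpuMajor) and the fixtures are hypothetical and are not the API added in this PR; 195 is the major number quoted in the diff above.

```go
package main

import "fmt"

// Name and Major mirror the type names seen in the diff above.
// Everything else in this sketch is hypothetical.
type Name string
type Major int

const (
	nvidiaFrontend = Name("nvidia-frontend")
	nvidiaGPU      = Name("nvidia")
)

type devices map[Name]Major

// option configures a devices value under construction.
type option func(devices)

// withEntry registers a single name -> major mapping.
func withEntry(n Name, m Major) option {
	return func(d devices) { d[n] = m }
}

// newDevices builds a devices value explicitly, without reading /proc/devices.
func newDevices(opts ...option) devices {
	d := make(devices)
	for _, o := range opts {
		o(d)
	}
	return d
}

// gpuMajor returns the major registered under either GPU name.
func (d devices) gpuMajor() (Major, bool) {
	for _, n := range []Name{nvidiaGPU, nvidiaFrontend} {
		if m, ok := d[n]; ok {
			return m, true
		}
	}
	return 0, false
}

func main() {
	// One fixture per "proc file" variant, plus the module-not-loaded case.
	fixtures := map[string]devices{
		"legacy name":   newDevices(withEntry(nvidiaFrontend, 195)),
		"new name":      newDevices(withEntry(nvidiaGPU, 195)),
		"module absent": newDevices(),
	}
	for label, d := range fixtures {
		m, ok := d.gpuMajor()
		fmt.Printf("%s: major=%d found=%t\n", label, m, ok)
	}
}
```

Because the fixtures are real values rather than mocks, the lookup logic itself is exercised, which is the gap the comment above points out.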
Agreed
@@ -46,6 +60,7 @@ func TestCreateControlDevices(t *testing.T) {
		root        string
		devices     devices.Devices
		mknodeError error
		hasError    bool
Ideally require.ErrorIs should be used. If we're wrapping errors correctly, then things should work as expected.
@@ -126,9 +160,12 @@ func TestCreateControlDevices(t *testing.T) {
			d.mknoder = mknode

			err := d.CreateNVIDIAControlDevices()
			require.ErrorIs(t, err, tc.expectedError)
			if tc.hasError {
require.ErrorIs should handle the case where tc.expectedError is nil. The need to do this may have been caused by us not wrapping an error correctly somewhere in the call chain.
LGTM
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This PR fixes #324
It checks the NVIDIA Devices struct constructed from /proc/devices for both nvidia and nvidia-frontend to retrieve the major number. This enables support for newer driver versions (>= 550.40) where the module was renamed, while still detecting the case where the kernel module is not loaded.