[BUG]: Powerscale CSI driver RO PVC-from-snapshot wrong zone #487

Closed
danthem opened this issue Sep 28, 2022 · 9 comments
Labels: area/csi-powerscale, needs-triage, type/bug

danthem commented Sep 28, 2022

Bug Description

In an environment where the storageclass Access Zone/AzServiceIP is different from the System zone (which is used as the API endpoint), a Read-Only PVC from snapshot gets its NFS export on PowerScale created incorrectly in the wrong zone (System). As a result, pods created with this PVC fail to start because they are unable to mount the NFS export: the pods try to mount using the correct AzServiceIP, but the export is not created in that zone.

Creating a Read-Write PVC from snapshot works: the PVC gets correctly created in my 'csizone'. However, since it's RW, the driver must create a new path and then copy the data from the snapshot to that path. Depending on the use case for the RO PVC, this may be inefficient and take a lot of time. A RO PVC could instead point directly to the snapshot (since the snapshot itself is RO), so no data copy would be needed and the PVC could be ready immediately.
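For clarity, the RO-PVC-from-snapshot case described above is simply a PVC that uses the VolumeSnapshot as its dataSource and requests a read-only access mode. A minimal sketch (the object names match my repro further down; the requested size is illustrative, not taken from my actual manifests):

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csizone-ro-pvc-from-snap
spec:
  storageClassName: csizone-storageclass
  dataSource:
    name: csizone-volumesnapshot         # the VolumeSnapshot taken from the original PVC
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadOnlyMany                        # read-only, so no data copy should be required
  resources:
    requests:
      storage: 2Gi                        # illustrative size
EOF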

Logs

The pod that fails to deploy indicates that its mount is being rejected. This is because it's trying to access, via an IP in my CSI Access Zone, an NFS export that was (incorrectly) created in the System zone:

Events:
  Type     Reason                  Age                   From                     Message
  ----     ------                  ----                  ----                     -------
  Normal   Scheduled               25m                   default-scheduler        Successfully assigned default/csizone-nginx-from-snap-ro to minikube
  Normal   SuccessfulAttachVolume  24m                   attachdetach-controller  AttachVolume.Attach succeeded for volume "k8s-4d90ffd4fa"
  Warning  FailedMount             7m9s (x2 over 9m25s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[task-pv-storage], unattached volumes=[kube-api-access-827km task-pv-storage]: timed out waiting for the condition
  Warning  FailedMount             2m37s (x8 over 22m)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[task-pv-storage], unattached volumes=[task-pv-storage kube-api-access-827km]: timed out waiting for the condition
  Warning  FailedMount             2m21s (x19 over 24m)  kubelet                  MountVolume.SetUp failed for volume "k8s-4d90ffd4fa" : rpc error: code = Unknown desc = mount failed: exit status 32
mounting arguments: -t nfs -o vers=3,rw 10.60.34.217:/ifs/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3/csi_zone/k8s-2c5069e508 /var/lib/kubelet/pods/9de52fdf-c800-4bc9-b1be-08be4920565d/volumes/kubernetes.io~csi/k8s-4d90ffd4fa/mount
output: mount.nfs: access denied by server while mounting 10.60.34.217:/ifs/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3/csi_zone/k8s-2c5069e508

Screenshots

No response

Additional Environment Information

<see 'steps to reproduce'>

Steps to Reproduce

I have included my storageclass, original PVC, original pod, snapshotclass, volumsnap, rw_pvc_from_snap, and ro_pvc_from_snap .yaml files, as well as two additional pod yamls: one for creating an nginx instance from a RW PVC from snapshot and one for creating an nginx instance from a RO PVC from snapshot.

Find them below:
RO_snap.tar.gz

And my values.yaml:
values.yaml.txt

As you can see, I use the System zone as the API endpoint, but I have set csizone and an IP in that zone for the storageclass.
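For reference, the relevant part of that storageclass looks roughly like this. This is a sketch, not the exact attached file: the file name is hypothetical, AccessZone/AzServiceIP/IsiPath are the usual csi-powerscale storageclass parameters, and the values shown are the ones visible elsewhere in this report:

$ cat 1_storageclass.yaml       # hypothetical file name from the attached tarball
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csizone-storageclass
provisioner: csi-isilon.dellemc.com
parameters:
  AccessZone: csizone           # non-System access zone the volumes should live in
  AzServiceIP: 10.60.34.217     # IP in csizone that the pods mount from
  IsiPath: /ifs/csi_zone        # base path for CSI volumes in that zone
reclaimPolicy: Delete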

The whole setup can quickly be done by extracting them to a directory and then running:
$ for file in {1..9}_*.yaml; do kubectl create -f $file ;sleep 1; done

What you will see is:

[crcuser@grafinflx RO_snap]$ for file in {1..9}_*.yaml; do kubectl create -f $file ;sleep 1; done
storageclass.storage.k8s.io/csizone-storageclass created
persistentvolumeclaim/csizone-pvc created
pod/csizone-nginx-pod created
volumesnapshotclass.snapshot.storage.k8s.io/csizone-snapshotclass created
volumesnapshot.snapshot.storage.k8s.io/csizone-volumesnapshot created
persistentvolumeclaim/csizone-rw-pvc-from-snap created
persistentvolumeclaim/csizone-ro-pvc-from-snap created
pod/csizone-nginx-from-snap-rw created
pod/csizone-nginx-from-snap-ro created

and from kubectl we can see that the last pod fails to start:

[crcuser@grafinflx RO_snap]$ kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
csizone-nginx-from-snap-ro   0/1     ContainerCreating   0          109s
csizone-nginx-from-snap-rw   1/1     Running             0          110s
csizone-nginx-pod            1/1     Running             0          116s

If we describe the pod we can see the problem:

[crcuser@grafinflx RO_snap]$ kubectl describe pod csizone-nginx-from-snap-ro
Name:         csizone-nginx-from-snap-ro
[...]
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               99s                default-scheduler        Successfully assigned default/csizone-nginx-from-snap-ro to minikube
  Normal   SuccessfulAttachVolume  98s                attachdetach-controller  AttachVolume.Attach succeeded for volume "k8s-4d90ffd4fa"
  Warning  FailedMount             27s (x8 over 93s)  kubelet                  MountVolume.SetUp failed for volume "k8s-4d90ffd4fa" : rpc error: code = Unknown desc = mount failed: exit status 32
mounting arguments: -t nfs -o vers=3,rw 10.60.34.217:/ifs/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3/csi_zone/k8s-2c5069e508 /var/lib/kubelet/pods/9de52fdf-c800-4bc9-b1be-08be4920565d/volumes/kubernetes.io~csi/k8s-4d90ffd4fa/mount
output: mount.nfs: access denied by server while mounting 10.60.34.217:/ifs/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3/csi_zone/k8s-2c5069e508

We're getting access denied when trying to mount, so let's look at PowerScale for that export:

Bacon-4# isi nfs exports list --zone=csizone
ID   Zone    Paths                        Description
--------------------------------------------------------------------------------------------
2    csizone /ifs/csi_zone                containeraccess
20   csizone /ifs/csi_zone/k8s-2c5069e508 CSI_QUOTA_ID:ai7pHgEAAAAAAAAAAAAAQHSLAAAAAAAA
21   csizone /ifs/csi_zone/k8s-ae2722dd5f CSI_QUOTA_ID:bS7pHgEAAAAAAAAAAAAAQHWLAAAAAAAA
--------------------------------------------------------------------------------------------
Total: 3

I only have two NFS exports created by the CSI driver in this zone: the first one is the original PVC and the second one is the RW PVC from snapshot... So where's my RO PVC from snapshot? Let's look in the System zone:

Bacon-4# isi nfs exports list | grep k8
ID   Zone    Paths                        Description
--------------------------------------------------------------------------------------------
[...]
60   System /ifs/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3/csi_zone/k8s-2c5069e508

So my new RO PVC from snapshot was created in the System zone... This is why my new pod is unable to access it: the pod is trying to mount via an IP in the csizone.

Expected Behavior

The expected behavior is for the RO PVC's NFS export to be created on PowerScale within the correct access zone. This can be done by creating the export under /ifs/<AZ>/<path-to-original-pvc>/.snapshot/<snapshot name> and not under /ifs/.snapshot/~~~.

For my example above the NFS export should have been created on path:
/ifs/csi_zone/k8s-2c5069e508/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3 and with parameter --zone=csizone

Bacon-4# isi nfs exports create /ifs/csi_zone/k8s-2c5069e508/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3 --zone=csizone
Bacon-4# 
Bacon-4# isi nfs exports list --zone=csizone
ID   Zone    Paths                                                                                Description
----------------------------------------------------------------------------------------------------------------------------------------------------
[...]
24   csizone /ifs/csi_zone/k8s-2c5069e508/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3
---------

This export is created in the correct zone, which means it is possible to mount it via the IPs in my csizone. Example:

[crcuser@grafinflx RO_snap]$  sudo mount 10.60.34.217:/ifs/csi_zone/k8s-2c5069e508/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3 /mnt
[sudo] password for crcuser:
[crcuser@grafinflx RO_snap]$ 

CSM Driver(s)

CSI Driver for PowerScale 2.4.0

Installation Type

Helm

Container Storage Modules Enabled

registry.k8s.io/sig-storage/snapshot-controller:v6.0.1

Container Orchestrator

Kubernetes v1.24.3 / minikube 1.26.1

Operating System

RHEL 8.6

@danthem danthem added needs-triage Issue requires triage. type/bug Something isn't working. This is the default label associated with a bug issue. labels Sep 28, 2022
@nitesh3108 nitesh3108 added the area/csi-powerscale Issue pertains to the CSI Driver for Dell EMC PowerScale label Oct 12, 2022
@shefali-malhotra (Collaborator)

@danthem This is the expected behavior, as on Isilon a snapshot is always created under the System access zone.
To restore on a non-System access zone, a change is needed in the storage class due to how Isilon creates snapshots.
So, if one wants to mount the ROX volume, the AzServiceIP of the System access zone needs to be used.


danthem commented Oct 23, 2022

Snapshots are, in a way, outside the access zones: their path can be accessed either through the System zone (if going via /ifs/.snapshot/~~) or via the 'correct' access zone if you just use the right path (/ifs/<path to AZ base>/.snapshot/~~). The issue I see with the current CSI driver behavior is that it uses the System zone path instead of the specified Access Zone path.

It is fully possible to create an NFS export to a snapshot that was created in a particular access zone path and then access that NFS export through that access zone (without going through the System zone), as I have demonstrated when opening this issue.

Currently the CSI driver creates the NFS export in the System zone with a System zone path:
/ifs/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3/csi_zone/k8s-2c5069e508

But it does not have to do that. If the CSI driver had instead created the export with a path like this:
/ifs/csi_zone/k8s-2c5069e508/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3
and assigned it to the correct zone, it would have been reachable on an IP belonging to that zone.

Here I am mounting this snapshot directly through the 'csizone' access zone IP:
$ sudo mount 10.60.34.217:/ifs/csi_zone/k8s-2c5069e508/.snapshot/snapshot-078ad18e-cac2-4711-a011-ba8b4e6947e3 /mnt

@shefali-malhotra (Collaborator)

I tried creating a snapshot of the PVC on Isilon. I could see the PVC in /ifs/NFS/integration/k8s-611993c2ac,
but the snapshot only at the path below:
/ifs/.snapshot/snapshot-73ea9018-83a4-43ca-bb96-d2f01e8cf2e7/NFS/integration/k8s-611993c2ac
No path like /ifs/<path to AZ base>/.snapshot/~~ is getting created.

Are there any extra params that need to be used while creating the snapshot?

Below are the details:

Storage Class:
NAME          PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
sc-csi0zone   csi-isilon.dellemc.com   Delete          Immediate           true                   37h

[root@lglw3142 ~]# kubectl get pvc
NAME                  STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
source-pvc-csi0zone   Bound    k8s-611993c2ac   2Gi        ROX            sc-csi0zone    37h

[root@lglw3142 ~]# kubectl get pv
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                         STORAGECLASS   REASON   AGE
k8s-611993c2ac   2Gi        ROX            Delete           Bound    default/source-pvc-csi0zone   sc-csi0zone             37h

[root@lglw3142 ~]# kubectl get volumesnapshot
NAME                              READYTOUSE   SOURCEPVC             SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS               SNAPSHOTCONTENT                                    CREATIONTIME   AGE
snapshot-of-source-pvc-csi0zone   true         source-pvc-csi0zone                           0             isilon-snapclass-csi0zone   snapcontent-73ea9018-83a4-43ca-bb96-d2f01e8cf2e7   36h            36h

@shefali-malhotra (Collaborator)

As you referred, a specific snapshot directory is getting created in the specific access zone. I can confirm that
a snapshot directory is created under /ifs/NFS/integration/k8s-611993c2ac; it is hidden and can be viewed via SSH. I have verified that it's there, but the OneFS UI doesn't show it. Using this directory and creating the NFS export at this path requires a code change and needs to be investigated considering backward compatibility. Will continue investigating the changes. The feature will be done next quarter.
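For reference, one way to confirm the hidden directory from an SSH session (since the OneFS UI doesn't show it) is to list it directly by path, for example:

# the .snapshot directory is hidden, but it can still be listed directly by its full path
ls /ifs/NFS/integration/k8s-611993c2ac/.snapshot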

@bharathsreekanth (Contributor)

@rajkumar-palani Please help update and provide clarification.

@rajkumar-palani

@bharathsreekanth - we have already created an internal JIRA ticket to address this issue in Q1.

@shaynafinocchiaro (Collaborator)

@rajkumar-palani @bharathsreekanth have any updates been made here?


hoppea2 commented Jun 30, 2023

@rajkumar-palani Please close if the issue has been resolved in Q1

@shefali-malhotra (Collaborator)

The bug is fixed in CSM 1.8.
