
fsGroup sometimes works sometimes breaks #107

Open
dns2utf8 opened this issue Feb 6, 2020 · 9 comments

dns2utf8 commented Feb 6, 2020

Hi all

I am using this CSI driver to access HPE Nimble storage over Fibre Channel.
Lately I have noticed that sometimes the fsGroup is not applied to the storage.

Currently, three applications on the cluster are running on the same node:

  1. GitLab with a working fsGroup
  2. GitLab without a working fsGroup
  3. mfw, where fsGroup works in ~30% of deployments

The relevant YAML:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-file-writer
  namespace: snapshot-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multi-file-writer
  minReadySeconds: 5
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: multi-file-writer
    spec:
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      # containers and volumes trimmed from this snippet

Debugging

The logs did not contain any hints regarding these applications:

grep -ri fsGroup /var/log/nimble* /var/log/syslog

Other containers emitted logs containing FSGroup:nil; since those containers did not request an fsGroup, that appears to be expected.
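
For reference, a quick way to check whether the fsGroup took effect is to compare the group ownership of the mounted volume from inside the pod (a sketch; the pod name and mount path are placeholders for this deployment):

kubectl -n snapshot-test exec -it <multi-file-writer-pod> -- id
# expect uid=65534 gid=65534 (user/group names vary by image)
kubectl -n snapshot-test exec -it <multi-file-writer-pod> -- ls -ldn /path/to/mount
# the group column should read 65534 when fsGroup was applied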

Cheers,
Stefan

@raunakkumar (Collaborator)

Hi @dns2utf8, could you please upload the logs for us to review?
You should be able to collect them using https://github.com/hpe-storage/csi-driver#log-collector
Also, is your issue related to kubernetes/examples#260?

dns2utf8 commented Feb 7, 2020

Hi

Our issue is not related. This setup uses a SAN via Fibre Channel with xfs on the LUNs.
The logs from the three nodes are 1.2 GB in total. Uploading them will take a while.

dns2utf8 commented Feb 7, 2020

Uploaded the logs here

raunakkumar commented Feb 7, 2020

Thanks, but I am unable to reach https://gitlab.gyselroth.net/stefan.schindler/hpe-nimble-logs.

Did you apply the following parameters in the StorageClass for the underlying PVC?

Parameter | Value | Description
fsOwner | userId:groupId | The user id and group id that should own the root directory of the filesystem.
fsMode | Octal digits | 1 to 4 octal digits that represent the file mode to be applied to the root directory of the filesystem.

https://github.com/hpe-storage/csi-driver/tree/master/examples/kubernetes/hpe-nimble-storage#provisioning-parameters
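
For example, a StorageClass applying these parameters could look like the sketch below (the class name and the uid:gid/mode values are placeholders; the secret parameters shown in the linked example are omitted for brevity):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hpe-nimble-fsowner
provisioner: csi.hpe.com
parameters:
  fsOwner: "65534:65534"  # uid:gid that should own the root directory of the filesystem
  fsMode: "0770"          # octal mode applied to the root directory of the filesystem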

dns2utf8 commented Feb 8, 2020

There appears to be some sort of configuration error.
Please use the public instance for now: https://gitlab.com/dns2utf8/hpe-nimble-logs

Since I am on a different project for now, I hope @raffis can answer the pvc question.

@raunakkumar (Collaborator)

Hi @dns2utf8,
Thanks for the logs. I didn't find anything suspicious in them with respect to fsGroup and runAsUser.
I tried some experiments on our cluster and verified that runAsUser and fsGroup are honored.
Could you please elaborate on what you meant by it working in ~30% of cases? Did the pods never reach the Running state, or did they run without fsGroup and runAsUser being honored?
If it is the latter, could you share the output of the commands listed below?

Below is an example of my test.

Pod spec:

cat pod.yaml | grep -A 2 securityContext
  securityContext:
     runAsUser: 2157
     fsGroup: 1001

User and group on the host:

id rkumar
uid=2157(rkumar) gid=1001(eng) groups=1001(eng)

Pod running with user id 2157:

 kubectl exec -it fsgroup-pod-1 -c pod-datelog-1 -- sh
/ $ ps
PID   USER     TIME  COMMAND
    1 2157      0:10 /bin/sh 
   75 2157      0:00 sh
  681 2157      0:00 sh
  689 2157      0:00 sleep 1
  690 2157      0:00 ps


Volume is mounted with group 1001:

/ $ cd /data
/data $ ls -ltr
total 2048
-rw-r--r--    1 2157     1001       1902168 Feb 13 16:52 mydata.txt
On the host where the pod's volume is mounted:
 mount | grep mpath
/dev/mapper/mpathat on /var/lib/kubelet/plugins/hpe.com/mounts/0634be4e62e74eae4d000000000000000000000101 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/mpathat on /var/lib/kubelet/pods/2921bde4-b999-4a9a-8881-393fccb368d7/volumes/kubernetes.io~csi/pvc-fc29440f-bcc1-47bb-b29d-8559db04e92d/mount type xfs (rw,relatime,attr2,inode64,noquota)

 cd /var/lib/kubelet/plugins/hpe.com/mounts/0634be4e62e74eae4d000000000000000000000101
/var/lib/kubelet/plugins/hpe.com/mounts/0634be4e62e74eae4d000000000000000000000101# ls -ltr
total 2048
-rw-r--r-- 1 rkumar eng 1927514 Feb 13 08:59 mydata.txt
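
Another quick host-side check (a sketch, reusing the mount path from the example above) is to inspect the ownership and mode of the mount point itself:

stat -c '%U %G %a' /var/lib/kubelet/plugins/hpe.com/mounts/0634be4e62e74eae4d000000000000000000000101
# the group should match the fsGroup; kubelet also sets the setgid bit on directories it chowns for fsGroup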

@shivamerla (Collaborator)

@dns2utf8 Can you respond to the comment above if you are still seeing this issue?

@dns2utf8 (Author)

Hi

So the 30% means this: while testing the deployment for the application, I deleted and recreated the resources every now and then.
In roughly one out of three runs the storage would not attach correctly and the software would crash.
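
A loop along these lines reproduces that kind of intermittent failure (a sketch; deployment.yaml and the /data mount path are placeholders for this setup):

for i in 1 2 3 4 5 6 7 8 9 10; do
  kubectl -n snapshot-test delete deployment multi-file-writer --ignore-not-found
  kubectl -n snapshot-test apply -f deployment.yaml
  kubectl -n snapshot-test wait --for=condition=available deployment/multi-file-writer --timeout=120s
  # the group column should read 65534 when fsGroup was applied
  kubectl -n snapshot-test exec deploy/multi-file-writer -- ls -ldn /data
done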

@raunakkumar (Collaborator)

Hi @dns2utf8, do you still face the issue with fsGroup? Is the behavior the same without fsGroup?
