
fsGroup sometimes works sometimes breaks #107

Open
dns2utf8 opened this issue Feb 6, 2020 · 9 comments

dns2utf8 commented Feb 6, 2020

Hi all

I am using this CSI driver to access HPE Nimble storage over Fibre Channel.
Lately I have noticed that sometimes the fsGroup is not applied to the storage.

Currently, three applications on the cluster are running on the same node:

  1. GitLab with a working fsGroup
  2. GitLab without a working fsGroup
  3. mfw, where fsGroup works in ~30% of deployments

The relevant YAML:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-file-writer
  namespace: snapshot-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multi-file-writer
  minReadySeconds: 5
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: multi-file-writer
    spec:
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      # containers and volumes trimmed from this snippet

Debugging

The logs did not contain any hints regarding these applications:

grep -ri fsGroup /var/log/nimble* /var/log/syslog

Other containers emitted logs containing FSGroup:nil; since those containers did not request an fsGroup, that appears to be expected.
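
For reference, a quick way to check whether the fsGroup took effect is to compare the group ownership of the mounted volume from inside the pod (a sketch; the pod name and mount path are placeholders for this deployment):

kubectl -n snapshot-test exec -it <multi-file-writer-pod> -- id
# expect uid=65534 gid=65534 (user/group names vary by image)
kubectl -n snapshot-test exec -it <multi-file-writer-pod> -- ls -ldn /path/to/mount
# the group column should read 65534 when fsGroup was applied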

Cheers,
Stefan

@raunakkumar (Collaborator)

Hi @dns2utf8, could you please upload the logs for us to review?
You should be able to collect them using https://github.com/hpe-storage/csi-driver#log-collector
Also, is your issue related to kubernetes/examples#260?

dns2utf8 commented Feb 7, 2020

Hi

Our issue is not related. This setup uses a SAN via Fibre Channel with xfs on the LUNs.
The logs from the three nodes are 1.2 GB in total. Uploading them will take a while.

dns2utf8 commented Feb 7, 2020

Uploaded the logs here

raunakkumar commented Feb 7, 2020

Thanks, but I am unable to reach https://gitlab.gyselroth.net/stefan.schindler/hpe-nimble-logs.

Did you apply the following parameters in the StorageClass for the underlying PVC?

Parameter | Value | Description
fsOwner | userId:groupId | The user id and group id that should own the root directory of the filesystem.
fsMode | Octal digits | 1 to 4 octal digits that represent the file mode to be applied to the root directory of the filesystem.

https://github.com/hpe-storage/csi-driver/tree/master/examples/kubernetes/hpe-nimble-storage#provisioning-parameters
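
For example, a StorageClass applying these parameters could look like the sketch below (the class name and the uid:gid/mode values are placeholders; the secret parameters shown in the linked example are omitted for brevity):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hpe-nimble-fsowner
provisioner: csi.hpe.com
parameters:
  fsOwner: "65534:65534"  # uid:gid that should own the root directory of the filesystem
  fsMode: "0770"          # octal mode applied to the root directory of the filesystem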

dns2utf8 commented Feb 8, 2020

There appears to be some sort of configuration error.
Please use the public instance for now: https://gitlab.com/dns2utf8/hpe-nimble-logs

Since I am on a different project for now, I hope @raffis can answer the pvc question.

@raunakkumar (Collaborator)

Hi @dns2utf8,
Thanks for the logs. I didn't find anything suspicious in them with respect to fsGroup and runAsUser.
I tried some experiments on our cluster and verified that runAsUser and fsGroup are honored.
Could you please elaborate on what you meant by it working in ~30% of cases? Did the pods never reach the Running state, or did they run without fsGroup and runAsUser being honored?
If it is the latter, could you share the output of the commands listed below?

Below is an example of my test.

Pod spec:

cat pod.yaml | grep -A 2 securityContext
  securityContext:
     runAsUser: 2157
     fsGroup: 1001

User and group on the host:

id rkumar
uid=2157(rkumar) gid=1001(eng) groups=1001(eng)

Pod running with user id 2157:

 kubectl exec -it fsgroup-pod-1 -c pod-datelog-1 -- sh
/ $ ps
PID   USER     TIME  COMMAND
    1 2157      0:10 /bin/sh 
   75 2157      0:00 sh
  681 2157      0:00 sh
  689 2157      0:00 sleep 1
  690 2157      0:00 ps


Volume is mounted with group 1001:

/ $ cd /data
/data $ ls -ltr
total 2048
-rw-r--r--    1 2157     1001       1902168 Feb 13 16:52 mydata.txt
On the host where the pod's volume is mounted:
 mount | grep mpath
/dev/mapper/mpathat on /var/lib/kubelet/plugins/hpe.com/mounts/0634be4e62e74eae4d000000000000000000000101 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/mpathat on /var/lib/kubelet/pods/2921bde4-b999-4a9a-8881-393fccb368d7/volumes/kubernetes.io~csi/pvc-fc29440f-bcc1-47bb-b29d-8559db04e92d/mount type xfs (rw,relatime,attr2,inode64,noquota)

 cd /var/lib/kubelet/plugins/hpe.com/mounts/0634be4e62e74eae4d000000000000000000000101
/var/lib/kubelet/plugins/hpe.com/mounts/0634be4e62e74eae4d000000000000000000000101# ls -ltr
total 2048
-rw-r--r-- 1 rkumar eng 1927514 Feb 13 08:59 mydata.txt
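
Another quick host-side check (a sketch, reusing the mount path from the example above) is to inspect the ownership and mode of the mount point itself:

stat -c '%U %G %a' /var/lib/kubelet/plugins/hpe.com/mounts/0634be4e62e74eae4d000000000000000000000101
# the group should match the fsGroup; kubelet also sets the setgid bit on directories it chowns for fsGroup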

@shivamerla (Collaborator)

@dns2utf8 Can you respond to the comment above if you are still seeing this issue?

@dns2utf8 (Author)

Hi

So the 30% means this: while testing the deployment for the application, I deleted and recreated the resources every now and then.
In roughly one out of three runs the storage would not attach correctly and the software would crash.
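
A loop along these lines reproduces that kind of intermittent failure (a sketch; deployment.yaml and the /data mount path are placeholders for this setup):

for i in 1 2 3 4 5 6 7 8 9 10; do
  kubectl -n snapshot-test delete deployment multi-file-writer --ignore-not-found
  kubectl -n snapshot-test apply -f deployment.yaml
  kubectl -n snapshot-test wait --for=condition=available deployment/multi-file-writer --timeout=120s
  # the group column should read 65534 when fsGroup was applied
  kubectl -n snapshot-test exec deploy/multi-file-writer -- ls -ldn /data
done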

@raunakkumar (Collaborator)

Hi @dns2utf8, do you still face the issue with fsGroup? Is the behavior the same without fsGroup?
