add OTEL trace for clusterpedia-apiserver #604

Merged
merged 1 commit into from
Dec 5, 2023

Conversation

KubeKyrie
Contributor

@KubeKyrie KubeKyrie commented Nov 27, 2023

What type of PR is this?
/kind feature

What this PR does / why we need it:

Add OpenTelemetry tracing support to clusterpedia-apiserver to improve its observability.

Which issue(s) this PR fixes:
Fixes #None

Special notes for your reviewer:

The FeatureGate APIServerTracing is enabled by default. We also need to provide the apiserver with a tracing configuration file via --tracing-config-file=<path-to-config>.

This is an example config that records spans for 1 in 10000 requests, and uses the default OpenTelemetry endpoint:

apiVersion: apiserver.config.k8s.io/v1beta1
kind: TracingConfiguration
# default value
#endpoint: localhost:4317
samplingRatePerMillion: 100

For details, see https://kubernetes.io/docs/concepts/cluster-administration/system-traces/
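To make the sampling arithmetic in the example config explicit, here is a minimal Python sketch that converts samplingRatePerMillion into the fraction of requests that get traced. The helper names `sampling_fraction` and `one_in` are hypothetical, and the range check is this sketch's own guard rather than a copy of the apiserver's validation.

```python
from fractions import Fraction

PER_MILLION = 1_000_000


def sampling_fraction(sampling_rate_per_million):
    """Return the fraction of requests sampled for a given samplingRatePerMillion."""
    # Guard chosen for this sketch: a per-million rate only makes sense in [0, 1_000_000].
    if not 0 <= sampling_rate_per_million <= PER_MILLION:
        raise ValueError("samplingRatePerMillion must be between 0 and 1000000")
    return Fraction(sampling_rate_per_million, PER_MILLION)


def one_in(sampling_rate_per_million):
    """Return N such that roughly 1 in N requests is sampled, or None when the rate is 0."""
    fraction = sampling_fraction(sampling_rate_per_million)
    return None if fraction == 0 else 1 / fraction


# The value from the example config: 100 per million is 1 in 10000.
print(sampling_fraction(100), one_in(100))
```

With samplingRatePerMillion: 100 this gives 1/10000, which matches the "1 in 10000 requests" description of the example config.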

Does this PR introduce a user-facing change?:

Support OpenTelemetry tracing for clusterpedia-apiserver

@clusterpedia-bot

Hi @KubeKyrie,
Thanks for your pull request!
If the PR is ready, use the /auto-cc command to assign Reviewer to Review.
We will review it shortly.


@clusterpedia-bot clusterpedia-bot added the kind/feature New feature label Nov 27, 2023
@KubeKyrie
Contributor Author

KubeKyrie commented Nov 27, 2023

Here are the test results:

  1. Clusterpedia was installed with kubectl apply.

kubectl -n clusterpedia-system get deploy clusterpedia-apiserver -oyaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"clusterpedia-apiserver"},"name":"clusterpedia-apiserver","namespace":"clusterpedia-system"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"clusterpedia-apiserver"}},"template":{"metadata":{"labels":{"app":"clusterpedia-apiserver"}},"spec":{"containers":[{"command":["/usr/local/bin/apiserver","--secure-port=443","--storage-config=/etc/clusterpedia/storage/internalstorage-config.yaml","--tracing-config-file=/etc/clusterpedia/trace/tracing-config.yaml","-v=3"],"env":[{"name":"DB_PASSWORD","valueFrom":{"secretKeyRef":{"key":"password","name":"internalstorage-password"}}}],"image":"ghcr.io/kubekyrie/clusterpedia/apiserver-amd64:latest","name":"apiserver","volumeMounts":[{"mountPath":"/etc/clusterpedia/storage","name":"internalstorage-config","readOnly":true},{"mountPath":"/etc/clusterpedia/trace","name":"tracing-config","readOnly":true}]}],"serviceAccountName":"clusterpedia-apiserver","volumes":[{"configMap":{"name":"clusterpedia-internalstorage"},"name":"internalstorage-config"},{"configMap":{"name":"clusterpedia-tracing-config"},"name":"tracing-config"}]}}}}
  creationTimestamp: "2023-11-30T10:09:18Z"
  generation: 2
  labels:
    app: clusterpedia-apiserver
  name: clusterpedia-apiserver
  namespace: clusterpedia-system
  resourceVersion: "5463952"
  uid: 5d30e0d4-0fd8-4398-b7ad-075855d4772c
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: clusterpedia-apiserver
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: clusterpedia-apiserver
    spec:
      containers:
      - command:
        - /usr/local/bin/apiserver
        - --secure-port=443
        - --storage-config=/etc/clusterpedia/storage/internalstorage-config.yaml
        - --tracing-config-file=/etc/clusterpedia/trace/tracing-config.yaml
        - -v=3
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: internalstorage-password
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: insight-agent-opentelemetry-collector.insight-system.svc.cluster.local:4317
        - name: OTEL_SERVICE_NAME
          value: clusterpedia-apiserver
        - name: OTEL_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: OTEL_RESOURCE_ATTRIBUTES_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: k8s.namespace.name=$(OTEL_K8S_NAMESPACE),k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME)
        image: ghcr.io/kubekyrie/clusterpedia/apiserver-amd64:latest
        imagePullPolicy: Always
        name: apiserver
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/clusterpedia/storage
          name: internalstorage-config
          readOnly: true
        - mountPath: /etc/clusterpedia/trace
          name: tracing-config
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: clusterpedia-apiserver
      serviceAccountName: clusterpedia-apiserver
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: clusterpedia-internalstorage
        name: internalstorage-config
      - configMap:
          defaultMode: 420
          name: clusterpedia-tracing-config
        name: tracing-config
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-11-30T10:09:18Z"
    lastUpdateTime: "2023-11-30T15:02:19Z"
    message: ReplicaSet "clusterpedia-apiserver-67d445b478" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-12-01T08:44:43Z"
    lastUpdateTime: "2023-12-01T08:44:43Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
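In the Deployment above, OTEL_RESOURCE_ATTRIBUTES is assembled from $(VAR) references to environment variables defined earlier in the same container; Kubernetes expands these references when it starts the container. The following Python sketch only illustrates that substitution. `expand_env_refs` is a hypothetical helper, not Kubernetes code, it does not handle the `$$(VAR)` escape, and the sample values are the ones that appear in the resource attributes of the trace later in this thread.

```python
import re

# Matches $(NAME) references as used in a container's env values.
_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_env_refs(value, defined):
    """Replace each $(NAME) with defined[NAME]; unknown names are left untouched."""
    return _REF.sub(lambda m: defined.get(m.group(1), m.group(0)), value)


# Values the downward API would supply for the pod shown in this thread.
DEFINED = {
    "OTEL_K8S_NAMESPACE": "clusterpedia-system",
    "OTEL_RESOURCE_ATTRIBUTES_NODE_NAME": "controller-node-1",
    "OTEL_RESOURCE_ATTRIBUTES_POD_NAME": "clusterpedia-apiserver-67d445b478-w8vx7",
}

TEMPLATE = (
    "k8s.namespace.name=$(OTEL_K8S_NAMESPACE),"
    "k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),"
    "k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME)"
)

print(expand_env_refs(TEMPLATE, DEFINED))
```

The order of the env entries matters here: a reference can only be resolved if the variable it names is defined before the entry that uses it, which is why OTEL_RESOURCE_ATTRIBUTES comes last in the list.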

@KubeKyrie
Contributor Author

/auto-cc

@Iceber
Member

Iceber commented Nov 27, 2023

@KubeKyrie KubeKyrie force-pushed the add-otel-trace branch 2 times, most recently from 36b79f1 to 7088ab1 Compare December 4, 2023 09:57
@KubeKyrie
Contributor Author

> What do you think about using https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/options/tracing.go?

OK, it is now implemented with reference to kube-apiserver's OTel tracing implementation.
I will update the PR description.

@KubeKyrie
Contributor Author

  1. Here is one trace collected by opentelemetry-collector. It was recorded while executing the request
    kubectl get --raw="/apis/clusterpedia.io/v1beta1/resources/clusters/cluster-example/apis/apps/v1/deployments"
    As we can see, the service name is apiserver, and the namespace, node, and pod are recorded as well.
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope k8s.io/component-base/tracing
Span #0
    Trace ID       : a5412c674188db037f25eb844ab9e973
    Parent ID      : 83df1447618abc7c
    ID             : 1e219318f7890f4d
    Name           : SerializeObject
    Kind           : Internal
    Start time     : 2023-12-04 08:22:19.842986725 +0000 UTC
    End time       : 2023-12-04 08:22:19.844720696 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> audit-id: Str(b1269ee0-4947-4092-b3ee-0a58456df10d)
     -> method: Str(GET)
     -> url: Str(/apis/apps/v1/deployments)
     -> protocol: Str(HTTP/2.0)
     -> mediaType: Str(application/json)
     -> encoder: Str({"encodeGV":"apps/v1","encoder":"{\"name\":\"json\",\"pretty\":\"false\",\"strict\":\"false\",\"yaml\":\"false\"}","name":"versioning"})
Events:
SpanEvent #0
     -> Name: About to start writing response
     -> Timestamp: 2023-12-04 08:22:19.843882751 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes::
          -> size: Int(39657)
SpanEvent #1
     -> Name: Write call succeeded
     -> Timestamp: 2023-12-04 08:22:19.844707008 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes::
          -> writer: Str(struct { httpsnoop.Unwrapper; http.ResponseWriter; http.Flusher; http.CloseNotifier })
          -> size: Int(39657)
          -> firstWrite: Bool(true)
Span #1
    Trace ID       : a5412c674188db037f25eb844ab9e973
    Parent ID      : 83df1447618abc7c
    ID             : 9013368f970a443c
    Name           : List
    Kind           : Internal
    Start time     : 2023-12-04 08:22:19.571331731 +0000 UTC
    End time       : 2023-12-04 08:22:19.844755767 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> accept: Str(application/json, */*)
     -> audit-id: Str(b1269ee0-4947-4092-b3ee-0a58456df10d)
     -> client: Str(127.0.0.1)
     -> protocol: Str(HTTP/2.0)
     -> resource: Str(deployments)
     -> scope: Str(cluster)
     -> url: Str(/apis/apps/v1/deployments)
     -> user-agent: Str(kubectl/v1.27.4 (linux/amd64) kubernetes/fa3d799)
     -> verb: Str(LIST)
Events:
SpanEvent #0
     -> Name: About to List from storage
     -> Timestamp: 2023-12-04 08:22:19.57136339 +0000 UTC
     -> DroppedAttributesCount: 0
SpanEvent #1
     -> Name: Listing from storage done
     -> Timestamp: 2023-12-04 08:22:19.842912784 +0000 UTC
     -> DroppedAttributesCount: 0
SpanEvent #2
     -> Name: Writing http response done
     -> Timestamp: 2023-12-04 08:22:19.844753745 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes::
          -> count: Int(14)
ResourceSpans #1
Resource SchemaURL:
Resource attributes:
     -> k8s.namespace.name: Str(clusterpedia-system)
     -> k8s.node.name: Str(controller-node-1)
     -> k8s.pod.name: Str(clusterpedia-apiserver-67d445b478-w8vx7)
     -> service.instance.id: Str(apiserver-2zmxqjibauv4tlocfb4pupi374)
     -> service.name: Str(apiserver)
     -> k8s.cluster.id: Str(5aa703a3-7a22-47b7-8b9d-46542cf300e9)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp semver:0.35.1
Span #0
    Trace ID       : a5412c674188db037f25eb844ab9e973
    Parent ID      :
    ID             : 83df1447618abc7c
    Name           : KubernetesAPI
    Kind           : Server
    Start time     : 2023-12-04 08:22:19.571190229 +0000 UTC
    End time       : 2023-12-04 08:22:19.844782952 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> net.transport: Str(ip_tcp)
     -> net.peer.ip: Str(10.6.88.54)
     -> net.peer.port: Int(33417)
     -> net.host.ip: Str(10.233.25.116)
     -> net.host.port: Int(443)
     -> http.target: Str(/apis/clusterpedia.io/v1beta1/resources/clusters/cluster-example/apis/apps/v1/deployments)
     -> http.server_name: Str(KubernetesAPI)
     -> http.client_ip: Str(127.0.0.1)
     -> http.user_agent: Str(kubectl/v1.27.4 (linux/amd64) kubernetes/fa3d799)
     -> http.scheme: Str(https)
     -> http.host: Str(10.233.25.116:443)
     -> http.flavor: Str(2)
     -> http.method: Str(GET)
     -> http.wrote_bytes: Int(39657)
     -> http.status_code: Int(200)
ResourceSpans #2
Resource SchemaURL:
Resource attributes:
     -> k8s.namespace.name: Str(clusterpedia-system)
     -> k8s.node.name: Str(controller-node-1)
     -> k8s.pod.name: Str(clusterpedia-apiserver-67d445b478-w8vx7)
     -> service.instance.id: Str(apiserver-2zmxqjibauv4tlocfb4pupi374)
     -> service.name: Str(apiserver)
     -> k8s.cluster.id: Str(5aa703a3-7a22-47b7-8b9d-46542cf300e9)
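To read the timings out of the collector output above, the start and end times of each span can be subtracted. This is a minimal Python sketch that does so for the three spans in this trace. The helper names are hypothetical, and the parser only handles the UTC timestamp format printed above, where trailing zeros of the fractional seconds are trimmed.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def parse_collector_time_ns(text):
    """Convert 'YYYY-MM-DD HH:MM:SS.fffffffff +0000 UTC' to nanoseconds since the Unix epoch."""
    date_part, time_part, offset, _zone = text.split()
    if offset != "+0000":
        raise ValueError("this sketch only handles UTC timestamps")
    whole, _, frac = time_part.partition(".")
    base = datetime.strptime(date_part + " " + whole, "%Y-%m-%d %H:%M:%S")
    seconds = int(base.replace(tzinfo=timezone.utc).timestamp())
    # Trailing zeros are trimmed in the output (e.g. ".57136339"), so pad back to 9 digits.
    nanos = int(frac.ljust(9, "0")) if frac else 0
    return seconds * NS_PER_SECOND + nanos


def duration_ns(start, end):
    """Return the elapsed time between two collector timestamps in nanoseconds."""
    return parse_collector_time_ns(end) - parse_collector_time_ns(start)


# Start and end times copied from the spans above.
SPANS = {
    "KubernetesAPI": ("2023-12-04 08:22:19.571190229 +0000 UTC", "2023-12-04 08:22:19.844782952 +0000 UTC"),
    "List": ("2023-12-04 08:22:19.571331731 +0000 UTC", "2023-12-04 08:22:19.844755767 +0000 UTC"),
    "SerializeObject": ("2023-12-04 08:22:19.842986725 +0000 UTC", "2023-12-04 08:22:19.844720696 +0000 UTC"),
}

for name, (start, end) in SPANS.items():
    print(f"{name}: {duration_ns(start, end) / 1_000_000:.3f} ms")
```

By these timestamps the root KubernetesAPI span lasts about 273.6 ms and the List span about 273.4 ms, while SerializeObject takes about 1.7 ms. The two List events "About to List from storage" and "Listing from storage done" are about 271.5 ms apart, so nearly all of the request time in this sample is spent listing from storage.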

@KubeKyrie
Contributor Author

KubeKyrie commented Dec 4, 2023

OTEL env vars such as OTEL_K8S_NAMESPACE are not required, but setting them is recommended when more detailed trace information is needed.
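For context, these optional variables feed OTEL_RESOURCE_ATTRIBUTES, which the OpenTelemetry specification describes as comma-separated key=value pairs. The sketch below shows how such a string maps to the resource attributes seen in the trace. `parse_resource_attributes` is a hypothetical helper written for illustration, not the parser used by the OpenTelemetry SDK, and its handling of malformed pairs is this sketch's own choice.

```python
from urllib.parse import unquote


def parse_resource_attributes(raw):
    """Parse 'k1=v1,k2=v2' into a dict, percent-decoding the values."""
    attrs = {}
    for pair in raw.split(","):
        key, sep, value = pair.strip().partition("=")
        if not sep or not key:
            continue  # this sketch silently skips empty or malformed pairs
        attrs[key.strip()] = unquote(value.strip())
    return attrs


# The expanded value for the pod shown in this thread.
RAW = (
    "k8s.namespace.name=clusterpedia-system,"
    "k8s.node.name=controller-node-1,"
    "k8s.pod.name=clusterpedia-apiserver-67d445b478-w8vx7"
)

for key, value in parse_resource_attributes(RAW).items():
    print(f"{key}: {value}")
```

If the variables are left unset, the k8s.namespace.name, k8s.node.name, and k8s.pod.name attributes would presumably be absent from the resource, and the spans themselves would still be exported.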

Member

@Iceber Iceber left a comment

LGTM, Thanks for adding this feature!

/kustomize also needs to be updated

@KubeKyrie KubeKyrie force-pushed the add-otel-trace branch 2 times, most recently from e85a8c4 to 51888c8 Compare December 5, 2023 08:13
Signed-off-by: KubeKyrie <shaolong.qin@daocloud.io>
@Iceber Iceber merged commit ed85cfc into clusterpedia-io:main Dec 5, 2023
7 checks passed