[TEP-0124] implement opentelemetry Jaeger tracing #5746

Merged
merged 9 commits into from
Jan 23, 2023
85 changes: 83 additions & 2 deletions cmd/controller/main.go
@@ -17,10 +17,12 @@ limitations under the License.
package main

import (
"context"
"flag"
"log"
"net/http"
"os"
"time"

"github.com/tektoncd/pipeline/pkg/apis/pipeline"
"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
@@ -37,11 +39,23 @@ import (
"knative.dev/pkg/injection"
"knative.dev/pkg/injection/sharedmain"
"knative.dev/pkg/signals"

"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/exporters/jaeger"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/sdk/resource"
tracesdk "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.12.0"
"go.opentelemetry.io/otel/trace"
)

const (
// ControllerLogKey is the name of the logger for the controller cmd
ControllerLogKey = "tekton-pipelines-controller"
// TracerProviderPipelineRun is the name of TracerProvider used in pipeline reconciler
TracerProviderPipelineRun = "pipeline-reconciler"
// TracerProviderTaskRun is the name of TracerProvider used in taskrun reconciler
TracerProviderTaskRun = "taskrun-reconciler"
)

func main() {
@@ -103,16 +117,83 @@ func main() {
log.Fatal(http.ListenAndServe(":"+port, mux)) // #nosec G114 -- see https://github.com/securego/gosec#available-rules
}()

// initialize opentelemetry
Member

nit: we could probably extract this into a function that returns the objects we'll pass to the controllers, as well as the deferred function for the clean shutdown.

Contributor Author

I don't understand. Do you mean adding the error check and returning the NoopTracerProvider from the tracerProvider(service) method itself?

Member

I was thinking of something like:

trPipelineRun, trTaskRun, deferFn := initializeTracers(…)
defer deferFn()

func initializeTracers(…) (…) {
    // …
    return trPipelineRun, trTaskRun, func() {
        // the content of the defer below
    }
}

And do all the "tracer"-related code in there. But it's a nit, I'm happy to keep it as is as well 😉

Member

It sounds like something we could fix in a follow-up?
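
The extraction discussed in this thread could look roughly like the sketch below. It is not the code that was merged: `provider`, `noopProvider`, `sdkProvider`, and `newProvider` are hypothetical stand-ins for `trace.TracerProvider`, the no-op provider, `*tracesdk.TracerProvider`, and `tracerProvider(service)`, used here only so the example builds with the standard library alone.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// provider is a hypothetical stand-in for trace.TracerProvider.
type provider interface {
	Name() string
}

// noopProvider stands in for the no-op provider; it has nothing to flush.
type noopProvider struct{}

func (noopProvider) Name() string { return "noop" }

// sdkProvider stands in for an SDK provider that must be shut down.
type sdkProvider struct {
	service string
	closed  bool
}

func (p *sdkProvider) Name() string { return p.service }

func (p *sdkProvider) Shutdown(ctx context.Context) error {
	p.closed = true
	return nil
}

// newProvider is a hypothetical constructor playing the role of
// tracerProvider(service); fail simulates an exporter error.
func newProvider(service string, fail bool) (provider, error) {
	if fail {
		return nil, errors.New("exporter unavailable")
	}
	return &sdkProvider{service: service}, nil
}

// initializeTracers builds both providers, falls back to the no-op
// provider on error, and returns a single cleanup function.
func initializeTracers(failTaskRun bool) (pipelineRun, taskRun provider, shutdown func()) {
	build := func(service string, fail bool) provider {
		p, err := newProvider(service, fail)
		if err != nil {
			fmt.Printf("falling back to no-op provider for %s: %v\n", service, err)
			return noopProvider{}
		}
		return p
	}
	pipelineRun = build("pipeline-reconciler", false)
	taskRun = build("taskrun-reconciler", failTaskRun)

	shutdown = func() {
		// Bound the flush so shutdown cannot hang the process.
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		for _, p := range []provider{pipelineRun, taskRun} {
			// Only providers that expose Shutdown need flushing.
			if s, ok := p.(interface{ Shutdown(context.Context) error }); ok {
				if err := s.Shutdown(ctx); err != nil {
					fmt.Printf("unable to shut down %s: %v\n", p.Name(), err)
				}
			}
		}
	}
	return pipelineRun, taskRun, shutdown
}

func main() {
	pr, tr, shutdown := initializeTracers(true)
	defer shutdown()
	fmt.Println(pr.Name(), tr.Name())
}
```

Returning the cleanup closure keeps the fallback and shutdown logic next to the construction code, so `main` only has to call the function and defer the result.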

tpPipelineRun, err := tracerProvider(TracerProviderPipelineRun)
if err != nil {
log.Printf("failed to initialize tracerProvider for pipelinerun, falling back to no-op provider, %s", err.Error())
tpPipelineRun = trace.NewNoopTracerProvider()
}
tpTaskrun, err := tracerProvider(TracerProviderTaskRun)
if err != nil {
log.Printf("failed to initialize tracerProvider for taskrun, falling back to no-op provider, %s", err.Error())
tpTaskrun = trace.NewNoopTracerProvider()
}
otel.SetTextMapPropagator(propagation.TraceContext{})
ctx, cancel := context.WithCancel(ctx)
defer cancel()

ctx = filteredinformerfactory.WithSelectors(ctx, v1beta1.ManagedByLabelKey)
sharedmain.MainWithConfig(ctx, ControllerLogKey, cfg,
taskrun.NewController(opts, clock.RealClock{}),
pipelinerun.NewController(opts, clock.RealClock{}),
taskrun.NewController(opts, clock.RealClock{}, tpTaskrun),
pipelinerun.NewController(opts, clock.RealClock{}, tpPipelineRun),
run.NewController(),
resolutionrequest.NewController(clock.RealClock{}),
customrun.NewController(),
)

// Cleanly shutdown and flush telemetry when the application exits.
defer func(ctx context.Context) {
// Do not make the application hang when it is shutdown.
ctx, cancel = context.WithTimeout(ctx, time.Second*5)
defer cancel()

// shutdown is only needed when tracerProvider is initialized with jaeger
// not needed when tracerProvider is NewNoopTracerProvider
if tp, ok := tpPipelineRun.(*tracesdk.TracerProvider); ok {
if err := tp.Shutdown(ctx); err != nil {
log.Printf("Unable to shutdown tracerProvider for pipelinerun, %s", err.Error())
}
}
if tp, ok := tpTaskrun.(*tracesdk.TracerProvider); ok {
if err := tp.Shutdown(ctx); err != nil {
log.Printf("Unable to shutdown tracerProvider for taskrun, %s", err.Error())
}
}
}(ctx)
}

func handler(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
}

// tracerProvider returns an OpenTelemetry TracerProvider configured to use
// the Jaeger exporter, which sends spans to the collector endpoint read from
// the environment. The returned TracerProvider also uses a Resource that
// records the service name. If the endpoint variable is not set, a no-op
// TracerProvider is returned instead.
func tracerProvider(service string) (trace.TracerProvider, error) {
// Create the Jaeger exporter
// The following env variables are used by the sdk for creating the exporter
// - OTEL_EXPORTER_JAEGER_ENDPOINT is the HTTP endpoint for sending spans directly to a collector.
// - OTEL_EXPORTER_JAEGER_USER is the username to be sent as authentication to the collector endpoint.
// - OTEL_EXPORTER_JAEGER_PASSWORD is the password to be sent as authentication to the collector endpoint.

if _, e := os.LookupEnv("OTEL_EXPORTER_JAEGER_ENDPOINT"); !e {
// jaeger endpoint is not defined, disable tracing and return no-op tracerProvider
return trace.NewNoopTracerProvider(), nil
}

exp, err := jaeger.New(jaeger.WithCollectorEndpoint())
if err != nil {
return nil, err
}
// Initialize tracerProvider with the jaeger exporter
tp := tracesdk.NewTracerProvider(
tracesdk.WithBatcher(exp),
// Record information about the service in a Resource.
tracesdk.WithResource(resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceNameKey.String(service),
)),
)
return tp, nil
}
7 changes: 7 additions & 0 deletions config/controller.yaml
@@ -124,6 +124,13 @@ spec:
value: /etc/ssl/certs
- name: METRICS_DOMAIN
value: tekton.dev/pipeline
# The following variables can be uncommented with correct values to enable Jaeger tracing
#- name: OTEL_EXPORTER_JAEGER_ENDPOINT
# value: http://jaeger-collector.jaeger:14268/api/traces
#- name: OTEL_EXPORTER_JAEGER_USER
# value: username
#- name: OTEL_EXPORTER_JAEGER_PASSWORD
# value: password
vdemeester marked this conversation as resolved.
securityContext:
allowPrivilegeEscalation: false
capabilities:
1 change: 1 addition & 0 deletions docs/developers/README.md
@@ -11,6 +11,7 @@ channel for training and tutorials on Tekton!
- Developing on Tekton:
- [Local Setup](./local-setup.md): Getting your local environment set up to develop on Tekton.
- [Testing](../../test/README.md): Running Tekton tests.
- [Tracing](./tracing.md): Enabling Jaeger tracing.
- How Tekton is run on Kubernetes:
- [Controller Logic](./controller-logic.md): How Tekton extends Kubernetes using Knative.
- [TaskRun Logic](./taskruns.md): How TaskRuns are run in pods.
35 changes: 35 additions & 0 deletions docs/developers/tracing.md
@@ -0,0 +1,35 @@
# Tracing setup

This section shows how to enable tracing for Tekton reconcilers and
capture traces in Jaeger.

## Prerequisites

Jaeger should be installed and accessible from the cluster. The easiest
way to set it up is with Helm.

The following commands install Jaeger in the `jaeger` namespace:

```
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm upgrade -i jaeger jaegertracing/jaeger -n jaeger --create-namespace
```

Use port-forwarding to open the Jaeger query UI, or change the service
type to LoadBalancer to access the service directly:

```
kubectl port-forward svc/jaeger-query -n jaeger 8080:80
```

See the official [Jaeger docs](https://www.jaegertracing.io/docs/) for details on working with Jaeger.

## Enabling tracing

The Tekton Pipelines controller expects the following environment variables in order to connect to Jaeger:

* `OTEL_EXPORTER_JAEGER_ENDPOINT` is the HTTP endpoint for sending spans directly to a collector.
* `OTEL_EXPORTER_JAEGER_USER` is the username to be sent as authentication to the collector endpoint.
* `OTEL_EXPORTER_JAEGER_PASSWORD` is the password to be sent as authentication to the collector endpoint.

`OTEL_EXPORTER_JAEGER_ENDPOINT` is the only mandatory variable for enabling tracing. These variables also appear, commented out, in the controller manifest.
35 changes: 31 additions & 4 deletions docs/pipeline-api.md
@@ -4569,6 +4569,9 @@ reasons that emerge from underlying resources are not included here</p>
</tr><tr><td><p>&#34;TaskRunImagePullFailed&#34;</p></td>
<td><p>TaskRunReasonImagePullFailed is the reason set when the step of a task fails due to image not being pulled</p>
</td>
</tr><tr><td><p>&#34;TaskRunResultLargerThanAllowedLimit&#34;</p></td>
<td><p>TaskRunReasonResultLargerThanAllowedLimit is the reason set when one of the results exceeds its maximum allowed limit of 1 KB</p>
</td>
Comment on lines +4572 to +4574
Member

It looks like these were missed in some previous PR, but I wonder why CI did not catch them.
@vdemeester @abayer any idea?

</tr><tr><td><p>&#34;Running&#34;</p></td>
<td><p>TaskRunReasonRunning is the reason set when the TaskRun is running</p>
</td>
@@ -6347,15 +6350,17 @@ string
<td>
<code>kms</code><br/>
<em>
<a href="#tekton.dev/v1alpha1.HashAlgorithm">
HashAlgorithm
</a>
string
</em>
</td>
<td>
<em>(Optional)</em>
<p>KMS contains the KMS url of the public key
Supported formats differ based on the KMS system used.</p>
Supported formats differ based on the KMS system used.
One example of a KMS url could be:
gcpkms://projects/[PROJECT]/locations/[LOCATION]&gt;/keyRings/[KEYRING]/cryptoKeys/[KEY]/cryptoKeyVersions/[KEY_VERSION]
For more examples please refer <a href="https://docs.sigstore.dev/cosign/kms_support">https://docs.sigstore.dev/cosign/kms_support</a>.
Note that the KMS is not supported yet.</p>
Comment on lines +6359 to +6363
Member

Ditto

</td>
</tr>
<tr>
@@ -10048,6 +10053,17 @@ Provenance
<p>Provenance contains some key authenticated metadata about how a software artifact was built (what sources, what inputs/outputs, etc.).</p>
</td>
</tr>
<tr>
<td>
<code>spanContext</code><br/>
<em>
map[string]string
</em>
</td>
<td>
<p>SpanContext contains tracing span context fields</p>
</td>
</tr>
</tbody>
</table>
<h3 id="tekton.dev/v1beta1.PipelineRunTaskRunStatus">PipelineRunTaskRunStatus
@@ -13508,6 +13524,17 @@ Provenance
<p>Provenance contains some key authenticated metadata about how a software artifact was built (what sources, what inputs/outputs, etc.).</p>
</td>
</tr>
<tr>
<td>
<code>spanContext</code><br/>
<em>
map[string]string
</em>
</td>
<td>
<p>SpanContext contains tracing span context fields</p>
</td>
</tr>
</tbody>
</table>
<h3 id="tekton.dev/v1beta1.TaskRunStepOverride">TaskRunStepOverride
5 changes: 5 additions & 0 deletions go.mod
@@ -45,6 +45,10 @@ require (
github.com/google/go-containerregistry/pkg/authn/k8schain v0.0.0-20221030203717-1711cefd7eec
github.com/letsencrypt/boulder v0.0.0-20221109233200-85aa52084eaf
github.com/titanous/rocacheck v0.0.0-20171023193734-afe73141d399
go.opentelemetry.io/otel v1.11.1
go.opentelemetry.io/otel/exporters/jaeger v1.11.1
go.opentelemetry.io/otel/sdk v1.11.1
go.opentelemetry.io/otel/trace v1.11.1
k8s.io/utils v0.0.0-20221012122500-cfd413dd9e85
)

@@ -75,6 +79,7 @@ require (
github.com/cloudflare/circl v1.1.0 // indirect
github.com/emicklei/go-restful/v3 v3.9.0 // indirect
github.com/fatih/color v1.13.0 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/golang/snappy v0.0.4 // indirect
github.com/google/gnostic v0.6.9 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.2.1 // indirect
11 changes: 11 additions & 0 deletions go.sum

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

64 changes: 64 additions & 0 deletions pkg/apis/pipeline/v1beta1/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions pkg/apis/pipeline/v1beta1/pipelinerun_types.go
@@ -457,6 +457,9 @@ type PipelineRunStatusFields struct {
// Provenance contains some key authenticated metadata about how a software artifact was built (what sources, what inputs/outputs, etc.).
// +optional
Provenance *Provenance `json:"provenance,omitempty"`

// SpanContext contains tracing span context fields
SpanContext map[string]string `json:"spanContext,omitempty"`
}
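
The diff does not show what `SpanContext` holds. Assuming the map is filled by the W3C TraceContext propagator that `main.go` configures, it would carry a `traceparent` entry of the form `version-traceid-parentid-flags`. The sketch below is a loose, standard-library-only reader for that format (version `00`). The `traceParent` type and `parseTraceParent` function are hypothetical and not part of Tekton or OpenTelemetry, and the sample value is the example from the W3C Trace Context specification.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the four fields of a W3C traceparent value.
type traceParent struct {
	Version string
	TraceID string
	SpanID  string
	Sampled bool
}

// parseTraceParent reads the "traceparent" entry from a string map and
// checks field count, field lengths, and hex encoding. It does not
// enforce every rule of the specification (for example, it accepts
// all-zero IDs).
func parseTraceParent(carrier map[string]string) (traceParent, error) {
	raw, ok := carrier["traceparent"]
	if !ok {
		return traceParent{}, errors.New("no traceparent entry")
	}
	parts := strings.Split(raw, "-")
	if len(parts) != 4 {
		return traceParent{}, fmt.Errorf("expected 4 fields, got %d", len(parts))
	}
	wantLen := []int{2, 32, 16, 2} // version, trace ID, span ID, flags
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return traceParent{}, fmt.Errorf("field %d has length %d, want %d", i, len(p), wantLen[i])
		}
		if _, err := hex.DecodeString(p); err != nil {
			return traceParent{}, fmt.Errorf("field %d is not hex: %w", i, err)
		}
	}
	flags, _ := hex.DecodeString(parts[3]) // validated in the loop above
	return traceParent{
		Version: parts[0],
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&0x01 == 0x01, // lowest bit is the sampled flag
	}, nil
}

func main() {
	carrier := map[string]string{
		"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
	}
	tp, err := parseTraceParent(carrier)
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Printf("trace=%s span=%s sampled=%t\n", tp.TraceID, tp.SpanID, tp.Sampled)
}
```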

// SkippedTask is used to describe the Tasks that were skipped due to their When Expressions