
TEP-0060: Remote Resource Resolution #389

Merged
merged 1 commit on May 17, 2021

Conversation

@ghost commented Mar 24, 2021

This commit adds a new TEP with a problem statement describing a gap
in Tekton's support for referencing Tasks and Pipelines. Namely, that
those resources have to be in the cluster or in an OCI registry to be utilized
by the pipelines controller.

@ghost (Author) commented Mar 24, 2021

/kind tep

@tekton-robot tekton-robot added kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 24, 2021
@bobcatfish (Contributor)
/assign

@vdemeester (Member)
/assign

@pierretasci (Contributor)
/assign

@pierretasci (Contributor) commented Mar 29, 2021

Having read through the doc a couple times, I agree that there are limitations currently imposed by the in-cluster and Tekton Bundle references. Certainly different organizations want to manage their CI/CD supply chain differently.

My principal reservation is that supporting the myriad resolvers is a hard problem. Plus, it creates bifurcation in the community and makes adoption of any one means harder. It also begs the question of what Tekton's role is here. Directly supporting what amounts to a slippery slope of potential endpoints to resolve against feels like Tekton solutioning rather than providing flexible primitives.

To that end, perhaps the one missing requirement and use case from the provided list is that the remote resolution process should no longer be embedded into Tekton. Instead, the goal is to offer Tekton operators, and by extension users, a pluggable framework for resolving tasks and pipelines with two reference implementations, local and Tekton Bundles.

Comment on lines +92 to +103
[TEP-0053](https://github.com/tektoncd/community/pull/352) is also exploring
ways of encoding this information as part of pipelines in the catalog.
Member:

+1

Contributor:

i wonder if it would make sense to add a requirement around catalog interoperability - i.e. if we go this route, and folks want to add Pipelines to the catalog, they're going to need a reliable way to reference Tasks - maybe we need to have some minimum number of resource resolution mechanisms that work out of the box (tho folks could disable them if needed)? im thinking oci bundles for sure - maybe git as well...

@ghost (Author), Mar 30, 2021:

Linking relevant discussion on TEP-0053 as well.

@ghost (Author):

One of the Goals already describes this a little bit I think:

- Establish a common syntax that tool and platform creators can use to record,
  as part of a pipeline or run, the remote location that Tekton resources
  should be fetched from.

To record the requirement of providing bundle and in-cluster support by default I've added a new req:

- At minimum we should provide resolver implementations for Tekton Bundles and
  in-cluster resources. These can be provided out-of-the-box just as they are
  today. The code for these resolvers can either be hosted as part of
  Pipelines, in the Catalog, or in a new repo under the `tektoncd` GitHub org.

And then to capture adding git support I've added this one too:

- Add new support for resolving resources from git via this mechanism. This
  could be provided out of the box too but we don't _have to_ since this
  doesn't overlap with concerns around backwards compatibility in the same way
  that Tekton Bundles and in-cluster support might.
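As a purely illustrative sketch of where such a requirement could lead (the TEP does not fix a syntax; the `resolver` field and the parameter names below are assumptions, not anything proposed in the text), a git-resolved task reference might eventually look something like:

```yaml
# Hypothetical shape only: field and parameter names are illustrative.
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: example-git-resolved-run
spec:
  taskRef:
    resolver: git              # name of the resolver asked to fetch the task
    params:
      url: https://github.com/example-org/tekton-tasks.git
      revision: main
      path: tasks/git-clone/git-clone.yaml
```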

Comment on lines 94 to 105
4. Tutorials introducing Tekton Pipelines have to include instructions for
fetching and manually installing tasks that the tutorial relies on.
Member:

Good point... I guess a git or a hub reference would solve that, or even a bundle one would, if we hosted a public bundle of the catalog.

Contributor:

tektoncd/catalog#577 hopefully we'll go in that direction! 🤞

Member:

It's live now gcr.io/tekton-releases/catalog/upstream

@ghost (Author):

Awesome, I'll remove this problem on the basis that tutorials now have a way to directly refer to catalog tasks.

resources from remotes they don't want their users to reference.
- Allow operators to control the threshold at which fetching a remote Tekton
resource is considered timed out.
- Emit Events and Conditions operators can use to assess whether fetching
Member:

What do you mean by Conditions in this context?

@ghost (Author):

I was thinking in the sense of status.conditions on TaskRuns - on failing to resolve a taskref, ensure that the Succeeded/"False" condition correctly reflects the failed resolution in the reason and message.
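For illustration, a TaskRun whose taskRef failed to resolve might then surface a condition shaped like the following (the reason string is an assumption, not a settled name; the condition structure itself is the standard Knative/Kubernetes shape Tekton already uses):

```yaml
status:
  conditions:
    - type: Succeeded
      status: "False"
      reason: TaskRunResolutionFailed   # hypothetical reason string
      message: 'taskRef "git-clone" could not be resolved: fetch timed out after 30s'
```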

6. Pipelines' existing support for cluster and registry resources is not easily
extensible without modifying Pipelines' source code.

### Goals
Member:

Should we have anything about caching or do you think that's a concern of the specific resolver?
Today we store the spec on the status, so that might provide a form of caching.

@ghost (Author):

Yeah, I like the idea that the implementation specifics (caching, pre-fetching, etc) should be left to the resolvers. From the perspective of Pipelines it shouldn't really care about those specifics but it should be concerned with slow or failing resolution. So, in Pipelines, I think this should be controlled via timeout: an operator decides on an SLO e.g. no TaskRuns or PipelineRuns should wait longer than 30s for a remote resource to resolve. They could then configure something like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-tekton-pipelines-resource-resolution
data:
  timeout: "30s"

Pipelines will observe this configuration and fail any TaskRun or PipelineRun where resolving a taskRef is taking longer than that.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 30, 2021
@ghost (Author) commented Mar 30, 2021

My principal reservation is that supporting the myriad resolvers is a hard problem. Plus, it creates bifurcation in the community and makes adoption of any one means harder. It also begs the question of what Tekton's role is here. Directly supporting what amounts to a slippery slope of potential endpoints to resolve against feels like Tekton solutioning rather than providing flexible primitives.

To that end, perhaps the one missing requirement and use case from the provided list is that the remote resolution process should no longer be embedded into Tekton. Instead, the goal is to offer Tekton operators, and by extension users, a pluggable framework for resolving tasks and pipelines with two reference implementations, local and Tekton Bundles.

I think I see what you mean - reading through the TEP one could reach the conclusion that Pipelines itself should bake support for all kinds of resolvers into its core codebase. Am I right that this is the impression the doc leaves?

FWIW, Pipeline's role (as I see it) is to loudly declare it has a taskRef which needs to be turned into YAML ASAP. Then it's the role of resolvers to actually do that job. Put another way: it's Pipeline's role to define the protocol via which resolution happens. Actually resolving to JSON or YAML is done by other components in the system who then know how to return that data via that protocol.

I think the concepts for this already live in Pipelines: the Resolver interface and ResolvedObject type, the separate resolver implementations. But that interface and those implementations are trapped in Pipeline's codebase. What I'd like to see is a similar interface manifested in operatorland.

@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 30, 2021
@ghost (Author) commented Mar 30, 2021

@pierretasci I've added an additional use case that describes an org creating their own Resolver. I've also added a Requirement:

- Provide a clear, well-documented interface between Pipelines' role as
  executor of Tekton resources and Resolvers' role fetching those Tekton
  resources before execution can begin.

That can probably be finessed but gets the idea across I think?

@bobcatfish (Contributor) left a comment:

I LOVE THIS

/approve

Still needs similar buy-in from @vdemeester and @pierretasci - and looks like @afrittoli is reviewing as well:

/hold

some of the constraints, goals to aim for during design, as well as some
possible extensions that could stem from a solution to these problems.

## Key Terms
Contributor:

nice, this is a useful section!!

### Goals

- Aspirationally, never require a user to run `kubectl apply -f git-clone.yaml`
ever again.
Contributor:

YAASSSSS 💯

(tho as i write this i wanted to give a shout out to tkn hub install which is still doing the kubectl apply but is a nicer experience :D )

Member:

I am not really as excited as you about this @bobcatfish 😝. kubectl apply (or whatever means to consume the REST API) is still a very valid use case for Tekton.


### Non-Goals

### Use Cases (optional)
Contributor:

niiiiice

@tekton-robot tekton-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 30, 2021
@pierretasci (Contributor) commented Mar 30, 2021

I think the concepts for this already live in Pipelines: the Resolver interface and ResolvedObject type, the separate resolver implementations. But that interface and those implementations are trapped in Pipeline's codebase. What I'd like to see is a similar interface manifested in operatorland.

👏 👏 👏 Yes please. That sounds amazing and a huge step forward for Tekton. Sign me up!

One thing that I wonder if we should add to this as well (we can also reserve this conversation until the implementation) is to rework the current resolution model. Right now, resolving a task or a pipeline is a runtime concern. This makes scheduling more flexible but means pipelines are less hermetic and repeatable. It would be great if there was at least the option to resolve references at PipelineRun apply time.

@pierretasci (Contributor)
/approve

@tekton-robot (Contributor)
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish, pierretasci

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2021
@tekton-robot tekton-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 12, 2021
@ghost (Author) commented Apr 25, 2021

  1. This may reduce share-ability of tekton Pipeline(s) as it might tie some resource to some resolver.

Hm, here I want to push back a bit because I suggest in the TEP that it would actually increase share-ability and provide several examples where that would be the case. It might be useful to know what we each mean by share-ability. The specific improvements that I describe here:

  1. one user in an org giving a pipeline to another user: much easier to send a copy of just the Pipeline rather than copies of all the supporting Tasks + Pipeline. The first is a single kubectl command or copy/pasted link to a file or gist. The second would require a scripted solution to fetch the Pipeline and all the referenced Tasks and then zip them up and send them.

  2. Zooming out one level I would also suggest this TEP makes it easier for a "devops" engineer or sysadmin to share common pipelines with their application teams, particularly if they don't already run their own image registry. They can publish the Tasks/Pipelines to a place they do have ownership over (like their git repo) and share from there.

  3. Zooming out again, Platforms and Tools would both have a standardized way to share this remote info:

    • This means your platform and my platform can both agree how to reference a task in git. Implementations can differ but the notation is shared.
    • It also means that when my CLI fetches a Pipeline that your CLI pushed I know exactly where to look for the referenced Tasks.

@ghost (Author) commented Apr 25, 2021

One downside of doing this for/in Tekton is that we are reinventing a possible wheel, but with a wheel that only works on Tekton, whereas an existing wheel can tackle more use-cases.

Well we do a bit of that with Tekton Bundles too, right? I'm proposing we add some 🔥 to the tires we've already got.

@ghost (Author) commented Apr 25, 2021

I am not really as excited as you about this @bobcatfish 😝. kubectl apply (or whatever means to consume the REST API) is still a very valid use case for Tekton.

We aren't eliminating support for kubectl apply with this TEP 🤔 ... but I would like to further dismantle the requirement for it.

Bundles goes a long way towards this, and so I think we should support it for other kinds of sources too.

@ghost (Author) commented Apr 25, 2021

@vdemeester I've added two alternatives to try and capture some of the pros / cons:

  • Added "Do not pursue this TEP" alternative which describes users building their own solutions.
  • Added "Pursue more opinionated approach (just implement git)" alternative which describes Pipelines just implementing git support, because it's so widely used, and calling it a day with that.

@ghost (Author) commented May 4, 2021

This allows anyone to setup resolvers with a direct hook into Tekton's state-machine against a well-formed and reliable "API" without requiring Tekton to take hard dependencies on any one "medium".

Yes, this is precisely it. I'm going to make this the summary of the TEP. @pierretasci any objections to including you as an author since I'm taking these words pretty much verbatim and have included other suggestions of yours already?

@pierretasci (Contributor) commented May 4, 2021

Yes, this is precisely it. I'm going to make this the summary of the TEP. @pierretasci any objections to including you as an author since I'm taking these words pretty much verbatim and have included other suggestions of yours already?

😄 no problem.

@ghost (Author) commented May 5, 2021

I've added another use-case: providing ClusterTasks and ClusterPipelines without having explicit CRDs for these types:

Replacing ClusterTasks and introducing ClusterPipelines: Tekton Pipelines
has an existing CRD called ClusterTask which is a cluster-scoped Task. The
idea is that these can be shared across many namespaces / tenants. If Pipelines
drops support for ClusterTask it could be replaced with a Resolver that
has access to a private namespace of shared Tasks. Any user could request
tasks from that Resolver and get access to those shared resources. Similarly
for the concept of ClusterPipeline which does not exist yet in Tekton:
a private namespace could be created full of pipelines. A Resolver can then
be given sole access to pipelines in this private namespace via RBAC and
users can leverage those shared pipelines.
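A rough sketch of what that could look like, assuming a hypothetical `cluster` resolver whose ServiceAccount is the only principal granted RBAC read access to the shared namespace (resolver name and params are illustrative, not from the TEP):

```yaml
# Illustrative only: resolver name, params, and RBAC setup are assumptions.
spec:
  taskRef:
    resolver: cluster          # resolver with read access to the shared namespace
    params:
      namespace: shared-tasks  # private namespace holding the shared Tasks
      name: git-clone
```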

@tekton-robot tekton-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 7, 2021
@ghost (Author) commented May 7, 2021

I've added another Goal:

- Offer a mechanism to verify remote tasks and pipelines against a digest
  (or similar mechanism) before they are processed as resources by Pipelines'
  reconcilers to ensure that the resource returned by a given Remote matches the
  resource expected by the user.

This would allow a Trigger, for example, to include a digest alongside a remote task reference. The taskrun reconciler could then validate that the digest matches that of the task returned by the Resolver.
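Sketching the idea (field names below are assumptions; the goal only asks for "a digest or similar mechanism"): a Trigger could template a reference like the one below, and the reconciler would refuse to run the resource if the resolved content's digest does not match the expected one.

```yaml
# Hypothetical shape: an expected digest supplied alongside the reference.
spec:
  taskRef:
    resolver: bundles
    params:
      bundle: gcr.io/example/catalog:v1
      name: git-clone
      # Expected content digest supplied by the Trigger; the reconciler
      # compares it against the digest of what the Resolver returns.
      digest: sha256:<expected-digest>
```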

@ghost (Author) commented May 7, 2021

We need alternatives: for each use case listed, how are users doing it today? For example, with argocd + pipeline you can have the git or Perforce config-as-code style relatively easily, … I really want us to explore the ways to do these today, to see if this is a problem we need/have to tackle, or just want to tackle 🙃.

@vdemeester I think this comes down to how much complexity we want to push back on to developers using Tekton's components. Taking account of all the moving parts required to build a non-hacky, reproducible, audited system for remote task resolution is actually quite a bit more complicated than throwing another tool at it I think! Here's a short laundry list of things that a developer needs to be concerned about when they're constructing a system to do this:

  • Intercepting the signal from their version control system (ideally using Triggers for this, but then how do they inject their out-of-band system to fetch the remote tasks as part of the Triggers->Pipelines flow?)
  • Using credentials to fetch the referenced remote tasks. These credentials might be per-user or team, per tenant namespace, per node, per cluster (globally shared) or any combination.
  • Verify the fetched remote task against information provided by the signal from the version control system. (e.g. digest comparison of remote tasks)
  • RBAC for creating TaskRuns (and securing this out-of-band system from folks creating arbitrary TaskRuns).
  • Maintaining audit trail of exactly which commit / repo / vcs system the remote task was pulled in from
  • Implementing constraints on the set of allowed hosts / repos / etc that remote tasks may be pulled from
  • Clear user-facing error messaging when resolution fails
  • Reporting back to the pull request / whatever about failures etc.

A lot of these come "for free" if we provide something directly in Pipelines. E.g. if the digest, bucket & path are part of a taskRef or pipelineRef then they can be recorded automatically by Results as it watches those runs. Pipelines already has permission to create TaskRuns so the RBAC already exists there. Pipelines already has error and status reporting so we have a precedent to follow there. And so on.

Beyond what we get "for free", a lot of these problems also become easier when the resolution is hooked into Pipelines' machinery. For example, I think that a Pipelines-as-code implementation would no longer need to manage credentials for fetching remote tasks from a repo - that can be handed to the Resolver layer.

There's obviously going to be a lot more nuance on top of this that I simply haven't figured out during initial discovery / proof-of-concept work. Essentially I think Pipelines should step in here and offer an in-band option with a shared protocol, working examples, specific RBAC rules for pipelines and RBAC recommendations for Resolvers, instant integration with Results for auditing where / what was the source of a remote task, etc etc.

Because the proposal here is broad, there are still many possible implementations available to us and they don't all involve major changes to Tekton Pipelines. For example I'm considering whether the best option here might simply be to define rules around specific annotations / apiVersions of Custom Task Runs. Pipelines could recognize specific shapes of Custom Task as "Remote Resolver Custom Tasks". The resolver could write the fetched Pipeline/Task into the Run's status field and Pipelines could pick it up from there, performing digest comparison and executing the PipelineRun / TaskRun. This would be "the protocol". The trade-off with this is that it would muddy Custom Tasks by overlaying a framework on top of a feature that otherwise is intentionally free-form.
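Under that Custom Task variant, the protocol might look roughly like this (every field and annotation name below is an assumption for illustration, not a settled design): the requester creates a Run recognized as a resolution request, and the resolver writes the fetched resource into the Run's status.

```yaml
# Hypothetical "Remote Resolver Custom Task" shape, for illustration only.
apiVersion: tekton.dev/v1alpha1
kind: Run
metadata:
  annotations:
    resolution.tekton.dev/type: git    # marks this Run as a resolution request
spec:
  ref:
    apiVersion: resolution.tekton.dev/v1alpha1
    kind: GitResolver
status:
  # Written by the resolver once the fetch completes; Pipelines would read
  # the resolved YAML from here, verify its digest, and execute it.
  resolved: |
    apiVersion: tekton.dev/v1beta1
    kind: Task
    ...
```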

Finally, I think it's fair to say that nothing about this proposal takes away the ability for users to go out and construct entirely bespoke systems for doing all of this stuff themselves.

@ghost changed the title from "TEP-0060: Remote Resource Resolution" to "[WIP] TEP-0060: Remote Resource Resolution" on May 9, 2021
@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 9, 2021
@vdemeester (Member) left a comment:

On share-ability.

I want to dig a bit more into the concern I may have around share-ability. I think, in the end, it boils down to how much we want to make pipeline a share-able construct or not 🙃. Let's assume that we have this resolver feature and the pipeline-in-pipeline custom task too (be it built-in, a shipped custom task, …), and in my cluster I have a foo resolver. Let's now assume I want to run a pipeline — a standard go pipeline: lint, build, test — on a different platform with pipeline-in-pipeline. Let's also assume this pipeline is available through the hub and the pipeline from the hub uses the OCI resolver. If, for some reason, the pipeline controller has the OCI resolver disabled, I cannot use the pipeline from the hub (I need to copy/replace/…). If I write a nice pipeline that I feel could be shared, but I am using my foo resolver, then I need to "rewrite" my pipeline to be able to share it with the world.

Now let's take a simple pipeline that does build, test, build image and push to somewhere, and let's assume this pipeline is using a git resolver. How does a user run their modified version of the Task and Pipeline in their local cluster to try their change out? Without any fail-over/precedence thingy (that would say: look at the cluster, and if not present, look at something else), the only way is to edit the pipeline definition to use another resolver (cluster, …).

Maybe it's a nit, and maybe it's for the better — just like CustomTasks are. With those resolvers and CustomTasks, what we are saying is that Tasks are a very share-able unit, Pipelines a little bit less; even so, this proposal "helps" some share-ability issues (a Pipeline referring to tasks in the catalog using OCI bundles makes things easier). Note that by share-able I mean inter-organization, in a more "community" approach, like the catalog. That's probably fine 😉.

On separation of concerns

I am not entirely sure this is the exact concept I have in mind when reading/commenting on this TEP, but it's close enough. With the exception of the OCI bundle (I am assuming the feature is not in), right now the pipeline controller has only one job: scheduling tasks from a pipeline. It has one source of truth to get the definition; it doesn't have any concern about where to get the definition, it's on someone else to provide the definition in such a way that the controller understands it. It doesn't have to be concerned with how those are stored, fetched, applied, updated, managed — it doesn't even have to be concerned with permissions (who can create them, …); that's orthogonal from its perspective.

To get back to the possible git resolver example, the main idea for this is to be able to run a given pipeline from the current commit the build runs on. Getting the right definition based on the commit sha, for example: is it pipeline's concern (it doesn't have any context/construct around a git commit; it is abstract to it) or trigger's concern?

One thing to note is that the perspective described above is from the tektoncd/pipeline "controller"'s sole point of view, and definitely not any user/persona point of view (dev, cluster-admin, …).

@vdemeester I think this comes down to how much complexity we want to push back on to developers using Tekton's components. Taking account of all the moving parts required to build a non-hacky, reproducible, audited system for remote task resolution is actually quite a bit more complicated than throwing another tool at it I think!
Here's a short laundry list of things that a developer needs to be concerned about when they're constructing a system to do this:

Agreed. But this also comes down to how opinionated Tekton should be, and how much of Tekton's opinion lives in tektoncd/pipeline. Tekton should probably provide a non-hacky, reproducible, audited system for remote task resolution that can be used straight as a CI/CD system, but which feature needs to live where? Not saying I am against having this resolution mechanism in pipeline; I am just trying to weigh how much of it needs to live in pipeline or not.

  • Intercepting the signal from their version control system (ideally using Triggers for this, but then how do they inject their out-of-band system to fetch the remote tasks as part of the Triggers->Pipelines flow?)
  • Using credentials to fetch the referenced remote tasks. These credentials might be per-user or team, per tenant namespace, per node, per cluster (globally shared) or any combination.
  • Verify the fetched remote task against information provided by the signal from the version control system. (e.g. digest comparison of remote tasks)
  • RBAC for creating TaskRuns (and securing this out-of-band system from folks creating arbitrary TaskRuns).
  • Maintaining audit trail of exactly which commit / repo / vcs system the remote task was pulled in from
  • Implementing constraints on the set of allowed hosts / repos / etc that remote tasks may be pulled from
  • Clear user-facing error messaging when resolution fails
  • Reporting back to the pull request / whatever about failures etc.

A lot of these come "for free" if we provide something directly in Pipelines. E.g. if the digest, bucket & path are part of a taskRef or pipelineRef then they can be recorded automatically by Results as it watches those runs. Pipelines already has permission to create TaskRuns so the RBAC already exists there. Pipelines already has error and status reporting so we have a precedent to follow there. And so on.

A lot of these come "for free" (for the user) if we provide a component that does that too 🙃. In both cases, the cost is on the developer of this component (be it in tektoncd/pipeline, in tektoncd/*, or outside).

Beyond what we get "for free", a lot of these problems also become easier when the resolution is hooked into Pipelines' machinery. For Example, I think that a Pipelines-as-code implementation would no longer need to manage credentials for fetching remote tasks from a repo - that can be handed to the Resolver layer.

There's obviously going to be a lot more nuance on top of this that I simply haven't figured out during initial discovery / proof-of-concept work. Essentially I think Pipelines should step in here and offer an in-band option with a shared protocol, working examples, specific RBAC rules for pipelines and RBAC recommendations for Resolvers, instant integration with Results for auditing where / what was the source of a remote task, etc etc.

As commented above, I would replace Pipelines with Tekton in that sentence, I am not sure if this is something that needs to be in pipeline or not.

Because the proposal here is broad, there are still many possible implementations available to us and they don't all involve major changes to Tekton Pipelines. For example I'm considering whether the best option here might simply be to define rules around specific annotations / apiVersions of Custom Task Runs. Pipelines could recognize specific shapes of Custom Task as "Remote Resolver Custom Tasks". The resolver could write the fetched Pipeline/Task into the Run's status field and Pipelines could pick it up from there, performing digest comparison and executing the PipelineRun / TaskRun. This would be "the protocol". The trade-off with this is that it would muddy Custom Tasks by overlaying a framework on top of a feature that otherwise is intentionally free-form.

Indeed 👼🏼. I like the idea of resolver, but I am trying to figure out how much of this needs to be in pipeline and how much can be done outside (in a component we would provide).

Finally, I think it's fair to say that nothing about this proposal takes away the ability for users to go out and construct entirely bespoke systems for doing all of this stuff themselves.

Indeed 😉 🙃. My concern is more about the complexity of any one given component.

One of the advantages of remote resource resolution for all the static resources (tasks, pipelines) is that it decouples Tekton a bit more from k8s, allowing users to apply their own versioning approach to resources, and leaving k8s to take care of the execution part only.

That's a fair point too 😉. It has two sides, though.

Open thoughts and questions

  • How does this relate to tektoncd/chains? (aka is it an additional challenge for the chains project, or something that doesn't really have any impact)
  • One benefit to this resolver approach is that you could bootstrap a cluster CI/CD system with just a Tekton instance and a PipelineRun that refers to, for example, a Pipeline from the main branch of a given repository. This pipeline would do its thing. It would also mean that, for example on dogfooding, for nightlies, we could almost have a simple cronjob that runs a pipeline from main; the problem is, what are the parameters to pass, the workspaces to attach, … and if they change, how do we ensure the cronjob is up to date?
  • These are recurring open questions for me: is tektoncd/pipeline a component or a CI/CD product? Does Tekton want to ship a CI/CD product? Answering those questions would help (at least me) to know whether this resolution needs to be at the Pipelines level or at another level.

Honestly, the only thing that is stressing me a bit still is that the TEP is still very "Tekton Pipelines" oriented, and I tend to like the approach of going from the opposite side: I have a need (e.g. TEP-0048, Pipelines as Code); how do we fix that in Tekton, digging deeper into which components are involved, etc. Top-down instead of bottom-up.

This proposal advocates for treating Task and Pipeline resolution as an
interface with several default sources out-of-the-box (those we already support
today: in-cluster and Tekton Bundles). This would allow anyone to setup
resolvers with a direct hook into Tekton's state-machine against a well-formed
Member

Is it "Tekton's state machine" here or "Pipelines' state machine"?

Comment on lines +162 to +165
**Platform interoperability**: A platform builder accepts workflows imported
from other Tekton-compliant platforms. When a Pipeline is imported that
includes a reference to a Task in a Remote that the platform doesn't already
have, it goes out and fetches that Task, importing it alongside the pipeline
without the user having to explicitly provide it.
Member

The other side of this coin is: what if one or all of the resolvers used are not enabled on the instance?

Contributor

This sounds like an implementation detail to me tangential to the proposal at hand...

...but I think there is a simple answer here so I'll give it a shot anyways. I don't think the object making the reference is or should be strictly tied to the resolving process. Nowhere in a pipeline ref, for example, should it say gitRepo: foo. Therefore, there isn't a single resolver for any given Task. Rather, I, as an operator of the Tekton componentry, will build/enable different resolvers based on my and my org's priorities and practices.

The Tekton componentry sees a list of resolvers that all follow the same API, and it fetches the reference from one of them, falling back to etcd. If the Task is nowhere to be found, you get the same error message you get today.

Member

...but I think there is a simple answer here so I'll give it a shot anyways. I don't think the object making the reference is or should be strictly tied to the resolving process. Nowhere in a pipeline ref, for example, should it say gitRepo: foo. Therefore, there isn't a single resolver for any given Task. Rather, I, as an operator of the Tekton componentry, will build/enable different resolvers based on my and my org's priorities and practices.

Hum right, maybe I assumed there would be something like git:// or gitRepo:. But yeah that might be an implementation detail indeed.

Comment on lines +143 to +145
**Sharing a git repo of tasks**: An organization decides to manage their
internal tasks and pipelines in a shared git repository. The CI/CD cluster is
set up to allow these resources to be referenced directly by TaskRuns and
PipelineRuns in the cluster.
Member

If we dig deeper on this one, how would this work? The PipelineRun has a commit sha as a parameter and uses that parameter to refer to a given Pipeline, right? But then, to be sure to refer to the correct Task from this Pipeline, we need the same commit sha parameter, don't we? (so that all tasks are referred to using this particular commit sha)

Contributor

Again, this also seems like an implementation detail but in general, maybe? It doesn't have to be a parameter; it could be part of the resolver URL, e.g. github.com/tektoncd/community@sha:foo-bar. I just made this up off the top of my head, but the salient bit is that it is on the developer of the resolver to specify the format of the input, and the contract with the end user is that they provide references in that format.

Tekton's role in all this is to provide the API machinery for sending that input to a resolver and expecting the resource back.
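To illustrate the "contract is defined by the resolver" point, here is a sketch of a parser for the made-up repo@sha:revision format from the comment above; the format itself and the `GitRef` type are purely hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// GitRef is a hypothetical parsed form of a resolver reference like
// "github.com/tektoncd/community@sha:foo-bar".
type GitRef struct {
	Repo     string // e.g. github.com/tektoncd/community
	Revision string // e.g. a commit sha; empty means the resolver's default
}

// ParseGitRef splits a reference on the invented "@sha:" separator.
// A real resolver would define and document its own input format.
func ParseGitRef(ref string) (GitRef, error) {
	repo, rev, found := strings.Cut(ref, "@sha:")
	if repo == "" {
		return GitRef{}, fmt.Errorf("empty repo in %q", ref)
	}
	if found && rev == "" {
		return GitRef{}, fmt.Errorf("empty revision in %q", ref)
	}
	return GitRef{Repo: repo, Revision: rev}, nil
}

func main() {
	ref, err := ParseGitRef("github.com/tektoncd/community@sha:foo-bar")
	fmt.Println(ref, err)
}
```

The point is not this particular syntax but that parsing lives entirely inside the resolver, keeping the Tekton machinery reference-format-agnostic.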

@pierretasci
Contributor

Quote replying isn't ideal but let's try it 😆

I think, in the end it boils down to how much we want to make pipeline a share-able construct or not
I'm concerned with share-ability, but I think the greater concern from my perspective is process. Organizations have preferences or mandates for certain trusted sources, be it git, OCI registries, or something proprietary, and if we want Tekton to be universally useful, I think it behoves us to provide the hooks to enable that. Otherwise, they will modify Tekton's inner componentry to accomplish the same thing (source: have done it 😉 )

If, for some reason, the pipeline controller has the OCI resolver disabled, I cannot use the pipeline from the hub (I need to copy/replace/…). If I write a nice pipeline that I feel could be shared, but I am using my foo resolver, then I need to "rewrite" my pipeline to be able to share it with the world.

You already have to modify it today. E.g., I don't trust Docker Hub and mirror the catalog into my own repositories, and thus I need to rewrite the catalog. To me, this is less a problem with having interchangeable resolvers and more of a desire for better machinery to use the catalog (perhaps the catalog has a way to "generate" itself with different reference formats).

Now let's take a simple pipeline that does build, test, build image, and push to somewhere, and let's assume this pipeline is using a git resolver. How does a user run their modified version of the Task and Pipeline in their local cluster to try their change out? Without any fail-over/precedence thingy (that would say: look at the cluster and, if not present, look at something else), the only way is to edit the pipeline definition to use another resolver (cluster, …).

That's an implementation detail and this proposal is still in the proposing state. There are many ways to do fail-over/precedence that could be discussed at implementation time (and we should definitely do fail-over && precedence ordering).

I am not entirely sure this is the exact concept I have in mind when reading/commenting on this TEP, but it's close enough. With the exception of the OCI bundle (I am assuming the feature is not in), right now the pipeline controller has only one job: scheduling tasks from a pipeline. It has one source of truth for the definition; it has no concern about where the definition comes from, and it's on someone else to provide the definition in such a way that the controller understands it. It doesn't have to be concerned with how those definitions are stored, fetched, applied, updated, or managed; it doesn't even have to be concerned with permissions (who can create them, …), which are orthogonal from its perspective.

And with this proposal, I think that status quo is mostly maintained. Resolving is handled by the resolver and the machinery should be able to assume it is going to do the right thing or throw an error if it doesn't.

Agreed. But this also comes down to how opinionated Tekton should be and how much of Tekton's opinion is in tektoncd/pipeline. Tekton should probably provide a non-hacky, reproducible, audited system for remote task resolution that can be used as a straight-in CI/CD system, but which features need to live where? Not saying I am against having this resolution mechanism in Pipelines, I am just trying to weigh how much of it needs to live in Pipelines or not.

Agreed that upon hearing the idea of bringing this out of Pipelines and into a separate component that can be used more generically across Tekton (elsewhere?), I thought it was an excellent idea. I really want to hone in on "Tekton should probably provide a non-hacky, reproducible, audited system for remote task resolution that can be used as a straight-in CI/CD system" and challenge it for a second just to play devil's advocate. What if we don't? Like obviously there is a built-in way to fetch Tasks and stuff, but do we really need the "blessed" path for remote resolution? No two places are going to agree on what is actually blessed, and my version of "audited" ≠ your version of "audited". Let's take the middle road and say: fine, if we can't agree on what the "right" way to do it is, here's the API, do it yourself 😄

Honestly, the only thing that is stressing me a bit still is that the TEP is still very "Tekton Pipelines" oriented, and I tend to like the approach of going from the opposite side: I have a need (e.g. TEP-0048, Pipelines as Code); how do we fix that in Tekton, digging deeper into which components are involved, etc. Top-down instead of bottom-up.

Well I think the reason for that is the most common surface area for "resolving" things in the Tekton ecosystem is to resolve Tasks and Pipelines. We can bolt on other use cases like Triggers but if that isn't born out of concrete use cases that users experience then we are designing for air. Separately, I think making this resolver thing more general and not strictly tied to pipelines is good but we also should focus on the core use cases driving this proposal in the first place.

@vdemeester
Member

You already have to modify it today. E.g., I don't trust Docker Hub and mirror the catalog into my own repositories, and thus I need to rewrite the catalog. To me, this is less a problem with having interchangeable resolvers and more of a desire for better machinery to use the catalog (perhaps the catalog has a way to "generate" itself with different reference formats).

Very fair point 😉.

That's an implementation detail and this proposal is still in the proposing state. There are many ways to do fail-over/precedence that could be discussed at implementation time (and we should definitely do fail-over && precedence ordering).

Right. I think initially I thought (or read?) that it was out of scope (like in this comment). But yeah, I can hear that it's an implementation detail.

[…] I really want to hone in on "Tekton should probably provide a non-hacky, reproducible, audited system for remote task resolution that can be used as a straight-in CI/CD system" and challenge it for a second just to play devil's advocate. What if we don't? Like obviously there is a built-in way to fetch Tasks and stuff, but do we really need the "blessed" path for remote resolution? No two places are going to agree on what is actually blessed, and my version of "audited" ≠ your version of "audited". Let's take the middle road and say: fine, if we can't agree on what the "right" way to do it is, here's the API, do it yourself

Right, this is also kind of what I am trying to do. As of today, this problem is handled differently, by different tools and people. That said, I tend to think taking the middle road could be a good path indeed; I just want to weigh the other paths a bit 😛.

Well I think the reason for that is the most common surface area for "resolving" things in the Tekton ecosystem is to resolve Tasks and Pipelines. We can bolt on other use cases like Triggers but if that isn't born out of concrete use cases that users experience then we are designing for air. Separately, I think making this resolver thing more general and not strictly tied to pipelines is good but we also should focus on the core use cases driving this proposal in the first place.

Gotcha 👍🏼

@ghost
Author

ghost commented May 17, 2021

There are many ways to do fail-over/precedence that could be discussed at implementation time (and we should definitely do fail-over && precedence ordering).

Right. I think initially I thought (or read ?) that it was out of scope (like in this comment). But yeah I can hear that it's an implementation detail.

Yeah you're right - I commented earlier that fail-over wasn't in scope for this TEP, but I now agree it's more likely something we need to revisit during the implementation discussion. I'll add this in an "Open Questions" section of this doc.

@ghost
Author

ghost commented May 17, 2021

Added:

+## Open Questions
+
+- How will the implementation of this feature support use-cases where a pipeline is
+  fetched at a specific commit and references tasks that should also be pulled from
+  that specific commit?

@afrittoli
Member

On re-usability - I think the problem lies in the fact that our taskRef today includes the task name, version, and source of the task, all defined at authoring time. The task name may belong to the authoring domain, and the version is debatable, but if we want reusable pipelines, the source should belong to the runtime, or at least Tekton should provide a built-in mechanism for a *Run to override the source. For CI systems it's important to be able to override the version too, to make it possible to test changes to pipelines and tasks themselves.

I started working on a TEP that introduces the idea of overriding the source in the *Run objects, keeping the original one as a default/fallback. That would very much help in the implementation of Tekton based CI system, and it would also help with the TEP about the catalog organisation for pipelines, since we could then store pipelines with a reference that we pick, knowing that users may override it if they need to, with no need to template or hack into the pipeline YAML.
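The override-with-fallback idea could be sketched as follows; the `Source` type and field names are assumptions for illustration, not anything from the TEP being drafted.

```go
package main

import "fmt"

// Source is a hypothetical "where to fetch this task/pipeline from" record.
type Source struct {
	URL      string
	Revision string
}

// EffectiveSource picks the runtime override when one is set on the *Run,
// otherwise falls back to the source authored in the Pipeline. This is what
// would let a CI system test a changed task with no edits to the pipeline YAML.
func EffectiveSource(authored Source, runOverride *Source) Source {
	if runOverride != nil {
		return *runOverride
	}
	return authored
}

func main() {
	authored := Source{URL: "https://example.com/org/tasks", Revision: "main"}
	override := &Source{URL: "https://example.com/fork/tasks", Revision: "my-branch"}
	fmt.Println(EffectiveSource(authored, nil).Revision)
	fmt.Println(EffectiveSource(authored, override).Revision)
}
```

The authored source acting as the default/fallback is exactly the behaviour described in the comment above.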

@ghost ghost changed the title [WIP] TEP-0060: Remote Resource Resolution TEP-0060: Remote Resource Resolution May 17, 2021
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 17, 2021
@ghost
Author

ghost commented May 17, 2021

I started working on a TEP that introduces the idea of overriding the source in the *Run objects, keeping the original one as a default/fallback.

I agree with this 💯%

In my view "Remote Resource Resolution" should remain focused on resolving remote resources, not defining how Pipelines (or Tekton generally) supports failover / precedence / etc. I much prefer thinking of RRR as "one tool in the toolbox", not a holistic solution to an entire domain of problems.

@ghost
Author

ghost commented May 17, 2021

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 17, 2021
This commit adds a new TEP with a problem statement describing a gap
in Tekton's support for referencing Tasks and Pipelines. Namely, that
those resources have to be in the cluster or in an OCI registry.
@pierretasci
Contributor

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 17, 2021
@tekton-robot tekton-robot merged commit 72083cc into tektoncd:main May 17, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Proposed
5 participants