
TEP-0077: Partial pipeline execute. #484

Closed
wants to merge 3 commits into from

Conversation

ScrapCodes
Contributor

@ScrapCodes ScrapCodes commented Jul 21, 2021

Summary

Add the ability for a PipelineRun to have disabled tasks, i.e. a PipelineRun can execute a Pipeline partially.

Allow a PipelineRun to be created from a previous PipelineRun.

So a PipelineRun can be partially run or cancelled at runtime, and resumed at a later point with the help of the work proposed in this TEP.

Together, these bring in the ability to resume/retry a failed PipelineRun.

/cc. @jerop @bobcatfish @Tomcli

@tekton-robot tekton-robot requested review from dibyom and khrm July 21, 2021 11:34
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 21, 2021
@ScrapCodes
Contributor Author

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jul 21, 2021
@jerop
Member

jerop commented Jul 21, 2021

/assign

@ScrapCodes ScrapCodes changed the title TEP-0077: Partial pipeline execute. WIP TEP-0077: Partial pipeline execute. Jul 22, 2021
@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 22, 2021
@ScrapCodes ScrapCodes force-pushed the TEP-77 branch 3 times, most recently from f298988 to 559f6fe Compare July 22, 2021 13:12
@ScrapCodes
Contributor Author

@bobcatfish, @jerop and @Tomcli, may we use this PR for brainstorming ideas?

@vdemeester
Member

/assign

@bobcatfish
Contributor

Exciting!! Thanks for getting this going @ScrapCodes !! :D

/assign

@tekton-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign bobcatfish
You can assign the PR to them by writing /assign @bobcatfish in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ScrapCodes ScrapCodes force-pushed the TEP-77 branch 2 times, most recently from afcbfa2 to 74caa44 Compare July 23, 2021 12:02
@ScrapCodes ScrapCodes changed the title WIP TEP-0077: Partial pipeline execute. TEP-0077: Partial pipeline execute. Jul 23, 2021
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 23, 2021
@bobcatfish
Contributor

Note I'm going to be out for 2 weeks so in the meantime I'm happy with whatever @jerop decides in my place 😇

@ScrapCodes ScrapCodes force-pushed the TEP-77 branch 3 times, most recently from 8759599 to a5d37d7 Compare July 26, 2021 09:44
Member

@jerop jerop left a comment


thanks for picking this up @ScrapCodes, excited to see this moving forward! 😁

it's great that there's already a proposal here, but it'd be even better if it came with its alternatives so that we can weigh different options before committing to this proposal as the way forward

if it's too much to add alternatives as well in this PR, maybe this PR can focus on the problem statement only, then the next one can have the proposal and its alternatives


### Notes/Caveats (optional)

Q. Can we provide an option to disable a task but not all the tasks that depend on it?
Member

what would be the use case for this?

Contributor Author

@ScrapCodes ScrapCodes Jul 27, 2021


In the context of resuming a failed pipeline run:

  1. In the proposal we are saying: we will retry failed tasks, and if they have a dependency on another task's results, we will reference the results from the previous run. A user can override the results from the previous run by redeclaring them in this section.
  2. Suppose a failed task is permanently failing, and one would like to retry not that task itself but its dependents; then the user can hard-code the results and execute the dependents.

My preference is to not include this in the current scope of this TEP; a future TEP can address this.

EDIT: added a ## Future Work section.

and resume at a later point.

## Requirements
- Create a new `PipelineRun` to resume or retry a completed `PipelineRun`.
Member

we also have a requirement for the partial execution of a pipeline, where there's no previous pipelinerun

(represented in use case 2 above)

Contributor Author

Use case 2 and requirement 2 are of less importance for us, but we are (more than) happy to pursue them. In case you think we should drop them from this proposal, I can move the disableTasks field under the pipelineRunRef field.

Member

@vdemeester vdemeester left a comment


Even if we figure out from the DAG what to run, how does the controller know that what it runs from (the first failure) is going to work? What if the first parts of the run are not idempotent and thus won't produce the same results?

I am wondering how many use cases / users this affects. If this covers about 5% of use cases for Pipeline, then we might complexify Pipeline (syntax, code, …) for very little gain. As commented, I'd rather explore a higher-level construct such as Tekton Workflow than add this support in Pipeline itself — a simple and robust core on top of which we can build powerful abstractions. And this use case is, for me, a perfect example of something I'd rather have on top of pipeline, not in pipeline.

  1. Optimal use of resources: tektoncd as a backend for ML.
    A machine learning pipeline may consist of tasks moving large amounts of
    data and then training ML models; all of this can be very resource-consuming,
    and the inability to retry would require a user to start the entire pipeline
    over. A manual retry, with the ability to specify which tasks should
    be skipped, may be helpful.

This also goes down to tektoncd's use as a backend for ML. Is it the target, or is it a side effect? Should we bend pipeline's core to adapt to ML, or should we build on top of it to adapt? For example, nothing prevents us from creating a specific Tekton ML Pipeline that would be optimized for that type of stuff and would rely on Task/TaskRun only from the core pipeline. That would enable those kinds of use cases without complexifying the core 👼🏼

pipelines at Kubeflow.
2. It is not enough to `retry` a `PipelineTask` n times, as the failures can
be due to e.g. a service outage. A manual resume/retry may be helpful.
3. Iterate quickly, by disabling tasks that take longer time. This can be done
Member

This point is at "authoring" time; the other two are at "runtime".

Comment on lines +84 to +85
3. Partial execution is also helpful for testing, i.e. skipping some tasks
and developing and testing iteratively and quickly.
Member

When we talk about testing here, it's "testing tektoncd/pipeline, testing its features, or at least testing a pipeline as we write it", right?

Contributor Author

Yes, testing a pipeline while we develop it.

3. Partial execution is also helpful for testing, i.e. skipping some tasks
and developing and testing iteratively and quickly.
4. Pause and resume, i.e. one could manually cancel a running `PipelineRun`
and resume it at a later point.
Member

I am not entirely sure I see the use case 😅

Comment on lines +120 to +122
- `pipelineRunRef` : pipelineRunRef references a previous pipelineRun and by default
selects all the failed and unfinished tasks eligible for retrying/resuming.
It references results of completed tasks from previous run.
Member

There is more than referenced results to take into account. What about the data that was provided (workspaces, …)? What about required "one-shot" tasks that the pipeline needs in order to run (like getting a one-time credential that would need to be re-issued in case of a new run, …)?

Member

(it's handled partly by the next -)

Member

For workspaces we could take two approaches:

  • Try to maximise re-run success: if a task mounts a volume in write mode, and a task that depends on it (directly or indirectly) needs to be executed, that writing task needs to be re-run too
  • Try to minimise resource utilisation: only re-run failed tasks and tasks that were not executed because of interruption of the pipeline. They might fail again because of missing data on the workspace

Other resources might still be missing, like a test k8s cluster or any external resource created by initial tasks.
Having init tasks might be one way to solve this. I think a proper solution would be to let tasks specify input/output resources in some format (similar to the PipelineResources we had), so that tasks may declare what they provision and what provisioned resources they rely on.
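The two workspace strategies above could be sketched as a small DAG computation. This is a hypothetical helper, not Tekton code: `dag` maps each task to its dependencies, `statuses` maps tasks to their previous-run status, and `writers` is the set of tasks assumed to mount a workspace in write mode.

```python
def ancestors(dag, task):
    """All transitive dependencies of `task` in `dag` (task -> set of deps)."""
    seen, stack = set(), list(dag.get(task, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(dag.get(dep, ()))
    return seen

def rerun_set(dag, statuses, writers, maximise=True):
    """Tasks to re-run: failed/unfinished ones, plus (in 'maximise' mode)
    any workspace-writing ancestor of a task that will re-run."""
    to_run = {t for t, s in statuses.items() if s != "Succeeded"}
    if maximise:
        changed = True
        while changed:
            changed = False
            for t in list(to_run):
                for dep in ancestors(dag, t):
                    if dep in writers and dep not in to_run:
                        to_run.add(dep)
                        changed = True
    return to_run
```

For example, with a chain `fetch -> build -> test` where only `test` failed and `fetch` writes to the workspace, the maximise strategy re-runs `{test, fetch}` while the minimise strategy re-runs only `{test}`.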

Comment on lines +124 to +127
- `pipelineRunRef.enableTasks`: If a task was successful in previous run, but
it is required by the current run, this section can be used to explicitly
enable it. For example, a task may perform some initialization for the
other tasks in `PipelineRun`.
Member

This would be on the user to set, right?

Contributor Author

Yes! It is not possible to auto-detect it, AFAIK.

Member

It might be possible in the future if we add more metadata to our pipelines and workspaces.
We could let users specify that certain tasks are initialisation ones and need to be executed on re-run.

should be disabled in the `disableTasks` section.


## Alternatives
Member

There are additional alternatives; one would be that this is handled by a higher-level construct/abstraction, such as Tekton Workflows for example.

## References (optional)

1. [Google Doc: Disabling a Task in a Pipeline](https://docs.google.com/document/d/1rleshixafJy4n1CwFlfbAJuZjBL1PQSm3b0Q9s1B_T8/edit#heading=h.jz9jia3av6h1)
2. [TEP-0065](https://github.com/tektoncd/community/pull/422)
Contributor

drive by request: it'd be great to include a link to tektoncd/pipeline#50 as well - we're addressing our oldest open issue in pipelines!! :D

@bobcatfish
Contributor

I am wondering how many use cases / users this affects.

@vdemeester are you wondering this about both of the sets of functionality being described here? Like jerop mentioned, the TEP currently seems to be covering 2 things:

  1. Partial execution
  2. Retrying a PipelineRun from another PipelineRun <-- which i'm hoping we can cover in TEP-0065 and remove from this TEP (this TEP would provide functionality that could be leveraged if we pursue TEP-0065 or pursue similar functionality via a Workflow)

I wanted to check if your concerns are primarily about (2) or if they cover both?

@vdemeester
Member

vdemeester commented Aug 18, 2021

I am wondering how many use cases / users this affects.

@vdemeester are you wondering this about both of the sets of functionality being described here? Like jerop mentioned, the TEP currently seems to be covering 2 things:

  1. Partial execution
  2. Retrying a PipelineRun from another PipelineRun <-- which i'm hoping we can cover in TEP-0065 and remove from this TEP (this TEP would provide functionality that could be leveraged if we pursue TEP-0065 or pursue similar functionality via a Workflow)

I wanted to check if your concerns are primarily about (2) or if they cover both?

My concern is about both, and about whether the complexity it adds to the core (tektoncd/pipeline) is justified, or if this is a perfect example of where a higher-level abstraction such as the experimental workflow project (or something else) would make more sense 🙃.

Let's assume we have 100,000 users; about 5 percent might be interested in this feature, so about 5,000 users. And of those 5,000 users, in about 99% of cases a smaller pipeline and an opinionated event-driven setup would fulfil their needs. We would add quite some complexity to the code for a very small fraction of our users (~50 out of 100,000?) — which means more potential bugs, more confusing cases, more issues, …

I would make a parallel with Pod and Deployment. A Pod doesn't know how to scale, for example; that is handled by a higher abstraction (Deployment). I am suggesting the same with Pipeline/PipelineRun/Task/TaskRun and a yet-to-be-defined type.

@tekton-robot
Contributor

@ScrapCodes: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 24, 2021
@ScrapCodes
Contributor Author

ScrapCodes commented Aug 24, 2021

Somehow I still feel lower-level building blocks are needed. Otherwise, even from the higher abstraction, the only option is to create a new PipelineRun with the desired tasks through automation. I feel having disabled tasks here is simpler than doing complex stuff in higher abstractions.

@bobcatfish
Contributor

@ScrapCodes are you okay with the approach where this particular TEP addresses just adding disabledTasks like @jerop suggested (#484 (review)) and pipelineRunRef is addressed in a different TEP?

@ScrapCodes
Contributor Author

@ScrapCodes are you okay with the approach where this particular TEP addresses just adding disabledTasks like @jerop suggested (#484 (review)) and pipelineRunRef is addressed in a different TEP?

@bobcatfish I agree!

@vdemeester
Member

Somehow I still feel lower-level building blocks are needed. Otherwise, even from the higher abstraction, the only option is to create a new PipelineRun with the desired tasks through automation. I feel having disabled tasks here is simpler than doing complex stuff in higher abstractions.

Indeed, from a higher abstraction, the options are to create a new PipelineRun with the desired tasks. I am not sure I see a huge problem (yet) there though. We create a new Pod when we retry; when a Pod attached to a Deployment fails, a new Pod is created, etc. A higher-level abstraction would/could have total control over the PipelineRun created (from whatever format it adopts and gives to its users), and thus could create anything really.
As of today, disabling a Task is a matter of a param (or set of params) and a when condition, right? For a user, that would be super verbose to do, but for a thin layer on top of Pipeline that wouldn't be such a high bar.

I think my main point here is : we should explore those higher level abstraction to solve this problem, and see what are the shortcomings.
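The param-plus-when workaround mentioned above could look roughly like this. This is a sketch using today's when-expression syntax, not anything proposed in this TEP; the pipeline, task, and param names are hypothetical:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: example-pipeline        # hypothetical
spec:
  params:
    - name: skip-build          # hypothetical param a caller sets to skip a task
      type: string
      default: "false"
  tasks:
    - name: build
      taskRef:
        name: build-task        # hypothetical
      when:
        - input: "$(params.skip-build)"
          operator: notin
          values: ["true"]
```

As noted, this pre-bakes the skip logic into the Pipeline definition, which is exactly the verbosity trade-off under discussion.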

@ScrapCodes
Contributor Author

ScrapCodes commented Aug 25, 2021

Thanks for the patience and great progress on this issue @ScrapCodes!

second change only

The requested API changes are:

1. Add `pipelineRunRef` under `PipelineRun.spec`. It has following fields: 
   - `pipelineRunRef.name` which is the name of previously run `PipelineRun`.
   - `pipelineRunRef.enableTasks` accepts an array of task names under it.
2. Add `disableTasks` under `PipelineRun.spec`, which accepts an array
    of task names.
   - `name`: Name of the task to be disabled.
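Put together, a PipelineRun using both requested fields might look roughly like this. All names are hypothetical, and the exact field shapes were still under discussion in this TEP:

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: retry-run                  # hypothetical
spec:
  pipelineRef:
    name: example-pipeline         # hypothetical
  pipelineRunRef:                  # proposed: reference a previous PipelineRun
    name: previous-run             # hypothetical name of the earlier PipelineRun
    enableTasks:                   # proposed: re-run these even though they succeeded
      - name: init-credentials     # hypothetical
  disableTasks:                    # proposed: skip these tasks
    - name: expensive-training     # hypothetical
```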

What if we add the second change only initially, because it'd handle the first three use cases? The "disabled" tasks with default results would enable:

  • UC1: manual retry with specified tasks to be skipped
  • UC2: reusing pipeline by partial execution of the pipeline
  • UC3: testing a pipeline by partial execution of the pipeline

For UC4 (resuming a cancelled pipelinerun) - I'd imagine that a user can also use the default results to kick off a new run that continues execution from the stopping point, even though it's not really a continuation of the original pipelinerun - another alternative for this use case is to provide explicit pause and resume functionality

My concerns about the first change - automagic selection of failed and unfinished tasks - are:

  • does it include skipped tasks?
  • does failed include cancelled tasks?
  • what if a user wants to configure what's automatically selected?
  • it seems complicated to have 3 levers of what is executed (automatic selection, enabled tasks, disabled tasks)

So I propose that we consider solving for most of the use cases with only the second change. This leaves the first change as an option we can explore afterwards when we gather more feedback. What do you think?

disabled --> skipped

What does disabled tasks mean here? is it not executed? is that the same as skipped? I'm wondering if we could use more precise language here instead of disabled, maybe skipped is appropriate?

dependencies of "disabled" tasks

The current proposal is that we don't execute all the dependencies of "disabled" tasks. This makes sense initially, but given experiences with skipping strategies, I suggest that we anticipate what would happen if users want to execute the dependent tasks in the future (ordering dependencies and resource dependencies with defaults) - how would we handle this?

/cc the other assignees @bobcatfish @vdemeester

Thank you @jerop, @bobcatfish and @vdemeester for kindly taking a look. Sorry for the long disappearance, I was on a long vacation.
My original intention was to have something similar to select and filter; otherwise, if we only have one of them, some use cases are left out. I am ok with splitting this work into two, i.e. disabled tasks and then pipelineRunRef, as separate TEPs evaluated separately.

I am in favour of keeping the pipeline run as the basic building block. However, with the current state of the features, one needs to create a completely new pipeline definition and PipelineRun definition, with the tasks copied either through some automation (the case of building a higher abstraction) or manually (until we have a higher abstraction). Having support for disabledTasks can simplify things greatly even if we go the route of building a higher abstraction later on.

With when expressions, a pipeline needs to be pre-designed with a view that its tasks may be disabled in the future (downside: extra verbosity).

Also provide an option for hard-coding the results.
My preference is that this can be work for a future TEP.

### Risks and Mitigations
Member

This functionality would have an impact on how we generate attestations.
If an artefact is produced by a pipeline that was partially executed, or resumed, how do we track that?
One simple initial answer could be that we don't generate attestations for partially executed pipelines.
Alternatively, we could capture the spec of what is actually being executed and what its inputs were, including partial results.

Contributor

I think if the artifact that the attestation was created for was produced, it still makes sense to create the attestation even if the pipeline didn't complete or was partially executed. Capturing the spec (or some reference to the spec) sounds like a nice compromise to me.

(could this happen within a retried taskrun as well?)

Comment on lines +109 to +113
`disableTasks` can be used to explicitly disable tasks that a user
does not wish to run. On the other hand, with `pipelineRunRef` the Tekton controller
automatically figures out which tasks failed or are unfinished, because it knows the
DAG. For the end user, it can be difficult to figure out the DAG and prepare
an accurate execution plan for the next pipeline run.
Member

I think the comments here show that the various use cases, running a partial pipeline, resuming a cancelled pipeline and retrying a failed pipeline come with different priorities and requirements.

The initial TEP started by @ScrapCodes was about retrying a failed taskrun/pipelinerun.
We saw common aspects with partial execution, which led to this TEP, but perhaps we should first do some work to spell out the common aspects between the various use cases / features.

In my mind I see re-running a failed / cancelled pipeline - without redoing all the work - as the main use case. For that to be useful it should be up to the controller to decide what to run, not to the user.

The partial execution use case has different needs. The user wants to skip an expensive task in the pipeline for testing purposes or because a system it depends on is temporarily not available. In this case it's up to the user to say which tasks to skip, and it's up to the controller to say if the request can be served and what are the consequences - i.e. other tasks might be skipped because of the missing ones.


@bobcatfish
Contributor

The partial execution use case has different needs.

@afrittoli i see partial execution as the building blocks needed to implement the use cases that this discussion was initially started around. It's also quite possible that we decide that an API for pipeline retries might belong somewhere else, e.g. Workflows - by implementing partial execution, we have flexibility around whether we then add the PipelineRun retry capability at the PipelineRun level or at the Workflow level.

I also suggest we discuss this a bit so that we can be sending @ScrapCodes in a consistent direction b/c I think the feedback he's gotten so far has been to address these problems separately, addressing partial execution first, and it sounds like you don't agree. I'll add this onto the API WG agenda for next week as well.

@afrittoli
Member

The partial execution use case has different needs.

@afrittoli i see partial execution as the building blocks needed to implement the use cases that this discussion was initially started around. It's also quite possible that we decide that an API for pipeline retries might belong somewhere else, e.g. Workflows - by implementing partial execution, we have flexibility around whether we then add the PipelineRun retry capability at the PipelineRun level or at the Workflow level.

I also suggest we discuss this a bit so that we can be sending @ScrapCodes in a consistent direction b/c I think the feedback he's gotten so far has been to address these problems separately, addressing partial execution first, and it sounds like you don't agree. I'll add this onto the API WG agenda for next week as well.

Thanks @bobcatfish - I agree that having more discussion on this would be useful. I'm not saying I don't agree with solving the problems separately, but I'd like to understand what is the surface of overlap between the different features. From the API point of view I think they are quite independent, but there may be controller side features shared between the two.

@bobcatfish
Contributor

I'd like to understand what is the surface of overlap between the different features.

I can try to explain a bit around the overlap that I see. From the TEP:

disableTasks can be used to explicitly disable tasks that a user do not wish to run. On the other hand, in pipelineRunRef tekton controller automatically figures out the tasks failed and unfinished, because it knows the DAG. For the end user, it can be difficult to figure out the DAG and prepare the accurate execution plan for the next pipeline run.

From what I can tell, pipelineRunRef is syntactic sugar for using disableTasks to disable the Tasks that have succeeded in the previous PipelineRun.

When using disableTasks, a user can specify Tasks that they don't want to execute; when using pipelineRunRef the controller decides which Tasks not to execute.

If we were to implement only disableTasks, this would make it possible for some other tool to implement the pipelineRunRef functionality if desired - today this is not possible. By implementing disableTasks first, we can unlock multiple ways of approaching pipelineRunRef.
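That "syntactic sugar" relationship could be sketched as a tiny external helper (hypothetical, not part of the Tekton API) that derives a `disableTasks` list from a previous run's task statuses:

```python
def disable_tasks_from_previous_run(task_statuses):
    """Derive a `disableTasks` list for a new PipelineRun from a previous
    run's task statuses (pipeline task name -> status string): disable
    exactly the tasks that already succeeded, so only the rest re-run."""
    return sorted(t for t, s in task_statuses.items() if s == "Succeeded")
```

Any tool layered on top of Pipelines (a CLI, a Workflow controller, …) could apply this policy, or a different one, without the Pipelines controller knowing about it.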

Additionally, in order to implement disableTasks we need to talk through several scenarios, e.g. what if you disable a Task that provides a result used by a Task that isn't disabled? What if that Task is expected to write something to a workspace? By limiting this TEP to just disableTasks we can fully explore these cases, laying the groundwork for how we can then add pipelineRunRef functionality - whether we implement it in the Pipelines controller, or (as @vdemeester feels strongly) in some other layer (e.g. Workflows, in the CLI, etc.)

Does that help at all @afrittoli ?

@vdemeester
Member

I am still not sold on the value of disableTasks in Pipeline or PipelineRun. It brings a lot of complexity for, I think (maybe I am wrong), a relatively "corner case" use case.

@bobcatfish
Contributor

quick update from yesterday's API WG: added @vdemeester's concerns to the agenda but we didn't get to them, so they will be first on the agenda to discuss in the next meeting on Nov 1

@ScrapCodes
Contributor Author

ScrapCodes commented Oct 27, 2021

Some points I would like to consider:

  1. If we do not have disabledTasks, and we choose to have a higher abstraction, e.g. a workflow, it will have to go through each task and copy it, rebuilding the DAG and resolving all the nifty when conditions of a partially executed/failed pipeline. The higher abstraction has to be aware of the internal logic Tekton uses to discern which tasks to copy and how exactly that has to be done, e.g. whether to scratch the workspace/results or not (based on what?), which task depends on what, and whether a task can be skipped if it is configured with default results, etc.

  2. The internal logic of Tekton is subject to change in subtle and gross ways across each release, so the external subsystem has to evolve along with it. And maybe have a compatibility matrix?

@tekton-robot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2022
@bobcatfish
Contributor

Apologies @ScrapCodes it looks like this TEP got a bit stuck after the nov 8 API working group meeting

iirc @vdemeester is pushing back on moving forward with this - where do you stand regarding this @ScrapCodes ?

@ScrapCodes
Contributor Author

@bobcatfish I am not currently actively pursuing it.

@bobcatfish
Contributor

kk thanks for the update @ScrapCodes , sounds like we could close this TEP for now then and we can re-open and keep discussing later if the need comes up again.

/close

@tekton-robot
Contributor

@bobcatfish: Closed this PR.

In response to this:

kk thanks for the update @ScrapCodes , sounds like we could close this TEP for now then and we can re-open and keep discussing later if the need comes up again.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ScrapCodes
Contributor Author

Thank you @bobcatfish

@jwx0925

jwx0925 commented Aug 18, 2023

Is there any further discussion on this topic?

This feature is very much needed in our scenario: there are many serial tasks in a pipeline. When the last task fails, automatic retry cannot recover, so the failed task needs to be retried manually.
The alternative is to rerun the pipeline, but that reruns all the previous tasks and wastes a lot of time.
