Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): fix lifecycle issue when cancelling manual executions #10789

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

despairblue
Copy link
Contributor

@despairblue despairblue commented Sep 12, 2024

Summary

The fix in #9992 introduced a lifecycle issue with executions where an execution would be removed multiple times from the ActiveExecutions.

Normally the flow would be like this:

sequenceDiagram
  Actor Customer as User
  participant AE as Active Executions
  participant WR as Workflow Runner
  participant WE as Workflow Execute
  participant Node

  Customer ->>+ WR: run workflow
  WR ->>+ AE: register execution
  activate AE
  WR ->>+ WE: execute worfklow
  WE ->>+ Node: run node
  Node ->>- WE: done
  WE ->>- WR: done
  WR ->> AE: get status
  AE ->> WR: send status
  WR ->> AE: unregister execution
  deactivate AE
  WR ->>- Customer: done
Loading

When cancelling this would happen (unless for the wait node):

sequenceDiagram
  Actor Customer as User
  participant ES as Execution Service
  participant AE as Active Executions
  participant WR as Workflow Runner
  participant WE as Workflow Execute
  participant Node

  Customer ->>+ WR: run workflow
  WR ->>+ AE: register execution
  activate AE
  WR ->>+ WE: execute worfklow
  
  WE ->>+ Node: run node

  %% cancel execution
  Customer ->>+ ES: cancel workflow run
  ES ->> WE: cancel execution
  WE -x Node: cancel execution
  ES ->> AE: unregister execution
  deactivate AE
  ES ->>- Customer: done

  %% finish execution anyways
  Node ->>- WE: done
  WE ->>- WR: done, because cancelled
  WR ->> AE: get status BOOM
  deactivate WR
Loading
Wait Node The wait node would never resolve, so it would look like this:
sequenceDiagram
  Actor Customer as User
  participant ES as Execution Service
  participant AE as Active Executions
  participant WR as Workflow Runner
  participant WE as Workflow Execute
  participant Node

  Customer ->>+ WR: run workflow
  WR ->>+ AE: register execution
  activate AE
  WR ->>+ WE: execute worfklow
  
  WE ->>+ Node: run node

  %% cancel execution
  Customer ->>+ ES: cancel workflow run
  ES ->> WE: cancel execution
  WE ->> Node: cancel execution
  ES ->> AE: unregister execution
  deactivate AE
  ES ->>- Customer: done

  deactivate WE
  deactivate WR
  deactivate Node
Loading

Before #9992 the flow was like this. And I think it should be like that again.

sequenceDiagram
  Actor Customer as User
  participant ES as Execution Service
  participant AE as Active Executions
  participant WR as Workflow Runner
  participant WE as Workflow Execute
  participant Node

  Customer ->>+ WR: run workflow
  WR ->>+ AE: register execution
  activate AE
  WR ->>+ WE: execute worfklow
  
  WE ->>+ Node: run node

  %% cancel execution
  Customer ->>+ ES: cancel workflow run
  ES ->> WE: cancel execution
  WE -x Node: cancel execution
  ES ->>- Customer: done

  %% finish execution anyways
  Node ->>- WE: done
  WE ->>- WR: done, because cancelled
  WR ->> AE: get status
  AE ->> WR: send status
  WR ->> AE: unregister execution
  deactivate AE
  WR ->>- Customer: done
Loading

So while I have 3 solutions in this PR I prefer Option 3, which concentrates the responsibility for registering and unregistering active executions again within the WorkflowRunner.

Once I've decided on a solution I'll write tests and refactor the rest of the code.

Related Linear tickets, Github issues, and Community forum posts

https://linear.app/n8n/issue/PAY-1554/error-no-active-execution-found

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team node/improvement New feature or request labels Sep 12, 2024
@@ -163,6 +175,12 @@ export class ActiveExecutions {
promise.reject(reason);
}

// FIXME: Option 2: Don't clean up the execution. Because we cancelled it
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can probably be removed, because cancelling the promises is probably not correct, given that cancelling an execution will still resolve the promise returned from it:

onCancel.shouldReject = false;

And in case we want to reject this, then the WorkflowRunner should do it after the execution promise has been resolved and it checked that it resolved early because it was cancelled.

//}

// NOTE: new code
// FIXME: Option 3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferred solution. There is only one place that deregisters the execution:

this.activeExecutions.remove(executionId, fullRunData);

Comment on lines +518 to +522
context.onExecutionCancellation(() => {
clearTimeout(timer);
// FIXME: Don't let the promise dangle
resolve([context.getInputData()]);
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated and I'll split this out, but left it here for context as to why the wait node is not affected by this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team node/improvement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant