
fix(ingest/pipeline): catch pipeline exceptions #10753

Merged
merged 4 commits into datahub-project:master on Jun 27, 2024

Conversation

pie1nthesky
Contributor

@pie1nthesky pie1nthesky commented Jun 20, 2024

Currently, unhandled exceptions are not reported properly.
When the pipeline fails with, e.g., a 'Connection timeout' exception during source processing,
it exits with final_status = 'unknown' and a cut-off log in the report.
That makes it impossible to troubleshoot ingestion issues from the ingestion report page.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Summary by CodeRabbit

  • New Features

    • Introduced detailed pipeline status enumeration to improve error handling and reporting.
  • Chores

    • Updated GitHub workflow with a timeout for job steps and renamed Docker containers for better logging.

@github-actions github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) and community-contribution (PR or Issue raised by member(s) of DataHub Community) labels on Jun 20, 2024
@@ -494,6 +492,10 @@ def run(self) -> None:
             self.final_status = "cancelled"
             logger.error("Caught error", exc_info=e)
             raise
+        except Exception as exc:
+            self.final_status = "pipeline_failure"
+            logger.error("pipeline run error: ", exc_info=exc)
Collaborator

I would've thought this log line would be redundant, since we log for any exception as part of entrypoints.py

Can you provide more details about this?

with cut off log in report.

Contributor Author

@pie1nthesky pie1nthesky Jun 23, 2024

Ingestion logs are put into the report by the self._notify_reporters_on_ingestion_completion() method in the finally clause.
So if we don't log the pipeline exception before that finally block runs, the exception that caused the pipeline failure is not present in the report.

It can be tested by mangling host_port in any recipe and checking the report on the ingestion page.
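
To make the timing issue concrete, here is a minimal, self-contained sketch of the control flow being described (MiniPipeline and the simulated ConnectionError are illustrative; _notify_reporters_on_ingestion_completion is the method named above, and the KeyboardInterrupt branch is only assumed to be what sets 'cancelled'):

import logging

logger = logging.getLogger(__name__)

class MiniPipeline:
    """Toy stand-in for the ingestion Pipeline, following the structure discussed above."""

    def __init__(self) -> None:
        self.final_status = "unknown"

    def _notify_reporters_on_ingestion_completion(self) -> None:
        # In the real pipeline this copies the ingestion log into the report.
        logger.info("report sent with final_status=%s", self.final_status)

    def run(self) -> None:
        try:
            raise ConnectionError("Connection timeout")  # simulate a source failure
        except KeyboardInterrupt as e:  # assumed trigger for the "cancelled" branch
            self.final_status = "cancelled"
            logger.error("Caught error", exc_info=e)
            raise
        except Exception as exc:
            # Log before the finally block runs, so the traceback is captured in the report.
            self.final_status = "pipeline_failure"
            logger.error("pipeline run error: ", exc_info=exc)
            raise
        finally:
            # Without the handler above, this snapshot would miss the failure entirely.
            self._notify_reporters_on_ingestion_completion()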

Contributor Author

@pie1nthesky pie1nthesky Jun 23, 2024

What do you think about handling the redundancy?

Should I wrap this exception in a PipelineRunError and handle it in entrypoints.py without calling logger.exception(f"Command failed: {exc}")?
I don't like importing from pipeline into entrypoints, because there is an intermediary module, ingest_cli.
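
A rough sketch of that first option, to make the trade-off concrete (PipelineRunError is the hypothetical wrapper discussed here, and the entrypoints.py handler below is assumed, not the actual DataHub code):

import logging
import sys

logger = logging.getLogger(__name__)

class PipelineRunError(Exception):
    """Hypothetical wrapper signalling that the pipeline already logged the failure."""

def run_pipeline() -> None:
    # Stand-in for Pipeline.run(); the ConnectionError simulates a source failure.
    try:
        raise ConnectionError("Connection timeout")
    except Exception as exc:
        logger.error("pipeline run error: ", exc_info=exc)
        raise PipelineRunError("pipeline run failed") from exc

def main() -> None:
    # Stand-in for the entrypoints.py exception handler.
    try:
        run_pipeline()
    except PipelineRunError:
        sys.exit(1)  # already logged inside the pipeline, so skip the duplicate traceback
    except Exception as exc:
        logger.exception(f"Command failed: {exc}")
        sys.exit(1)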

Another option is to cut the traceback:

except Exception as exc:
    self.final_status = "pipeline_failure"
    logger.error("pipeline run error: ", exc_info=exc.with_traceback(None))
    raise exc from None

Contributor

coderabbitai bot commented Jun 27, 2024

Walkthrough

Significant updates were made to the pipeline.py file to enhance status reporting and error handling using a new PipelineStatus enum with values like UNKNOWN, COMPLETED, PIPELINE_ERROR, and CANCELLED. Changes also include a timeout addition and container renaming in a GitHub workflow file for better process flow management and logging.

Changes

  • .../ingestion/run/pipeline.py: Introduced the PipelineStatus enum for more robust status handling and improved exception handling.
  • .github/workflows/docker-unified.yml: Added a 15-minute timeout to a job step and renamed a Docker container for clearer logging.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Pipeline
    participant Logger
    
    User->>Pipeline: Start pipeline
    Pipeline->>Pipeline: Set status to PipelineStatus.UNKNOWN
    Pipeline->>+Logger: Log initial status
    Pipeline->>Pipeline: Perform tasks
    
    alt Successful completion
        Pipeline->>Pipeline: Set status to PipelineStatus.COMPLETED
    else Exception raised
        Pipeline->>Pipeline: Set status to PipelineStatus.PIPELINE_ERROR
    end
    
    Pipeline->>Pipeline: Handle specific exceptions (set status to CANCELLED)
    Pipeline->>-Logger: Log final status
    Logger->>User: Provide status update

Poem

The pipeline now knows its fate,
With statuses to celebrate.
From unknown paths to tasks complete,
Errors faced and still it’s neat.
Timeout set, logs all precise,
Improvements made, oh, how nice!


@hsheth2
Collaborator

hsheth2 commented Jun 27, 2024

@pie1nthesky made some tweaks to this, hopefully it does what you're looking for

@hsheth2 hsheth2 added the merge-pending-ci (A PR that has passed review and should be merged once CI is green) label on Jun 27, 2024
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Outside diff range and nitpick comments (1)
metadata-ingestion/src/datahub/ingestion/run/pipeline.py (1)

Line range hint 541-543: Refactor: Simplify Boolean Expression

Instead of using a ternary operation for a boolean result, use the bool function directly for clarity.

-        has_errors: bool = (
-            True
-            if self.source.get_report().failures or self.sink.get_report().failures
-            else False
-        )
+        has_errors: bool = bool(self.source.get_report().failures or self.sink.get_report().failures)
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 190f09a and acc1e9c.

Files selected for processing (1)
  • metadata-ingestion/src/datahub/ingestion/run/pipeline.py (7 hunks)
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/run/pipeline.py

98-99: Use a single if statement instead of nested if statements

(SIM102)


541-543: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)

Comment on lines +120 to +124
class PipelineStatus(enum.Enum):
UNKNOWN = enum.auto()
COMPLETED = enum.auto()
PIPELINE_ERROR = enum.auto()
CANCELLED = enum.auto()
Contributor

Refactor: Simplify Enum Declaration

The use of enum.auto() is great for avoiding manual assignment of values, but it results in the enum values being integers, which might not be the expected behavior since the original status values were strings. Consider explicitly setting the enum values to their string counterparts.

class PipelineStatus(enum.Enum):
-    UNKNOWN = enum.auto()
-    COMPLETED = enum.auto()
-    PIPELINE_ERROR = enum.auto()
-    CANCELLED = enum.auto()
+    UNKNOWN = "unknown"
+    COMPLETED = "completed"
+    PIPELINE_ERROR = "pipeline_error"
+    CANCELLED = "cancelled"
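
A quick check of the difference, assuming the status value ends up interpolated into the report or logs:

import enum

class AutoStatus(enum.Enum):
    PIPELINE_ERROR = enum.auto()

class StrStatus(enum.Enum):
    PIPELINE_ERROR = "pipeline_error"

print(AutoStatus.PIPELINE_ERROR.value)  # 1 (an integer, unlike the old string statuses)
print(StrStatus.PIPELINE_ERROR.value)   # pipeline_error (matches the previous string values)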

Comment on lines +376 to +380
                     if self.final_status == PipelineStatus.CANCELLED
                     else "FAILURE"
                     if self.has_failures()
                     else "SUCCESS"
-                    if self.final_status == "completed"
+                    if self.final_status == PipelineStatus.COMPLETED
Contributor

Improve Readability: Simplify Conditional Logic

The nested ternary operators make this code hard to read. Consider refactoring this to use a more straightforward conditional structure.

-                    if self.final_status == PipelineStatus.CANCELLED
-                    else "FAILURE"
-                    if self.has_failures()
-                    else "SUCCESS"
-                    if self.final_status == PipelineStatus.COMPLETED
-                    else "UNKNOWN",
+                    status = "UNKNOWN"
+                    if self.final_status == PipelineStatus.CANCELLED:
+                        status = "CANCELLED"
+                    elif self.has_failures():
+                        status = "FAILURE"
+                    elif self.final_status == PipelineStatus.COMPLETED:
+                        status = "SUCCESS"
+                    report=self._get_structured_report(),
+                    ctx=self.ctx,

Committable suggestion was skipped due to low confidence.
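
For readability, the shape of the refactor being suggested, written out without the unrelated trailing lines (a sketch only; how the resulting status string feeds the surrounding reporter call is assumed):

if self.final_status == PipelineStatus.CANCELLED:
    status = "CANCELLED"
elif self.has_failures():
    status = "FAILURE"
elif self.final_status == PipelineStatus.COMPLETED:
    status = "SUCCESS"
else:
    status = "UNKNOWN"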

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between acc1e9c and 008e2ec.

Files selected for processing (1)
  • .github/workflows/docker-unified.yml (2 hunks)
Additional comments not posted (2)
.github/workflows/docker-unified.yml (2)

799-799: Approval for Timeout Addition

Adding a 15-minute timeout to the job step is a good practice. It helps in managing resources efficiently and prevents potential issues with jobs that may hang or stall.


1026-1026: Approval for Docker Container Renaming

Renaming the Docker container to datahub-datahub-upgrade-1 enhances clarity in identifying related logs and reports, which is beneficial for troubleshooting and monitoring.

This reverts commit 008e2ec.
@hsheth2 hsheth2 merged commit 5e9afc6 into datahub-project:master Jun 27, 2024
51 of 55 checks passed
@pie1nthesky
Contributor Author

@hsheth2
The HACK seems to be unnecessary, since logs are appended here and are readily available in the UI.

I would go with a split traceback like:

except Exception as exc:
    self.final_status = PipelineStatus.PIPELINE_ERROR
    logger.exception("Ingestion pipeline threw an uncaught exception")
    raise RuntimeError("Ingestion pipeline threw an uncaught exception") from None

No redundancy, no hacks, but the status and logs are available in the report.
Oh, well...

@hsheth2
Collaborator

hsheth2 commented Jun 27, 2024

@pie1nthesky it's a bit more tricky than that - I want the logs in the UI to closely match whatever is printed to the CLI. The reporting code you linked to only works for CLI ingestion, but UI-driven ingestion is based only on the stdout/stderr logs. Additionally, I wanted to change it so that pipeline.run() never throws. Finally, I want the error to show up somewhere in the structured report (to help with #10790).
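
Read that way, a rough sketch of the direction (logger use and the pipeline_errors field are assumptions based on this thread; PipelineStatus is the enum introduced by this PR; this is not the code that was merged):

import logging

from datahub.ingestion.run.pipeline import PipelineStatus

logger = logging.getLogger(__name__)

def run_without_raising(pipeline) -> None:
    """Sketch only: 'pipeline' stands in for the real Pipeline object."""
    try:
        pipeline.do_ingestion()  # hypothetical placeholder for the real source/sink work
        pipeline.final_status = PipelineStatus.COMPLETED
    except Exception as exc:
        pipeline.final_status = PipelineStatus.PIPELINE_ERROR
        # The same traceback goes to the CLI log and, via stdout/stderr capture,
        # to UI-driven ingestion, so both views stay consistent.
        logger.exception("Ingestion pipeline threw an uncaught exception")
        # Keep the error in the structured report instead of re-raising,
        # so run() itself never throws; pipeline_errors is an assumed field.
        pipeline.pipeline_errors.append(exc)
    finally:
        pipeline.notify_reporters_on_ingestion_completion()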

yoonhyejin pushed a commit that referenced this pull request Jul 16, 2024
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
aviv-julienjehannet pushed a commit to aviv-julienjehannet/datahub that referenced this pull request Jul 17, 2024
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>