fix(sdk/dataflow): deprecate cluster and use env and platform_instance instead #8201

Merged

shubhamjagtap639
Contributor

@shubhamjagtap639 shubhamjagtap639 commented Jun 9, 2023

Deprecation

In DataFlow, the cluster argument is deprecated. Please use env instead.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Jun 9, 2023
@@ -35,11 +52,24 @@ class DataFlow:
url: Optional[str] = None
tags: Set[str] = field(default_factory=set)
owners: Set[str] = field(default_factory=set)
platform_instance: Optional[str] = None
env: Optional[str] = None
Collaborator

Previously we used cluster as the env value - why do we have both cluster and env now? I don't think we need this new instance variable

Collaborator

Plus, this is a breaking change, not a deprecation. This needs to be documented in the PR doc. Also, we should not change the default as part of a deprecation, since that can break existing deployments by changing their URNs.
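
To make the URN concern concrete, here is a minimal sketch. It assumes the flow URN embeds the cluster/env value as its last component. The helper name, the URN layout, and the default values are all illustrative assumptions and are not taken from this PR.

```python
from typing import Optional


def sketch_flow_urn(
    orchestrator: str,
    flow_id: str,
    cluster: Optional[str] = None,
    default_cluster: str = "prod",  # hypothetical default, for illustration only
) -> str:
    """Build an illustrative flow URN; the layout here is an assumption."""
    # Fall back to the default when the caller did not pass a cluster.
    value = cluster if cluster is not None else default_cluster
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{value})"


# A caller that never passed cluster gets a different URN once the default
# changes, so previously emitted metadata no longer lines up with new metadata.
urn_before = sketch_flow_urn("airflow", "daily_etl", default_cluster="prod")
urn_after = sketch_flow_urn("airflow", "daily_etl", default_cluster="PROD")
assert urn_before != urn_after
```

Callers that always pass the value explicitly are unaffected in this sketch; only those relying on the default would see their URNs move.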

Collaborator

This isn't a breaking change yet, just a deprecation of cluster in favor of env + addition of platform_instance

However, we still need to make some changes:

  1. the deprecation should still be noted in the PR description + our updating datahub doc
  2. we should be printing a deprecation warning if folks use the old cluster param
  3. we should throw an error if both env and cluster are provided
  4. all users of this class within datahub should be updated to use env - I believe airflow is one, but there might be more
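
Points 2 and 3 above could be sketched roughly as follows. This is a hypothetical stand-in class, not the code merged in this PR: the name DataFlowSketch, the message wording, and the choice to copy cluster into env are assumptions made for illustration.

```python
import warnings
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DataFlowSketch:
    """Hypothetical stand-in for DataFlow; not the merged implementation."""

    id: str
    orchestrator: str
    cluster: Optional[str] = None  # deprecated in favor of env
    tags: Set[str] = field(default_factory=set)
    platform_instance: Optional[str] = None
    env: Optional[str] = None

    def __post_init__(self) -> None:
        # Point 3: reject ambiguous input instead of silently picking one.
        if self.cluster is not None and self.env is not None:
            raise ValueError("Provide env or cluster, not both; cluster is deprecated")
        # Point 2: warn callers that still pass the old parameter.
        if self.cluster is not None:
            warnings.warn(
                "cluster is deprecated, use env instead",
                DeprecationWarning,
                stacklevel=3,  # attribute the warning to the code constructing the object
            )
            # Assumption: keep old callers working by carrying the value over.
            self.env = self.cluster
```

With this shape, DataFlowSketch(id="f", orchestrator="airflow", cluster="PROD") still works but emits a DeprecationWarning, while passing both cluster and env raises a ValueError.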

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
@vercel

vercel bot commented Jun 15, 2023

Deployment failed with the following error:

The provided GitHub repository does not contain the requested branch or commit reference. Please ensure the repository is not empty.

shubhamjagtap639 and others added 4 commits June 15, 2023 10:04
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
@hsheth2 hsheth2 changed the title fix(ingestion/dataflow): Deprecate cluster and use env and platform_instance instead. fix(sdk/dataflow): deprecate cluster and use env and platform_instance instead Jun 15, 2023
@hsheth2 hsheth2 merged commit 35a4434 into datahub-project:master Jun 15, 2023
tusharm pushed a commit to tusharm/datahub that referenced this pull request Jun 20, 2023
…e instead (datahub-project#8201)

Co-authored-by: mohdsiddique <mohdsiddiquebagwan@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
@shubhamjagtap639 shubhamjagtap639 deleted the Prefect-Ingestion-Integration branch July 10, 2023 06:18
Labels
ingestion PR or Issue related to the ingestion of metadata
4 participants