feat(ingestion/transformer): create tag if not exist #9076

Merged

siddiquebagwan-gslab
Collaborator

@siddiquebagwan-gslab siddiquebagwan-gslab commented Oct 23, 2023

For datasets, create the tag if it does not already exist.

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Oct 23, 2023
@maggiehays maggiehays added the hacktoberfest-accepted Acceptance for hacktoberfest https://hacktoberfest.com/participation/ label Oct 26, 2023
metadata-ingestion/src/datahub/ingestion/graph/client.py
)

for mcp in mcps:
# I am not sure if we need to update the work_unit_id. @harshal please confirm it
Collaborator
no, we don't need to update the work_unit_id

Collaborator Author
done

@@ -815,7 +815,7 @@ def test_pattern_dataset_tags_transformation(mock_time):
)
)

- assert len(outputs) == 3
+ assert len(outputs) == 5
Collaborator
we should also be checking the actual contents of the tag creations here

Collaborator Author
done
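
One way to act on this review point is to assert on the emitted tag records themselves, not only on how many outputs there are. The following is a minimal sketch only: it uses plain dictionaries as hypothetical stand-ins for the transformer's output records, and the field names, the helper `tag_creations`, and the example URNs are illustrative assumptions, not the PR's actual test code.

```python
from typing import Any, Dict, List

# Hypothetical stand-ins for transformer outputs. The real test works with
# RecordEnvelope objects, which are not reproduced here.
outputs: List[Dict[str, Any]] = [
    {
        "entityUrn": "urn:li:dataset:example",  # placeholder, not a real dataset URN
        "aspect": {"tags": ["urn:li:tag:Private", "urn:li:tag:Legacy"]},
    },
    {"entityUrn": "urn:li:tag:Private", "aspect": {"name": "Private"}},
    {"entityUrn": "urn:li:tag:Legacy", "aspect": {"name": "Legacy"}},
]


def tag_creations(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Return {tag_urn: tag_name} for every record that targets a tag entity."""
    return {
        r["entityUrn"]: r["aspect"]["name"]
        for r in records
        if r["entityUrn"].startswith("urn:li:tag:")
    }


created = tag_creations(outputs)

# Check the contents of the tag creations, not just the number of outputs.
assert created == {
    "urn:li:tag:Private": "Private",
    "urn:li:tag:Legacy": "Legacy",
}
```

A count-only assertion would still pass if the wrong tags were created; comparing the full mapping catches that case.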

@@ -237,5 +277,10 @@ def transform(
),
metadata=record_metadata,
)
yield from self._handle_end_of_stream(
Collaborator
the indentation looks incorrect here

Collaborator Author
done

if mcp.aspect is None or mcp.entityUrn is None: # to silence the lint error
continue

record_metadata = _update_work_unit_id(
Collaborator
this _update_work_unit_id thing is kinda strange - don't think we need it?

Collaborator Author
done

self, envelope: RecordEnvelope, entity_urn: str
) -> Iterable[RecordEnvelope]:

if not isinstance(self, SingleAspectTransformer):
Collaborator
let's make this work for both types of transformers

Collaborator Author
done

@@ -1065,6 +1067,22 @@ def parse_sql_lineage(
default_schema=default_schema,
)

def create_tag(self, tag_name: str) -> Dict[Any, Any]:
Collaborator
this should return just the urn, and not a dict

Collaborator Author
done
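
The requested signature change can be illustrated with a small sketch in which `create_tag` returns the tag URN as a string. The class `InMemoryGraphSketch` and its storage are hypothetical and exist only for illustration; the merged implementation in `client.py` talks to a DataHub server and is not shown in this thread. The `urn:li:tag:<name>` form is the standard DataHub tag URN layout.

```python
from typing import Dict


class InMemoryGraphSketch:
    """Hypothetical stand-in for the graph client; it stores tags in a dict."""

    def __init__(self) -> None:
        self.tags: Dict[str, Dict[str, str]] = {}

    def create_tag(self, tag_name: str) -> str:
        """Create the tag if it is missing and return only its URN."""
        tag_urn = f"urn:li:tag:{tag_name}"
        # setdefault keeps an existing entry, so repeated calls are idempotent.
        self.tags.setdefault(tag_urn, {"name": tag_name})
        return tag_urn


graph = InMemoryGraphSketch()
urn = graph.create_tag("Legacy")
assert isinstance(urn, str)
```

Returning a plain string lets callers pass the result straight into anything that expects a URN, without unpacking a response dictionary.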

class LegacyMCETransformer(Transformer, metaclass=ABCMeta):
class HandleEndOfStreamTransformer:
def handle_end_of_stream(
self, entity_urn: str
Collaborator
this method should not take in an entity_urn. It should only be called once after all records have been processed, not once per urn

Collaborator Author
done
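
The behaviour described in this comment, a hook with no `entity_urn` argument that runs exactly once after every record has been processed, might look roughly like the sketch below. Everything here is a simplified assumption: `TagCollectingTransformerSketch`, its `(entity_urn, tag_urns)` input tuples, and the string outputs are hypothetical and do not mirror the real `Transformer` or `RecordEnvelope` types.

```python
from typing import Iterable, Iterator, List, Set, Tuple


class TagCollectingTransformerSketch:
    """Hypothetical transformer that remembers tags and emits them at the end."""

    def __init__(self) -> None:
        self.seen_tags: Set[str] = set()
        self.end_of_stream_calls = 0

    def handle_end_of_stream(self) -> List[str]:
        """Called once, with no per-entity argument, after all records."""
        self.end_of_stream_calls += 1
        return sorted(self.seen_tags)

    def transform(
        self, records: Iterable[Tuple[str, List[str]]]
    ) -> Iterator[str]:
        for entity_urn, tag_urns in records:
            # Per-record work: remember which tags were referenced.
            self.seen_tags.update(tag_urns)
            yield entity_urn
        # Outside the loop: runs a single time once the stream is exhausted.
        yield from self.handle_end_of_stream()


transformer = TagCollectingTransformerSketch()
results = list(
    transformer.transform(
        [
            ("urn:li:dataset:a", ["urn:li:tag:X"]),
            ("urn:li:dataset:b", ["urn:li:tag:X", "urn:li:tag:Y"]),
        ]
    )
)
assert transformer.end_of_stream_calls == 1
```

Because the hook sits outside the per-record loop, a tag shared by several datasets is emitted only once.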

record_metadata = envelope.metadata.copy()
record_metadata.update(
{
"workunit_id": f"txform-{simple_name}-{self.aspect_name()}"
Collaborator
actually I was wrong in my earlier comment - I think this code is still required

Collaborator Author
:) updated the code
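
The metadata handling quoted in this thread (copy the envelope metadata, then overwrite `workunit_id`) can be sketched in isolation as follows. The function name `with_transformer_workunit_id` and the example values are hypothetical; only the copy-then-update pattern and the `txform-{simple_name}-{aspect_name}` format come from the snippet above.

```python
from typing import Any, Dict


def with_transformer_workunit_id(
    metadata: Dict[str, Any], simple_name: str, aspect_name: str
) -> Dict[str, Any]:
    """Return a copy of the metadata with a transformer-specific workunit_id."""
    # Copy first so the incoming envelope's metadata is left untouched.
    record_metadata = metadata.copy()
    record_metadata.update(
        {"workunit_id": f"txform-{simple_name}-{aspect_name}"}
    )
    return record_metadata


original = {"workunit_id": "source-workunit-1", "other": 1}
updated = with_transformer_workunit_id(original, "add_dataset_tags", "globalTags")
assert updated["workunit_id"] == "txform-add_dataset_tags-globalTags"
assert original["workunit_id"] == "source-workunit-1"
```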

)
yield from self._handle_end_of_stream(
Collaborator
this should be after the self._mark_processed(urn) call, and two levels of indentation less

Collaborator Author
done

@hsheth2 hsheth2 merged commit 0d6a5e5 into datahub-project:master Dec 14, 2023
53 checks passed
Salman-Apptware pushed a commit to Salman-Apptware/datahub that referenced this pull request Dec 15, 2023
3 participants