feat(dbt-ingestion): add documentation link from dbt source to institutionalMemory #8686

Merged

Conversation

ethan-cartwright (Contributor) commented Aug 22, 2023

Description

We want to be able to add a link to a dataset's +Add Link section from the dbt side. This PR takes a documentation_link specified in dbt and emits it as an institutionalMemory aspect on the dataset, using the description specified in the DataHub-side meta_mapping config (see the dbt.md file for details).

NOTE: right now the new link replaces the dataset's current institutionalMemory link. If we'd like to support multiple links, or to read the dataset's current institutionalMemory and append our link to it, let me know.
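
To make the two behaviours in the NOTE concrete, here is a minimal sketch of replace versus append. `Link`, `replace_links`, and `merge_links` are hypothetical stand-ins for illustration only; they are not DataHub classes or functions from this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for one institutionalMemory entry."""

    url: str
    description: str


def replace_links(existing: List[Link], new: Link) -> List[Link]:
    # Behaviour described in the NOTE: the new link overwrites what was there.
    return [new]


def merge_links(existing: List[Link], new: Link) -> List[Link]:
    # Alternative raised in the NOTE: keep the current links and add the new
    # one, dropping an old entry only when it points at the same URL.
    kept = [link for link in existing if link.url != new.url]
    return kept + [new]
```

With the merge variant, re-ingesting the same dbt project would update the description of an existing URL rather than duplicating it, which is one possible design if multiple links are wanted.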

Example

adds:
Screenshot 2023-08-22 at 4 52 19 PM

to:
Screenshot 2023-08-22 at 4 53 36 PM

Testing

  • unit tests
  • local ingestion of dbt-cloud project linked here

Results:
Screenshot 2023-08-22 at 4 54 26 PM

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the "ingestion" label (PR or Issue related to the ingestion of metadata) on Aug 22, 2023
ethan-cartwright changed the title from "Cus 903 dbt meta mapping" to "feat(dbt-ingestion): add documentation link from dbt source to institutionalMemory" on Aug 22, 2023
hsheth2 (Collaborator) left a comment

some initial comments from a first pass

we'll need docs for this in the dbt.md file too

stateful_ingestion:
  enabled: false
account_id: '107298'
job_id: '399798'
Collaborator

can you merge this stuff into the dbt sample recipe instead

run_id: # set to your dbt cloud run id. This is optional, and defaults to the latest run

@@ -12,6 +14,13 @@
    OwnershipTypeClass,
)

# Imports for metadata model classes
Collaborator

Suggested change
- # Imports for metadata model classes


if Constants.ADD_DOC_LINK_OPERATION in operation_map:
    try:
        docs_dic = ast.literal_eval(
Collaborator

ast eval feels strange here - could we just use json.loads or similar instead?

also, the ownership one supports ownership types without ever doing any parsing - can we use the same approach of a nested object here?

Contributor Author

good call, went with the second option. Parsing was indeed unnecessary
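
For readers following the thread, a minimal sketch of the two options discussed above: parsing a string value versus reading a nested object directly. The operation name `add_link` and the `link`/`description` keys are taken from snippets elsewhere in this conversation; the merged code may use different names, so treat them as assumptions.

```python
import ast
import json
from typing import Any, Dict

# Option 1: the config value arrives as a string and has to be parsed.
as_python_literal = "{'link': 'https://example.com/docs', 'description': 'Docs'}"
as_json = '{"link": "https://example.com/docs", "description": "Docs"}'

parsed_literal = ast.literal_eval(as_python_literal)  # accepts Python literal syntax
parsed_json = json.loads(as_json)  # accepts JSON only, so keys need double quotes

# Option 2: the config is already a nested mapping once the recipe is loaded,
# so nothing needs to be parsed at all.
nested: Dict[str, Any] = {
    "match": ".*",
    "operation": "add_link",  # assumed name, see lead-in
    "config": {"link": "https://example.com/docs", "description": "Docs"},
}
doc_config = nested["config"]

print(parsed_literal == parsed_json == doc_config)
```

The nested form also avoids the quoting differences between the two parsers: `json.loads` rejects the single-quoted string that `ast.literal_eval` accepts.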


if Constants.ADD_DOC_LINK_OPERATION in operation_map:
    try:
        docs_dic = ast.literal_eval(
Collaborator

Suggested change
- docs_dic = ast.literal_eval(
+ docs_dict = ast.literal_eval(

now = int(time.time() * 1000)  # milliseconds since epoch
institutional_memory_element = InstitutionalMemoryMetadataClass(
    url=docs_dic["link"],
    description=docs_dic["descr"],
Collaborator

descr -> description

vercel bot commented Aug 23, 2023

Deployment failed with the following error:

The provided GitHub repository does not contain the requested branch or commit reference. Please ensure the repository is not empty.

account_id: # set to your dbt cloud account id
project_id: # set to your dbt cloud project id
job_id: # set to your dbt cloud job id
account_id: '107298' # set to your dbt cloud account id
Collaborator

make it clear that this is not something they can just copy

Suggested change
- account_id: '107298' # set to your dbt cloud account id
+ account_id: "${DBT_ACCOUNT_ID}" # set to your dbt cloud account id

match: ".*"
operation: "add_link"
config:
  link: "https://en.wikipedia.org/wiki/Anchor_text"
Collaborator

let's change this example to use a pattern

in reality, we don't expect people to just add the same link to every asset. instead, the link is going to be in their meta section and we have to extract it out
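
A rough sketch of what "extract it out" could look like: the link lives in each asset's meta section and a pattern picks it up, instead of one hard-coded URL being applied to everything. `link_from_meta` is a hypothetical helper written for illustration, and the meta key and pattern below are assumptions, not the names used in the PR.

```python
import re
from typing import Any, Dict, Optional


def link_from_meta(meta: Dict[str, Any], key: str, pattern: str) -> Optional[str]:
    """Return the part of meta[key] matched by pattern, or None if no match."""
    value = meta.get(key)
    if not isinstance(value, str):
        return None  # key missing or not a string: nothing to link
    match = re.match(pattern, value)
    if match is None:
        return None
    return match.group(0)


# Each asset carries its own link in its meta section.
meta = {"documentation_link": "https://wiki.example.com/orders", "owner": "data-team"}
print(link_from_meta(meta, "documentation_link", r"https?://\S+"))
```

Assets whose meta has no such key, or whose value does not look like a URL, simply produce no link.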


if Constants.ADD_DOC_LINK_OPERATION in operation_map:
    try:
        docs_dict = operation_map[Constants.ADD_DOC_LINK_OPERATION].pop()
Collaborator

using pop() here doesn't seem right
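
One plausible reading of the concern, sketched under the assumption that the value stored for the operation is a list of config dicts (the thread does not show its actual type): `pop()` returns only the last entry and mutates the caller's map as a side effect. The key name and helper functions here are hypothetical.

```python
from typing import Dict, List

OperationMap = Dict[str, List[Dict[str, str]]]


def make_map() -> OperationMap:
    # Assumed shape: operation name -> list of link configs.
    return {
        "add_doc_link": [
            {"link": "https://example.com/a", "description": "A"},
            {"link": "https://example.com/b", "description": "B"},
        ]
    }


def links_with_pop(op_map: OperationMap) -> List[Dict[str, str]]:
    # Returns only the last entry and removes it from op_map.
    return [op_map["add_doc_link"].pop()]


def links_without_pop(op_map: OperationMap) -> List[Dict[str, str]]:
    # Returns every entry and leaves op_map untouched.
    return list(op_map["add_doc_link"])
```

Reading the entries without `pop()` keeps the map intact for any later consumer and does not silently drop the other links.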

},
},
)
# we require a description, so this should stay empty
Collaborator

if description is missing, do we issue a generic error message or a detailed one?

Contributor Author

now we give a detailed one 👍
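
As an illustration of a detailed message (not the wording used in the PR), a validation step might name the missing fields and show which keys were actually supplied. `validate_doc_link` and `REQUIRED_KEYS` are hypothetical; the `link` and `description` keys come from this thread.

```python
from typing import Any, Dict, Tuple

REQUIRED_KEYS = ("link", "description")  # keys named in this review thread


def validate_doc_link(config: Dict[str, Any]) -> Tuple[str, str]:
    """Return (link, description), or raise with a message naming what is missing."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(
            "documentation link config is missing required field(s): "
            f"{', '.join(missing)}; got keys: {sorted(config)}"
        )
    return config["link"], config["description"]
```

A user who forgets the description then sees which field to add rather than a bare KeyError.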

processor = OperationProcessor(
    operation_defs={
        "documentation_link": {
            "match": ".*",
Collaborator

can we also test that match param works correctly?

Contributor Author

Can you elaborate on this one for me?

Collaborator

something similar to the test case here

e.g. match=(.+)(?:#.*)? and raw props =http://example.com/my-page#stuff-to-ignore should produce an aspect with link = http://example.com/my-page
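
One caveat worth checking when writing that test, sketched with plain Python `re`: in `(.+)(?:#.*)?` the first group is greedy, so it also consumes the fragment, and the optional group then matches nothing. A character class that excludes `#` gives the intended result. How the ingestion code selects the link from the match is not shown in this thread; the sketch assumes capture group 1 is used.

```python
import re

url = "http://example.com/my-page#stuff-to-ignore"

# Greedy group swallows everything, including the fragment.
greedy = re.match(r"(.+)(?:#.*)?", url)
# Excluding '#' from the first group stops it at the fragment.
no_hash = re.match(r"([^#]+)(?:#.*)?", url)

print(greedy.group(1))   # http://example.com/my-page#stuff-to-ignore
print(no_hash.group(1))  # http://example.com/my-page
```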

hsheth2 added the "merge-pending-ci" label (A PR that has passed review and should be merged once CI is green) on Sep 27, 2023
vercel bot commented Oct 3, 2023

The latest updates on your projects.

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| docs-website | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Oct 3, 2023 9:39pm |

hsheth2 (Collaborator) commented Oct 4, 2023

CI passed on a prior commit, so going to merge

@ethan-cartwright - if CI is green, there's usually no need to hit the "update branch" button

hsheth2 enabled auto-merge (squash) on October 4, 2023 16:00
ethan-cartwright (Contributor Author)

> CI passed on a prior commit, so going to merge
>
> @ethan-cartwright - if CI is green, there's usually no need to hit the "update branch" button

oh okay thanks!

@hsheth2 hsheth2 merged commit e2afd44 into datahub-project:master Oct 4, 2023
51 checks passed