feat(ingestion/redshift): support auto_incremental_lineage #9010

Merged
Changes from 13 commits
24 commits
55b90eb
replace lineage-runner by datahub parser
siddiquebagwan-gslab Sep 29, 2023
78a7a5b
Merge branch 'master' into master+ing-188-sqlglotlineage-for-cll-reds…
siddiquebagwan-gslab Sep 29, 2023
70f08b1
cll
siddiquebagwan-gslab Oct 3, 2023
5be048c
Merge branch 'master' into master+ing-188-sqlglotlineage-for-cll-reds…
siddiquebagwan-gslab Oct 5, 2023
f91005e
cll test case
siddiquebagwan-gslab Oct 5, 2023
065332a
lint fix
siddiquebagwan-gslab Oct 6, 2023
7ab729e
Merge branch 'master' into master+ing-188-sqlglotlineage-for-cll-reds…
siddiquebagwan-gslab Oct 6, 2023
9dc161c
Merge branch 'master' into master+ing-188-sqlglotlineage-for-cll-reds…
siddiquebagwan-gslab Oct 9, 2023
2ebddd1
Merge branch 'master' into master+ing-188-sqlglotlineage-for-cll-reds…
siddiquebagwan-gslab Oct 12, 2023
cb86446
Merge branch 'master+ing-188-sqlglotlineage-for-cll-redshift' of gith…
siddiquebagwan-gslab Oct 12, 2023
5d5d034
incremental lineage
siddiquebagwan-gslab Oct 12, 2023
17df519
incremental lineage
siddiquebagwan-gslab Oct 13, 2023
cff9007
Merge branch 'master' into master+incremental-lineage-for-redshift
siddiquebagwan-gslab Oct 13, 2023
d976738
Update metadata-ingestion/src/datahub/ingestion/source/redshift/redsh…
siddiquebagwan-gslab Oct 16, 2023
010a8d3
Merge branch 'master' into master+incremental-lineage-for-redshift
siddiquebagwan-gslab Oct 16, 2023
14d7b1f
Merge branch 'master+incremental-lineage-for-redshift' of github.com:…
siddiquebagwan-gslab Oct 16, 2023
6bfb31f
incremental_lineage set default to false
siddiquebagwan-gslab Oct 16, 2023
32f1105
test case of config
siddiquebagwan-gslab Oct 16, 2023
e5317ee
resolve conflict
siddiquebagwan-gslab Oct 18, 2023
aa112c6
Merge branch 'master' into master+incremental-lineage-for-redshift
siddiquebagwan-gslab Oct 18, 2023
3510863
Merge branch 'master' into master+incremental-lineage-for-redshift
siddiquebagwan-gslab Oct 19, 2023
07c188a
Merge branch 'master' into master+incremental-lineage-for-redshift
hsheth2 Oct 23, 2023
913264a
resolve conflict
siddiquebagwan-gslab Oct 25, 2023
037bd8d
Merge branch 'master+incremental-lineage-for-redshift' of github.com:…
siddiquebagwan-gslab Oct 25, 2023
6 changes: 3 additions & 3 deletions metadata-ingestion/setup.py
This needs an addition to the changelog, plus a tweak to make incremental_lineage off by default for Redshift.

It should also be made clear that to use both CLL (column-level lineage) and incremental lineage, you need to provide a graph instance or use the REST sink.
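
The requirement in the comment above can be pictured as a recipe. This is a minimal sketch, assuming the recipe is written as a Python dict. Only `incremental_lineage` appears in this PR; the host, database, and server values are placeholders, the `datahub_api` key is assumed to be the other way of providing a graph instance, and the helper `has_graph_when_incremental` is hypothetical and not part of DataHub.

```python
from typing import Any, Dict

# Placeholder recipe: only `incremental_lineage` is taken from this PR.
# Host, database and server values are made up for illustration.
recipe: Dict[str, Any] = {
    "source": {
        "type": "redshift",
        "config": {
            "host_port": "example.redshift.amazonaws.com:5439",
            "database": "dev",
            "incremental_lineage": True,
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}


def has_graph_when_incremental(recipe: Dict[str, Any]) -> bool:
    """Hypothetical check: if incremental lineage is on, a graph must be reachable.

    Assumption: a graph instance is available when the sink is `datahub-rest`
    or when a top-level `datahub_api` block is present.
    """
    source_config = recipe.get("source", {}).get("config", {})
    if not source_config.get("incremental_lineage", False):
        # Nothing to check when incremental lineage is off (the default).
        return True
    uses_rest_sink = recipe.get("sink", {}).get("type") == "datahub-rest"
    has_api_block = "datahub_api" in recipe
    return uses_rest_sink or has_api_block


ok = has_graph_when_incremental(recipe)
```

A recipe that enables `incremental_lineage` but writes to a file sink with no `datahub_api` block would fail this check, which is the situation the comment warns about.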

@@ -354,9 +354,9 @@
 | {"psycopg2-binary", "pymysql>=1.0.2"},
 "pulsar": {"requests"},
 "redash": {"redash-toolbelt", "sql-metadata"} | sqllineage_lib,
-"redshift": sql_common | redshift_common | usage_common | {"redshift-connector"},
-"redshift-legacy": sql_common | redshift_common,
-"redshift-usage-legacy": sql_common | usage_common | redshift_common,
+"redshift": sql_common | redshift_common | usage_common | {"redshift-connector"} | sqlglot_lib,
+"redshift-legacy": sql_common | redshift_common | sqlglot_lib,
+"redshift-usage-legacy": sql_common | redshift_common | sqlglot_lib | usage_common,
 "s3": {*s3_base, *data_lake_profiling},
 "gcs": {*s3_base, *data_lake_profiling},
 "sagemaker": aws_common,

@@ -1,5 +1,6 @@
 import logging
 from collections import defaultdict
+from functools import partial
 from typing import Dict, Iterable, List, Optional, Type, Union

 import humanfriendly
@@ -25,6 +26,7 @@
 platform_name,
 support_status,
 )
+from datahub.ingestion.api.incremental_lineage_helper import auto_incremental_lineage
 from datahub.ingestion.api.source import (
 CapabilityReport,
 MetadataWorkUnitProcessor,
@@ -369,6 +371,11 @@ def gen_database_container(self, database: str) -> Iterable[MetadataWorkUnit]:
 def get_workunit_processors(self) -> List[Optional[MetadataWorkUnitProcessor]]:
 return [
 *super().get_workunit_processors(),
+partial(
+auto_incremental_lineage,
+self.ctx.graph,
+self.config.incremental_lineage,
+),
 StaleEntityRemovalHandler.create(
 self, self.config, self.ctx
 ).workunit_processor,
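
The hunk above uses `functools.partial` to pre-bind the graph and the config flag, leaving a callable that takes only the work unit stream. The following is a rough sketch of that pattern, not DataHub's implementation: `auto_incremental_lineage_sketch`, `run_processors`, and the dict-based work units are hypothetical stand-ins, and tagging a unit with `"mode": "patch"` is only an assumed placeholder for whatever the real helper does to lineage work units.

```python
from functools import partial
from typing import Callable, Dict, Iterable, Iterator, List, Optional

# Hypothetical stand-in for MetadataWorkUnit: a plain dict.
WorkUnit = Dict[str, str]
Processor = Callable[[Iterable[WorkUnit]], Iterable[WorkUnit]]


def auto_incremental_lineage_sketch(
    graph: Optional[object],
    incremental_lineage: bool,
    stream: Iterable[WorkUnit],
) -> Iterator[WorkUnit]:
    """Stand-in processor with the same argument order as in the diff."""
    if not incremental_lineage:
        # Flag off: pass every work unit through unchanged.
        yield from stream
        return
    for wu in stream:
        if wu.get("aspect") == "upstreamLineage":
            # Assumed behaviour: lineage units are re-emitted in an incremental form.
            yield {**wu, "mode": "patch"}
        else:
            yield wu


def run_processors(
    processors: List[Optional[Processor]], stream: Iterable[WorkUnit]
) -> Iterable[WorkUnit]:
    """Chain processors over the stream, skipping None entries."""
    for processor in processors:
        if processor is not None:
            stream = processor(stream)
    return stream


# partial() binds graph and flag; the result accepts only the stream.
processors: List[Optional[Processor]] = [
    partial(auto_incremental_lineage_sketch, None, True),
]
units: List[WorkUnit] = [{"aspect": "status"}, {"aspect": "upstreamLineage"}]
result = list(run_processors(processors, units))
```

Because the processor now owns the incremental behaviour, the source itself can emit plain lineage work units, which is why the later hunks pass `False` to `gen_lineage`.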
@@ -942,7 +949,9 @@ def generate_lineage(self, database: str) -> Iterable[MetadataWorkUnit]:
 )
 if lineage_info:
 yield from gen_lineage(
-dataset_urn, lineage_info, self.config.incremental_lineage
+dataset_urn,
+lineage_info,
+False, # incremental lineage generation is taken care by auto_incremental_lineage
 )

 for schema in self.db_views[database]:
@@ -956,7 +965,9 @@ def generate_lineage(self, database: str) -> Iterable[MetadataWorkUnit]:
 )
 if lineage_info:
 yield from gen_lineage(
-dataset_urn, lineage_info, self.config.incremental_lineage
+dataset_urn,
+lineage_info,
+False, # incremental lineage generation is taken care by auto_incremental_lineage
 )

 def add_config_to_report(self):

siddiquebagwan-gslab marked this conversation as resolved.