v0.10.3

@iprentic released this 25 May 20:20
· 2439 commits to master since this release
1478d70

Release Highlights

User Experience

  • Define Data Products via YAML and manage associated entities within a Domain
  • Search experience: quickly apply a filter at time of search
  • Form-based PowerBI ingestion

Developer Experience

  • Progress toward removing the Confluent Schema Registry requirement; Helm & Quickstart simplifications to follow
    • NOTE: this only works for new deployments of DataHub. If you have already deployed DataHub with Confluent Schema Registry, you will not be able to disable it
  • Delete CLI - correctly handles deleting timeseries aspects
  • Ongoing improvements to Quickstart stability
  • Support entity types filter in get_urns_by_filter
  • Search customization
    • regex based query matching
    • full control over scoring functions (usable on any document field, e.g. tags or deprecated flags)
    • enable/disable fuzzy, prefix, exact match queries
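
To illustrate the search customization idea above, here is a minimal, self-contained sketch of regex-based query matching that toggles fuzzy, prefix, and exact match queries. It is not DataHub's configuration schema or implementation: `QueryRule`, `select_rule`, `enabled_match_types`, and the example patterns are hypothetical names chosen for illustration, and the "first matching rule wins" behavior is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueryRule:
    """Hypothetical rule: applies when the whole search text matches `pattern`."""
    pattern: str
    fuzzy: bool = True
    prefix: bool = True
    exact: bool = True


def select_rule(query: str, rules: List[QueryRule]) -> Optional[QueryRule]:
    """Return the first rule whose regex matches the entire query, else None."""
    for rule in rules:
        if re.fullmatch(rule.pattern, query):
            return rule
    return None


def enabled_match_types(query: str, rules: List[QueryRule]) -> List[str]:
    """List the match query types that the selected rule leaves enabled."""
    rule = select_rule(query, rules)
    if rule is None:
        return []
    flags = (("fuzzy", rule.fuzzy), ("prefix", rule.prefix), ("exact", rule.exact))
    return [name for name, enabled in flags if enabled]


# Illustrative rules only (assumed, not taken from the release):
RULES = [
    # A bare "*" means "select everything", so no text matching is needed.
    QueryRule(pattern=r"\*", fuzzy=False, prefix=False, exact=False),
    # Identifier-like queries: skip fuzzy matching to avoid noisy results.
    QueryRule(pattern=r"[A-Za-z0-9_.]+", fuzzy=False),
    # Fallback: everything enabled.
    QueryRule(pattern=r".*"),
]

if __name__ == "__main__":
    for q in ("*", "orders_v2", "daily sales"):
        print(q, "->", enabled_match_types(q, RULES))
```

Ordering the rules from most to least specific keeps the fallback pattern from shadowing the narrower ones.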

Ingestion

  • BigQuery - Improve ingestion disk usage & speed; extract dataset usage from Views
  • Unity Catalog - Capture create/last modified timestamps; extract usage; data profiling support
  • PowerBI - Update workspace concept mapping; support modified_since, extract_dataset_schema, and more
  • Superset - support stateful ingestion
  • Business Glossary - Simplify ingestion source
  • Kafka - Add description in dataset properties
  • S3 - Support stateful ingestion & last_updated
  • CSV Enricher - Support updating more types
  • PII Classification - Configurable sample size
  • Nifi - Support Kerberos authentication
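
For readers new to stateful ingestion (added for Superset and S3 in this release), the core idea can be sketched as comparing the set of entities seen in the previous run with those seen in the current run, and treating the ones that disappeared as stale candidates for soft deletion. This is a simplified illustration under that assumption, not the connector code; `find_stale_urns` and the example URNs are hypothetical.

```python
from typing import Iterable, Set


def find_stale_urns(previous_run: Iterable[str], current_run: Iterable[str]) -> Set[str]:
    """Return URNs emitted by the previous run but missing from the current one."""
    return set(previous_run) - set(current_run)


if __name__ == "__main__":
    # Hypothetical URNs, used only to show the comparison.
    last_run = [
        "urn:li:dataset:(urn:li:dataPlatform:s3,bucket/a,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:s3,bucket/b,PROD)",
    ]
    this_run = [
        "urn:li:dataset:(urn:li:dataPlatform:s3,bucket/a,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:s3,bucket/c,PROD)",
    ]
    # bucket/b was not seen this time, so it is reported as stale.
    print(find_stale_urns(last_run, this_run))
```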

What's Changed

  • fix(ingest/bigquery): Add to lineage, not overwrite, when using sql parser by @asikowitz in #7814
  • fix(ingest/bigquery): Enable lineage and usage ingestion without tables by @asikowitz in #7820
  • fix(ingest/bigquery): Do not query columns when not ingesting tables or views by @asikowitz in #7823
  • fix(ingest/bigquery): update usage query, remove erroneous init by @mayurinehate in #7811
  • fix(ingest/bigquery): Handle null values from usage aggregation by @asikowitz in #7827
  • perf(ingest/bigquery): Improve bigquery usage disk usage and speed by @asikowitz in #7825
  • fix(cli): use correct ingestion image in script by @hsheth2 in #7826
  • fix(release): prevent republish of images on release edits by @RyanHolstien in #7828
  • feat(): finish populating the entity registry by @hsheth2 in #7818
  • fix(ui) Fix 404 page routing bug by @chriscollins3456 in #7824
  • feat(ui): Support PowerBI Ingestion via UI form by @jjoyce0510 in #7817
  • fix(ingest/snowflake): fix column name in snowflake optimised lineage by @mayurinehate in #7834
  • feat(ingest/unity): capture create/lastModified timestamps by @hsheth2 in #7819
  • fix(test): fix spark lineage test by @david-leifker in #7829
  • docs(): add markprompt help chat by @jeffmerrick in #7837
  • Update DataJobInputOutput.pdl to express that CLL fields are not shown in the UI right now by @gabe-lyons in #7830
  • feat(cli): improve quickstart stability by @hsheth2 in #7839
  • chore(ci): regular upgrade base requirements.txt by @anshbansal in #7821
  • feat(timeseries): Support sorting timeseries aspects by non-timestampMillis field + fix operations resolver by @jjoyce0510 in #7840
  • doc(ingestion/tableau): Fix rendering ingestion quickstart guide by @mohdsiddique in #7808
  • fix(ingest): pin sqlparse version by @hsheth2 in #7847
  • feat(urn): Add a validator when creating an URN that it is no longer than the li… by @iprentic in #7836
  • chore(ingest): bug fix in sqlparse pin by @hsheth2 in #7848
  • feat: enriching guide on creating dataset by @yoonhyejin in #7777
  • feat(docs): consolidate api guides by @yoonhyejin in #7857
  • fix(ingest/salesforce): use report timestamp for operations by @hsheth2 in #7838
  • chore(ci): fix CI failing due to lint by @anshbansal in #7863
  • fix(mcl): fix improper pass by reference by @RyanHolstien in #7860
  • feat(urn) Add validator to reject URNs which contain the character we plan to u… by @iprentic in #7859
  • feat(elasticsearch): Add servlet which provides an endpoint for a healthcheck on the ES cl… by @iprentic in #7799
  • fix(ui) Add UI fixes and design tweaks to AutoComplete by @chriscollins3456 in #7845
  • fix(ui) Get all entity assertions in chrome extension by @chriscollins3456 in #7849
  • refactor(platform): Refactoring ES Utils, adding EXISTS condition support to Filter Criterion by @jjoyce0510 in #7832
  • chore(ui): change background color to transparent for avatar with photoUrl by @hieunt-itfoss in #7527
  • refactor(ingest): Add helper DataHubGraph methods by @asikowitz in #7851
  • fix(ui) Disable cache on Domain and Glossary Related Entities pages by @chriscollins3456 in #7867
  • fix(cache): Fix cache key serialization in search service by @pedro93 in #7858
  • docs(ingest): update dbt and aws docs by @hsheth2 in #7870
  • docs(ingest): fix CorpGroup example by @hsheth2 in #7816
  • docs(ingest/powerbi): update workspace concept mapping by @eeepmb in #7835
  • feat(ingest/powerbi): support modified_since, extract_dataset_schema and many more by @aezomz in #7519
  • Remove usages of commons-text library lower than 1.10.0 by @iprentic in #7850
  • feat(glue): allow resource links to be ignored by @YusufMahtab in #7639
  • feat(ingestion): lookml refinement support by @mohdsiddique in #7781
  • feat(ingest/unity): Ingest ownership for containers; lookup service principal display names by @asikowitz in #7869
  • Logging and test models fixes by @david-leifker in #7884
  • feat(model) Add ContainerPath aspect model by @chriscollins3456 in #7774
  • bug(7882): run kafka-configs.sh on DataHubUpgradeHistory_v1 to make sure the retention.ms is set to infinite by @jinlintt in #7883
  • fix: refactor toc by @yoonhyejin in #7862
  • feat(cli): Modifies ingest-sample-data command to use DataHub url & token based on config by @pedro93 in #7896
  • feat(ingest/snowflake): optionally emit all upstreams irrespective of recipe pattern by @mayurinehate in #7842
  • fix(ingestion/tableau): backward compatibility with version 2021.1 an… by @mayurinehate in #7864
  • fix(ingest/dbt): ensure dbt shows view properties by @hsheth2 in #7872
  • docs(airflow): add debug guide on url generation by @hsheth2 in #7885
  • feat(sdk): support entity types filter in get_urns_by_filter by @hsheth2 in #7902
  • fix(ingest/snowflake): fix optimised lineage query, filter temporary … by @mayurinehate in #7894
  • fix(ingest/bigquery): fix handling of time decorator offset queries by @mayurinehate in #7843
  • fix(ingest): fix minor bug + protective dep requirements by @hsheth2 in #7861
  • fix(cli): remove duplicate labels from quickstart files by @hsheth2 in #7886
  • Revert "feat(cli): Modifies ingest-sample-data command to use DataHub… by @pedro93 in #7899
  • feat(sdk): add DataHubGraph.get_entity_semityped method by @hsheth2 in #7905
  • test(ingest/biz-glossary): add test for enable_auto_id by @hsheth2 in #7911
  • feat(ingest): add GCS ingestion source by @mayurinehate in #7903
  • [bugfix] Fix remote file ingestion for Windows by @xiphl in #7888
  • refactor(ingest): report soft deleted stale entities with LossyList by @asikowitz in #7907
  • fix(platforms): fix json parse exception for data platforms by @RyanHolstien in #7918
  • docs(release): managed DataHub 0.2.6 by @anshbansal in #7922
  • fix(deploy): add missing plugin files for mysql-client library in mysql-setup by @AndrewZures in #7909
  • docs(deploy): document some of the environment variables by @david-leifker in #7906
  • fix(system-update): fix no wait flag by @david-leifker in #7927
  • fix(consumer): fix datahub usage event topic consumer by @david-leifker in #7866
  • logging(auth): adding optional logging to authentication exceptions by @david-leifker in #7929
  • feat(search): enable search initial customization by @david-leifker in #7901
  • feat(schema-registry): replace confluent schema registry by @david-leifker in #7930
  • feat(ingest/unity): Add usage extraction; add TableReference by @asikowitz in #7910
  • fix(ingest/unity-catalog): Add usage_common dependency to unity catalog plugin by @asikowitz in #7935
  • feat(search): add filter for specific entities by @iprentic in #7919
  • fix(ingest/unity): Add sqllineage dependency by @asikowitz in #7938
  • fix(ingest/hive): fix containers generation for hive by @mayurinehate in #7926
  • docs(ingest): add note about path_specs configuration in data lake sources by @mayurinehate in #7941
  • feat: add missing python sdk guides based on DatahubGraph by @yoonhyejin in #7875
  • fix(ingest/unity): use fully qualified catalog/schema patterns by @hsheth2 in #7900
  • feat(airflow): respect port parameter if provided by @hsheth2 in #7945
  • fix(ingest): improve error message when graph connection fails by @hsheth2 in #7946
  • fix(docs): Adding relationship types section to Business Glossary docs by @jjoyce0510 in #7949
  • docs(ingest): update max_threads default value by @felipeac in #7947
  • fix(ui) Fix Tag Details button to use url encoding by @chriscollins3456 in #7948
  • docs: amend italic formatting by @HansBambel in #7893
  • fix(ldap): properly handle escaped characters in LDAP DNs by @Reilman79 in #7928
  • docs(ingest/postgres): add example with ssl configuration by @hsheth2 in #7916
  • refactor(ingest/biz-glossary): simplify business glossary source by @hsheth2 in #7912
  • fix: Fix broken links on PowerBI by @yoonhyejin in #7959
  • feat(model) Update aspect containerPath -> browsePathsV2 by @chriscollins3456 in #7942
  • fix(ui) Fix displaying column level lineage for sibling nodes by @chriscollins3456 in #7955
  • fix(ingest/bigquery): Filter projects for lineage and usage by @asikowitz in #7954
  • feat(tracking) Add tracking events to our chrome extension page by @chriscollins3456 in #7967
  • fix(search): Handle .keyword properly in the entity type query to ind… by @iprentic in #7957
  • feat(es) Store and map containerPath to elastic search properly by @chriscollins3456 in #7898
  • fix: build vercel python from source by @hsheth2 in #7972
  • feat(models): Make assets searchable by their external URLs by @jjoyce0510 in #7953
  • fix(ingest/salesforce): support JSON web token auth by @matthew-piatkus-cko in #7963
  • fix(SearchBar): Restore explore all link by @joshuaeilers in #7973
  • fix(ingest/tableau): Add a try catch to LineageRunner parser by @maaaikoool in #7965
  • fix(ingest/salesforce): fix lint by @hsheth2 in #7980
  • fix(ingest): use certs correctly in rest emitter by @hsheth2 in #7978
  • fix(ingestion/redshift) - Fixing schema query by @treff7es in #7975
  • chore(log): change sout to log by @anshbansal in #7931
  • fix(ingest/redshift): Enabling autocommit for Redshift connection by @treff7es in #7983
  • fix(ingest): use with for opened connections by @mayurinehate in #7908
  • fix(ingest/unity): improve error message if no scheme in workspace_url by @mayurinehate in #7951
  • fix(download as csv): Support download to csv for impact analysis tab by @jjoyce0510 in #7956
  • docs(development): update per feedback from community by @david-leifker in #7958
  • fix(ingest/bigquery): remove incorrectly used table_pattern filter by @mayurinehate in #7810
  • feat(snowflake): add config option to specify deny patterns for upstreams by @mayurinehate in #7962
  • fix(docker-compose): make startup more robust with deterministic services' dependencies by @gcernier-semarchy in #7880
  • fix(cache): update search cache when skipped, but enabled by @RyanHolstien in #7936
  • feat(telemetry): add server version by @RyanHolstien in #7979
  • docs: add tips on language switchable tap on docs by @yoonhyejin in #7984
  • fix(privileges) Use glossary term manage children privileges for edit docs and links by @chriscollins3456 in #7985
  • fix(ingest/postgres): Allow specification of initial engine database; set default database to postgres by @asikowitz in #7915
  • refactor(ingest/unity): Use databricks-sdk over databricks-cli for usage query by @asikowitz in #7981
  • chore: cleanup some devtool console warnings by @joshuaeilers in #7988
  • feat(search): support only searching by quick filter by @joshuaeilers in #7997
  • feat(docs): Add cli documentation on how to add custom platforms by @pedro93 in #7993
  • fix(search): fix custom search config parsing by @david-leifker in #8010
  • fix(auth): guards against creating a user for the system actor by @aditya-radhakrishnan in #7996
  • chore(security): update org json json dependency - cve-2022-45688 by @RyanHolstien in #7991
  • feat(metrics): add metrics for upgrade steps by @RyanHolstien in #7992
  • feat(models): Adding searchable for chart and dashboard url by @jjoyce0510 in #8002
  • feat(ingest/s3): Inferring schema from the alphabetically last folder by @treff7es in #8005
  • feat(ingest/classification): add classification report by @mayurinehate in #7925
  • docs(managed datahub): release notes for v0.2.7 by @anshbansal in #8020
  • fix(ui ingest): Fix mapping for token_name, token_value form fields for Tableau by @jjoyce0510 in #8018
  • fix(ui): add loading indicator for download as CSV action by @aditya-radhakrishnan in #8003
  • fix(ingest/snowflake): fix lineage query aggregation for optimised li… by @mayurinehate in #8011
  • feat(ingest/unity): Add profiling support by @asikowitz in #7976
  • feat(docs): Add example documentation for scrollAcrossEntities by @pedro93 in #8014
  • fix(ingest/unity): Update databricks-cli pin by @asikowitz in #8024
  • fix(ingest/s3) Adding missing more-itertools dependency by @treff7es in #8023
  • feat(cli): move registry delete to separate subcommand by @hsheth2 in #7968
  • fix(sdk): throw errors on empty gms server urls by @hsheth2 in #8017
  • feat(ingest/superset): add stateful ingestion by @cccs-Dustin in #8013
  • Gitignor'ing generated binary files in OSS by @meyerkev in #8031
  • fix(PFP-260): Upgrading sqlite to fix SQLITE-449762 by @meyerkev in #8032
  • feat(ingest): support importing local modules by @hsheth2 in #8026
  • fix(timeline-events): fix NPE in timeline events by @david-leifker in #8038
  • fix(posts): fix formatting for posts where the title can get cut off by @aditya-radhakrishnan in #8001
  • fix(ingestion/metabase): metabase connector bigquery lineage fix by @shubhamjagtap639 in #8042
  • fix(es) Fix browseV2 index mappings by @chriscollins3456 in #8034
  • fix(search): enter key with no query should search all by @joshuaeilers in #8036
  • feat(ingest): Allow csv-enricher to update more types by @xiphl in #7932
  • fix(search): only show explore all btn on search and home by @joshuaeilers in #8047
  • fix(ingest/dbt): fix dbt subtypes for sources by @hsheth2 in #8048
  • fix(ingest/bigquery): update usage audit log query to include create/… by @mayurinehate in #7995
  • feat(docs): add guide on integration ML system via SDKs by @yoonhyejin in #8029
  • refactor(ingest): Make get_workunits() return MetadataWorkUnits by @asikowitz in #8051
  • refractor(classification): simplify classification handler by @mayurinehate in #8056
  • feat: Add support for Data Products by @shirshanka in #8039
  • fix(build): fix lint issue by @shirshanka in #8066
  • feat(system-update): remove datahub-update requirement on schema reg by @david-leifker in #7999
  • fix(gitignore): update gitignore for generated files by @minjin0121 in #7940
  • feat(ingestion/kafka): add description in dataset properties by @shubhamjagtap639 in #7974
  • fix(ingestion/tableau): ingest parent project name in container properties by @mohdsiddique in #8030
  • refactor(ingest): Move source_helpers.py from datahub/utilities -> datahub/api by @asikowitz in #8052
  • fix(ingest/snowflake): lowercase user urn when using email by @matwalk in #7767
  • fix(ingest/tableau): don't use unsupported sql condition field by @mayurinehate in #8065
  • fix(ingest/looker): don't prematurely show connectivity success by @hsheth2 in #8070
  • feat(web): update AWS logos by @rinzool in #8057
  • fix(metadata-io): remove assert in favor of exceptions by @david-leifker in #8035
  • feat: add docs on column-level linage by @yoonhyejin in #8062
  • ci: prevent qodana from using all of our cache by @hsheth2 in #8054
  • ci(ingest/clickhouse): don't use kernel ephemeral ports by @hsheth2 in #8060
  • test(sdk): better error messages in registry codegen test by @hsheth2 in #8081
  • doc(managed datahub): update release notes for 0.2.7 by @anshbansal in #8088
  • feat(ingest/s3) - Stateful ingestion and last-updated support by @treff7es in #8022
  • docs(ingest/snowflake): fix authentication type docs by @hsheth2 in #8059
  • fix(ingest/s3_data_lake)_ingestor_skips_directories_with_similar_prefix by @alplatonov in #8078
  • fix(ui) Fix entity name styling to show deprecation and others properly by @chriscollins3456 in #8084
  • test(sdk): move cli tests into the unit dir by @hsheth2 in #8028
  • feat(sdk): better auth error messages in the rest emitter by @hsheth2 in #8025
  • feat(caching): skip cache on ownership tabs by @gabe-lyons in #8082
  • feat(embed): embed lookup route by @joshuaeilers in #8033
  • fix(ingest/delta-lake): Walk through directory structure with full path; reduce resource creation by @asikowitz in #8072
  • feat(search): Add AggregateAcrossEntities endpoint by @iprentic in #8000
  • chore(vulnerability): add exclusions for json to prevent leaking dependency by @RyanHolstien in #8090
  • fix(ingestion/powerbi): skip erroneous pages of a report by @shubhamjagtap639 in #8021
  • feat(docs): Update markprompt by @jeffmerrick in #8079
  • feat(images): Add build processes for arm64v8 image variants by @pedro93 in #7990
  • feat(ingest): add env to container properties by @hsheth2 in #8027
  • fix(checkstyle): Fix checkstyle violations to turn master green by @iprentic in #8099
  • doc(auth): fixes doc in DataHubSystemAuthenticator.java by @sgomezvillamor in #8071
  • refactor(ingest): Auto report workunits by @asikowitz in #8061
  • feat(cli): support datahub ingest mcps by @hsheth2 in #7871
  • feat: datahub-upgrade.sh to support old versions by @ollisala in #7891
  • feat(ingest/s3): type aware directory sorting by @treff7es in #8089
  • fix(ci): add missing updates to restli-spec by @anshbansal in #8106
  • fix(ingest/build): setting typing extension <4.6.0 because it breaks tests by @treff7es in #8108
  • fix(upgrade): removes sleep from bootstrap process by @RyanHolstien in #8016
  • fix(jackson): increase max serialized string length default by @RyanHolstien in #8053
  • fix(ui): SchemaDescriptionField 'read-more' doesn't affect table height by @jfrancos-mai in #7970
  • fix(ingest): emitter bug fixes by @hsheth2 in #8093
  • fix(sample data): Update timestamps in bootstrap_mce.json to more recent by @iprentic in #8103
  • feat(ui) Add readOnly flag that disables profile URL editing by @chriscollins3456 in #8067
  • feat(cli): delete cli v2 by @hsheth2 in #8068
  • refactor(ingest): simplify stateful ingestion provider interface by @hsheth2 in #8104
  • Update updating-datahub.md with breaking changes by @chriscollins3456 in #7964
  • feat(ui) Show documentation on Domain pages first by @chriscollins3456 in #8110
  • docs(readme): adds PITS Global Data Recovery Services to the adopters list by @pheianox in #8080
  • fix(ingest/redshift): Making Redshift source more verbose by @treff7es in #8109
  • feat(ingest): Browse Path v2 helper by @asikowitz in #8012
  • feat(classification): configurable sample size by @mayurinehate in #8096
  • fix logic for multiple entities found and clean up messy code by @joshuaeilers in #8113
  • fix(search): Update _entityType transform logic to work for entities containing _ by @iprentic in #8112
  • feat(ingest/bigquery): usage for views by @mayurinehate in #8046
  • fix(ui): Open mailto link in new tab by @jfrancos-mai in #7982
  • fix(search): Transform _entityType/index output for scroll across entities as well by @iprentic in #8117
  • feat(ingest): Add GenericAspectTransformer by @amanda-her in #7994
  • refactor(ingest): Call source_helpers via new WorkUnitProcessors in base Source by @asikowitz in #8101
  • feat(ingest/nifi): kerberos authentication by @mayurinehate in #8097
  • fix(ingest/redshift):fixing schema filter by @treff7es in #8119
  • feat(ingest/unity): Allow ingestion without metastore admin role by @asikowitz in #8091
  • feat(ingest/bigquery): Add BigQuery Views lineage extraction from Google Data Catalog API by @viniciusdsmello in #8100
  • fix(ingest/redshift): Fixing Redshift subtypes by @treff7es in #8125
  • fix(ingest): Fix breaking smoke test on stateful ingestion by @asikowitz in #8128

New Contributors

Full Changelog: v0.10.2...v0.10.3