Add docs
pedro93 committed Oct 12, 2023
1 parent c4dce2e commit 41f7852
Showing 2 changed files with 51 additions and 4 deletions.
16 changes: 12 additions & 4 deletions docs/ui-ingestion.md
@@ -1,5 +1,12 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Ingestion

<FeatureAvailability/>

## Introduction

Starting in version `0.8.25`, DataHub supports creating, configuring, scheduling, & executing batch metadata ingestion using the DataHub user interface. This makes
@@ -173,28 +180,29 @@ Finally, give your Ingestion Source a name.
Once you're happy with your configurations, click 'Done' to save your changes.


##### Advanced: Running with a specific CLI version
##### Advanced ingestion configs:

DataHub comes pre-configured to use the latest version of the DataHub CLI ([acryl-datahub](https://pypi.org/project/acryl-datahub/)) that is compatible
DataHub's Managed Ingestion UI comes pre-configured to use the latest version of the DataHub CLI ([acryl-datahub](https://pypi.org/project/acryl-datahub/)) that is compatible
with the server. However, you can override the default package version using the 'Advanced' source configurations.

To do so, simply click 'Advanced', then change the 'CLI Version' text box to contain the exact version
of the DataHub CLI you'd like to use.


<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/custom-ingestion-cli-version.png"/>
</p>

_Pinning the CLI version to version `0.8.23.2`_

Other advanced options include specifying environment variables, DataHub plugins, or Python packages at runtime.

Once you're happy with your changes, simply click 'Done' to save.

</TabItem>
<TabItem value="cli" label="CLI" default>

You can upload and update recipes using the CLI, as described in the [CLI documentation for uploading ingestion recipes](./cli.md#ingest-deploy).
An example execution would look something like:
An example execution for a given `recipe.yaml` file would look something like:

```bash
datahub ingest deploy --name "My Test Ingestion Source" --schedule "5 * * * *" --time-zone "UTC" -c recipe.yaml
39 changes: 39 additions & 0 deletions metadata-ingestion/docs/dev_guides/profiling_ingestions.md
@@ -13,6 +13,35 @@ This page documents how to perform memory profiles of ingestion runs.
It is useful when trying to size the amount of resources necessary to ingest some source or when developing new features or sources.

## How to use

<Tabs>
<TabItem value="ui" label="UI" default>

Create an ingestion source as described in the [Ingestion guide](../../../docs/ui-ingestion.md).

Add a flag to your ingestion recipe to generate a memray memory dump of your ingestion:
```yaml
source:
  ...

sink:
  ...

flags:
  generate_memory_profiles: "<path to folder where dumps will be written to>"
```
In the final panel, under the Advanced section, add the `debug` DataHub plugin to the **Extra DataHub Plugins** field, as seen below:

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/ingestion-advanced-extra-datahub-plugin.png"/>
</p>

Finally, save and run the ingestion process.

</TabItem>
<TabItem value="cli" label="CLI">

Install the `debug` plugin for DataHub's CLI wherever the ingestion runs:

```bash
@@ -33,6 +62,16 @@ flags:
  generate_memory_profiles: "<path to folder where dumps will be written to>"
```

Finally, run the ingestion recipe:

```bash
datahub ingest -c recipe.yaml
```

</TabItem>
</Tabs>


Once the ingestion run starts, a binary file is created and appended to for the duration of the run.

These files follow the pattern `file-<ingestion-run-urn>.bin`, so each dump can be traced back to a single ingestion run.
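
As a rough sketch of how the dumps might be located after a run, the shell function below lists the files matching that pattern in a given folder. The function name `list_memory_dumps` is illustrative and not part of the DataHub CLI, and the folder passed in is assumed to be the one configured under `generate_memory_profiles`.

```bash
# Hypothetical helper, not part of the DataHub CLI.
# Prints every memory dump (file-<ingestion-run-urn>.bin) found directly
# inside the given folder, one path per line, sorted by name.
list_memory_dumps() {
  local dump_dir="$1"
  find "$dump_dir" -maxdepth 1 -type f -name 'file-*.bin' | sort
}
```

For example, `list_memory_dumps /tmp/profiles` would print the dumps in `/tmp/profiles`, where that path is only a placeholder for whichever folder the recipe points at.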