Prerequisites
- Appropriate permission: nx1_ingest
- Ensure you have previously ingested a dataset.
Ingest a lakehouse
You can perform this task through the web portal or the REST API. The steps below use the web portal.
Specify a table containing a previously ingested dataset so that NexusOne can copy it into another table.
- Log in to NexusOne.
- From the NexusOne homepage, navigate to Ingest > Lakehouse.
- Add details for the lakehouse that contains the previously ingested data:
- Enter a lakehouse schema name.
- Enter a table name.
- Add ingest details describing the new table:
- Enter a unique name for this ingest job. This name appears in the Monitor tab and Airflow’s portal for tracking.
- Enter a new schema name.
- Enter a new table name.
- Optional: Set how often the ingestion job should run. Schedule options include Run Once, Every 3 hours, Daily, Weekly, Monthly, and Quarterly.
- Select a mode for how to store incoming records in the destination table:
- Append: Add new records to the existing dataset.
- Merge: Add or update existing records where applicable.
- Overwrite: Replace all existing records.
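The difference between the three modes can be sketched in plain Python over lists of keyed records. This is an illustration of the semantics only, not NexusOne code; the `id` field is an assumed record key.

```python
# Semantics sketch of the three write modes, using lists of dicts as "tables".
# Illustration only -- not NexusOne code; "id" is an assumed record key.

def apply_mode(existing, incoming, mode):
    """Return the destination table after writing `incoming` records."""
    if mode == "append":
        # Add new records to the existing dataset (no de-duplication).
        return existing + incoming
    if mode == "overwrite":
        # Replace all existing records.
        return list(incoming)
    if mode == "merge":
        # Add new records, and update existing records that share a key.
        merged = {rec["id"]: rec for rec in existing}
        for rec in incoming:
            merged[rec["id"]] = rec
        return list(merged.values())
    raise ValueError(f"unknown mode: {mode!r}")

existing = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
incoming = [{"id": 2, "value": "B"}, {"id": 3, "value": "c"}]

print(len(apply_mode(existing, incoming, "append")))     # 4 rows
print(len(apply_mode(existing, incoming, "merge")))      # 3 rows, id 2 updated
print(len(apply_mode(existing, incoming, "overwrite")))  # 2 rows
```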
- Optional: Select a DataHub domain, for example Company, Product, or Sales. This is only applicable if you previously created a domain on DataHub via the Govern feature on NexusOne.
- Optional: Select or create one or more tags to label this job.
- Optional: Add column transformations:
- Click Add Transformation, then select one of the following column transformation types:
- Cast: Converts a column’s data type during ingestion. To use this transformation type, enter the column name in the Column field, and then select a target type in the Target Type field.
- Drop: Removes a column from the dataset. To use this transformation type, enter the column name in the Column field.
- Encrypt: Makes the data unreadable using an encryption key. To use this transformation type, enter the column name in the Column field, and then optionally enter a key name in the Key Name field.
- Rename: Changes a column’s name. To use this transformation type, enter the column name in the Column field, and then enter a new column name in the New Name field.
- Repeat until you have added all the transformations necessary for your use case.
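The effect of the four transformation types can be sketched on a single record. This is an illustration only: NexusOne applies transformations server-side during ingestion, and the encryption below is a stand-in hash rather than real key-managed encryption.

```python
import hashlib

# Sketch of the four column transformation types applied to one record.
# Illustration only; the encrypt branch is a hash stand-in for real
# key-managed encryption.

def apply_transformations(record, transformations):
    out = dict(record)
    for t in transformations:
        col = t["column"]
        if t["type"] == "cast":
            out[col] = t["target_type"](out[col])   # convert the data type
        elif t["type"] == "drop":
            out.pop(col, None)                      # remove the column
        elif t["type"] == "rename":
            out[t["new_name"]] = out.pop(col)       # change the column name
        elif t["type"] == "encrypt":
            # Stand-in for encryption with a managed key:
            key = t.get("key_name", "default")
            out[col] = hashlib.sha256((key + str(out[col])).encode()).hexdigest()
    return out

record = {"age": "42", "ssn": "123-45-6789", "tmp": 1, "fullname": "Ada"}
transforms = [
    {"type": "cast", "column": "age", "target_type": int},
    {"type": "drop", "column": "tmp"},
    {"type": "rename", "column": "fullname", "new_name": "name"},
    {"type": "encrypt", "column": "ssn", "key_name": "pii-key"},
]
print(apply_transformations(record, transforms))
```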
- After configuring all fields, click Ingest to submit the job.
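If you use the REST API instead of the web portal, the form fields above map naturally onto a JSON payload. Every field name and value below is an assumption for illustration, not a documented NexusOne contract; consult your deployment's API reference for the real request shape.

```python
import json

# Hypothetical REST payload mirroring the web-portal fields above.
# All field names and values are assumptions, not documented NexusOne API.
payload = {
    "source": {"schema": "raw", "table": "orders"},         # previously ingested data
    "destination": {"schema": "curated", "table": "orders_copy"},
    "job_name": "orders-copy-daily",                        # shown in Monitor and Airflow
    "schedule": "Daily",                                    # or Run Once, Every 3 hours, ...
    "mode": "merge",                                        # append | merge | overwrite
    "domain": "Sales",                                      # optional DataHub domain
    "tags": ["finance", "daily"],                           # optional labels
    "transformations": [
        {"type": "rename", "column": "fullname", "new_name": "name"},
    ],
}
print(json.dumps(payload, indent=2))
```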
Monitor, trigger, or delete a job
When you schedule a lakehouse ingest, it runs as a job in Apache Airflow. You can monitor, trigger, or delete the job through the web portal, the CLI, or the REST API. The steps below use the web portal.
- When you create a job, a success message and a View Jobs button appear.
- Track the job status by clicking View Jobs or navigating to the NexusOne homepage and clicking Monitor.
- Use the three dots (...) menu to trigger or delete a job.
- If you clicked Trigger job, click the job’s name to open its DAG details in Airflow’s portal.
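Because a scheduled job runs as an Airflow DAG, you can also trigger a run directly through Airflow's stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). In this sketch the host and DAG id are placeholders, authentication headers (which depend on your Airflow deployment) are omitted, and the request is built but not sent.

```python
import json
import urllib.request

# Trigger a DAG run via Airflow's stable REST API.
# Host and DAG id are placeholders; auth headers depend on your deployment.
AIRFLOW_HOST = "http://localhost:8080"
DAG_ID = "orders-copy-daily"   # the ingest job name

req = urllib.request.Request(
    url=f"{AIRFLOW_HOST}/api/v1/dags/{DAG_ID}/dagRuns",
    data=json.dumps({"conf": {}}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually trigger the run
print(req.get_method(), req.full_url)
```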
Additional resources
- For more information about the ingest feature, refer to Ingest Overview.
- For more information about the monitoring feature, refer to Monitor Overview.
- For more information about roles or permissions, refer to Govern Overview.