In NexusOne, a lakehouse refers to an internal database with a catalog and table format. Ingesting a lakehouse means copying previously ingested data from a table in that internal database into another table. This preserves the table format and catalog metadata so that downstream applications such as Trino can query it. This guide walks you through how to ingest a lakehouse.
Prerequisites
- Appropriate permission: nx1_ingest
- A previously ingested dataset
Ingest a lakehouse
You can perform this task through either of the following interfaces; the steps below use the web portal:
- Web portal
- REST API
Specify a table containing a previously ingested dataset, so NexusOne can copy
it into another table.
- Log in to NexusOne.
- On the top navigation bar, hover your mouse over Data Pipeline and then select Ingest.
- Click Lakehouse.
- Add lakehouse details containing previously ingested data:
- Enter a lakehouse schema name.
- Enter a table name.
- Add ingest details containing information about the new table:
- Enter a unique name for this ingest job. This name appears in the Monitor tab and Airflow’s portal for tracking.
- Enter a new schema name.
- Enter a new table name.
- Optional: Set how often the ingestion job should run. Schedule options include Run Once, Every 3 hours, Daily, Weekly, Monthly, and Quarterly.
- Select a mode for how to store incoming records at the destination table.
- Append: Add new records to the existing dataset.
- Merge: Add new records, or update existing records where applicable.
- Overwrite: Replace all existing records.
- Optional: Select a DataHub domain, for example Company, Product, or Sales. This is only applicable if you have previously created one on DataHub via the Govern feature on NexusOne.
- Optional: Select or create one or more tags to label this job.
- Optional: Column transformations:
- Click Add Transformation, then select any of the following column transformation types:
- Cast: Converts a column’s data type during ingestion. To use this transformation type, enter the column name in the Column field, and then select a target type in the Target Type field.
- Drop: Removes a column from the dataset. To use this transformation type, enter the column name in the Column field.
- Encrypt: Makes the data unreadable using an encryption key. To use this transformation type, enter the column name in the Column field, and then optionally enter a key name in the Key Name field.
- Rename: Changes a column’s name. To use this transformation type, enter the column name in the Column field, and then enter a new column name in the New Name field.
- Repeat until you have added all the transformations necessary for your use case.
- After configuring all fields, click Ingest to submit the job.
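For the REST API path, this page does not document the actual endpoint or request schema. As a rough sketch only, the same fields collected in the web portal steps above could be assembled into a single request payload like the one below. The endpoint, field names, and structure here are assumptions for illustration; consult the NexusOne REST API reference for the real contract.

```python
# Hypothetical sketch: every field name and value below is an assumption
# chosen to mirror the web-portal form, not NexusOne's documented API.
import json

payload = {
    "source": {                       # lakehouse details (previously ingested data)
        "schema": "raw_sales",        # hypothetical lakehouse schema name
        "table": "orders",            # hypothetical table name
    },
    "destination": {                  # ingest details for the new table
        "job_name": "copy-orders-daily",
        "schema": "curated_sales",
        "table": "orders_curated",
    },
    "schedule": "Daily",              # Run Once | Every 3 hours | Daily | Weekly | Monthly | Quarterly
    "mode": "Merge",                  # Append | Merge | Overwrite
    "transformations": [              # optional column transformations
        {"type": "Cast", "column": "order_id", "target_type": "bigint"},
        {"type": "Drop", "column": "internal_notes"},
        {"type": "Encrypt", "column": "email", "key_name": "pii-key"},
        {"type": "Rename", "column": "ts", "new_name": "order_timestamp"},
    ],
}

print(json.dumps(payload, indent=2))
```

Whatever the real schema looks like, the portal form suggests the same grouping: a source table, a destination table, a schedule, a write mode, and an optional list of transformations.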
Monitor, trigger, or delete a job
When you schedule a lakehouse ingest, it runs as a job in Apache Airflow. You can monitor, trigger, or delete the job through any of the following interfaces; the steps below use the web portal:
- Web portal
- CLI
- REST API
- When you create a job, a success message and a View Jobs button appear.
- Track the job status by clicking View Jobs or navigating to the NexusOne homepage and clicking Monitor.
- Use the three dots (...) menu to trigger or delete a job.
- If you clicked Trigger job, click the job’s name to open its DAG details in Airflow’s portal.
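The CLI and REST API paths are not detailed on this page. As a hedged sketch of what the REST calls for monitoring, triggering, and deleting a job might look like, the helper below maps each action to an HTTP method and URL. The base URL and route layout are assumptions, not documented NexusOne endpoints; check the REST API reference for the real paths.

```python
# Hypothetical sketch only: BASE_URL and the /jobs routes are assumed for
# illustration and are not NexusOne's documented endpoints.
BASE_URL = "https://nexusone.example.com/api/v1"

def job_request(action: str, job_name: str) -> tuple[str, str]:
    """Return an (HTTP method, URL) pair for a monitor/trigger/delete call."""
    routes = {
        "status":  ("GET",    f"{BASE_URL}/jobs/{job_name}"),
        "trigger": ("POST",   f"{BASE_URL}/jobs/{job_name}/trigger"),
        "delete":  ("DELETE", f"{BASE_URL}/jobs/{job_name}"),
    }
    return routes[action]

method, url = job_request("trigger", "copy-orders-daily")
print(method, url)
```

Since the job runs in Apache Airflow, the equivalent operations may also be available through Airflow's own REST API or CLI, keyed by the job name you entered when creating the ingest.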
Additional resources
- For more information about the ingest feature, refer to Ingest Overview.
- For more information about the monitoring feature, refer to Monitor Overview.
- For more information about roles or permissions, refer to Govern Overview.