In NexusOne, a lakehouse refers to an internal database with a catalog and table format. Ingesting a lakehouse means copying previously ingested data from a table in that internal database into another table. This preserves the table format and catalog metadata so that downstream apps like Trino can query it. This guide walks you through how to ingest a lakehouse.
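Conceptually, the copy behaves like a `CREATE TABLE ... AS SELECT` from the source schema into the destination schema, with the table structure carried over. A minimal sketch using Python's built-in sqlite3 module (illustrative only; NexusOne performs the copy internally, and the schema and table names here are hypothetical):

```python
import sqlite3

# Illustrative stand-in: the attached database plays the role of the
# destination schema; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS ingest")

# A table holding previously ingested data
conn.execute("CREATE TABLE sales_raw (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", [(1, 9.99), (2, 24.50)])

# Copy it, structure and all, into the destination schema
conn.execute("CREATE TABLE ingest.sales_copy AS SELECT * FROM sales_raw")

rows = conn.execute("SELECT COUNT(*) FROM ingest.sales_copy").fetchone()[0]
print(rows)  # 2
```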

Prerequisites

Ingest a lakehouse

Specify a table containing a previously ingested dataset, so NexusOne can copy it into another table.
  1. Log in to NexusOne.
  2. From the NexusOne homepage, navigate to Ingest > Lakehouse.
  3. Add lakehouse details containing previously ingested data:
    • Enter a lakehouse schema name.
    • Enter a table name.
  4. Add ingest details containing information about the new table:
    Ensure that the Schema and Table values follow Apache Spark’s rules for valid identifiers.
    • Enter a unique name for this ingest job. This name appears in the Monitor tab and Airflow’s portal for tracking.
    • Enter a new schema name.
    • Enter a new table name.
    • Optional: Set how often the ingestion job should run. Schedule options include Run Once, Every 3 hours, Daily, Weekly, Monthly, and Quarterly.
    • Select a mode for how to store incoming records at the destination table.
      • Append: Add new records to the existing dataset.
      • Merge: Add or update existing records where applicable.
      • Overwrite: Replace all existing records.
    • Optional: Select a DataHub domain, for example Company, Product, or Sales. This is applicable only if you previously created a domain on DataHub via the Govern feature in NexusOne.
    • Optional: Select or create one or more tags to label this job.
  5. After configuring all fields, click Ingest to submit the job.
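The three modes in step 4 can be sketched in plain Python. This is a conceptual model, not NexusOne's implementation; it assumes records are keyed by a unique `id` field, which is a simplification for illustration:

```python
def ingest(existing, incoming, mode):
    """Sketch of the three destination-table modes (conceptual only)."""
    if mode == "overwrite":   # Replace all existing records
        return list(incoming)
    if mode == "append":      # Add new records to the existing dataset
        return existing + list(incoming)
    if mode == "merge":       # Add new records; update matches by id
        merged = {r["id"]: r for r in existing}
        merged.update({r["id"]: r for r in incoming})
        return list(merged.values())
    raise ValueError(f"unknown mode: {mode}")

old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
new = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

print(len(ingest(old, new, "append")))     # 4 records, id 2 duplicated
print(len(ingest(old, new, "merge")))      # 3 records, id 2 updated
print(len(ingest(old, new, "overwrite")))  # 2 records, only the new ones
```

Merge is the safest default when the incoming batch may overlap records already in the destination table; Append is appropriate only when every incoming record is known to be new.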

View, trigger, or delete a job

When you schedule a lakehouse ingestion, it runs as a job in Apache Airflow. You can view, trigger, or delete these jobs.

Additional resources