Data transformation and preparation rules modify data to meet specific goals. For example, a rule can summarize the data, remove duplicate rows, or convert the data format. Unlike a federated rule, a lakehouse rule is specific to an Iceberg catalog.

Prerequisites

  • Appropriate permission: nx1_engineer
  • A previously ingested dataset

Select a dataset and describe a transformation rule

The catalog contains metadata about the schema and table. The schema describes the data structure, such as the table name, column names, and data types. The table contains the data.
  1. Log in to NexusOne.
  2. On the NexusOne homepage, click Engineer.
  3. Click Lakehouse. The default catalog is an Iceberg catalog.
  4. Select a schema and table.
You describe a rule using natural language so NexusOne can generate an SQL query.
  1. Enter a transformation job name.
  2. Enter a natural language transformation prompt.
  3. Optional: Select the Show preview? checkbox to preview the data that the query generated from your prompt returns.
  4. Click Transform. If you selected the preview checkbox in the previous step, then click Finalize to proceed to the next step.
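
As an illustration of the kind of query a prompt might produce, the sketch below uses Python's built-in sqlite3 module as a stand-in for the lakehouse engine. The orders table, its columns, the prompt text, and the SQL are all hypothetical. The query NexusOne actually generates may differ.

```python
import sqlite3

# Hypothetical prompt a user might enter as the transformation prompt.
prompt = "Remove duplicate orders and total the amount per customer"

# Hypothetical SQL of the kind such a prompt could produce.
generated_sql = """
SELECT customer, SUM(amount) AS total_amount
FROM (SELECT DISTINCT order_id, customer, amount FROM orders)
GROUP BY customer
ORDER BY customer
"""


def preview(rows, sql):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


sample_rows = [
    (1, "alice", 10.0),
    (1, "alice", 10.0),  # exact duplicate, dropped by SELECT DISTINCT
    (2, "alice", 5.0),
    (3, "bob", 7.5),
]

print(preview(sample_rows, generated_sql))
```

Running the query against a few sample rows first is comparable to what the preview option offers: you can check the result before committing to the rule.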

Schedule a transformation rule

Scheduling a rule allows you to run the rule at specific time intervals using Apache Airflow.
  1. Select a destination schema.
  2. Select a destination table.
  3. Optional: Select a DataHub domain.
  4. Optional: Select or create one or more tags to label this rule.
  5. Select a time interval for how often the transformation rule should run. Schedule options include None, Every 3 hours, Daily, Weekly, Monthly, and Quarterly. When you first schedule the rule, its job (DAG) in Apache Airflow runs automatically. Subsequent runs follow the schedule option you selected.
  6. Select a mode for how to store incoming records at the destination table.
    • Append: Add new records.
    • Merge: Add new records and update existing records where applicable.
    • Overwrite: Replace all existing records.
  7. Click Schedule.
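
The schedule options and storage modes above can be sketched in plain Python. This is only an illustration under stated assumptions: the mapping of schedule options to Airflow schedule values is a guess at how such options could be expressed (the "@..." strings are standard Airflow cron presets), and the store function and the id key used for matching records in Merge mode are hypothetical. NexusOne's actual implementation may differ.

```python
# Hypothetical mapping from the schedule options to Airflow schedule values.
# "@daily", "@weekly", "@monthly", and "@quarterly" are Airflow cron presets.
# None means the job has no recurring schedule.
SCHEDULES = {
    "None": None,
    "Every 3 hours": "0 */3 * * *",
    "Daily": "@daily",
    "Weekly": "@weekly",
    "Monthly": "@monthly",
    "Quarterly": "@quarterly",
}


def store(existing, incoming, mode, key="id"):
    """Return the destination rows after applying one storage mode.

    The key column used to match records in merge mode is an assumption.
    """
    if mode == "append":
        # Keep every existing row and add the incoming rows after them.
        return existing + incoming
    if mode == "overwrite":
        # Discard the existing rows and keep only the incoming rows.
        return list(incoming)
    if mode == "merge":
        merged = {row[key]: row for row in existing}
        for row in incoming:
            merged[row[key]] = row  # update on a key match, insert otherwise
        return list(merged.values())
    raise ValueError(f"unknown mode: {mode}")


existing = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
incoming = [{"id": 2, "value": "B"}, {"id": 3, "value": "c"}]

for mode in ("append", "merge", "overwrite"):
    print(mode, store(existing, incoming, mode))
```

With these sample rows, Append leaves two rows with id 2 in the destination, Merge updates the row with id 2 in place and adds id 3, and Overwrite keeps only the incoming rows.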

Monitor, trigger, or delete a job

When you schedule a rule, it runs as a job in Apache Airflow. You can monitor, trigger, or delete the job.
  1. After you schedule a rule, a View your job button appears.
  2. To track the job status, click View your job, or go to the NexusOne homepage and click Monitor.
  3. Use the three-dot (...) menu to trigger or delete a job.
  4. If you clicked Trigger job, click the job’s name to open its DAG details in the Airflow portal.

Additional resources