Data transformation and preparation rules modify data to meet specific goals, such as summarizing the data, removing duplicate rows, or converting the data format. Unlike lakehouse rules, federated rules provide simultaneous access to multiple catalogs, such as Iceberg and LLM. The LLM catalog currently has limited support, and the System catalog is specific to Trino.
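
Because the System catalog note above points to Trino underlying federated access, a federated rule ultimately runs SQL that can read from several catalogs in a single query. The following is a minimal sketch of that idea using the trino Python client; the host, catalog, schema, and table names are all hypothetical.

```python
import trino

# Minimal sketch of a federated query (hypothetical host and names).
conn = trino.dbapi.connect(
    host="trino.example.com",  # assumption: your Trino coordinator
    port=8080,
    user="nx1_engineer",
)
cur = conn.cursor()

# One statement joins tables from two different catalogs: an Iceberg
# table and a table in a hypothetical second catalog named "postgres".
cur.execute("""
    SELECT o.order_id, o.amount, c.segment
    FROM iceberg.sales.orders AS o
    JOIN postgres.crm.customers AS c
      ON o.customer_id = c.customer_id
""")
for row in cur.fetchmany(10):
    print(row)
```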

Prerequisites

  • Appropriate permission: nx1_engineer
  • A previously ingested dataset

Select a dataset and describe a transformation rule

A catalog contains metadata about schemas and tables. A schema describes the data structure, such as the table name, column names, and data types. A table contains the data itself.
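
If you want to explore this hierarchy outside the UI, Trino exposes it directly through SHOW statements, one level at a time. A minimal sketch with the trino Python client (the host, catalog, and schema names are hypothetical):

```python
import trino

# Sketch: walking the catalog > schema > table hierarchy in Trino.
cur = trino.dbapi.connect(host="trino.example.com", port=8080,
                          user="nx1_engineer").cursor()

cur.execute("SHOW CATALOGS")                   # e.g. iceberg, system, ...
print(cur.fetchall())

cur.execute("SHOW SCHEMAS FROM iceberg")       # schemas in one catalog
print(cur.fetchall())

cur.execute("SHOW TABLES FROM iceberg.sales")  # tables in one schema
print(cur.fetchall())
```
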
  1. Log in to NexusOne.
  2. On the NexusOne homepage, click Engineer.
  3. Click Federated > Select catalogs and choose at least two catalogs.
  4. Select schemas from each catalog.
  5. Select tables from each catalog.
Selecting Iceberg as a catalog is equivalent to using a lakehouse data transformation rule.
You describe a rule using natural language so NexusOne can generate an SQL query; a hypothetical example of a prompt and generated query follows these steps.
  1. Enter a transformation job name.
  2. Enter a natural language transformation prompt.
  3. Optional: Select the preview checkbox to preview your data after NexusOne runs the query generated from your prompt.
  4. Click Transform. If you selected the preview checkbox in the previous step, also click Finalize to proceed to the next step.
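
To make the prompt step concrete, here is a hypothetical example of a natural-language prompt and the kind of SQL NexusOne might generate from it. The actual generated query depends on your catalogs, schemas, and wording; treat this strictly as an illustration.

```python
# Hypothetical prompt and one plausible generated query (illustrative
# only; NexusOne's actual output will differ).
prompt = (
    "Summarize total order amount per customer segment, "
    "keeping only segments with more than 100 orders."
)

generated_sql = """
SELECT c.segment,
       SUM(o.amount) AS total_amount,
       COUNT(*)      AS order_count
FROM iceberg.sales.orders AS o
JOIN postgres.crm.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.segment
HAVING COUNT(*) > 100
"""
```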

Schedule a transformation rule

Scheduling a rule allows you to run the rule at specific time intervals using Apache Airflow.
  1. Select a destination schema.
  2. Select a destination table.
  3. Optional: Select a DataHub domain.
  4. Optional: Select or create one or more tags to label this rule.
  5. Select a time interval for how often the transformation rule should run. Schedule options include None, Every 3 hours, Daily, Weekly, Monthly, and Quarterly. When you first schedule the rule, the Apache Airflow job (DAG) runs automatically; subsequent runs follow the selected schedule option.
  6. Select a mode for how to store incoming records at the destination table. The sketch after these steps shows what each mode can look like in SQL.
    • Append: Add new records.
    • Merge: Add new records, or update existing records where applicable.
    • Overwrite: Replace all existing records.
  7. Click Schedule.
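
NexusOne creates and manages the Airflow job for you, but it can help to picture what gets scheduled. The sketch below is a hand-written approximation of such a job, not NexusOne's actual generated DAG: a DAG scheduled to match the Every 3 hours option, with a task that writes in Append mode; the Merge and Overwrite variants are noted in comments. All host, catalog, and table names are hypothetical.

```python
from datetime import datetime

import trino
from airflow import DAG
from airflow.operators.python import PythonOperator


def run_transformation():
    cur = trino.dbapi.connect(host="trino.example.com", port=8080,
                              user="nx1_engineer").cursor()
    # Append mode: add new records to the destination table.
    cur.execute("""
        INSERT INTO iceberg.analytics.segment_totals
        SELECT c.segment, SUM(o.amount) AS total_amount
        FROM iceberg.sales.orders AS o
        JOIN postgres.crm.customers AS c
          ON o.customer_id = c.customer_id
        GROUP BY c.segment
    """)
    cur.fetchall()  # fetch so the INSERT runs to completion
    # Merge mode would instead use MERGE INTO ... WHEN MATCHED THEN
    # UPDATE ... WHEN NOT MATCHED THEN INSERT (where the connector
    # supports it). Overwrite mode would DELETE FROM the destination
    # table first, then run the same INSERT.


with DAG(
    dag_id="segment_totals_transformation",  # hypothetical job name
    start_date=datetime(2024, 1, 1),
    schedule="0 */3 * * *",  # "Every 3 hours" (Airflow 2.4+; older
                             # versions use schedule_interval)
    catchup=False,
) as dag:
    PythonOperator(task_id="transform", python_callable=run_transformation)
```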

Monitor, trigger, or delete a job

When you schedule a rule, it runs as a job in Apache Airflow. You can monitor, trigger, or delete the job.
  1. When you schedule a rule, a View your job button appears.
  2. Track the job status by clicking View your job, or go to the NexusOne homepage and click Monitor.
  3. Use the three dots ... menu to trigger or delete a job.
  4. If you clicked Trigger job, click the job’s name to open its DAG details in the Airflow portal. You can also trigger a run programmatically, as sketched below.
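
Triggering is also possible outside the UI through Airflow’s stable REST API, which can be useful for scripting. A minimal sketch, assuming an Airflow 2.x deployment with basic authentication enabled; the host, credentials, and job name are hypothetical:

```python
import requests

AIRFLOW_URL = "https://airflow.example.com"  # hypothetical Airflow host
DAG_ID = "segment_totals_transformation"     # hypothetical job name

# Trigger a new run of the job (Airflow 2.x stable REST API).
resp = requests.post(
    f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    json={"conf": {}},
    auth=("airflow_user", "airflow_password"),  # assumes basic auth
    timeout=30,
)
resp.raise_for_status()
run = resp.json()
print(run["dag_run_id"], run["state"])  # newly created run, e.g. "queued"
```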

Additional resources