The Engineer feature transforms and prepares data so that it meets specific goals. Afterward, downstream tools in a data pipeline can use the data, or you can query it directly.

Key features

The Engineer feature in NexusOne supports all data transformation and preparation operations. Its key features include:
  • Metadata catalog management: It uses a data catalog that manages metadata so that a query engine such as Trino can access and query the data. The data can be data you previously ingested or Trino’s internal system data.
  • Multiple table formats: You can work with data stored in either Hive or Iceberg table formats.
  • Multiple catalog modes: Depending on your environment, you can use the following catalog modes:
    • Federated mode: Provides simultaneous access to multiple catalogs such as Hive, Iceberg, System, and LLM. The LLM catalog currently has limited support, and the System catalog is specific to Trino.
    • Lakehouse mode: Provides access to an Iceberg catalog.
  • Automatically generate SQL queries for transformation: You can use natural language to generate SQL queries. This is helpful to both technical and non-technical users.
  • Schedule queries: You can schedule and execute one or more queries using Apache Airflow.
  • Destination schema and table: You can specify a destination schema and table to store your transformed data.

Common operations

These are some common operations in data transformation and preparation:
  • Aggregation: Summarizing the data by grouping rows and applying an aggregate function, such as a sum or an average, to produce a summarized output.
  • Cleaning: Removing or correcting missing or invalid data in the cells of a table.
  • Data format conversion: Changing the file format of the data. For example, converting from CSV to Parquet so it’s easier to query.
  • Deduplication: Ensuring uniqueness by removing duplicate rows.
  • Enrichment: Adding additional data from internal or external sources to improve the data quality.
  • Formatting: Standardizing the representation of the data. For example, using the YYYY-MM-DD date format consistently across a table.
  • Normalization: Adjusting the data values so they’re consistent and easier to work with. For example, converting all text to lowercase.
  • Splitting and merging: Splitting or merging complex columns. For example, splitting a full name column into first name and last name columns.
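
To make a few of these operations concrete, the following is a minimal Python sketch of cleaning, normalization, deduplication, splitting, and formatting applied to a handful of rows. It only illustrates the logic and isn't NexusOne code: the sample rows, the column names, and the `prepare` function are assumptions made for this example. In NexusOne itself, you would more likely express these steps as SQL queries.

```python
from datetime import datetime

# Hypothetical sample rows; the column names are illustrative only.
raw_rows = [
    {"full_name": "Ada Lovelace", "email": "ADA@Example.com", "signup": "12/10/2023"},
    {"full_name": "Ada Lovelace", "email": "ada@example.com", "signup": "12/10/2023"},
    {"full_name": "Alan Turing", "email": "alan@example.com", "signup": "06/23/2023"},
    {"full_name": "Grace Hopper", "email": "", "signup": "12/09/2023"},
]


def prepare(rows):
    """Apply several common preparation steps to a list of row dictionaries."""
    seen_emails = set()
    prepared_rows = []
    for row in rows:
        # Normalization: trim whitespace and lowercase the email address.
        email = row["email"].strip().lower()
        # Cleaning: drop rows where the email is missing.
        if not email:
            continue
        # Deduplication: keep only the first row for each normalized email.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        # Splitting: break the full name into first and last name columns.
        first_name, _, last_name = row["full_name"].partition(" ")
        # Formatting: rewrite MM/DD/YYYY dates in the YYYY-MM-DD format.
        signup = datetime.strptime(row["signup"], "%m/%d/%Y").strftime("%Y-%m-%d")
        prepared_rows.append(
            {
                "first_name": first_name,
                "last_name": last_name,
                "email": email,
                "signup": signup,
            }
        )
    return prepared_rows


prepared = prepare(raw_rows)
```

Normalizing the email before deduplicating matters here: the first two rows differ only in letter case, so they count as duplicates only after normalization.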

Use cases

These examples show how different industries can use NexusOne’s data transformation and preparation capabilities:
  • E-commerce: Clean or aggregate customer sales transactions by region or product category. This enables you to predict future sales of a product or understand a customer’s behavior so you can send personalized promotions.
  • Financial services: Convert large volumes of customer transaction data in CSV files, which can be slow to process, into Parquet format. The columnar structure of Parquet files makes the data faster to query.
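
As a rough illustration of the e-commerce use case, the sketch below totals sales amounts by region and by product category using only the Python standard library. The CSV content, its column names, and the `total_sales_by` function are hypothetical. In NexusOne, the equivalent step would more likely be a SQL query that groups rows and sums a column.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sales data; in practice this would come from an ingested table.
SALES_CSV = """region,category,amount
EMEA,books,10.00
EMEA,games,25.50
APAC,books,7.25
EMEA,books,4.50
"""


def total_sales_by(csv_text, key):
    """Group rows by the given column and sum the 'amount' column per group."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key]] += float(row["amount"])
    return dict(totals)


by_region = total_sales_by(SALES_CSV, "region")
by_category = total_sales_by(SALES_CSV, "category")
```

The same function serves both groupings because only the grouping column changes between them.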

Additional resources

For full instructions about how to ingest a file in NexusOne, refer to How to ingest a file.