The ingest feature allows you to ingest a file, database, or lakehouse into the NexusOne platform. It also lets you mirror your database into the platform in real time. This page describes each of these sub-features.

How file ingestion works

The file ingest feature allows you to ingest files containing structured data into the platform. NexusOne supports two file ingestion options:
  • Upload file: Files stored on your local machine
  • Public file URL: A public URL to a file you’d like to upload. You might store this file in an S3 bucket and expose it over HTTPS.

Supported file formats

NexusOne currently supports these file formats:
  • CSV
  • Parquet
  • ORC
  • XML
  • XLS/XLSX
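As a rough illustration of the two ingestion options and the format list above, the following sketch checks whether a local path or a public URL points to a supported format. The function name `detect_format` and the extension mapping are assumptions made for this example, not part of NexusOne, which may detect formats differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension-to-format map built from the list above. The mapping itself is an
# assumption for illustration; NexusOne may not rely on file extensions.
SUPPORTED_FORMATS = {
    ".csv": "CSV",
    ".parquet": "Parquet",
    ".orc": "ORC",
    ".xml": "XML",
    ".xls": "XLS/XLSX",
    ".xlsx": "XLS/XLSX",
}


def detect_format(source: str) -> str:
    """Return the supported format for a local path or a public file URL."""
    parsed = urlparse(source)
    # For URLs, inspect only the path so query strings do not affect the check.
    path = parsed.path if parsed.scheme in ("http", "https") else source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    return SUPPORTED_FORMATS[suffix]
```

For example, `detect_format("grades.xlsx")` returns `"XLS/XLSX"`, while a `.txt` file raises a `ValueError`.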
NexusOne already provides a Hive catalog that uses the Iceberg table format, so when you ingest a file, you only have to define your schema and table. After you ingest the file, Apache Airflow schedules a DAG based on your configuration. This DAG uses Spark to process the file. To query and visualize the dataset, NexusOne uses Superset. When you run a query, Superset sends it to Trino, which then retrieves the dataset through the catalog.
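Because the catalog already exists and you supply only the schema and table, a query sent to Trino typically addresses the dataset by a three-part name: catalog, schema, and table. The sketch below builds such a preview query. The helper `preview_query` and the example names `iceberg`, `finance`, and `market_data` are hypothetical; check your deployment for the actual catalog name.

```python
import re

# Accept only simple unquoted identifiers to keep the generated SQL safe.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def preview_query(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Build a SELECT statement against catalog.schema.table."""
    for part in (catalog, schema, table):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"Invalid identifier: {part!r}")
    return f"SELECT * FROM {catalog}.{schema}.{table} LIMIT {int(limit)}"


# Hypothetical names; replace them with the schema and table you defined.
query = preview_query("iceberg", "finance", "market_data")
```

You could paste the resulting statement into Superset's SQL Lab to confirm that the ingested table is readable.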

Use cases

These examples show how different industries can use NexusOne’s file ingestion and query capabilities:
  • Financial services: Ingest Parquet-formatted market data feeds into NexusOne to monitor portfolio risk and run analytics on a single, secure platform without having to manage custom pipelines.
  • Education: Ingest Excel-formatted grade books into NexusOne to store student records and analyze student performance trends.

How database ingestion works

The database ingest feature allows you to ingest a public database containing datasets into the NexusOne platform. NexusOne does this by connecting to the database using a JDBC URL, authenticating, and creating an Airflow job that queries a table and copies the results into NexusOne.
You can query one table in a schema or several tables.
After NexusOne copies the results from a database, you can query them using Superset’s SQL Lab.
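The query-and-copy step described above can be pictured with a small, self-contained sketch. It uses two in-memory SQLite databases as stand-ins for the source database and for NexusOne. This is an assumption made purely for illustration: the real job runs in Airflow over a JDBC connection, and the function `copy_table` is hypothetical.

```python
import sqlite3


def copy_table(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> int:
    """Query every row of `table` in the source and copy it into the target."""
    cursor = source.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = cursor.fetchall()
    # Create the destination table with the same column names, then load the rows.
    target.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
    target.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', rows
    )
    target.commit()
    return len(rows)


# Stand-in source database with a small example table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE grades (student TEXT, score INTEGER)")
source.executemany("INSERT INTO grades VALUES (?, ?)", [("ana", 91), ("ben", 78)])

# Stand-in destination that receives the copied rows.
target = sqlite3.connect(":memory:")
copied = copy_table(source, target, "grades")
```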

Supported database vendors and their JDBC URL

When you attempt to ingest a database into NexusOne, a JDBC URL is one of the options used to set up a connection between NexusOne and the database. The following table describes the supported database vendors on NexusOne and their JDBC URL:
Database | JDBC URL format
--- | ---
Db2 | jdbc:db2://<database_URL_or_IP_address>:50001/<database_name>
MariaDB | jdbc:mariadb://<database_URL_or_IP_address>:3306/<database_name>
Microsoft SQL Server | jdbc:sqlserver://<database_URL_or_IP_address>:1433;databaseName=<database_name>
MySQL | jdbc:mysql://<database_URL_or_IP_address>:3306/<database_name>
Oracle | jdbc:oracle:thin:@//<database_URL_or_IP_address>:1521/<database_name>
PostgreSQL | jdbc:postgresql://<database_URL_or_IP_address>:5432/<database_name>
  • All port numbers specified here are defaults. Depending on how you deployed your database, change the port number accordingly.
  • In PostgreSQL, a database name is different from a schema name. The default database is named postgres, and it contains the default schemas.
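To make the URL formats above concrete, here is a minimal sketch that assembles a JDBC URL from a vendor key, host, and database name. The templates and default ports come from the table; the vendor keys and the `build_jdbc_url` helper are assumptions for this example.

```python
# URL templates and default ports taken from the table above.
# The dictionary keys are illustrative vendor identifiers.
JDBC_TEMPLATES = {
    "db2": ("jdbc:db2://{host}:{port}/{database}", 50001),
    "mariadb": ("jdbc:mariadb://{host}:{port}/{database}", 3306),
    "sqlserver": ("jdbc:sqlserver://{host}:{port};databaseName={database}", 1433),
    "mysql": ("jdbc:mysql://{host}:{port}/{database}", 3306),
    "oracle": ("jdbc:oracle:thin:@//{host}:{port}/{database}", 1521),
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}", 5432),
}


def build_jdbc_url(vendor: str, host: str, database: str, port=None) -> str:
    """Return a JDBC URL, using the vendor's default port unless one is given."""
    try:
        template, default_port = JDBC_TEMPLATES[vendor.lower()]
    except KeyError:
        raise ValueError(f"Unsupported database vendor: {vendor}") from None
    chosen_port = default_port if port is None else port
    return template.format(host=host, port=chosen_port, database=database)
```

For instance, `build_jdbc_url("postgresql", "db.example.com", "postgres")` yields `jdbc:postgresql://db.example.com:5432/postgres`. Pass `port=` explicitly when your deployment does not use the default.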

Use cases

These examples show how different industries can use NexusOne’s database ingestion and query capabilities:
  • Financial services: Connect to a PostgreSQL or Oracle database containing market transactions so you can ingest its structured tables into NexusOne for centralized risk monitoring and analytics.
  • Education: Connect to a MySQL or Microsoft SQL Server database that stores student records and grades so you can ingest its structured data into NexusOne for centralized student performance analysis.

How lakehouse ingestion works

A lakehouse is a data lake that behaves like a data warehouse. It stores all data as files in object storage, but adds a table format for structure and reliable updates. It also uses a catalog so query engines can quickly find and read the data they need.

In NexusOne, lakehouse refers to a data architecture that stores databases in object storage and uses a metastore and table format for query consistency. Ingesting a lakehouse means copying data from one table into another table. A table format such as Iceberg exposes the ingested source table. After NexusOne copies that table into another table, it creates new table format metadata for the copy so downstream apps like Trino can query it.

Use cases

These examples show how different industries can use NexusOne’s lakehouse ingestion capabilities:
  • Financial services: Use lakehouse ingestion to copy previously ingested Parquet market data tables into new tables within NexusOne. This ensures that analysts can run risk calculations and portfolio analytics in a centralized table.
  • Education: Use lakehouse ingestion to copy previously ingested Excel grade book tables into a new table. This ensures that administrators can centralize student records and analyze performance trends.
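The table-to-table copy that lakehouse ingestion performs resembles a CREATE TABLE AS SELECT statement in Trino SQL. The sketch below builds one as an analogy only: NexusOne may carry out the copy through a different mechanism, and the helper `ctas_statement` and the table names are hypothetical.

```python
import re

# Matches a fully qualified name of the form catalog.schema.table.
_QUALIFIED_NAME = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*){2}$")


def ctas_statement(source: str, target: str) -> str:
    """Build a statement that copies every row of `source` into a new `target`."""
    for name in (source, target):
        if not _QUALIFIED_NAME.match(name):
            raise ValueError(f"Expected catalog.schema.table, got: {name!r}")
    if source == target:
        raise ValueError("Source and target must be different tables")
    return f"CREATE TABLE {target} AS SELECT * FROM {source}"


# Hypothetical table names for illustration.
statement = ctas_statement(
    "iceberg.finance.market_data", "iceberg.analytics.market_data_copy"
)
```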

Additional resources