Ingesting a database into NexusOne copies a dataset from an existing database into the platform. Once ingested, the dataset is available to your data pipeline workflow.

Prerequisites

  • Appropriate permissions: nx1_ingest, nx1_monitor, nx1_s3_admin, airflow_user, superset_user, spark_sql, and trino_admin
  • Ensure you are ingesting from a database that NexusOne currently supports.

Add datasets from a database

You add datasets from a database so NexusOne can process and analyze them. Added datasets automatically become available to your data pipeline workflow.
  1. Log in to NexusOne.
  2. From the NexusOne homepage, navigate to Ingest > Database.
  3. Add database details:
    • If you are using a query to select a dataset, click From Query and enter the SQL query. When specifying the schema and table in your query, use the following format:
      <database_name>.<schema>.<table>
      
    • If you are selecting a specific table in a dataset, click From Table and enter the schema and table names in the Source Schema and Source Table fields.
    • Adding filters is optional.
  4. Add connection details:
    • In the Database URL field, enter the JDBC URL used to connect to the source database. The URL should have the following format:
      jdbc:<database_vendor's_jdbc_driver_name>:<database_URL>:<database_port>/<database_name>
      
    • In the Username field, enter a username.
    • In the Password field, enter a password.
  5. Add ingest details:
    Ensure that the values you enter for Schema and Table conform to Apache Spark’s identifier naming rules.
    • Enter a unique name for this ingest job. This name appears in the Monitor tab and Airflow’s portal for tracking.
    • Choose an existing database schema or enter a new schema name to create one.
    • Choose an existing table or enter a new table name.
    • Optional: Set how often the ingestion job runs. Schedule options include Run Once, Every 3 hours, Daily, Weekly, Monthly, and Quarterly. The job (an Apache Airflow DAG) runs automatically at the first scheduled time; subsequent runs follow your selected schedule.
    • Select a mode for how to store incoming records at the destination table.
      • Append: Add new records to the existing dataset.
      • Merge: Add new records, or update existing records where applicable.
      • Overwrite: Replace all existing records.
    • Optional: Select a DataHub domain, for example Company, Product, or Sales. This applies only if you previously created a domain on DataHub via NexusOne’s Govern feature.
    • Optional: Select or create one or more tags to label this job.
  6. After configuring all fields, click Ingest to submit the job.
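
The From Query and Database URL formats in steps 3 and 4 can be sketched together. All names below (sales_db, public, orders, db.example.com) are hypothetical placeholders, not values NexusOne provides; substitute your own database, schema, table, driver, host, and port.

```python
# NexusOne expects these as plain strings in the Ingest form; this sketch
# only assembles and illustrates the two documented formats.

def qualified_table(database: str, schema: str, table: str) -> str:
    # <database_name>.<schema>.<table> reference used in a From Query entry
    return f"{database}.{schema}.{table}"

def jdbc_url(driver: str, database_url: str, port: int, database: str) -> str:
    # jdbc:<database_vendor's_jdbc_driver_name>:<database_URL>:<database_port>/<database_name>
    return f"jdbc:{driver}:{database_url}:{port}/{database}"

query = f"SELECT * FROM {qualified_table('sales_db', 'public', 'orders')}"
url = jdbc_url("postgresql", "//db.example.com", 5432, "sales_db")

print(query)  # SELECT * FROM sales_db.public.orders
print(url)    # jdbc:postgresql://db.example.com:5432/sales_db
```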
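
The three storage modes in step 5 can be modeled with an in-memory table of (primary_key, value) rows. This is a simplified sketch of the described semantics, not NexusOne's actual Spark write path.

```python
# Simplified model of the Append / Merge / Overwrite write modes.
# A "table" here is just a list of (primary_key, value) rows.

def write(existing, incoming, mode):
    if mode == "overwrite":
        # Replace all existing records with the incoming ones.
        return list(incoming)
    if mode == "append":
        # Add new records to the existing dataset, keeping everything.
        return existing + incoming
    if mode == "merge":
        # Add new records; update records whose key already exists.
        merged = dict(existing)
        merged.update(dict(incoming))
        return list(merged.items())
    raise ValueError(f"unknown mode: {mode}")

existing = [("a", 1), ("b", 2)]
incoming = [("b", 3), ("c", 4)]
print(write(existing, incoming, "merge"))  # [('a', 1), ('b', 3), ('c', 4)]
```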

Monitor, trigger, or delete a job

When you schedule an ingest job, it runs as a job in Apache Airflow. You can monitor, trigger, or delete the job.
  1. When you create a job, a success message and a View Jobs button appear.
  2. Track the job status by clicking View Jobs or navigating to the NexusOne homepage and clicking Monitor.
  3. Use the three-dot (...) menu to trigger or delete a job.
  4. If you clicked Trigger job, then click the job’s name to open its DAG details in Airflow’s portal.

Additional resources