
Ingesting a file into the NexusOne platform means uploading the file, or adding a public URL to it, so the platform can process and analyze it. Once ingested, the file is automatically integrated into your data pipeline workflow. This guide walks you through how to ingest a file.

Prerequisites

  • Appropriate permissions: nx1_ingest, nx1_monitor, nx1_s3_admin, airflow_user, superset_user, spark_sql, and trino_admin
  • Ensure the file is of a type that NexusOne currently supports.

Add or upload a file

You add or upload a file so NexusOne can process and analyze it. This automatically integrates the file into your data pipeline workflow for later use.
  1. Log in to NexusOne.
  2. From the NexusOne homepage, navigate to Ingest > File.
  3. Add a file:
    • To upload a file, click Choose File and select the file that you want to upload.
    • To add a file by URL, click Public File URL and enter the public URL of the file. For example, the file might be in an S3 bucket and exposed over HTTPS.
  4. Add ingest details. These fields define how NexusOne ingests and accesses the data through a schema and table.
    Ensure that the values you enter for Schema and Table adhere to Apache Spark’s identifier rules.
    • Enter a unique name for this ingest job. This name appears in the Monitor tab and Airflow’s portal for tracking.
    • Choose an existing database schema or enter a new schema name to create one.
    • Choose an existing table or enter a new table name.
    • Optional: Set how often the ingestion job should run. Schedule options include Run Once, Every 3 hours, Daily, Weekly, Monthly, and Quarterly.
    • Select a mode for how to store incoming records at the destination table.
      • Append: Add new records to the existing dataset.
      • Merge: Add new records and update existing records where applicable.
      • Overwrite: Replace all existing records.
    • Optional: Select a DataHub domain, for example, Company, Product, or Sales. This is only applicable if you have previously created one on DataHub via the Govern feature on NexusOne.
    • Optional: Select or create one or more tags to label this job.
  5. Optional: Add column transformations:
    • Click Add Transformation, then select any of the following column transformation types:
      • Cast: Converts a column’s data type during ingestion. To use this transformation type, enter the column name in the Column field, and then select a target type in the Target Type field.
      • Drop: Removes a column from the dataset. To use this transformation type, enter the column name in the Column field.
      • Encrypt: Makes a column’s data unreadable by encrypting it with an encryption key. To use this transformation type, enter the column name in the Column field, and then optionally enter a key name in the Key Name field.
      • Rename: Changes a column’s name. To use this transformation type, enter the column name in the Column field, and then enter a new column name in the New Name field.
    • Repeat until you have added all the transformations necessary for your use case.
  6. After configuring all fields, click Ingest to submit the job.
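The identifier rule and the four transformation types above can be sketched conceptually in Python. This is a hypothetical illustration, not NexusOne's actual implementation: the function names, the identifier pattern, and the encryption placeholder are all assumptions made for the sake of the example.

```python
import re

def is_valid_identifier(name: str) -> bool:
    """Hypothetical check mirroring unquoted Apache Spark SQL
    identifier rules: letters, digits, and underscores, not
    starting with a digit."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is not None

def apply_transformations(row: dict, transformations: list) -> dict:
    """Conceptual sketch of the Cast, Drop, Encrypt, and Rename
    transformation types applied to a single record. Encryption is
    stubbed with a placeholder marker rather than a real cipher."""
    out = dict(row)
    for t in transformations:
        col = t["column"]
        if t["type"] == "cast":
            out[col] = t["target_type"](out[col])   # e.g. str -> int
        elif t["type"] == "drop":
            out.pop(col, None)                      # remove the column
        elif t["type"] == "encrypt":
            out[col] = f"<encrypted:{t.get('key_name', 'default')}>"
        elif t["type"] == "rename":
            out[t["new_name"]] = out.pop(col)       # move value to new key
    return out

row = {"age": "42", "ssn": "123-45-6789", "fname": "Ada"}
steps = [
    {"type": "cast", "column": "age", "target_type": int},
    {"type": "encrypt", "column": "ssn", "key_name": "pii_key"},
    {"type": "rename", "column": "fname", "new_name": "first_name"},
]
print(apply_transformations(row, steps))
# {'age': 42, 'ssn': '<encrypted:pii_key>', 'first_name': 'Ada'}
```

A name like sales_2024 passes the identifier check, while my-table or 2024_sales would need to be renamed before use as a schema or table name.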

Monitor, trigger, or delete a job

When you schedule an ingested file, it runs as a job in Apache Airflow. You can monitor, trigger, or delete the job.
  1. When you create a job, a success message and a View Jobs button appear.
  2. Track the job status by clicking View Jobs or navigating to the NexusOne homepage and clicking Monitor.
  3. Use the three-dot (...) menu to trigger or delete a job.
  4. If you clicked Trigger job, click the job’s name to open its DAG details in Airflow’s portal.
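Because the job runs in Apache Airflow, deployments that expose Airflow's stable REST API can also inspect or trigger runs programmatically. The sketch below only builds the request; the base URL, credentials, and DAG id are assumptions, since the DAG name NexusOne assigns to an ingest job is deployment-specific.

```python
import base64
import json
import urllib.request
from urllib.parse import quote

# Assumption: your Airflow host. Replace with your deployment's URL.
AIRFLOW_BASE = "https://airflow.example.com/api/v1"

def dag_runs_url(dag_id: str) -> str:
    """Endpoint for listing (GET) or triggering (POST) runs of a DAG,
    per Airflow's stable REST API."""
    return f"{AIRFLOW_BASE}/dags/{quote(dag_id)}/dagRuns"

def trigger_request(dag_id: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that triggers a new DAG run."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        dag_runs_url(dag_id),
        data=json.dumps({}).encode(),  # empty conf starts a default run
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = trigger_request("my_ingest_job", "user", "pass")
print(req.full_url)
# https://airflow.example.com/api/v1/dags/my_ingest_job/dagRuns
```

Passing the request to urllib.request.urlopen would submit the trigger; a GET against the same URL lists run states, which is what the Monitor tab surfaces in the UI.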

Additional resources