Prerequisites
- Appropriate permissions: nx1_ingest, nx1_monitor, nx1_s3_admin, airflow_user, superset_user, spark_sql, and trino_admin.
- Ensure you are ingesting a database that NexusOne currently supports.
Add datasets from a database
Add datasets from a database so that NexusOne can process and analyze them.
The datasets automatically become available to your data pipeline workflow.
- Log in to NexusOne.
- From the NexusOne homepage, navigate to Ingest > Database.
- Add database details:
- If you are using a query to select a dataset, click From Query and enter the SQL query. When specifying the schema and table in your query, use the following format:
- If you are selecting a specific table in a dataset, click From Table and enter the schema and table names in the Source Schema and Source Table fields.
- Optional: Add filters.
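The exact format is not reproduced above. As an illustration only, a fully qualified reference typically takes the form schema_name.table_name; the schema, table, and column names below are placeholders, not NexusOne requirements:

```sql
-- Placeholder names: "sales" is the schema, "orders" is the table.
SELECT order_id, customer_id, order_total
FROM sales.orders
WHERE order_date >= DATE '2024-01-01';
```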
- Add connection details:
- In the Database URL field, enter the JDBC URL used to connect to the public database. The URL should have the following format:
- In the Username field, enter a username.
- In the Password field, enter a password.
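The exact URL format is not reproduced above. As a generic illustration only, JDBC URLs conventionally follow the pattern below; the driver prefix, host, port, and database name are placeholders that depend on your source database:

```
jdbc:postgresql://db-host.example.com:5432/sales_db
jdbc:mysql://db-host.example.com:3306/sales_db
```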
- Add ingest details:
- Enter a unique name for this ingest job. This name appears in the Monitor tab and in Airflow's portal for tracking.
- Choose an existing database schema, or enter a new schema name to create one.
- Choose an existing table, or enter a new table name.
- Optional: Set how often the ingestion job should run. Schedule options are Run Once, Every 3 hours, Daily, Weekly, Monthly, and Quarterly. The job/DAG in Apache Airflow runs automatically on the first schedule; recurring runs depend on your selected schedule option.
- Select a mode for how to store incoming records in the destination table:
- Append: Add new records to the existing dataset.
- Merge: Add or update existing records where applicable.
- Overwrite: Replace all existing records.
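The three modes can be pictured as follows. This is a minimal sketch of their semantics only, not NexusOne's actual implementation; the record structure and the "id" key are illustrative assumptions:

```python
# Sketch of the three write modes' semantics (not NexusOne's actual
# implementation). Records are modeled as dicts with an "id" key.

def append(existing, incoming):
    """Append: add incoming records to the existing dataset as-is."""
    return existing + incoming

def merge(existing, incoming, key="id"):
    """Merge: update records with a matching key, add the rest."""
    by_key = {rec[key]: rec for rec in existing}
    for rec in incoming:
        by_key[rec[key]] = rec  # update if key exists, otherwise insert
    return list(by_key.values())

def overwrite(existing, incoming):
    """Overwrite: replace all existing records with the incoming ones."""
    return incoming

existing = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
incoming = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]

print(len(append(existing, incoming)))   # 4 records, duplicates kept
print(merge(existing, incoming))         # id 2 updated, id 3 added
print(overwrite(existing, incoming))     # only the incoming records remain
```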
- Optional: Select a DataHub domain, for example Company, Product, or Sales. This applies only if you previously created a domain on DataHub via the Govern feature in NexusOne.
- Optional: Select or create one or more tags to label this job.
- After configuring all fields, click Ingest to submit the job.
Monitor, trigger, or delete a job
When you schedule an ingested file, it runs as a job in Apache Airflow. You can monitor, trigger, or delete the job.
- When you create a job, a success message and a View Jobs button appear.
- Track the job status by clicking View Jobs or navigating to the NexusOne homepage and clicking Monitor.
- Use the three-dot (...) menu to trigger or delete a job.
- If you clicked Trigger job, click the job's name to open its DAG details in Airflow's portal.
Additional resources
- To understand how database ingestion works in NexusOne, refer to How database ingestion works.
- For more information about the monitoring feature, refer to the Monitor Overview page.
- For more information about roles or permissions, refer to the Govern Overview page.