This tutorial walks you through ingesting a Parquet file into the NexusOne platform from a public URL.

Prerequisite

Appropriate permissions: nx1_ingest, nx1_monitor, nx1_s3_admin, airflow_user, superset_user, spark_sql, and trino_admin

Add the public URL to the portal

  1. Log in to NexusOne.
  2. On the NexusOne homepage, navigate to Ingest > File.
  3. In the File Details section, click Public File URL.
  4. In File URL, enter the following URL to a Parquet file:
https://rapid-file-tutorial.s3.us-east-1.amazonaws.com/customers.parquet
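
Before submitting the URL, it can help to confirm that it is publicly reachable and really points to a Parquet file. The following is a minimal sketch, not part of NexusOne: it relies only on the fact that Parquet files begin and end with the 4-byte magic number PAR1, and it assumes the server honors HTTP Range requests (Amazon S3 does). The function names are hypothetical helpers introduced for illustration.

```python
import urllib.request

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def looks_like_parquet(head: bytes, tail: bytes) -> bool:
    """Return True if the leading and trailing 4 bytes carry the Parquet magic number."""
    return head[:4] == PARQUET_MAGIC and tail[-4:] == PARQUET_MAGIC


def fetch_range(url: str, byte_range: str, timeout: float = 10.0) -> bytes:
    """Download part of a file with an HTTP Range header, e.g. "bytes=0-3" or "bytes=-4"."""
    req = urllib.request.Request(url, headers={"Range": byte_range})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def check_public_parquet_url(url: str) -> bool:
    """Fetch the first and last 4 bytes of the file and check them for the magic number."""
    head = fetch_range(url, "bytes=0-3")
    tail = fetch_range(url, "bytes=-4")
    return looks_like_parquet(head, tail)
```

Calling check_public_parquet_url with the tutorial URL above performs two small network requests. An HTTP error here (for example, 403) would suggest the file is not publicly readable.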

Add ingest details

Add the following information to the fields:
  • Name: parquet_url
  • Schema: parquet_url_schema
  • Table: parquet_url_table
  • Schedule: None
  • Mode: append
  • Tags: Don’t add any tags
For the Schedule field, None means the Apache Airflow DAG doesn’t run on a schedule, so you trigger it manually later. After adding these details, click Ingest. Wait a few minutes for the success message to appear.
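
To keep track of how these fields relate to later steps, the form values can be modeled as a small record. This is an illustrative sketch only: IngestDetails and its methods are hypothetical names, not a NexusOne API. It shows that Schema and Table combine into the fully qualified table name you query at the end of this tutorial.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IngestDetails:
    """Hypothetical record mirroring the ingest form fields."""
    name: str
    schema: str
    table: str
    schedule: Optional[str] = None   # None: no schedule is set
    mode: str = "append"
    tags: Tuple[str, ...] = ()       # empty: no tags

    def qualified_table(self) -> str:
        """Return the schema-qualified table name used in SQL queries."""
        return f"{self.schema}.{self.table}"

    def is_scheduled(self) -> bool:
        """Return True only when a schedule has been set."""
        return self.schedule is not None


details = IngestDetails(
    name="parquet_url",
    schema="parquet_url_schema",
    table="parquet_url_table",
)
print(details.qualified_table())
```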

Monitor and trigger job

  1. Ingesting the file creates an Airflow job, and a success message with a View Jobs button appears.
  2. Track the job status by clicking View Jobs, or navigate to the NexusOne homepage and click Monitor.
  3. You should see your job name, parquet_url, in the list, along with its current status. Use the three dots ... menu to trigger the job.
  4. After you click Trigger job, click the job’s name to open its DAG details in Airflow’s portal.
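
If you prefer to trigger the job from a script, Airflow 2.x offers a stable REST API with a POST /api/v1/dags/{dag_id}/dagRuns endpoint. The sketch below only builds such a request and rests on several assumptions this tutorial does not confirm: that your NexusOne deployment exposes the Airflow REST API, that basic authentication is enabled, and that the DAG ID matches the job name. The base URL and credentials are placeholders.

```python
import base64
import json
import urllib.request


def build_trigger_request(
    base_url: str, dag_id: str, user: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a request that starts a new DAG run in Airflow 2.x."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    # Assumption: the Airflow API accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"conf": {}}).encode(),  # empty run configuration
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder host and credentials; the DAG ID is assumed to equal the job name.
request = build_trigger_request(
    "https://airflow.example.com", "parquet_url", "user", "password"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. The Trigger job menu in the Monitor page remains the documented route.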

Visualize your dataset

On the NexusOne homepage, navigate to Discover > New > SQL query. Then run the following query, which uses the schema and table names you entered during ingest:
SELECT * FROM parquet_url_schema.parquet_url_table

Additional resources