This tutorial walks you through ingesting a Parquet file from a public URL into the NexusOne platform.

Prerequisite

Appropriate permissions: nx1_ingest, nx1_monitor, nx1_s3_admin, airflow_user, superset_user, spark_sql, and trino_admin

Add the public URL to the portal

  1. Log in to NexusOne.
  2. On the NexusOne homepage, navigate to Ingest > File.
  3. In the File Details section, click Public File URL.
  4. In File URL, enter the following URL to a Parquet file:
https://rapid-file-tutorial.s3.us-east-1.amazonaws.com/customers.parquet
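Before pasting a URL into File URL, you may want to confirm it really serves a Parquet file. A minimal sketch, assuming the host honors HTTP Range requests (S3 does): Parquet files begin and end with the 4-byte magic number `PAR1`, so fetching only those bytes is enough for a quick sanity check. The function names are hypothetical helpers for illustration, not part of NexusOne.

```python
import urllib.request

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def looks_like_parquet(head: bytes, tail: bytes) -> bool:
    """Return True if the leading and trailing bytes match the Parquet magic number."""
    return head[:4] == PARQUET_MAGIC and tail[-4:] == PARQUET_MAGIC


def fetch_range(url: str, byte_range: str) -> bytes:
    """Fetch part of a file using an HTTP Range header such as 'bytes=0-3' or 'bytes=-4'."""
    req = urllib.request.Request(url, headers={"Range": byte_range})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def check_public_parquet_url(url: str) -> bool:
    """Download only the first and last 4 bytes and check them for the Parquet magic number."""
    head = fetch_range(url, "bytes=0-3")  # first 4 bytes
    tail = fetch_range(url, "bytes=-4")   # last 4 bytes (suffix range)
    return looks_like_parquet(head, tail)
```

Usage: `check_public_parquet_url("https://rapid-file-tutorial.s3.us-east-1.amazonaws.com/customers.parquet")`. If a server ignores the Range header and returns the whole file, the check still works, because it only inspects the first and last 4 bytes of whatever comes back.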

Add ingest details

Enter the following values in the form fields:
  • Name: parquet_url
  • Schema: parquet_url_schema
  • Table: parquet_url_table
  • Schedule: Run Once
  • Mode: append
  • Tags: Don’t add any tags
After entering these details, click Ingest. Wait a few minutes until a success message appears.
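The Schema and Table fields together determine the name you query later, in `<schema>.<table>` form. The following hypothetical Python record of the form values makes that mapping explicit; `IngestSettings` and `qualified_table` are illustration-only names, not a NexusOne API, and the comment on `mode` reflects the usual meaning of append rather than documented NexusOne behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestSettings:
    """Hypothetical record of the values entered in the Ingest > File form."""
    name: str
    schema: str
    table: str
    schedule: str = "Run Once"
    mode: str = "append"  # assumed: new rows are added and existing rows are kept
    tags: tuple[str, ...] = ()  # this tutorial adds no tags

    def qualified_table(self) -> str:
        """Return the table name to use in SQL, in <schema>.<table> form."""
        return f"{self.schema}.{self.table}"


settings = IngestSettings(
    name="parquet_url",
    schema="parquet_url_schema",
    table="parquet_url_table",
)
```

Here `settings.qualified_table()` yields `parquet_url_schema.parquet_url_table`, the same name used in the SQL Lab query at the end of this tutorial.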

Monitor job

Ingesting the file creates an Airflow job. To monitor the job's status, use the following steps:
  1. Click View Jobs, or navigate to the NexusOne homepage and click Monitor.
  2. Find your job, parquet_url, in the list and check its current status.
  3. Refresh your browser every few minutes until the status changes to Completed.
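The refresh-until-Completed loop above is a simple polling pattern. A minimal sketch, assuming you have some way to read the job's status programmatically: `get_status` is a hypothetical callable you would supply, since this tutorial doesn't document a status API, and the 30-second interval and 15-minute timeout are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    target: str = "Completed",
    interval_s: float = 30.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns target, like refreshing the Monitor page.

    Raises TimeoutError if target is not reached within timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"last status {status!r} after {timeout_s}s")
        sleep(interval_s)  # wait before checking again
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; in normal use you'd rely on the defaults.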

Visualize your dataset

Use the following steps to visualize your dataset:
  1. On the NexusOne homepage, click Discover to launch Superset.
  2. Hover over SQL, and then select SQL Lab.
  3. Enter the following query in the query editor:
SELECT * FROM parquet_url_schema.parquet_url_table
  4. Click Run to execute the query and view the rows from the ingested file.
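If you'd rather run the same query from a script, here is a sketch that assumes the table is reachable through Trino (the prerequisites list a trino_admin permission, but this tutorial doesn't say how Superset connects to the data). It uses the `trino` client package's DB-API interface; the host, port, user, and catalog values are hypothetical placeholders you would need to get from your NexusOne administrator. The identifier check is a conservative rule chosen for this sketch, and the optional `LIMIT` avoids pulling every row of a large table.

```python
import re
from typing import Optional

# Conservative rule for this sketch: plain, unquoted identifiers only.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def preview_query(schema: str, table: str, limit: Optional[int] = None) -> str:
    """Build the SQL Lab query for <schema>.<table>, optionally capped with LIMIT."""
    for ident in (schema, table):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"unexpected identifier: {ident!r}")
    sql = f"SELECT * FROM {schema}.{table}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def run_on_trino(sql: str, host: str, port: int, user: str, catalog: str) -> list:
    """Run sql with the `trino` client package (pip install trino)."""
    # Imported here so preview_query works even without the package installed.
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user, catalog=catalog)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```

For example, `preview_query("parquet_url_schema", "parquet_url_table", limit=10)` builds `SELECT * FROM parquet_url_schema.parquet_url_table LIMIT 10`, which you could pass to `run_on_trino` along with your own connection details.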

Additional resources