# Ingest

> Overview of NexusOne's Ingest feature that brings files, databases, and lakehouses into the platform for querying and analytics.

The Ingest feature allows you to ingest a file, database, or lakehouse into the NexusOne platform,
or to pull data from a SaaS platform's API. It also lets you mirror your database into the
platform in real time. This page describes each of these sub-features.

## How API ingestion works

In NexusOne, API ingestion is the process of pulling data from a SaaS platform's API endpoint, transforming the
responses into data streams, and then storing each stream as a table in NexusOne's Iceberg lakehouse.
It's implemented using PyAirbyte, a Python library.

The PyAirbyte-based implementation provides the following features:

* Uses the existing Airflow DAG ingestion pattern, `task.pyspark`, that NexusOne supports, with
  PyAirbyte acting as the source connector.
* Transforms ingested data into a format NexusOne can use. Specifically, from pandas DataFrames
  into Spark DataFrames, and lastly into Iceberg tables.
* Remembers what data it already fetched using `_sync_state` tables, so next time it only pulls
  new or changed data.
* Uses Airbyte-defined primary keys to merge new data with existing ones.
* Supports 38 pre-installed SaaS platform connectors.
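
The incremental behavior in the list above (remembering a sync position, then merging by primary key) can be illustrated with a minimal pure-Python sketch. This is not PyAirbyte's or NexusOne's implementation: the `incremental_merge` helper, the `id` primary key, and the `updated_at` cursor field are hypothetical names chosen for illustration, and in-memory dictionaries stand in for the `_sync_state` and Iceberg tables.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def incremental_merge(
    table: Dict[Any, Record],
    state: Dict[str, Any],
    fetched: List[Record],
    primary_key: str = "id",           # hypothetical primary key name
    cursor_field: str = "updated_at",  # hypothetical cursor field name
) -> int:
    """Upsert records newer than the saved cursor; return how many were applied."""
    last_cursor = state.get("cursor")
    applied = 0
    for record in fetched:
        # Skip anything at or before the position saved by the previous sync.
        if last_cursor is not None and record[cursor_field] <= last_cursor:
            continue
        # Merge by primary key: insert new rows, overwrite existing ones.
        table[record[primary_key]] = record
        applied += 1
    # Advance the saved cursor so the next run only applies newer records.
    if fetched:
        newest = max(r[cursor_field] for r in fetched)
        if last_cursor is None or newest > last_cursor:
            state["cursor"] = newest
    return applied


# In-memory stand-ins for the destination table and its sync state.
table: Dict[Any, Record] = {}
state: Dict[str, Any] = {}

first_run = incremental_merge(table, state, [
    {"id": 1, "name": "Ada", "updated_at": "2024-01-01"},
    {"id": 2, "name": "Lin", "updated_at": "2024-01-02"},
])
second_run = incremental_merge(table, state, [
    {"id": 2, "name": "Lin", "updated_at": "2024-01-02"},     # already synced
    {"id": 1, "name": "Ada L.", "updated_at": "2024-01-03"},  # changed row
    {"id": 3, "name": "Sam", "updated_at": "2024-01-04"},     # new row
])
```

In this sketch the second run skips the unchanged row, overwrites the changed row in place, and appends the new one, which is the general idea behind combining saved sync state with a primary-key merge.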

You should use API ingestion in the following scenarios:

* Your data lives inside a SaaS platform that NexusOne supports
* You don't have access to a database or a data file

### Supported SaaS platform connectors

NexusOne supports the following SaaS platform connectors:

* **Analytics**: Amplitude, Mixpanel, Google Analytics
* **Communication**: Twilio
* **Customer Relationship Management (CRM)/Support**: Salesforce, Zendesk, HubSpot, Intercom, Freshdesk
* **E-commerce**: Shopify
* **Engineering**: Jira, GitHub, GitLab, PagerDuty, Sentry, Datadog
* **Finance**: Stripe, Chargebee, Braintree, QuickBooks, Xero, NetSuite
* **HR**: BambooHR, Greenhouse
* **Identity**: Okta
* **Marketing**: Facebook Ads, Google Ads, LinkedIn Ads, TikTok, Klaviyo, Mailchimp
* **Productivity**: Google Sheets, Slack, Notion
* **Storage**: S3, Google Drive, Microsoft SharePoint
* **Surveys**: Typeform, SurveyMonkey

### Use cases

These examples show how different industries can use NexusOne's API ingestion:

* **Financial services**: Connect to platforms like Stripe and NetSuite to ingest transaction, billing, and
  revenue data into NexusOne for real-time financial reporting and reconciliation.
* **Customer support**: Sync ticketing and customer interaction data from Zendesk and Intercom for
  unified support analytics.

## How database ingestion works

The database ingest feature allows you to ingest a public database containing datasets into
the NexusOne platform. NexusOne does this by connecting to the database using a JDBC URL, authenticating,
and creating an Airflow job that queries a table and copies the results into NexusOne.

<Note>You can query one table in a schema or several tables.</Note>

After NexusOne copies the results from a database, you can query them using
[Superset's SQL Lab](https://docs.nx1cloud.com/platform-components/apache-superset/superset-hands-on-examples#run-a-query).

### Supported database vendors and their JDBC URL

When you attempt to ingest a database into NexusOne, a JDBC URL is one of the options used
to set up a connection between NexusOne and the database.

The following table lists the database vendors that NexusOne supports and their JDBC URL formats:

| Database             | JDBC URL format                                                                   |
| -------------------- | --------------------------------------------------------------------------------- |
| Db2                  | `jdbc:db2://<database_URL_or_IP_address>:50001/<database_name>`                   |
| MariaDB              | `jdbc:mariadb://<database_URL_or_IP_address>:3306/<database_name>`                |
| Microsoft SQL Server | `jdbc:sqlserver://<database_URL_or_IP_address>:1433;databaseName=<database_name>` |
| MySQL                | `jdbc:mysql://<database_URL_or_IP_address>:3306/<database_name>`                  |
| Oracle               | `jdbc:oracle:thin:@//<database_URL_or_IP_address>:1521/<database_name>`           |
| PostgreSQL           | `jdbc:postgresql://<database_URL_or_IP_address>:5432/<database_name>`             |

<Note>
  * All port numbers specified here are defaults. If your database deployment uses a
    different port, change the port number accordingly.
  * In PostgreSQL, a database name is different from a schema name. The default database name
    is `postgres` and it stores default schemas.
</Note>
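
As a minimal sketch of how the formats in the table fit together, the following Python helper fills in a JDBC URL from a vendor key, host, and database name. The `JDBC_FORMATS` mapping and `build_jdbc_url` function are hypothetical names for illustration, not part of NexusOne; the templates and default ports are copied from the table above, and `db.example.com` is a placeholder host.

```python
from typing import Optional

# URL templates and default ports copied from the table above.
JDBC_FORMATS = {
    "db2": ("jdbc:db2://{host}:{port}/{database}", 50001),
    "mariadb": ("jdbc:mariadb://{host}:{port}/{database}", 3306),
    "sqlserver": ("jdbc:sqlserver://{host}:{port};databaseName={database}", 1433),
    "mysql": ("jdbc:mysql://{host}:{port}/{database}", 3306),
    "oracle": ("jdbc:oracle:thin:@//{host}:{port}/{database}", 1521),
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}", 5432),
}


def build_jdbc_url(vendor: str, host: str, database: str,
                   port: Optional[int] = None) -> str:
    """Return a JDBC URL for a supported vendor, using its default port if none is given."""
    try:
        template, default_port = JDBC_FORMATS[vendor.lower()]
    except KeyError:
        raise ValueError(f"Unsupported database vendor: {vendor!r}") from None
    return template.format(host=host, port=port or default_port, database=database)


# Default port for PostgreSQL, custom port for Microsoft SQL Server.
postgres_url = build_jdbc_url("postgresql", "db.example.com", "postgres")
sqlserver_url = build_jdbc_url("sqlserver", "db.example.com", "sales", port=14330)
```

Passing `port` explicitly covers deployments that don't use the default port, as described in the note above.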

### Use cases

These examples show how different industries can use NexusOne's database ingestion and query capabilities:

* **Financial services**: Connect to a PostgreSQL or Oracle database containing market transactions
  so you can ingest its structured tables into NexusOne for centralized risk monitoring and analytics.
* **Education**: Connect to a MySQL or Microsoft SQL Server database that stores student records and grades
  so you can ingest its structured data into NexusOne for centralized student performance analysis.

## How file ingestion works

The file ingest feature allows you to ingest files containing structured data into the platform.
NexusOne supports two file ingestion options:

* **Upload file**: Files stored on your local machine
* **Public file URL**: A public URL to a file you'd like to upload. You might store this file in an S3 bucket and expose it over HTTPS.

### Supported file formats

NexusOne currently supports these file formats:

* CSV
* Parquet
* ORC
* XML
* XLS/XLSX
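
Before uploading a file or submitting a public file URL, you might want to check locally whether it looks like one of the supported formats. The following Python sketch does this by file extension. The `detect_format` helper and the extension-to-format mapping are assumptions for illustration; NexusOne's own validation may work differently.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumed mapping from common file extensions to the formats listed above.
SUPPORTED_EXTENSIONS = {
    ".csv": "CSV",
    ".parquet": "Parquet",
    ".orc": "ORC",
    ".xml": "XML",
    ".xls": "XLS/XLSX",
    ".xlsx": "XLS/XLSX",
}


def detect_format(path_or_url: str) -> Optional[str]:
    """Return the supported format name for a file path or URL, or None if unsupported."""
    # urlparse drops any query string, so signed or versioned URLs still work.
    path = urlparse(path_or_url).path
    suffix = PurePosixPath(path).suffix.lower()
    return SUPPORTED_EXTENSIONS.get(suffix)


local_format = detect_format("grades.XLSX")
url_format = detect_format("https://example.com/data/trades.parquet?versionId=1")
unsupported = detect_format("notes.txt")
```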

A Hive catalog that uses the Iceberg table format already exists on NexusOne, so when you
ingest a file, you only have to define your schema and table. After ingesting the file,
[Apache Airflow](https://airflow.apache.org/) schedules a DAG based on your configuration.
This DAG uses Spark to process the file.

To query and visualize the dataset, NexusOne uses [Superset](/platform-components/apache-superset).
When querying, Superset communicates with Trino, which then retrieves the dataset using the catalog.

### Use cases

These examples show how different industries can use NexusOne's file ingestion and query capabilities:

* **Financial services**: Ingest Parquet-formatted market data feeds into NexusOne to monitor portfolio risk and run analytics on a single, secure platform without having to manage custom pipelines.
* **Education**: Ingest Excel-formatted grade books into NexusOne to store student records and analyze
  student performance trends.

## How lakehouse ingestion works

A *lakehouse* is a data lake that behaves like a data warehouse. It stores all data as files in
object storage, but adds a table format for structure and reliable updates. It also uses a catalog
so query engines can quickly find and read the data they need.

In NexusOne, a lakehouse refers to a data architecture that stores databases
in object storage. It uses a [metastore](/platform-components/apache-hive-metastore) and
[table format](/platform-components/apache-iceberg) for query consistency.
In NexusOne, ingesting a lakehouse means copying data from one table into another table.
A table format such as Iceberg exposes the table being ingested. After NexusOne copies that table
into another, it creates new table format metadata for the copied table so downstream apps like
[Trino](/platform-components/trino) can query it.

### Use cases

These examples show how different industries can use NexusOne's lakehouse ingestion capabilities:

* **Financial services**: Use lakehouse ingestion to copy previously ingested Parquet market data
  tables into new tables within NexusOne. This ensures that analysts can run risk calculations and
  portfolio analytics in a centralized table.
* **Education**: Use lakehouse ingestion to copy previously ingested Excel grade book tables
  into a new table. This ensures that administrators can centralize student records and
  analyze performance trends.

## Additional resources

* For full instructions about how to ingest a database in NexusOne, refer to
  [How to ingest a database](../tasks/ingest/ingest-a-database).
* For full instructions about how to ingest a file in NexusOne, refer to
  [How to ingest a file](../tasks/ingest/ingest-a-file).
* For full instructions about how to ingest a lakehouse in NexusOne, refer to
  [How to ingest a lakehouse](../tasks/ingest/ingest-a-lakehouse).
