Glossary

This glossary defines key terms used in the NexusOne documentation.

A

Apache Airflow An open source platform used internally at NexusOne to author, schedule, and monitor tasks or jobs represented as DAGs. See also Engineer, Ingest, or the Apache Airflow repository.
Apache APISIX API Gateway An open source platform used internally at NexusOne to manage traffic from microservices and large language models. See also the Apache APISIX API Gateway repository.
Apache Gravitino An open source platform used internally at NexusOne to provide metadata management and governance across data lakes and warehouses. See also the Apache Gravitino repository.
Apache Iceberg An open source table format used internally at NexusOne to provide a unified way to access data in a data lake. See also Ask, Discover, Engineer, Quality, or the Apache Iceberg repository.
Apache Kyuubi An open source platform that acts as a gateway so that you can run SQL queries against Apache Spark engines. See also the Apache Kyuubi repository.
Apache Ranger An open source platform used internally at NexusOne that uses policies to manage access control to a data platform. See also the Apache Ranger repository.
Apicurio Schema Registry An open source platform used internally at NexusOne to store, share, and manage the structured definitions of data used in events or APIs. See also the Apicurio Schema Registry repository.
Apache Spark An open source platform used internally at NexusOne to provide a data processing layer for large data analytics. See also the Apache Spark repository.
Apache Superset An open source platform used internally at NexusOne that interactively analyzes and visualizes data. See also Discover, Metabase, or the Apache Superset repository.
Ask A NexusOne feature that generates SQL commands for interacting with ingested data, so you can gain meaningful insights. See also Insight Overview.

B

Bucket A container that stores data in the Amazon S3 service. See also Ingest.
Build A NexusOne feature that launches a JupyterHub-based development environment. See also Build Overview.

C

Catalog A group of similar items or data. It has several meanings:
- DataHub: Groups related datasets and assets such as Topics, Views, or Dashboards, for easier discovery.
- Iceberg: Provides a consistent way to create, load, and drop tables across different storage systems.
- Trino: Defines a connection to a data source containing schemas and tables.
Certificate manager A digital credential manager used internally at NexusOne to manage certificates for web domain names in a Kubernetes cluster, ensuring that web communication between clients and services is secure.
Cluster A group of Kubernetes nodes and pods that run open source tools used by NexusOne.
Connect A NexusOne feature that allows you to programmatically access ingested data. It also provides a single interface to launch several user-facing apps hosted on the NexusOne platform. See also Connect Overview.
CSV - Comma-Separated Values A file format for storing tabular data by separating each value with a comma. You can ingest CSV files into NexusOne. See also Ingest.

D

DAG - Directed Acyclic Graph A set of tasks connected by one-way dependencies that never loop back, so each task runs only after the tasks it depends on have finished. At NexusOne, DAGs execute when you ingest data or schedule SQL commands to run at specific times. See also Apache Airflow, Engineer, or Ingest.
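To illustrate the DAG entry, the following minimal sketch derives a run order from task dependencies using Python's standard `graphlib` module (Python 3.9 or later). The task names and the `execution_order` helper are hypothetical and are not taken from NexusOne or Apache Airflow.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"transform", "quality_check"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    TopologicalSorter raises graphlib.CycleError if the
    dependencies loop back on themselves, which is exactly
    what "acyclic" rules out.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because every task in this example depends on the one before it, only one order is valid. With independent tasks, several orders could be valid and a scheduler is free to run them in parallel.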
Dashboard A visual user interface displaying ingested or processed data as graphs, charts, or other visual elements. See also Discover.
Database A collection of data stored in tabular format and made available as a software package. You reference a database as a data source when it’s ingested into NexusOne or when it’s used as a Trino catalog by the Govern feature.
Data format A specific way to encode data. Examples include CSV, Parquet, and ORC.
DataHub An open source platform used internally at NexusOne to provide metadata for your ingested data. This metadata displays the lineage of your data, such as which storage location stores the data and which Spark operation processed it. See also Govern or the DataHub repository.
Data ingestion See also Ingest.
Data insight See also Insight.
Data lake A centralized data store for all types of data, such as structured, semi-structured, and unstructured. This allows users and apps to access and analyze the data from a single location.
Data mirroring The process of ingesting a database into NexusOne and then creating a copy of it. See also Ingest.
Data pipeline An automated way to ingest and transform data.
Data product A DataHub feature used to organize and manage tables, views, and other data assets.
Data source A platform that stores data and provides it when requested. This could be different vendor databases, such as MongoDB and PostgreSQL, or a data warehouse like Snowflake. On NexusOne, Trino provides access to these data sources.
Data warehouse A large, centralized data store that contains structured and semi-structured data. It’s often used for analytics and reporting because it includes both current and historical data.
Debezium An open source server used internally at NexusOne to capture table changes from an ingested database and mirror them into another database. See also the Debezium repository.
Discover A NexusOne feature that launches the Superset platform. See also Discover Overview.
Domain A logical way to categorize related data using DataHub. See also Ask, Engineer, or Ingest.

E

Engineer A NexusOne feature that transforms your data. It allows you to select one or more catalogs. See also Engineer Overview.
ETL - Extract, Transform, and Load A process that’s used to integrate data from multiple sources, transform it, and then write the result to a target data store. See also Ingest.
External DNS Manager An open source tool for making Kubernetes resources managed by NexusOne discoverable to public domain name servers. See also the External DNS Manager repository.

F

File format See also Data format.

G

Govern A NexusOne feature that manages user or group access to NexusOne features and data sources. See also Govern Overview.
Group A collection of users who belong to the same department or share similar responsibilities.

H

Health check A way to verify whether an API is operational.

I

IAM - Identity and Access Management A sub-feature of the NexusOne Govern feature. It’s used to manage roles assigned to users, groups, and tags. See also Govern.
IAM group See also Group.
IAM role See also Role.
Ingest The process of adding data to a system. It’s also a NexusOne feature used to add a file, database, or lakehouse. See also Ingest Overview.
Insight The result of getting meaningful information by examining data. It’s also a NexusOne feature that analyzes data and presents insights. See also Ask.

J

Job An Apache Spark operation scheduled in Apache Airflow. This is often found when using the Engineer, Ingest, or Monitor features on NexusOne. See also Monitor.
JupyterHub An open source platform that NexusOne launches when you use the Build feature. It’s helpful when you are developing and testing DAGs that orchestrate ETL pipelines. See also Build or the JupyterHub repository.

K

KEDA An open source platform used for auto-scaling Kubernetes workloads managed internally at NexusOne. See also the KEDA repository.
Keycloak An open source platform that provides authentication to apps and users. At NexusOne, it’s used to manage user access and authentication. See also the Keycloak repository.
Kubernetes An open source tool used to orchestrate apps deployed at NexusOne. All apps are internally managed by the NexusOne team. See also the Kubernetes repository.

L

Lakehouse A data architecture that combines a data lake and a data warehouse. See also Ingest.
LLM - Large Language Model An artificial intelligence model trained on a large amount of text so that it can understand and generate language. In NexusOne, LLMs generate SQL queries you can use to gather data insights, check data quality, or transform data. See also Ask, Engineer, or Quality.

M

Metabase A user-friendly open source platform used internally at NexusOne to interactively analyze and visualize ingested data. See also Apache Superset, Ask, Discover, or the Metabase repository.
Metadata Data that describes other data. See also Apache Gravitino or DataHub.
Monitor A NexusOne feature for viewing all scheduled, running, failed, or completed jobs.

N

NexusOne The brand name of the Nexus Cognitive software platform.

O

Ollama An open source platform used internally at NexusOne to provide access to local LLMs from companies such as OpenAI and Google. See also Ask, Engineer, Quality, or the Ollama repository.
OpenFaaS - Open Function as a Service An open source platform that allows the NexusOne team to deploy serverless functions on Kubernetes. See also the OpenFaaS repository.
Open source Software whose source code is publicly available on the internet and distributed under a license. NexusOne uses several open source platforms, and the team actively contributes to them as well.
ORC - Optimized Row Columnar A file format that stores data as columns. Compared with Parquet files, ORC files are more efficient for write-heavy workloads. You can ingest ORC files into NexusOne. See also Ingest.

P

Parquet A file format that you can ingest into NexusOne. Unlike row-based file formats such as CSV, Parquet stores data as columns. This makes queries that read only a few columns faster. See also Ingest.
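The difference between row-based and column-based storage can be sketched in plain Python. This is only an illustration of the idea, using hypothetical sample records and a hypothetical `to_columns` helper; it does not reproduce the actual Parquet file layout, which also involves row groups, encoding, and compression.

```python
# Hypothetical sample records, stored row by row as in a CSV file.
rows = [
    {"id": 1, "city": "Austin", "sales": 120.0},
    {"id": 2, "city": "Dallas", "sales": 80.5},
    {"id": 3, "city": "Austin", "sales": 45.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited to read a single field.
total_from_rows = sum(record["sales"] for record in rows)

# Column layout: only the "sales" column is read.
total_from_columns = sum(columns["sales"])

print(total_from_rows, total_from_columns)
```

Both totals are the same, but the column layout reaches the answer without touching the `id` or `city` values, which is why analytical queries over a few columns tend to benefit from columnar formats.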
Permission See also Permission boundary.
Permission boundary A way to restrict which NexusOne features are available for specific ingested data associated with a tag. You set it by assigning specific roles to the tag. See also Govern.
PyTorch An open source Python library used internally at NexusOne to build and train deep learning models. See also the PyTorch repository.

Q

Quality A NexusOne feature that allows you to check if your ingested data meets a defined goal. See also Quality Overview.
Query The process of retrieving or manipulating ingested data in NexusOne using the Trino SQL syntax. See also Apache Superset or Metabase.

R

Ray An open source Python library used internally at NexusOne to scale machine learning workloads. See also the Ray repository.
Redis An open source server used internally at NexusOne to cache data for apps. See also the Redis repository.
Reporting cycle The process of consistently running a SQL command using the NexusOne Ask feature, so you can gather data insights. See also Ask.
Role A set of permissions grouped under a name. Within NexusOne, the default roles represent specific features such as ingesting or transforming data. Custom roles also exist; these combine several default roles. See also Govern.
Rule A SQL command scheduled to execute at specific times of the day using Apache Airflow. See also Engineer and Quality.

S

S3 An Amazon cloud storage service used for storing large amounts of object data. NexusOne stores ingested data in S3.
Schedule The process of telling a task or job to run at specific times of the day. See also Engineer or Ingest.
Schema The structure of a database. When ingesting data into NexusOne, you must specify a schema. See also Apache Superset or Metabase.
SQL - Structured Query Language A programming language used to interact with ingested data. NexusOne only supports the Trino SQL syntax. See also Ask, Engineer, or Quality.
SSO - Single Sign-On The process of logging into several apps within NexusOne using a single set of NexusOne credentials.

T

Table A collection of data represented in rows and columns. When ingesting data into NexusOne, you must specify a table. See also Apache Superset or Metabase.
Tag An optional name given to ingested data within NexusOne. NexusOne can assign a role to a tag to ensure that the tagged data can only use specific NexusOne features. See also Ingest or Govern.
Task See also Job.
Trino An open source SQL query engine used within NexusOne. See also Apache Superset or Metabase.

U

User A logical identity assigned to employees interacting with NexusOne. See also Govern.

V

Visualize The process of seeing insights from ingested data on a user interface. See also Apache Superset or Metabase.

W

Warehouse See also Data warehouse.

X

XLSX - Excel Open XML Spreadsheet A file format used to represent data. You can ingest XLSX files into NexusOne. See also Ingest.
XML - eXtensible Markup Language A file format used to represent data using elements, each enclosed in an opening and a closing tag. You can ingest XML files into NexusOne. See also Ingest.
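As a small illustration of XML elements, the sketch below parses a hypothetical document with Python's standard `xml.etree.ElementTree` module. The element names and values are made up for the example and do not come from NexusOne.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: one <customer> element with two child elements.
xml_text = "<customer><name>Ada</name><city>Austin</city></customer>"

# Parse the text into an element tree and get the root element.
root = ET.fromstring(xml_text)

# Map each child element's tag to its text content.
record = {child.tag: child.text for child in root}

print(root.tag, record)
```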
