- Self-service analytics across curated and raw datasets
- OpenID Connect (OIDC) for unified authentication and authorization
- Federated SQL queries across Iceberg, Hive, PostgreSQL, and more
- BI dashboards powered by Superset
- DataOps workflows for validating and inspecting catalog data
- AI-assisted SQL generation
Environment configuration
This section describes how the NexusOne team configured Trino, and how different client tools connect to the service.

Accessing the Trino UI
Trino provides a web UI that displays details about the cluster. You can access the Trino UI on the NexusOne cluster using the following URL:

When you purchase NexusOne, you receive a client name. Replace client with your assigned client name.

Trino client setup
Within NexusOne, Trino supports OAuth 2.0 authenticated connections for JDBC, Python, and command-line clients.

Python
Example of connecting to Trino using Python. Replace user and password with your assigned credentials.
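A minimal sketch using the open-source trino Python package, assuming OAuth 2.0 external authentication; the hostname and user are placeholders:

```python
# Illustrative sketch: connect to Trino with OAuth 2.0 external authentication.
# Replace the placeholder hostname and user with your NexusOne values.
import trino

conn = trino.dbapi.connect(
    host="trino.example.nexusone.com",  # placeholder hostname
    port=443,
    http_scheme="https",
    user="user",  # placeholder user
    auth=trino.auth.OAuth2Authentication(),  # opens a browser for the OAuth flow
)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())
```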
JDBC
Example of connecting to Trino using JDBC.

Command-line tool
Example of connecting to Trino using the command-line tool. Replace user with your username.
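For reference, the JDBC driver enables the same OAuth 2.0 flow through the SSL=true and externalAuthentication=true connection properties, for example jdbc:trino://trino.example.nexusone.com:443?SSL=true&externalAuthentication=true (the hostname is a placeholder). An illustrative sketch of the command-line connection, again with a placeholder hostname and user:

```shell
# Illustrative sketch: the Trino CLI with OAuth 2.0 external authentication.
trino --server https://trino.example.nexusone.com:443 \
  --external-authentication \
  --user user
```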
Iceberg/Hive integration
Trino integrates directly with the NexusOne data platform through preconfigured Iceberg and Hive catalogs. These catalogs use S3 as the underlying storage layer for all datasets and a Hive Metastore to provide centralized table and schema metadata.

Superset integration
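As a reference, Superset registers Trino as a database through a SQLAlchemy-style URI; a minimal sketch, in which the username, hostname, and default catalog are placeholders:

```text
trino://user@trino.example.nexusone.com:443/iceberg
```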
Superset connects to Trino using OAuth 2.0. Each Trino catalog appears as a database, and each schema inside a catalog is available for querying in SQL Lab, Superset's interactive SQL editor.

Types of Trino queries
Trino supports different types of queries depending on the purpose and data sources. Understanding these types helps you choose the right approach when analyzing data.

Analytical queries
Analytical queries interact with data within a single catalog or schema. One example is counting employees per job role in a single table, employees1, and calculating their average salary.
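A sketch of such an analytical query, assuming employees1 lives in an iceberg catalog under an hr schema (catalog and schema names are placeholders):

```sql
SELECT job_id,
       COUNT(*)    AS employee_count,
       AVG(salary) AS avg_salary
FROM iceberg.hr.employees1
GROUP BY job_id;
```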
Federated queries
Federated queries combine data across multiple catalogs or schemas. One example is aggregating employee data by department, combining information from an Iceberg table, employees1, and a Hive table, departments, in a single query.
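A sketch of such a federated query; the catalog and schema qualifiers (iceberg.hr, hive.corp) and join key are assumptions:

```sql
SELECT d.department_name,
       COUNT(*)      AS employee_count,
       AVG(e.salary) AS avg_salary
FROM iceberg.hr.employees1 e
JOIN hive.corp.departments d
  ON e.department_id = d.department_id
GROUP BY d.department_name;
```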
Query execution management
Trino provides system-level mechanisms to keep query execution efficient in a multi-tenant environment such as NexusOne. This includes the following:

- Parallel query execution across Kubernetes nodes
- Configurable query management properties that control query behavior and resource usage, such as:
  - retry-policy: Defines how many times and how often to retry a failed query
  - query_max_run_time: Maximum time a query can run before Trino stops it
  - query_max_memory: Maximum memory a query can use during execution
  - query_max_execution_time: Maximum execution time a query can consume
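The session-level properties above can be adjusted per connection; for example:

```sql
-- Cap this session's queries at 30 minutes of runtime
SET SESSION query_max_run_time = '30m';
```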
Trino hands-on examples
This section provides practical SQL examples that demonstrate how to work with schemas, tables, joins, and aggregations in Trino.

Schema and table operations
Trino supports a variety of metadata operations for creating tables, defining views, and modifying existing schema definitions. These operations work directly with the underlying data lake storage and across available catalogs. Certain operations require appropriate permissions or roles.
Create an Iceberg table
You can use CREATE TABLE to create a new table in a database. The table can be empty or populated with data from a query.
The following example creates a new table summarizing employee counts and average salaries
by job ID, providing a quick overview of compensation patterns.
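A sketch of this as a CREATE TABLE AS statement; the target table name and the iceberg.hr qualifier are assumptions:

```sql
CREATE TABLE iceberg.hr.job_salary_summary AS
SELECT job_id,
       COUNT(*)    AS employee_count,
       AVG(salary) AS avg_salary
FROM iceberg.hr.employees1
GROUP BY job_id;
```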
Create a view
You can use CREATE VIEW to define a virtual table representing a saved query. It's useful for simplifying complex queries or providing controlled access.
The following example defines a view for easy access to employees earning more than 5000, without modifying the underlying table.
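A sketch of such a view; the view name, column names, and iceberg.hr qualifier are assumptions:

```sql
CREATE VIEW iceberg.hr.high_earners AS
SELECT employee_id, first_name, last_name, salary
FROM iceberg.hr.employees1
WHERE salary > 5000;
```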
Alter a table
You can use ALTER TABLE to modify an existing table's structure, such as adding, dropping, or modifying columns.
The following example adds a new bonus column to the employees table to store additional
compensation data.
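A sketch of the statement; the column type and iceberg.hr qualifier are assumptions:

```sql
ALTER TABLE iceberg.hr.employees ADD COLUMN bonus DOUBLE;
```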
Available catalogs
Two catalogs are accessible in the NexusOne Trino environment:

- Iceberg: Provides access to tables stored in S3 or HDFS.
Key features include the following:
- Atomic commits, rollback, and schema evolution
- Uses hidden partitioning and metadata-based pruning for efficient scans
- Works well with cloud object storage
- Allows time-travel queries using table snapshots
- Serves curated, frequently queried analytical datasets
- Hive: Provides access to tables in Hive-compatible formats, such as Parquet or ORC. A Hive Metastore stores the metadata while the underlying files reside in S3 or HDFS.
Key features include the following:
- Directory-based partitioning
- Supports multiple file formats and storage layouts
- Used for raw, historical, or legacy datasets
- Provides broad compatibility with existing Hadoop/Hive ecosystems
- Serves foundational lake data or mixed-format workloads
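As an illustration of the time-travel capability listed for the Iceberg catalog above, a query can read a table as of an earlier snapshot; the table name and timestamp are placeholders:

```sql
SELECT *
FROM iceberg.hr.employees1
FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC';
```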
Joins and aggregations
You can perform standard relational joins, cross-catalog joins, and apply optimization hints to improve execution when you know the size characteristics of your tables.

Create a simple join

Use simple joins when querying tables within the same catalog or schema. These follow standard SQL semantics and combine related datasets stored in a single source. The following example joins the employees1 and departments tables to list employee details with their department names.
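A sketch of the join, with both tables in the same (assumed) iceberg.hr schema:

```sql
SELECT e.employee_id,
       e.first_name,
       d.department_name
FROM iceberg.hr.employees1 e
JOIN iceberg.hr.departments d
  ON e.department_id = d.department_id;
```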
Create a cross-catalog join
Cross-catalog joins allow Trino to combine data from different catalogs, such as joining Iceberg tables with Hive tables or external databases. This enables federated analytics without moving or duplicating datasets. The following example joins the employees1 table in Iceberg with the departments table in Hive to list employee details along with their department names.
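A sketch of the cross-catalog join; the schema qualifiers and join key are assumptions:

```sql
SELECT e.employee_id,
       e.first_name,
       d.department_name
FROM iceberg.hr.employees1 e
JOIN hive.corp.departments d
  ON e.department_id = d.department_id;
```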
Use a join optimization hint
Join optimization can be helpful when you have prior knowledge about the size or distribution of the tables involved in a join. In certain scenarios, such as joining a very large fact table with a small dimension table, Trino can optimize performance by broadcasting the small table. Trino copies the small table to every worker node. Each node then processes rows from the large table incrementally and joins them with the broadcast table locally, avoiding redistribution of the large dataset across the cluster. The following example joins the employees1 and departments tables, broadcasting the departments table to optimize the join for better performance.
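Trino controls the join strategy through the join_distribution_type session property; a sketch, in which the schema qualifiers are assumptions:

```sql
-- Force broadcast joins for this session; the build side
-- (departments, on the right) is copied to every worker
SET SESSION join_distribution_type = 'BROADCAST';

SELECT e.employee_id, e.first_name, d.department_name
FROM iceberg.hr.employees1 e
JOIN hive.corp.departments d
  ON e.department_id = d.department_id;
```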
Best practices for query optimization
These best practices help you write efficient queries, complementing the hands-on examples previously described.

Use EXPLAIN for query plans
It’s useful to inspect the query plan before running heavy queries.
EXPLAIN shows how Trino handles joins, filters, and table scans.
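For example, prefixing any query with EXPLAIN returns its plan instead of executing it (table name assumed):

```sql
EXPLAIN
SELECT job_id, AVG(salary) AS avg_salary
FROM iceberg.hr.employees1
GROUP BY job_id;
```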
Push down predicates
Write filters so they can be pushed down into Iceberg or Hive metadata. This reduces I/O and avoids unnecessary S3 reads.

- Bad example:
- Good example:
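As an illustrative sketch (table and column names are assumptions), wrapping a column in an expression typically blocks pushdown, while a direct predicate allows it:

```sql
-- Bad: the function call on hire_date prevents predicate pushdown
SELECT * FROM iceberg.hr.employees1
WHERE year(hire_date) = 2023;

-- Good: a direct range predicate can be pushed down to the connector
SELECT * FROM iceberg.hr.employees1
WHERE hire_date >= DATE '2023-01-01'
  AND hire_date <  DATE '2024-01-01';
```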
Prune columns
Select only the required columns. This helps minimize shuffles and reduces the amount of data pulled from S3.

- Bad example:
- Good example:
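An illustrative sketch (table and column names are assumptions):

```sql
-- Bad: pulls every column from storage
SELECT * FROM iceberg.hr.employees1;

-- Good: reads only the columns the query needs
SELECT employee_id, salary FROM iceberg.hr.employees1;
```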
Optimize joins
As previously described, use join hints when table sizes are predictable. Broadcast a small dimension table to speed up the join, and apply partitioning for large fact-to-dimension joins.

- Bad example:
- Good example:
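A sketch of the two session settings involved, using Trino's standard join_distribution_type values:

```sql
-- Small dimension table: broadcast it to every worker
SET SESSION join_distribution_type = 'BROADCAST';

-- Two large tables: hash-partition both sides across the cluster instead
SET SESSION join_distribution_type = 'PARTITIONED';
```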
Limit large scans
Apply filters and LIMIT clauses during early data exploration to avoid scanning entire Iceberg tables.

- Bad example:
- Good example:
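An illustrative sketch (table, column, and filter values are assumptions):

```sql
-- Bad: full scan of the table during exploration
SELECT * FROM iceberg.hr.employees1;

-- Good: filter and cap the result while exploring
SELECT employee_id, salary
FROM iceberg.hr.employees1
WHERE department_id = 10
LIMIT 100;
```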
Pre-compute aggregations with CTAS or materialized views
For dashboards or recurring workloads, pre-compute heavy aggregations using Create Table As Select (CTAS) or materialized views. This reduces the load on Trino and improves Superset's responsiveness.

Additional resources
- For more details about Trino, refer to the Trino official documentation.
- If you are using the NexusOne portal and want to learn how to launch Trino, refer to the Launch a hosted app page.