- Self-service analytics across curated and raw datasets
- OpenID Connect (OIDC) for unified authentication and authorization
- Federated SQL queries across Iceberg, Hive, PostgreSQL, and more
- BI dashboards powered by Superset
- DataOps workflows for validating and inspecting catalog data
- AI-assisted SQL generation
Environment configuration
This section describes how the NexusOne team configured Trino, and how different client tools connect to the service.

Accessing the Trino UI
Trino provides a web UI that displays details about the cluster. You can access the Trino UI on the NexusOne cluster using the following URL:

When you purchase NexusOne, you receive a client name. Replace client with your assigned client name.

Trino client setup
Within NexusOne, Trino supports OAuth 2.0 authenticated connections for JDBC, Python, and command-line clients.

Python
Example of connecting to Trino using Python. Replace user and password with your assigned credentials.
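A minimal sketch using the open-source trino Python package, assuming OAuth 2.0 external authentication; the hostname and user are placeholders:

```python
# Illustrative sketch: connect to Trino with OAuth 2.0 external authentication.
# Replace the placeholder hostname and user with your NexusOne values.
import trino

conn = trino.dbapi.connect(
    host="trino.example.nexusone.com",  # placeholder hostname
    port=443,
    http_scheme="https",
    user="user",  # placeholder user
    auth=trino.auth.OAuth2Authentication(),  # opens a browser for the OAuth flow
)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())
```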
JDBC
Example of connecting to Trino using JDBC.

Command-line tool
Example of connecting to Trino using the command-line tool. Replace user with your username.
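For reference, the JDBC driver enables the same OAuth 2.0 flow through the SSL=true and externalAuthentication=true connection properties, for example jdbc:trino://trino.example.nexusone.com:443?SSL=true&externalAuthentication=true (the hostname is a placeholder). An illustrative sketch of the command-line connection, again with a placeholder hostname and user:

```shell
# Illustrative sketch: the Trino CLI with OAuth 2.0 external authentication.
trino --server https://trino.example.nexusone.com:443 \
  --external-authentication \
  --user user
```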
Iceberg/Hive integration
Trino integrates directly with the NexusOne data platform through preconfigured Iceberg and Hive catalogs. These catalogs use S3 as the underlying storage layer for all datasets and a Hive Metastore to provide centralized table and schema metadata.

Superset integration
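As a reference, Superset registers Trino as a database through a SQLAlchemy-style URI; a minimal sketch, in which the username, hostname, and default catalog are placeholders:

```text
trino://user@trino.example.nexusone.com:443/iceberg
```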
Superset connects to Trino using OAuth 2.0. Each Trino catalog appears as a database, and each schema inside a catalog is available for querying in SQL Lab, Superset's interactive SQL editor.

Types of Trino queries
Trino supports different types of queries depending on the purpose and data sources. Understanding these types helps you choose the right approach when analyzing data.

Analytical queries
Analytical queries interact with data within a single catalog or schema. One example is counting employees per job role in a single table, employees1, and calculating their average salary.
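A sketch of such an analytical query, assuming employees1 lives in an iceberg catalog under an hr schema (catalog and schema names are placeholders):

```sql
SELECT job_id,
       COUNT(*)    AS employee_count,
       AVG(salary) AS avg_salary
FROM iceberg.hr.employees1
GROUP BY job_id;
```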
Federated queries
Federated queries combine data across multiple catalogs or schemas. One example is aggregating employee data by department, combining information from an Iceberg table, employees1, and a Hive table, departments, in a single query.
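A sketch of such a federated query; the catalog and schema qualifiers (iceberg.hr, hive.corp) and join key are assumptions:

```sql
SELECT d.department_name,
       COUNT(*)      AS employee_count,
       AVG(e.salary) AS avg_salary
FROM iceberg.hr.employees1 e
JOIN hive.corp.departments d
  ON e.department_id = d.department_id
GROUP BY d.department_name;
```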
Query execution management
Trino provides system-level mechanisms to keep query execution efficient in a multi-tenant environment such as NexusOne. This includes the following:

- Parallel query execution across Kubernetes nodes
- Configurable query management properties that control query behavior and resource usage, such as:
  - retry-policy: Defines how many times and how often to retry a failed query
  - query_max_run_time: Maximum time a query can run before Trino stops it
  - query_max_memory: Maximum memory a query can use during execution
  - query_max_execution_time: Maximum execution time a query can consume
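The session-level properties above can be adjusted per connection; for example:

```sql
-- Cap this session's queries at 30 minutes of runtime
SET SESSION query_max_run_time = '30m';
```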
Trino hands-on examples
This section provides practical SQL examples that demonstrate how to work with schemas, tables, joins, and aggregations in Trino.

Schema and table operations
Trino supports a variety of metadata operations for creating tables, defining views, and modifying existing schema definitions. These operations work directly with the underlying data lake storage and across available catalogs. Certain operations require appropriate permissions or roles.
Create an Iceberg table
You can use CREATE TABLE to create a new table in a database. The table can be empty or populated with data from a query.
The following example creates a new table summarizing employee counts and average salaries
by job ID, providing a quick overview of compensation patterns.
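A sketch of this as a CREATE TABLE AS statement; the target table name and the iceberg.hr qualifier are assumptions:

```sql
CREATE TABLE iceberg.hr.job_salary_summary AS
SELECT job_id,
       COUNT(*)    AS employee_count,
       AVG(salary) AS avg_salary
FROM iceberg.hr.employees1
GROUP BY job_id;
```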
Create a view
You can use CREATE VIEW to define a virtual table representing a saved query. It's useful for simplifying complex queries or providing controlled access.
The following example defines a view for easy access to employees earning more than 5000, without modifying the underlying table.
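A sketch of such a view; the view name, column names, and iceberg.hr qualifier are assumptions:

```sql
CREATE VIEW iceberg.hr.high_earners AS
SELECT employee_id, first_name, last_name, salary
FROM iceberg.hr.employees1
WHERE salary > 5000;
```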
Alter a table
You can use ALTER TABLE to modify an existing table's structure, such as adding, dropping, or modifying columns.
The following example adds a new bonus column to the employees table to store additional
compensation data.
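A sketch of the statement; the column type and iceberg.hr qualifier are assumptions:

```sql
ALTER TABLE iceberg.hr.employees ADD COLUMN bonus DOUBLE;
```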
Available catalogs
Two catalogs are accessible in the NexusOne Trino environment:

- Iceberg: Provides access to tables stored in S3 or HDFS.
Key features include the following:
- Atomic commits, rollback, and schema evolution
- Uses hidden partitioning and metadata-based pruning for efficient scans
- Works well with cloud object storage
- Allows time-travel queries using table snapshots
- Serves curated, frequently queried analytical datasets
- Hive: Provides access to tables in Hive-compatible formats, such as Parquet or ORC. A Hive Metastore stores the metadata while the underlying files reside in S3 or HDFS.
Key features include the following:
- Directory-based partitioning
- Supports multiple file formats and storage layouts
- Used for raw, historical, or legacy datasets
- Provides broad compatibility with existing Hadoop/Hive ecosystems
- Serves foundational lake data or mixed-format workloads
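As an illustration of the time-travel capability listed for the Iceberg catalog above, a query can read a table as of an earlier snapshot; the table name and timestamp are placeholders:

```sql
SELECT *
FROM iceberg.hr.employees1
FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC';
```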
Joins and aggregations
You can perform standard relational joins, cross-catalog joins, and apply optimization hints to improve execution when you know the size characteristics of your tables.

Create a simple join

Use simple joins when querying tables within the same catalog or schema. These follow standard SQL semantics and combine related datasets stored in a single source. The following example joins the employees1 and departments tables to list employee details with their department names.
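A sketch of the join, with both tables in the same (assumed) iceberg.hr schema:

```sql
SELECT e.employee_id,
       e.first_name,
       d.department_name
FROM iceberg.hr.employees1 e
JOIN iceberg.hr.departments d
  ON e.department_id = d.department_id;
```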
Create a cross-catalog join
Cross-catalog joins allow Trino to combine data from different catalogs, such as joining Iceberg tables with Hive tables or external databases. This enables federated analytics without moving or duplicating datasets. The following example joins the employees1 table in Iceberg with the departments table in Hive to list employee details along with their department names.
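A sketch of the cross-catalog join; the schema qualifiers and join key are assumptions:

```sql
SELECT e.employee_id,
       e.first_name,
       d.department_name
FROM iceberg.hr.employees1 e
JOIN hive.corp.departments d
  ON e.department_id = d.department_id;
```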
Use a join optimization hint
Join optimization can be helpful when you have prior knowledge about the size or distribution of the tables involved in a join. In certain scenarios, such as joining a very large fact table with a small dimension table, Trino can optimize performance by broadcasting the small table. Trino copies the small table to every worker node. Each node then processes rows from the large table incrementally and joins them with the broadcast table locally, avoiding redistribution of the large dataset across the cluster. The following example joins the employees1 and departments tables, broadcasting the departments table to optimize the join for better performance.
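Trino controls the join strategy through the join_distribution_type session property; a sketch, in which the schema qualifiers are assumptions:

```sql
-- Force broadcast joins for this session; the build side
-- (departments, on the right) is copied to every worker
SET SESSION join_distribution_type = 'BROADCAST';

SELECT e.employee_id, e.first_name, d.department_name
FROM iceberg.hr.employees1 e
JOIN hive.corp.departments d
  ON e.department_id = d.department_id;
```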
Best practices for query optimization
These best practices help you write efficient queries, complementing the hands-on examples previously described.

Use EXPLAIN for query plans
It’s useful to inspect the query plan before running heavy queries.
EXPLAIN shows how Trino handles joins, filters, and table scans.
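For example, prefixing any query with EXPLAIN returns its plan instead of executing it (table name assumed):

```sql
EXPLAIN
SELECT job_id, AVG(salary) AS avg_salary
FROM iceberg.hr.employees1
GROUP BY job_id;
```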
Push down predicates
Write filters so they can be pushed down into Iceberg or Hive metadata. This reduces I/O and avoids unnecessary S3 reads.

- Bad example:
- Good example:
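As an illustrative sketch (table and column names are assumptions), wrapping a column in an expression typically blocks pushdown, while a direct predicate allows it:

```sql
-- Bad: the function call on hire_date prevents predicate pushdown
SELECT * FROM iceberg.hr.employees1
WHERE year(hire_date) = 2023;

-- Good: a direct range predicate can be pushed down to the connector
SELECT * FROM iceberg.hr.employees1
WHERE hire_date >= DATE '2023-01-01'
  AND hire_date <  DATE '2024-01-01';
```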
Prune columns
Select only the required columns. This helps minimize shuffles and reduces the amount of data pulled from S3.

- Bad example:
- Good example:
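An illustrative sketch (table and column names are assumptions):

```sql
-- Bad: pulls every column from storage
SELECT * FROM iceberg.hr.employees1;

-- Good: reads only the columns the query needs
SELECT employee_id, salary FROM iceberg.hr.employees1;
```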
Optimize joins
As previously described, use join hints when table sizes are predictable. Broadcast a small dimension table to speed up the join, and apply partitioning for large fact-to-dimension joins.

- Bad example:
- Good example:
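A sketch of the two session settings involved, using Trino's standard join_distribution_type values:

```sql
-- Small dimension table: broadcast it to every worker
SET SESSION join_distribution_type = 'BROADCAST';

-- Two large tables: hash-partition both sides across the cluster instead
SET SESSION join_distribution_type = 'PARTITIONED';
```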
Limit large scans
Apply filters and LIMIT clauses during early data exploration to avoid scanning entire Iceberg tables.

- Bad example:
- Good example:
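An illustrative sketch (table, column, and filter values are assumptions):

```sql
-- Bad: full scan of the table during exploration
SELECT * FROM iceberg.hr.employees1;

-- Good: filter and cap the result while exploring
SELECT employee_id, salary
FROM iceberg.hr.employees1
WHERE department_id = 10
LIMIT 100;
```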
Pre-compute aggregations with CTAS or materialized views
For dashboards or recurring workloads, pre-compute heavy aggregations using Create Table As Select (CTAS) or materialized views. This reduces the load on Trino and improves Superset's responsiveness.

Additional resources
- For more details about Trino, refer to the Trino official documentation.
- If you are using the NexusOne portal and want to learn how to launch Trino, refer to the Launch a hosted app page.