Core capabilities
Kyuubi supports several core capabilities, which include:
- JDBC/ODBC access: By supporting the JDBC and ODBC universal database connectivity standards, Kyuubi allows a wide range of clients, such as Tableau, Power BI, DBeaver, and custom apps, to connect.
- Multi-tenancy: Kyuubi serves multiple tenants, which are logical groups of users within a Kyuubi instance. It isolates each tenant’s sessions, resources, configurations, and data access so that one tenant’s workload doesn’t impact another’s.
- Spark engines: Kyuubi is deeply integrated with Apache Spark. It can dynamically launch and manage Spark SQL engine instances on demand, for example, on YARN or Kubernetes, for each user or session. This makes distributed Spark processing available through simple SQL calls.
- SQL gateway: Kyuubi extends the Apache Spark Thrift JDBC/ODBC Server, which exposes a JDBC/ODBC interface over the Thrift protocol. This allows multiple clients to connect and run SQL queries without needing to know the details of the backend execution engine.
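As a rough illustration of the JDBC access described above, the following sketch assembles a HiveServer2-style JDBC URL of the kind a client would typically hand to a Hive JDBC driver. The host name `kyuubi.example.com` is hypothetical, the port `10009` is assumed to be the upstream Kyuubi default, and the `;#key=value` suffix for engine settings is assumed from upstream Kyuubi conventions. Check your NexusOne connection details before relying on any of these.

```python
def build_kyuubi_jdbc_url(host, port=10009, database="default", session_confs=None):
    """Return a HiveServer2-style JDBC URL for a Kyuubi server.

    Assumption: engine settings are appended after '#' and separated by ';'.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if session_confs:
        pairs = ";".join(f"{key}={value}" for key, value in session_confs.items())
        url += f";#{pairs}"
    return url


# Hypothetical host and settings, for illustration only.
url = build_kyuubi_jdbc_url(
    "kyuubi.example.com",
    session_confs={"spark.executor.memory": "4g", "spark.sql.shuffle.partitions": "8"},
)
print(url)
```

A client such as DBeaver or a custom app would paste the resulting string into its connection settings. The sketch only builds the string and doesn't open a connection.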
Supported SQL engines
Kyuubi supports two types of SQL engines, which include:
- Spark SQL: This is the primary and most feature-complete engine supported by Kyuubi. It leverages Spark’s distributed computing power for complex ETL, batch processing, and SQL analytics.
- Optional engines such as Trino or Flink: Kyuubi’s architecture is pluggable. While Spark is the default, Kyuubi can also connect to other SQL engines, such as Trino for low-latency interactive querying or Apache Flink for real-time stream processing. Feature availability depends on the backend engine used.
Architecture
Kyuubi uses a client-server architecture that separates the client interface from the processing engines that do the heavy lifting. This section describes how the components interact with each other.
- Client: The app or user submitting SQL queries via a JDBC or ODBC interface.
- Kyuubi server: This is the central gateway node. It’s a stateless service that clients connect to. Its responsibilities include the following:
  - Authenticating clients
  - Managing JDBC/ODBC sessions
  - Parsing and routing client requests
  - Managing the lifecycle of backend engines
- Execution layer: This comprises the orchestrator and the engine. When a client connects, the Kyuubi server requests this layer to deploy an engine.
  - YARN or Kubernetes: The app orchestrator. It deploys a YARN container or Kubernetes pod containing an engine.
  - Engine: A worker process that executes the SQL queries. A few examples include:
    - Spark engine: A long-lived Spark driver process deployed for query execution. Kyuubi can map client sessions one-to-one or allow multiple sessions to share the same engine.
    - Trino engine: A Trino coordinator node deployed for query execution. As with Spark, sessions can either share an engine or each get an isolated one.
- Hive Metastore: Kyuubi doesn’t store table metadata on its own. The Hive Metastore (HMS) serves as the catalog for tables by providing metadata such as schema, partitions, and locations. This setup provides features such as:
  - Time travel
  - Schema evolution
  - Partition evolution
  - Efficient inserts/updates/upserts
  - Atomic commits
- S3 data lake: The storage layer where data files reside. In NexusOne, Iceberg is the primary table format. The engines read or write data here based on metadata from the Hive Metastore.
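The session-to-engine mapping described for the execution layer can be pictured with a small model. This is a simplified sketch, not Kyuubi’s implementation. The level names `CONNECTION`, `USER`, `GROUP`, and `SERVER` are assumed from upstream Kyuubi’s `kyuubi.engine.share.level` setting, and the helper functions are hypothetical. Which levels are enabled in NexusOne isn’t stated here.

```python
def engine_key(share_level, user, session_id, group=None):
    """Return a key naming the engine a session would be routed to (toy model)."""
    if share_level == "CONNECTION":
        return ("CONNECTION", session_id)  # one engine per session
    if share_level == "USER":
        return ("USER", user)  # all sessions of one user share an engine
    if share_level == "GROUP":
        return ("GROUP", group or user)  # in this model, fall back to the user
    if share_level == "SERVER":
        return ("SERVER",)  # every session shares one engine
    raise ValueError(f"unknown share level: {share_level}")


def count_engines(share_level, sessions):
    """Count the distinct engines a list of (user, session_id) pairs would need."""
    return len({engine_key(share_level, user, sid) for user, sid in sessions})


# Three hypothetical sessions: two from alice, one from bob.
sessions = [("alice", "s1"), ("alice", "s2"), ("bob", "s3")]
for level in ("CONNECTION", "USER", "SERVER"):
    print(level, count_engines(level, sessions))
```

In this model, stricter isolation (one engine per session) costs more engines, while broader sharing reuses an already running engine and avoids startup time.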
Exploring the Kyuubi UI
The Kyuubi UI is a built-in web-based interface that provides real-time visibility into the server’s operations, active users, and running queries. It’s an essential tool for both administrators and developers for monitoring and debugging. The Kyuubi UI is available at the following designated URL:
When you purchase NexusOne, you receive a client name. Replace client with your assigned client name.
Overview
The Overview page displays the main dashboard, which gives a high-level summary of the server uptime, version, and aggregate metrics such as the total sessions and executed statements.
Management
The Management page displays information about sessions, operations, Kyuubi engines, and Kyuubi servers.
Management sidebar on the UI
- Session: The Session page displays a table with the following details:
  - User: Name of the user who created the JDBC or ODBC session
  - Engine ID: ID of the engine used by the user
  - Client IP: IP address of the client that made a request
  - Kyuubi Instance: FQDN and port of the Kyuubi server that registered the session
  - Session ID: ID of the session created by the user
  - Create Time: Session creation time
  - Operation: Actions available for that session via the web UI
- Operations: The Operations page displays a table with the following details:
  - User: Name of the user who started the operation
  - Operation ID: ID of the operation requested by the user
  - Statement: SQL text of the statement
  - State: Current state of the operation
  - State Time: Amount of time the operation has spent in its current state
  - Completed Time: End time of the operation
  - Duration: Amount of time that the operation has been running
  - Operation: Actions available for that operation via the web UI
- Engines: The Engines page displays a table with the following details:
  - Engine Address: IP address of the engine’s server
  - Engine ID: ID of the engine
  - Engine Type: Type of engine. The current version of the web UI only shows SPARK-SQL engines
  - Share Level: Shows who can use this engine
  - User: Name of the user who created the engine
  - Version: Version of the Kyuubi server associated with this engine
  - Operation: Actions available for that engine via the web UI
- Server: The Server page displays a table with the following details:
  - Server IP: Kyuubi server’s IP address
  - Namespace: Namespace to which the server belongs
  - Kyuubi Instance: FQDN and port of the server
  - Version: Version of the Kyuubi server
  - State: Current state of the server
Swagger
The Swagger page displays an interactive Kyuubi REST API reference. It lets you send requests to Kyuubi from a web interface. To send a request, click the desired HTTP method, select Try it out, enter parameters if necessary, and then click Execute.
Swagger sidebar on the UI
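The same kind of request that the Swagger page sends can also be issued from a script. The sketch below builds, but doesn’t send, a request to list open sessions. The `/api/v1/sessions` path is assumed from upstream Kyuubi’s REST API, while the host `kyuubi.example.com`, the credentials, and the use of HTTP Basic authentication are illustrative assumptions. Confirm the actual paths and authentication scheme on the Swagger page of your deployment.

```python
import base64
import urllib.request


def list_sessions_request(base_url, user, password):
    """Build (but do not send) a GET request for the server's open sessions."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/sessions",  # path assumed from upstream Kyuubi
        method="GET",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


# Hypothetical host and credentials, for illustration only.
request = list_sessions_request("https://kyuubi.example.com", "alice", "secret")
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen` and decode the JSON response body.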
SQL editor
The SQL editor page displays an interactive interface where you can directly write, format, and execute SQL queries against the Kyuubi server. It’s a quick way to run queries and view logs without an external client. When you run a query, this page shows the following:
- Result: The result of the query in a tabular format.
- Log: Provides a history of executed SQL statements, including the query text, execution time, and status. It also gives access to the detailed logs and error messages needed to troubleshoot failed operations.
Additional resources
- To learn practical ways to use Kyuubi in the NexusOne environment, refer to the Kyuubi hands-on examples page.
- For more details about Kyuubi, refer to the Kyuubi official documentation.