Deprecations
Tools or features that are deprecated on the NexusOne platform.

Apache Knox
- Nginx + OAuth 2.0 Proxy replaced the Apache Knox gateway for authentication and authorization routing
- Simplified architecture, better Keycloak/OIDC integration, and native JWT cookie support for Ranger SSO
- All existing deployments require a migration from Knox to OAuth 2.0 Proxy
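For orientation, a minimal OAuth 2.0 Proxy configuration for this kind of setup might look like the following sketch. The issuer URL, client ID, upstream address, and secrets are placeholders, not values from the platform:

```toml
# oauth2-proxy.cfg — sketch only; hostnames and secrets are placeholders
provider = "oidc"
oidc_issuer_url = "https://keycloak.example.com/realms/nexusone"
client_id = "nx1-gateway"
client_secret = "<from-secret-store>"
redirect_url = "https://gateway.example.com/oauth2/callback"
upstreams = ["http://ranger.default.svc.cluster.local:6080"]
cookie_secret = "<random-32-bytes>"
cookie_secure = true
```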
Bitnami resources

Community-maintained components or direct alternatives replaced all Bitnami Helm charts due to licensing changes. The following table describes the affected components:

| Component | Bitnami chart | Replacement |
|---|---|---|
| Kafka | bitnami/kafka | Custom KRaft StatefulSet |
| PostgreSQL | bitnami/postgresql | groundhog2k/postgres |
| Keycloak | bitnami/keycloak | codecentric/keycloakx |
| Various utilities | bitnami/* | Direct manifests or community charts |
- Kafka now runs in KRaft mode with a two-broker setup for DataHub compatibility.
- PostgreSQL charts include backup job support, compatible Kubernetes PVC naming, and custom `max_connections` configuration.
- Keycloak upgraded to Keycloak 26+ with an external PostgreSQL DB and Kubernetes caching support.
Enhancements
Enhancements to existing app features on the NexusOne platform.

JupyterHub S3 browser extension
Major enhancements to the JupyterLab S3 browser plugin, providing a full-featured file management interface with enterprise security. It includes the following changes:

- Multi-bucket and multi-endpoint support
  - Consolidated bucket listing from all configured endpoints, so you don’t need to switch endpoints manually
  - Can now map each bucket to a different S3 endpoint with Hadoop’s `core-site.xml` `fs.s3a.bucket.<name>.endpoint` configuration working behind the scenes
  - Each bucket can now load credentials from its own Java Cryptography Extension Key Store (JCEKS) file
  - Seamless navigation across buckets from different S3-compatible storage systems
- Full Apache Ranger integration
  - When listing buckets, Ranger user permissions are now checked
  - All file operations, such as read, write, or delete, now trigger a Ranger check
  - Real-time permission checks on navigation and actions
  - Users only see content they’re authorized to access
- Improved user experience
  - Content previews for file formats such as `.txt`, `.csv`, and more
  - An option to download files into the Jupyter workspace or directly to your local machine
  - Downloading a directory automatically zips the content
  - Large files are now read and sent in small chunks using file streaming
- Enhanced file operations

  | Feature | Description |
  |---|---|
  | Recursive delete | Delete directories and all contents with a single action |
  | Rename/Move | Rename files and folders within the browser |
  | Create directories | Create new folders directly in the UI |
  | Directory download | Download entire directories as `.zip` files |
  | Recursive upload | Upload local directories while preserving folder structure |
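The per-bucket endpoint mapping above relies on Hadoop S3A’s per-bucket configuration mechanism. A sketch of what the relevant `core-site.xml` entries could look like; the bucket names and endpoint URLs here are examples, not platform defaults:

```xml
<configuration>
  <!-- Example only: map two buckets to different S3-compatible endpoints -->
  <property>
    <name>fs.s3a.bucket.sales-data.endpoint</name>
    <value>https://minio.tenant-a.example.com</value>
  </property>
  <property>
    <name>fs.s3a.bucket.archive.endpoint</name>
    <value>https://ceph.storage.example.com</value>
  </property>
  <!-- Per-bucket credentials loaded from a JCEKS keystore -->
  <property>
    <name>fs.s3a.bucket.sales-data.security.credential.provider.path</name>
    <value>jceks://file/jceks/sales-data.jceks</value>
  </property>
</configuration>
```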
New features
New features recently added to the NexusOne platform.

Active Directory integration

You can now automatically onboard an Active Directory for both shared services and tenant-specific Keycloak realms. It includes the following capabilities:

- Shared services realm
  - Platform administrators can now authenticate against a centralized AD through Keycloak federation
  - Keycloak now synchronizes AD groups and maps them to RBAC roles
  - Usernames are now lowercased to prevent case-sensitivity issues
- Tenant realm automation
  - On tenant creation, LDAP/AD federation is automatically configured in Keycloak
  - You can now configure filters to limit which users or groups you import from AD into Keycloak
  - Users and groups from AD are now synchronized to Keycloak on a schedule
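The import filters mentioned above are standard LDAP filters. A hypothetical example that would limit the sync to members of a single AD group (the group DN is a placeholder):

```
(&(objectClass=user)(memberOf=CN=NX1-Users,OU=Groups,DC=corp,DC=example,DC=com))
```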
Gravitino REST catalogs
With the new addition of a Gravitino REST catalog service, you now have access to unified metadata management across NexusOne. The Gravitino REST catalog service ships with the following catalog types:

- Central Hive metastore catalog
  - You have access to a Hive Metastore for technical metadata
  - You can apply platform-wide governance automatically to all datasets
  - You can access your data directly in S3A storage
- Tenant JDBC catalog
  - Each tenant gets its own dedicated JDBC catalog for isolated metadata management
  - It uses a PostgreSQL database for isolation
  - You can securely access S3 data with automatically issued temporary credentials per tenant
- Authentication to Gravitino REST calls uses a Keycloak-issued OAuth 2.0 token
- Spark now connects to the Gravitino REST catalogs with a Keycloak credential and an OAuth 2.0 token
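Assuming the catalogs are exposed over the Iceberg REST protocol, the Spark connection above could be configured along these lines; the catalog name, URLs, and credential are placeholders:

```properties
# spark-defaults.conf — sketch only; names and URLs are placeholders
spark.sql.catalog.lake=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.lake.type=rest
spark.sql.catalog.lake.uri=https://gravitino.example.com/iceberg/
spark.sql.catalog.lake.credential=<client-id>:<client-secret>
spark.sql.catalog.lake.oauth2-server-uri=https://keycloak.example.com/realms/nexusone/protocol/openid-connect/token
```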
Kyuubi batch-submit
A new Python command-line tool for submitting and monitoring Apache Kyuubi batch jobs with full support for local file uploads. It includes the following capabilities:

- Supports file uploads
  - Automatically detects local files vs remote S3 or HDFS URIs
  - Uploads local JARs, Python, and data files directly via the Kyuubi REST API
  - Supports mixed submissions combining local uploads with remote resources in a single job
  - Supports command-line flags for different file types
    - `--resource`: Main resource containing a JAR or Python file
    - `--jars`: JAR files
    - `--pyfiles`: Python files
    - `--files`: Data/config files
- Real-time job monitoring with status updates and progress spinner
- Automatic log retrieval upon job completion
- YAML configuration support for reusable job configurations
- Flexible authentication via command-line flags, YAML, environment variables, or an interactive prompt
- Spark History Server integration with formatted URLs
- YuniKorn queue support for Kubernetes deployments
- Exit codes for scripting and automation
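The local-vs-remote detection described above can be sketched roughly as follows. The helper names and the exact scheme list are assumptions for illustration, not the tool’s actual implementation:

```python
from urllib.parse import urlparse

def needs_upload(uri: str) -> bool:
    """Return True when the resource is a local file that must be uploaded.

    Anything without a scheme, with a file:// scheme, or with a one-letter
    scheme (a Windows drive letter like C:\\) is treated as local; schemes
    such as s3a:// or hdfs:// are left for Kyuubi to resolve remotely.
    """
    scheme = urlparse(uri).scheme
    return scheme in ("", "file") or len(scheme) == 1

def split_resources(uris: list[str]) -> tuple[list[str], list[str]]:
    """Partition --jars/--pyfiles/--files values into (local uploads, remote refs)."""
    local = [u for u in uris if needs_upload(u)]
    remote = [u for u in uris if not needs_upload(u)]
    return local, remote
```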
NexusOne operator
A Kubernetes Operator is now available for declarative management of shared services and tenant deployments. It includes the following capabilities:

- Platform deployment automation
  - `deploy.nx1.io/v1alpha1` Custom Resource Definition (CRD)
  - Terraform-based deployment of PostgreSQL, Keycloak, DataHub, and the tenant manager UI/API
  - Terraform lifecycle actions: `plan`, `apply`, and `destroy`
  - Flexible deployment component modes:
    - `core`: Deploys foundational services such as PostgreSQL, Keycloak, and networking
    - `modules`: Deploys individual platform components, such as DataHub, only if core services are already deployed
    - `all`: Deploys the full platform stack, including core services and all modules
  - Option to define an S3 Terraform remote backend
- Ingress configuration
  - Supports `ocp`, `nginx`, and `traefik`
- SSL management
  - Supports `venafi`, `cert-manager`, and `self-signed` certificates
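A custom resource for the operator might look like the following sketch. Apart from the `deploy.nx1.io/v1alpha1` API version, the kind and every field name here are assumptions for illustration only:

```yaml
apiVersion: deploy.nx1.io/v1alpha1
kind: PlatformDeployment        # hypothetical kind name
metadata:
  name: nexusone-core
spec:
  mode: core                    # one of: core, modules, all
  action: apply                 # Terraform lifecycle: plan, apply, destroy
  backend:
    s3:                         # optional S3 remote backend for Terraform state
      bucket: nx1-tfstate
      region: eu-central-1
  ingress:
    type: nginx                 # ocp, nginx, or traefik
  ssl:
    issuer: cert-manager        # venafi, cert-manager, or self-signed
```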
s3Cli

A new S3 command-line tool, `s3Cli`, is replacing the AWS command-line tool in Jupyter environments. The S3 command-line tool is fully integrated with NexusOne’s multi-bucket architecture and Apache Ranger authorization. It includes the following capabilities:

- Ability to map each bucket to a different S3 endpoint with Hadoop’s `core-site.xml` `fs.s3a.bucket.<name>.endpoint` configuration working behind the scenes
- Supports multiple credentials
  - Each bucket loads its credentials from its own Java Cryptography Extension Key Store (JCEKS) keystore at `/jceks/<bucket>.jceks`
  - Bucket authorization falls back to a default keystore, `/jceks/default.jceks`, if no bucket-specific credentials exist
- Mandatory authorization enforcement through Apache Ranger
- Ability to manage buckets, directories, and files
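The keystore fallback described above amounts to a small resolution rule; a sketch, with a hypothetical helper name and an injectable `exists` check so the behavior is testable:

```python
import os

JCEKS_DIR = "/jceks"  # keystore location described in the release notes

def keystore_for_bucket(bucket: str, exists=os.path.exists) -> str:
    """Resolve the JCEKS keystore path for a bucket.

    Mirrors the documented order: /jceks/<bucket>.jceks first,
    then the shared fallback /jceks/default.jceks.
    """
    candidate = f"{JCEKS_DIR}/{bucket}.jceks"
    return candidate if exists(candidate) else f"{JCEKS_DIR}/default.jceks"
```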
nx1-sdk

A Python SDK, `nx1-sdk`, is now available for programmatic interaction with the NexusOne platform services.
It includes the following capabilities:
- Authentication
  - Keycloak OAuth 2.0 integration
  - Access token lifecycle management
  - Service account support
- Data operations
  - Encrypted file ingestion
  - Schema management
  - Table operations such as create, drop, or alter
- Platform integration
  - Policy management via Apache Ranger
  - Catalog operations via Apache Gravitino
  - S3 bucket management
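The SDK’s public API is not shown here, but the access-token lifecycle idea above can be sketched independently: cache a client-credentials token and refresh it shortly before expiry. The class and the `fetch` callback are illustrative stand-ins, not the SDK’s actual interface:

```python
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TokenManager:
    """Caches an OAuth 2.0 access token, refreshing it before it expires.

    `fetch` stands in for the Keycloak client-credentials call; it must
    return (access_token, expires_in_seconds).
    """
    fetch: Callable[[], tuple[str, int]]
    leeway: int = 30  # refresh this many seconds before actual expiry
    _token: str = field(default="", init=False)
    _expires_at: float = field(default=0.0, init=False)

    def token(self, now: Callable[[], float] = time.time) -> str:
        if now() >= self._expires_at:
            self._token, expires_in = self.fetch()
            self._expires_at = now() + expires_in - self.leeway
        return self._token
```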
Unified Ranger services

A new unified Ranger service, `nx1-unifiedsql`, is now available. It provides consistent authorization policies to Trino, Spark, and S3. One policy model governs everything.
It includes the following capabilities:
- One unified policy service
- Hierarchical SQL authorization, `catalog > schema > table > column`, so there is fine-grained access control at different levels
- Fine-grained data-level controls to hide columns and rows
- S3 access uses a URL-based Ranger policy with read/write controls
- You can use wildcards in S3 paths to apply policies to multiple objects at once
- Ranger now enforces policies correctly on all Spark catalogs, not just the default one
- Allow/Deny policy evaluation now happens in a consistent order across both Spark and Trino
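The wildcard matching on S3 paths described above behaves roughly like shell-style globbing. A minimal sketch; real Ranger additionally evaluates allow/deny conditions, user and group matching, and recursion flags:

```python
from fnmatch import fnmatchcase

def policy_allows(policy_paths: list[str], requested: str) -> bool:
    """Check an S3 object path against wildcard policy resources.

    Illustration only: shows the wildcard part of matching, where one
    pattern such as s3a://lake/raw/* covers many objects at once.
    """
    return any(fnmatchcase(requested, pattern) for pattern in policy_paths)
```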
Upgrades
Version upgrades to existing apps on the NexusOne platform.

Airflow v3.1 upgrade

Upgraded to Airflow v3.1. This brings significant architectural changes and new features such as the following:
- API migration
  - REST API `/api/v1/*` endpoints are no longer available
  - Migrated all integrations to API `/api/v2/*`
  - `execution_date` replaced with `logical_date`
  - DateTime formats are now RFC3339-compliant
- Authentication changes
  - Moved the Flask App Builder (FAB) authentication manager to a provider package
  - Installed `apache-airflow-providers-fab` for OAuth 2.0 or LDAP authentication
  - Changed the OAuth 2.0 callback URL from `/oauth-authorized/keycloak` to `/auth/oauth-authorized/keycloak`
  - Updated Keycloak redirect URIs accordingly
- Task Execution Interface (TEI)
  - New SDK-based task execution architecture
  - Workers now communicate via the internal API server
  - Configures `AIRFLOW__CORE__INTERNAL_API_URL` for distributed deployments
- Database changes
  - Requires a new `session` table for the FAB provider
  - After an upgrade, run `airflow db migrate`
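The API changes above can be sketched as a small migration helper; the function is hypothetical, not part of Airflow, and simply applies the documented renames:

```python
from datetime import datetime

def migrate_call(path: str, params: dict) -> tuple[str, dict]:
    """Rewrite an Airflow REST call from the removed v1 API to v2.

    Renames the /api/v1/ path prefix to /api/v2/ and the execution_date
    parameter to logical_date, as described in the upgrade notes.
    """
    new_path = path.replace("/api/v1/", "/api/v2/", 1)
    new_params = {("logical_date" if k == "execution_date" else k): v
                  for k, v in params.items()}
    return new_path, new_params

# RFC3339 timestamps, as returned by the v2 API, parse cleanly:
ts = datetime.fromisoformat("2025-01-15T08:30:00+00:00")
```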
Auth configuration updates

The following `auth_manager` and `auth_backends` settings reflect the new provider-based package for the FAB auth manager.
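A sketch of those settings in `airflow.cfg`; the `auth_manager` path follows the FAB provider package layout, and the `auth_backends` value shown is one common choice that may differ in your deployment:

```ini
[core]
auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager

[api]
auth_backends = airflow.providers.fab.auth_manager.api.auth.backend.session
```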
Migration steps

Before upgrading, use the following steps:

- Back up your existing database
- Update your DAGs for Airflow 3 compatibility
- Run `airflow db migrate`
- Update OAuth 2.0 redirect URIs in Keycloak
- Verify that API integrations now use v2 endpoints
DataHub v1.3.0.1 upgrade
Upgraded DataHub to v1.3.0.1 for improved metadata management and lineage tracking.
The key changes include the following:
- Bootstrap process
  - New bootstrap dependency handling
  - The `system-update` job now runs before the Generalized Metadata Service (GMS) starts
- Policy management
  - Improved policy population on startup
  - Domain-level access controls
  - Enhanced RBAC for data assets
- Airflow integration
  - Updated DataHub Airflow plugin for Airflow v3 compatibility
  - OpenLineage-based collection
  - `RuntimeTaskInstance` support to track state changes of a task and manage the environment
Keycloak v26.0.5 upgrade
Upgraded Keycloak to v26.0.5 using the codecentric/keycloakx Helm chart.
The key changes include the following:
- Deployment
  - Runtime moved from WildFly to Quarkus
  - Support for JGroups DNS-based discovery for Kubernetes cache clustering
  - External PostgreSQL backend instead of the default internal relational database
- Configuration
  - `KC_*` prefix is now a standardized way to configure environment variables
  - Proxy mode support: `edge` for TLS termination at ingress
  - Health endpoints exposed at `/health/live` and `/health/ready`
- Breaking changes
  - Admin account setup now uses `KC_BOOTSTRAP_ADMIN_USERNAME` and `KC_BOOTSTRAP_ADMIN_PASSWORD` environment variables
  - Truststore configuration now uses the `KC_SPI_TRUSTSTORE_FILE_*` environment variable prefix
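The breaking changes above translate into container environment settings roughly like the following sketch; the `extraEnv` key and all values are placeholders and depend on your chart values layout:

```yaml
# Keycloak 26 container environment — sketch only; values are placeholders
extraEnv:
  - name: KC_BOOTSTRAP_ADMIN_USERNAME
    value: admin
  - name: KC_BOOTSTRAP_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: keycloak-admin
        key: password
  - name: KC_DB
    value: postgres
  - name: KC_HEALTH_ENABLED
    value: "true"
```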