
New features

New features recently added to the NexusOne platform.

Kyuubi batch-submit

A new Python command-line tool for submitting and monitoring Apache Kyuubi batch jobs with full support for local file uploads. It includes the following capabilities:
  • Supports file uploads
    • Automatically detects local files vs remote S3 or HDFS URIs
    • Uploads local JARs, Python, and data files directly via the Kyuubi REST API
    • Supports mixed submissions combining local uploads with remote resources in a single job
  • Supports command-line flags for different file types
    • --resource: Main application resource (a JAR or Python file)
    • --jars: JAR files
    • --pyfiles: Python files
    • --files: Data/config files
  • Real-time job monitoring with status updates and progress spinner
  • Automatic log retrieval upon job completion
  • YAML configuration support for reusable job configurations
  • Flexible authentication via a command-line argument, YAML configuration, an environment variable, or an interactive prompt
  • Spark History Server integration with formatted URLs
  • YuniKorn queue support for Kubernetes deployments
  • Exit codes for scripting and automation
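The file-type flags above map naturally onto a Kyuubi batch request. The sketch below builds a JSON body for Kyuubi's batch REST endpoint (POST /api/v1/batches); the helper function and the mapping of --jars, --pyfiles, and --files onto Spark configuration keys are illustrative assumptions, not the tool's actual implementation.

```python
import json

def build_batch_request(resource, class_name=None, jars=None, pyfiles=None,
                        files=None, conf=None, args=None):
    """Build a payload for Kyuubi's batch endpoint (POST /api/v1/batches).
    Illustrative sketch: extra resources are passed through standard
    Spark configuration keys."""
    spark_conf = dict(conf or {})
    # Comma-separated resource lists map onto the usual Spark options.
    if jars:
        spark_conf["spark.jars"] = ",".join(jars)
    if pyfiles:
        spark_conf["spark.submit.pyFiles"] = ",".join(pyfiles)
    if files:
        spark_conf["spark.files"] = ",".join(files)
    payload = {
        "batchType": "SPARK",
        "resource": resource,
        "conf": spark_conf,
        "args": list(args or []),
    }
    if class_name:
        payload["className"] = class_name
    return payload

payload = build_batch_request(
    resource="s3a://bucket/app.jar",
    class_name="com.example.Main",
    jars=["s3a://bucket/dep.jar"],
    files=["config.yaml"],
)
print(json.dumps(payload, indent=2))
```

A mixed submission (local uploads plus remote resources) would carry the same body, with local files sent alongside it via the REST API's upload support.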

s3Cli

A new S3 command-line tool, s3Cli, replaces the AWS command-line tool in Jupyter environments. s3Cli is fully integrated with NexusOne’s multi-bucket architecture and Apache Ranger authorization. It includes the following capabilities:
  • Ability to map each bucket to a different S3 endpoint with Hadoop’s core-site.xml fs.s3a.bucket.<name>.endpoint configuration working behind the scenes
  • Supports multiple credentials
    • Each bucket loads its credentials from its own Java Cryptography Extension Key Store (JCEKS) keystore at /jceks/<bucket>.jceks
    • Credential lookup falls back to a default keystore, /jceks/default.jceks, when no bucket-specific keystore exists
  • Mandatory authorization enforcement through Apache Ranger
  • Ability to manage buckets, directories, and files
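The per-bucket keystore lookup described above can be sketched as a simple fallback rule (illustrative only; this is not s3Cli's actual code):

```python
import tempfile
from pathlib import Path

def keystore_for_bucket(bucket: str, jceks_root: str = "/jceks") -> Path:
    """Resolve the JCEKS keystore for a bucket: prefer the bucket-specific
    file, fall back to the shared default keystore."""
    specific = Path(jceks_root) / f"{bucket}.jceks"
    default = Path(jceks_root) / "default.jceks"
    return specific if specific.exists() else default

# Demonstrate the fallback with a throwaway keystore directory.
with tempfile.TemporaryDirectory() as root:
    Path(root, "default.jceks").touch()
    Path(root, "sales.jceks").touch()
    sales_ks = keystore_for_bucket("sales", root)  # bucket-specific hit
    hr_ks = keystore_for_bucket("hr", root)        # falls back to default
print(sales_ks.name, hr_ks.name)  # sales.jceks default.jceks
```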

nx1-sdk

A Python SDK, nx1-sdk, is now available for programmatic interaction with the NexusOne platform services. It includes the following capabilities:
  • Authentication
    • Keycloak OAuth 2.0 integration
    • Access token lifecycle management
    • Service account support
  • Data operations
    • Encrypted file ingestion
    • Schema management
    • Table operations such as create, drop, or alter
  • Platform integration
    • Policy management via Apache Ranger
    • Catalog operations via Apache Gravitino
    • S3 bucket management
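As a hedged illustration of the service-account authentication flow, the snippet below builds the standard Keycloak client-credentials token request that an SDK service account would issue. Only the OpenID Connect endpoint shape is standard Keycloak; the helper name and parameters are assumptions, not nx1-sdk's actual API.

```python
def token_request(keycloak_base: str, realm: str,
                  client_id: str, client_secret: str):
    """Build the Keycloak client-credentials token request (standard
    OpenID Connect token endpoint). Illustrative helper only."""
    url = f"{keycloak_base}/realms/{realm}/protocol/openid-connect/token"
    data = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    return url, data

# Hypothetical realm and client names for illustration.
url, data = token_request("https://auth.example.com", "nexusone",
                          "nx1-sdk", "s3cr3t")
```

The returned access token would then be attached as a Bearer header on calls to Ranger, Gravitino, and the other platform services.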

Upgrades

Version upgrades to existing apps on the NexusOne platform.

Airflow v3.1 upgrade

Upgraded to Airflow v3.1. This brings significant architectural changes and new features such as the following:
  • API migration
    • The /api/v1/* REST endpoints are no longer available
    • All integrations migrated to the /api/v2/* endpoints
    • execution_date replaced with logical_date
    • DateTime formats are now RFC3339-compliant
  • Authentication changes
    • Moved the Flask App Builder (FAB) authentication manager to a provider package
    • Installed apache-airflow-providers-fab for OAuth 2.0 or LDAP authentication
    • Changed the OAuth 2.0 callback URL from /oauth-authorized/keycloak to /auth/oauth-authorized/keycloak
    • Updated Keycloak redirect URIs accordingly
  • Task Execution Interface (TEI)
    • New SDK-based task execution architecture
    • Workers now communicate via the internal API server
    • AIRFLOW__CORE__INTERNAL_API_URL must be configured for distributed deployments
  • Database Changes
    • Requires a new session table for FAB provider
    • After an upgrade, run airflow db migrate
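The execution_date → logical_date and RFC3339 changes show up directly in dag-run trigger payloads. The payload shape below is a minimal sketch based on the changes listed above, not a complete v2 API reference.

```python
from datetime import datetime, timezone

def dag_run_payload(logical_date: datetime) -> dict:
    """Build a dag-run trigger body for the v2 API: execution_date is gone
    and logical_date takes an RFC3339 timestamp (sketch of the payload
    shape, not the full API schema)."""
    if logical_date.tzinfo is None:
        logical_date = logical_date.replace(tzinfo=timezone.utc)
    # isoformat() on a timezone-aware datetime yields an RFC3339 string.
    return {"logical_date": logical_date.isoformat()}

payload = dag_run_payload(datetime(2025, 1, 15, 8, 30))
print(payload)  # {'logical_date': '2025-01-15T08:30:00+00:00'}
```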

Auth configuration updates

The following auth_manager and auth_backends settings reflect the new provider-based package for the FAB auth manager.
[core]
auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager

[fab]
auth_backends = airflow.providers.fab.auth_manager.api.auth.backend.session

Migration steps

To upgrade, complete the following steps:
  1. Back up your existing database
  2. Update your DAGs for Airflow 3 compatibility
  3. Run airflow db migrate
  4. Update OAuth 2.0 redirect URIs in Keycloak
  5. Verify that API integrations now use v2 endpoints

DataHub v1.3.0.1 upgrade

Upgraded DataHub to v1.3.0.1 for improved metadata management and lineage tracking. The key changes include the following:
  • Bootstrap process
    • New bootstrap dependency handling
    • The system-update job now runs before the Generalized Metadata Service (GMS) starts
  • Policy management
    • Improved policy population on startup
    • Domain-level access controls
    • Enhanced RBAC for data assets
  • Airflow integration
    • Updated DataHub Airflow plugin for Airflow v3 compatibility
    • OpenLineage-based collection
    • RuntimeTaskInstance support to track state changes of a task and manage the environment

Keycloak v26.0.5 upgrade

Upgraded Keycloak to v26.0.5 using the codecentric/keycloak Helm chart. The key changes include the following:
  • Deployment
    • Runtime moved from WildFly to Quarkus
    • Support for JGroups DNS-based discovery for Kubernetes cache clustering
    • External PostgreSQL backend instead of the default internal relational database
  • Configuration
    • Configuration environment variables now use the standardized KC_* prefix
    • Proxy mode support: edge for TLS termination at ingress
    • Health endpoints exposed at /health/live and /health/ready
  • Breaking changes
    • Admin account setup now uses the KC_BOOTSTRAP_ADMIN_USERNAME and KC_BOOTSTRAP_ADMIN_PASSWORD environment variables
    • Truststore configuration now uses the KC_SPI_TRUSTSTORE_FILE_* environment variable prefix
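The changes above translate to environment variables like the following. This is a minimal sketch for a Helm values file: the variable names match Keycloak 26 options listed above, while the surrounding structure and secret names are illustrative assumptions.

```yaml
extraEnv: |
  - name: KC_BOOTSTRAP_ADMIN_USERNAME
    value: admin
  - name: KC_BOOTSTRAP_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: keycloak-admin   # hypothetical secret name
        key: password
  - name: KC_PROXY
    value: edge                # TLS termination at ingress
  - name: KC_HEALTH_ENABLED
    value: "true"              # exposes /health/live and /health/ready
```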

Enhancements

Enhancements to existing app features on the NexusOne platform.

JupyterHub S3 browser extension

Major enhancements to the JupyterLab S3 browser plugin, providing a full-featured file management interface with enterprise security. It includes the following changes:
  • Multi-bucket and multi-endpoint support
    • Consolidated bucket listing from all configured endpoints, so you don’t need to switch endpoints manually
    • Can now map each bucket to a different S3 endpoint with Hadoop’s core-site.xml fs.s3a.bucket.<name>.endpoint configuration working behind the scenes
    • Each bucket can now load credentials from its own Java Cryptography Extension Key Store (JCEKS) file
    • Seamless navigation across buckets from different S3-compatible storage systems
  • Full Apache Ranger integration
    • When listing buckets, Ranger user permissions are now checked
    • All file operations, such as read, write, or delete, now trigger a Ranger check
    • Real-time permission checks on navigation and actions
    • Users only see content they’re authorized to access
  • Improved user experience
    • Content previews for file formats such as .txt, .csv, and more
    • An option to download files into the Jupyter workspace or directly to your local machine
    • Downloading a directory automatically zips the content
    • Large files are now read and sent in small chunks using file streaming
  • Enhanced file operations
    • Recursive delete: Delete directories and all contents with a single action
    • Rename/Move: Rename files and folders within the browser
    • Create directories: Create new folders directly in the UI
    • Directory download: Download entire directories as .zip files
    • Recursive upload: Upload local directories while preserving folder structure
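The chunked file streaming mentioned above follows a standard read-in-chunks pattern. The sketch below is illustrative, not the extension's actual implementation:

```python
import tempfile

def iter_file_chunks(path, chunk_size=1024 * 1024):
    """Yield a file in fixed-size chunks so large files never have to be
    loaded into memory at once (sketch of the streaming-download pattern)."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

# Demonstrate on a small throwaway file with a 4-byte chunk size.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
chunks = list(iter_file_chunks(tmp.name, chunk_size=4))
print(chunks)  # [b'0123', b'4567', b'89']
```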

Portal enhancements

  • OAuth 2.0 credential management for Gravitino
    • OAuth 2.0 client credentials are now automatically generated for Gravitino REST catalogs
    • Tenant credentials are now individually managed
    • OAuth 2.0 tokens are now issued and validated using Keycloak
  • Iceberg catalog support
    • Iceberg is now a configurable catalog type in the portal
    • You can now provision Iceberg REST catalogs
    • Spark and Trino configurations can now be automatically generated
  • S3 bucket management
    • Naming validation
      • Bucket names now follow DNS-compliant rules
      • The system enforces prefixes per tenant or environment
    • Bucket operations
      • You can now delete S3 buckets, update bucket configurations, and manage bucket lifecycle policies
    • Credential masking
      • S3 access keys and secrets are now hidden in the portal UI after creation
      • Credentials are only visible at the moment they’re generated
      • Credential rotation is now available
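The naming validation described above can be sketched as a simple validator. The 3-63 character, lowercase, alphanumeric-at-both-ends rules are standard DNS-compliant S3 constraints; the tenant-prefix convention shown is an assumption about how the portal enforces prefixes.

```python
import re

# Lowercase alphanumerics and hyphens; must start and end alphanumeric.
_BUCKET_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{1,61}[a-z0-9])?$")

def validate_bucket_name(name: str, tenant_prefix: str) -> bool:
    """Check DNS-compliant naming (3-63 chars) plus a mandatory tenant
    prefix (the prefix scheme here is illustrative)."""
    return (3 <= len(name) <= 63
            and bool(_BUCKET_RE.match(name))
            and name.startswith(f"{tenant_prefix}-"))

ok = validate_bucket_name("acme-raw-data", "acme")   # valid
bad = validate_bucket_name("Acme_Data", "acme")      # uppercase, underscore
print(ok, bad)  # True False
```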

Deprecations

Tools or features that are deprecated and will soon be removed.

AWS command-line tool

In Jupyter environments, the AWS command-line tool is deprecated and will soon be removed. All S3 operations should now use s3Cli for proper credential management and access control.