Deployment context
Within NexusOne, Spark jobs can execute in the following environments:

- Jupyter Notebooks: An interactive REPL environment with a pre-initialized SparkSession. Optimal for exploratory data analysis, prototyping, and collaborative development. If you are using the NexusOne portal, you can launch a Jupyter Notebook using the NexusOne Build feature.
- Apache Airflow: Provides production orchestration through PySpark decorators.
  - Batch scheduling: Jobs run as scheduled cron tasks with continuous monitoring and configured alerting.
  - Execution method: Jobs run as containerized Spark applications in Kubernetes pods with full lifecycle management.
- Command line via Kyuubi: Provides command-line submission for jobs launched from outside the Kubernetes cluster.
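For the Airflow environment, a DAG might look like the following minimal sketch. It assumes Airflow with the Apache Spark provider package installed (which supplies the `@task.pyspark` decorator) and a Spark connection named `spark_default`; the DAG id and schedule are illustrative, not NexusOne defaults.

```python
# Hypothetical Airflow DAG using the @task.pyspark decorator.
# Assumes apache-airflow plus the apache-airflow-providers-apache-spark
# package; the "spark_default" connection id is an assumption.
import pendulum
from airflow.decorators import dag, task


@dag(
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
)
def daily_spark_pipeline():
    @task.pyspark(conn_id="spark_default")
    def transform(spark=None, sc=None):
        # Airflow injects a SparkSession (`spark`) and SparkContext (`sc`).
        df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])
        return df.count()

    transform()


daily_spark_pipeline()
```

Once deployed, the scheduler runs this task on the daily cron cadence described above, with monitoring and alerting handled by Airflow.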
Current version information
- Python version: 3.12
- Spark version: 3.5.6
Supported Spark interfaces
- PySpark via the Python API: Programmatic access to DataFrame and SQL APIs for pipeline development and data engineering workflows.
- Spark SQL: Write standard SQL queries that execute on Spark. Perfect for analysts and data warehouse teams who prefer SQL syntax for querying and transforming data.
Supported library and table format
- Apache Iceberg: 1.8
The Apache Iceberg table format provides snapshot isolation and schema evolution.
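To use Iceberg tables from Spark, the session needs an Iceberg catalog configured. The following is an illustrative sketch for local experimentation, not the NexusOne defaults: the catalog name (`local`), warehouse path, and runtime jar coordinates are all assumptions.

```python
from pyspark.sql import SparkSession

# Illustrative Iceberg catalog configuration; catalog name, warehouse
# path, and jar coordinates are assumptions for local use.
spark = (
    SparkSession.builder
    .appName("iceberg-demo")
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.0",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Schema evolution is then a metadata-only DDL operation, e.g.:
# spark.sql("CREATE TABLE local.db.events (id BIGINT, name STRING) USING iceberg")
# spark.sql("ALTER TABLE local.db.events ADD COLUMN ts TIMESTAMP")
```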
Default Spark configuration
Additional resources
- To learn practical ways to use Spark in the NexusOne environment, refer to the Spark hands-on examples page.
- To learn about best practices when using Spark, refer to the Spark best practices page.
- For more details about Spark, refer to the Spark and Iceberg official documentation.