The Build feature is a JupyterHub-based development environment integrated within NexusOne. It provides a workspace for developing and testing Apache Airflow DAGs that orchestrate ETL pipelines. Build connects directly to Apache Airflow through a shared DAGs directory, which lets you create, modify, and manage tasks that Apache Airflow can execute automatically.

Key features

Build offers a choice of compute environments and a set of shared directories that simplify developing and managing DAGs.

Environment options

When setting up the Build environment, you can choose between two compute environments:
  • Non-GPU profile: Designed for general-purpose data processing use cases.
  • GPU-enabled profile: Designed for high-performance GPU-based use cases such as machine learning.
Each environment is temporary: it times out and is destroyed when you aren’t actively interacting with it.

Default directories

When you launch a JupyterHub session, NexusOne launches a Kubernetes pod that spawns a dedicated notebook server for you. The environment is pre-configured with two default directories you can interact with: dags and utils.

DAGs

Directed Acyclic Graphs, or DAGs, define specific tasks and the order in which they run. The dags directory is shared between JupyterHub and Apache Airflow. It contains all DAGs that have run within NexusOne through the Ask, Engineer, or Ingest features. When you create or update a DAG in Build, the file is automatically written to this shared directory. Apache Airflow’s scheduler continuously monitors the directory. When it detects a new or modified DAG, it makes the DAG available for execution.
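
As a rough illustration of what a DAG file in the shared directory might look like, the following sketch saves a two-task DAG from a notebook cell. It assumes Apache Airflow 2.4 or later in the 2.x line. The DAG ID, task names, file name, and the write_dag helper are hypothetical, and a temporary directory stands in for the dags directory so the sketch runs without Airflow installed. It checks only the syntax of the DAG source and doesn't execute it.

```python
import tempfile
from pathlib import Path

# Hypothetical DAG source. It is only parsed by Airflow once the file
# lands in a directory that the Airflow scheduler monitors.
DAG_SOURCE = '''\
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extracting source data")


def load():
    print("loading transformed data")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # no schedule; trigger manually
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # extract must finish before load starts
    extract_task >> load_task
'''


def write_dag(dags_dir: Path, filename: str, source: str) -> Path:
    """Syntax-check the DAG source, then save it under dags_dir."""
    compile(source, filename, "exec")  # raises SyntaxError on invalid Python
    dags_dir.mkdir(parents=True, exist_ok=True)
    target = dags_dir / filename
    target.write_text(source, encoding="utf-8")
    return target


# Assumption: a temporary directory stands in for the shared dags directory.
demo_dags_dir = Path(tempfile.mkdtemp()) / "dags"
dag_path = write_dag(demo_dags_dir, "example_etl.py", DAG_SOURCE)
print(f"Wrote {dag_path.name} to {dag_path.parent}")
```

In a real Build session you would more likely create or edit the file directly in the dags directory. The helper only shows that a DAG is an ordinary Python file whose task order is declared with the >> operator.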

Utils

The utilities directory, or utils, contains reusable scripts or tools, such as Apache Spark, that support the DAGs.
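
One way to picture a reusable helper in utils is a small function that assembles a spark-submit command for DAG tasks to run. This is a minimal sketch, not a utility that NexusOne ships: the function name, default values, and script path are hypothetical. The --master, --name, and --conf options are standard spark-submit flags.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def build_spark_submit_command(
    app_path: str,
    app_args: Sequence[str] = (),
    master: str = "local[*]",
    name: str = "nexusone-transform",
    conf: Mapping[str, str] | None = None,
) -> list[str]:
    """Return the argument list for a spark-submit invocation."""
    command = ["spark-submit", "--master", master, "--name", name]
    # Sort the settings so the generated command is deterministic.
    for key, value in sorted((conf or {}).items()):
        command += ["--conf", f"{key}={value}"]
    command.append(app_path)
    command.extend(app_args)
    return command


# Hypothetical transformation script and paths, for illustration only.
cmd = build_spark_submit_command(
    "transform_transactions.py",
    app_args=["--input", "/data/in", "--output", "/data/out"],
    conf={"spark.executor.memory": "2g"},
)
print(" ".join(cmd))
```

A DAG task could then pass the returned list to subprocess.run(cmd, check=True), which keeps the Spark invocation details in one place instead of repeating them in every DAG.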

Use cases

These examples show how different industries can use NexusOne’s Build capabilities:
  • Financial services: Develop DAGs in Build that ingest customer transaction files and invoke Apache Spark to perform data transformation actions.
  • Healthcare: Develop DAGs in Build that clean and validate patient records by triggering Apache Spark to perform data transformation actions.

Additional resources

For general instructions about how to ingest a file, schedule a data insight report, or schedule a transformation rule in NexusOne, refer to the following: