
Product offering
Key features
- Data ingestion: Ingests data from files, mirrored or batch databases, or a lakehouse.
- Orchestration and processing: Provides an orchestration and processing layer for your data pipelines at scale, using Apache Airflow and Apache Spark (a minimal Airflow sketch follows this list).
- Data transformation and preparation: Structures and prepares your ingested data for later analysis.
- Data validation: Supplies recommended rules, and lets you define custom rules, to maintain data quality consistently.
- Data analysis and insight discovery: Offers recommended queries, or lets you generate your own, to analyze your data and uncover insights.
- Analytics: Leverages Trino for SQL queries and Metabase or Apache Superset to visualize your datasets (a Trino query sketch follows this list).
- Identity, policy, and governance: Secures access with Keycloak, enforces policies with Apache Ranger, and tracks lineage with DataHub, so your data remains trusted and compliant.
- Streaming and events: Leverages NiFi, Kafka, and Flink; note that Kafka is currently used for policy distribution rather than just producer-to-consumer pipelines.
- Autonomous assistance: Provides composable AI agents built with CrewAI, reducing manual effort by generating and suggesting SQL queries that surface deeper insights (a CrewAI sketch follows this list). It works in combination with the Data analysis and insight discovery, Data validation, and Data transformation and preparation features.
- ML/AI development: Leverages Jupyter and PyTorch to provide a foundation for model development.
- ML/AI operations: Supports distributed training, deployment, and lifecycle management using Ray, MLflow, and Ollama (an MLflow tracking sketch follows this list).
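
To make the orchestration layer concrete, here is a minimal sketch of an Airflow DAG that submits a Spark job. It assumes Airflow 2.x with the apache-airflow-providers-apache-spark package installed; the DAG id, connection id, and application path are hypothetical placeholders, not names from the product.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Minimal sketch: one daily DAG with a single Spark submission step.
# The DAG id, connection id, and application path are hypothetical.
with DAG(
    dag_id="ingest_and_transform",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    transform = SparkSubmitOperator(
        task_id="spark_transform",
        conn_id="spark_default",               # Airflow connection to the Spark cluster
        application="/opt/jobs/transform.py",  # hypothetical PySpark script
    )
```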
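For the analytics layer, the following is a minimal sketch of running a SQL query against Trino with the official `trino` Python client; the host, catalog, schema, and table names are hypothetical.

```python
import trino

# Minimal sketch of querying Trino from Python.
# Host, catalog, schema, and table names are hypothetical.
conn = trino.dbapi.connect(
    host="trino.example.internal",  # hypothetical Trino coordinator
    port=8080,
    user="analyst",
    catalog="lakehouse",
    schema="sales",
)
cur = conn.cursor()
cur.execute("SELECT region, SUM(amount) AS total FROM orders GROUP BY region")
for region, total in cur.fetchall():
    print(region, total)
```

Results could equally be explored in Metabase or Superset; the client is shown here only because it is the smallest self-contained example.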
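For autonomous assistance, here is a minimal sketch of a composable CrewAI agent that drafts a SQL query. The role, goal, and task text are hypothetical, and an LLM backend is assumed to be configured for CrewAI.

```python
from crewai import Agent, Crew, Task

# Minimal sketch of a single-agent crew; all names and prompts are hypothetical.
sql_assistant = Agent(
    role="SQL analyst",
    goal="Suggest SQL queries that surface insights in the data",
    backstory="An assistant embedded in the data platform.",
)

suggest_query = Task(
    description="Draft a SQL query that finds the top regions by revenue.",
    expected_output="A single runnable SQL query.",
    agent=sql_assistant,
)

crew = Crew(agents=[sql_assistant], tasks=[suggest_query])
print(crew.kickoff())
```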
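Finally, for ML/AI operations, a minimal sketch of logging a training run to MLflow; the tracking URI and experiment name are hypothetical.

```python
import mlflow

# Minimal sketch of tracking a training run with MLflow.
# The tracking URI and experiment name are hypothetical.
mlflow.set_tracking_uri("http://mlflow.example.internal:5000")
mlflow.set_experiment("demo-model")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    # ... train the model here, e.g. a PyTorch loop ...
    mlflow.log_metric("val_accuracy", 0.93)
```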