This page describes best practices for using DataHub efficiently. To maintain a high-quality, trustworthy data catalog, follow these recommendations:
  1. Assign owners to every dataset: Ensure each dataset has a clearly identified owner responsible for quality, access, and documentation.
  2. Keep descriptions up to date: Maintain accurate descriptions at both the table and column levels so users can easily understand the dataset’s purpose and contents.
  3. Use standardized glossary terms: Apply approved business terms consistently across datasets to promote shared understanding and improve searchability.
  4. Tag datasets with relevant classifications: Use tags and classifications to support governance, discovery, and compliance workflows.
  5. Review stale or deprecated datasets: Periodically audit unused or superseded datasets and mark them as deprecated when appropriate.
  6. Monitor and maintain ingestion pipelines: Check that metadata ingestion pipelines run reliably and without errors so the catalog remains accurate and current.
  7. Define and maintain data quality tests: Implement table-level and column-level tests for critical datasets to validate schema, freshness, null values, ranges, or business rules.
  8. Automate test execution within pipelines: Run data quality tests automatically as part of ETL/ELT workflows or orchestration jobs to ensure consistent and reliable validation.
  9. Investigate and resolve failures promptly: Use lineage and test failure details to diagnose root causes and coordinate remediation with upstream dataset owners.
  10. Monitor historical data quality trends: Review test history and recurring failures to detect long-term quality issues and prevent downstream impact.
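
As an illustration of items 7 and 8, the following is a minimal sketch of table-level and column-level checks that a pipeline step could run after loading a dataset. It uses plain Python only and does not call any DataHub API. The column names (`order_id`, `amount`), the thresholds, and every function name are hypothetical and chosen for the example, not taken from DataHub or NexusOne.

```python
from datetime import datetime, timedelta, timezone


def check_not_null(rows, column):
    """Column-level check: pass when no row is missing a value in `column`."""
    return all(row.get(column) is not None for row in rows)


def check_range(rows, column, low, high):
    """Column-level check: pass when every non-null value lies in [low, high]."""
    return all(
        low <= row[column] <= high
        for row in rows
        if row.get(column) is not None
    )


def check_freshness(last_updated, max_age, now=None):
    """Table-level check: pass when the last update is no older than `max_age`."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


def run_checks(rows, last_updated, now=None):
    """Run all checks and return a mapping of check name to pass/fail.

    The column names and limits below are hypothetical examples.
    """
    return {
        "order_id_not_null": check_not_null(rows, "order_id"),
        "amount_in_range": check_range(rows, "amount", 0, 10_000),
        "fresh_within_24h": check_freshness(last_updated, timedelta(hours=24), now),
    }


def failed_checks(results):
    """Return the names of failing checks in sorted order."""
    return sorted(name for name, passed in results.items() if not passed)
```

In an ETL/ELT job, a step like this could raise an error when `failed_checks` returns a non-empty list, so that a failing dataset stops the run instead of flowing downstream unnoticed.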

Additional resources

  • To get an overview of DataHub, refer to the DataHub in NexusOne page.
  • For more details about DataHub, refer to the DataHub official documentation.
  • If you are using the NexusOne portal and want to learn how to launch DataHub, refer to the Govern page.