- Assign owners to every dataset: Ensure each dataset has a clearly identified owner responsible for quality, access, and documentation.
- Keep descriptions up to date: Maintain accurate descriptions at both the table and column levels so users can easily understand the dataset’s purpose and contents.
- Use standardized glossary terms: Apply approved business terms consistently across datasets to promote shared understanding and improve searchability.
- Tag datasets with relevant classifications: Use tags and classifications to support governance, discovery, and compliance workflows.
- Review stale or deprecated datasets: Periodically audit unused or superseded datasets and mark them as deprecated when appropriate.
- Monitor and maintain ingestion pipelines: Ensure metadata ingestion pipelines run reliably and without errors so the catalog remains accurate and current.
- Define and maintain data quality tests: Implement table-level and column-level tests for critical datasets to validate schema, freshness, null values, ranges, or business rules.
- Automate test execution within pipelines: Run data quality tests automatically as part of ETL/ELT workflows or orchestration jobs to ensure consistent and reliable validation.
- Investigate and resolve failures promptly: Use lineage and test failure details to diagnose root causes and coordinate remediation with upstream dataset owners.
- Monitor historical data quality trends: Review test history and recurring failures to detect long-term quality issues and prevent downstream impact.
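To illustrate the glossary and tagging practices above, the sketch below validates a dataset's terms and tags against an approved set before they are published. This is plain Python, not the DataHub API; the approved sets and the `validate_dataset_metadata` helper are illustrative assumptions:

```python
# Hypothetical approved glossary terms and classification tags -- illustrative only.
APPROVED_GLOSSARY_TERMS = {"Customer", "Order", "Revenue"}
APPROVED_CLASSIFICATIONS = {"PII", "Confidential", "Public"}

def validate_dataset_metadata(terms, tags):
    """Return the terms/tags that are NOT in the approved sets (empty sets mean valid)."""
    bad_terms = set(terms) - APPROVED_GLOSSARY_TERMS
    bad_tags = set(tags) - APPROVED_CLASSIFICATIONS
    return bad_terms, bad_tags

# "Cust" is not an approved glossary term, so it is flagged.
bad_terms, bad_tags = validate_dataset_metadata(terms=["Customer", "Cust"], tags=["PII"])
assert bad_terms == {"Cust"} and not bad_tags
```

A check like this can run in CI or as a pre-publish hook so non-standard terms never reach the catalog.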
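The periodic audit of stale datasets can be sketched as a simple staleness report built from last-access timestamps. The 180-day threshold, the `find_stale_datasets` helper, and the dataset names are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

STALENESS_THRESHOLD = timedelta(days=180)  # illustrative cutoff

def find_stale_datasets(last_accessed, now=None):
    """Given {dataset_name: last_access_datetime}, return names that have not
    been accessed within the staleness threshold and are deprecation candidates."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        name for name, ts in last_accessed.items()
        if now - ts > STALENESS_THRESHOLD
    )

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
usage = {
    "sales.orders": datetime(2024, 5, 20, tzinfo=timezone.utc),   # recently used
    "legacy.orders_v1": datetime(2023, 1, 5, tzinfo=timezone.utc),  # long idle
}
print(find_stale_datasets(usage, now=now))  # ['legacy.orders_v1']
```

Datasets surfaced by such a report would then be reviewed with their owners before being marked as deprecated.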
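The table-level and column-level tests described above can be sketched as small check functions that an ETL/ELT job runs after loading data. The check names and sample rows are hypothetical; in practice these checks would be expressed in your testing framework of choice:

```python
def check_not_null(rows, column):
    """Column-level test: no NULL/None values in the column."""
    return all(row.get(column) is not None for row in rows)

def check_range(rows, column, lo, hi):
    """Column-level test: every value falls inside [lo, hi]."""
    return all(lo <= row[column] <= hi for row in rows)

def check_row_count(rows, minimum):
    """Table-level test: the table has at least `minimum` rows (freshness proxy)."""
    return len(rows) >= minimum

rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": 40.0},
]
results = {
    "order_id_not_null": check_not_null(rows, "order_id"),
    "amount_in_range": check_range(rows, "amount", 0, 10_000),
    "min_row_count": check_row_count(rows, minimum=1),
}
# Failing any check should fail the pipeline step so bad data never propagates.
assert all(results.values())
```

Running the checks inside the orchestration job, and failing the job on any failed check, gives the consistent, automated validation the practice calls for.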
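Monitoring historical trends can be as simple as computing a per-test failure rate over past runs to surface chronic offenders. The run-history format and `failure_rate_by_test` helper below are illustrative assumptions:

```python
from collections import Counter

def failure_rate_by_test(history):
    """history: list of (test_name, passed) tuples from past runs.
    Returns {test_name: failure_rate} so recurring failures stand out."""
    runs, fails = Counter(), Counter()
    for name, passed in history:
        runs[name] += 1
        if not passed:
            fails[name] += 1
    return {name: fails[name] / runs[name] for name in runs}

history = [
    ("orders_freshness", True),
    ("orders_freshness", False),
    ("orders_freshness", False),
    ("amount_in_range", True),
]
rates = failure_rate_by_test(history)
assert rates["orders_freshness"] == 2 / 3  # chronic failure worth investigating
```

Tests with persistently high failure rates are good candidates for the root-cause work with upstream owners described above.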
Additional resources
- To get an overview of DataHub, refer to the DataHub in NexusOne page.
- For more details about DataHub, refer to the official DataHub documentation.
- To learn how to launch DataHub from the NexusOne portal, refer to the Govern page.