Data Quality

Keywords

Quality Requirements, QA4AI, AI4QA, Quality Code, Automation

Description

This research area focuses on ensuring that data used in analytics, machine learning, and operational systems is accurate, complete, consistent, timely, and trustworthy across its lifecycle. It examines methods, tools, and governance practices for defining, measuring, and improving data quality from ingestion to consumption—covering metadata, lineage, validation, monitoring, and human oversight—in centralized and decentralized settings (e.g., data lakes, warehouses, data mesh, and IoT pipelines).

Objectives

  • Define a common data quality model (dimensions, metrics, thresholds, SLAs/SLOs); a minimal code sketch follows this list.

  • Develop profiling and validation techniques to detect errors (nulls, duplicates, drift, schema changes), as sketched below.

  • Implement continuous monitoring and alerts across batch and streaming pipelines (see the streaming sketch after this list).

  • Establish data governance processes (ownership, stewardship, policies, and standards).

  • Integrate metadata and lineage to trace issues to their sources and assess impact (see the lineage sketch below).

  • Provide human-in-the-loop workflows for triage, remediation, and exception handling.

  • Evaluate cost–benefit and performance trade-offs of quality controls at scale.

  • Deliver reusable tooling and benchmarks to support adoption in industry and research.

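As referenced in the objectives above, the sketches below illustrate in Python how some of these goals might be prototyped; they are exploratory sketches, not deliverables of this research area. First, the quality model: a minimal sketch, assuming pandas, that expresses dimensions, metrics, and per-metric SLO thresholds as declarative configuration. The dimensions, column names (email, order_id), and thresholds are illustrative assumptions, not a proposed standard.

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class QualityMetric:
    dimension: str                             # e.g. completeness, uniqueness
    name: str
    compute: Callable[[pd.DataFrame], float]   # returns a score in [0, 1]
    threshold: float                           # SLO: minimum acceptable score


def completeness(col: str) -> Callable[[pd.DataFrame], float]:
    # Fraction of non-null values in the column.
    return lambda df: float(df[col].notna().mean())


def uniqueness(col: str) -> Callable[[pd.DataFrame], float]:
    # Fraction of rows whose value in the column is not duplicated.
    return lambda df: float((~df[col].duplicated(keep=False)).mean())


# Illustrative metric catalog; a real one would live in shared configuration.
METRICS = [
    QualityMetric("completeness", "email_not_null", completeness("email"), 0.99),
    QualityMetric("uniqueness", "order_id_unique", uniqueness("order_id"), 1.0),
]


def evaluate(df: pd.DataFrame):
    # Yields (metric name, score, SLO met?) for each configured metric.
    for m in METRICS:
        score = m.compute(df)
        yield m.name, score, score >= m.threshold


if __name__ == "__main__":
    df = pd.DataFrame({"order_id": [1, 2, 2],
                       "email": ["a@x.com", None, "b@x.com"]})
    for name, score, ok in evaluate(df):
        print(f"{name}: {score:.2f} {'ok' if ok else 'SLO breach'}")
```

Keeping metrics declarative in this way is one route to the common model the first objective asks for: the same catalog can be evaluated in batch jobs, tests, and monitors.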
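For the profiling and validation objective, a self-contained sketch that flags nulls, exact-duplicate rows, schema changes, and crude distributional drift against a baseline batch. The expected schema, the three-standard-deviation drift rule, and the sample data are all assumptions made for illustration; production systems would use richer statistics (e.g. PSI or KS tests) and a schema registry.

```python
import pandas as pd

# Hypothetical expected schema; in practice this would come from a registry.
EXPECTED_SCHEMA = {"order_id": "int64", "amount": "float64"}


def validate(df: pd.DataFrame, baseline: pd.DataFrame) -> list[str]:
    issues: list[str] = []

    # Schema changes: missing columns or unexpected dtypes.
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            issues.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            issues.append(f"dtype change in {col}: {df[col].dtype} != {dtype}")

    # Nulls and exact-duplicate rows.
    for col, rate in df.isna().mean().items():
        if rate > 0:
            issues.append(f"nulls in {col}: {rate:.1%}")
    dups = int(df.duplicated().sum())
    if dups:
        issues.append(f"{dups} duplicate row(s)")

    # Crude drift check: numeric mean shifted by more than 3 baseline std devs.
    for col in df.select_dtypes("number").columns.intersection(baseline.columns):
        mu, sd = baseline[col].mean(), baseline[col].std()
        if sd > 0 and abs(df[col].mean() - mu) > 3 * sd:
            issues.append(f"possible drift in {col}: {df[col].mean():.2f} vs {mu:.2f}")

    return issues


if __name__ == "__main__":
    baseline = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 11.0, 9.0]})
    batch = pd.DataFrame({"order_id": [4, 4, 5], "amount": [100.0, 100.0, None]})
    print(validate(batch, baseline))
```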
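For continuous monitoring in a streaming setting, a self-contained sketch that tracks the null rate of one field over a sliding window and alerts on SLO breach. The field name, window size, threshold, and stderr alert sink are illustrative stand-ins for a real metric store and pager or ticketing integration.

```python
import sys
from collections import deque


class NullRateMonitor:
    """Sliding-window null-rate check for one field of a record stream."""

    def __init__(self, field: str, window: int = 1000, max_null_rate: float = 0.01):
        self.field = field
        self.window = deque(maxlen=window)  # 1.0 if null, 0.0 otherwise
        self.max_null_rate = max_null_rate
        self.alerting = False               # avoid re-alerting on every record

    def observe(self, record: dict) -> None:
        self.window.append(1.0 if record.get(self.field) is None else 0.0)
        if len(self.window) < self.window.maxlen:
            return  # wait until the window is full
        rate = sum(self.window) / len(self.window)
        breached = rate > self.max_null_rate
        if breached and not self.alerting:
            # Stand-in for a real alert sink (pager, chat, incident tracker).
            print(f"ALERT: null rate of '{self.field}' = {rate:.2%}", file=sys.stderr)
        self.alerting = breached


if __name__ == "__main__":
    mon = NullRateMonitor("email", window=100, max_null_rate=0.05)
    for i in range(500):  # every 10th record has a null email -> ~10% null rate
        mon.observe({"email": None if i % 10 == 0 else f"user{i}@example.com"})
```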
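For metadata and lineage, a sketch of tracing a failing dataset back to its root sources over a hand-written lineage graph. The dataset names and edges are hypothetical; in practice they would be harvested from a metadata catalog or pipeline orchestrator.

```python
# Hand-written lineage edges (dataset -> upstream inputs); illustrative only.
LINEAGE = {
    "reports.daily_sales": ["warehouse.orders", "warehouse.customers"],
    "warehouse.orders": ["ingest.orders_raw"],
    "warehouse.customers": ["ingest.crm_export"],
}


def upstream_sources(dataset: str) -> set[str]:
    # Depth-first walk up the lineage graph to the root sources.
    parents = LINEAGE.get(dataset, [])
    if not parents:
        return {dataset}
    roots: set[str] = set()
    for parent in parents:
        roots |= upstream_sources(parent)
    return roots


if __name__ == "__main__":
    # Which raw feeds should be inspected when daily_sales fails its checks?
    print(sorted(upstream_sources("reports.daily_sales")))
    # ['ingest.crm_export', 'ingest.orders_raw']
```

The same graph, traversed in the reverse direction, supports the impact-assessment half of the objective: given a broken source, it identifies which downstream datasets and consumers are affected.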