Domain-Specific Visual Language for Data Engineering Quality

Abstract

Data engineering pipelines process large amounts of information, and ensuring that the quality and integrity of the data are maintained throughout is critical for technical, business, and social reasons. Conventional data quality assurance approaches require a large amount of fine-grained testing code, which is laborious, easy to get out of sync, and inscrutable to non-technical stakeholders. An executable higher-level visual approach to expressing quality requirements can serve as a shared representation of these constraints and their implications for all parties, eliminating repetition while increasing accessibility and maintainability. We present a visual programming language for expressing data quality requirements within a pipeline declaratively, structured as a diagram of compositional data flow, transformation, and validation steps.

Authors

Alexis De Meo, Michael Homer

Published in

ACM SIGPLAN International Workshop on Programming Abstractions and Interactive Notations, Tools, and Environments (PAINT), 2022

The final copy of this publication is available from the publisher.

Resources

PDF: mwh.nz/pdf/paint2022deq
This page: mwh.nz/pubs/paint2022deq