AWS Glue Data Quality: segregate rows and publish results to the catalog in one evaluation

01:11 08 May 2026

We are running AWS Glue Data Quality on Glue 5.0 and need a single evaluation of a DQDL
ruleset to deliver two outputs:

1. Row-level outcomes — so the ETL job can split input rows into good and bad partitions
for downstream processing.
2. Catalog publishing — the run must appear under the catalog table's Data Quality tab in
the Glue console.

Today these capabilities live in two different surfaces, and neither one delivers both:

- EvaluateDataQuality.process_rows (ETL transform): returns row-level outcomes, but does
not publish to the table's DQ tab.
- start_data_quality_ruleset_evaluation_run (boto3 API): publishes to the table's DQ
tab, but does not return row-level outcomes.

To get both, we would need to evaluate the same ruleset twice, once per surface, which doubles the data quality evaluation work for every run.
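For context, a minimal sketch of the API path, assuming the standard boto3 Glue client. The helper names and the database, table, role, and ruleset values passed to them are hypothetical placeholders, not taken from our job. It illustrates that the run is bound to a catalog table and that the retrievable result is a list of rule-level outcomes.

```python
def build_evaluation_run_request(database, table, role_arn, ruleset_name):
    """Build kwargs for glue.start_data_quality_ruleset_evaluation_run."""
    return {
        # The run targets a catalog table, which is what ties it to the table's DQ tab.
        "DataSource": {"GlueTable": {"DatabaseName": database, "TableName": table}},
        "Role": role_arn,
        "RulesetNames": [ruleset_name],
    }


def start_catalog_run(request):
    """Start the evaluation run and return its RunId."""
    import boto3  # imported lazily so the request builder works without AWS access

    glue = boto3.client("glue")
    response = glue.start_data_quality_ruleset_evaluation_run(**request)
    return response["RunId"]


def fetch_rule_results(run_id):
    """Return rule-level results for a finished run (no per-row information)."""
    import boto3

    glue = boto3.client("glue")
    run = glue.get_data_quality_ruleset_evaluation_run(RunId=run_id)
    results = []
    for result_id in run.get("ResultIds", []):
        result = glue.get_data_quality_result(ResultId=result_id)
        results.extend(result.get("RuleResults", []))
    return results
```

Each entry returned by fetch_rule_results describes one rule and its overall result, which is why this path alone cannot drive a good/bad row split.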

What we have tried

- Created the ruleset with create_data_quality_ruleset using TargetTable, so the catalog
table is bound to the ruleset.
- Ran process_rows from inside the Glue job with enableDataQualityResultsPublishing and
enableDataQualityCloudWatchMetrics set to true.
- Set dataQualityEvaluationContext to the ruleset name.
- Passed database, tableName, and catalogId in additional_options.
- Set transformation_ctx on the source DynamicFrame to match the table name.

None of the above caused the run to appear under the catalog table's Data Quality tab.
Running start_data_quality_ruleset_evaluation_run against the same ruleset populates the
table tab, but returns only rule-level pass/fail, with no row-level outcomes.
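A minimal sketch of the ETL-side attempt described above, assuming it runs inside a Glue job where awsglue and awsgluedq are available. The helper names are hypothetical, and additional_options is passed through exactly as the caller supplies it; whether keys such as database, tableName, or catalogId are honored there is part of what this question is asking.

```python
def build_publishing_options(ruleset_name):
    """Publishing options as tried: context set to the ruleset name, both flags on."""
    return {
        "dataQualityEvaluationContext": ruleset_name,
        "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True,
    }


def split_good_and_bad(dyf, ruleset, ruleset_name, additional_options):
    """Evaluate a DQDL ruleset once and split rows by their row-level outcome.

    dyf is the source DynamicFrame, ruleset is the DQDL text, and
    additional_options is the caller-supplied dict (passed through unchanged).
    Returns two Spark DataFrames: (passed_rows, failed_rows).
    """
    # Glue-only modules, imported lazily so this file loads outside a Glue job.
    from awsglue.transforms import SelectFromCollection
    from awsgluedq.transforms import EvaluateDataQuality

    results = EvaluateDataQuality().process_rows(
        frame=dyf,
        ruleset=ruleset,
        publishing_options=build_publishing_options(ruleset_name),
        additional_options=additional_options,
    )

    # process_rows returns a collection; rowLevelOutcomes holds the per-row verdicts.
    row_outcomes = SelectFromCollection.apply(
        dfc=results,
        key="rowLevelOutcomes",
        transformation_ctx="rowLevelOutcomes",
    ).toDF()

    passed = row_outcomes.filter(row_outcomes.DataQualityEvaluationResult == "Passed")
    failed = row_outcomes.filter(row_outcomes.DataQualityEvaluationResult == "Failed")
    return passed, failed
```

With this shape the good/bad split works as expected; the open issue is only that the run does not show up under the catalog table's Data Quality tab.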

Questions

1. Is there a configuration of EvaluateDataQuality.process_rows that publishes results to
the catalog table's Data Quality tab in the same call?
2. Is there a way to get row-level segregation out of
start_data_quality_ruleset_evaluation_run?
3. If this split is intentional (ETL transform for row-level outcomes, API for the
catalog surface), is there a boto3 call that can post-publish an already-computed
evaluation result to the catalog table tab, so we don't have to evaluate the ruleset a
second time?

pyspark aws-glue data-quality
