AWS Glue Data Quality: segregate rows and publish results to the catalog table in one evaluation
01:11 08 May 2026

We are running AWS Glue Data Quality on Glue 5.0 and need a single evaluation of a DQDL
ruleset to deliver two outputs:

1. Row-level outcomes — so the ETL job can split input rows into good and bad partitions
for downstream processing.
2. Catalog publishing — the run must appear under the catalog table's Data Quality tab in
the Glue console.

Today these capabilities live in two different surfaces, and neither one delivers both:

  • EvaluateDataQuality.process_rows (ETL transform) — returns row-level outcomes, but does
    not publish to the table's DQ tab.
  • start_data_quality_ruleset_evaluation_run (boto3 API) — publishes to the table's DQ
    tab, but does not return row-level outcomes.

To get both, we would need to evaluate the same ruleset twice, so every data quality rule
runs twice over the same data.
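
For reference, the API path looks roughly like the sketch below. The helper name
run_catalog_evaluation, the polling interval, and the argument values are illustrative
assumptions, not our production code; the boto3 Glue client calls are
start_data_quality_ruleset_evaluation_run, get_data_quality_ruleset_evaluation_run, and
get_data_quality_result.

```python
import time


def run_catalog_evaluation(glue, database, table, ruleset_name, role_arn,
                           poll_seconds=15):
    """Start a catalog-bound DQ run and return its rule-level outcomes.

    `glue` is a boto3 Glue client, e.g. boto3.client("glue").
    Helper name and polling interval are illustrative assumptions.
    """
    run_id = glue.start_data_quality_ruleset_evaluation_run(
        DataSource={"GlueTable": {"DatabaseName": database, "TableName": table}},
        Role=role_arn,
        RulesetNames=[ruleset_name],
    )["RunId"]

    # Poll until the run reaches a terminal state.
    while True:
        run = glue.get_data_quality_ruleset_evaluation_run(RunId=run_id)
        if run["Status"] in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
            break
        time.sleep(poll_seconds)

    if run["Status"] != "SUCCEEDED":
        raise RuntimeError(f"DQ run {run_id} ended with status {run['Status']}")

    # Each result carries per-rule outcomes only; nothing here identifies
    # which input rows passed or failed.
    outcomes = []
    for result_id in run.get("ResultIds", []):
        result = glue.get_data_quality_result(ResultId=result_id)
        for rule in result["RuleResults"]:
            outcomes.append((rule["Name"], rule["Result"]))
    return outcomes
```

The returned list contains one (rule name, result) pair per rule, which is all we can
get from this path.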

What we have tried

  • Created the ruleset with create_data_quality_ruleset using TargetTable, so the catalog
    table is bound to the ruleset.
  • Ran process_rows from inside the Glue job with enableDataQualityResultsPublishing and
    enableDataQualityCloudWatchMetrics set to true.
  • Set dataQualityEvaluationContext to the ruleset name.
  • Passed database, tableName, and catalogId in additional_options.
  • Set transformation_ctx on the source DynamicFrame to match the table name.

None of the above caused the run to appear under the catalog table's Data Quality tab.
Running start_data_quality_ruleset_evaluation_run against the same ruleset populates the
table's Data Quality tab, but returns only rule-level pass/fail results, with no row-level
outcomes.
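
A condensed sketch of what we tried on the ETL side is below. The helper names
(create_bound_ruleset, build_dq_options, evaluate_and_split) are hypothetical and exist
only for this post. The database, tableName, and catalogId keys in additional_options are
our own guess; we have not found them documented as options that bind a process_rows run
to a catalog table. The second function only runs inside a Glue job, where awsglue and
awsgluedq are available.

```python
def create_bound_ruleset(glue, name, dqdl, database, table):
    """Create a ruleset bound to a catalog table via TargetTable.

    `glue` is a boto3 Glue client, e.g. boto3.client("glue").
    """
    return glue.create_data_quality_ruleset(
        Name=name,
        Ruleset=dqdl,
        TargetTable={"DatabaseName": database, "TableName": table},
    )["Name"]


def build_dq_options(ruleset_name, database, table, catalog_id):
    """Return (publishing_options, additional_options) as we passed them."""
    publishing_options = {
        "dataQualityEvaluationContext": ruleset_name,
        "enableDataQualityResultsPublishing": True,
        "enableDataQualityCloudWatchMetrics": True,
    }
    # Assumption: these keys are what we tried, not documented options.
    additional_options = {
        "database": database,
        "tableName": table,
        "catalogId": catalog_id,
    }
    return publishing_options, additional_options


def evaluate_and_split(source_dyf, dqdl_ruleset, publishing_options,
                       additional_options):
    """Evaluate once and return (good_df, bad_df, rule_outcomes_df)."""
    # Imported lazily because these modules exist only in the Glue runtime.
    from awsglue.transforms import SelectFromCollection
    from awsgluedq.transforms import EvaluateDataQuality

    results = EvaluateDataQuality().process_rows(
        frame=source_dyf,
        ruleset=dqdl_ruleset,
        publishing_options=publishing_options,
        additional_options=additional_options,
    )
    row_df = SelectFromCollection.apply(dfc=results, key="rowLevelOutcomes").toDF()
    rule_df = SelectFromCollection.apply(dfc=results, key="ruleOutcomes").toDF()

    # Split on the per-row verdict column added by the transform.
    good_df = row_df.filter(row_df["DataQualityEvaluationResult"] == "Passed")
    bad_df = row_df.filter(row_df["DataQualityEvaluationResult"] == "Failed")
    return good_df, bad_df, rule_df
```

The row split works as expected with this setup; only the catalog table's Data Quality
tab stays empty.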

Questions

1. Is there a configuration of EvaluateDataQuality.process_rows that publishes results to
the catalog table's Data Quality tab in the same call?
2. Is there a way to get row-level segregation out of
start_data_quality_ruleset_evaluation_run?
3. If this split is intentional (ETL transform for row-level outcomes, API for the
catalog surface), is there a boto3 call that can post-publish an already-computed
evaluation result to the catalog table tab, so we don't have to evaluate the ruleset a
second time?

pyspark aws-glue data-quality