I recently switched EMR to release label 7.0.0. Part of my workload updates some large Iceberg tables using PySpark. I moved all my S3 paths from the s3a scheme to the s3 scheme, as suggested here:
Previously, Amazon EMR used the s3n and s3a file systems. While both still work, we recommend that you use the s3 URI scheme for the best performance, security, and reliability.
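To show what that change looks like in practice, here is a minimal sketch of the kind of job setup I mean; the catalog name, bucket, warehouse path, and table name are placeholders, and the only relevant difference from before is using s3:// instead of s3a:// in the paths.

```python
from pyspark.sql import SparkSession

# Minimal sketch of an Iceberg-enabled Spark session on EMR.
# Catalog/bucket/table names below are hypothetical examples.
spark = (
    SparkSession.builder
    .appName("iceberg-updates")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.my_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.type", "hadoop")
    # Previously this was s3a://my-bucket/warehouse
    .config("spark.sql.catalog.my_catalog.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

# Example update against a large Iceberg table
spark.sql("UPDATE my_catalog.db.big_table SET flag = 0 WHERE flag IS NULL")
```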
While running the Iceberg job, I got this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 4217 in stage 4.0 failed 4 times, most recent failure: Lost task 4217.3 in stage 4.0 (TID 5632) (ip-10-5-7-244.us-east-2.compute.internal executor 48): software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)