I recently switched EMR to release label 7.0.0. Part of my workload updates some large Iceberg tables using PySpark. I moved all my S3 paths from the s3a scheme to the s3 scheme, as suggested here:
Previously, Amazon EMR used the s3n and s3a file systems. While both still work, we recommend that you use the s3 URI scheme for the best performance, security, and reliability.
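To show what that change looks like in practice, here is a minimal sketch of the kind of job setup I mean; the catalog name, bucket, warehouse path, and table name are placeholders, and the only relevant difference from before is using s3:// instead of s3a:// in the paths.

```python
from pyspark.sql import SparkSession

# Minimal sketch of an Iceberg-enabled Spark session on EMR.
# Catalog/bucket/table names below are hypothetical examples.
spark = (
    SparkSession.builder
    .appName("iceberg-updates")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.my_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.type", "hadoop")
    # Previously this was s3a://my-bucket/warehouse
    .config("spark.sql.catalog.my_catalog.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

# Example update against a large Iceberg table
spark.sql("UPDATE my_catalog.db.big_table SET flag = 0 WHERE flag IS NULL")
```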
While running the Iceberg job, I got this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 4217 in stage 4.0 failed 4 times, most recent failure: Lost task 4217.3 in stage 4.0 (TID 5632) (ip-10-5-7-244.us-east-2.compute.internal executor 48): software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)