I have many CSV files in S3, e.g. news1.csv, news2.csv, news3.csv, etc. I load them from S3 into Spark, create a DataFrame, and run some Spark SQL on it. After that, I want to upload the result back to S3 as a single CSV file.
First, I tried writing with mode("append") for each CSV file:
df = spark.sql(...)
df.write \
    .option("header", "true") \
    .option("encoding", "UTF-8") \
    .mode("append") \
    .option("extracopyoptions", "TIMEFORMAT 'auto'") \
    .csv("s3a://news/test1")
But append did not do what I expected: each write just added its own files under news/test1, so I ended up with several part-00000... files instead of one.
Second, I unioned the DataFrames into one:
df_total = None
for ...:  # one query per CSV file
    df = spark.sql(...)
    df_total = df if df_total is None else df_total.union(df)
df_total.write \
    .option("header", "true") \
    .option("encoding", "UTF-8") \
    .mode("append") \
    .option("extracopyoptions", "TIMEFORMAT 'auto'") \
    .csv("s3a://news/test2")
But even though I built a single DataFrame this time, it was still saved under news/test2 as multiple files: part-00000..., part-00001..., part-00002...
How can I save the result to S3 as only one CSV file?
I need your help.