"Unable to infer schema for JSON." error in PySpark?
02:26 01 Nov 2022

I have a JSON file with about 1,200,000 records. I want to read it with PySpark as follows:

spark.read.option("multiline","true").json('file.json')

But it causes this error:

AnalysisException: Unable to infer schema for JSON. It must be specified manually.

When I create a smaller JSON file from a subset of the records in the main file, the same code reads it without error.

I can read the full JSON file with pandas when I set the encoding to utf-8-sig:

pd.read_json("file.json", encoding = 'utf-8-sig')
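Since pandas only parses the file with utf-8-sig, the file presumably starts with a UTF-8 byte order mark (BOM). Below is a minimal sketch for checking that and for writing a BOM-free copy, using only the standard library. The helper names has_utf8_bom and strip_utf8_bom are hypothetical, and whether the BOM is what breaks Spark's schema inference here is an assumption I have not confirmed.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(src, dst, chunk_size=1024 * 1024):
    """Copy `src` to `dst` as plain UTF-8, dropping a leading BOM if present.

    The file is copied in chunks so a large file is never held in memory at once.
    """
    # "utf-8-sig" skips a leading BOM on read; newline="" keeps line endings as-is.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
```

If has_utf8_bom("file.json") returns True, the copy produced by strip_utf8_bom could be passed to spark.read in place of the original to see whether the error changes.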

How can I solve this problem?
