Schema mismatch when running code in Python
I need help cleaning up the code for a job that is hitting a schema mismatch in dev and will eventually hit the same bug in production. The job needs to handle schema evolution so that its output matches the expected table structure. This is the job, along with the error I am getting:
My column names tell me something critical:
openfda_brand_name_0
openfda_brand_name_1
openfda_brand_name_2
...
openfda_brand_name_38
I'm assuming this usually happens when:
I flattened an array incorrectly, or
I repeatedly expanded nested fields across multiple runs, or
I am unioning JSON records where arrays have variable length
This design is dangerous long-term: every record with a longer array adds new numbered columns, so the table schema never stops growing.
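One way out of the numbered-column pattern is to fold each group back into a single array column. The following is a minimal sketch, not a tested fix for this job: `group_numbered_columns` and `collapse_numbered_columns` are hypothetical helper names, and it assumes that every column ending in `_<digits>` belongs to such a group and that all columns in a group share one type (for example, string).

```python
import re
from collections import defaultdict

# Matches names such as "openfda_brand_name_12" -> base "openfda_brand_name", idx "12".
# Assumption: any column ending in "_<digits>" is part of a flattened array.
_NUMBERED = re.compile(r"^(?P<base>.+)_(?P<idx>\d+)$")


def group_numbered_columns(columns):
    """Map each base name to its numbered columns, ordered by numeric index."""
    groups = defaultdict(list)
    for name in columns:
        match = _NUMBERED.match(name)
        if match:
            groups[match.group("base")].append((int(match.group("idx")), name))
    return {base: [name for _, name in sorted(pairs)] for base, pairs in groups.items()}


def collapse_numbered_columns(df):
    """Replace each group of numbered columns with one array column (hypothetical helper)."""
    # Imported lazily so the grouping logic above can be used without PySpark.
    from pyspark.sql import functions as F

    for base, cols in group_numbered_columns(df.columns).items():
        # F.array keeps null entries, so shorter records end up with trailing nulls.
        df = df.withColumn(base, F.array(*[F.col(c) for c in cols])).drop(*cols)
    return df
```

With this shape the table would hold one `openfda_brand_name` array column regardless of how many brand names a record has, so a longer array no longer changes the schema. If the source JSON is still available, keeping the original array instead of flattening it in the first place is probably simpler than collapsing it afterwards.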
See the error below:
AnalysisException: A schema mismatch detected when writing to the Delta table (Table ID: 48176273-35d5-486a-827f-fa5f20f3fe29).
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.
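For reference, the write that the error message suggests would look roughly like the sketch below. The function name `append_with_schema_merge` and the table name in the usage note are placeholders, and the sketch assumes the job appends a DataFrame to the Delta table with `DataFrameWriter`.

```python
def append_with_schema_merge(df, table_name):
    """Append df to a Delta table, letting Delta add any columns the table lacks."""
    (
        df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")  # the option named in the error message
        .saveAsTable(table_name)
    )
```

It would be called as `append_with_schema_merge(df, "dev.openfda_labels")`, where the table name is made up for illustration. This makes the write succeed, but `mergeSchema` only adds the missing columns to the table, so on its own it would let the numbered `openfda_brand_name_N` columns keep accumulating rather than stop them.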