Converting Spark Scala DataFrame Column to Byte Array
21:52 05 Feb 2018

I'm attempting to write a Spark Scala DataFrame column as an array of bytes. I have a DataFrame with two columns: the first is a String and the second is a Map from String to Long.

For example,

user_id | map
"ac2"   | Map("c2" -> 1, "b3" -> 5)

I want to write the map column as an array of bytes. So far I've attempted to use Jackson with the following UDF:

val writeJackson = udf { x: Map[String, Long] =>
    jacksonWriter.writeValueAsBytes(x)
}

val df2 = df.withColumn("jacksonMap", writeJackson($"map"))
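
For context, jacksonWriter is built on the driver from a regular Jackson mapper with the Scala module registered, roughly along these lines (a simplified sketch, not the exact code):

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Mapper/writer created once on the driver; the UDF above captures it in its
// closure, so Spark tries to serialize it along with the task.
val jacksonMapper = new ObjectMapper()
jacksonMapper.registerModule(DefaultScalaModule)
val jacksonWriter = jacksonMapper.writer()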

but this fails with

java.io.NotSerializableException: com.fasterxml.jackson.module.paranamer.shaded.CachingParanamer

Is there a way to get this to work with Jackson, and if not, is there a different library that will let me write this Spark column as a byte array?

scala apache-spark jackson