Avro/Parquet GenericFixed decimal is not read into Spark correctly

Justin Pihony
This post has NOT been accepted by the mailing list yet.

Before creating a JIRA for this, I wanted to get a sense of whether it would be shot down:

Take the following code:

spark-shell --packages org.apache.avro:avro:1.8.1
import org.apache.avro.{Conversions, LogicalTypes, Schema}
import java.math.BigDecimal
val dc = new Conversions.DecimalConversion()
val javaBD = BigDecimal.valueOf(643.85924958)
val schema =
    Schema.parse("{\"type\":\"record\",\"name\":\"Header\",\"namespace\":\"org.apache.avro.file\",\"fields\":[" +
      "{\"name\":\"COLUMN\",\"type\":[\"null\",{\"type\":\"fixed\",\"name\":\"COLUMN\"," +
val schemaDec = schema.getField("COLUMN").schema()
val fieldSchema = if(schemaDec.getType() == Schema.Type.UNION) schemaDec.getTypes.get(1) else schemaDec
val converted = dc.toFixed(javaBD, fieldSchema, LogicalTypes.decimal(javaBD.precision, javaBD.scale))

If you then try to create a DataFrame directly from the resulting GenericFixed value (converted), you'll get this error:

java.lang.UnsupportedOperationException: Schema for type org.apache.avro.generic.GenericFixed is not supported

However, if you write the same GenericFixed value (converted) to a Parquet file with AvroParquetWriter and then read the file back through the DataFrameReader, no exception is thrown and the decimal value retrieved is wrong (i.e. the 643... above comes back as -0.5...).
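For reference, here is a minimal sketch of what a correct reader should recover from the fixed bytes. It assumes the Avro specification's encoding for a decimal stored in a fixed: the unscaled value as big-endian two's complement, sign-extended to the fixed size, with the scale taken from the schema rather than the bytes. It uses only java.math, so it runs without the Avro or Spark packages. The object and method names are hypothetical, and the 8-byte size is an assumption because the post does not show the fixed size. It builds the value with new BigDecimal("643.85924958") instead of valueOf so that the scale is exactly 8.

```scala
import java.math.{BigDecimal, BigInteger}

// Hypothetical helper illustrating the Avro decimal-on-fixed byte layout.
object FixedDecimalSketch {
  // Encode: unscaled value as minimal big-endian two's complement,
  // then sign-extended on the left to exactly `size` bytes.
  def toFixedBytes(value: BigDecimal, size: Int): Array[Byte] = {
    val unscaled = value.unscaledValue.toByteArray
    require(unscaled.length <= size,
      s"value needs ${unscaled.length} bytes, fixed size is $size")
    val pad: Byte = if (value.signum < 0) 0xFF.toByte else 0x00.toByte
    Array.fill[Byte](size - unscaled.length)(pad) ++ unscaled
  }

  // Decode: BigInteger(bytes) interprets the array as two's complement,
  // and the scale must come from the schema's logical type.
  def fromFixedBytes(bytes: Array[Byte], scale: Int): BigDecimal =
    new BigDecimal(new BigInteger(bytes), scale)

  def main(args: Array[String]): Unit = {
    val value = new BigDecimal("643.85924958")
    val size = 8 // assumed fixed size
    val bytes = toFixedBytes(value, size)
    println(bytes.map(b => f"${b & 0xFF}%02x").mkString(" "))
    println(fromFixedBytes(bytes, value.scale)) // 643.85924958
  }
}
```

A reader that ignores the sign extension or applies a different scale would decode the same bytes to a different number, so a round trip like this is a quick way to check whether the bytes written to Parquet are themselves correct.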

Even if this is not supported, could it at least throw an UnsupportedOperationException, as it does when you try to do this directly, rather than silently returning a wrong value when the data is read in from a file?