Spark jdbc postgres numeric array


Spark jdbc postgres numeric array

Alexey
Hi,

I came across strange behavior when dealing with postgres columns of type numeric[], using Spark 2.3.2 with PostgreSQL 10.4 and 9.6.9.
Consider the following table definition:

create table test1
(
   v  numeric[],
   d  numeric
);

insert into test1 values('{1111.222,2222.332}', 222.4555);
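
For reference, the table is read along the following lines (a minimal sketch as run in spark-shell, where spark is the predefined SparkSession; the URL, database name, user and password are placeholders):

// Minimal JDBC read; all connection settings below are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "test1")
  .option("user", "postgres")
  .option("password", "postgres")
  .load()

df.printSchema()   // prints the schema shown below
df.show()          // fails with the IllegalArgumentException shown further down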

When reading the table into a DataFrame, I get the following schema:

root
 |-- v: array (nullable = true)
 |    |-- element: decimal(0,0) (containsNull = true)
 |-- d: decimal(38,18) (nullable = true)

Notice that precision and scale were not specified for either column, yet the array element type comes back as decimal(0,0), while the plain numeric column falls back to the default decimal(38,18).

Later, when I try to read the DataFrame's contents, I get the following error:

java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 0
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
        at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:453)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$16$$anonfun$apply$6$$anonfun$apply$7.apply(JdbcUtils.scala:474)
        ...

I would expect to get array elements of type decimal(38,18) and no error when reading in this case.
Should this be considered a bug? Is there a workaround other than changing the column array type definition to include explicit precision and scale?
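
One idea I have not verified is to push an explicit cast into the query Spark issues, so that the driver can report a typed array element; a rough sketch (connection settings are again placeholders, and whether the PostgreSQL JDBC driver actually propagates the precision and scale of the cast for array columns is an assumption on my part):

// Untested workaround sketch: cast the array in a pushed-down subquery so the
// reported element type is numeric(38,18) rather than an unconstrained numeric.
val query = "(select v::numeric(38,18)[] as v, d from test1) as t"

val casted = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")   // placeholder
  .option("dbtable", query)
  .option("user", "postgres")                                 // placeholder
  .option("password", "postgres")                             // placeholder
  .load()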

Best regards,
Alexey



Re: Spark jdbc postgres numeric array

Takeshi Yamamuro
Hi,

I checked that v2.2/v2.3/v2.4/master all have the same issue, so could you file a JIRA?
I looked over the related code, and I think we need more logic to handle this case.
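
My guess, without having traced the exact code path, is that the array element type is built directly from the precision and scale reported by the JDBC metadata, which come back as 0 for an unconstrained numeric[], while the scalar column falls back to the default. A simplified sketch of the kind of handling I have in mind (not the actual Spark code; the helper name is made up):

import org.apache.spark.sql.types.{DataType, DecimalType}

// Sketch only: pick the element type for a postgres numeric[] column from the
// precision/scale reported by JDBC metadata, falling back to the system
// default decimal(38,18) when no precision is reported.
def numericElementType(precision: Int, scale: Int): DataType =
  if (precision <= 0) DecimalType.SYSTEM_DEFAULT
  else DecimalType(precision, scale)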
 




--
---
Takeshi Yamamuro

Re: Spark jdbc postgres numeric array

Alexey
Hi,

I also filed a JIRA yesterday:
https://issues.apache.org/jira/browse/SPARK-26538

It looks like one of them will need to be closed as a duplicate. Sorry for the late update.

Best regards



