Hive to Oracle using Spark - Type(Date) conversion issue


Hive to Oracle using Spark - Type(Date) conversion issue

Gurusamy Thirupathy
Hi,

I am trying to read data from Hive as a DataFrame and then write the DF into an Oracle database. In this case the date field/column in Hive has type Varchar(20),
but the corresponding column type in Oracle is Date. When reading from Hive, the table names are decided dynamically (read from another table) based on some job condition (e.g. Job1). There are multiple tables like this, so the column and table names are known only at run time, which means I can't do the type conversion explicitly when reading from Hive.

Is there any utility/API available in Spark to handle this conversion?
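
A minimal sketch of the kind of job being described, for context; the Hive table name, JDBC URL, credentials and target table below are placeholders, not the real names:

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-to-oracle")
      .enableHiveSupport()
      .getOrCreate()

    val hiveTable = "somedb.some_table"      // decided at run time in the real job
    val df = spark.table(hiveTable)          // the date column arrives as a plain string

    val props = new Properties()
    props.setProperty("user", "app_user")
    props.setProperty("password", "app_password")
    props.setProperty("driver", "oracle.jdbc.OracleDriver")

    // The write below is where the varchar-vs-DATE mismatch surfaces,
    // because Spark ships that column as a string.
    df.write.mode("append")
      .jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "TARGET_TABLE", props)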


Thanks,
Guru

Re: Hive to Oracle using Spark - Type(Date) conversion issue

Deepak Sharma
The other approach would be to write to a temp table and then merge the data, but this may be an expensive solution.
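
A rough sketch of that staging-table-and-merge idea; the table names, key column, date format and connection details are assumptions, not from this thread:

    import java.sql.DriverManager
    import java.util.Properties
    import org.apache.spark.sql.{DataFrame, SaveMode}

    def writeViaStaging(df: DataFrame, url: String, props: Properties): Unit = {
      // 1) Bulk-load the DataFrame as-is (date still a string) into a staging table.
      df.write.mode(SaveMode.Overwrite).jdbc(url, "STG_TARGET", props)

      // 2) Merge from staging into the real table, doing the string-to-DATE
      //    conversion on the Oracle side with TO_DATE.
      val conn = DriverManager.getConnection(url, props)
      try {
        conn.createStatement().executeUpdate(
          """MERGE INTO TARGET_TABLE t
            |USING STG_TARGET s ON (t.ID = s.ID)
            |WHEN MATCHED THEN UPDATE SET t.EVENT_DATE = TO_DATE(s.EVENT_DATE, 'YYYY-MM-DD')
            |WHEN NOT MATCHED THEN INSERT (ID, EVENT_DATE)
            |  VALUES (s.ID, TO_DATE(s.EVENT_DATE, 'YYYY-MM-DD'))""".stripMargin)
      } finally conn.close()
    }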

Thanks
Deepak


Re: Hive to Oracle using Spark - Type(Date) conversion issue

ayan guha
Hi

The issue is not with Spark in this case, it is with Oracle. If you do not know which columns a date-related conversion rule should apply to, then you have a problem.

You should try one of the following:

a) Define a config file where you specify the table name, date column name and date format at the source, so that you can apply the appropriate conversion dynamically (see the sketch after this list)
b) Write the data into the Oracle DB with a String data type, but add a view that translates the string into a date
c) Define the Hive tables with a date data type so that you can apply the appropriate conversion
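
A small sketch of what option (a) could look like; the table names, column names and formats in the map are invented for illustration:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.to_date

    // table name -> (date column, date format at source); maintained by hand
    // or read from the same control table that drives the job.
    val dateRules: Map[String, (String, String)] = Map(
      "sales_daily" -> ("txn_dt", "yyyy-MM-dd"),
      "orders_hist" -> ("ord_dt", "dd-MMM-yyyy")
    )

    def applyDateRule(tableName: String, df: DataFrame): DataFrame =
      dateRules.get(tableName) match {
        case Some((col, fmt)) => df.withColumn(col, to_date(df(col), fmt))
        case None             => df    // no rule registered: leave the frame as-is
      }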






--
Best Regards,
Ayan Guha

Re: Hive to Oracle using Spark - Type(Date) conversion issue

Gurusamy Thirupathy
Hi guha,

Thanks for your quick response. Options a and b are already on our table. For option b we hit the same problem again: we don't know which column is a date.


Thanks,
-G




--
Thanks,
Guru

Re: Hive to Oracle using Spark - Type(Date) conversion issue

Jörn Franke
Write your own Spark UDF. Apply it to all varchar columns.

Within this UDF you can use the SimpleDateFormat parse method. If parsing fails, return the content as varchar; if it succeeds, return a date. If the content is null, return null.
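
A rough Scala adaptation of that idea; since a Spark UDF has a single return type, this sketch makes the date-or-varchar decision per column (by probing a small sample of values) rather than per value, and the date pattern is an assumption:

    import java.text.SimpleDateFormat
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.to_date
    import org.apache.spark.sql.types.StringType

    val fmt = "yyyy-MM-dd"    // assumed format of the varchar date in Hive

    def looksLikeDate(s: String): Boolean = {
      val sdf = new SimpleDateFormat(fmt)
      sdf.setLenient(false)
      try { sdf.parse(s); true } catch { case _: java.text.ParseException => false }
    }

    // Cast every string column whose sampled non-null values all parse as dates.
    def castDateLikeColumns(df: DataFrame): DataFrame =
      df.schema.fields.filter(_.dataType == StringType).foldLeft(df) { (acc, f) =>
        val sample = acc.select(f.name).na.drop().limit(20).collect().map(_.getString(0))
        if (sample.nonEmpty && sample.forall(looksLikeDate))
          acc.withColumn(f.name, to_date(acc(f.name), fmt))
        else acc
      }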

Alternatively, you can define an insert function in PL/SQL on the Oracle side.

Another alternative is to read the Oracle metadata for the table at runtime and then adapt your conversion based on this. 
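
For that metadata route, one possible sketch that asks Oracle, through standard JDBC metadata, which columns of the target table are dates; the URL, user and table name are placeholders:

    import java.sql.{DriverManager, Types}

    def oracleDateColumns(url: String, user: String, pass: String, table: String): Set[String] = {
      val conn = DriverManager.getConnection(url, user, pass)
      try {
        // Oracle's DATE type is reported as TIMESTAMP by recent JDBC drivers,
        // so accept either type code here.
        val rs = conn.getMetaData.getColumns(null, user.toUpperCase, table.toUpperCase, null)
        val cols = scala.collection.mutable.Set[String]()
        while (rs.next()) {
          val t = rs.getInt("DATA_TYPE")
          if (t == Types.DATE || t == Types.TIMESTAMP)
            cols += rs.getString("COLUMN_NAME")
        }
        cols.toSet
      } finally conn.close()
    }

The returned set can then drive a to_date cast on the matching DataFrame columns before the JDBC write.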

However, this may not be perfect depending on your use case. Can you please provide more details/examples? Are you aiming at a generic Hive-to-Oracle import tool using Spark? Would Sqoop not be an alternative?


Re: Hive to Oracle using Spark - Type(Date) conversion issue

Gurusamy Thirupathy
Hi Jörn,

Thanks for sharing the different options. Yes, we are trying to build a generic Hive-to-Oracle export tool using Spark.
FYI, we are currently using Sqoop and are trying to migrate from Sqoop to Spark.

Thanks
-G

--
Thanks,
Guru

Re: Hive to Oracle using Spark - Type(Date) conversion issue

spark receiver
Use unix time: write the unix time to Oracle as a NUMBER column, then create a virtual column in the Oracle database for the unix time, for example: oracle_time generated always as (to_date('1970010108','YYYYMMDDHH24') + (1/24/60/60) * unixtime)
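
On the Spark side this could look roughly like the following; the column name, date format and table name are assumptions:

    import java.util.Properties
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.unix_timestamp

    // Replace the varchar date with seconds since the epoch and write it as a
    // NUMBER; the Oracle virtual column quoted above derives the DATE from it.
    def writeWithUnixTime(df: DataFrame, url: String, props: Properties): Unit =
      df.withColumn("unixtime", unix_timestamp(df("event_dt"), "yyyy-MM-dd"))
        .drop("event_dt")
        .write.mode("append")
        .jdbc(url, "TARGET_TABLE", props)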
