Adding isolation level when reading from DB2 with spark.read


Adding isolation level when reading from DB2 with spark.read

Filipa Sousa
    Hello,

    We are trying to read from an IBM DB2 database using a PySpark job.
    We have a requirement to add an isolation level, Read Uncommitted (WITH UR), to the JDBC queries when reading DB2 data.
    We found the "isolationLevel" parameter in the Spark documentation, but it apparently applies only to writing (https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html). Do you know if there is a similar one for reading?

    isolationLevel - The transaction isolation level, which applies to current connection. It can be one of NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE, corresponding to standard transaction isolation levels defined by JDBC's Connection object, with default of READ_UNCOMMITTED. This option applies only to writing. Please refer the documentation in java.sql.Connection.

    Also, we tested putting "WITH UR" directly into the query, but since the isolation level must always be at the outermost layer of the query, and Spark always parenthesizes the query (https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html), it throws an error.
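
    For illustration, a sketch of what happens, assuming Spark's usual derived-table wrapping (the generated alias is an internal detail and may differ by version):

        # What we pass as the query:
        #   SELECT * FROM MYSCHEMA.MYTABLE WITH UR
        # What Spark roughly sends to DB2:
        #   SELECT * FROM (SELECT * FROM MYSCHEMA.MYTABLE WITH UR) SPARK_GEN_SUBQ_0
        # DB2 rejects this, because WITH UR is no longer the outermost clause.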

    The last thing we tried was adding a "predicates" option when reading with Spark, but it is being ignored.

        predicates = "PART_NR != '0' with UR"
        input_df = (
            self.spark.read.format("jdbc")
            .option("url", self.db_settings["jdbc_url"])
            .option("dbtable", db_table)
            .option("user", self.db_settings["db_username"])
            .option("password", self.db_settings["db_password"])
            .option("predicates", predicates)
            .option("fetchsize", self.fetch_size)
            .load()  # without load() the reader is never executed
        )
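
    (Note: in PySpark, "predicates" is a parameter of spark.read.jdbc() rather than a DataFrameReader option, which is likely why the .option() form is silently ignored. A minimal sketch of the parameter form, reusing the same settings:)

        # Each predicate becomes part of the WHERE clause of one partition's
        # query; whether a trailing "with UR" survives into valid DB2 SQL
        # depends on how Spark assembles the final statement.
        input_df = self.spark.read.jdbc(
            url=self.db_settings["jdbc_url"],
            table=db_table,
            predicates=["PART_NR != '0'"],
            properties={
                "user": self.db_settings["db_username"],
                "password": self.db_settings["db_password"],
                "fetchsize": str(self.fetch_size),
            },
        )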

    Do you have any advice on how we can do this?


Best regards,
Filipa Sousa


RE: Adding isolation level when reading from DB2 with spark.read

Luca Canali
Hi Filipa,

The Spark JDBC data source has an option called "sessionInitStatement".
It is documented in https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html and https://issues.apache.org/jira/browse/SPARK-21519
I guess you could use that option to "inject" a SET ISOLATION statement, although I am not familiar with the details of DB2.
Would that be useful for your use case?
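
For illustration, a minimal sketch of how that option might be wired in (jdbc_url and db_table are placeholders, and the SET statement is an assumption; the exact DB2 syntax, and whether DB2 honors it on an open connection, would need checking):

    input_df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", db_table)
        # sessionInitStatement runs once per JDBC connection, after the
        # session is opened and before any data is read (see SPARK-21519).
        .option("sessionInitStatement", "SET CURRENT ISOLATION = UR")
        .load()
    )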

Best,
Luca


RE: Adding isolation level when reading from DB2 with spark.read

Filipa Sousa
Hello Luca,

Thank you for your quick response!

It is an interesting suggestion, but unfortunately an isolation level change statement like SET ISOLATION is not permitted while connected to a DB2 database (https://www.ibm.com/support/knowledgecenter/th/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0001946.html). It must be set prior to the connection, and it is Spark that establishes that connection.

Thank you anyway,
Filipa.



Re: Adding isolation level when reading from DB2 with spark.read

Jörg Strebel
Hello!

You can set the DB2 JDBC driver options in the JDBC connection string:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_rjvdsprp.html


The DB2 JDBC driver has an option called "defaultIsolationLevel":
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_r0052038.html

Maybe you should also try the option "concurrentAccessResolution":
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_r0052607.html

See also
https://www.ibm.com/support/pages/how-set-isolation-level-db2-jdbc-database-connections
and https://www.idug.org/p/fo/et/thread=45083
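
For illustration, a hedged sketch of passing a driver property through the JDBC URL (host, port, and database name are placeholders; per the IBM documentation above, the value should be one of the java.sql.Connection isolation constants, e.g. 1 for TRANSACTION_READ_UNCOMMITTED, but please verify against your driver version):

    # DB2 JCC driver properties are appended after the database name,
    # separated by ':' and each terminated with ';'.
    jdbc_url = "jdbc:db2://dbhost:50000/MYDB:defaultIsolationLevel=1;"

    input_df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", db_table)
        .option("user", db_username)
        .option("password", db_password)
        .load()
    )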

Regards

Jörg Strebel


--
Jörg Strebel
Aachener Straße 2
80804 München

