What is the latest stable release of Spark

9 messages
What is the latest stable release of Spark

Mich Talebzadeh

According to the Apache Spark download page, the latest version is Spark 3.0.2, dated February 2021.

However, I expected this to be 3.1.1.

Has 3.1.1 been withdrawn for any reason? I ask because my Spark streaming job works fine with 3.0.1 but does not work on GCP Dataproc, which ships Spark 3.1.1!

I may be totally wrong.

Also, from the release notes:

Next official release: Spark 3.1.1

The next official Spark release is Spark 3.1.1 instead of Spark 3.1.0. There was a technical issue during Spark 3.1.0 RC1 preparation, see [VOTE] Release Spark 3.1.0 (RC1) in the Spark dev mailing list.




LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 



Re: What is the latest stable release of Spark

srowen
3.1.1 - what makes you say otherwise?




Re: What is the latest stable release of Spark

Mich Talebzadeh

Hi Sean,


The issue I have is the use of 3.1.1-rc2 with GCP Dataproc. I have a streaming job that works on 3.0.1, but the same job does not show any output on 3.1.1.


This is what GCP Dataproc 2.0 reports. Is 3.1.1-rc2 a stable release?


[screenshot: Dataproc 2.0 component listing showing Spark 3.1.1]



> spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://ctpcluster-m.europe-west2-c.c.axial-glow-224522.internal:46241
Spark context available as 'sc' (master = yarn, app id = application_1614585459458_0015).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.13 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
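
As a quick sanity check from inside the shell (a minimal sketch; spark and sc are the session and context that spark-shell pre-creates):

scala> // Confirm the version the running session actually reports
scala> println(spark.version)   // prints 3.1.1 on this image
scala> println(sc.version)      // same value, via the SparkContext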


Cheers





Re: What is the latest stable release of Spark

srowen
Apache Spark 3.1.1 is about to be released any second now. 3.1.1-RC2 is not an official release; it's a release candidate.
Anyone's free to ship any bits they like from the OSS project though, so that's what GCP has done and that's OK. Very technically, it shouldn't really be called Apache Spark 3.1.1-RC2 as there is no such release, but that's a minor nit.




Re: What is the latest stable release of Spark

Mich Talebzadeh

OK, I guess my point is (and I stand corrected) that the GCP team chose a version of Spark (3.1.1-rc x) to build into the Dataproc cluster and offered it as a service before it was officially out.


So there seems to be an issue here. 


Going back, how can this be rectified? Can I use the Spark 3.0.1-specific jars and see whether my job works on GCP? (A minimal check is sketched after the jar list below.)


# These on GCP (Spark 3.1.1)

  1. spark-token-provider-kafka-0-10_2.12-3.1.0.jar
  2. spark-sql-kafka-0-10_2.12-3.1.0.jar

# These on-premises (Spark 3.0.1)

  1. spark-token-provider-kafka-0-10_2.12-3.0.1.jar
  2. spark-sql-kafka-0-10_2.12-3.0.1.jar
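
One thing worth checking here (my assumption, not something confirmed in this thread): the Kafka connector jars should match the Spark version on the cluster, i.e. 3.1.1 artifacts on a 3.1.1 cluster rather than the 3.1.0 jars listed above; spark-submit can pull a matching build with --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1. A minimal sketch to isolate the source from the job logic (broker address and topic name are hypothetical placeholders):

import org.apache.spark.sql.SparkSession

object KafkaConsoleCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaConsoleCheck").getOrCreate()

    // Subscribe to a test topic; bootstrap servers and topic are placeholders.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092")
      .option("subscribe", "test_topic")
      .option("startingOffsets", "latest")
      .load()

    // Write raw records straight to the console. If nothing shows up here
    // on 3.1.1 either, the problem is in the connector/source rather than
    // the downstream transformation logic.
    val query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}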


Cheers





Re: What is the latest stable release of Spark

srowen
Nothing wrong with it - every bit is OSS for anyone to use under the terms of the license. The only possible issue is representing it as _Apache Spark_ but I think this is minor.
I don't know how to change what Dataproc does; use an older release?
I don't know what the issue is on 3.1.1-RC2 but I would assume it's the same on 3.1.1 unless magically something was fixed in RC3.




Re: What is the latest stable release of Spark

Mich Talebzadeh
One would have thought that for software offered as a service you would go with stable releases of a product.

From the Spark download site:

Release Notes for Stable Releases

Spark 3.0.2 (Feb 19 2021)
Spark 2.4.7 (Sep 12 2020)







Re: What is the latest stable release of Spark

Jungtaek Lim-2
Every vendor tends to have lots of custom fixes in their releases. If you use Google Dataproc, I'd say it's no longer the same as Apache Spark; you may want to contact Google to resolve the issue, or try the community version to replicate it.
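
One quick way to see how far a running build is from the community release (a sketch; SPARK_VERSION, SPARK_BRANCH, SPARK_REVISION and SPARK_BUILD_DATE are, to my knowledge, public vals in the org.apache.spark package object, populated from spark-version-info.properties at build time):

scala> import org.apache.spark.{SPARK_VERSION, SPARK_BRANCH, SPARK_REVISION, SPARK_BUILD_DATE}
scala> // A vendor build usually carries a different branch/revision than the
scala> // community release tag with the same version string.
scala> println(s"version=$SPARK_VERSION branch=$SPARK_BRANCH")
scala> println(s"revision=$SPARK_REVISION buildDate=$SPARK_BUILD_DATE")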




Re: What is the latest stable release of Spark

Mich Talebzadeh
Thanks, points noted.


