Problems during upgrade 2.2.2 -> 2.4.4

Problems during upgrade 2.2.2 -> 2.4.4

bsikander
A few details about the cluster:

- Current version: 2.2.2
- Resource manager: Spark standalone
- Modes: cluster + supervise
- HA setup: ZooKeeper
- Target version after upgrade: 2.4.4

Note: Everything works fine before the upgrade and after it completes; the issues below appear only during the upgrade itself.

During the upgrade, I see a number of issues:

- The Spark master on 2.4.4 tries to recover its state from ZooKeeper, fails to
deserialize the driver/app/worker objects, and throws InvalidClassException.
- After the deserialization failure, the master (2.4.4) deletes all
driver/app/worker information from ZooKeeper and loses contact with the
running JVMs.
- It sometimes mysteriously respawns the drivers, but with new IDs and no
knowledge of the old ones. Sometimes multiple copies of the "same" driver run
at the same time with different IDs.
- The old Spark workers (2.2) fail to communicate with the new Spark master (2.4.4).

I checked the release notes and couldn't find anything regarding upgrades.

Could someone please help me with the issues above and point me to
documentation regarding upgrades? If upgrades are not supported, documentation
that states this explicitly would also be helpful.

Exception as seen on the master:
2020-01-21 23:58:09,010 INFO dispatcher-event-loop-1-EventThread org.apache.spark.deploy.master.ZooKeeperLeaderElectionAgent: We have gained leadership
2020-01-21 23:58:09,073 ERROR dispatcher-event-loop-1 org.apache.spark.util.Utils: Exception encountered
java.io.InvalidClassException: org.apache.spark.rpc.RpcEndpointRef; local class incompatible: stream classdesc serialVersionUID = 1835832137613908542, local class serialVersionUID = -1329125091869941550
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2037)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:558)
        at org.apache.spark.deploy.master.ApplicationInfo$$anonfun$readObject$1.apply$mcV$sp(ApplicationInfo.scala:55)
        at org.apache.spark.deploy.master.ApplicationInfo$$anonfun$readObject$1.apply(ApplicationInfo.scala:54)
        at org.apache.spark.deploy.master.ApplicationInfo$$anonfun$readObject$1.apply(ApplicationInfo.scala:54)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
        at org.apache.spark.deploy.master.ApplicationInfo.readObject(ApplicationInfo.scala:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1160)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2173)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
        at org.apache.spark.deploy.master.ZooKeeperPersistenceEngine.org$apache$spark$deploy$master$ZooKeeperPersistenceEngine$$deserializeFromFile(ZooKeeperPersistenceEngine.scala:76)
        at org.apache.spark.deploy.master.ZooKeeperPersistenceEngine$$anonfun$read$2.apply(ZooKeeperPersistenceEngine.scala:59)
        at org.apache.spark.deploy.master.ZooKeeperPersistenceEngine$$anonfun$read$2.apply(ZooKeeperPersistenceEngine.scala:59)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.master.ZooKeeperPersistenceEngine.read(ZooKeeperPersistenceEngine.scala:59)
        at org.apache.spark.deploy.master.PersistenceEngine$$anonfun$readPersistedData$1.apply(PersistenceEngine.scala:87)
        at org.apache.spark.deploy.master.PersistenceEngine$$anonfun$readPersistedData$1.apply(PersistenceEngine.scala:86)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:320)
        at org.apache.spark.deploy.master.PersistenceEngine.readPersistedData(PersistenceEngine.scala:86)
        at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:221)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:808)

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Problems during upgrade 2.2.2 -> 2.4.4

bsikander
After digging deeper, we found that the app/worker objects in ZooKeeper are
not deserializable, but the driver objects are. That is why the drivers come
back up (mysteriously).

The deserialization fails on "RpcEndpointRef".

Hopefully somebody can now point me to a solution.





Re: Problems during upgrade 2.2.2 -> 2.4.4

bsikander
Any help would be much appreciated.





Re: Problems during upgrade 2.2.2 -> 2.4.4

bsikander
Anyone?
This question is not about my application running on top of Spark; it is
about the upgrade of Spark itself from 2.2 to 2.4.

I expected at least that Spark would recover gracefully from an upgrade and
restore its own persisted objects.





Re: Problems during upgrade 2.2.2 -> 2.4.4

Shixiong(Ryan) Zhu
Unfortunately, Spark standalone mode doesn't support rolling updates. All Spark components (master, worker, driver) must be updated to the same version. When using HA mode, the state persisted in ZooKeeper (or in files, if not using ZooKeeper) needs to be cleaned, because it is not compatible between versions.
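Concretely, cleaning the persisted state might look like the following. This is only a sketch: it assumes the default `spark.deploy.zookeeper.dir` of `/spark` and a ZooKeeper at `localhost:2181`; adjust both (and the recovery directory path) to your deployment, and stop all masters first so nothing re-persists state mid-cleanup.

```shell
# ZooKeeper recovery mode (ZooKeeper 3.5+ CLI): recursively delete
# Spark's recovery node. On older ZooKeeper CLIs the equivalent
# command is `rmr /spark`.
zkCli.sh -server localhost:2181 deleteall /spark

# Filesystem recovery mode instead: remove the contents of the
# directory configured as spark.deploy.recoveryDirectory
# (path below is a placeholder).
rm -rf /path/to/recovery-dir
```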

Best Regards,

Ryan




Re: Problems during upgrade 2.2.2 -> 2.4.4

Shixiong(Ryan) Zhu
The reason is that both Spark RPC and the persisted state used by HA mode rely on Java serialization of internal classes, which come with no compatibility guarantee between versions.
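This failure mode is plain Java serialization behavior and can be reproduced outside Spark. A minimal sketch (the class and byte-offset arithmetic here are illustrative, not Spark code): serialize an object, then overwrite the serialVersionUID stored in the stream's class descriptor to mimic a class that changed between releases, and deserialization fails with the same kind of InvalidClassException seen in the master log.

```java
import java.io.*;

public class InvalidClassDemo {
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    // Serializes a Payload, corrupts the serialVersionUID carried in the
    // stream's class descriptor, and returns the resulting failure message.
    static String roundTripWithCorruptedUid() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(new Payload());
        oos.flush();
        byte[] bytes = bos.toByteArray();

        // Stream layout up to the UID: magic(2) + version(2) + TC_OBJECT(1)
        // + TC_CLASSDESC(1) + class-name length(2) + class-name bytes
        // (ASCII here), then the 8-byte serialVersionUID.
        int uidOffset = 2 + 2 + 1 + 1 + 2 + Payload.class.getName().length();
        bytes[uidOffset] ^= 0xFF; // pretend the class changed between versions

        try {
            new ObjectInputStream(new ByteArrayInputStream(bytes)).readObject();
            return "deserialized OK";
        } catch (ClassNotFoundException | InvalidClassException e) {
            return e.toString(); // "... local class incompatible ..."
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTripWithCorruptedUid());
    }
}
```

The same mismatch happens when a 2.4.4 master reads descriptors written by 2.2.x classes whose computed or declared serialVersionUID changed, which is why the persisted HA state cannot survive the version jump.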

Best Regards,

Ryan




Re: Problems during upgrade 2.2.2 -> 2.4.4

bsikander
Thank you for your reply.

Which resource managers support rolling updates? YARN?
Also where can I find this information in the documentation?


