How to access global kryo instance?

How to access global kryo instance?

Aureliano Buendia
Hi,

Is there a way to access the global Kryo instance created by Spark? I'm referring to the one that is passed to registerClasses() in a KryoRegistrator subclass.

I'd like to access this Kryo instance inside a map closure, so it should be accessible on the worker side too.

Re: How to access global kryo instance?

Aaron Davidson
I believe SparkEnv.get.serializer would return the serializer created from the "spark.serializer" property.

You can also obtain a Kryo serializer directly via its no-arg constructor (it still invokes your spark.kryo.registrator):
val serializer = new KryoSerializer()
but this could have some overhead, so it should probably not be done for every element you process.



Re: How to access global kryo instance?

Aureliano Buendia
In a map closure, I could use:

val ser = SparkEnv.get.serializer.asInstanceOf[KryoSerializer]


But how do I get the Kryo instance that Spark uses from ser?



Re: How to access global kryo instance?

Aaron Davidson
Please take a look at the source code -- it's relatively friendly, and very useful for digging into Spark internals! (KryoSerializer)

As you can see, a Kryo instance is available via ser.newKryo(). You can also use Spark's SerializerInstance interface (which features serialize() and deserialize() methods) by simply calling ser.newInstance().



Re: How to access global kryo instance?

Aureliano Buendia




Sorry, maybe I wasn't clear. What I meant was: does Spark use a singleton Kryo instance that can be accessed inside the map closure?

Calling ser.newKryo() for every element (inside a map closure) has a huge overhead, and it seems newKryo() doesn't do any caching. Twitter's chill uses an object pool for Kryo instances; I'm not sure how Spark handles this.
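
For readers unfamiliar with the object-pool idea mentioned here, the following is a rough sketch in plain Scala. It is not chill's or Spark's actual API: Codec, SimplePool, and PoolDemo are hypothetical names, and Codec is only a stand-in for an expensive, non-thread-safe object such as a Kryo instance.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical stand-in for an expensive, non-thread-safe object such as a Kryo instance.
final class Codec(val id: Int)

// Minimal borrow/release pool: reuses released instances and builds a new one
// only when no idle instance is available.
final class SimplePool[A <: AnyRef](create: () => A) {
  private val idle = new ConcurrentLinkedQueue[A]()

  def borrow(): A = {
    val pooled = idle.poll() // null when the pool is empty
    if (pooled != null) pooled else create()
  }

  def release(a: A): Unit = {
    idle.offer(a)
    ()
  }

  // Borrow an instance, run the body, and always hand the instance back.
  def use[B](body: A => B): B = {
    val a = borrow()
    try body(a) finally release(a)
  }
}

object PoolDemo {
  // Returns how many Codec instances were built to process n elements on one thread.
  def createdFor(n: Int): Int = {
    var created = 0
    val pool = new SimplePool[Codec](() => { created += 1; new Codec(created) })
    (1 to n).foreach(i => pool.use(codec => codec.id + i))
    created
  }
}
```

On a single thread the pool builds one instance and then keeps handing the same one back, so the construction cost is paid once rather than once per element. A borrowed instance is out of the queue until it is released, so no two threads hold it at the same time.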
 



Re: How to access global kryo instance?

Aaron Davidson
I see -- the answer is no. We do not currently use an object pool; instead we just try to create instances less frequently (typically one SerializerInstance per partition). For instance, you could do

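rdd.mapPartitions { partitionIterator =>
  // SparkEnv.get.serializer is typed as Serializer, so cast it to reach newKryo()
  // (this assumes spark.serializer is set to the Kryo serializer)
  val kryo = SparkEnv.get.serializer.asInstanceOf[KryoSerializer].newKryo()
  // the same Kryo instance is reused for every row in this partition
  partitionIterator.map(row => doWorkWithKryo(kryo, row))
}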

This should amortize the cost greatly. The only requirement of an instance is that it not be used by multiple threads simultaneously, and this fits that requirement perfectly.
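
To make the amortization concrete without needing a cluster, here is a small sketch in plain Scala that counts how many instances get built under each approach. It uses ordinary collections in place of an RDD, and Codec and PerPartitionDemo are hypothetical names (Codec stands in for a Kryo instance), so treat it as an illustration of the pattern rather than Spark code.

```scala
object PerPartitionDemo {
  // Hypothetical stand-in for a costly resource such as a Kryo instance.
  final class Codec {
    def encode(row: String): Int = row.length
  }

  // Builds one Codec per element: construction cost grows with the number of rows.
  def perElement(partitions: Seq[Seq[String]]): (Seq[Int], Int) = {
    var created = 0
    val out = partitions.flatMap { part =>
      part.iterator.map { row =>
        created += 1
        new Codec().encode(row)
      }.toList
    }
    (out, created)
  }

  // Builds one Codec per partition and reuses it for every row in that partition.
  def perPartition(partitions: Seq[Seq[String]]): (Seq[Int], Int) = {
    var created = 0
    val out = partitions.flatMap { part =>
      created += 1
      val codec = new Codec()
      part.iterator.map(row => codec.encode(row)).toList
    }
    (out, created)
  }
}
```

Both functions produce the same results; only the number of constructions differs (one per row versus one per partition), which is the saving the mapPartitions approach is after.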

