multi-concurrent processing

multi-concurrent processing

Livni, Dana

Hi,

 

I'd like to know the best practice for a scenario we have.

 

We have a lot of batch processing jobs on data stored in an HBase cluster. They are independent and need to run in parallel.

Our current implementation runs multiple independent processes (each of which is itself multi-threaded).

Each process creates one SparkContext, and all of its child threads use it.

This results in around 150 concurrent SparkContexts (each used by 5-10 threads, each performing about 4 map tasks).

 

This implementation seems inefficient, both in memory usage (mainly on our batch server) and in processing time on the cluster.

 

What would be the best way to do this?

We thought about creating a service that holds only one SparkContext, with all the processes and threads sending their requests to it.

Does anyone have insight into whether this would be a better solution, or other ideas?
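The single-shared-context service described above could be sketched roughly as follows. This is an illustrative stand-in in plain Python (the `SharedEngine` class is hypothetical, not a Spark API); the relevant Spark fact is that one SparkContext is thread-safe for job submission, so many worker threads can share a single long-lived context instead of each process creating its own:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SharedEngine:
    """Hypothetical stand-in for one long-lived shared SparkContext.

    A real SparkContext supports concurrent job submission from
    multiple threads, so a single instance can serve many callers.
    """

    def __init__(self):
        self.jobs_submitted = 0
        self._lock = threading.Lock()

    def run_job(self, data):
        # In real Spark this would be something like
        # sc.parallelize(data).map(f).collect()
        with self._lock:
            self.jobs_submitted += 1
        return [x * 2 for x in data]

# One engine shared by all threads, instead of ~150 separate contexts.
engine = SharedEngine()

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(engine.run_job, [i, i + 1]) for i in range(20)]
    results = [f.result() for f in futures]

print(engine.jobs_submitted)  # all 20 jobs went through the one shared engine
```

With a real SparkContext, you would typically also set `spark.scheduler.mode=FAIR` so that jobs submitted concurrently from different threads share cluster resources instead of queuing FIFO.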

Thanks in advance

Dana

 

---------------------------------------------------------------------
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



Re: multi-concurrent processing

Mayur Rustagi
In reply to this post by Livni, Dana
You didn't specify what the key blocker is. Why is the cluster underutilized? Are your threads busy processing the results, so that new Spark jobs are not being submitted?




On Thu, Feb 20, 2014 at 7:30 PM, Livni, Dana <[hidden email]> wrote:
