Sharing SparkContext


Sharing SparkContext

abhinav chowdary
Hi,
       I am looking for ways to share a SparkContext, meaning I need to be able to perform multiple operations on the same Spark context.

Below is the code of a simple app I am testing:

    import org.apache.spark.SparkContext

    object SimpleApp {
      def main(args: Array[String]) {
        println("Welcome to example application!")

        val sc = new SparkContext("spark://10.128.228.142:7077", "Simple App")
        println("Spark context created!")

        println("Creating RDD!")
        // ... RDD creation and the rest of the app go here
      }
    }

Now, once this context is created, I want to be able to access it to submit multiple jobs/operations.

Any help is much appreciated

Thanks




Re: Sharing SparkContext

Mayur Rustagi
How do you want to pass the operations to the Spark context?









Re: Sharing SparkContext

abhinav chowdary
Sorry for not being clear earlier.

> How do you want to pass the operations to the Spark context?

This is partly what I am looking for: how to access the active Spark context, and possible ways to pass operations to it.

Thanks








--
Warm Regards
Abhinav Chowdary

Re: Sharing SparkContext

Mayur Rustagi
So there is no way to share a context currently:
1. You can try the jobserver by Ooyala, but I haven't used it and, frankly, nobody has shared feedback on it.
2. If you can load that RDD into Shark, then you get a SQL interface on that RDD plus columnar storage.
3. You can try the crude method of starting a Spark shell and passing commands to it after receiving them through an HTML interface etc., but you'll have to do the hard work of managing concurrency (a rough sketch follows below).
I was wondering about the use case: are you looking to pass a Spark closure over the RDD and transform it each time, or to avoid caching the RDD again and again?
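As a toy illustration of option 3 at its very simplest: a long-lived driver keeps a single SparkContext open and executes operations as they arrive (read from stdin here; a real version would sit behind an HTML/HTTP front end and manage concurrency). This is only a sketch; the commands and names are made up:

    import org.apache.spark.SparkContext

    object SharedContextDriver {
      def main(args: Array[String]) {
        val sc = new SparkContext("spark://10.128.228.142:7077", "Shared Context")
        // Cache the data once; every subsequent request reuses the same RDD.
        val data = sc.parallelize(1 to 1000000).cache()

        var line = Console.readLine()
        while (line != null && line.trim != "quit") {
          line.trim match {
            case "count" => println("count = " + data.count())
            case "sum"   => println("sum = " + data.reduce(_ + _))
            case other   => println("unknown command: " + other)
          }
          line = Console.readLine()
        }
        sc.stop()
      }
    }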
 








Re: Sharing SparkContext

od
In reply to this post by abhinav chowdary
Doesn't the fair scheduler solve this?
Ognen



Re: Sharing SparkContext

Mayur Rustagi
The fair scheduler merely reorders tasks. I think he is looking to run multiple pieces of code on a single context, on demand from customers; if the code and its order are already decided, then the fair scheduler will ensure that all tasks get equal cluster time :)



Mayur Rustagi
Ph: +919632149971





Re: Sharing SparkContext

abhinav chowdary
Thank You Mayur

I will try the Ooyala job server to begin with. Is there a way to load an RDD created via a SparkContext into Shark? The only reason I ask is that my RDD is being created from Cassandra (not Hadoop; we are trying to get Shark to work with Cassandra as well, and are having trouble with it when running in distributed mode).







--
Warm Regards
Abhinav Chowdary

Re: Sharing SparkContext

Ognen Duzlevski-2
In reply to this post by Mayur Rustagi

On 2/25/14, 12:24 PM, Mayur Rustagi wrote:
> So there is no way to share context currently,
> 1. you can try jobserver by Ooyala but I havnt used it & frankly
> nobody has shared feedback on it.

One of the major show stoppers for me is that, when compiled against Hadoop 2.2.0, Ooyala's standalone server from the jobserver branch does not work. If you are OK staying with Hadoop 1.0.4, it does work.

Ognen

Re: Sharing SparkContext

od
In reply to this post by Mayur Rustagi
In that case, I must have misunderstood the following (from http://spark.incubator.apache.org/docs/0.8.1/job-scheduling.html). Apologies. Ognen

"Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users).

By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided into “stages” (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc. If the jobs at the head of the queue don’t need to use the whole cluster, later jobs can start to run right away, but if the jobs at the head of the queue are large, then later jobs may be delayed significantly.

Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.

To enable the fair scheduler, simply set the spark.scheduler.mode to FAIR before creating a SparkContext:"
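For reference, a minimal sketch of that whole pattern (assuming the 0.8.x-style System.setProperty configuration those docs describe, and reusing the master URL from earlier in this thread) would look roughly like this:

    import org.apache.spark.SparkContext

    object FairSharingSketch {
      def main(args: Array[String]) {
        // Must be set before the SparkContext is created (0.8.x-style configuration).
        System.setProperty("spark.scheduler.mode", "FAIR")

        val sc = new SparkContext("spark://10.128.228.142:7077", "Shared Context")
        val data = sc.parallelize(1 to 1000000)

        // Two jobs submitted from separate threads run against the same SparkContext;
        // with fair sharing their tasks are interleaved rather than strictly FIFO.
        val t1 = new Thread(new Runnable {
          def run() { println("sum = " + data.map(_ * 2).reduce(_ + _)) }
        })
        val t2 = new Thread(new Runnable {
          def run() { println("multiples of 3 = " + data.filter(_ % 3 == 0).count()) }
        })
        t1.start(); t2.start()
        t1.join(); t2.join()

        sc.stop()
      }
    }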





Re: Sharing SparkContext

abhinav chowdary

For anyone who is interested in the job server from Ooyala: we started using it recently and it has been working great so far.
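Roughly, you write jobs against the server's SparkJob trait and the server owns the long-lived SparkContext, handing it to each job it runs. The names below are from memory of the jobserver README, so treat this as a sketch and check their docs for the exact API:

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD implicits for reduceByKey
    import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

    // A toy word-count job: the server creates and owns the SparkContext,
    // so many such jobs can be submitted against one shared, long-lived context.
    object WordCountJob extends SparkJob {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any = {
        val words = config.getString("input.string").split("\\s+")
        sc.parallelize(words).map((_, 1)).reduceByKey(_ + _).collect().toMap
      }
    }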





Re: Sharing SparkContext

Mayur Rustagi
Which version of Spark are you using?


Mayur Rustagi
Ph: +1 (760) 203 3257







Re: Sharing SparkContext

abhinav chowdary

0.8.1. We used branch 0.8 and pulled the jobserver pull request into our local repo. I remember we had to deal with a few issues, but once we got through those it has been working great.





Re: Sharing SparkContext

Ognen Duzlevski-2
In reply to this post by abhinav chowdary
Are you using it with HDFS? What version of Hadoop? 1.0.4?
Ognen

-- 
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- Jamie Zawinski

Re: Sharing SparkContext

abhinav chowdary

HDFS 1.0.4, but we primarily use Cassandra + Spark (Calliope). I tested it with both.
