Lightweight pipeline execution for single row

Lightweight pipeline execution for single row

purijatin
Hi.

What tactics can I apply for the following scenario?

I have a pipeline of 10 stages of simple text processing. I fit the pipeline on the training data, do some modelling on the transformed data, and store the results.
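Roughly, the training side looks like this (a trimmed sketch, not the real code: the stage choices, column names and save path are placeholders):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.{RegexTokenizer, HashingTF, IDF}

    val spark = SparkSession.builder().appName("training").getOrCreate()

    // toy training data standing in for the real corpus
    val trainingDf = spark.createDataFrame(Seq(
      Tuple1("some example document"),
      Tuple1("another example document")
    )).toDF("text")

    // three representative stages standing in for the real 10
    val pipeline = new Pipeline().setStages(Array(
      new RegexTokenizer().setInputCol("text").setOutputCol("tokens"),
      new HashingTF().setInputCol("tokens").setOutputCol("rawFeatures"),
      new IDF().setInputCol("rawFeatures").setOutputCol("features")
    ))

    val model = pipeline.fit(trainingDf)                   // fitted PipelineModel
    model.write.overwrite().save("/models/text-pipeline")  // reused by the web server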

I also have a web server where I receive requests. For each request (a DataFrame of a single row), I transform it with the same fitted pipeline and take the appropriate action. The problem is that calling Spark for a single row takes under 1 second, but under higher load Spark becomes a major bottleneck.
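The per-request path is roughly the following (again a sketch; `spark` is the same SparkSession as above, and the path and column name are placeholders):

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.sql.Row

    val model = PipelineModel.load("/models/text-pipeline")  // loaded once at server startup

    def handleRequest(text: String): Array[Row] = {
      // one-row DataFrame per request
      val df = spark.createDataFrame(Seq(Tuple1(text))).toDF("text")
      model.transform(df).collect()  // a full Spark job per request -> the bottleneck under load
    }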

One solution I can think of is to re-implement the same pipeline in plain Scala and serve the requests with the help of the model generated above. But that duplicates code and adds a maintenance burden.

Is there any way to call the same pipeline (transform) in a very lightweight manner, just for a single row, so that requests run concurrently and Spark does not remain a bottleneck?

Thanks
Jatin

Re: Lightweight pipeline execution for single row

MidwestMike
Are you using the scheduler in fair mode instead of fifo mode?
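For reference, fair mode would be set up along these lines (just a sketch; the pool name and allocation-file path are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.scheduler.mode", "FAIR")  // fair instead of FIFO
      .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
      .getOrCreate()

    // set on the thread that submits the per-request jobs
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", "webRequests")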

Sent from my iPhone


Re: Lightweight pipeline execution for single row

purijatin
Using FAIR mode.

If there is no other way, then: I think there is a limit on the number of jobs Spark can run in parallel. Is there a way to let more jobs run in parallel? That would be acceptable here, because this SparkContext is only used for the web service calls.
I looked at the Spark configuration page and tried a few settings, but they did not seem to work. I am using Spark 2.3.1.
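For context, the request handlers all share the one SparkContext, roughly as below (a sketch; the thread-pool size and pool name are arbitrary, and `spark`/`model` are the session and fitted pipeline from my first mail):

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}
    import org.apache.spark.sql.Row

    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(32))

    def score(text: String): Future[Array[Row]] = Future {
      // the local property is per-thread, so it is set on the submitting thread
      spark.sparkContext.setLocalProperty("spark.scheduler.pool", "webRequests")
      val df = spark.createDataFrame(Seq(Tuple1(text))).toDF("text")
      model.transform(df).collect()
    }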

Thanks.
