Suggestions on using scala/python for Spark Streaming

umargeek
We are building a Spark Streaming application that is both processing- and
time-intensive. We currently use the Python API, but we would like suggestions
on whether to switch to Scala, including the pros and cons, since our next step
is a production setup.

Thanks,
Umar



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Suggestions on using scala/python for Spark Streaming

lucas.gary@gmail.com
I don't have any specific wisdom for you on that front.  But I've always been served well by the 'Try both' approach.

Set up your benchmarks and configure both setups. You don't have to go the whole hog, just far enough to get a mostly realistic implementation working. Run them both with some captured / fixture data, and compare.
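
A minimal sketch of what such a comparison harness could look like, in plain Python with no Spark dependency. The names `benchmark` and `word_count` are hypothetical, and `word_count` is only a stand-in for the real job; in practice the callable would launch the Python or the Scala build of the application against the same captured data.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def benchmark(job: Callable[[List[str]], object],
              fixture: Iterable[str],
              repeats: int = 5) -> float:
    """Return the median wall-clock seconds for running `job` over the fixture."""
    records = list(fixture)  # materialise once so every run sees identical input
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        job(records)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def word_count(records: List[str]) -> Dict[str, int]:
    """Hypothetical stand-in for the real streaming job logic."""
    counts: Dict[str, int] = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    # Assumption: captured data is a list of text lines; swap in real fixtures.
    fixture = ["spark streaming test"] * 10_000
    print(f"median seconds: {benchmark(word_count, fixture):.4f}")
```

Taking the median over several repeats keeps one slow run (warm-up, GC pause) from skewing the comparison.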

I personally haven't come across a situation where you just have to go with Scala, but I've come across multiple situations where it was preferable, though not by a big enough margin to justify retooling a team and a product.

On the plus side, you'll be well set up for integration tests with whichever system you end up rolling out!

Good luck! And I'd love to hear about any findings you come across!

Gary Lucas

On 26 October 2017 at 09:22, umargeek <[hidden email]> wrote:
We are building a Spark Streaming application that is both processing- and
time-intensive. We currently use the Python API, but we would like suggestions
on whether to switch to Scala, including the pros and cons, since our next step
is a production setup.


Re: Suggestions on using scala/python for Spark Streaming

sebastian.piu

Have a look at how PySpark works in conjunction with Spark, as it is not just a matter of language preference. There are several implications, and a performance price to pay, if you go with Python.
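
One part of that price, as far as I understand PySpark's design, is that Python functions run in separate Python worker processes, so records have to be serialized on the way in and out. The sketch below does not use Spark at all; it uses `pickle` as a stand-in for that boundary, and the function names are hypothetical. It is simplified (real PySpark batches records), so treat the numbers as an illustration of the idea, not a prediction.

```python
import pickle
import time
from typing import Any, Callable, List


def run_in_process(func: Callable[[Any], Any], records: List[Any]) -> List[Any]:
    """Apply func directly, with no serialization boundary."""
    return [func(r) for r in records]


def run_with_round_trip(func: Callable[[Any], Any], records: List[Any]) -> List[Any]:
    """Pickle each record in and each result out, mimicking a process boundary."""
    results = []
    for r in records:
        arg = pickle.loads(pickle.dumps(r))               # record crosses the boundary
        out = func(arg)
        results.append(pickle.loads(pickle.dumps(out)))   # result crosses back
    return results


def timed(runner: Callable, func: Callable[[Any], Any], records: List[Any]) -> float:
    """Wall-clock seconds for one run of runner(func, records)."""
    start = time.perf_counter()
    runner(func, records)
    return time.perf_counter() - start


def double_value(record: dict) -> float:
    return record["value"] * 2


if __name__ == "__main__":
    records = [{"id": i, "value": float(i)} for i in range(100_000)]
    print(f"in-process: {timed(run_in_process, double_value, records):.3f}s")
    print(f"round trip: {timed(run_with_round_trip, double_value, records):.3f}s")
```

Both runners return the same results; only the time spent differs, which is the overhead being discussed.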

At the end of the day, only you can answer whether that price is worth paying compared with retraining your team in another language, but if performance is a key decision factor then there isn't much debate: go for Scala.
