How to improve performance in searching for URLs.

suman bharadwaj
Hi,

I was exploring Spark, and in the process I tried searching a column that contains URLs.

Basically, we apply a contains check to the column, and it takes more than 3 minutes to return the results. Is there any way to optimize this query?

.filter(line => line.contains("someUrl"))

I am currently running Spark in standalone mode on a machine with 8 GB of RAM.
Everything is cached in memory in deserialized form, and the in-memory (deserialized) data size is around 1 GB.
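One thing worth checking (a general suggestion, not something proposed in this thread): the filter above tests the whole line, so if each line carries several fields, contains scans all of them and can also match "someUrl" appearing outside the URL column. A minimal sketch, assuming tab-separated lines with the URL in field 1 (both hypothetical), is to project the URL column once and search only that. In Spark, caching that projected dataset instead of the full lines would also keep less data in memory. Plain Scala collections stand in for an RDD here, since RDDs expose similar map/filter operations; UrlSearchSketch, urlColumn and searchUrls are illustrative names, not part of any library.

```scala
// Sketch only: plain Scala collections stand in for a Spark RDD.
// Assumption (hypothetical): lines are tab-separated and the URL is field 1.
object UrlSearchSketch {
  val UrlField = 1

  // Project each line down to its URL field once; lines without that
  // field are dropped. In Spark, this projected dataset is what you
  // would cache instead of the full lines.
  def urlColumn(lines: Seq[String]): Seq[String] =
    lines.map(_.split('\t')).collect {
      case fields if fields.length > UrlField => fields(UrlField)
    }

  // Substring search over the URL field only, mirroring
  // .filter(line => line.contains("someUrl")) from the question.
  def searchUrls(urls: Seq[String], needle: String): Seq[String] =
    urls.filter(url => url.contains(needle))

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "1\thttp://example.com/a\tfoo",
      "2\thttp://example.org/someUrl\tbar",
      "3\thttp://example.net/b\tsomeUrl in another field"
    )
    val urls = urlColumn(lines)
    // Only line 2 matches; line 3 would match a whole-line contains check.
    println(searchUrls(urls, "someUrl"))
  }
}
```

This is still a linear substring scan over every record; it only reduces how much text each comparison has to touch.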


Any suggestions?

Thanks in advance.

Regards,
SB

Re: How to improve performance in searching for URLs.

Mayur Rustagi
Can you look at the task list on the Spark dashboard and describe the number of map and reduce tasks, and the time taken by each?

(In reply to suman bharadwaj's message of Mon, Feb 3, 2014 at 12:39 AM.)
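For context on Mayur's question about the task list: in Spark each partition of an RDD is processed by one task, and a plain filter followed by an action does not shuffle, so for this query the dashboard should show a single set of tasks, one per partition. If there are fewer partitions than cores, part of the machine sits idle. The sketch below models that one-task-per-partition behavior with plain Scala Futures so it runs without Spark; PartitionedFilterSketch, partition and parallelFilter are hypothetical names. In Spark itself the partition count can be inspected with rdd.partitions.length and changed with rdd.repartition(n).

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Sketch only: models Spark's "one task per partition" execution with Futures.
// Helper names and partition counts are hypothetical.
object PartitionedFilterSketch {
  // Split the data into roughly equal partitions (like an RDD's partitions).
  def partition[A](data: Vector[A], numPartitions: Int): Vector[Vector[A]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val size = math.max(1, math.ceil(data.length.toDouble / numPartitions).toInt)
    data.grouped(size).toVector
  }

  // Run the filter on each partition as an independent "task", then
  // concatenate the results in partition order (like collect()).
  def parallelFilter[A](data: Vector[A], numPartitions: Int)(p: A => Boolean): Vector[A] = {
    val tasks = partition(data, numPartitions).map(part => Future(part.filter(p)))
    Await.result(Future.sequence(tasks), Duration.Inf).flatten
  }

  def main(args: Array[String]): Unit = {
    val urls = Vector.tabulate(10) { i =>
      if (i % 3 == 0) s"http://host/someUrl/$i" else s"http://host/other/$i"
    }
    // With 4 partitions there are 4 tasks that can run concurrently.
    println(s"tasks: ${partition(urls, 4).length}")
    println(parallelFilter(urls, 4)(_.contains("someUrl")))
  }
}
```

Raising the partition count toward the number of available cores is the usual first knob; beyond that point, extra partitions mostly add scheduling overhead.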