Spark reading from Cassandra

Amit Sharma-2
Hi, I have a question about reading from Cassandra: from a performance perspective, should we use only the partition key in the WHERE clause, or does it not matter from Spark's perspective because it always allows filtering?


Thanks
Amit

Re: Spark reading from Cassandra

Russell Spitzer
Yes. The "ALLOW FILTERING" part isn't actually important, other than for letting the query run in the first place. A WHERE clause that uses a clustering column restriction will perform much better than a full scan, and column pruning can also be extremely beneficial.
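To make the difference concrete, here is a toy in-memory model in Python. It is not the Spark Cassandra Connector API; every function and column name below is hypothetical. It assumes each partition is stored sorted by a clustering column `ts` and counts how many cells leave the "database" when Spark filters after a full scan, versus when the clustering range and the column list are pushed down.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy model, NOT the Spark Cassandra Connector API.
# Table layout: partition key -> rows kept sorted by clustering column "ts".
def make_table(n_partitions, rows_per_partition):
    return {
        pk: [{"pk": pk, "ts": ts, "value": ts * 10, "payload": "x" * 8}
             for ts in range(rows_per_partition)]
        for pk in range(n_partitions)
    }

def full_scan_then_filter(table, ts_lo, ts_hi, columns):
    """Every cell is shipped to Spark, which filters and projects afterwards."""
    shipped, result = 0, []
    for rows in table.values():
        for row in rows:
            shipped += len(row)  # all columns of every row leave the database
            if ts_lo <= row["ts"] <= ts_hi:
                result.append({c: row[c] for c in columns})
    return result, shipped

def pushed_down(table, ts_lo, ts_hi, columns):
    """Clustering range and column list are applied before anything is shipped."""
    shipped, result = 0, []
    for rows in table.values():
        # Stand-in for the storage engine seeking within a sorted partition.
        keys = [row["ts"] for row in rows]
        start, stop = bisect_left(keys, ts_lo), bisect_right(keys, ts_hi)
        for row in rows[start:stop]:
            projected = {c: row[c] for c in columns}  # column pruning
            shipped += len(projected)
            result.append(projected)
    return result, shipped

table = make_table(n_partitions=100, rows_per_partition=1000)
slow, slow_cells = full_scan_then_filter(table, 0, 10, ["pk", "value"])
fast, fast_cells = pushed_down(table, 0, 10, ["pk", "value"])
print(slow == fast, slow_cells, fast_cells)  # prints: True 400000 2200
```

Both versions return the same rows; only the amount of data moved differs. Real costs also depend on split sizes, network, and serialization, which this toy ignores.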

On Wed, Nov 4, 2020 at 11:12 AM Amit Sharma <[hidden email]> wrote:
Hi, I have a question about reading from Cassandra: from a performance perspective, should we use only the partition key in the WHERE clause, or does it not matter from Spark's perspective because it always allows filtering?


Thanks
Amit

Re: Spark reading from Cassandra

Russell Spitzer
A WHERE clause with a PK restriction should be identified by the Connector and turned into a single request. This will likely still be much slower than issuing the request directly, but still much, much faster than a full scan.
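The same toy approach can contrast a partition-key restriction with a full scan. This is again a hypothetical in-memory model, not the connector's API: `scan_all_partitions` stands in for reading every token range and filtering afterwards, and `single_partition_request` stands in for the one request the connector can issue when it recognizes the restriction.

```python
# Hypothetical toy model, NOT the Spark Cassandra Connector API.
def make_table(n_partitions, rows_per_partition):
    return {
        pk: [{"pk": pk, "ts": ts} for ts in range(rows_per_partition)]
        for pk in range(n_partitions)
    }

def scan_all_partitions(table, pk):
    """Full scan: every partition is read, then rows are filtered by pk."""
    partitions_read, rows = 0, []
    for key, part in table.items():
        partitions_read += 1
        if key == pk:
            rows.extend(part)
    return rows, partitions_read

def single_partition_request(table, pk):
    """Partition-key restriction pushed down: one direct lookup."""
    return list(table.get(pk, [])), 1

table = make_table(n_partitions=500, rows_per_partition=20)
scanned, scanned_parts = scan_all_partitions(table, pk=42)
direct, direct_parts = single_partition_request(table, pk=42)
print(scanned == direct, scanned_parts, direct_parts)  # prints: True 500 1
```

The toy does not capture the fixed overhead of going through Spark (job scheduling, task launch, serialization), which is likely part of why such a request can still be much slower than querying Cassandra directly.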

On Wed, Nov 4, 2020 at 12:51 PM Russell Spitzer <[hidden email]> wrote:
Yes. The "ALLOW FILTERING" part isn't actually important, other than for letting the query run in the first place. A WHERE clause that uses a clustering column restriction will perform much better than a full scan, and column pruning can also be extremely beneficial.