How to query on Cassandra and load results in Spark dataframe


Soheil Pourbafrani
Hi,

Using the command

val table = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "A", "keyspace" -> "B"))
  .load()

one can load the entire table into a DataFrame. Instead, I want to run a query against Cassandra and load only its result into the DataFrame, not the whole table.

Is that possible in Spark?
Re: How to query on Cassandra and load results in Spark dataframe

Riccardo Ferrari
Hi Soheil,

You should be able to apply filter transformations. Spark is lazily evaluated, and the actual load from Cassandra happens only when an action triggers it. Find more here: https://spark.apache.org/docs/2.3.2/rdd-programming-guide.html#rdd-operations
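A minimal sketch of what that could look like, assuming the spark-cassandra-connector is on the classpath and using the table/keyspace names from your snippet; the column names `user_id` and `event_time` are hypothetical examples:

```scala
// Load the Cassandra table lazily, then narrow it before any action runs.
val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "A", "keyspace" -> "B"))
  .load()

// select/filter are lazy transformations; nothing is read from Cassandra yet.
// Column names here are hypothetical placeholders.
val result = df
  .select("user_id", "event_time")
  .filter(df("user_id") === 42)

// Only when an action runs does Spark scan Cassandra, and the connector
// can push the predicate down so only matching rows are fetched.
result.show()
```

The point is that you never materialize the whole table: the query is assembled lazily and the connector is given the chance to push the filtering into Cassandra itself.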


You can always check that everything goes as expected by asking Spark to `explain` your DataFrame: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset@explain():Unit
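For example (continuing the hypothetical `df` and filter from the snippet above), a quick way to confirm the pushdown is:

```scala
// Print the physical plan before triggering any action.
val result = df.filter(df("user_id") === 42)
result.explain()

// In the printed plan, the scan node over org.apache.spark.sql.cassandra
// should list the pushed predicates (e.g. under "PushedFilters"),
// confirming the filtering happens in Cassandra rather than in Spark.
```

If the predicate does not appear as pushed, Spark will still apply it, but only after reading the rows, which is exactly what you want to avoid for large tables.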

HTH,

On Wed, Jan 23, 2019 at 8:44 AM Soheil Pourbafrani <[hidden email]> wrote: