sparkSession.sql("sql query") vs df.sqlContext().sql(this.query) ?


kant kodali
Hi All,

I have the following two code snippets, and I wonder what the difference between them is and which one I should use. I am using Spark 2.2.

Dataset<Row> df = sparkSession.readStream()
    .format("kafka")
    .load();

df.createOrReplaceTempView("table");
df.printSchema();

Dataset<Row> resultSet = df.sqlContext().sql("select value from table"); //sparkSession.sql(this.query);
StreamingQuery streamingQuery = resultSet
    .writeStream()
    .trigger(Trigger.ProcessingTime(1000))
    .format("console")
    .start();

vs

Dataset<Row> df = sparkSession.readStream()
    .format("kafka")
    .load();

df.createOrReplaceTempView("table");

Dataset<Row> resultSet = sparkSession.sql("select value from table"); //sparkSession.sql(this.query);
StreamingQuery streamingQuery = resultSet
    .writeStream()
    .trigger(Trigger.ProcessingTime(1000))
    .format("console")
    .start();

Thanks!
Re: sparkSession.sql("sql query") vs df.sqlContext().sql(this.query) ?

khathiravan raj maadhaven
Hi Kant,

Based on my understanding, I think the only difference is the overhead of selecting or creating the SQLContext for the query you have passed. As the table/view is already available for use, sparkSession.sql("your query") should be simple and good enough.

The following uses the session/context that is created and available by default:
sparkSession.sql("select value from table")
while the following would look for or create a context and then run the query (which I believe is extra overhead):
df.sqlContext().sql("select value from table")
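
To make the "extra overhead" question concrete, here is a small plain-Java model of how the two call paths relate. It is only a sketch under an assumption: that in Spark 2.x the SQLContext is a thin wrapper created once per SparkSession, whose sql(...) simply forwards to the session. The classes ModelSession, ModelSqlContext and ModelDataset are hypothetical stand-ins invented for this illustration; they are not Spark classes and the code does not need Spark on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SparkSession (NOT the real Spark class).
final class ModelSession {
    private final List<String> executed = new ArrayList<>();
    // Assumption: the context is created once with the session and then reused.
    private final ModelSqlContext sqlContext = new ModelSqlContext(this);

    ModelSqlContext sqlContext() { return sqlContext; }

    // Records the query and returns a placeholder "plan" string.
    String sql(String query) {
        executed.add(query);
        return "plan(" + query + ")";
    }

    List<String> executedQueries() { return executed; }
}

// Hypothetical stand-in for SQLContext: a wrapper that forwards to its session.
final class ModelSqlContext {
    private final ModelSession session;

    ModelSqlContext(ModelSession session) { this.session = session; }

    ModelSession sparkSession() { return session; }

    // No work of its own: the call is delegated to the owning session.
    String sql(String query) { return session.sql(query); }
}

// Hypothetical stand-in for Dataset<Row>: it remembers the session that built it.
final class ModelDataset {
    private final ModelSession session;

    ModelDataset(ModelSession session) { this.session = session; }

    // Returns the session's existing context instead of building a new one.
    ModelSqlContext sqlContext() { return session.sqlContext(); }
}

class SessionModelDemo {
    public static void main(String[] args) {
        ModelSession session = new ModelSession();
        ModelDataset df = new ModelDataset(session);

        ModelSqlContext viaDataset = df.sqlContext();
        System.out.println("same context instance: " + (viaDataset == session.sqlContext()));
        System.out.println("same underlying session: " + (viaDataset.sparkSession() == session));

        String direct = session.sql("select value from table");
        String wrapped = viaDataset.sql("select value from table");
        System.out.println("same result: " + direct.equals(wrapped));
    }
}
```

If that assumption holds for the real classes, both snippets end up in the same session and see the same temp view, so any difference would be one extra method call rather than a new context. Comparing df.sqlContext().sparkSession() with sparkSession in the actual job would be a quick way to confirm this on a given Spark version.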

Regards
Raj


On Wed, Dec 6, 2017 at 6:07 PM, kant kodali <[hidden email]> wrote: