How does spark sql evaluate case statements?

How does spark sql evaluate case statements?

kant kodali
Hi All,

I have the following query, and I was wondering whether Spark SQL evaluates the same condition twice in the CASE statements below. I ran .explain(true) and all I get is a table scan, so I can't tell whether the condition is evaluated twice. If it is, is there a way to return multiple values from a single CASE, where each value goes into a separate column?

SELECT
  CASE WHEN <condition 1> THEN <a1> WHEN <condition 2> THEN <a2> ELSE <a3> END AS result1,
  CASE WHEN <condition 1> THEN <b1> WHEN <condition 2> THEN <b2> ELSE <b3> END AS result2
FROM 
  <table> 
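
To make the question concrete, here is a plain-Python model of two CASE expressions that share condition 1. This is purely illustrative (the row data and `cond1` predicate are made up, and it says nothing about what Spark actually does): if each CASE expression evaluates its own WHEN list independently, the shared condition runs once per expression per row.

```python
calls = {"cond1": 0}

def cond1(row):
    # Hypothetical shared condition; the counter tracks how often it runs.
    calls["cond1"] += 1
    return row["x"] > 10

rows = [{"x": 5}, {"x": 20}]
results = []
for row in rows:
    result1 = "a1" if cond1(row) else "a3"   # first CASE expression
    result2 = "b1" if cond1(row) else "b3"   # second CASE expression
    results.append((result1, result2))

# Under this naive model, cond1 is evaluated twice per row (4 times total).
```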

Thanks,

Re: How does spark sql evaluate case statements?

Yeikel
I do not know the answer to this question either, so I am also looking for it, but @kant, maybe the generated code can help with this.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: How does spark sql evaluate case statements?

ZHANG Wei
Are you looking for this: https://spark.apache.org/docs/2.4.0/api/sql/#when ?

The generated code will look like this, wrapped in a `do { ... } while (false)` loop:

  do {
    ${cond.code}
    if (!${cond.isNull} && ${cond.value}) {
      ${res.code}
      $resultState = (byte)(${res.isNull} ? $HAS_NULL : $HAS_NONNULL);
      ${ev.value} = ${res.value};
      continue;
    }

    <and the following CASE conditions ...>
  } while (false)
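
In plain Python, the control flow of that template can be sketched roughly as follows (a sketch of the semantics, not Spark code): each (condition, result) branch is tried in order, a condition counts as satisfied only when it is non-null and true, and the first match short-circuits the rest, just as the generated `continue` jumps past the remaining branches.

```python
def case_when(branches, default):
    # branches: list of (condition_fn, result_fn) pairs, tried top to bottom.
    for cond, result in branches:
        v = cond()                   # ${cond.code}
        if v is not None and v:      # !${cond.isNull} && ${cond.value}
            return result()          # ${res.code}; continue skips the rest
    return default()                 # ELSE branch

# First matching branch wins; later branches are never evaluated.
value = case_when(
    [(lambda: None, lambda: "a"),    # NULL condition: treated as not matched
     (lambda: True, lambda: "b"),    # first true condition: wins
     (lambda: True, lambda: "c")],   # never reached
    lambda: "d",
)
```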

Refer to: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L208

Here is a full generated code sample for `spark.sql("select CASE WHEN age IS NULL THEN 'unknown' WHEN age < 30 THEN 'young' WHEN age < 40 THEN 'middle' ELSE 'senior' END, name  from people").show()` :

/* 034 */       byte project_caseWhenResultState_0 = -1;
/* 035 */       do {
/* 036 */         if (!false && scan_isNull_0) {
/* 037 */           project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
/* 038 */           project_mutableStateArray_0[0] = ((UTF8String) references[1] /* literal */);
/* 039 */           continue;
/* 040 */         }
/* 041 */
/* 042 */         boolean project_isNull_4 = true;
/* 043 */         boolean project_value_4 = false;
/* 044 */
/* 045 */         if (!scan_isNull_0) {
/* 046 */           project_isNull_4 = false; // resultCode could change nullability.
/* 047 */           project_value_4 = scan_value_0 < 30L;
/* 048 */
/* 049 */         }
/* 050 */         if (!project_isNull_4 && project_value_4) {
/* 051 */           project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
/* 052 */           project_mutableStateArray_0[0] = ((UTF8String) references[2] /* literal */);
/* 053 */           continue;
/* 054 */         }
/* 055 */
/* 056 */         boolean project_isNull_8 = true;
/* 057 */         boolean project_value_8 = false;
/* 058 */
/* 059 */         if (!scan_isNull_0) {
/* 060 */           project_isNull_8 = false; // resultCode could change nullability.
/* 061 */           project_value_8 = scan_value_0 < 40L;
/* 062 */
/* 063 */         }
/* 064 */         if (!project_isNull_8 && project_value_8) {
/* 065 */           project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
/* 066 */           project_mutableStateArray_0[0] = ((UTF8String) references[3] /* literal */);
/* 067 */           continue;
/* 068 */         }
/* 069 */
/* 070 */         project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
/* 071 */         project_mutableStateArray_0[0] = ((UTF8String) references[4] /* literal */);
/* 072 */
/* 073 */       } while (false);
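
Stripped of the codegen scaffolding, the branch logic in the generated code above is equivalent to this Python sketch of the example query's CASE expression:

```python
def age_bucket(age):
    # Mirrors the generated branches top to bottom: the NULL check first,
    # then age < 30, then age < 40, with 'senior' as the ELSE fallback.
    if age is None:
        return "unknown"
    if age < 30:
        return "young"
    if age < 40:
        return "middle"
    return "senior"
```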


Cheers,
-z
________________________________________
From: Yeikel <[hidden email]>
Sent: Wednesday, April 15, 2020 2:22
To: [hidden email]
Subject: Re: How does spark sql evaluate case statements?



Re: How does spark sql evaluate case statements?

kant kodali
Thanks!

On Thu, Apr 16, 2020 at 9:57 PM ZHANG Wei <[hidden email]> wrote: