3 equalTo "3.15" = true


Artur Sukhenko
Hello guys,
I am migrating from Spark 1.6 to 2.2 and have this issue:
I am casting a string column to short and comparing the two with equalTo.
Original code is:
... when(col(fieldName).equalTo(castedValueCol), castedValueCol).
  otherwise(defaultErrorValueCol)
Reproduce (version 2.3.0.cloudera4):
scala> val df = Seq("3.15").toDF("tier_id")
df: org.apache.spark.sql.DataFrame = [tier_id: string]

scala> val colShort = col("tier_id").cast(ShortType)
colShort: org.apache.spark.sql.Column = CAST(tier_id AS SMALLINT)

scala> val colString = col("tier_id")
colString: org.apache.spark.sql.Column = tier_id

scala> df.select(colString, colShort, colShort.equalTo(colString)).show
+-------+-------+-------------------------------------+
|tier_id|tier_id|(CAST(tier_id AS SMALLINT) = tier_id)|
+-------+-------+-------------------------------------+
|   3.15|      3|                                 true|
+-------+-------+-------------------------------------+
scala>

Why is this?
--
--
Artur Sukhenko

Re: 3 equalTo "3.15" = true

Russell Spitzer
Run an "explain" instead of show; I'm betting it's casting tier_id to a smallint to do the comparison.
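
One way to test this guess is to write the suspected cast out by hand and compare it with the original expression. The following is a minimal sketch, not code from the thread: the object and column names (ImplicitCastCheck, implicit_cmp, explicit_cmp), the extra sample values, and the local[1] master are illustrative assumptions, and it assumes the non-ANSI cast behaviour of the Spark 2.x versions discussed here. If both comparison columns agree on every row, that is consistent with the string side being cast to SMALLINT before the comparison.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.ShortType

object ImplicitCastCheck {
  // Builds the comparison twice: once as in the original post, and once with
  // the string side cast to SMALLINT by hand (the cast suspected above).
  def compare(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("3.15", "3", "4.9").toDF("tier_id") // sample values are assumptions
    val colString = col("tier_id")
    val colShort = colString.cast(ShortType)
    df.select(
      colString,
      colShort.equalTo(colString).as("implicit_cmp"),
      colShort.equalTo(colString.cast(ShortType)).as("explicit_cmp"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("implicit-cast-check")
      .getOrCreate()
    try compare(spark).show() finally spark.stop()
  }
}
```

If Spark instead widened both sides to a fractional type, implicit_cmp would differ from explicit_cmp for "3.15" and "4.9".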

On Wed, Feb 6, 2019 at 9:31 AM Artur Sukhenko <[hidden email]> wrote:
Hello guys,
I am migrating from Spark 1.6 to 2.2 and have this issue:
I am casting a string column to short and comparing the two with equalTo.

Re: 3 equalTo "3.15" = true

Artur Sukhenko
scala> df.select(colString, colShort, colShort.equalTo(colString)).explain
== Physical Plan ==
LocalTableScan [tier_id#3, tier_id#56, (CAST(tier_id AS SMALLINT) = tier_id)#50]
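
The physical plan above shows only a LocalTableScan, presumably because the data comes from a local Seq and the whole projection has already been folded into it, so any cast added for the comparison is not visible there. The analyzed logical plan is a better place to look, since it is produced after type coercion and before optimization. A minimal sketch, assuming a local session; the object and method names are illustrative and not from the thread:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.ShortType

object AnalyzedPlanSketch {
  // Returns the analyzed logical plan as text. This is the plan after the
  // analyzer has applied type coercion and before the optimizer rewrites it.
  def analyzedPlan(spark: SparkSession): String = {
    import spark.implicits._
    val df = Seq("3.15").toDF("tier_id")
    val colString = col("tier_id")
    val colShort = colString.cast(ShortType)
    df.select(colString, colShort, colShort.equalTo(colString))
      .queryExecution
      .analyzed
      .toString
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("analyzed-plan-sketch")
      .getOrCreate()
    try println(analyzedPlan(spark)) finally spark.stop()
  }
}
```

In spark-shell the same information is available with df.select(...).explain(true), which prints the parsed, analyzed, optimized, and physical plans together.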


On Wed, Feb 6, 2019 at 6:19 PM Russell Spitzer <[hidden email]> wrote:
Run an "explain" instead of show; I'm betting it's casting tier_id to a smallint to do the comparison.


Re: 3 equalTo "3.15" = true

Artur Sukhenko
It is probably wrong to compare a StringType column with a ShortType column directly.
I'll use something like this:
df.select(colString, colShort, colShort.equalTo(colString.cast(DecimalType(38,15)))).show
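
Plugged back into the when/otherwise pattern from the first post, the workaround might look like the sketch below. It is an illustration under assumptions rather than the code actually used: StrictShortCheck, validated, tier_id_checked, the sample values, and the -1 default are made up for the example, and it assumes non-ANSI casts, where an unparseable string becomes null instead of raising an error.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.{DecimalType, ShortType}

object StrictShortCheck {
  // Keeps the SMALLINT value only when the string, read as a decimal, is
  // exactly that integer; otherwise yields the supplied default column.
  def validated(fieldName: String, defaultErrorValueCol: Column): Column = {
    val asShort = col(fieldName).cast(ShortType)
    val asDecimal = col(fieldName).cast(DecimalType(38, 15))
    when(asShort.equalTo(asDecimal), asShort).otherwise(defaultErrorValueCol)
  }

  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("3.15", "3", "abc").toDF("tier_id") // sample values are assumptions
    val default = lit(-1).cast(ShortType)             // placeholder error value
    df.select(col("tier_id"), validated("tier_id", default).as("tier_id_checked"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("strict-short-check")
      .getOrCreate()
    try run(spark).show() finally spark.stop()
  }
}
```

Under these assumptions "3" keeps its value, while "3.15" and "abc" fall through to the default: the first because 3 differs from 3.15 as decimals, the second because both casts yield null and a null condition selects the otherwise branch.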

On Wed, Feb 6, 2019 at 6:32 PM Artur Sukhenko <[hidden email]> wrote:
scala> df.select(colString, colShort, colShort.equalTo(colString)).explain
== Physical Plan ==
LocalTableScan [tier_id#3, tier_id#56, (CAST(tier_id AS SMALLINT) = tier_id)#50]



Re: 3 equalTo "3.15" = true

ddebarbieux
In reply to this post by Artur Sukhenko
I am confused, since the two columns have the same name.
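
In the output quoted in this thread, both the string column and its SMALLINT cast are shown under the header tier_id. Giving each selected column its own alias removes the ambiguity. A minimal sketch, assuming a local session; the alias names (tier_id_string, tier_id_short, is_equal) and the object name are illustrative and not from the thread:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.ShortType

object AliasedColumns {
  // Same select as in the thread, with a distinct alias on each column so
  // the string value, the cast value, and the comparison are easy to tell apart.
  def labelled(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("3.15").toDF("tier_id")
    val colString = col("tier_id")
    val colShort = colString.cast(ShortType)
    df.select(
      colString.as("tier_id_string"),
      colShort.as("tier_id_short"),
      colShort.equalTo(colString).as("is_equal"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("aliased-columns")
      .getOrCreate()
    try labelled(spark).show() finally spark.stop()
  }
}
```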


________________________________________
From: Artur Sukhenko [[hidden email]]
Sent: Wednesday, February 6, 2019 17:32
To: Russell Spitzer
Cc: [hidden email]
Subject: Re: 3 equalTo "3.15" = true

scala> df.select(colString, colShort, colShort.equalTo(colString)).explain
== Physical Plan ==
LocalTableScan [tier_id#3, tier_id#56, (CAST(tier_id AS SMALLINT) = tier_id)#50]

