Mismatch in data type comparison returns full data in Spark

I am using the where method of a DataFrame to filter data.
I am comparing an Integer field with a String value, and this comparison returns the full table.
I have tested the same scenario in Hive and MySQL, and there the comparison returns no rows.

Scenario:

 val sqlDf = df.where("f1 = 'abc'")
 Here f1 is of type Integer.

Logical and physical plans:
== Parsed Logical Plan ==
'Filter ('f1 = abc)
+- Relation[f1#0] csv

== Analyzed Logical Plan ==
f1: int
Filter (cast(f1#0 as double) = cast(abc as double))
+- Relation[f1#0] csv

== Optimized Logical Plan ==
Filter (isnotnull(f1#0) && null)
+- Relation[f1#0] csv

== Physical Plan ==
*Project [f1#0]
+- *Filter isnotnull(f1#0)
   +- *Scan csv [f1#0] Format: CSV, InputPaths: file:/C:/Users/santlalg/IdeaProjects/SparkTestPoc/Int, PartitionFilters: [null], PushedFilters: [IsNotNull(f1)], ReadSchema: struct<f1:int>

In the Optimized Logical Plan, why is cast(f1#0 as double) = cast(abc as double) from the Analyzed Logical Plan replaced with null?
I am using the following dependency versions:
Spark-core: 2.0.2
Spark-sql: 2.0.2

In my scenario the predicate should evaluate to false, so the DataFrame should return no rows.
Can someone help me achieve this?
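
To make the expected behavior concrete, here is a minimal plain-Scala sketch (no Spark involved) of the semantics I am assuming: a string that cannot be parsed as a number casts to null, a comparison with null yields null, and a filter keeps a row only when its predicate is true. The names castToDouble, sqlEquals, keepRow and filterRows are hypothetical helpers made up for this illustration; they are not Spark APIs, and this is not a claim about how Spark implements the filter.

```scala
// Plain-Scala model of SQL three-valued logic; no Spark dependency.
// All names below are hypothetical and exist only for this illustration.
import scala.util.Try

object NullComparisonSketch {
  // A string that cannot be parsed as a number becomes NULL (None).
  def castToDouble(s: String): Option[Double] = Try(s.trim.toDouble).toOption

  // SQL-style equality: if either side is NULL, the result is NULL (None).
  def sqlEquals(a: Option[Double], b: Option[Double]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // A filter keeps a row only when the predicate is exactly TRUE.
  def keepRow(predicate: Option[Boolean]): Boolean = predicate.contains(true)

  // Models df.where("f1 = '<literal>'") over an Integer column f1.
  def filterRows(rows: Seq[Int], literal: String): Seq[Int] =
    rows.filter(f1 => keepRow(sqlEquals(Some(f1.toDouble), castToDouble(literal))))

  def main(args: Array[String]): Unit = {
    println(filterRows(Seq(1, 2, 3), "abc")) // no row should be kept
    println(filterRows(Seq(1, 2, 3), "2"))   // only the row with f1 = 2 should be kept
  }
}
```

Under these assumptions the comparison with 'abc' keeps no rows, which is the result I get from Hive and MySQL and the result I would like from Spark.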