Mismatch in data type comparison returns full data in Spark



I am using the where method of a DataFrame to filter data.
I am comparing an Integer field with a String value, and this comparison returns the full table data.
I have tested the same scenario with Hive and MySQL, and there the comparison does not return any rows.

Scenario:

 val sqlDf = df.where("f1 = 'abc'")
 Here f1 is of type Integer.

Logical and Physical Plan:

== Parsed Logical Plan ==
'Filter ('f1 = abc)
+- Relation[f1#0] csv

== Analyzed Logical Plan ==
f1: int
Filter (cast(f1#0 as double) = cast(abc as double))
+- Relation[f1#0] csv

== Optimized Logical Plan ==
Filter (isnotnull(f1#0) && null)
+- Relation[f1#0] csv

== Physical Plan ==
*Project [f1#0]
+- *Filter isnotnull(f1#0)
   +- *Scan csv [f1#0] Format: CSV, InputPaths: file:/C:/Users/santlalg/IdeaProjects/SparkTestPoc/Int, PartitionFilters: [null], PushedFilters: [IsNotNull(f1)], ReadSchema: struct<f1:int>

In the Optimized Logical Plan, why is cast(f1#0 as double) = cast(abc as double) from the Analyzed Logical Plan replaced with null?
I am using the following dependency versions:
Spark-core: 2.0.2
Spark-sql: 2.0.2
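
For readers following the plans above, here is a rough model in plain Scala (no Spark involved) of why the comparison may end up as null. It assumes that casting a non-numeric string to double yields NULL rather than an error, and that a comparison with NULL yields NULL, which is consistent with the Optimized Logical Plan shown in the post. The names NullComparisonSketch, castToDouble, sqlEquals and where are hypothetical and exist only for this illustration; they are not Spark APIs.

```scala
import scala.util.Try

object NullComparisonSketch {
  // Assumption: models a lenient SQL cast, where unparsable text becomes NULL (None).
  def castToDouble(s: String): Option[Double] = Try(s.trim.toDouble).toOption

  // SQL-style equality under three-valued logic: NULL on either side gives NULL.
  def sqlEquals(a: Option[Double], b: Option[Double]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // A WHERE clause keeps a row only when the predicate is exactly TRUE,
  // so a NULL predicate should drop the row just as FALSE would.
  def where(rows: Seq[Int], literal: String): Seq[Int] =
    rows.filter(f1 => sqlEquals(Some(f1.toDouble), castToDouble(literal)).contains(true))
}
```

Under this model a NULL predicate filters out every row, which is the Hive and MySQL outcome described above. The Physical Plan in the post no longer contains the null condition in its Filter step, which would explain why every non-null row comes back instead.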

In my scenario the predicate should evaluate to false, so the DataFrame should not return any rows.
Can someone help me achieve this?
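
One possible way to get an empty result is to compare as strings, so that no numeric cast of 'abc' takes place. The sketch below is untested against 2.0.2 specifically. It assumes a local SparkSession and uses a small in-memory DataFrame as a stand-in for the CSV source in the post. The object name StringCompareSketch and the helper noMatch are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StringCompareSketch {
  // Cast the integer column to string so both sides of '=' are strings
  // and the predicate still references f1 after optimization.
  def noMatch(df: DataFrame): DataFrame =
    df.where("cast(f1 as string) = 'abc'")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("StringCompareSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the CSV-backed DataFrame with f1: Integer
    val df = Seq(1, 2, 3).toDF("f1")

    val sqlDf = noMatch(df)
    sqlDf.explain(true)      // inspect the plans to confirm the filter is kept
    println(sqlDf.count())   // number of matching rows

    spark.stop()
  }
}
```

It is worth checking the output of explain(true) against the real CSV source, since the in-memory stand-in may not be planned the same way as a file scan.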