Filtering on multiple columns in spark


Mich Talebzadeh

Hi,

 

I am trying to filter a DataFrame on multiple conditions combined with OR ("||"), as below:

 

  val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).

                   filter(length(col("target_mobile_no")) !== 10 || substring(col("target_mobile_no"),1,1) !== "7")

 

This throws the following error:

 

res12: org.apache.spark.sql.DataFrame = []

<console>:49: error: value || is not a member of Int

                          filter(length(col("target_mobile_no")) !== 10 || substring(col("target_mobile_no"),1,1) !== "7")

 

Trying another way:

 

val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).

                   filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")

  rejectedDF.createOrReplaceTempView("tmp")

 

I have tried a few options but am still getting this error:

 

<console>:49: error: value !=== is not a member of org.apache.spark.sql.Column

                          filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")

                                                                 ^

<console>:49: error: value || is not a member of Int

                          filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")

              

I could create a separate DataFrame for each filter, but that does not look efficient to me.
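A likely explanation for the error: in Scala, an operator that ends in "=" but does not start with "=" (and is not one of <=, >=, !=) is parsed as an assignment-style operator with lower precedence than ||, so the compiler groups 10 || ... first, hence "value || is not a member of Int". A minimal sketch of the Column-based form that should compile on Spark 2.x, where =!= (the replacement for the deprecated !==) is the Column not-equal operator and || combines two Column predicates; newDF and the column name are taken from the snippet above:

import org.apache.spark.sql.functions.{col, length, substring}
import org.apache.spark.sql.types.StringType

val rejectedDF = newDF
  .withColumn("target_mobile_no", col("target_mobile_no").cast(StringType))  // force a String column
  .filter((length(col("target_mobile_no")) =!= 10) ||                        // not exactly 10 characters
          (substring(col("target_mobile_no"), 1, 1) =!= "7"))                // or not starting with "7"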

 

Thanks

 


Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 


Re: Filtering on multiple columns in spark

Som Lima
From your email, the obvious issue seems to be that 10 is an Int because it is not surrounded in quotes; 10 should be "10".

Although I can't imagine a telephone number of just 10, that is what you are trying to program.


In Scala, you can check whether two operands are equal ( == ) or not ( != ); the check returns true if the condition is met and false otherwise. By itself, ! is the logical NOT operator.
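For what it's worth, a minimal sketch of the distinction, with an illustrative column name: plain Scala comparisons return a Boolean immediately, while comparisons on a Spark Column build another Column that is evaluated per row.

val n = 10
n == 10                              // Boolean: true
n != 10                              // Boolean: false
!(n == 10)                           // Boolean: false; ! negates a Boolean

import org.apache.spark.sql.functions.col
col("target_mobile_no") =!= "7"      // Column: a per-row not-equal expression, not a Boolean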


Re: Filtering on multiple columns in spark

ZHANG Wei
In reply to this post by Mich Talebzadeh
AFAICT, maybe Spark SQL built-in functions[1] can help as below:

scala> df.show()
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+


scala> df.filter("length(name) == 4 or substring(name, 1, 1) == 'J'").show()
+---+------+
|age|  name|
+---+------+
| 30|  Andy|
| 19|Justin|
+---+------+


--
Cheers,
-z
[1] https://spark.apache.org/docs/latest/api/sql/index.html


Re: Filtering on multiple columns in spark

Mich Talebzadeh
Hi Zhang,

Yes, the SQL way worked fine:

  val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).

                   filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'")


Many thanks,

Dr Mich Talebzadeh

 


Re: Filtering on multiple columns in spark

Mich Talebzadeh
OK, how do you pass variables for the 10 and '7'

 val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).

                   filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'")


in the above, in Scala? Neither the ${...} value below nor lit() is working!

   val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).

                     filter("length(target_mobile_no) != ${broadcastStagingConfig.mobileNoLength} OR substring(target_mobile_no,1,1) != ${broadcastStagingConfig.ukMobileNoStart}")


Thanks





Dr Mich Talebzadeh

 


Re: Filtering on multiple columns in spark

Mich Talebzadeh

The line below works:

 

val c = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).

         filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'")

 

 

But the following does not work when the values are passed as parameters:

 

val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).

  filter("length(target_mobile_no) != broadcastStagingConfig.mobileNoLength OR substring(target_mobile_no,1,1) != broadcastStagingConfig.ukMobileNoStart")

 

I think it cannot interpret them
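That is consistent with the string being a plain (non-interpolated) literal: the whole text goes to Spark's SQL parser, which most likely reads broadcastStagingConfig.mobileNoLength as a column reference rather than a Scala value. An alternative sketch using the Column API, where the Scala values are compared directly and Spark turns them into literals; the two vals below are assumed stand-ins for the broadcastStagingConfig fields:

import org.apache.spark.sql.functions.{col, length, substring}
import org.apache.spark.sql.types.StringType

val mobileNoLength = 10        // stand-in for broadcastStagingConfig.mobileNoLength
val ukMobileNoStart = "7"      // stand-in for broadcastStagingConfig.ukMobileNoStart

val rejectedDF = newDF
  .withColumn("target_mobile_no", col("target_mobile_no").cast(StringType))
  .filter((length(col("target_mobile_no")) =!= mobileNoLength) ||
          (substring(col("target_mobile_no"), 1, 1) =!= ukMobileNoStart))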


Dr Mich Talebzadeh

 


Re: Filtering on multiple columns in spark

Edgardo Szrajber
Maybe create a column with "lit" function for the variables you are comparing against.
Bentzi
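A sketch of that suggestion: lit() wraps each Scala value in a literal Column, which can then be compared against in the Column API. broadcastStagingConfig and its fields are the names from the thread; their types are assumed to be Int and String:

import org.apache.spark.sql.functions.{col, length, lit, substring}
import org.apache.spark.sql.types.StringType

val rejectedDF = newDF
  .withColumn("target_mobile_no", col("target_mobile_no").cast(StringType))
  .filter((length(col("target_mobile_no")) =!= lit(broadcastStagingConfig.mobileNoLength)) ||
          (substring(col("target_mobile_no"), 1, 1) =!= lit(broadcastStagingConfig.ukMobileNoStart)))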

