[Spark SQL] [Spark 2.4.0] v1 -> struct(v1.e) fails

François Sarradin
Hi,

I have this JSON document:

{ "b": [ { "e": 1 } ] }

When I run:

df.select(expr("transform( b, v1 -> struct(v1.e) )"))

I get this error:

cannot resolve 'named_struct(NamePlaceholder(), namedlambdavariable().e)' due to data type mismatch: Only foldable string expressions are allowed to appear at odd position, got: NamePlaceholder; line 1 pos 20; 'Project [unresolvedalias(transform(b#5, lambdafunction(named_struct(NamePlaceholder, lambda v1#7.e), lambda v1#7, false)), Some(<function1>))] +- LogicalRDD [b#5], false

org.apache.spark.sql.AnalysisException: cannot resolve 'named_struct(NamePlaceholder(), namedlambdavariable().`e`)' due to data type mismatch: Only foldable string expressions are allowed to appear at odd position, got: NamePlaceholder; line 1 pos 20;
'Project [unresolvedalias(transform(b#5, lambdafunction(named_struct(NamePlaceholder, lambda v1#7.e), lambda v1#7, false)), Some(<function1>))]
+- LogicalRDD [b#5], false

	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:115)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:107)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:277)
...

After some investigation, it seems the error occurs because the field name for v1.e is left as a NamePlaceholder instead of being resolved to a Literal. This is somewhat understandable, since v1 is not yet resolved at that point. But couldn't struct(v1.e) use v1.e as the field name?
regards,
françois-

Re: [Spark SQL] [Spark 2.4.0] v1 -> struct(v1.e) fails

kathleen li
How about this:

df.select(expr("transform( b, v1 -> struct(v1) )")).show()
+--------------------------------------------------------------------------------------------+
|transform(b, lambdafunction(named_struct(v1, namedlambdavariable()), namedlambdavariable()))|
+--------------------------------------------------------------------------------------------+
|                                                                                     [[[1]]]|
+--------------------------------------------------------------------------------------------+
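
Note that struct(v1) wraps the whole array element rather than extracting e, which is why the result above is nested one level deeper ([[[1]]]). If the goal is a struct that contains only the e field, one possible workaround is to give the field name explicitly with named_struct, so that the analyzer never needs a NamePlaceholder. The following is a minimal sketch under that assumption, not a confirmed fix for Spark 2.4.0; the local SparkSession settings and the column alias b2 are illustrative choices that do not come from the original messages.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for illustration only; master and appName are assumptions.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("struct-in-lambda")
  .getOrCreate()
import spark.implicits._

// Same JSON document as in the original post.
val df = spark.read.json(Seq("""{ "b": [ { "e": 1 } ] }""").toDS())

// Pass the field name as a string literal, so named_struct receives a
// foldable name at the odd position instead of a NamePlaceholder.
val result = df.select(
  expr("transform(b, v1 -> named_struct('e', v1.e))").as("b2")
)

result.printSchema()
result.show()
```

The trade-off is that the field name has to be repeated by hand for every field, whereas struct(...) would normally derive it from the expression.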

On Thu, Nov 15, 2018 at 6:47 AM François Sarradin <[hidden email]> wrote: