[SparkR] gapply with strings with arrow


Jacek Pliszka
Hi!

Is there any place where I can find information on how to use gapply with Arrow?

I've tried something very simple:

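collect(gapply(
  df,
  c("ColumnA"),                     # grouping column
  function(key, x) {
    # Return one string row per group; the option is spelled stringsAsFactors
    data.frame(out = c("dfs"), stringsAsFactors = FALSE)
  },
  "out String"                      # schema of the returned data frame
))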

But it fails, whereas similar code with integer or double columns works fine.

[Fetched stdout timeout] Error in readBin(con, raw(), as.integer(dataLen), endian = "big") : invalid 'n' argument

java.lang.UnsupportedOperationException
  at org.apache.spark.sql.vectorized.ArrowColumnVector$ArrowVectorAccessor.getUTF8String(ArrowColumnVector.java:233)
  at org.apache.spark.sql.vectorized.ArrowColumnVector.getUTF8String(ArrowColumnVector.java:109)
  at org.apache.spark.sql.vectorized.ColumnarBatchRow.getUTF8String(ColumnarBatch.java:220)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  ...

When I looked at the source code there, the methods all appear to be stubs.

Is there a proper way to use Arrow with gapply in SparkR?
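
Since the declared schema is "out String", the function passed to gapply has to return exactly one character column. Below is a minimal local check in plain R (no Spark needed) of what data.frame() returns in each case. It only illustrates that data.frame() recognizes the option under the spelling stringsAsFactors and treats any other spelling as an extra column; whether this is related to the failure reported here is not established.

```r
# A misspelled option name is not an error: data.frame() treats it as
# a second column named "stringAsFactors".
bad <- data.frame(out = c("dfs"), stringAsFactors = FALSE)

# With the correct spelling, only the 'out' column exists and it is
# a character vector regardless of the R version's default.
good <- data.frame(out = c("dfs"), stringsAsFactors = FALSE)

print(names(bad))
print(names(good))
print(class(good$out))
```

Running the same check on the actual return value of the grouped function is a cheap way to confirm it matches the schema string before involving Spark.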

BR,

Jacek

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: [SparkR] gapply with strings with arrow

Hyukjin Kwon
If it works without Arrow optimization, it's likely a bug. Please feel free to file a JIRA for that.
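
To check whether the same call works without Arrow optimization, one option is to run it twice with the optimization toggled. The sketch below is an assumption-laden illustration, not a confirmed reproduction: it assumes Spark 3.x, where the SparkR Arrow path is controlled by the configuration key spark.sql.execution.arrow.sparkr.enabled, a local Spark installation, and (for the enabled case) the arrow R package. The helper name run_gapply_strings and the small input data frame are hypothetical stand-ins for the original df.

```r
# Sketch only: requires the SparkR package and a local Spark installation.
# run_gapply_strings is a hypothetical helper, not part of SparkR.
run_gapply_strings <- function(arrow_enabled) {
  # Start a session with the SparkR Arrow optimization on or off.
  SparkR::sparkR.session(
    sparkConfig = list(
      spark.sql.execution.arrow.sparkr.enabled =
        if (arrow_enabled) "true" else "false"
    )
  )
  # Stop the session on exit so the next call starts with a fresh config.
  on.exit(SparkR::sparkR.session.stop())

  # Hypothetical input with a string grouping column named ColumnA.
  local_df <- data.frame(ColumnA = c("a", "a", "b"), stringsAsFactors = FALSE)
  df <- SparkR::createDataFrame(local_df)

  # Same shape as the call in the question: one string row per group.
  SparkR::collect(SparkR::gapply(
    df,
    c("ColumnA"),
    function(key, x) data.frame(out = c("dfs"), stringsAsFactors = FALSE),
    "out string"
  ))
}
```

Calling run_gapply_strings(FALSE) and then run_gapply_strings(TRUE) and comparing the two outcomes gives the with/without comparison that a JIRA report would need.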

(In reply to Jacek Pliszka's message of Wed, 7 Oct 2020, 22:44.)