pandas_udf throws "Unsupported class file major version 56"

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

pandas_udf throws "Unsupported class file major version 56"

Lian Jiang
Hi,

from pyspark.sql.functions import pandas_udf, PandasUDFType
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, True, 1.0, 'aa'), (1, False, 2.0, 'aa'), (2, True, 3.0, 'aa'), (2, True, 5.0, 'aa'), (2, True, 10.0, 'aa')],
    ("id", "is_deleted", "v", "a"))


@pandas_udf("id long, is_deleted boolean, v double, a string", PandasUDFType.GROUPED_MAP)
def subtract_mean(pdf):
    v = pdf.v
    return pdf.assign(v=v - v.mean())
   
df.groupby("id").apply(subtract_mean).show()  ----> works fine.



@pandas_udf('double', PandasUDFType.SCALAR)
def pandas_plus_one(v):
    return v + 1

df.withColumn('v2', pandas_plus_one(df.v)).show()  ---> throw error


Error:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 380, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: u'Unsupported class file major version 56'


Any idea? Appreciate any help!




I am using:
pyspark 2.4.4
pandas 0.23.4
numpy 1.14.5
pyarrow 0.10.0
Note that I got numpy, pyarrow, pandas versions from https://stackoverflow.com/questions/51713705/python-pandas-udf-spark-error which help to make UDAF subtract_mean work.








Reply | Threaded
Open this post in threaded view
|

Re: pandas_udf throws "Unsupported class file major version 56"

Lian Jiang
I figured out. Thanks.

On Mon, Oct 7, 2019 at 9:55 AM Lian Jiang <[hidden email]> wrote:
Hi,

from pyspark.sql.functions import pandas_udf, PandasUDFType
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, True, 1.0, 'aa'), (1, False, 2.0, 'aa'), (2, True, 3.0, 'aa'), (2, True, 5.0, 'aa'), (2, True, 10.0, 'aa')],
    ("id", "is_deleted", "v", "a"))


@pandas_udf("id long, is_deleted boolean, v double, a string", PandasUDFType.GROUPED_MAP)
def subtract_mean(pdf):
    v = pdf.v
    return pdf.assign(v=v - v.mean())
   
df.groupby("id").apply(subtract_mean).show()  ----> works fine.



@pandas_udf('double', PandasUDFType.SCALAR)
def pandas_plus_one(v):
    return v + 1

df.withColumn('v2', pandas_plus_one(df.v)).show()  ---> throw error


Error:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 380, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: u'Unsupported class file major version 56'


Any idea? Appreciate any help!




I am using:
pyspark 2.4.4
pandas 0.23.4
numpy 1.14.5
pyarrow 0.10.0
Note that I got numpy, pyarrow, pandas versions from https://stackoverflow.com/questions/51713705/python-pandas-udf-spark-error which help to make UDAF subtract_mean work.