Exceptions with simplest Structured Streaming example


Exceptions with simplest Structured Streaming example

Jonathan Apple
Hello,

There is a streaming Word Count example at the beginning of the Structured
Streaming Programming Guide:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

First, we execute `nc -lk 8888` in a separate terminal.

Next, following the Python code, we have this in `example.py`:

[The example.py code was stripped by Nabble; it is restored in the follow-up message below.]
We test the application by running:

`spark-submit example.py`

The application runs and waits for data on the socket. We type single words
several times in the terminal running netcat, pressing Enter after each.

The application fails every time. Here is some (snipped) output:

[The error output was stripped by Nabble; it is restored in the follow-up message below.]
What is strange is that the exact same example works for other members of the
team, using the same Python version (3.7.0) and the same Spark version
(2.3.1).

Has anyone seen similar behavior?

Many thanks,

Jonathan






Re: Exceptions with simplest Structured Streaming example

Tathagata Das
Unfortunately, your output is not visible in the email that we see. Was it an image that somehow got removed?
It would be best to copy the output text (i.e., the error message) into the email.

On Thu, Jul 26, 2018 at 5:41 AM, Jonathan Apple <[hidden email]> wrote:
[quoted original message snipped]



Re: Exceptions with simplest Structured Streaming example

Jonathan Apple
(My apologies; I used Nabble to post and it stripped out the HTML)

The original message is below, but note that we just had the issue solved on
Stack Overflow:

https://stackoverflow.com/questions/51541134/pyspark-exceptions-with-simplest-structured-streaming-example

It turns out this is a known incompatibility between Spark and Java 10. (The
reason the example worked for some of us and not others is that we had
different Java installations.)
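
For anyone else hitting this: you can confirm which Java version the Spark JVM is actually running on from PySpark itself, via the (internal) _jvm handle on the SparkContext. A minimal sketch, assuming an active SparkSession named spark:

# Ask the JVM on the other side of the py4j gateway for its version.
# Spark 2.3.x supports Java 8 only; on Java 9/10 the ClosureCleaner's
# shaded ASM 5 ClassReader cannot parse the newer class-file format and
# throws the IllegalArgumentException seen in the traceback below.
java_version = spark.sparkContext._jvm.java.lang.System.getProperty("java.version")
print("java version: " + java_version)

If that prints 9.x or 10.x, pointing JAVA_HOME at a Java 8 installation before invoking spark-submit should resolve it.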

Many thanks,

Jonathan

----

Original message:

Hello,

There is a streaming Word Count example at the beginning of the Structured
Streaming Programming Guide:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

First, we execute `nc -lk 8888` in a separate terminal.

Next, following the Python code, we have this in example.py:


import sys

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession \
    .builder \
    .appName("StructuredNetworkWordCount") \
    .getOrCreate()
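# Raise the log threshold so that only FATAL-level messages are printed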
spark.sparkContext.setLogLevel("FATAL")

print("python version: "+sys.version)
print("spark version: "+str(spark.sparkContext.version))

# Create a DataFrame representing the stream of input lines from the
# connection to localhost:8888
lines = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 8888) \
    .load()

# Split the lines into words
words = lines.select(
    explode(
        split(lines.value, " ")
    ).alias("word")
)

# Generate running word count
wordCounts = words.groupBy("word").count()

# Start running the query that prints the running counts to the console
query = wordCounts \
    .writeStream \
    .outputMode("complete") \
    .format("console") \
    .start()

query.awaitTermination()


We test the application by running:

`spark-submit example.py`

The application runs and waits for data on the socket. We type single words
several times in the terminal running netcat, pressing Enter after each.
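
For reference, on a working setup the console sink prints a running count after each batch, roughly of this shape (a sketch, not our actual output):

-------------------------------------------
Batch: 1
-------------------------------------------
+-----+-----+
| word|count|
+-----+-----+
|hello|    2|
|world|    1|
+-----+-----+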

Instead, the application fails every time. Here is some (snipped) output:

...
python version: 3.7.0 (default, Jul 23 2018, 20:22:55)
[Clang 9.1.0 (clang-902.0.39.2)]
spark version: 2.3.1
...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o47.awaitTermination.
: org.apache.spark.sql.streaming.StreamingQueryException: null
=== Streaming Query ===
Identifier: [id = edbc0c22-2572-4036-82fd-b11afd030f26, runId = 16cbc842-3e20-4e43-9692-40ed09fd81e0]
Current Committed Offsets: {}
Current Available Offsets: {TextSocketSource[host: localhost, port: 8888]: 0}

Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan:
Aggregate [word#4], [word#4, count(1) AS count#8L]
+- Project [word#4]
   +- Generate explode(split(value#1,  )), false, [word#4]
      +- StreamingExecutionRelation TextSocketSource[host: localhost, port: 8888], [value#1]

        at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: java.lang.IllegalArgumentException
        at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
        at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
        at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
        at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
        at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
        at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
...

What is strange is that the exact same example works for other members of the
team, using the same Python version (3.7.0) and the same Spark version
(2.3.1).

Has anyone seen similar behavior?

Many thanks,

Jonathan


