Broadcast variables: destroy/unpersist unexpected behaviour

Sunil
I ran into the two cases below when unpersisting or destroying broadcast
variables in PySpark. The same operations behave as expected in the Spark
Scala shell. Any idea why this happens? Is it a bug in PySpark?

***Case 1:***
    >>> b1 = sc.broadcast([1,2,3])
    >>> b1.value
    [1, 2, 3]
    >>> b1.destroy()
    >>> b1.value
    [1, 2, 3]
I can still access the value in the driver after calling destroy().


***Case 2:***
    >>> b = sc.broadcast([1,2,3])
    >>> b.destroy()
    >>> b.value
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/sdh/Downloads/spark-2.2.1-bin-hadoop2.7/python/pyspark/broadcast.py", line 109, in value
        self._value = self.load(self._path)
      File "/home/sdh/Downloads/spark-2.2.1-bin-hadoop2.7/python/pyspark/broadcast.py", line 95, in load
        with open(path, 'rb', 1 << 20) as f:
    IOError: [Errno 2] No such file or directory: u'/tmp/spark-eef352c0-6470-4b89-999f-923493a27bc4/pyspark-17d3a9a3-b5c1-4331-b408-8447f078789e/tmpzq4kv0'
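
The traceback shows `value` calling `self.load(self._path)` and storing the result in `self._value`, which suggests the value is read lazily from a temporary file and cached on first access. The pure-Python model below is a minimal sketch under that assumption (`LazyBroadcastModel` is a hypothetical class, not PySpark's `Broadcast`, and it assumes `destroy()` deletes the backing file without clearing the cache). It reproduces both cases: a value read before `destroy()` keeps being served from the cache, while a first read after `destroy()` hits the deleted file.

```python
import os
import pickle
import tempfile


class LazyBroadcastModel:
    """Hypothetical, simplified stand-in for a driver-side broadcast handle.

    Assumption: the value is pickled to a temp file on creation and only
    unpickled (then cached in ``_value``) on the first ``.value`` access.
    This is NOT PySpark's real Broadcast class.
    """

    def __init__(self, value):
        # Serialize the value to a temporary file and keep only its path.
        fd, self._path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f)

    def load(self, path):
        with open(path, "rb") as f:
            return pickle.load(f)

    @property
    def value(self):
        # Lazy load: touch the file only if nothing is cached yet.
        if not hasattr(self, "_value"):
            self._value = self.load(self._path)
        return self._value

    def destroy(self):
        # Assumption: destroy removes the backing file but leaves any
        # already-cached _value in place.
        os.remove(self._path)


# Case 1: read (and cache) before destroy -> still readable afterwards.
b1 = LazyBroadcastModel([1, 2, 3])
print(b1.value)  # [1, 2, 3]
b1.destroy()
print(b1.value)  # [1, 2, 3], served from the cache

# Case 2: destroy before the first read -> the lazy load finds no file.
b = LazyBroadcastModel([1, 2, 3])
b.destroy()
try:
    b.value
except OSError as e:  # IOError is an alias of OSError on Python 3
    print(type(e).__name__)  # FileNotFoundError
```

If this model is right, the asymmetry between the two cases comes only from whether `.value` was read before `destroy()`.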

Instead, I would expect an error message similar to "Attempted to use
broadcast variable after it was destroyed".
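
A minimal sketch of the fail-fast behaviour I would expect, again as a hypothetical pure-Python model (`GuardedBroadcastModel` and the use of `RuntimeError` are illustrative choices, not PySpark API): `destroy()` sets a flag and drops any cached copy, and `value` checks that flag before touching the cache or the file, so both cases above would report the same clear error.

```python
import os
import pickle
import tempfile


class GuardedBroadcastModel:
    """Hypothetical broadcast-like handle that fails fast after destroy().

    Sketch only: the class name and error type are illustrative.
    """

    def __init__(self, value):
        fd, self._path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f)
        self._destroyed = False

    @property
    def value(self):
        # Check the flag first, so a cached value cannot leak out (Case 1)
        # and a missing file never surfaces as a raw IOError (Case 2).
        if self._destroyed:
            raise RuntimeError(
                "Attempted to use broadcast variable after it was destroyed")
        if not hasattr(self, "_value"):
            with open(self._path, "rb") as f:
                self._value = pickle.load(f)
        return self._value

    def destroy(self):
        self._destroyed = True
        # Drop the cached copy as well as the backing file.
        self.__dict__.pop("_value", None)
        os.remove(self._path)


b = GuardedBroadcastModel([1, 2, 3])
print(b.value)  # [1, 2, 3]
b.destroy()
try:
    b.value
except RuntimeError as e:
    print(e)  # Attempted to use broadcast variable after it was destroyed
```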

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/