FileNotFoundException when using persist(DISK_ONLY)


FileNotFoundException when using persist(DISK_ONLY)

Surendranauth Hiraman
I have a dataset of about 10 GB. I am using persist(DISK_ONLY) to avoid out-of-memory issues when running my job.

When I run with a dataset of about 1 GB, the job completes.

But when I run with the larger 10 GB dataset, I get the following error/stacktrace, which seems to occur while the RDD is being written out to disk.

Does anyone have any ideas about what is going on, or whether there is a setting I can tune?


14/06/09 21:33:55 ERROR executor.Executor: Exception in task ID 560
java.io.FileNotFoundException: /tmp/spark-local-20140609210741-0bb8/14/rdd_331_175 (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
at org.apache.spark.storage.DiskStore.putValues(DiskStore.scala:79)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:698)
at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:95)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
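
For context, this is roughly the persist pattern in the job. It is only a minimal sketch; the input path and transformation below are illustrative placeholders, not the actual pipeline:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("disk-only-example"))
val records = sc.textFile("hdfs:///data/input")      // illustrative path, not the real input
  .map(line => line.split("\t"))                     // stand-in for the real transformations
  .persist(StorageLevel.DISK_ONLY)                   // keep nothing in memory, spill blocks to local disk
records.count()                                      // forces materialization, which triggers the DiskStore writes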

--
                                                            
SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: [hidden email]elos.io
W: www.velos.io


Re: FileNotFoundException when using persist(DISK_ONLY)

Surendranauth Hiraman
I don't know if this is related, but a little earlier in stderr I also see the following stacktrace. This one seems to occur while the code is fetching RDD data from a remote node, which is a different code path from the one above.


14/06/09 21:33:26 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-16,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
at org.apache.spark.storage.BlockMessage.set(BlockMessage.scala:94)
at org.apache.spark.storage.BlockMessage$.fromByteBuffer(BlockMessage.scala:176)
at org.apache.spark.storage.BlockMessageArray.set(BlockMessageArray.scala:63)
at org.apache.spark.storage.BlockMessageArray$.fromBufferMessage(BlockMessageArray.scala:109)
at org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:128)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:489)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:487)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:487)
at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:473)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:513)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:39)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)





Re: FileNotFoundException when using persist(DISK_ONLY)

Surendranauth Hiraman
Sorry for the stream of consciousness, but after thinking about this a bit more, I believe the FileNotFoundExceptions are due to tasks being cancelled/restarted, and that the root cause is the OutOfMemoryError.

If anyone has any insights on how to debug this more deeply, or on relevant config settings, that would be much appreciated.

Otherwise, I figure the next step is to enable more debug logging in the Spark code to see how much memory it is trying to allocate. At this point, I'm wondering if the block could be in the GB range.
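
For that extra logging, a minimal sketch of the conf/log4j.properties change I have in mind, assuming the stock log4j setup that ships with Spark (I haven't confirmed these loggers print the block/buffer sizes; the logger names are just taken from the stack traces):

# turn up logging for the block/storage layer on the executors
log4j.logger.org.apache.spark.storage=DEBUG
log4j.logger.org.apache.spark.CacheManager=DEBUG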

-Suren






Re: FileNotFoundException when using persist(DISK_ONLY)

Surendranauth Hiraman
Can anyone point me to configuration options that would let me reduce the maximum buffer size allocated when the BlockManager calls doGetRemote()?

I'm assuming that is my problem, based on the stack trace below. Any help thinking this through would be appreciated, especially from anyone who has dealt with datasets larger than RAM.


14/06/09 21:33:26 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-16,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
at org.apache.spark.storage.BlockMessage.set(BlockMessage.scala:94)
at org.apache.spark.storage.BlockMessage$.fromByteBuffer(BlockMessage.scala:176)
at org.apache.spark.storage.BlockMessageArray.set(BlockMessageArray.scala:63)
at org.apache.spark.storage.BlockMessageArray$.fromBufferMessage(BlockMessageArray.scala:109)
at org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:128)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:489)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:487)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:487)
at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:473)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:513)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:39)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
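
In the meantime, one workaround I'm considering is simply using more partitions before persisting, so each cached block (and therefore the buffer allocated when a block is fetched remotely) is smaller. A rough sketch, where the input path and the partition count of 400 are arbitrary illustrative values, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("smaller-blocks-sketch"))
val persisted = sc.textFile("hdfs:///data/input")    // illustrative path
  .repartition(400)                                  // more, smaller partitions => smaller individual blocks
  .persist(StorageLevel.DISK_ONLY)
persisted.count()                                    // materialize so the smaller blocks are written out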

