Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

Chetan Khatri

Hello Spark Users,

I am getting the error below when I try to write a dataset to a Parquet location. I have enough disk space available. The last time I faced this kind of error, it was resolved by increasing the number of cores in the job parameters. The current result set is almost 400 GB, written with the following parameters:

Driver memory: 4g
Executor memory: 16g
Executor cores: 12
Num executors: 8

It is still failing. Any idea whether increasing the executor memory and the number of executors would resolve it?
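
(For reference, a minimal sketch of how these settings would be passed to spark-submit on YARN; the class name, jar, and output path are hypothetical placeholders, not from the original job.)

    # Sketch only: submitting with the resource settings listed above.
    spark-submit \
      --master yarn \
      --driver-memory 4g \
      --executor-memory 16g \
      --executor-cores 12 \
      --num-executors 8 \
      --class com.example.ParquetWriteJob \
      my-job.jar /path/to/parquet/output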


17/11/21 04:29:37 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /mapr/chetan/local/david.com/tmp/hadoop/nm-local-dir/usercache/david-khurana/appcache/application_1509639363072_10572/blockmgr-008604e6-37cb-421f-8cc5-e94db75684e7/12/temp_shuffle_ae885911-a1ef-404f-9a6a-ded544bb5b3c
java.io.IOException: Disk quota exceeded
        at java.io.FileOutputStream.close0(Native Method)
        at java.io.FileOutputStream.access$000(FileOutputStream.java:53)
        at java.io.FileOutputStream$1.close(FileOutputStream.java:356)
        at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
        at java.io.FileOutputStream.close(FileOutputStream.java:354)
        at org.apache.spark.storage.TimeTrackingOutputStream.close(TimeTrackingOutputStream.java:72)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:178)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.close(UnsafeRowSerializer.scala:96)
        at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$close$2.apply$mcV$sp(DiskBlockObjectWriter.scala:108)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1316)
        at org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:107)
        at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:159)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:234)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
17/11/21 04:29:37 WARN netty.OneWayOutboxMessage: Failed to send one-way RPC.
java.io.IOException: Failed to connect to /192.168.123.43:58889
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.123.43:58889
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      ... 1 more
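
Note that the failing path in the trace is not the Parquet output location but a temp_shuffle_* file under the YARN NodeManager local dir (.../nm-local-dir/usercache/...), which here sits on a /mapr mount, so the quota being exceeded is on whatever volume backs those local dirs, not necessarily on the destination directory. The subsequent "Connection refused" warning is most likely a downstream symptom of the failing executor. A hedged sketch of the relevant setting (the property name is the standard YARN one; the paths are hypothetical examples):

    <!-- yarn-site.xml (sketch): point NodeManager local dirs at a volume
         without a restrictive quota; must be set on every node and the
         NodeManagers restarted. The paths below are hypothetical. -->
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/data1/yarn/local,/data2/yarn/local</value>
    </property>

(In YARN mode Spark writes shuffle files to YARN's local dirs and ignores spark.local.dir, so that Spark property is not the lever here.)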

Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

Chetan Khatri
Any reply on this?

On Tue, Nov 21, 2017 at 3:36 PM, Chetan Khatri <[hidden email]> wrote:



Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

Vadim Semenov
The error message seems self-explanatory: try to figure out what disk quota is set for your user.
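
(A sketch of how one might check, assuming standard Linux user quotas and, since the failing path is under /mapr, possibly a MapR volume quota; the volume name below is a hypothetical placeholder, and the maprcli invocation should be verified against your MapR version.)

    # Sketch: inspect quotas that could apply to the shuffle directory.
    quota -s                          # OS-level user/group quotas, if enabled
    df -h /mapr/chetan/local          # free space on the MapR mount
    maprcli volume info -name chetan.local -json   # volume quota (name is hypothetical)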

On Wed, Nov 22, 2017 at 8:23 AM, Chetan Khatri <[hidden email]> wrote: