Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

smikesh
Hi everybody,

I am running a Spark job on YARN, and my problem is that the blockmgr-*
folders are being created under
/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_id/*
This folder can grow to a significant size and does not really fit into
the /tmp file system for a single job, which is a real problem for my
installation.
I have redefined hadoop.tmp.dir in core-site.xml and
yarn.nodemanager.local-dirs in yarn-site.xml to point to another
location, expecting the block manager to create its files there rather
than under /tmp, but this is not the case: the files are still created
under /tmp.

I am wondering if there is a way to make Spark not use /tmp at all and
to configure it to create all of these files somewhere else.

Any assistance would be greatly appreciated!

Best,
Michael

Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Keith Chapman
Hi Michael,

You could either set spark.local.dir through the Spark conf or set the java.io.tmpdir system property.
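For illustration, the submit command might look like this (a rough sketch; the paths, class name, and jar are placeholders, not values from this thread):

    # Point Spark's scratch space (block manager and shuffle files) at a
    # disk other than /tmp, and redirect the driver JVM's temp dir too.
    spark-submit \
      --conf spark.local.dir=/data/spark-scratch \
      --conf spark.driver.extraJavaOptions=-Djava.io.tmpdir=/data/tmp \
      --class com.example.MyJob myjob.jar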


Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

smikesh
Hi Keith,

Thank you for your answer!
I have done this, and it is working for the Spark driver.
I would like to do something similar for the executors as well, so
that the setting is used on all the nodes where executors are
running.

Best,
Michael



Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Keith Chapman
Can you try setting spark.executor.extraJavaOptions to include -Djava.io.tmpdir=someValue?
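Something like this, for example (a sketch; the path and job details are placeholders):

    # Pass the temp-dir override to every executor JVM.
    spark-submit \
      --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/data/tmp" \
      --class com.example.MyJob myjob.jar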



Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

smikesh
Hi Keith,

Thank you for the idea!
I have tried it, and now the executor command looks like this:

/bin/bash -c /usr/java/latest//bin/java -server -Xmx51200m
'-Djava.io.tmpdir=my_prefered_path'
-Djava.io.tmpdir=/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_1521110306769_0041/container_1521110306769_0041_01_000004/tmp

The JVM is using the second -Djava.io.tmpdir parameter and writing
everything to the same directory as before.

Best,
Michael



Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Keith Chapman
Hi Michael,

Sorry for the late reply. I guess you may have to set it through the Hadoop core-site.xml file. The property you need to set is "hadoop.tmp.dir", which defaults to "/tmp/hadoop-${user.name}".
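For reference, the entry in core-site.xml would look roughly like this (the path is a placeholder; choose a disk with enough space):

    <!-- core-site.xml: move Hadoop's temp root off /tmp -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/data/hadoop-tmp</value>
    </property>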



Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

smikesh
Hi Keith,

Thanks for the suggestion!
I have solved this already.
The problem was that the YARN process was not responding to start/stop
commands and had not applied my configuration changes.
I killed it and restarted my cluster, and after that YARN started
using the yarn.nodemanager.local-dirs parameter defined in
yarn-site.xml.
After this change, -Djava.io.tmpdir for the Spark executors was set
correctly, according to the yarn.nodemanager.local-dirs parameter.
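For anyone hitting the same problem, the yarn-site.xml entry looks roughly like this (the paths are placeholders; a comma-separated list spreads container directories across disks), and the NodeManagers must be restarted for the change to take effect:

    <!-- yarn-site.xml: where NodeManagers place container working dirs,
         including the appcache/blockmgr-* folders -->
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/data1/yarn/local,/data2/yarn/local</value>
    </property>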

Best,
Michael



Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Gourav Sengupta
Hi,

From the Spark configuration documentation:

spark.local.dir (default: /tmp): Directory to use for "scratch" space
in Spark, including map output files and RDDs that get stored on disk.
This should be on a fast, local disk in your system. It can also be a
comma-separated list of multiple directories on different disks.
NOTE: In Spark 1.0 and later this will be overridden by
SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment
variables set by the cluster manager.

Regards,
Gourav Sengupta

Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

smikesh
Hi, 

In YARN mode this property is used only by the driver.
Executors use the directories provided by YARN for storing temporary
files.


Best,
Michael


Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Gourav Sengupta
Hi Michael,

I think that is what I am trying to show here, as the documentation mentions "NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager."

So, in a way I am supporting your statement :)

Regards,
Gourav
