Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

Ranju Jain

Hi,

 

I need to write the data from all executor pods to a common location that the driver pod can access and read.

I was first planning to go with NFS, but I think a Shared Volume is equally good.

Please advise: is there any major drawback in using a Shared Volume instead of NFS when many pods are writing to the same volume [ReadWriteMany]?
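
For context, here is a minimal sketch of how one ReadWriteMany PersistentVolumeClaim could be mounted into both the driver and the executor pods through Spark's Kubernetes volume properties. The volume name `shared`, the claim name `shared-output-pvc`, and the mount path `/mnt/shared` are hypothetical, and the claim is assumed to exist already on storage that supports ReadWriteMany (an NFS-backed volume is one such option).

```python
def shared_pvc_conf(volume_name, claim_name, mount_path, read_only=False):
    """Build Spark properties that mount one PVC on the driver and on every executor."""
    conf = {}
    for role in ("driver", "executor"):
        prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
        conf[f"{prefix}.options.claimName"] = claim_name
        conf[f"{prefix}.mount.path"] = mount_path
        conf[f"{prefix}.mount.readOnly"] = str(read_only).lower()
    return conf


def to_submit_args(conf):
    """Render the properties as spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hypothetical names, for illustration only.
conf = shared_pvc_conf("shared", "shared-output-pvc", "/mnt/shared")
print(" ".join(to_submit_args(conf)))
```

With the same claim mounted at the same path on both roles, executors can write under `/mnt/shared` and the driver can read from it afterwards. Whether concurrent writers behave well still depends on the storage behind the claim.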

 

Regards

Ranju

Re: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

Mich Talebzadeh
OK, this is on Google Cloud, correct?




LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 



In reply to Ranju Jain's message of Thu, 11 Mar 2021 at 11:29.

RE: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

Ranju Jain

Hi Mich,

 

No, it is not Google Cloud. It is simply Kubernetes deployed on a bare-metal platform.

I am not clear on the pros and cons of a Shared Volume versus NFS for ReadWriteMany.

Since NFS is a Network File System [remote], I assume a Shared Volume should be preferable, but I don't know the downsides [drawbacks].

 

Regards

Ranju

In reply to Mich Talebzadeh's message of Thursday, March 11, 2021 5:22 PM.

Re: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

Mich Talebzadeh
Well, your mileage may vary, so to speak.

The only way to find out is to set up an NFS mount and test it.

The performance will depend on the mounted file system and the amount of cache it has.

File cache is important for reads. If you are going to do random writes (as opposed to sequential writes), you can stripe the volume (RAID 0, or RAID 10 if you also need mirroring) for better performance.
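
As a rough illustration of the kind of test meant here, the sketch below times sequential versus random block writes to a file in a given directory. It is only a starting point under stated assumptions: the `BENCH_DIR` environment variable is a hypothetical way to pass in the mount path, the block size and block count are arbitrary, and a single-client run like this says nothing about many pods writing at once.

```python
import os
import random
import tempfile
import time


def time_writes(path, block_size=4096, blocks=256, sequential=True, seed=0):
    """Write `blocks` blocks of `block_size` bytes to `path`; return elapsed seconds."""
    offsets = [i * block_size for i in range(blocks)]
    if not sequential:
        random.Random(seed).shuffle(offsets)  # same blocks, shuffled order
    payload = os.urandom(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.truncate(block_size * blocks)  # pre-size the file so every offset is valid
        for offset in offsets:
            f.seek(offset)
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the backing store
    return time.perf_counter() - start


def compare(directory, **kwargs):
    """Time both access patterns in `directory`, removing the scratch files afterwards."""
    results = {}
    for label, sequential in (("sequential", True), ("random", False)):
        path = os.path.join(directory, f"bench_{label}.bin")
        try:
            results[label] = time_writes(path, sequential=sequential, **kwargs)
        finally:
            if os.path.exists(path):
                os.remove(path)
    return results


if __name__ == "__main__":
    # BENCH_DIR is a hypothetical variable; point it at the NFS or shared-volume mount.
    target = os.environ.get("BENCH_DIR", tempfile.gettempdir())
    for label, seconds in compare(target).items():
        print(f"{label}: {seconds:.4f}s")
```

Small runs mostly measure caching, so it is worth raising `blocks` well beyond the cache size and running the script from several pods at once before drawing conclusions.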


Do you have a UNIX admin who can help you out as well?


HTH


In reply to Ranju Jain's message of Thu, 11 Mar 2021 at 12:01.

RE: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

Ranju Jain

Yes, there is a team, but I have not contacted them yet.

I am trying to understand this on my end first.

I understood the points you made in your message.

Do you have any references or links where I can read up on Shared Volumes?

 

Regards

Ranju

 

In reply to Mich Talebzadeh's message of Thursday, March 11, 2021 5:38 PM.

Re: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

Mich Talebzadeh
I don't have any specific reference. However, you can do a Google search.

It is best to ask the Unix team. They can do all that themselves.

HTH



In reply to Ranju Jain's message of Thu, 11 Mar 2021 at 12:53.

RE: Spark on Kubernetes | 3.0.1 | Shared Volume or NFS

Ranju Jain

Ok!

 

Thanks for all the guidance :-)

 

Regards

Ranju

 

In reply to Mich Talebzadeh's message of Thursday, March 11, 2021 11:07 PM.