GPU Acceleration for spark-3.0.0


charles_cai
Hi,

I have configured GPU scheduling for spark-3.0.0 on YARN following the
official documentation, but the job does not seem to be running on the GPU.
Do I need to modify my code to invoke CUDA? Is there a tutorial that can be shared?

running logs:
...
2020-06-13 10:58:01,938 INFO spark.SparkContext: Running Spark version 3.0.0-preview2
2020-06-13 10:58:04,101 INFO resource.ResourceUtils: ==============================================================
2020-06-13 10:58:04,105 INFO resource.ResourceUtils: Resources for spark.driver:
gpu -> [name: gpu, addresses: 0]


spark-defaults.conf:
...
spark.executor.resource.gpu.amount  1
spark.worker.resource.gpu.amount    1
spark.driver.resource.gpu.amount    1
spark.driver.resource.gpu.discoveryScript  /usr/local/spark-3.0.0/examples/src/main/scripts/getGpusResources.sh
spark.worker.resource.gpu.discoveryScript  /usr/local/spark-3.0.0/examples/src/main/scripts/getGpusResources.sh
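
For reference, a discovery script such as getGpusResources.sh only has to print one JSON object giving the resource name and its addresses, which is what produces the "gpu -> [name: gpu, addresses: 0]" line in the driver log above. A minimal Python sketch of that output format, assuming the GPU indices arrive as the text printed by `nvidia-smi --query-gpu=index --format=csv,noheader` (the function name is hypothetical and not part of Spark):

```python
import json

def format_gpu_resource(index_text, name="gpu"):
    """Build the JSON line a resource discovery script prints to stdout.

    index_text is assumed to hold one GPU index per line.
    """
    addresses = [line.strip() for line in index_text.splitlines() if line.strip()]
    return json.dumps({"name": name, "addresses": addresses})

if __name__ == "__main__":
    # One GPU with index 0, as on the machine in this thread.
    print(format_gpu_resource("0\n"))  # prints {"name": "gpu", "addresses": ["0"]}
```

Discovery only tells Spark which GPUs exist so that they can be scheduled; by itself it does not make any computation run on them.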
 

nodemanager log:
...
2020-06-13 10:55:07,702 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager: Found Resource plugins from configuration: [yarn.io/gpu]
2020-06-13 10:55:07,745 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer: Trying to discover GPU information ...
2020-06-13 10:55:10,601 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer: Discovered GPU information: === GPUs in the system ===
        Driver Version:440.82
        ProductName=GeForce GTX 950M, MinorNumber=0, TotalMemory=2004MiB, Utilization=2.0%


Thanks
charles



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: GPU Acceleration for spark-3.0.0

Bobby Evans
Charles,

I am sorry that you got the idea that Apache Spark is GPU accelerated out of the box. Where did you get that information, so we can try to make it clearer? Apache Spark 3.0 opens up a set of plugin APIs that allow a plugin to provide GPU acceleration. You can look at SPARK-27396 for the API. Nvidia (my employer) is working on an implementation, but it has not been released yet. It should be open source very shortly, though. You can get more information about it at https://nvidia.com/spark

Thanks,

Bobby

On Fri, Jun 12, 2020 at 10:50 PM charles_cai <[hidden email]> wrote:


Re: GPU Acceleration for spark-3.0.0

charles_cai
Bobby,

Thanks for your answer. It seems that I misunderstood this paragraph on
the website: *"GPU-accelerate your Apache Spark 3.0 data science
pipelines—without code changes—and speed up data processing and model
training while substantially lowering infrastructure costs."* So if I am
going to use the GPU in my job running on Spark, I still need to code the
map and reduce functions in CUDA or C++ and then invoke them through JNI
or something like GPUEnabler, is that right?

thanks
Charles





Re: GPU Acceleration for spark-3.0.0

Bobby Evans-2
"So if I am
going to use the GPU in my job running on Spark, I still need to code the
map and reduce functions in CUDA or C++ and then invoke them through JNI
or something like GPUEnabler, is that right?"

Sort of. You could go through all of that work yourself, or you could use the plugin that we are going to open source in the next few days. Go to https://nvidia.com/spark and click on the "contact us" link; you should be able to get the information you want that way. I know from the list of Spark Summit talks that others are working on similar things too. Intel has a talk about some of their efforts on columnar processing on FPGAs, and I think SIMD instructions too, at least going by their talk last year.

It should be an exciting time for accelerated SQL in Spark.
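
As an illustration of the do-it-yourself route: with GPU scheduling configured, Spark only hands each task the addresses of the GPUs assigned to it, and the task code still has to use them. In PySpark those addresses are available from `TaskContext.get().resources()`. The sketch below is an example under stated assumptions, not the plugin mentioned above: the helper name `cuda_visible_devices` is hypothetical, and a stand-in object replaces the real `ResourceInformation` so that the snippet runs without a cluster.

```python
from types import SimpleNamespace

def cuda_visible_devices(resources, name="gpu"):
    """Return a CUDA_VISIBLE_DEVICES value for the GPUs assigned to a task.

    resources is a mapping like the one returned by
    pyspark.TaskContext.get().resources(): resource name -> object with
    an .addresses list of strings. Returns None if nothing was assigned.
    """
    info = resources.get(name)
    if info is None or not info.addresses:
        return None
    return ",".join(info.addresses)

# Stand-in for what a task would see after being assigned GPU 0.
fake_resources = {"gpu": SimpleNamespace(name="gpu", addresses=["0"])}
print(cuda_visible_devices(fake_resources))  # prints 0

# Inside a real job this would be called per partition, for example:
#   from pyspark import TaskContext
#   def process(rows):
#       devices = cuda_visible_devices(TaskContext.get().resources())
#       # hand `devices` to whatever GPU library does the actual work
#       return rows
#   rdd.mapPartitions(process)
```

The actual GPU computation (CUDA kernels, a GPU-enabled library, or a plugin) still has to be supplied separately; the scheduler only decides which device each task may use.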



On Wed, Jun 17, 2020 at 11:17 PM charles_cai <[hidden email]> wrote: