In the driver, can I GC myArray after getting an RDD via sparkContext.parallelize(myArray, 100)?


Hi all,

I need to make RDDs one by one from many big arrays in the driver. To save driver memory, I want to free each array after the corresponding RDD has been created with `sparkContext.parallelize(lastArray, 100)`. My understanding is that the RDD created from each array is sent to the executors, so the driver no longer needs to keep the original array. Is that right? I use the code below, but the driver's memory usage keeps growing while the code runs.

```
import scala.collection.mutable.ListBuffer
import org.apache.spark.rdd.RDD

// parallelize(Array[Float], 100) yields an RDD[Float],
// so the buffer holds RDD[Float] (not RDD[Array[Float]]).
val globalRddList: ListBuffer[RDD[Float]] = new ListBuffer[RDD[Float]]()

var number = 0
while (number < 100) {
  val tmpArr: Array[Float] = new Array(10000000)
  // ... fill data into tmpArr ...
  val rddBatch = sparkContext.parallelize(tmpArr, 100)  // "batchArr" in the original was a typo for tmpArr
  rddBatch.cache()
  rddBatch.first()  // triggers a job; first() computes (and caches) only the first partition
  globalRddList.append(rddBatch)
  number += 1
}
```
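
For what it's worth, the growth can be observed by sampling the driver JVM's heap between loop iterations. A minimal sketch using the standard `Runtime` API (`usedHeapMB` is just an illustrative helper name here, and `System.gc()` is only a best-effort hint, so the figures are approximate):

```
// Sketch: sample the driver JVM's used heap, e.g. once per loop iteration.
def usedHeapMB(): Long = {
  System.gc()  // best-effort hint only; numbers are approximate
  val rt = Runtime.getRuntime
  (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L)
}

println(s"driver used heap: ${usedHeapMB()} MB")
```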

Best regards,

maqy