[PySpark CrossValidator] Dropping column randCol before fitting model

Ablaye F.
Hello,

I have noticed that the _fit method of the CrossValidator class in PySpark adds a new column (randCol) to the input dataset. This column is used to split the dataset into k folds.
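For illustration, here is a minimal plain-Python sketch of this kind of split. It is not the PySpark source: the helper name `k_fold_split`, the dict-based rows, and the seed are assumptions made for the example. Each row gets a uniform random value, and fold i takes the rows whose value falls in [i/k, (i+1)/k) as its validation set.

```python
import random


def k_fold_split(rows, n_folds, seed=42):
    """Return a list of (train, validation) pairs, one per fold (hypothetical helper)."""
    rng = random.Random(seed)
    # Attach a uniform random value in [0, 1) to every row, playing the role of randCol.
    tagged = [dict(row, randCol=rng.random()) for row in rows]
    h = 1.0 / n_folds
    folds = []
    for i in range(n_folds):
        lower, upper = i * h, (i + 1) * h
        # Rows whose random value lies in [lower, upper) form the validation set.
        validation = [r for r in tagged if lower <= r["randCol"] < upper]
        # All remaining rows form the training set for this fold.
        train = [r for r in tagged if not (lower <= r["randCol"] < upper)]
        folds.append((train, validation))
    return folds


rows = [{"id": i, "label": i % 2} for i in range(20)]
folds = k_fold_split(rows, n_folds=4)
for i, (train, validation) in enumerate(folds):
    print(i, len(train), len(validation), sorted(train[0].keys()) if train else [])
```

In this sketch the rows of both subsets still carry the randCol key unless it is dropped explicitly, which is the situation the question below is about.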

Is this column removed from the training data and test data of each fold before the model is fitted?

I ask because I have read through all the code and could not find a place where this column is removed before the fit is executed.

Thanks for your help