Aureliano, you're correct that this is not "validation error", which is computed from the residuals on data held out of training, and which helps you control overfitting (i.e., variance).

However, in this example the errors are correctly referred to as "training error", which is what you might compute on a per-iteration basis in a gradient-descent optimizer to track how well you're doing at minimizing the in-sample residuals.
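To make the distinction concrete, here's a minimal sketch (not Spark code; plain NumPy on made-up data) of gradient descent for linear regression. The training error is computed from the in-sample residuals at every iteration, while the validation error uses a held-out split that the optimizer never sees:

```python
import numpy as np

# Toy data: linear signal plus a little noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Hold out a validation split -- out-of-training-sample data.
X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

w = np.zeros(3)
lr = 0.05
for i in range(200):
    residuals = X_train @ w - y_train            # in-sample residuals
    grad = 2 * X_train.T @ residuals / len(y_train)
    w -= lr * grad
    training_error = np.mean(residuals ** 2)     # per-iteration training error

# Validation error: residuals on data never used during fitting.
validation_error = np.mean((X_val @ w - y_val) ** 2)
```

The per-iteration `training_error` is the quantity being discussed here; `validation_error` is the separate out-of-sample figure you'd watch for overfitting.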

There's nothing special about Spark ML algorithms that exempts them from these bias-variance considerations.