[ML] Setting Non-Transform Params for a Pipeline & PipelineModel



Aleksander Eskilson
I had originally sent this to the Dev list, since portions of the API discussed here are still marked as experimental, but it occurs to me that this may still be a general-use question. Sorry for the cross-listing.

In a nutshell, what I'd like to do is instantiate a Pipeline (or an extension class of Pipeline) with metadata that is copied to the PipelineModel when fitted, and that can be read again when the fitted model is persisted and loaded by another consumer. These metadata are specific to the PipelineModel more than to any particular Transformer or Estimator declared as part of the Pipeline: the intent is that the PipelineModel params can be read by a downstream consumer of the loaded model, but the values those params should take will only be known to the creator of the Pipeline/trainer of the PipelineModel.

It seems that Pipeline and PipelineModel support the Params interface, like Transformer and Estimator do. It seems I can extend Pipeline to a custom class, CustomPipeline, whose constructor could enforce that my metadata Params are set. However, when the Pipeline is fit, the resultant PipelineModel doesn't seem to include the original CustomPipeline's params, only params from the individual Transformer steps.

From a read of the code, it seems that the fit method will copy over the Stages to the PipelineModel, and those will be persisted (along with the Stages' Params) during write, but any Params belonging to the Pipeline are not copied to the PipelineModel (as only Stages are considered during copy, not the ParamMap of the Pipeline) [1].
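To make the flow I'm describing concrete, here is a plain-Python toy model of it. This is not Spark code: ToyStage, ToyPipeline, ToyPipelineModel and the trainedBy key are hypothetical names made up for illustration, and the sketch only encodes my reading of fit as stated above, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyStage:
    """Hypothetical stand-in for a stage: a name plus its own params."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyPipelineModel:
    """Hypothetical stand-in for PipelineModel."""
    stages: List[ToyStage]
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for Pipeline: stages plus pipeline-level params."""
    stages: List[ToyStage]
    params: Dict[str, Any] = field(default_factory=dict)

    def fit(self) -> ToyPipelineModel:
        # Assumed behaviour (my reading of the flow): only the stages, with
        # their own params, are handed to the model. self.params is never
        # consulted, so the model starts with an empty param dict.
        fitted = [ToyStage(s.name, dict(s.params)) for s in self.stages]
        return ToyPipelineModel(stages=fitted)


pipeline = ToyPipeline(
    stages=[ToyStage("tokenizer", {"inputCol": "text"})],
    params={"trainedBy": "team-a"},  # hypothetical pipeline-level metadata
)
model = pipeline.fit()

print(model.stages[0].params)  # stage-level params survive the fit
print(model.params)            # pipeline-level params do not
```

If this reading is right, the pipeline-level dict is simply never passed along, which matches what I see with the real classes.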

Is this a correct read of the flow here: that a CustomPipeline extension of Pipeline with member Params does not get those non-Transform Params copied into the fitted PipelineModel?

If so, would a feature enhancement including Pipeline-specific Params being copyable into the fitted PipelineModel be considered acceptable?

Or should there be another way to include metadata about the Pipeline such that the metadata is copyable to the fitted PipelineModel, and able to be persisted with PipelineModel write and read again with PipelineModel load? My first attempt at this has been to extend the Pipeline class itself with member params, but this doesn't seem to do the trick given how Params are actually copied only for Stages between Pipeline and the fitted PipelineModel.

It occurs to me I could write a custom withMetadata Transformer Stage, which would really just be an identity function but with the desired Params built in, and those Params would get copied with the other Stages. But as discussed at the top, this particular use case for metadata isn't about any particular Transformer; it's about metadata for the whole Pipeline.
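For what it's worth, the identity-stage workaround would look roughly like the following, again as a plain-Python toy rather than Spark code. ToyWithMetadata, ToyUpper, fit, run and read_metadata are all hypothetical names; a real version would need to be a proper Transformer with persistable Params.

```python
from typing import Any, Dict, List


class ToyUpper:
    """Hypothetical 'real' stage: upper-cases every row."""

    def transform(self, rows: List[str]) -> List[str]:
        return [r.upper() for r in rows]


class ToyWithMetadata:
    """Hypothetical identity stage whose only job is to carry params."""

    def __init__(self, **metadata: Any) -> None:
        self.params: Dict[str, Any] = dict(metadata)

    def transform(self, rows: List[str]) -> List[str]:
        # Identity: the data passes through untouched.
        return rows


def fit(stages: list) -> list:
    # Toy assumption: the fitted model is just the list of stages,
    # mirroring the observation that stages are what get copied.
    return list(stages)


def run(model: list, rows: List[str]) -> List[str]:
    for stage in model:
        rows = stage.transform(rows)
    return rows


def read_metadata(model: list) -> Dict[str, Any]:
    # A downstream consumer has to know to look for the carrier stage.
    for stage in model:
        if isinstance(stage, ToyWithMetadata):
            return stage.params
    return {}


model = fit([ToyWithMetadata(trainedBy="team-a"), ToyUpper()])
print(run(model, ["a", "b"]))
print(read_metadata(model))
```

The metadata does travel with the stages here, but the consumer has to search the stage list for a carrier stage, which is exactly the awkwardness I'd like to avoid.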