ML Transformer: create feature that uses multiple columns


davideanastasia
Hi,
I am trying to write a custom ml.Transformer. It's a very simple row-by-row
transformation, but it takes into account multiple columns of the DataFrame
(and sometimes, interaction between columns).

I was wondering what the best way to achieve this is. I have used a udf in
the Transformer before, but that only allows me to use one column (am I
right?). How can I use multiple columns?

Thanks,
D.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: ML Transformer: create feature that uses multiple columns

Filipp Zhinkin
Hi,

you can combine multiple columns using
org.apache.spark.sql.functions.struct and invoke UDF on resulting
column.
In that case, your UDF has to accept a Row as its argument.
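
A minimal sketch of this approach, assuming Spark 2.x or later and two hypothetical Double columns `a` and `b` (the names `StructUdfExample` and `interaction` are also illustrative). The Row-typed function reads the struct's fields by position, in the order they were passed to `struct`. For comparison, it also shows that `functions.udf` accepts functions of up to ten arguments, so a fixed set of columns can be passed directly; packing them into a struct is handier when the number of input columns varies.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

object StructUdfExample {
  // Hypothetical feature: field 0 is "a", field 1 is "b" (same order as in struct(...)).
  def interaction(r: Row): Double = r.getDouble(0) * r.getDouble(1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("struct-udf").getOrCreate()
    import spark.implicits._

    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("a", "b")

    // Pack both columns into one struct column and pass it to a Row-typed UDF.
    val interactionUdf = udf((r: Row) => interaction(r))
    val viaStruct = df.withColumn("a_times_b", interactionUdf(struct(col("a"), col("b"))))

    // Alternative: udf also accepts functions of several arguments (up to 10).
    val multiArgUdf = udf((a: Double, b: Double) => a * b)
    val viaArgs = df.withColumn("a_times_b", multiArgUdf(col("a"), col("b")))

    viaStruct.show()
    viaArgs.show()
    spark.stop()
  }
}
```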

See VectorAssembler's source for an example:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala#L109
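
Wrapping the same idea in a custom `ml.Transformer` could look roughly like the sketch below. `InteractionTransformer`, its hardcoded input columns `a`/`b` and its output column `a_times_b` are hypothetical; a production transformer would normally expose the column names as `Params` instead of hardcoding them.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import org.apache.spark.sql.functions.{col, struct, udf}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical transformer: appends the product of columns "a" and "b".
class InteractionTransformer(override val uid: String) extends Transformer {

  // Auxiliary constructor generating a unique id, as built-in stages do.
  def this() = this(Identifiable.randomUID("interaction"))

  // Hardcoded for brevity (assumption); real code would expose these as Params.
  private val inputCols: Seq[String] = Seq("a", "b")
  private val outputCol: String = "a_times_b"

  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema) // fail fast on missing or mistyped columns
    // Row-typed UDF: struct fields arrive in the same order as inputCols.
    val interact = udf((r: Row) => r.getDouble(0) * r.getDouble(1))
    dataset.withColumn(outputCol, interact(struct(inputCols.map(col): _*)))
  }

  override def transformSchema(schema: StructType): StructType = {
    inputCols.foreach { c =>
      require(schema.fieldNames.contains(c), s"Missing input column $c")
      require(schema(c).dataType == DoubleType, s"Column $c must be DoubleType")
    }
    require(!schema.fieldNames.contains(outputCol), s"Output column $outputCol already exists")
    StructType(schema.fields :+ StructField(outputCol, DoubleType))
  }

  // defaultCopy relies on the (uid: String) constructor above.
  override def copy(extra: ParamMap): InteractionTransformer = defaultCopy(extra)
}
```

Calling `new InteractionTransformer().transform(df)` on a DataFrame with Double columns `a` and `b` appends `a_times_b`; because it is a `Transformer`, it can also be used as a stage in a `Pipeline`.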

Regards,
Filipp.

On Sat, Dec 9, 2017 at 2:41 PM, davideanastasia
<[hidden email]> wrote:



Re: ML Transformer: create feature that uses multiple columns

davideanastasia
Hi Filipp,
your solution worked very well: thanks a lot!

Davide


