How to use StringIndexer for multiple input /output columns in Spark Java


Mina Aslani
Hi, 

There is no setInputCols/setOutputCols for StringIndexer in the Spark Java API.
How can multiple input/output columns be specified, then?

Regards,
Mina

Re: How to use StringIndexer for multiple input /output columns in Spark Java

Mina Aslani

On Mon, May 14, 2018 at 4:30 PM, Mina Aslani <[hidden email]> wrote:
Hi, 

There is no setInputCols/setOutputCols for StringIndexer in the Spark Java API.
How can multiple input/output columns be specified, then?

Regards,
Mina


Re: How to use StringIndexer for multiple input /output columns in Spark Java

MLnick
Multi-column support for StringIndexer didn’t make it into Spark 2.3.0.

The PR is still in progress, I think; it should be available in 2.4.0.


Re: How to use StringIndexer for multiple input /output columns in Spark Java

Mina Aslani
Hi,

So, what is the workaround? Should I create multiple indexers (one for each column), then create a Pipeline and set its stages to all of the StringIndexers?
I am using 2.2.1, as I cannot move to 2.3.0. It looks like OneHotEncoderEstimator is broken; please see my email sent today with the subject:
OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns

Regards,
Mina

On Tue, May 15, 2018 at 2:37 AM, Nick Pentreath <[hidden email]> wrote:
Multi-column support for StringIndexer didn’t make it into Spark 2.3.0.

The PR is still in progress, I think; it should be available in 2.4.0.


Re: How to use StringIndexer for multiple input /output columns in Spark Java

Bryan Cutler
Yes, the workaround is to create multiple StringIndexers, one per column, as you described. OneHotEncoderEstimator was only added in Spark 2.3.0, so on 2.2.1 you will have to use the plain OneHotEncoder instead.
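
For reference, the workaround described above could be sketched roughly as follows. This is only a minimal sketch, not code from this thread: the class name MultiColumnIndexing, the helper names, the example column names "color" and "size", and the "_index" output suffix are all assumptions made for illustration. It uses only the single-column setInputCol/setOutputCol setters, one StringIndexer per column, and hands all of them to a Pipeline.

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.feature.StringIndexer;

// Hypothetical helper class; the names here are illustrative only.
class MultiColumnIndexing {

    // Builds one StringIndexer per input column.
    // The "<column>_index" output naming is an assumption, not a Spark convention.
    static PipelineStage[] buildIndexers(String[] inputCols) {
        PipelineStage[] stages = new PipelineStage[inputCols.length];
        for (int i = 0; i < inputCols.length; i++) {
            stages[i] = new StringIndexer()
                .setInputCol(inputCols[i])
                .setOutputCol(inputCols[i] + "_index");
        }
        return stages;
    }

    // Wraps all of the indexers in a single Pipeline so they are fitted together.
    static Pipeline buildPipeline(String[] inputCols) {
        return new Pipeline().setStages(buildIndexers(inputCols));
    }
}
```

Assuming df is a Dataset<Row> that contains the listed string columns, something like MultiColumnIndexing.buildPipeline(new String[]{"color", "size"}).fit(df).transform(df) should then add one indexed column per input column.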

On Tue, May 15, 2018, 8:40 AM Mina Aslani <[hidden email]> wrote:
Hi,

So, what is the workaround? Should I create multiple indexers (one for each column), then create a Pipeline and set its stages to all of the StringIndexers?
I am using 2.2.1, as I cannot move to 2.3.0. It looks like OneHotEncoderEstimator is broken; please see my email sent today with the subject:
OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns

Regards,
Mina
