Zero Coefficient in logistic regression


Zero Coefficient in logistic regression

Alexis Peña

Hi Guys,

 

We are fitting a logistic regression model using the following code.

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{ChiSqSelector, VectorAssembler}

val Chisqselector = new ChiSqSelector().setNumTopFeatures(10).setFeaturesCol("VECTOR_1").setLabelCol("TARGET").setOutputCol("selectedFeatures")

val assembler = new VectorAssembler().setInputCols(Array("FEATURES", "selectedFeatures", "PROM_MESES_DIST", "RECENCIA", "TEMP_MIN", "TEMP_MAX", "PRECIPITACIONES")).setOutputCol("Union")

val lr = new LogisticRegression().setLabelCol("TARGET").setFeaturesCol("Union")

val pipeline = new Pipeline().setStages(Array(Chisqselector, assembler, lr))

 

 

Do you know why the coefficients for the following features are estimated as zero? Is this produced by the ChiSqSelector or by the logistic model?

 

Thanks in advance!!

 

 

CODIGO      PARAMETRO         COEFICIENTES_MUESTREO_BALANCEADO
PROPIAS     CV_UM              0,276866756
PROPIAS     CV_U3M            -0,241851427
PROPIAS     CV_U6M            -0,568312819
PROPIAS     CV_U12M            0,134706601
PROPIAS     M_UM               5,47E-06
PROPIAS     M_U3M             -7,10E-06
PROPIAS     M_U6M              1,73E-05
PROPIAS     M_U12M            -5,41E-06
PROPIAS     CP_UM             -0,050750105
PROPIAS     CP_U3M             0,125483162
PROPIAS     CP_U6M            -0,353906788
PROPIAS     CP_U12M            0,159538155
PROPIAS     TUM               -0,020217902
PROPIAS     TU3M               0,002101906
PROPIAS     TU6M              -0,005481915
PROPIAS     TU12M              0,003443081
CRUZADAS    2303               0
CRUZADAS    3901               0
CRUZADAS    3905               0
CRUZADAS    3907               0
CRUZADAS    3909               0
CRUZADAS    4102               0
CRUZADAS    4307               0
CRUZADAS    4501               0
CRUZADAS    4907               0,247624087
CRUZADAS    5304              -0,161424508
LP          PROM_MESES_DIST   -0,680356554
PROPIAS     RECENCIA          -0,00289069
EXTERNAS    TEMP_MIN           0,006488683
EXTERNAS    TEMP_MAX          -0,013497441
EXTERNAS    PRECIPITACIONES   -0,007607086
INTERCEPTO                     2,401593191

 


Re: Zero Coefficient in logistic regression

Simon Dirmeier

Hey,

As far as I know, feature selection using a chi-squared statistic can only be done on categorical features, not on possibly continuous ones.
Furthermore, since your logistic model doesn't use any regularization, the regression step should be fine here. So I'd check the ChiSqSelector and possibly replace it with another feature selection method.

There is, however, always the chance that your response does not depend on your covariates, in which case you'd estimate a zero coefficient.
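
One way to check the selector is to read both fitted stages back from the PipelineModel. The selector only drops columns of VECTOR_1 (ChiSqSelectorModel.selectedFeatures holds the indices it kept), while the estimates live in LogisticRegressionModel.coefficients, ordered like the assembled "Union" vector. The following is a minimal sketch, assuming a binary model fitted with the pipeline from the original post and Spark MLlib on the classpath; InspectFit, StageReport and report are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.ChiSqSelectorModel

object InspectFit {
  // Hypothetical container for what is read back from the fitted stages.
  final case class StageReport(selectedIndices: Array[Int], coefficients: Array[Double]) {
    // Positions in the assembled vector whose coefficient is exactly zero.
    def zeroPositions: Array[Int] =
      coefficients.zipWithIndex.collect { case (c, i) if c == 0.0 => i }
  }

  def report(fitted: PipelineModel): StageReport = {
    // Indices of VECTOR_1 that the selector kept (empty if no selector stage is found).
    val selected = fitted.stages
      .collectFirst { case m: ChiSqSelectorModel => m.selectedFeatures }
      .getOrElse(Array.empty[Int])
    // Coefficients of the binary logistic model (empty if no such stage is found).
    val coefs = fitted.stages
      .collectFirst { case m: LogisticRegressionModel => m.coefficients.toArray }
      .getOrElse(Array.empty[Double])
    StageReport(selected, coefs)
  }
}
```

Given a model fitted with pipeline.fit on the training DataFrame, InspectFit.report(fitted).zeroPositions lists which positions of the "Union" vector carry a coefficient of exactly zero, and selectedIndices shows which columns of VECTOR_1 they correspond to.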

Cheers,
Simon


In reply to Alexis Peña's message of 24.10.17, 04:56.


Re: Zero Coefficient in logistic regression

Weichen Xu
Yes, the chi-squared statistic is only used for categorical features. It does not look appropriate here.
Thanks!

In reply to Simon Dirmeier's message of Tue, Oct 24, 2017, 5:13 PM.


Re: Zero Coefficient in logistic regression

Alexis Peña

Thanks for your answer. The "CRUZADAS" features are binary (0/1), so the chi-squared statistic should work with 2x2 tables.
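
For a binary feature against a binary target, the 2x2 case can be written out directly. The sketch below uses the textbook Pearson statistic without continuity correction, n(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)); it is meant to illustrate the formula and is not claimed to reproduce Spark's internals. The object name and the counts are hypothetical.

```scala
object ChiSquare2x2 {
  // Pearson chi-squared statistic for a 2x2 table of counts:
  //                TARGET = 0   TARGET = 1
  // feature = 0        a            b
  // feature = 1        c            d
  def chiSq2x2(a: Long, b: Long, c: Long, d: Long): Double = {
    val n = (a + b + c + d).toDouble
    val denom = (a + b).toDouble * (c + d) * (a + c) * (b + d)
    // A zero margin means the feature or the label is constant: the statistic is undefined.
    if (denom == 0.0) Double.NaN
    else {
      val diff = (a * d - b * c).toDouble
      n * diff * diff / denom
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical counts, for illustration only.
    println(chiSq2x2(25, 25, 25, 25)) // feature unrelated to the label: 0.0
    println(chiSq2x2(40, 10, 10, 40)) // feature strongly related to the label: 36.0
  }
}
```

A feature that takes a single value in the sample has an empty row in this table, which is why the function returns NaN rather than a score in that case.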

 

I fit the model in SAS and R, and in both the coefficients have estimates (not significant). Two features of this kind do have estimates:

 

CRUZADAS    4907     0,247624087
CRUZADAS    5304    -0,161424508

 

 

Thanks

 

 

In reply to Weichen Xu's message of Tuesday, October 24, 2017, 07:23.

 


Re: Zero Coefficient in logistic regression

Simon Dirmeier
So, all the coefficients are the same except for CRUZADAS? How are you fitting the model in R (glm)? Can you try setting a zero penalty for alpha and lambda:
  .setRegParam(0)
  .setElasticNetParam(0)
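
Applied to the estimator from the original post, the two setters could look like the sketch below. In Spark's LogisticRegression, regParam plays the role of lambda and elasticNetParam the role of alpha, and both default to 0.0, so this only makes the absence of a penalty explicit. It assumes Spark MLlib on the classpath; the object name NoPenaltyLogistic is hypothetical.

```scala
import org.apache.spark.ml.classification.LogisticRegression

object NoPenaltyLogistic {
  // Same estimator as in the original post, with the penalty made explicit.
  val lr: LogisticRegression = new LogisticRegression()
    .setLabelCol("TARGET")
    .setFeaturesCol("Union")
    .setRegParam(0.0)        // lambda: overall penalty strength
    .setElasticNetParam(0.0) // alpha: L1/L2 mix, has no effect while regParam is 0

  def main(args: Array[String]): Unit = {
    // explainParams() lists every parameter with its default and current value.
    println(lr.explainParams())
  }
}
```

Printing explainParams() is a quick way to confirm which values the estimator will actually use before fitting.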
Cheers,
S


In reply to Alexis Peña's message of 24.10.17, 13:19.


Re: Zero Coefficient in logistic regression

Alexis Peña

Thanks. 8 of the 10 CRUZADAS coefficients are estimated as zero. The alpha and lambda parameters are left at their defaults (zero, I think). The models in R and SAS were fitted as binary logistic GLMs.

 

Cheers

 

In reply to Simon Dirmeier's message of Tuesday, October 24, 2017, 08:30.

 
