Schema - DataTypes.NullType

jgp
Hi Sparkians,

Can someone tell me what the purpose of DataTypes.NullType is, especially when you are building a schema?

Thanks

jg
---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Schema - DataTypes.NullType

jgp
Any takers on this one? ;)

> On Jan 29, 2018, at 16:05, Jean Georges Perrin <[hidden email]> wrote:
>
> Hi Sparkians,
>
> Can someone tell me what the purpose of DataTypes.NullType is, especially when you are building a schema?
>
> Thanks
>
> jg

Re: Schema - DataTypes.NullType

jgp
What is the purpose of DataTypes.NullType, especially when you are building a schema? Has anyone used it or seen it as part of schema auto-generation?


(If I keep asking long enough, I may get an answer, no? :) )


> On Feb 4, 2018, at 13:15, Jean Georges Perrin <[hidden email]> wrote:
>
> Any takers on this one? ;)

Re: Schema - DataTypes.NullType

Nicholas Hakobian
I spent a few minutes poking around in the source code and found this:

The data type representing None, used for the types that cannot be inferred.

https://github.com/apache/spark/blob/branch-2.1/python/pyspark/sql/types.py#L107-L113

Playing around a bit, this is the only use case I could immediately come up with: you have some kind of placeholder field already in the data, but it is always null. If you let createDataFrame (and I bet other things like DataFrameReader would behave similarly) try to infer the schema directly, it will error out, since it can't infer the type of that field automatically. Doing something like the below will allow the data to be used. And, if memory serves, Hive also has a concept of a null data type for these kinds of situations.

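In [9]: from pyspark.sql import Row
   ...: from pyspark.sql.types import LongType, NullType, StructField, StructType
   ...:
   ...: df = spark.createDataFrame(
   ...:     [Row(id=1, val=None), Row(id=2, val=None)],
   ...:     schema=StructType([
   ...:         StructField('id', LongType()),
   ...:         StructField('val', NullType()),
   ...:     ]),
   ...: )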

In [10]: df.show()
+---+----+
| id| val|
+---+----+
|  1|null|
|  2|null|
+---+----+


In [11]: df.printSchema()
root
 |-- id: long (nullable = true)
 |-- val: null (nullable = true)
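
To illustrate why inference gives up on an all-null field, here is a toy model in plain Python. It is a simplified sketch, not Spark's actual code: the function names infer_type, merge_types, and infer_schema and the string type names are made up for this example. The idea it models is that None maps to a null placeholder type that any concrete type can replace, and a field that is still typed as null after all rows have been seen cannot be resolved.

```python
# Toy model of schema inference; this is NOT Spark's implementation.
# All names here (infer_type, merge_types, infer_schema) are hypothetical.

NULL = "null"  # stand-in for the null placeholder type


def infer_type(value):
    """Map one Python value to a type name; None carries no type information."""
    if value is None:
        return NULL
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def merge_types(a, b):
    """Combine the types seen so far; the null type yields to any other type."""
    if a == NULL:
        return b
    if b == NULL or a == b:
        return a
    raise TypeError(f"cannot merge {a} and {b}")


def infer_schema(rows):
    """Infer one type per field from a list of dicts with identical keys."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            seen = schema.get(name, NULL)
            schema[name] = merge_types(seen, infer_type(value))
    # Any field that never saw a non-null value is still unresolved.
    unresolved = [name for name, t in schema.items() if t == NULL]
    if unresolved:
        raise ValueError(f"cannot determine type of: {unresolved}")
    return schema


rows = [{"id": 1, "val": None}, {"id": 2, "val": None}]
try:
    infer_schema(rows)
except ValueError as err:
    print(err)  # 'val' never had a non-null value, so inference gives up

# With at least one non-null value, the placeholder resolves to a real type.
print(infer_schema([{"id": 1, "val": None}, {"id": 2, "val": "x"}]))
```

Passing an explicit schema, as in the session above, skips this inference step entirely, which is why declaring the field as NullType lets the all-null data load.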


Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health


On Sun, Feb 11, 2018 at 5:40 AM, Jean Georges Perrin <[hidden email]> wrote:
> What is the purpose of DataTypes.NullType, especially when you are building a schema? Has anyone used it or seen it as part of schema auto-generation?
>
> (If I keep asking long enough, I may get an answer, no? :) )

Re: Schema - DataTypes.NullType

jgp
Thanks Nicholas. It makes sense. Now that I have a hint, I can play with it too!

jg

On Feb 11, 2018, at 19:15, Nicholas Hakobian <[hidden email]> wrote:

I spent a few minutes poking around in the source code and found this:

The data type representing None, used for the types that cannot be inferred.
