pyspark inferSchema

11 messages
Hi All, I have a data set where each record is serialized using JSON, and I'm interested in using SchemaRDDs to work with the data. Unfortunate...
I was just about to ask about this. Currently, there are two methods, sqlContext.jsonFile() and sqlContext.jsonRDD(), that work on JSON text a...
Hi Nick, Thanks for the great response. I actually already investigated jsonRDD and jsonFile, although I did not realize they provide more ...
Notice the difference in the schema. Are you running the 1.0.1 release, or a more bleeding-edge version from the repository? Yep, my bad...
Got it. Thanks! On Tue, Aug 5, 2014 at 11:53 AM, Nicholas Chammas <nicholas.chammas@...> wrote: > Notice the difference in the...
On Tue, Aug 5, 2014 at 11:01 AM, Nicholas Chammas <nicholas.chammas@...> wrote: > I was just about to ask about this. > > Curre...
Hi Davies, Thanks for the response and tips. Is the "sample" argument to inferSchema available in the 1.0.1 release of pyspark? I'...
This "sample" argument of inferSchema is still not in master; I will try to add it if it makes sense. On Tue, Aug 5, 2014 at 12:14 P...
Assuming updating to master fixes the bug I was experiencing with jsonRDD and jsonFile, then pushing "sample" to master will probably n...
Yes, 2376 has been fixed in master. Can you give it a try? Also, for inferSchema, because Python is dynamically typed, I agree with Davies to ...
I've followed up in a thread more directly related to jsonRDD and jsonFile, but it seems like after building from the current master I'm still ha...
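
The thread turns on why inferring a schema from dynamically typed Python data can benefit from looking at more than one record. The following is a minimal plain-Python sketch of that idea, not PySpark's implementation: `infer_field_types` and its `sample_size` parameter are hypothetical names chosen for illustration, and the records are made-up examples. It collects the set of type names seen per field across a sample of JSON lines, so that a field that is null or absent in the first record is not mistyped or missed.

```python
import json


def infer_field_types(json_lines, sample_size=100):
    """Map each field name to the set of Python type names seen in the sample.

    Hypothetical helper for illustration only; this is not a PySpark API.
    """
    field_types = {}
    # Only the first `sample_size` records are inspected.
    for line in json_lines[:sample_size]:
        record = json.loads(line)
        for name, value in record.items():
            field_types.setdefault(name, set()).add(type(value).__name__)
    return field_types


# Made-up records: "count" changes type and "tag" appears only in the last row.
records = [
    '{"name": "a", "count": 1}',
    '{"name": "b", "count": null}',
    '{"name": "c", "count": 2.5, "tag": "x"}',
]

first_row_only = infer_field_types(records, sample_size=1)
sampled = infer_field_types(records, sample_size=3)
```

With `sample_size=1` the sketch sees only `name` and an integer `count`; with all three records it also sees that `count` can be null or a float and that `tag` exists. How a real system should resolve such conflicts (for example, widening to a common type) is a separate design choice that this sketch leaves open.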