json parsing with json4s

SK
I have the following piece of code that parses a JSON file and extracts the age and TypeID:
 
val p = sc.textFile(log_file)
                   .map(line => { parse(line) })
                   .map(json =>
                      {  val v1 = json \ "person" \ "age"
                         val v2 = json \ "Action" \ "Content" \ "TypeID"
                         (v1, v2)
                      }
                    )
                 
p.foreach(r => println(r))

The result is:

(JInt(12),JInt(5))
(JInt(32),JInt(6))
(JInt(40),JInt(7))

1) How can I extract the values (i.e. without the JInt)? I tried returning (v1.toInt, v2.toInt) from the map but got a compilation error stating that toInt is not a valid operation.

2) I would also like to know how I can filter the above tuples based on the age values. For example, I added the following after the second map operation:

  p.filter(tup => tup._1 > 20)

I got a compilation error: value > is not a member of org.json4s.JValue

Thanks for your help.

   
Re: json parsing with json4s

cotdp
Hello,

You're absolutely right, the syntax you're using returns the json4s value objects, not native types like Int, Long, etc. Fix that problem and then everything else (filters) will work as you expect. This is a short snippet of a larger example: [1]

val lines = sc.textFile("likes.json")
val user_interest = lines.map(line => {
  // Parse the JSON, returns RDD[JValue]
  parse(line)
}).map(json => {
  // Extract the values we need to populate the UserInterest class
  implicit lazy val formats = org.json4s.DefaultFormats
  val name = (json \ "name").extract[String]
  val location_x = (json \ "location" \ "x").extract[Double]
  val location_y = (json \ "location" \ "y").extract[Double]
  val likes = (json \ "likes").extract[Seq[String]].map(_.toLowerCase()).mkString(";")
  UserInterest(name, location_x, location_y, likes)
})

The key parts are "implicit lazy val formats = org.json4s.DefaultFormats" being defined before you mess with the JSON and "(json \ "location" \ "x").extract[Double]" to extract the parts you need.
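Applied to the original question, the same approach might look like the sketch below. It assumes the json4s Jackson backend (org.json4s.jackson.JsonMethods; the thread does not say which backend is in use), and it uses a plain Seq of made-up sample lines in place of sc.textFile so that it runs without Spark. The object and value names are hypothetical.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object ExtractAges {
  // Formats must be in implicit scope before extract is called
  implicit val formats: Formats = DefaultFormats

  // Hypothetical sample lines shaped like the original poster's data
  val lines = Seq(
    """{"person":{"age":12},"Action":{"Content":{"TypeID":5}}}""",
    """{"person":{"age":32},"Action":{"Content":{"TypeID":6}}}""",
    """{"person":{"age":40},"Action":{"Content":{"TypeID":7}}}"""
  )

  // extract[Int] turns the JInt nodes into plain Ints
  val pairs: Seq[(Int, Int)] = lines.map(line => parse(line)).map { json =>
    val age = (json \ "person" \ "age").extract[Int]
    val typeId = (json \ "Action" \ "Content" \ "TypeID").extract[Int]
    (age, typeId)
  }

  // With native Ints the comparison compiles
  val over20: Seq[(Int, Int)] = pairs.filter { case (age, _) => age > 20 }
}
```

With an RDD, the same two map and filter steps should apply unchanged, since only the element type matters for the comparison.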

One thing to be wary of is if your JSON is not consistent, i.e. fields are not always set: using the "extract[Double]" method will then raise exceptions. In that case you may wish to use an alternate way to pull out the values as a String and process them yourself, e.g.

val id = compact(render(json \ "facebook" \ "id"))
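Another option for inconsistent input, not mentioned above, is json4s's extractOpt, which returns an Option instead of throwing when the field cannot be extracted. A minimal sketch, again assuming the Jackson backend and using made-up sample documents and hypothetical names:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object OptionalFields {
  implicit val formats: Formats = DefaultFormats

  // None when "location" or "x" is absent, instead of an exception
  def locationX(json: JValue): Option[Double] =
    (json \ "location" \ "x").extractOpt[Double]

  // Hypothetical documents: one complete, one without a location
  val complete: JValue = parse("""{"name":"a","location":{"x":1.5,"y":2.0}}""")
  val partial: JValue = parse("""{"name":"b"}""")

  // Caller decides what a missing value means, here a default of 0.0
  val xOrDefault: Double = locationX(partial).getOrElse(0.0)
}
```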

Good luck playing with JSON and Spark!  :o)

Best,

MC







On 11 June 2014 23:26, SK <[hidden email]> wrote:
I have the following piece of code that parses a json file and extracts the
age and TypeID


--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/json-parsing-with-json4s-tp7430.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: json parsing with json4s

Tobias Pfeiffer
Hi,

I usually use pattern matching for that, like
json \ "key" match { case JInt(i) => i.toInt; case _ => 0 /* default value */ }
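Wrapped in a small helper, the pattern-matching approach could be used for the age filter roughly as follows. This is only a sketch: the helper name intOr, the sample lines, and the choice of the Jackson backend are assumptions, and a Seq stands in for the RDD.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object MatchAges {
  // Convert a JSON node to Int, using the default for anything that is not a JInt
  def intOr(value: JValue, default: Int): Int = value match {
    case JInt(i) => i.toInt // JInt wraps a BigInt
    case _       => default // missing field (JNothing) or another node type
  }

  // Hypothetical sample lines; the last one has no age field
  val lines = Seq(
    """{"person":{"age":12}}""",
    """{"person":{"age":32}}""",
    """{"person":{}}"""
  )

  val ages: Seq[Int] = lines.map(line => intOr(parse(line) \ "person" \ "age", 0))
  val over20: Seq[Int] = ages.filter(_ > 20)
}
```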

Tobias

On Thu, Jun 12, 2014 at 7:39 AM, Michael Cutler <[hidden email]> wrote:

> [1] UserInterestsExample.scala
> https://gist.github.com/cotdp/b471cfff183b59d65ae1