sequenceFile and groupByKey


sequenceFile and groupByKey

Kane
When I try to open a sequence file:
val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[String], classOf[String])
t2.groupByKey().take(5)

I get:
org.apache.spark.SparkException: Job aborted: Task 25.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.Text

Another thing: t2.take(5) returns 5 identical items. I guess I have to map/clone the items, but then I get something like "org.apache.hadoop.io.Text cannot be cast to java.lang.String". How do I clone them?

Thanks.

Re: sequenceFile and groupByKey

Shixiong Zhu
Hi Kane,

In the sequence file, the key and value classes are org.apache.hadoop.io.Text. You need to convert Text to String. There are two approaches:

1. Use implicit conversions to convert Text to String automatically. I recommend this one. E.g.,

val t2 = sc.sequenceFile[String, String]("/user/hdfs/e1Mseq")
t2.groupByKey().take(5) 

2. Use "classOf[Text]" to specify the correct class in the sequence file and convert Text to String.  E.g.,

import org.apache.hadoop.io.Text
val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[Text], classOf[Text])
t2.map { case (k,v) => (k.toString, v.toString) } .groupByKey().take(5)
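
Regarding your other question (t2.take(5) returning 5 identical items): my guess is that Hadoop's RecordReader reuses the same Text object for every record, so you end up with five references to one mutated value. Converting to String as above gives every record its own copy, so it should fix that too. A complete sketch of approach 2 (assuming the spark-shell, where sc is already created, and your path):

import org.apache.hadoop.io.Text

// Read the sequence file with its real key/value classes.
val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[Text], classOf[Text])

// Copy each reused Text into a plain String before any shuffle or collect:
// String is serializable, and every record now owns its own data.
val pairs = t2.map { case (k, v) => (k.toString, v.toString) }

pairs.groupByKey().take(5).foreach { case (k, vs) =>
  println(s"$k -> ${vs.size} values")
}

After the map, groupByKey no longer hits the NotSerializableException, and take returns distinct records.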


Best Regards,

Shixiong Zhu




Re: sequenceFile and groupByKey

Yishu Lin
I have the same question and tried approach 1, but I get a compilation error:

[error] …. could not find implicit value for parameter kcf: () => org.apache.spark.WritableConverter[String]
[error]     val t2 = sc.sequenceFile[String, Int]("/test/data", 20)


Yishu



Re: sequenceFile and groupByKey

Yishu Lin
Adding this import solves the problem:

import org.apache.spark.SparkContext._
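
For reference, the complete snippet that compiles for me after adding that import (sc is my already-created SparkContext; the path and the Int value type are just from my test data):

// SparkContext._ brings in the implicit WritableConverters needed by
// sc.sequenceFile[K, V], plus the pair-RDD functions such as groupByKey.
import org.apache.spark.SparkContext._

val t2 = sc.sequenceFile[String, Int]("/test/data", 20)
t2.groupByKey().take(5)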

Yishu
