Question related to RDD transformation


li
I am trying to process some log files using Spark.
The log files look like this:

....
...
-----------------------------
    CPU Time Date ....
     0.02 sec 2013-01-31
     0.12 sec 2013-01-31
------------------------------

The above block can appear at any position in a log file.  I wonder what is an easy way to parse
out the CPU seconds (0.02, 0.12).  The only way I can think of is that each time I see the line
with "CPU Time Date", the following lines up to the next "-----------" need to be processed in order
to get the seconds.  But looking at the existing RDD API, most of the functions don't deal with logic
that needs to look at the data of previous records to make a decision.
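
In code, the best I can come up with is a rough mapPartitions sketch in Scala like the one below
(untested; the input path is just a placeholder, and it assumes a whole block never straddles a
partition boundary, which a plain textFile does not guarantee):

import org.apache.spark.{SparkConf, SparkContext}

object CpuTimeParser {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cpu-time-parse").setMaster("local[*]"))

    // Placeholder path; point this at the real log location.
    val lines = sc.textFile("logs/*.log")

    // Per-partition flag: once the "CPU Time Date" header is seen we are inside
    // a block, and we stay inside until the next dashed separator line.
    val cpuSeconds = lines.mapPartitions { iter =>
      var inBlock = false
      iter.flatMap { line =>
        val t = line.trim
        if (t.startsWith("CPU Time Date")) { inBlock = true; None }
        else if (t.startsWith("----")) { inBlock = false; None }
        else if (inBlock)
          // e.g. "0.02 sec 2013-01-31" -> 0.02
          t.split("\\s+").headOption.flatMap(s => scala.util.Try(s.toDouble).toOption)
        else None
      }
    }

    cpuSeconds.collect().foreach(println)
    sc.stop()
  }
}

Is there a cleaner way to express this kind of "look back at previous records" logic with the RDD API?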

Thanks for the help