How to deal with context dependent computing?

How to deal with context dependent computing?

JF Chen
For example, I have some data with timestamps, marked as category A and B, and ordered by time. Now I want to calculate each duration from A to B. In a normal program, I can use a flag to record whether the previous record is A or B, and then calculate the duration. But how can I do this with a Spark DataFrame?

Thanks!

Regards,
Junfeng Chen
Re: How to deal with context dependent computing?

Sonal Goyal
Hi Junfeng,

Can you please show, by means of an example, what you are trying to achieve?

Thanks,
Sonal
Nube Technologies


Re: How to deal with context dependent computing?

JF Chen
Thanks, Sonal.
For example, I have data as following:
login 2018/8/27 10:00
logout 2018/8/27 10:05
login 2018/8/27 10:08
logout 2018/8/27 10:15
login 2018/8/27 11:08
logout 2018/8/27 11:32

Now I want to calculate the time between each login and the following logout; from the sample above I should get 5 min, 7 min, and 24 min.
I know I can calculate it with foreach, but then it seems all the data is processed on the Spark driver node rather than distributed across the executors.
Is there a good way to solve this problem? Thanks!

Regards,
Junfeng Chen



Re: How to deal with context dependent computing?

devjyoti patra
Hi Junfeng,

You should be able to do this with the window functions lead or lag.

Thanks,
Dev
