Keeping track of how long something has been in a queue

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Keeping track of how long something has been in a queue

Hamish Whittal
Hi folks,

I have a stream coming from Kafka.

It has this schema:
{
    "id": 4,
    "account_id": 1070998,
    "uid": "green.th",
    "last_activity_time": "2020-09-03 13:04:04.520129"
}

Another event arrives a few milliseconds/seconds later:

{
    "id": 9,
    "account_id": 1070998,
    "uid": "green.th",
    "last_activity_time": "2020-09-03 13:04:05.123456"
}

Later, say 12 seconds later, she's seen again as an event. Now her event is:

{
    "id": 119,
    "account_id": 1070998,
    "uid": "green.th",
    "last_activity_time": "2020-09-03 13:04:17.345678"
}

She's now been in the queue for 13s (and the total users in the queue are now 43).
I need to keep track of two things:

(1) Since many of these are coming in from many users, I want to know, over some time period, how many of them are in the queue at any point,

(2) If Ms green.th was first seen at 13:04:04 and then at 13:04:05, she's been in the queue for 1 second (ignoring the ms).

How does one go about computing these sorts of more complex things in Spark Streaming? Would one have to keep track of her first-seen-time in a column and then do a diff the next time she's seen? With append / update mode, how does one begin doing this sort of thing?

Any help would be most gratefully appreciated.
Hamish
Reply | Threaded
Open this post in threaded view
|

Re: Keeping track of how long something has been in a queue

Hamish Whittal
Sorry, I moved a paragraph,

(2) If Ms green.th was first seen at 13:04:04, then at 13:04:05 and finally at 13:04:17, she's been in the queue for 13 seconds (ignoring the ms).
Reply | Threaded
Open this post in threaded view
|

Re: Keeping track of how long something has been in a queue

Jungtaek Lim-2
You may want to google around "session window" and "duration", and check whether the concept fits your requirements. Probably adding some custom logic on top of the session window would work for you, which requires you to implement a custom function for flatMapGroupsWithState.

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Fri, Sep 4, 2020 at 11:21 PM Hamish Whittal <[hidden email]> wrote:
Sorry, I moved a paragraph,

(2) If Ms green.th was first seen at 13:04:04, then at 13:04:05 and finally at 13:04:17, she's been in the queue for 13 seconds (ignoring the ms).