Is there a way to make the broker merge big result set faster?

Mu Kong
Hi, community,

I have a query with a subquery that runs slowly on a Druid cluster.

The inner query yields these fields:

SELECT D1, D2, D3, MAX(M1) as MAX_M1
FROM SOME_TABLE
GROUP BY D1, D2, D3

Then, the outer query looks like:

SELECT D1, D2, SUM(MAX_M1)
FROM INNER_QUERY
GROUP BY D1, D2
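
To make the two-level aggregation concrete, here is a minimal sketch that runs the same nested query against an in-memory SQLite table with made-up rows. SQLite stands in for Druid only to show the semantics, and the function name `run_nested_query` and the sample data are assumptions for illustration, not part of my setup.

```python
import sqlite3

# The inner query from the post, written once so it can be nested.
INNER_SQL = (
    "SELECT D1, D2, D3, MAX(M1) AS MAX_M1 "
    "FROM SOME_TABLE GROUP BY D1, D2, D3"
)

def run_nested_query(rows):
    """Return (inner row count, outer result) for toy (D1, D2, D3, M1) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SOME_TABLE (D1 TEXT, D2 TEXT, D3 TEXT, M1 INTEGER)")
    conn.executemany("INSERT INTO SOME_TABLE VALUES (?, ?, ?, ?)", rows)

    # Number of rows the inner query produces: one per (D1, D2, D3) group.
    inner_count = conn.execute(
        f"SELECT COUNT(*) FROM ({INNER_SQL}) AS INNER_QUERY"
    ).fetchone()[0]

    # The outer query sums the per-D3 maxima within each (D1, D2) group.
    outer = conn.execute(
        f"SELECT D1, D2, SUM(MAX_M1) FROM ({INNER_SQL}) AS INNER_QUERY "
        "GROUP BY D1, D2 ORDER BY D1, D2"
    ).fetchall()
    conn.close()
    return inner_count, outer

# Made-up sample rows; D3 plays the role of the high-cardinality dimension.
rows = [
    ("a", "x", "u1", 3),
    ("a", "x", "u1", 5),
    ("a", "x", "u2", 2),
    ("a", "y", "u3", 7),
    ("b", "x", "u1", 1),
]
inner_count, result = run_nested_query(rows)
print(inner_count, result)
```

The inner row count grows with the number of distinct D3 values, while the outer result only has one row per (D1, D2) pair.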

D3 is a high-cardinality dimension, which makes the result set of the inner query very large.
Even so, the inner query by itself takes only 1~2 seconds to "process" and transfer the data to the broker.

The outer query, however, takes 40 seconds to process.

As far as I understand how the broker works with the historicals, Druid simply fetches the per-segment results of the inner query directly from the historicals' memory,
so there isn't any computation when Druid handles the inner query.
However, once the inner query finishes, all the results from the historicals are passed to a single broker, which merges them.
In my case, because the result set from the inner query is so large, this phase takes a long time to finish.
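
The merge step as I picture it can be sketched in a few lines. This is only a toy model of my understanding, not Druid's actual implementation: each dictionary stands for the partial inner result of one segment or historical, and the function names and sample values are made up.

```python
from collections import defaultdict

def merge_inner_partials(partials):
    """Combine partial inner results keyed by (D1, D2, D3), keeping the max."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            if key not in merged or value > merged[key]:
                merged[key] = value
    return merged

def apply_outer(merged):
    """Outer query: sum the merged maxima per (D1, D2)."""
    totals = defaultdict(int)
    for (d1, d2, _d3), max_m1 in merged.items():
        totals[(d1, d2)] += max_m1
    return dict(totals)

# Made-up partial results, one dict per segment or historical.
partials = [
    {("a", "x", "u1"): 3, ("a", "x", "u2"): 2},
    {("a", "x", "u1"): 5, ("a", "y", "u3"): 7},
    {("b", "x", "u1"): 1},
]
merged = merge_inner_partials(partials)
totals = apply_outer(merged)

# Every partial row is visited once by the single merging process.
rows_touched = sum(len(p) for p in partials)
print(merged, totals, rows_touched)
```

In this model the single merging process has to visit every row of every partial result, so its work grows with the cardinality of D3 even though the final output is small.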

I think the situation in this thread is quite similar to mine: https://groups.google.com/d/msg/druid-user/ir7hRpxg0PI/3oqCDAwoPjMJ
Gian mentioned "Historical merging" there, and I tried it by disabling the broker cache, but it didn't really make the query faster.
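
For anyone who wants to reproduce this, one way to turn the cache off for a single query is through the query context. The sketch below only builds the request body and does not send it. It assumes the `useCache` and `populateCache` context keys and the JSON shape accepted by Druid's SQL HTTP endpoint; the helper name `build_sql_request` is made up, and the same effect may also be achievable through broker runtime properties.

```python
import json

# The nested query from this post, written as a single SQL statement.
NESTED_SQL = (
    "SELECT D1, D2, SUM(MAX_M1) FROM ("
    "SELECT D1, D2, D3, MAX(M1) AS MAX_M1 FROM SOME_TABLE GROUP BY D1, D2, D3"
    ") GROUP BY D1, D2"
)

def build_sql_request(sql, use_cache=False, populate_cache=False):
    """Build a request body with caching switched off via the query context.

    Assumption: the cache is controlled per query with the context keys
    'useCache' and 'populateCache'.
    """
    return {
        "query": sql,
        "context": {"useCache": use_cache, "populateCache": populate_cache},
    }

payload = build_sql_request(NESTED_SQL)
body = json.dumps(payload, indent=2)
print(body)
```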

Is there any other way to make the broker merge faster?

Thanks!


Best regards,
Mu