How to optimise QLC Joins?


Hi there all! :slight_smile:

Apologies in advance for what might be a bit of a long post!

I’m a core team member of Nostrum, a Discord library written in Elixir.

Under the hood, we’re using ETS to cache data received from Discord (e.g. users, guilds, presences), allowing our users to fetch it through some basic methods we provide, or through more advanced queries they implement themselves with the qlc module.

A basic query to fetch all guilds in the guild cache might look like this:

find_large_communities(Threshold, GuildCache) ->
    qlc:q([Guild || {_, Guild} <- GuildCache:query_handle(),
                    % Filter for guilds that are over the provided size
                    map_get(member_count, Guild) > Threshold]).
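For anyone following along, a handle built this way is evaluated with qlc:e/1. A minimal self-contained sketch (module name, table name, and data are invented for illustration, with ets:table/1 standing in for our query_handle helper):

```erlang
%% Hypothetical demo module; qlc:q/1 needs the qlc.hrl parse transform.
-module(guild_query_demo).
-include_lib("stdlib/include/qlc.hrl").
-export([run/1]).

run(Threshold) ->
    Guilds = ets:new(guilds, [set]),
    ets:insert(Guilds, [{1, #{member_count => 50}},
                        {2, #{member_count => 5000}}]),
    %% Same shape as find_large_communities/2 above.
    QH = qlc:q([Guild || {_, Guild} <- ets:table(Guilds),
                         map_get(member_count, Guild) > Threshold]),
    qlc:e(QH).
```

With the data above, guild_query_demo:run(1000) returns only the guild map with 5000 members.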

This exposes a really nice and very powerful query interface to end users, allowing them to run database-style queries and joins on our structures using the properties they are already familiar with.

I’ve been trying to flesh out some more example queries to share with users, including one that finds all users that are online on Discord and returns the matching user objects for them. I’m trying to optimise this as much as possible, but can’t seem to get the query to work without doing a full scan on the second table.

Going over the example in a little more detail: we have a presence cache and a user cache. The former is an ETS table with schema {{GuildId, UserId}, Presence}, and the latter a table with schema {UserId, User}, so our query to return a list of user objects for online users looks like this:

find_online_users(RequestedGuildId, PresenceCache, UserCache) ->
    qlc:q([User || {{GuildId, PresenceUserId}, Presence} <- PresenceCache:query_handle(),
                   % Filter to members in the requested guild ID
                   GuildId =:= RequestedGuildId,
                   % Fetch any members where the status is not offline
                   map_get(status, Presence) /= offline,
                   % Get a handle on the users ETS table
                   {UserId, User} <- UserCache:query_handle(),
                   % Return all users for which we have found an online presence
                   UserId =:= PresenceUserId]).

When looking at the output of qlc:info/1 on this query, I see the following:

iex(2)> IO.puts(:qlc.info(:nostrum_queries.find_online_users(1226944827137069107, Cache.PresenceCache, Cache.UserCache)))
qlc:q([
       User ||
           {{GuildId, PresenceUserId}, Presence} <-
               ets:table(nostrum_presences,
                         [{traverse,
                           {select,
                            [{{{'$1', '$2'}, '$3'},
                              [{'andalso',
                                {'=:=', '$1',
                                 {const, 1226944827137069107}},
                                {'/=', {map_get, status, '$3'}, offline}}],
                              ['$_']}]}}]),
           {UserId, User} <- ets:table(nostrum_users),
           UserId =:= PresenceUserId
      ])
:ok

From what I understand here, and from what I can see after filling the nostrum_users ETS table with junk, the first step of this query is fairly well optimised: the generator and its filters on the presence table are compiled into a single select whose conditions are joined with an andalso.

When we get to the final stage, where it fetches the nostrum_users table and searches for users matching the PresenceUserId list, the query appears to be very slow. My loose understanding is that it pulls the whole contents of the table in and then runs the filter. With 10 million users in the users table, this query took around 6 seconds on my laptop, even though the PresenceUserId results only contain around 2 users in this test example.

My question is: is there a smarter way to write a query of this style, one that fetches from the users table in a more optimised way, without pulling an entire copy of the table into the query?

I have not yet completely wrapped my head around the inner workings here, so I’m happy to answer any additional questions or provide any additional context/information. Thank you in advance to anyone who is curious enough to have a poke at this one!

Kind regards,

Joe


The mapping from PresenceUserId to User is just a key lookup in the second table, so an option is to either replace the {UserId, User} <- UserCache:query_handle() generator with a direct ets:lookup_element/3, or split that off into a post-processing pass, leaving the qlc:q to yield only a list of user ids.
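To spell that out, here is a rough self-contained sketch of the two-pass variant (module name, table names, and data invented for the demo): the qlc query yields only the online user ids, and each id is then resolved with a constant-time key lookup instead of a scan of the user table.

```erlang
-module(lookup_join_demo).
-include_lib("stdlib/include/qlc.hrl").
-export([run/0]).

run() ->
    Presences = ets:new(presences, [set]),
    Users = ets:new(users, [set]),
    ets:insert(Presences, [{{1, 10}, #{status => online}},
                           {{1, 11}, #{status => offline}},
                           {{2, 10}, #{status => online}}]),
    ets:insert(Users, [{10, #{name => <<"a">>}},
                       {11, #{name => <<"b">>}}]),
    %% First pass: qlc yields only the ids of online users in guild 1.
    QH = qlc:q([UserId ||
                   {{GuildId, UserId}, Presence} <- ets:table(Presences),
                   GuildId =:= 1,
                   map_get(status, Presence) /= offline]),
    %% Second pass: point lookups by key; ets:member/2 guards against
    %% presences whose user is missing from the user table.
    [ets:lookup_element(Users, UserId, 2)
     || UserId <- qlc:e(QH), ets:member(Users, UserId)].
```

With the data above, run/0 returns just the user map for id 10, since user 11 is offline and the second presence belongs to another guild.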

Personally I’d stick with ordinary ETS functions like selects, matches, and lookups.


I second this recommendation. The syntax for match specifications looks intimidating, but it is very powerful, and worth learning if you’re going to be using ETS tables.

Another recommendation: if you wish to select multiple rows in a single table with a single call to ETS, you may want to look into using ordered_set tables, and organising the ETS table key such that the rows you are interested in are ‘nearby’ in the sort order. ets:select/* has a nice optimisation where a partially bound key in a match specification will only search a subset of the ETS table if it knows that’s the only place to look, which can be much faster than searching the entire table.
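As a rough illustration of that ordered_set point (module name, table name, and data invented): with a {GuildId, UserId} key, binding GuildId to a constant in the match-spec pattern lets ets:select/2 restrict its traversal to that guild's contiguous key range rather than the whole table.

```erlang
-module(ordered_key_demo).
-export([run/0]).

run() ->
    T = ets:new(presences, [ordered_set]),
    ets:insert(T, [{{1, 10}, online},
                   {{1, 11}, offline},
                   {{2, 10}, online}]),
    %% Partially bound key: GuildId is the literal 1, UserId is free ('$1').
    %% On an ordered_set this only traverses keys starting with 1.
    ets:select(T, [{{{1, '$1'}, '$2'},
                    [{'/=', '$2', offline}],
                    [{{'$1', '$2'}}]}]).
```

With the data above this returns only {10, online}: the offline row is filtered by the guard, and the guild-2 row never matches the key pattern.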


Thank you both for the answers here!

My main reasoning for looking to do this natively with QLC is that it means we can offer a fairly standard API to end users that lets them use the ETS backend, the Mnesia backend, or a custom backend, as long as it can return a QLC query handle (though I realise this functionality might be a pipe dream!).

Thank you for this recommendation also. Discord already uses snowflake integer IDs for objects, so organising guild memberships using keys in this way could be handy :slight_smile:


Does what is written here about joins help with optimising this? It sounds like it might be describing the problem you are seeing:

https://www.erlang.org/doc/man/qlc.html#join
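That section describes lookup joins, and qlc:q/2 accepts a {join, lookup} option; per the qlc docs, if the chosen join method cannot be used, evaluation signals an error instead of silently falling back to scanning the second table, which makes it easy to check whether the optimisation applied. A rough self-contained sketch of the original query with that option (module name, table names, and data invented):

```erlang
-module(join_hint_demo).
-include_lib("stdlib/include/qlc.hrl").
-export([run/0]).

run() ->
    Presences = ets:new(presences, [set]),
    Users = ets:new(users, [set]),
    ets:insert(Presences, [{{1, 10}, #{status => online}}]),
    ets:insert(Users, [{10, #{name => <<"a">>}}]),
    %% Same shape as find_online_users/3 above, but with an explicit
    %% request for a lookup join: user rows are fetched by key for each
    %% matching presence, rather than by traversing the users table.
    QH = qlc:q([User ||
                   {{GuildId, PresenceUserId}, Presence} <- ets:table(Presences),
                   GuildId =:= 1,
                   map_get(status, Presence) /= offline,
                   {UserId, User} <- ets:table(Users),
                   UserId =:= PresenceUserId],
               [{join, lookup}]),
    qlc:e(QH).
```

qlc:info/1 on a handle like this should show the second table being accessed via lookups on its key rather than a full traverse.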

If you want to dig deeper, the parse transform that changes the qlc:q into the respective ets:table call is implemented here:

That’s a fairly complex piece of code, but if you trace or debug it while compiling your specific example, you might get some hints about where a special-case optimisation could be added, or how changing the query might improve performance.
