RabbitMQ, OAuth2, and ConduitAMQP

How misreading a library's documentation led to spelunking in the RabbitMQ source code.

amqp is one of the most widely used RabbitMQ libraries for Elixir; many projects use it directly, and it is also a dependency of Broadway RabbitMQ. It does its job well, but it lacks a few finishing touches that ‘real world’ usage usually needs. For example, you’re on your own when it comes to handling connection hiccups. Its README also presumes a somewhat experienced user: one who knows that it wouldn’t be OK to merely spawn a “bare” process for consume/4, as an older version of the README suggested, and who also knows that even the proposed Task should often be a supervised one, started via Task.Supervisor.
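A minimal sketch of the supervised variant, using only Elixir's standard library. The handle_delivery function is a hypothetical stand-in for whatever the consumer does with a message; nothing here is taken from the amqp README itself.

```elixir
# Hypothetical stand-in for the per-message work; a real consumer would
# process the payload and then ack it on the channel.
handle_delivery = fn payload -> String.upcase(payload) end

# In an application this supervisor would normally be a child in the
# supervision tree rather than started ad hoc.
{:ok, sup} = Task.Supervisor.start_link()

parent = self()

# The task runs under `sup` instead of being a "bare" process, so it is
# tracked and shut down together with its supervisor.
{:ok, task_pid} =
  Task.Supervisor.start_child(sup, fn ->
    send(parent, {:handled, handle_delivery.("hello")})
  end)
```

Task.Supervisor.start_child/2 is the fire-and-forget option; Task.Supervisor.async_nolink/2 is the alternative when the caller needs the result back.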

I had already equipped amqp with connection monitoring in previous projects, using the :gen_statem behaviour and Process.monitor(conn.pid). It worked, and it also allowed the whole service to boot even if the AMQP broker was unreachable at that moment. But the :gen_statem implementation was a bit verbose, so this time I picked ConduitAMQP for a new project. The library helpfully builds on top of amqp and leverages Connection to provide resiliency.
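The monitoring idea can be sketched roughly as follows. This is not the original :gen_statem code; it is a simplified GenServer, and the module name ConnectionWatcher, the connect_fun argument and the retry_ms default are all assumptions made for illustration.

```elixir
defmodule ConnectionWatcher do
  use GenServer

  # `connect_fun` is a hypothetical zero-arity function returning
  # {:ok, conn} (where conn has a :pid field) or {:error, reason}.
  # With amqp it could be: fn -> AMQP.Connection.open(opts) end
  def start_link(connect_fun, retry_ms \\ 5_000) do
    GenServer.start_link(__MODULE__, {connect_fun, retry_ms})
  end

  @impl true
  def init({connect_fun, retry_ms}) do
    # Connect asynchronously so init/1 succeeds even if the broker is down.
    send(self(), :connect)
    {:ok, %{connect_fun: connect_fun, retry_ms: retry_ms, conn: nil}}
  end

  @impl true
  def handle_info(:connect, state) do
    case state.connect_fun.() do
      {:ok, conn} ->
        # A :DOWN message arrives if the connection process dies.
        Process.monitor(conn.pid)
        {:noreply, %{state | conn: conn}}

      {:error, _reason} ->
        Process.send_after(self(), :connect, state.retry_ms)
        {:noreply, state}
    end
  end

  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
    # The connection is gone: forget it and schedule a reconnect.
    Process.send_after(self(), :connect, state.retry_ms)
    {:noreply, %{state | conn: nil}}
  end
end
```

Because the first connection attempt happens after init/1 returns, the process (and with it the rest of the supervision tree) can start while the broker is still unreachable.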

An issue encountered

However, this project required the AMQP connection to be authenticated via OAuth2, and since more code often means more bugs, I hit an issue where the ConduitAMQP library wasn’t relaying connection options to its amqp dependency, as summarized in the following image.

The issue summarized

So I was blocked for quite some time, unable to authenticate against RabbitMQ even though the Keycloak-vended JWT had all the expected fields set. Luckily, the same token worked against RabbitMQ’s management HTTP API, which hinted that the issue might lie with the ConduitAMQP library itself.

First I checked RabbitMQ’s own logs, but they were a bit sparse

HTTP access denied: Authentication using an OAuth 2/JWT token failed: provided token is invalid.

giving no details of the root cause of invalidity.

Going down the RabbitMQ hole

So I was left with no other choice but to amend and build RabbitMQ from its source code. First I exposed Err to the logs, making the following change in rabbit_auth_backend_oauth2.erl,

@@ -85,8 +85,8 @@ update_state(AuthUser, NewToken) ->
  case check_token(NewToken) of
    %% avoid logging the token
    {error, _} = E  -> E;
-     {refused, {error, {invalid_token, error, _Err, _Stacktrace}}} ->
-       {refused, "Authentication using an OAuth 2/JWT token failed: provided token is invalid"};
+     {refused, {error, {invalid_token, error, Err, _Stacktrace}}} ->
+       {refused, "Authentication using an OAuth 2/JWT token failed: provided token is invalid: ~p", [Err]};
     {refused, Err} ->
       {refused, rabbit_misc:format("Authentication using an OAuth 2/JWT token failed: ~p", [Err])};
    {ok, DecodedToken} ->

then set the relevant config in a .config file,

[
  {rabbit, [
    % Other settings, f.e.
    % {log_levels, [ ... ]},
    {log, [
      {console, [
        {level, debug},
        {enabled, true}
      ]}
    ]}
  ]}
].

and finally ran make run-broker RABBITMQ_CONFIG_FILE=above.config. With the patch and the above config in place, RabbitMQ’s console now clued me in to the actual cause of the token’s invalidity

HTTP access denied: Authentication using an OAuth 2/JWT token failed: provided token is invalid: {badarg, [<<"guest">>]}

which made me realize that ConduitAMQP likely wasn’t respecting the provided options, username: "" and password: token, and was thus making the underlying amqp library fall back to "guest" as the default username and password. Actually, the culprit was me, at least initially, as I had missed the documentation’s nuanced point that :url and the other options are mutually exclusive. I’ve since emphasized this exclusivity for future users. Better still, I later realized that the amqp library has supported combining a URI with options since 1.3.0, so I’ve updated ConduitAMQP to make use of this.
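The failure mode can be illustrated with a small sketch. CredentialsSketch is a hypothetical module, not code from amqp or ConduitAMQP; it only mimics a client that falls back to "guest" when no explicit credentials reach it, and the token value is a placeholder.

```elixir
defmodule CredentialsSketch do
  # Hypothetical helper: returns the {username, password} pair a client
  # would end up sending, defaulting both to "guest" when absent.
  def effective(opts) do
    {Keyword.get(opts, :username, "guest"), Keyword.get(opts, :password, "guest")}
  end
end

# Placeholder standing in for the JWT obtained from Keycloak.
token = "<JWT>"

# What was intended: empty username, token as the password.
intended = CredentialsSketch.effective(username: "", password: token)

# What effectively happened once the options were dropped on the way.
dropped = CredentialsSketch.effective([])
```

With the options dropped, the broker is handed "guest" where it expects a JWT, which matches the {badarg, [<<"guest">>]} seen in the patched log output.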

(Should you also want to run RabbitMQ from its source, I’ve had success with these asdf-installed tools:

make   4.3
bazel  0.13.0
erlang 24.0.5
elixir 1.12.2-otp-24
rebar  3.16.1

I also needed to brew install libxslt xmlto.

You can also do without a custom .config file and just tail a logfile, whose full path is printed just as RabbitMQ finishes booting up.)

Conclusion

One lesson of this investigation was that it’s often most effective to just dive into the source code itself. I was also reassured by how approachable foreign codebases can be when written in a functional language, Erlang and Elixir in this case.

While spelunking around RabbitMQ’s code I also realized how arbitrary the current implementation of Keycloak support is. I don’t think that the token format presented in issue #36, which prompted the PR for Keycloak support, is in any way specific to Keycloak. Rather, the format shown in the issue, with "permissions" nested under an "authorization" field, is just one possible format that the issue author happened to be using, and it can be easily replicated, as shown in the image below.

Reproducing nested "permissions" in Keycloak

In short, I don’t think it’s warranted that RabbitMQ expects this exact structure to be present. Perhaps the existing "scope" field, also used by UAA, could be extended to also allow such nested structures.

And another lesson is, as always, to RTM.