RetroFeed Work Log #5 - User Authentication

We’re not out of the woods yet with setting things up for RetroFeed, buy we’ve come to a very interesting point - setting up user authentication.

The current file structure is the same one from the last article. It’s:

.
├── Dockerfile
├── config
│   ├── env.local
│   └── env.test
├── knexfile.js
├── migrations #hidden
├── package.json
├── scripts # hidden
├── secrets.tar.enc
├── src
│   ├── auth
│   │   └── auth.ts
│   ├── controllers
│   │   ├── app
│   │   │   ├── admin.hbs
│   │   │   ├── app.ts
│   │   │   ├── home.hbs
│   │   │   └── layout.hbs
│   │   └── tech
│   │       └── status.ts
│   ├── index.ts
│   ├── infra
│   │   ├── common.ts
│   │   ├── config.ts
│   │   ├── db-conn.ts
│   │   ├── request-id-middleware.ts
│   │   ├── request-time-middleware.ts
│   │   ├── request-version-middleware.ts
│   │   └── session-middleware.ts
│   └── services
│       └── user
│           ├── entities.ts
│           ├── events.ts
│           └── service.ts
├── test # hidden
└── test-e2e #hidden

Our focus is going to be on the auth.ts and src/services sources.

The standard choice for authentication in nodejs / express land is passport. Implicitly it works for NestJs. Though, as we’ll see, we’ll to operate at a lower abstraction level. On top of this I plan on using auth0 as an identity provider.

First off, some issues. What we’re dealing with is identity. The auth systems job is to verify identity (authentication) and to check authorization on top of it. Both of them are important operations. But since we’re a small operation, we’ll focus just on auth for now, and leave authorization for later.

At this point we’ll actually start dealing with some code. Gasp! Don’t worry, it’s not too much and the bulk of it still feels like configuration.

To boot, there’s the notion of identity of a user. This comes from the identity provider, which we only want to be GitHub for now, but which might grow to be Bitbucket, Google, Twitter etc. Basically any provider which is more “developer focused”. So Facebook or Instagram probably won’t make the cut. This identity is modulated and very largely managed by Auth0. Given how much data and machinery this provides, what’s left for us to do? Well, we provide the service specific customizations of a users profile, which is still a big thing to handle.

The first thing I did was setup several GitHub applications. Four of them to be exact, one for each environment. These are basically us registering with GitHub as entities which’ll need their auth services (and more). We’re going to need a similar setup for each of our providers in the future. Hot on the heels of this is the Auth0 configuration. Again I created four applications here, one for each environment, and linked them with the GitHub ones as a “connection”. I disabled the default username and password “connection”, cause I don’t want that sort of hassle in this day and age. These configs are basically secrets more or less, so they’re not part of the OSS bits of this project.

On our side, there’s a bit of coding to do. The two things we’re going to need are the users service and an implementation of the passport auth flow. The former defines what it means to be a user of our application and all the extra things we store atop the Auth0/GitHub provided identity. The latter is largely code as config / magick. But more on it later.

The user service is the first entry into the services src directory. You can check it out here at the 0.0.4 version. In general services are where the action is going to happen in our code since they are the implementors of the business logic.

Optional: a brief aside on architecture is in order here. I hope to keep these rare, as I don’t really have the bandwidth to provide tutorial-level information on the very many topics I’m covering here. It’s unfortunately just the nature of the beast that building internet applications requires touching so many things, even for simple apps like this one. But this particular topic is quite important. Since I plan to keep the client code simple - that is the part of the application which executes on the clients machine, this means the complexity is gonna move on the server code. To keep things manageable here the code is split into layers. The kernel consists of the “services” code. Things like the user service, the feeds service etc. These deal with implementing the business logic and serve the other components of the application. They’re also the seeds of actual services in a future microservices architecture. Because just because we’re doing a monolith now, doesn’t mean we should not try to anticipate future developments. Especially when they allow for a nicer structure overall. And like with separate services, I’d like the same constraints to hold - each service has separate storage, and it is the only way in which that data is accessible. Transactional boundaries should also correspond to service ones. Conversely the API controllers, webpage server-side rendering controllers, cron tasks, callback handlers etc all make use of the services layer. Ditto for possible utility script or even scheduled jobs etc. The API the client will interact will also be a thin layer atop the services one.

As a last point on architecture choices, Postgres will be the main storage backend . We’ll follow a poor man’s event sourcing design. There’s gonna be entity tables (for users, feeds etc) and entity-event tables which describe operations that happen on entities. Each method of a service should basically correspond to an event type from here. There’s gonna be some technical tables of course, like join tables, without all this extra machinery.

The first thing to do is define the tables the service will depend on. These are added via CREATE TABLE statements in the migrations directory. The entity table looks like:

CREATE TYPE auth.UserState AS ENUM ('Active', 'Removed');

CREATE TABLE auth.users (
    -- Primary key,
    id Serial,
    PRIMARY KEY (id),
    -- Core properties
    state auth.UserState NOT NULL,
    agreed_to_policy Boolean NOT NULL,
    display_name Text NOT NULL,
    nickname Text NOT NULL,
    picture_uri Text NOT NULL,
    -- External foreign key
    provider_user_id Varchar(128) NOT NULL,
    -- Denormalized data
    time_created Timestamp NOT NULL,
    time_last_updated Timestamp NOT NULL,
    time_removed Timestamp NULL
);

CREATE UNIQUE INDEX users_provider_user_id ON auth.users(provider_user_id);

While the events table for it looks like:

CREATE TYPE auth.UserEventType AS ENUM ('Created', 'Recreated', 'Removed', 'AgreedToPolicy');
CREATE TABLE auth.user_events (
    -- Primary key
    id Serial,
    PRIMARY KEY (id),
    -- Core properties
    type auth.UserEventType NOT NULL,
    timestamp Timestamp NOT NULL,
    data Jsonb NULL,
    -- Foreign key
    user_id Int NOT NULL REFERENCES auth.users(id)
);

CREATE INDEX user_events_user_id ON auth.user_events(user_id);

As you can see it’s not a lot of stuff. But we’ll add more as we progress. Think language preferences, global notification settings etc. All of the stuff which pertains to a user profile as it were.

The service lives under src/services/user. There’s three files which I hope to keep as a standard for future services. entities.ts is for DTOs for the various types one operates with - think User, Feed etc. events.ts is for the associated event types like UserCreated and server.ts is the actual code. You can check them out separately cause they’re quite big.

The main stuff is in service.ts with the UserService::getOrCreateUser and UserService::getUserByProviderId methods. There’s a couple of things to notice:

Dependency injection rules.
async/await double rules.
knexjs is used liberally inside the service. If things get too hairy, a separate data access layer might be good, but for now mixing DB access and business logic is decent.

getOrCreateUser is where the magic happens, and it is a big upsert statement powered by Postgres’ insert ... on conflict do update functionality. It gets called in the auth flows with the data provided by auth0. Also notice that the statement is “raw”. See the optional box below:

Optional: raw queries? In 2018? what gives? Ever since my time at StackOverflow I’ve been a fan of these and of micro-ORMs / query builders. If knex.js has support for something then I’ll happily use that, otherwise I just call into the DB directly. I am tying the code to the DB. But in 2018 with Postgres/MariaDB you can do that and not _suffer. This isn’t some greedy 90s DB vendor we’re talking about. They’re solid choices with large OSS communities behind them not likely to go away anytime soon. And most software is going to require / be written and tested against one DB anyways. So why complicate your life?

OK, so much for the services part. Now to do things like login, logout, remembering a user, or link data from GitHub. This is where the second topic of our discussion comes in - passport.js. Passport is an auth library for express. Auth0 has an plugin for it and nestjs accommodation for it. So we’re supposed to simply integrated it.

Of course in reality this was a very painful experience. I don’t know why but auth always is. The code lives in src/auth/auth.ts. It’s a big file containing controllers, middleware, DTOs etc. I might break them up at some point. In any case, it’s self contained and should technically be the start and end of the integration.

The gist of what happens here is:

We define the /real/auth controller, which registers the /login, /logout and /callback paths.
Login is the path a user is redirected to when they press a “Login” button in the UI, or when they access any route which requires auth, but they haven’t yet logged in. This simply invokes passport logic, which triggers a bunch of oauth flows, which do a bunch of redirects, which ask the user for credentials etc, which in the end get to ..
Callback is the path the auth flow finishes with. This also invokes passport logic, which will call in to the user service, and in the end redirect to /admin.
Logout is the path a user is redirected to when they press a “Logout” button. This simply clears the passport session information and redirects to the home page.
Notice that all these processes are “standard” web ones - no SPAs involved. Just redirects and good ol’ fashioned HTTP serving.
All the examples use passport with express and usually install a bunch of middleware on some empty routes. That didn’t gel well with NestJs. So I had to explicitly call the middleware in the handlers. Unorthodox but it worked.
The AuthStrategy is used to configure passport and to integrate it with NestJs. This bit actually worked quite well and kudos to NestJs for being aware of “authentication”. Here is where we actually call getOrCreateUser, and where we use Raynor a bit to ensure the data from the provider is good.
The AuthSerializer is another NestJs extension point, and it controls how the user profile info is mapped to the session’s middleware storage, and implicitly the cookie itself. By default this uses the whole object, which is stupid, so this code uses just the provider’s id of the user. Here is where we call getUserByProviderId.
The ViewAuthGuard is a special NestJs guard which raises an ViewAuthFailedException if the request isn’t authenticated. This in turn gets handled by ViewAuthFailedFilter, which simply redirects to the /real/auth/login route. The filter is installed globally, while the guard needs to be applied on a route/controller basis.

OK, I think this is enough. Next up will be the topic of web integrations - sitemaps, robots, microdata etc.