Raynor
A little while ago I wrote and open-sourced a small TypeScript/JavaScript data validation and marshalling library called Raynor. Since it proved quite useful to use around the house, I figured I could do a little bit more with it than just dump it on GitHub. This article is a first step in that direction. It is an overview and tutorial of what Raynor is and how it operates. If you find this sort of stuff interesting I hope this will provide enough info for you to be able to try it in our own projects.
A code example is worth 1000
bytes, so here’s one before I say anything more:
As you can see, the core idea is transforming a basic JS object into a more complicated one and checking that certain constraints apply. The actual input is usually the result of some sort of deserialization procedure. In this and many other cases, this would be JSON. But form encoding, URL encoding, some custom binary format etc. can all be used. Since there’s such a great variety in both formats and implementations it is currently a non-goal for Raynor to deal with this aspect of marshalling.
The rest of this article is split into two parts. The first is the actual tutorial. The second acts as a rationale for building it and a comparison with other similar systems.
Tutorial
Raynor is available on NPM as raynor. It’s enough to install it via npm install --save raynor
. Since it’s written with TypeScript, it already contains its .d.ts
files, so there’s nothing extra for you to do.
Raynor deals with one type of object of its own - the marshaller. It is the object which does the transformation and checking mentioned in the introduction, through the extract
method. It can act both ways, however. So, in our example um
can also transform u
back into a regular JS object, through the pack
method. All marshallers are derived from the Marshaller<T>
interface, which is quite small, and looks like this:
The template type T
represents the class which the marshaller produces. The extract
method usually does the heavy lifting. The input is any
, and it commonly is null
, a boolean, a number, a string, an array or an object. This fact must be checked however, since if a method expects an array, it should fail gracefully when encountering a number. When a condition is encountered which prevents the raw
from being transformed, an ExtractError
should be thrown. Finally, pack
is the reverse of extract
functionally-wise. Code-wise, it’s more the case that it is a much simpler method than extract
, as many of the validation steps are not needed.
As an example, let’s suppose we’ve encoded booleans as 'TRUE'
and 'FALSE'
in our application, and we wish to have a marshaller that can transform them into proper booleans. This is quite the contrived example, but it will help highlight most of the issues with writing a marshaller from scratch. It might look something like this:
The input to extract
is any
, so the first thing to do is check that it is actually a string. So if somebody asked CapitalBooleanMarshaller
to extract an array of strings, it would fail. The second step is to basically turn "TRUE"
to true
and "FALSE"
to false
. If the input is none of these things, another exception is raised. As stated, pack
is much simpler than extract
. This will usually be the case, as, intuitively, the data inside the application is already well-formed. It’s only the data outside the application which poses problems.
Builtin Marshallers
Naturally, Raynor comes with a bunch of marshallers for the simple types of JavaScript. Things like numbers, booleans, strings, nulls, Date
s etc. are handled from the start, so you can start building off them. There are also marshallers for very common objects based on these types. For example, IntegerMarshaller
extends the NumberMarshaller
and checks that the input is an integer. PositiveIntegerMarshaller
goes a bit further and checks that the integer is positive. IdMarshaller
is an alias for this one. WebUriMarshaller
checks that a string input is in URI form and the protocol is either http
or https
.
For example:
As a side note, my plan is to move these marshallers out of Raynor proper and into a second library, which would contain type definitions and marshallers for the common business type objects one works with: IBANs, currency, geo locations, URIs, emails, etc.
We’ll go into more details about how these are implemented later in the article. But a very common way to build a marshaller is to start with an already existing marshaller, which expresses a certain constraint on the input, and add another constraint to it. For example, if your app’s domain is my-app.com
and you want to check that a particular resource you use has a URI from my-app.com
, you could start with the SecureWebUriMarshaller
and extend it like so:
You can even extend MyAppUriMarshaller
once more, to perhaps only allow images, like so:
The order in which the filter
methods are applied is from the base class up to the derived class, or from more general to more specific. So, for example, you can safely assume that the uri
which makes it to MyAppImageUriMarshaller
starts with https://my-app.com/
.
Building Marshallers For Classes
A great deal of use cases are covered by these marshallers and their extensions. However, as the inputs turn from simple types to objects, arrays or complex aggregates of them, it becomes tedious and even downright hard to write marshallers for them. Luckily Raynor comes with a rich set of tools for dealing with these cases, in the form of a set of annotations for class definitions and the supporting infrastructure needed to turn these into marshallers automatically. The marshallers thus built implement the very common pattern of having a complex object be recursively broke down into smaller and simpler objects, which are handled by other types of marshallers.
The initial example in the article used these annotations. We’ll focus now on a different and bigger example in order to dive deeper into the functionality put forth by Raynor.
A classic thing to model is a mathematical point. For starters it might look like this:
The class has two public properties x
and y
, a constructor of two arguments, and an instance function getNorm
, which returns the mathematical norm \(\sqrt{x^2 + y^2}\). The striking point is the MarshalWith
annotation applied to the public properties. A you might have guessed, this informs Raynor of which marshaller to use when trying to construct a Point
. Any sort of marshaller can be specified, but the actual argument must be a constructor function / class.
In order to create a marshaller, the MarshalFrom
function is used, which transforms an annotated class into a marshaller class. For example:
Notice the way the function is used with regards to new
. What it returns is actually a constructor function / class, so you need to obtain an instance before any work can be done. The reason we do this instead of returning an instance directly is for compatibility with the MarshalWith
annotation. It is very easy to compose marshallers in the same way we compose types. For example, a hypothetical rectangle would look like this:
You can probably intuit what the generated marshaller does. But in broad strokes, it first checks to see if its argument is an object. Then, for each annotated field f
with a attached marshaller Mf
, it tries to find the value for property f
inside the input object, and, once found, use Mf
to extract the value. This is then written to the property f
of the output object. If any error is encountered, such as the input object not having a required field, or a sub-marshaller failing, the whole process stops and an ExtractError
is raised.
The instance obtained from extract
belongs to the correct class. In the previous example you could call getNorm()
on it and it would work. There aren’t many constraints for how the annotated class must look like. You can have a regular constructor, methods and un-annotated properties. The only hard constraint is that calling the constructor with no arguments not leave the object in a really bad state, or somehow fail. So a constructor like the one for Point
above is a-OK, but one which would complain via an exception if x
is undefined would be a no-no. It is still best to think of such objects as DTOs, but now you have the possibility of adding a little bit of logic to them.
More From The Annotations Toolkit
MarshalWith
also accepts a second parameter, which is the name of the field in the original object. It is helpful when dealing with objects from external sources which might not follow your project’s style. Our point class might look like this:
A very common pattern, especially for APIs which have been in use for longer periods of time and have seen features added and removed, is for objects to have optional fields in an object. Raynor provides the OptionalOf
function to turn a regular marshaller into one which is optional, in the context of the annotation system. Extending points to three-dimensional space provides an excellent example for this:
Raynor comes with a bunch of other utility marshallers, such as ArrayOf
, MapOf
, OneOf2
, OneOf3
, MarshalEnum
etc. Together they are meant to provide a rich language for describing the structure and constraints for objects. As an more complex example, consider:
Note that, in the near future, MapOf
will be upgraded to work with the ES6 Map
. There will be a similar SetOf
which will work with Set
. There were some problems with TypeScript support for these when emitting to ES5, and I need to figure things out a little bit first, but hopefully it won’t be long.
As a final point of behavior, the marshaller produced by MarshalFrom
ignores extra fields. It doesn’t include them in the output object, even though they would still match the type required according to TypeScript, but it also doesn’t raise an error when encountering them. This helps in general with data migration and service evolution inside your own application. It also helps when marshalling responses from external APIs, as only the bits of their response which interests you need be modeled, while the rest can be ignored.
Extending Annotation Marshallers
So far the tools provided have been good at modeling validation setups where an object is valid if all of its fields are valid. And again, as with the simple marshallers, this captures a great deal of use cases. Fortunately, to capture more complex situations, there’s no need to introduce a third component to Raynor, but rather build on the ones we already have. We need only extend a marshaller obtained via the annotations mechanism, and provide a filter
function which describes the extra validation we want to do.
Returning to our two-dimensional point example, suppose we want constrain our points to be on the unit circle. We could achieve this via:
TypeScript doesn’t handle the case of extending a class expression quite well when generating a .d.ts
file unfortunately for consumption by other packages. So if you want to expose UnitCirclePointMarshaller
from a package, you’ll have to compromise and code it like this:
Advanced: RaiseBuildFilter Marshallers
Almost all marshallers you’ve seen here are descendent from RaiseBuildFilterMarshaller<A, B>
. In fact, only some vary basic marshallers, such as the ones for booleans and null
s, and the ones used in annotations aren’t. Among other things, this marshaller contains the logic for filtering via a hierarchy of marshallers with the filter
function.
RaiseBuildFilterMarshaller<A, B>
or RBFM<A, B>
for short, captures a very common pattern when dealing with deserialization. It breaks down extract
into three phases: raise, build and an optional filter. Conversely, it breaks down pack
into two phases: unbuild and lower. Lower is the opposite of raise and unbuild the opposite of build. Extract takes its input and passes it through the raise operation. This result is then sent through build. Finally, all of the defined filters are applied in the order of most general to most specific. Pack naturally does the reverse. It doesn’t do any filtering, but straight up passes the input to unbuild, and the result of that to lower.
The first phase, raise, does very basic checks on the input. Just enough to know that the any
input is in fact a number
and not an array or a null. The result of it is usually a JavaScript primitive or object. The type parameter A
is the output of the raise
method. The second phase, build, transforms this result into something more structured. Here a number
might be turned into a Date
or a string
into an EmailAddress
. The type parameter B
is the output of the build
method. The result of build
is something structurally sound. The last phase - filter - which consists of a bunch of methods applied in series, usually checks different business type logic constraints. For example, that the Date
is from this year or that an EmailAddress
has a certain format.
RBFM<A, B>
itself is an abstract class, and it is meant to be used as the base in hierarchies of marshallers, where the various marshallers implement different phases. For example, a direct descendant of RBFM
might implement the raise and lower phases by implementing the raise
and lower
methods. A descendent of this one would implement the build and unbuild phases by implementing the build
and unbuild
methods. At this point, the latter marshaller is usable as is, since there are no abstract methods left to be implemented. But a whole hierarchy of marshallers can be built on top, each of which only specifying a filter
function and gradually defining different constraints on the inputs.
As a more concrete example, consider the builtin PositiveIntegerMarshaller
. It is defined as:
It only defines a filter
function which checks that the input is positive. IntegerMarshaller
, in turn, is defined as:
It again defines just a filter
function which checks that the input is positive, via Number.isInteger
. NumberMarshaller
is where things get more interesting. It looks like:
So it implements the build phase of the RBFM<A,B>
with B = number
. The phase itself is quite simple, being the identity function in both directions. Finally, BaseNumberMarshaller<number>
looks like:
It implements the raise phase of the RBFM<A,B>
, with A = number
. The phase is more involved, with checks for the input being a number, as well as it not being a strange value.
There are many such hierarchies in the built-ins of Raynor. The DateMarshaller
is a similar one, and it also uses BaseNumberMarshaller
but with B = Date
this time, since it wants to transform a numeric timestamp to a Date
object.
Background, Inspiration etc.
Raynor is by no means an original piece of work. It owes a debt of gratitude to Protocol Buffers, Thrift, Avro, Cap’n Proto, JSON Schema, Json.NET, and a host of other similar technologies, since people have been dealing with this stuff for quite some time.
What it does add is an improvement in one of my personal pain-points, which is more involved checking of the actual input. I believe that when designing Internet applications we should be really thorough in how we model data and how we check the inputs and outputs of our systems. And just looking at something structurally is many times not enough. Unfortunately this is what most of the options from above do. JSON Schema goes a little bit further, since it has a set of builtin constraints on particular fields, but it doesn’t go nearly far enough. So I wanted something which would allow the expression of more complex assertions about entities. This naturally involved expressing them in a programming language, rather than just declaratively. It also limited the initial solution to be bound to a programming language, rather than its own stand-alone IDL. The choice of JavaScript/TypeScript was natural because it’s found on both clients and servers alike, and is quite popular to boot.
A second improvement is that the entities produced by Raynor are instances of regular JavaScript classes. They aren’t auto-generated by some tool and they aren’t restricted too much. You can have extra methods on them, extra fields, an interesting constructor and you can even build them into hierarchies of classes or do more complex things with them.
A non-goal for Raynor was providing a “wire format”. Tools like Protocol Buffers, Thrift etc. also define their own, and these dictate how an object is transformed into a representation suitable for transfer outside of the generating process’ memory. Raynor doesn’t do that, but rather relies on external mechanisms. Most of the time this means relying on JSON and JSON.parse
/JSON.stringify
for serialization and deserialization. This opens up the possibility for using different mechanisms which extend it’s range of usages, however. URL encoding or form encoding are two examples, but also other methods of parsing JSON, such as fetch
’s response.json
or Express’ body-parser
. As long as it can turn bytes into a JavaScript object it can be used, basically.
In the future it might be the case that some support for wire formats is added, for convenience’s sake, but also to support more advanced usage, such as multiple levels of serialization applied to a single entity.
Another non-goal for Raynor was providing RPC “service definitions”. This is mostly on account of trying to follow the old adage “Do One Thing And Do It Well”. Furthermore, RPC vs REST vs GraphQL vs what-have-you is not a done battle, and it’s prudent to be agnostic to these things. Extra bits can be built upon it rather than tightly integrated, in any case.
Anywho, I’ll wrap up now, since this article is already quite large. Hope it was an interesting read as well.