Replay Testing

Status: refer to the GitHub project.

Fifthtry is the largest Realm project so far: 55K+ lines of code, and over 700 test steps. At a high level I can say Realm, and especially realm-testing, is working pretty well for us so far.

There are a few issues identified in testing:

  1. Writing tests is a chore.
  2. Coverage can be improved.
  3. Tests should be faster to run.
  4. Tests should be able to run headless.

Let's see how tests currently work in Realm and what I mean by each of these, and then I will describe how I think we can solve these problems by moving to a new approach: “replay testing”.

How Realm Tests Work Currently

Realm is a full-stack framework, and all tests are “integration” tests, so to speak. In order for your tests to be “stable”, during testing you must only do what end users do, and make assertions on what end users see.

There is a file, fifthtry/frontend/Test.elm, which is an Elm app available at /test/. Test.elm “drives” the test process: it makes HTTP requests to the backend, and then makes assertions on the responses. Each “step” in a Realm test is one such request to the backend, and within a step you can have any number of assertions, e.g. that the data is so and so, or that the rendered view has so and so many divs, whatever.

There is also a test UI: on /test/ you can see all the tests and steps listed in the left sidebar, and while a test is running, the main area shows the response received from the server.

Writing Tests Is A Chore

This is what a typical test looks like:

register : List RT.Step
register =
    let
        validate : String -> JE.Value -> R.TestResult
        validate val _ =
            RU.match "create-account-success" "ok" val
    in
    [ RT.Navigate Index.anonymousEmpty Routes.index
    , RT.apiTrue "name-available" (Data.usernameAvailable amitu_username)
    , RT.apiFalse "@@@-is-invalid" (Data.usernameAvailable (Username "@@@"))
    , RT.apiFalse "api-is-invalid" (Data.usernameAvailable (Username "api"))
    , RT.apiError "invalid-username" (RT.only "email" "Email is not valid.") <|
        Api.createAccount (Email "a") amitu_name amitu_pass
    , RT.api "create-account"
        [ RU.dalways (R.TestPassed "Got Signed Cookie")
        , \s _ -> R.UpdateContext [ ( "signed", JE.string s ) ]
        ]
        (Api.createAccount amitu_email amitu_name amitu_pass)
    , RT.apiError "invalid-signature"
        (RT.only "signed" "Invalid signature: \"not enough parts: found 1\"")
        (Api.createAccountWithOTP amitu_email amitu_name amitu_pass otp "")
    , RT.apiS ( "verify-otp", "signed" ) [ validate ] <|
        ( JD.string
        , Api.createAccountWithOTP_Payload amitu_email amitu_name amitu_pass otp
        )
    , RT.Navigate Index.loggedInNotEmptyWithoutUsername Routes.index
    ]

This is the actual test case for account creation on fifthtry.

We create HTTP requests using the Realm.Test helpers (aliased to RT), e.g. RT.Navigate, RT.apiError, and so on.

In Realm, APIs (e.g. Api.createAccount) and “Data” (e.g. Data.usernameAvailable; “data” is the same as an API, except that while APIs can fail due to user input, “data” never fails) are both discouraged. The preferred approach is to use RT.Navigate and RT.SubmitForm, for the simple reason that these two are largely enough for most cases: you want to get the page and the data together. When using an API you get the page first and fetch the data later, which is slow, and has other issues. Since RT.SubmitForm etc. return a whole page, you can see whether the page looks fine; RT.api etc. return only the JSON data, which you cannot so easily “see” to verify. API/Data should only be used when the end user is not semantically changing the page, which is relatively rare.

Here is what an individual page-based assertion looks like:

     else if id == anonWithoutDocument then
        [ BaseTest.anonymous "base" t.config.base t.context
        , RU.true "track-is-public" config.trackInfo.public
        , RU.match "track-access-is-nothing" Nothing config.trackInfo.access
        , RU.match "document-is-missing" Nothing config.moduleInfo
        ]

Here we are using a bunch of assertion functions defined in Realm.Utils, aliased here as RU, e.g. RU.true.

As you can see, everything is fully type checked. If the JSON sent by the server can't be decoded using the Payload decoder (for API/Data) or the Page decoder (for Navigate/Submit), the test fails. Similarly, all the values we assert on are type checked by Elm.

It works quite well, and I am quite proud of what I have created, but damn! It's so much code! More than a quarter of all the Elm code I have written for fifthtry is just test assertions.

But it's more than the volume of code. As you can see in the quoted example, I am using BaseTest.anonymous, a utility function that is used in a bunch of places: 13 so far. To improve productivity while writing tests, one is tempted to write more such utility functions, but they end up with weird names like anonWithoutDocument, and you really can't tell what they mean by looking at the function name. So changing functions becomes quite a chore: the names are imprecise, they don't give you an intuitive understanding, and things are inter-related; if you modify a commonly used, badly named utility function, it is a chore to get everything working right again.

Let's just say this is the least joyful aspect of working with Realm. Mind you, it is way better than anything I have tried in the past (the exact same problem happens in every testing framework I have used, be it pytest, cargo test, and so on), but it can be improved: you will soon see how.

Coverage Can Be Improved

Looking at my test cases, I have a lot of tests like this:

     else if id == anonymousWithDhruv then
        [ BaseTest.anonymous "base" test.config.base test.context

     else if id == anonymousWithDhruv2 then
        [ BaseTest.anonymous "base" test.config.base test.context

As you can see, both of them are only testing that the anonymous version of the page was rendered, and not that the Dhruv user is part of test.context.

I call it “chore induced sloppiness”, you give people mindless chore, and they become sloppy.

Anyway, the problem, as you can see, is not enough coverage. We are not testing every aspect of test.config, only test.config.base. Even in BaseTest.anonymous we are not testing every base attribute, only one field.

The more we assert, the more brittle the tests become, so maybe testing less is good. But the less we test, the less confidence we have in our tests, so more is good.

How do we solve this?

Before we go there, let's appreciate that even if we tested everything, it could still not be enough. Say you had a page or a submit action that adds a user to a team. But let's say it is coded in such a manner that the user is identified by email address, and if the user doesn't already exist, the user is created as well.

On success, in either case (the user was already there, or the user was created), we may show exactly the same thing to the end user: "user with this email address has been invited". And yet when testing you may want to ensure the user was not created, for example if the test scenario already created the user in earlier steps.

You can obviously try to expose this data as output only for tests, but anything done like this is bad programming: you have to be really sure to communicate it well and ensure it doesn't go into production, and it gets tedious and problematic when you are interested in some condition that happened deep in your stack.

Tests Should Be Faster To Run

Right now the 702 step test of fifthtry takes about 65 seconds.

The way we run tests is: we have “tests”, and each test is composed of a number of “steps”. When we move from test to test, we “reset the database”.

We have a single rust service running in background, and a single database.

When running tests, I do not reset my manual test data/state, so I actually have two PostgreSQL “schema”s: the public schema for my normal HTTP requests, and a test schema for test requests. During reset I delete the test schema and recreate it from a schema.sql file (which itself is generated when I apply Django migrations to change my database table definitions).
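As a sketch, the per-test reset amounts to the following. The statement text is an assumption based on the description above; the real code also replays the contents of schema.sql into the fresh schema.

```rust
// Sketch of the reset step described above: drop and recreate the test
// schema. The real code would execute these against PostgreSQL, then
// replay schema.sql into the new schema.
fn reset_statements() -> Vec<String> {
    vec![
        "DROP SCHEMA IF EXISTS test CASCADE".to_string(),
        "CREATE SCHEMA test".to_string(),
    ]
}
```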

To speed up the tests we can reduce the amount of work done per test, and we can make the tests concurrent.

Since we are fully rendering the page on every step, there is a cost we are incurring, which can be reduced by running in a non-DOM context, that is, without rendering the page. Since page rendering is helpful when debugging a failing test, we want to support both modes.

The plan is to run the tests in parallel, say 10 at a time. Each HTTP request can include the test name; in the initial database reset phase we create a schema whose name is derived from the test name, and then in middleware we initialise the database connection so the test-specific schema is first in the schema search path.
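A minimal sketch of the schema-per-test idea. The sanitisation rule and helper names here are assumptions, not Realm's actual API:

```rust
// Derive a PostgreSQL-safe schema name from a test name. The sanitisation
// rule (non-alphanumeric chars become '_') is an assumption.
fn schema_for_test(test_name: &str) -> String {
    let safe: String = test_name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect();
    format!("test_{}", safe.to_lowercase())
}

// The statement the middleware would run when initialising the connection,
// putting the test-specific schema first in the search path so test
// requests see test data, falling back to public for shared objects.
fn search_path_sql(test_name: &str) -> String {
    format!("SET search_path TO {}, public", schema_for_test(test_name))
}
```

For example, `schema_for_test("create-account")` yields `test_create_account`, so concurrent tests never touch each other's tables.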

Lets Talk About Observer

The tracing framework used by Realm, observer, records the sequence of function invocations:

context: middleware [2020-10-12T03:00:50.135292+00:00] main:   3ms
-  38µs: path=/api/create-account/
-  50µs: method=POST
-  62µs: realm__pg__connection: 660µs
    -  31µs: db__select__: 618µs
        - 104µs: query=SELECT 1
        - 611µs: modified=1
-   1ms: fifthtry__account__create_account: 935µs
    -  56µs: fifthtry_db__account__is_email_available: 599µs
        -   3µs: [email protected]
        -  10µs: db__select__fifthtry_user: 576µs
            - 193µs: query=SELECT FROM fifthtry_user WHERE = $1 LIMIT $2 -- binds: [[email protected], 1]
            - 569µs: rows=1
        - 597µs: name is empty
    - 907µs: realm__response__json_with_context:  20µs
-   2ms: db__select__: 552µs
    -  64µs: query=SELECT 1
    - 545µs: modified=1

Let's read the observer output carefully. First there are calls like main: 3ms: this means a function named main was called, and it took 3ms to execute.

Inside main we see records like 38µs: path=/api/create-account/, which says that 38µs after the start of main, a key “path” with value “/api/create-account/” was “observed”. Similarly we see the “method”. Then there is a call to realm__pg__connection, 62µs after the start of main, which took 660µs to execute. Inside realm__pg__connection we see a call to “db__select__”, and inside that we see more observed keys, e.g. “query” and “modified”.

We also have lines like - 597µs: name is empty; this is not a key-value pair, but a log. Logs in observer can't have data: we use &'static str to ensure all logged strings are hardcoded strings.
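The &'static str trick can be sketched like this (Frame and log are hypothetical names for illustration, not the real observer API):

```rust
// Hypothetical sketch: because `log` takes &'static str, callers can only
// pass compile-time string literals; a runtime-formatted String won't compile.
struct Frame {
    logs: Vec<&'static str>,
}

impl Frame {
    fn new() -> Self {
        Frame { logs: Vec::new() }
    }

    // Only hardcoded literals like "name is empty" satisfy &'static str.
    fn log(&mut self, msg: &'static str) {
        self.logs.push(msg);
    }
}
```

So `frame.log("name is empty")` compiles, but `frame.log(&format!("name: {}", name))` is rejected, since the formatted string does not live for 'static.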

This output is much preferred over the traditional logging used by most frameworks.

Shape Of A Function Execution

Consider the observer trace output for “/” when the user is logged in vs. when they are not:

Just by looking at the overall “shape of the output”, you can see they are different. The “shape of output” for anonymous remains the same:

Even though the timing data is slightly different.

Trace Without Timing Data

The trace output we have shown has a lot of data in it, e.g. timing-related data. We can view the trace without that data:

- path=/
- method=GET
- realm__pg__connection
    - db__select
        - query: SELECT 1
           rows: 1
- fifthtry__index__get
    - fifthtry_common__base_with_notification
        - fifthtry_common__base
    - fifthtry_db__account__all
        - db__select__fifthtry_user
            - query: SELECT "fifthtry_user"."name", "fifthtry_user"."username" FROM "fifthtry_user" WHERE "fifthtry_user"."name" NOT LIKE $1 
               bind: observed
               rows: 0
- db__select
    - query: SELECT 1
       rows: 1

All timing information is gone, and we have also hidden some information like the bind: observed. Key-value data is still left intact.

“Snapshot Testing”

I got the idea of “snapshot testing” from React. In my previous attempt at snapshot testing, I did snapshot testing on the JSON output of my backend APIs.

That failed.

The main reason was that the output contained “unstable” information, e.g. date-times that changed from test run to test run.

If you look at the source of that library you will see support for “erasing” some information before taking snapshot of JSON, but this was error prone, buggy, and not fully implemented.

But the main problem with my last attempt at snapshot testing was that I was taking snapshots of the JSON output of my APIs. As I have already discussed, JSON can remain the same even if a different path was traversed, and often the JSON does not contain enough information to reliably test internal behaviour.

The plan this time is to do snapshot testing on all of the following: the output of the API, the “observer test trace”, and the FTD.

Lets talk about each of them one by one.

Snapshotting Output Correctly

Like I said, the last attempt failed because I was converting to JSON and then erasing the data that was not stable, e.g. timestamps, primary keys, and so on.

What are we doing differently now? We are going to use serde serialize hints to exclude the data that we do not want included in the snapshot.

Consider a struct:

pub struct CR {
    pub crid: i32,
    pub title: String,
    pub track_id: String,
    #[serde(serialize_with = "realm::datetime_serializer_t")]
    pub updated_on: DateTime<Utc>,
}
In this struct, updated_on is a problematic field: it changes every time we run the test.

So we use the realm::datetime_serializer_t serialiser, which is implemented along these lines:

pub fn datetime_serializer_t<S>(x: &DateTime<Utc>, s: S) -> Result<S::Ok, S::Error>
where
    S: serde::Serializer,
{
    if crate::base::is_snapshot_test() {
        // during snapshot tests, erase the unstable value
        // (the exact placeholder emitted here is illustrative)
        s.serialize_str("<datetime>")
    } else {
        s.serialize_str(&x.to_rfc3339())
    }
}
This, and similar serialisers like i32_serializer_t, means all unstable values can be erased from the JSON output.

Snapshotting Observer Output

We have already seen observer output without timing data. But that only omits timing information; key-value information is still retained, and if one of those values is a date-time object we will have the same problem. For this reason observer comes with two sets of functions, e.g. observer::observer_i32() and observer::transient_i32(). If the transient_*() functions are used, the trace output does not include the value of transient key-value pairs during testing (they are included in non-test mode, so observer output remains a reliable log for your server).

- path=/logout/
- method=GET
- realm__pg__connection
    - db__select
        - query: SELECT 1
           rows: 1
- realm__in__parse_ud_cookie
    - ud: observed
- fifthtry__logout__get
    - fifthtry__index__get
        - fifthtry_common__base_with_notification
            - fifthtry_common__base
        - fifthtry_db__account__all
            - db__select__fifthtry_user
                - query: SELECT "fifthtry_user"."name", "fifthtry_user"."username" FROM "fifthtry_user" WHERE "fifthtry_user"."name" NOT LIKE $1 
                   bind: observed
                   rows: 2
- db__select
    - query: SELECT 1
       rows: 1

Here we see that a few keys, e.g. ud and bind, do not contain a value, but the constant “observed”.
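The erasure behaviour can be sketched as follows. This is a standalone illustration; observer's real observer_*/transient_* functions record into its trace tree rather than formatting strings directly:

```rust
// Standalone illustration of the observer_* vs transient_* distinction:
// transient values are replaced by the constant "observed" while testing,
// and kept intact otherwise, so logs stay useful in production.
fn render_kv(key: &str, value: i32, transient: bool, in_test: bool) -> String {
    if transient && in_test {
        format!("{}: observed", key)
    } else {
        format!("{}: {}", key, value)
    }
}
```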

Snapshotting the “UI” or FTD

What remains is how the UI actually looks: can we snapshot that as well?

Web frameworks deal with HTML/CSS, or abstractions like React that render to HTML/CSS. Right now Realm is similar: it uses Elm.

But React/Elm are not easy to render without a DOM environment; it is possible, but relatively slow.

And even if we could, we still have the problem of the Elm/React-rendered DOM containing “unstable” data that changes from test run to test run for the exact same scenario. You can wrap such data in a DOM node with some class so it can be erased before creating the snapshot, or you can pass a flag to React/Elm so that they omit such unstable data when rendering in test mode.

If you have server-side rendering set up, this should be relatively straightforward to implement.

As of now Realm does not have proper server-side rendering: you can do server-side rendering, but you have to write an HTML template that is independent of the Elm file, since we do not run a backend-side JavaScript engine. The plan is to get FTD working and move to it, so that both the HTML template and the Elm file are generated from common FTD source files, and FTD will have support for eliding unstable data.

The Plan

Based on all this, this is the current short-term plan:

Let's look at these steps in detail:

Recording Mode

There will be a page, /record/, which will show the existing replay files you can start with, or you can start from scratch.

You will have to give the test you are about to record a name and a description.

If you select a base recording file as a starting point, it will first invoke the Replay Runner, run that recording file, and put the browser in the state you want to start from. For example, a recording may create a common set of users and log you in as one of them, so you do not have to repeat those steps.

This will drop you on the last URL in the recording file, or on / if no replay file was selected when recording started.

NOTE: We will design this so that the /record/ URL can be left available in production as well, letting anyone create a replay file, so people can submit proper bug reports. Each replay will run in its own dedicated schema, so concurrent record/replay sessions don't affect each other or production.

A cookie with the recording name will be dropped to indicate that we are in recording mode, and which recording file to update.

You can now perform any operation you want. If you are jumping context, say resuming from a link sent via email, ensure you open the email in the same browser. Someday we will include a replay “continuation” tag in each outgoing email so you can resume from a different device as well.

If you are switching devices, go to /record/ and select one of the existing recording sessions, so the proper cookie can be dropped there (the base recording file selected at the beginning of the session will not be re-run when doing this, so you will have to log in manually, etc.).

Recording The Test

Once we have started a recording session by visiting the /record/ URL, all browsing activity will be recorded.

In Realm there is a function, realm::end_context(), which is executed at the end of each HTTP request; it has access to both the HTTP request and HTTP response objects.

The check for recording mode is the presence of a cookie named realm_recording, whose value is the name of the recording. realm::end_context() looks for this cookie and, if found, captures the request and response in the recording file.
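A sketch of that cookie check. The parsing helper here is hypothetical; the real code would read the cookie from the HTTP request object rather than parse a raw header:

```rust
use std::collections::HashMap;

// Hypothetical helper: find the realm_recording cookie in a Cookie header.
// Returns the recording name if we are in recording mode, None otherwise.
fn recording_name(cookie_header: &str) -> Option<String> {
    cookie_header
        .split(';')
        .filter_map(|pair| {
            // split "name=value" at the first '=' only
            let mut it = pair.trim().splitn(2, '=');
            Some((it.next()?.to_string(), it.next()?.to_string()))
        })
        .collect::<HashMap<_, _>>()
        .remove("realm_recording")
}
```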

Recording File Format

For each recording we have a JSON file, generated by serialising the following Rust structs:

struct ID(String);

struct Recording {
    id: ID,
    title: String,
    description: String,
    base: ID,
    steps: Vec<Step>,
}

struct Step {
    method: http::Method,
    path: String,
    query: std::collections::HashMap<String, String>,
    data: serde_json::Value,
    test_trace: String,
    // output: serde_json::Value,
    // ftd: String,
    activity: Activity,
}

struct Activity {
    okind: String,
    oid: String,
    ekind: String,
    ada: serde_json::Value,
}
It is assumed the file is stored in tests/{id}.json. The ID can contain slashes.
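The path derivation can be sketched as follows (the helper name is illustrative):

```rust
use std::path::PathBuf;

// Recordings live at tests/{id}.json; since the ID may contain slashes,
// those slashes become subdirectories under tests/.
fn recording_path(id: &str) -> PathBuf {
    PathBuf::from("tests").join(format!("{}.json", id))
}
```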

Replay Runner

The replay tool is implemented in Rust so we do not need Node or other dependencies to run it, only our server. The replay tool can also bypass the HTTP layer and run from the command line.

Constructing In Object

Currently the In object stores an http::request::Request<Vec<u8>> constructed via hyper.

The plan is to make it private, and expose only the data that is needed, eg query params, cookie data, headers and so on.

Cookie Management

Currently tests run as part of /test/, which is accessed via a web browser, so cookies are managed by the browser. In the proposed solution Rust will be calling the user-supplied “middleware” directly, and even during the recording phase we want to start by replaying some existing recorded file and ensure the cookies are set at the end, so we have to do some cookie management ourselves.

We will maintain our own cookie store; we could use a cookie jar crate, but a simple HashMap will also do. Cookies are accessed (get, set, delete) by Realm applications via the In object, so we can do this without changing application code.
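A minimal sketch of such a HashMap-backed cookie store (names are illustrative, not Realm's actual types):

```rust
use std::collections::HashMap;

// Minimal cookie store: the replay runner only needs get/set/delete by
// name, so a plain HashMap is enough; no cookie jar crate required.
#[derive(Default)]
struct CookieStore {
    cookies: HashMap<String, String>,
}

impl CookieStore {
    fn get(&self, name: &str) -> Option<&str> {
        self.cookies.get(name).map(String::as_str)
    }

    fn set(&mut self, name: &str, value: &str) {
        self.cookies.insert(name.to_string(), value.to_string());
    }

    fn delete(&mut self, name: &str) {
        self.cookies.remove(name);
    }
}
```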

/start-recording/ and middleware

When the user clicks the Record button on /record/, a POST request is sent to /start-recording/ with the id, title, description and base parameters. We will create the recording file, reset the test schema, replay the base recording, and verify the assertions at each step; on success it will set the cookies and redirect to the last URL.

middleware is called from the realm::serve::handle_sync() function. We will modify handle_sync() to intercept calls to /start-recording/ and call middleware inside a loop to replay the base recording.
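That replay loop can be sketched as follows. Step here is a simplified stand-in for the recording's Step struct, and the middleware is represented as a closure; the real signatures will differ:

```rust
// Simplified stand-in for the recorded Step struct.
struct Step {
    method: String,
    path: String,
}

// Replay every recorded step through the user-supplied middleware, in
// order, before handing control back to normal request handling. Running
// replay through the same middleware means it exercises the real code path.
fn replay_base<F: FnMut(&Step)>(steps: &[Step], mut middleware: F) {
    for step in steps {
        middleware(step);
    }
}
```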
