Hello Redis

Redis is an in-memory database – it keeps data in a server’s memory instead of reading and writing it from disk. I don’t know much more about it (yet), but being able to read and write data without a lot of overhead might solve a problem I’ve run into.

The problem

I’m learning about Redis now because of the Remix app I’m developing that uses Discourse as the back end for its comment system. An “article” on the Remix app will have a corresponding “topic” on Discourse. An article’s comments will be saved as its corresponding Discourse topic’s “posts.”

This project was moving along at an impressive rate, until I hit a bit of a snag. To retrieve an article’s comments from Discourse, the Remix app makes an API request to the Discourse /t/{topicSlug}/{topicId}.json route. The payload that’s returned includes the topic’s first 20 posts, and a stream array that’s a list of the IDs of all the topic’s posts.

Because topics on my development site only have a few replies, I didn’t notice an obvious issue at first: posts can only be retrieved from Discourse in batches of 20. If a topic has more than 20 posts, an API request for the next batch is made to (for example)

/t/{topicId}/posts.json?post_ids[]=21&post_ids[]=22&post_ids[]=23...&post_ids[]=40

The post_ids[] values used to retrieve the second batch of posts are the 21st to 40th values of the stream array that was returned from the initial request.

The problem is that the payload returned from /t/{topicId}/posts.json doesn’t include the stream array that was returned from the initial request to /t/{topicSlug}/{topicId}.json. If a topic has more than 40 posts, the stream array from the initial request needs to be saved somewhere so that its values can be used to make requests for posts beyond the 40th.
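To make that concrete, here’s a rough sketch of how a batch request URL could be built from a saved stream array (buildPostsUrl and BATCH_SIZE are my own illustration, not part of the Discourse API):

const BATCH_SIZE = 20; // Discourse returns posts in batches of 20

// sketch: build the posts.json URL for the nth batch of a topic's posts
function buildPostsUrl(
  baseUrl: string,
  topicId: number,
  stream: number[],
  batch: number
): string {
  const start = batch * BATCH_SIZE;
  const postIds = stream.slice(start, start + BATCH_SIZE);
  const params = postIds.map((id) => `post_ids[]=${id}`).join("&");
  return `${baseUrl}/t/${topicId}/posts.json?${params}`;
}

// batch 1 covers the 21st to 40th values of the stream array:
// buildPostsUrl("https://forum.example.com", 123, stream, 1);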

I thought of a few ways of dealing with this issue: saving the stream array as a pipe-delimited string in the application’s SQLite database, saving each value from the stream array in a TopicPost SQLite table, sending the data to the client… These approaches all seemed bad or hacky.

I was trying to avoid adding an in-memory database as a requirement to the app, but it seems like the best way of dealing with the issue. I suspect it will also be useful for caching other data that’s returned from Discourse.

The reason I’m using Redis as the in-memory database is because I’ve already got it up and running on my local computer. I installed it with this script when setting up a local Discourse site: https://github.com/discourse/install-rails/blob/main/linux#L43-L101. (Documentation for how to install Discourse in a local development environment is here: https://meta.discourse.org/t/install-discourse-on-ubuntu-or-debian-for-development/14727.)

Installing Redis

Here’s the Redis part of the script I linked to above:

# ...
  cd /tmp && \
    wget https://download.redis.io/redis-stable.tar.gz && \
    tar -xzvf redis-stable.tar.gz && \
    cd redis-stable && \
    make && \
    sudo -E make install
  cd /tmp && \
    rm redis-stable.tar.gz && \
    rm -Rf redis-stable

  sudo adduser --system --group --no-create-home redis
  FILE="/etc/systemd/system/redis-server.service"
  if [ ! -f "$FILE" ]; then
    sudo bash -c "cat > $FILE" <<EOF
[Unit]
Description=redis in-memory data store
After=network.target

[Service]
User=redis
Group=redis
ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf
ExecStop=/usr/local/bin/redis-cli shutdown
Restart=always

[Install]
WantedBy=multi-user.target
EOF
  fi

  sudo mkdir -p /var/log/redis /var/lib/redis /etc/redis
  sudo chown redis:redis /var/log/redis /var/lib/redis /etc/redis
  sudo chmod 755 /var/log/redis /var/lib/redis /etc/redis

  FILE="/etc/redis/redis.conf"

  if [ ! -f "$FILE" ]; then
    sudo bash -c "cat > $FILE" <<EOF
bind 127.0.0.1
protected-mode no
port 6379
dir /var/lib/redis
dbfilename dump.rdb
save 900 1
save 300 10
save 60 10000
logfile /var/log/redis/redis-server.log
loglevel debug
EOF
    sudo chown redis:redis "$FILE"
    sudo chmod 644 "$FILE"
  fi

  sudo systemctl daemon-reload
  sudo systemctl enable redis-server
  sudo systemctl start redis-server
  sudo systemctl --no-pager status redis-server
  sudo redis-cli ping

That script:

  • downloads and compiles Redis
  • creates a redis user and group that’s used to run the Redis server
  • creates a systemd service file (redis-server.service) that’s configured to start automatically at boot (which also eats up ~20% of my computer’s memory, so I usually stop it with sudo systemctl stop redis-server and start it manually with sudo systemctl start redis-server when I’m using it)
  • creates directories for the Redis logs and grants ownership of them to the redis user and group
  • creates a configuration file at /etc/redis/redis.conf
    • binds to 127.0.0.1 to restrict connections to localhost
    • sets protected-mode no (maybe ok for production with connections being limited to localhost?)
    • sets the port to 6379 (the default Redis port)
    • sets /var/lib/redis as the directory for data and persistence files
    • sets dump.rdb as the file name for disk persistence
    • sets save directives (save <seconds> <changes>): take an RDB snapshot if at least <changes> writes happen within <seconds> seconds, so save 60 10000 snapshots after 10,000 changes in 60 seconds (todo: I’m just guessing these are good defaults)
    • sets a log file directory and the log level to debug (probably not the correct log level for production)

Redis CLI

I started Redis with sudo systemctl start redis-server, opened the CLI with redis-cli, then started trying some of the examples given on https://redis.io/docs/latest/develop/connect/cli/.

Then I realized I was messing up my local Discourse installation. The default Redis configuration supports 16 logical databases (numbered 0 to 15). When the CLI is opened with plain redis-cli, it’s using database 0 – and so is Discourse. To use a different database, start the CLI with the -n <db> flag (inside the CLI, SELECT 1 does the same thing; in a connection URL, like the one used later in this post, a trailing /1 selects database 1):

$ redis-cli -n 1 
127.0.0.1:6379[1]>

Now back to the Redis docs. Starting with the first example:

127.0.0.1:6379[1]> INCR mycounter
(error) ERR value is not an integer or out of range

It turned out mycounter already held a non-integer value from my earlier experimenting (INCR on a missing key starts from 0, so a fresh key wouldn’t have errored). Resetting it fixed things:

127.0.0.1:6379[1]> SET mycounter 0
OK
127.0.0.1:6379[1]> INCR mycounter
(integer) 1
127.0.0.1:6379[1]> GET mycounter
"1"
127.0.0.1:6379[1]> SET mycounter foo
OK
127.0.0.1:6379[1]> GET mycounter
"foo"
127.0.0.1:6379[1]> INCR mycounter
(error) ERR value is not an integer or out of range

I’m not surprised by the error returned from trying to increment the string "foo", but being able to call INCR on mycounter when its value was set to 1 is interesting. For example:

127.0.0.1:6379[1]> SET mycounter 1
OK
127.0.0.1:6379[1]> TYPE mycounter
string

Values set with SET are stored as the string type, even when they look like numbers. A command that expects an integer, like INCR, parses the string as an integer:

127.0.0.1:6379[1]> SET mycounter "100"
OK
127.0.0.1:6379[1]> INCR mycounter
(integer) 101
127.0.0.1:6379[1]> TYPE mycounter
string

The TYPE command returns what kind of data structure a key holds. I’ll look into the commands that can be run on the following types:

  • string: the most basic type; a binary-safe sequence of bytes
  • list: a list of strings
  • set: a set of unique strings
  • hash: stores mappings of string fields to string values

list commands

LPUSH: push one or more elements to the beginning of a list. If the list doesn’t exist, a new one will be created:

127.0.0.1:6379[1]> LPUSH slugs "this-is-a-test"
(integer) 1
127.0.0.1:6379[1]> LPUSH slugs "this-is-only-a-test"
(integer) 2
127.0.0.1:6379[1]> LPUSH slugs "testing-testing"
(integer) 3

RPUSH: add one or more elements to the end of a list:

127.0.0.1:6379[1]> RPUSH slugs "please-do-not-adjust-your-set"
(integer) 4
127.0.0.1:6379[1]> LRANGE slugs 0 -1
1) "testing-testing"
2) "this-is-only-a-test"
3) "this-is-a-test"
4) "please-do-not-adjust-your-set"

LRANGE: get elements from a list by index range (0 is the first element, -1 is the last element). Here, getting elements 1 through 2:

127.0.0.1:6379[1]> LRANGE slugs 1 2
1) "this-is-only-a-test"
2) "this-is-a-test"

LPOP: return the first list element (and remove it from the list):

127.0.0.1:6379[1]> LPOP slugs 
"testing-testing"
127.0.0.1:6379[1]> LRANGE slugs 0 -1
1) "this-is-only-a-test"
2) "this-is-a-test"
3) "please-do-not-adjust-your-set"

RPOP: like LPOP but for the end of a list:

127.0.0.1:6379[1]> RPOP slugs
"please-do-not-adjust-your-set"
127.0.0.1:6379[1]> LRANGE slugs 0 -1
1) "this-is-only-a-test"
2) "this-is-a-test"

LREM: remove an element by value (LREM key count value). key is the name of the list, count indicates how many instances of the value to remove. Setting count to 0 removes all instances of value:

127.0.0.1:6379[1]> LRANGE postIds 0 -1
1) "10"
2) "9"
3) "8"
4) "7"
5) "6"
6) "5"
7) "4"
127.0.0.1:6379[1]> LREM postIds 0 7
(integer) 1
127.0.0.1:6379[1]> LRANGE postIds 0 -1
1) "10"
2) "9"
3) "8"
4) "6"
5) "5"
6) "4"

set commands

SADD: add one or more elements to a set, creates the set if it does not exist:

127.0.0.1:6379[1]> SADD categories "uncategorized" "fun" "games"
(integer) 3

SMEMBERS: return a set’s members:

127.0.0.1:6379[1]> SMEMBERS categories
1) "uncategorized"
2) "fun"
3) "games"

SREM: remove an element from a set:

127.0.0.1:6379[1]> SREM categories "uncategorized"
(integer) 1
127.0.0.1:6379[1]> SMEMBERS categories
1) "fun"
2) "games"

SCARD: return the number of elements in a set:

127.0.0.1:6379[1]> SCARD categories
(integer) 2
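
Jumping ahead a little: these map directly to the node-redis client I’ll set up later in this post (method names are camelCase):

await client.sAdd("categories", ["fun", "games"]);
const members = await client.sMembers("categories"); // ["fun", "games"]
const count = await client.sCard("categories"); // 2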

hash commands

The hash type seems designed to store objects of field/value pairs. Similar to Ruby hashes?

  • HSET: add fields to a hash, creates it if it doesn’t exist
  • HGET: retrieve a specific field from a hash
  • HMGET: retrieve multiple fields from a hash
  • HGETALL: get all fields and values from a hash

127.0.0.1:6379[1]> HSET user:1 name "Bob Smith" email "bob@example.com" username "bobsmith"
(integer) 3
127.0.0.1:6379[1]> HGET user:1 name
"Bob Smith"
127.0.0.1:6379[1]> HMGET user:1 name email
1) "Bob Smith"
2) "bob@example.com"
127.0.0.1:6379[1]> HGETALL user:1
1) "name"
2) "Bob Smith"
3) "email"
4) "bob@example.com"
5) "username"
6) "bobsmith"

Similar to Ruby hashes, but the flat field/value list returned by HGETALL looks awkward to work with. Possibly that’s just an artifact of the CLI output. I’ll test it in my Remix app to see how it works:

Redis Remix

Install node-redis (the npm package is named redis) with:

$ npm install redis

Then (maybe unnecessarily) I’ll create a function that returns a single instance of the Redis client:

// app/services/redisClient.server.ts

import { type RedisClientType, createClient } from "redis";

let client: RedisClientType | null = null;

export const getRedisClient = async () => {
  if (!client) {
    client = createClient({ url: "redis://localhost:6379/1" });
    client.on("error", (err) => console.error("Redis Client Error", err));
    await client.connect();
  }
  return client;
};
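
One thing to watch for: the Remix dev server can re-import modules on rebuild, which would reset the module-level client variable. If that becomes a problem, the usual workaround (often seen with Prisma clients in Remix apps) is to stash the client on globalThis. A sketch, untested:

// app/services/redisClient.server.ts (globalThis variant, untested sketch)
import { type RedisClientType, createClient } from "redis";

declare global {
  var __redisClient: RedisClientType | undefined;
}

export const getRedisClient = async (): Promise<RedisClientType> => {
  if (!globalThis.__redisClient) {
    const client: RedisClientType = createClient({
      url: "redis://localhost:6379/1",
    });
    client.on("error", (err) => console.error("Redis Client Error", err));
    await client.connect();
    globalThis.__redisClient = client;
  }
  return globalThis.__redisClient;
};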

And a route to test things out on:

// app/routes/hello-redis.tsx

import { json } from "@remix-run/node";
import { useLoaderData } from "@remix-run/react";

import { getRedisClient } from "~/services/redisClient.server";

export async function loader() {
  const client = await getRedisClient();
  await client.set("foo", "bar");
  const foo = await client.get("foo");

  return json({ foo });
}

export default function HelloRedis() {
  const { foo } = useLoaderData<typeof loader>();

  return (
    <div className="max-w-screen-md mx-auto">
      <p>The value of foo is: {foo}.</p>
    </div>
  );
}

That works as expected.

Testing the hash type:

// app/routes/hello-redis.tsx

import { json } from "@remix-run/node";
import { useLoaderData } from "@remix-run/react";

import { getRedisClient } from "~/services/redisClient.server";

export async function loader() {
  const client = await getRedisClient();
  await client.hSet("user:1", "name", "bob");
  await client.hSet("user:1", "email", "bob@example.com");
  const bob = await client.hGetAll("user:1");

  return json({ bob });
}

export default function HelloRedis() {
  const { bob } = useLoaderData<typeof loader>();

  return (
    <div className="max-w-screen-md mx-auto">
      <p>name: {bob.name}</p>
      <p>email: {bob.email}</p>
    </div>
  );
}

That’s pretty great: unlike the CLI’s flat reply, hGetAll returns a plain object, so bob.name and bob.email just work.

How about setting field/value pairs from an object?

import { json } from "@remix-run/node";
import { useLoaderData } from "@remix-run/react";

import { getRedisClient } from "~/services/redisClient.server";

export async function loader() {
  const client = await getRedisClient();

  const user1 = {
    name: "sally",
    email: "sally@example.com",
  };
  await client.hSet("user:1", user1);
  const user = await client.hGetAll("user:1");

  return json({ user });
}

export default function HelloRedis() {
  const { user } = useLoaderData<typeof loader>();

  return (
    <div className="max-w-screen-md mx-auto">
      <p>name: {user.name}</p>
      <p>email: {user.email}</p>
    </div>
  );
}

Yup 🙂

I was surprised that I could pass the user1 object to client.hSet. It turns out flat objects can be passed for the field/value pairs, but nested objects can’t. For example, this version throws an error:

  const user1 = {
    name: "sally",
    email: "sally@example.com",
    phone: {
      home: "444-4444",
      work: "555-5555",
    },
  };

To get that to work, the nested objects need to be serialized into strings:

  // inside the loader from the previous example:
  const user1 = {
    name: "sally",
    email: "sally@example.com",
    phone: JSON.stringify({
      home: "444-4444",
      work: "555-5555",
    }),
  };
  await client.hSet("user:1", user1);
  const user = await client.hGetAll("user:1");

  return json({ user });
}

export default function HelloRedis() {
  const { user } = useLoaderData<typeof loader>();
  const phoneNumbers = user.phone ? JSON.parse(user.phone) : null;
  const homePhone = phoneNumbers ? phoneNumbers.home : null;

  return (
    <div className="max-w-screen-md mx-auto">
      <p>name: {user.name}</p>
      <p>home phone: {homePhone}</p>
    </div>
  );
}

My app needs to deal with a lot of nested objects. I think the best approach will be to serialize the entire object with JSON.stringify, then store it as a string with the Redis SET command. For example:

// Topic, Post, Participant, and PostStreamForTopic are the app's own types;
// isRegularReplyPost and generateAvatarUrl are helpers defined elsewhere in the app
const data: Topic = await response.json();
const postStreamForUser: PostStreamForTopic = {
  id: data.id,
  slug: data.slug,
  postStream: {
    stream: data.post_stream.stream,
    posts: data.post_stream.posts
      .filter(isRegularReplyPost)
      .map((post: Post) => ({
        id: post.id,
        username: post.username,
        avatarUrl: generateAvatarUrl(post.avatar_template, baseUrl),
        createdAt: post.created_at,
        cooked: post.cooked,
        postNumber: post.post_number,
        updatedAt: post.updated_at,
        userId: post.user_id,
      })),
  },
  details: {
    canCreatePost: data.details.can_create_post,
    participants: data.details.participants.map(
      (participant: Participant) => ({
        id: participant.id,
        username: participant.username,
        postCount: participant.post_count,
        avatarUrl: generateAvatarUrl(participant.avatar_template, baseUrl),
      })
    ),
  },
};

await client.set("postStream:scossar", JSON.stringify(postStreamForUser));

Caching data with Redis

It seems obvious now that my app is going to need this. The basic pattern for working with cached data: try to retrieve it by its key; if the key hasn’t been set, or has expired, perform the operation that gets fresh data and store the result under the key:

import { json } from "@remix-run/node";
import { useLoaderData } from "@remix-run/react";

import { getRedisClient } from "~/services/redisClient.server";

export async function loader() {
  const client = await getRedisClient();
  const cacheKey = "postStream:anon";

  try {
    let stringifiedPostStream = await client.get(cacheKey);
    if (stringifiedPostStream) {
      console.log("data was returned from the cache!");
      const postStream = JSON.parse(stringifiedPostStream);
      return json({ postStream });
    }
  } catch (error) {
    console.error("error retrieving or parsing cached data", error);
    // maybe do something, but don't throw a response
  }

  // this would be an API request, only made if the cache was empty or expired
  const postStreamForUser = {
    topicId: 1,
    title: "Test Topic Title",
    postStream: {
      posts: [{ id: 1, cooked: "<p>this is a test</p>" }],
      stream: [1],
    },
  };

  try {
    // set a 60 second expiration time to confirm it works
    await client.set(cacheKey, JSON.stringify(postStreamForUser), { EX: 60 });
  } catch (error) {
    console.error("error saving the data to cache", error);
    // maybe do something, but don't throw a response
  }

  return json({ postStream: postStreamForUser });
}

export default function HelloRedis() {
  const { postStream } = useLoaderData<typeof loader>();
  return (
    <div className="max-w-screen-md mx-auto">
      <h1>{postStream.title}</h1>
    </div>
  );
}
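
One way to confirm the 60 second expiry was actually applied is to log the key’s remaining time to live with the TTL command:

const secondsLeft = await client.ttl(cacheKey);
// TTL returns -1 if the key exists but has no expiry, -2 if the key doesn't exist
console.log(`cache expires in ${secondsLeft} seconds`);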

Storing the stream array

Back to the problem that kicked this all off.

Here’s an example of the stream data that could be returned from an API request to a /t/{topicSlug}/{topicId}.json route:

const stream = [
  696270, 1470085, 1470113, 1470120, 1470131, 1470134, 1470136, 1470137,
  1470139, 1470140, 1470141, 1470142, 1470144, 1470146, 1470150, 1470162,
  1470217, 1470270, 1470344, 1473868, 1473989, 1474236, 1477925, 1483682,
  1483701, 1483703, 1483715, 1483912, 1484084, 1484206, 1484612, 1491133,
  1491142, 1491148, 1491263, 1493394, 1493566,
];

And here’s a reminder of why try...catch blocks are useful. After pasting the above code into the Remix app’s hello-redis.tsx loader function, then calling await client.lPush("topicStream", stream), the following was output to the dev server’s terminal:

TypeError: Invalid argument type
    at encodeCommand (/home/scossar/remix/discourse/discourse_remix_comments/node_modules/@redis/client/dist/lib/client/RESP2/encoder.js:17:19)
    at RedisCommandsQueue.getCommandToSend (/home/scossar/remix/discourse/discourse_remix_comments/node_modules/@redis/client/dist/lib/client/commands-queue.js:138:45)

So it’s not possible to pass an array of numbers as the value to client.lPush.

This doesn’t solve the Invalid argument type issue, but it does prevent the app from crashing:

const stream = [
  696270, 1470085, 1470113, 1470120, 1470131, 1470134, 1470136, 1470137,
  1470139, 1470140, 1470141, 1470142, 1470144, 1470146, 1470150, 1470162,
  1470217, 1470270, 1470344, 1473868, 1473989, 1474236, 1477925, 1483682,
  1483701, 1483703, 1483715, 1483912, 1484084, 1484206, 1484612, 1491133,
  1491142, 1491148, 1491263, 1493394, 1493566,
];

try {
  await client.lPush("topicStream", stream);
} catch (error) {
  console.error("something has gone wrong");
}

Now try mapping the array’s numbers to strings:

const stream = [
  696270, 1470085, 1470113, 1470120, 1470131, 1470134, 1470136, 1470137,
  1470139, 1470140, 1470141, 1470142, 1470144, 1470146, 1470150, 1470162,
  1470217, 1470270, 1470344, 1473868, 1473989, 1474236, 1477925, 1483682,
  1483701, 1483703, 1483715, 1483912, 1484084, 1484206, 1484612, 1491133,
  1491142, 1491148, 1491263, 1493394, 1493566,
];
const stringifiedStream = stream.map(String);

try {
  await client.lPush("topicStream", stringifiedStream);
  let topicStream = await client.lRange("topicStream", 0, -1);
  console.log(`topicStream: ${topicStream}`);
} catch (error) {
  console.error("something has gone wrong");
}

Great! Except the values that are returned from client.lRange(key, start, end) are in the reverse of the order I was expecting:

topicStream: 1493566,1493394,...1470085,696270

When multiple values are added to a list with LPUSH, each value is pushed to the head of the list in turn, so the last value provided ends up first:

127.0.0.1:6379[1]> LPUSH orderTest 1 2 3 4 5
(integer) 5
127.0.0.1:6379[1]> LRANGE orderTest 0 -1
1) "5"
2) "4"
3) "3"
4) "2"
5) "1"

That makes sense. To add multiple values and preserve their order, RPUSH needs to be used instead of LPUSH:

127.0.0.1:6379[1]> DEL orderTest
(integer) 1
127.0.0.1:6379[1]> RPUSH orderTest 1 2 3 4 5
(integer) 5
127.0.0.1:6379[1]> LRANGE orderTest 0 -1
1) "1"
2) "2"
3) "3"
4) "4"
5) "5"

On the Remix app:

client.del("topicStream");
const stream = [
  696270, 1470085, 1470113, 1470120, 1470131, 1470134, 1470136, 1470137,
  1470139, 1470140, 1470141, 1470142, 1470144, 1470146, 1470150, 1470162,
  1470217, 1470270, 1470344, 1473868, 1473989, 1474236, 1477925, 1483682,
  1483701, 1483703, 1483715, 1483912, 1484084, 1484206, 1484612, 1491133,
  1491142, 1491148, 1491263, 1493394, 1493566,
];
const stringifiedStream = stream.map(String);

try {
  await client.rPush("topicStream", stringifiedStream);
  let topicStream = await client.lRange("topicStream", 0, -1);
  console.log(`topicStream: ${topicStream}`);
} catch (error) {
  console.error("something has gone wrong");
}

That works:

696270,1470085,1470113,...1491263,1493394,1493566

Instead of pulling in the full list of postIds, maybe LRANGE can be used to just get the next batch?

const page = 1;
const chunkSize = 8;
const start = page * chunkSize;
const end = start + chunkSize - 1;
const topicStream = await client.lRange("topicStream", start, end);
console.log(`topicStream: ${topicStream}`);

Awesome! Redis also doesn’t return an out-of-bounds error if I try to access a range beyond the list’s size – it just returns an empty list.
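
That pattern could be wrapped in a small helper. A sketch (the function name and signature are my own, not an established API):

import { type RedisClientType } from "redis";

// fetch one page of post IDs from a Redis list
async function getPostIdPage(
  client: RedisClientType,
  key: string,
  page: number,
  chunkSize = 20
): Promise<number[]> {
  const start = page * chunkSize;
  const end = start + chunkSize - 1;
  const ids = await client.lRange(key, start, end);
  return ids.map(Number); // [] when the page is past the end of the list
}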

Keeping the stream in sync with Discourse

Discourse has a “post_deleted” webhook that can be configured to fire when a post is deleted. It might be useful for keeping a stream list in sync with Discourse. (To remind myself: the arguments to LREM are key, count, element – element is the element’s value, not its index. Setting count to 0 removes all elements with that value from the list, which works for the stream array since postIds are unique):

await client.rPush("topicStream", stringifiedStream);
await client.lRem("topicStream", 0, String(1470139));
const page = 1;
const chunkSize = 8;
const start = page * chunkSize;
const end = start + chunkSize - 1;
const topicStream = await client.lRange("topicStream", start, end);
console.log(`topicStream: ${topicStream}`);

Terminal output:

topicStream: 1470140,1470141,1470142,1470144,
1470146,1470150,1470162,1470217

This is great! I was assuming I’d have to convert the stream array to a single string, then set it on Redis with something like:

127.0.0.1:6379[1]> SET stream "1, 2, 3, 4, 5"
OK
127.0.0.1:6379[1]> GET stream
"1, 2, 3, 4, 5"

That’s enough Redis for now.

I’m eager to get back to working on the Discourse Comments app, but first, it’s a sunny day out there.