Using Redis as a primary database
How I built a full-stack application - Shortomega
When we think of a database, we think about rows, columns, tables, and the relationships between different tables. And we are usually correct: most applications we use or develop rely on some sort of relational database, such as PostgreSQL, MySQL, or MariaDB. Then there are applications that use NoSQL databases, like MongoDB.
Redis is popularly known for being used as a cache alongside a Relational Database Management System (RDBMS). As it is an in-memory data store, its read/write speeds are higher than those of a traditional RDBMS, which stores data on disk. However, there are downsides to this database as well. For instance, there are memory limitations since all the data is stored in memory (RAM), and a large amount of RAM can significantly increase the cost of a server in the cloud.
Using Redis as a Cache
Redis can be viewed as an intermediate data store, or cache, that temporarily holds frequently requested data so that operations against the primary database are reduced. When the data is not present in the cache, it is retrieved from the persistent database and cached in Redis for subsequent requests.
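As a quick illustration, here is a minimal cache-aside sketch in TypeScript using the ioredis client. The client choice, the 60-second expiry, and the fetchFromDatabase stub are my assumptions for the example, not details from Shortomega:

import Redis from 'ioredis';

const redis = new Redis(); // connects to localhost:6379 by default

// Hypothetical loader standing in for the persistent database.
async function fetchFromDatabase(key: string): Promise<string> {
  return `value-for-${key}`; // placeholder
}

async function getCached(key: string): Promise<string> {
  const cached = await redis.get(key);
  if (cached !== null) return cached; // cache hit

  // Cache miss: load from the persistent store, then cache for 60 seconds.
  const value = await fetchFromDatabase(key);
  await redis.set(key, value, 'EX', 60);
  return value;
}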
Redis as a primary database
When we think of using Redis as a primary database, we need to figure out how to store all of our application’s data in Redis, i.e., user data, login credentials, relationships, indexes, etc. Redis has grown from simply storing key-value pairs to supporting multiple data structures like Sets, Bitmaps, HyperLogLogs, and others.
Database design of Shortomega
As key-value stores like Redis don’t have rows, columns, tables, or relationships, I needed to devise a different approach.
Let’s start with some simple data: a user’s email and password.
Each user has a unique email and a corresponding password to log in. We typically store a hashed value of the password, produced by a hashing algorithm like SHA-1 or SHA-256, so that the plaintext password is protected from attackers.
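For illustration, here is a minimal sketch of such hashing using Node’s built-in crypto module. SHA-256 is used to mirror the text; in practice, a dedicated password-hashing function such as bcrypt or Argon2 is the safer choice:

import { createHash } from 'crypto';

// Hash the plaintext password before it is stored in Redis.
function hashPassword(password: string): string {
  return createHash('sha256').update(password).digest('hex');
}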
We can store each user’s email address against the corresponding user ID by adding the prefix “email” to the key:
email:<userid> -> <email address>
And the hashed password using:
password:<userid> -> <hashed password>
However, there are some issues with this approach. Whenever we need to add a new attribute to the same user, we must add a new key with a new prefix, which significantly increases the memory needed to store each user's attributes. Moreover, we usually need to set and retrieve the email and password together, and with separate keys that takes two database operations.
Redis has a built-in structure called a hash, which is a collection of field-value pairs. We use the HSET command to add fields to a hash. The HSET syntax looks like this:
HSET key field value [field value ...]
Now we have only one hash per user and can store the data this way:
HSET user:<userid> email mail@test.com password <hashed-password> [...other fields and value]
We can easily retrieve all the fields using the HGETALL command:

HGETALL user:<userid>
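In application code, storing and reading a user hash is a single round trip each way. Here is a sketch using the ioredis client (the client and function names are my own, for illustration):

import Redis from 'ioredis';

const redis = new Redis();

// Store the email and hashed password as fields of one user hash.
async function saveUser(userId: string, email: string, passwordHash: string) {
  await redis.hset(`user:${userId}`, 'email', email, 'password', passwordHash);
}

// HGETALL returns every field-value pair of the hash as an object,
// e.g. { email: 'mail@test.com', password: '<hashed-password>' }.
async function getUser(userId: string) {
  return redis.hgetall(`user:${userId}`);
}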
Now that we can store user data efficiently, I will explain how I am building Shortomega, a URL shortener application, with Redis as the primary database. I am storing the user login credentials exactly as explained above.
To map short URLs to their corresponding long URLs, I simply store them as key-value pairs so that they can be queried in O(1) time:
short:<short-url> -> <long-url>
To associate short URLs with a user, I keep a set of short URLs for each user, as shown below, so that I can retrieve all of a user's short URLs in a single Redis query.
user:<userid>:urls -> [set]
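Putting the two key patterns together, creating a short URL is one write to the key-value pair plus one SADD into the user's set, and resolving a redirect is a single O(1) GET. A sketch, again with ioredis and function names of my own:

import Redis from 'ioredis';

const redis = new Redis();

// Create the short -> long mapping and index it in the user's set.
async function createShortUrl(userId: string, shortUrl: string, longUrl: string) {
  await redis.set(`short:${shortUrl}`, longUrl);
  await redis.sadd(`user:${userId}:urls`, shortUrl);
}

// Redirect lookup: a single O(1) GET on the short URL key.
async function resolveShortUrl(shortUrl: string): Promise<string | null> {
  return redis.get(`short:${shortUrl}`);
}

Wrapping the two writes in a MULTI/EXEC transaction would make the creation atomic; the sketch keeps them as plain commands for brevity.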
Challenges
Because a user's set stores only short URLs, retrieving the corresponding long URLs requires N additional queries, one per short URL. This is similar to the N+1 query problem in relational databases. To address it, I had to choose between two options:
The first option is to store the long URL along with the short URL for a particular user. This results in redundancy, and managing and updating the URLs in two places would be cumbersome.
The other option is to write a custom Lua script that runs on the Redis instance and returns the pairs of short and corresponding long URLs. This solves the N+1 query problem and returns the result in a single query, so this is the method I chose. The script looks something like this:
-- KEYS[1] is the user's set of short URLs, e.g. user:<userid>:urls
local urls = redis.call('SMEMBERS', KEYS[1])
local result = {}
for i, url in ipairs(urls) do
  -- Each short URL maps to its long URL under the key short:<short-url>
  local longUrl = redis.call('GET', 'short:'..url)
  if longUrl then
    -- Append the pair to a flat list: short1, long1, short2, long2, ...
    table.insert(result, url)
    table.insert(result, longUrl)
  end
end
return result
In the above Lua script, I fetch all the short URLs from the set whose key is passed in from the NestJS backend. Looping over each short URL, I get the corresponding long URL, which is stored as a simple key-value pair. Finally, I insert each pair into a table that is returned to the backend for further processing.
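From the backend, the script can be executed with EVAL, passing the user's set key as KEYS[1]. Here is a sketch with ioredis; the unflattening of the returned array into a map is my own convenience, since the script returns alternating short and long URLs:

import Redis from 'ioredis';

const redis = new Redis();

const GET_URLS_SCRIPT = `
local urls = redis.call('SMEMBERS', KEYS[1])
local result = {}
for i, url in ipairs(urls) do
  local longUrl = redis.call('GET', 'short:'..url)
  if longUrl then
    table.insert(result, url)
    table.insert(result, longUrl)
  end
end
return result`;

// Run the script in a single round trip; KEYS[1] is the user's URL set.
async function getUserUrls(userId: string): Promise<Record<string, string>> {
  const flat = (await redis.eval(GET_URLS_SCRIPT, 1, `user:${userId}:urls`)) as string[];
  // Unflatten [short1, long1, short2, long2, ...] into a map.
  const urls: Record<string, string> = {};
  for (let i = 0; i < flat.length; i += 2) {
    urls[flat[i]] = flat[i + 1];
  }
  return urls;
}

In production, loading the script once with SCRIPT LOAD and calling it via EVALSHA (or ioredis's defineCommand) avoids resending the script body on every request.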
To benchmark the gain from retrieving the data in a single database query, I ran a Python script that first stored and then retrieved the data using N+1 queries, and then executed the Lua script on the Redis instance to retrieve the same data in one query. I plotted the results as shown below:
The plot shows the execution-time difference between the N+1 queries and the Lua script: a user with around 5,000 URLs can fetch all of them in under 50 ms instead of roughly 300 ms.
Analytics
Analytics covers counting the total and unique visitors for each short URL. This helps users track the engagement of their short links across different regions.
The count of total visits can easily be stored in a key-value pair whose value is incremented each time someone clicks on the link. But what about unique visitors? I could store the unique IP addresses of visitors in a set and then count the elements of the set. However, that would take O(N*M) space to store N IP addresses for each of M links, consuming the server's memory. As mentioned before, increasing the RAM of a server instance in the cloud is more expensive than adding disk space. It turns out that there is one special built-in data structure in Redis that solves this problem: HyperLogLog.
HyperLogLog is a probabilistic data structure that estimates the cardinality of a set, i.e., the number of unique elements, without actually storing the elements. It uses at most 12 KB of memory regardless of how many elements are added and has a standard error of 0.81%.
We can add and count elements using the following syntax:
PFADD key [element [element ...]]
PFCOUNT key [key ...]
I can now add visitors' IP addresses to a HyperLogLog and count the unique ones:
PFADD short:<hash>:ips [IP addresses ...]
PFCOUNT short:<hash>:ips
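Tying the analytics together, each click bumps the total counter and feeds the visitor's IP into the link's HyperLogLog, and reading the stats back takes two cheap commands. A sketch with ioredis; the short:<hash>:visits key for the total counter is my assumption, since only the HyperLogLog key is named above:

import Redis from 'ioredis';

const redis = new Redis();

// Record one click: increment the total counter and add the visitor's
// IP to the HyperLogLog (the visits key name is my assumption).
async function recordVisit(hash: string, ip: string) {
  await redis.incr(`short:${hash}:visits`);
  await redis.pfadd(`short:${hash}:ips`, ip);
}

// Read back the total clicks and the estimated unique visitor count.
async function getStats(hash: string) {
  const total = Number(await redis.get(`short:${hash}:visits`)) || 0;
  const unique = await redis.pfcount(`short:${hash}:ips`);
  return { total, unique };
}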
Lastly
There are endless possibilities for using Redis as a cache and even as a primary database. Its fast, in-memory operations make it an ideal choice for high-performance, real-time applications.
The URL shortener application Shortomega is a work in progress, and I will add further developments and details here or in another blog post.
Enjoy Building!