12+ months with big redis
Congratulations! You’ve inherited a big redis. Don’t feed it after dark & don’t get it wet.
I’ve been godparent to my team’s big redis for over a year now and here are some observations. (PS I love redis, I read antirez’s blog sometimes, it’s one of my fave OSS products. If there were something better for this use case I’d switch – I haven’t come across it yet).
- As a database
- As a distributed system
- Writing caching logic
- ORM stuff
- Monitoring
- Managed vs unmanaged
- Conclusions
As a database
It’s really hard to know what’s in there. The data structures (list, set, hash, wacky zsets) are nice but not nice enough. Often I need to key on something like (userid, itemid) so that I can individually manipulate items but still enumerate everything belonging to a user. Every KV store is bad at this.
You could list the keys that look like 'item-{userid}-*', but the KEYS command is O(N) over every key in the instance, and SCAN only makes weak guarantees (keys that change during the iteration can be missed or returned twice). Some KV stores are a little better at prefix searching, but this is always a pain.
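For what it’s worth, here’s roughly how that prefix enumeration looks in practice – a minimal sketch in Python with redis-py, where the item-{userid}-{itemid} key shape is just an illustration, not a recommendation:

```python
import redis

r = redis.Redis(decode_responses=True)

def items_for_user(user_id):
    """Enumerate one user's items by prefix-scanning the whole keyspace.

    SCAN is cursor-based and won't block the server the way KEYS can,
    but it still has to walk every key in the instance to find the
    matching ones, and keys added or removed mid-scan may be missed
    or returned twice.
    """
    pattern = f"item-{user_id}-*"
    for key in r.scan_iter(match=pattern, count=500):
        yield key, r.get(key)
```

A common workaround is one hash per user (so enumeration is a single HGETALL), which is exactly what runs into the TTL limitation below.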
In general, the lack of nesting leads to pollution. It’s the same with TTLs – the inability to set a TTL per hash field means you have to write your own cleanup, and that cleanup often consists of enumerating all keys and writing a regex. Good luck knowing which of your logical types is taking up how much storage / IO.
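Since there’s no per-field TTL, here’s the kind of cleanup you end up writing yourself – a hedged sketch assuming the one-hash-per-user layout above, with a companion zset of expiry timestamps (the item:{userid}:expiry naming is mine, purely for illustration):

```python
import time
import redis

r = redis.Redis(decode_responses=True)

def set_item(user_id, item_id, value, ttl_seconds):
    # Write the field, and record when it should expire in a companion zset.
    r.hset(f"item:{user_id}", item_id, value)
    r.zadd(f"item:{user_id}:expiry", {item_id: time.time() + ttl_seconds})

def purge_expired(user_id):
    # Periodic cleanup job: drop every field whose recorded expiry has passed.
    now = time.time()
    expired = r.zrangebyscore(f"item:{user_id}:expiry", "-inf", now)
    if expired:
        r.hdel(f"item:{user_id}", *expired)
    r.zremrangebyscore(f"item:{user_id}:expiry", "-inf", now)
```

It works, but now every caller has to remember the companion key – which is the pollution problem all over again.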
As with every DB, a join across separate physical stores is impossible. This can be invisible in unit tests if you’re not careful, so be careful.
As a distributed system
The client library is potentially to blame here (DNS stuff), but I’ve seen my share of messy failovers. One messed-up failover on a managed redis cluster turned into a cascade where each master ran at 100% CPU until it melted and the next one took over. (To be fair, I have no idea what actually happened, and continuing load from my clients may have been to blame).
On some stock Linux configurations, new replicas can’t resync from masters if the system has less than 50% of its RAM free. Redis warns about this at startup in its log – pay attention. That means when I lose a box, it sometimes comes back up empty. I’ve had clients still think it’s in the cluster, leading to bad behavior for a percentage of my userbase. Once again, this may be buggy client code rather than a flaw in the clustering protocol, but be warned.
The clustering logic has gone through multiple generations, so when people talk about sharding redis they may mean different redis features (there’s Sentinel, Cluster, and application-level sharding).
Whatever you do, drill your rebuilds & failovers on the reg to cut down on surprise & mystery.
Writing caching logic
When you cache data you have to write everything twice. Think this makes things twice as hard? It’s more than twice. Various conditions (including my buggy app code) cause redis & the persistent DB to diverge. Point operations cost roughly the same in sql & redis, but bulk operations can behave very differently.
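Here’s the shape of the double write, as a minimal sketch – the db.fetch_user / db.store_user calls stand in for whatever your persistent store is, and the user:{userid} key and 300-second TTL are just examples:

```python
import json
import redis

r = redis.Redis(decode_responses=True)
CACHE_TTL = 300  # seconds; example value

def get_user(db, user_id):
    """Read-through: try the cache, fall back to the database, backfill."""
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    row = db.fetch_user(user_id)  # hypothetical persistent-store call
    r.set(f"user:{user_id}", json.dumps(row), ex=CACHE_TTL)
    return row

def save_user(db, user_id, row):
    """The 'write everything twice' part: persist, then fix up the cache.

    If the process dies between these two lines, redis and the DB have
    diverged - exactly the failure mode described above. Invalidating
    (DEL) is usually safer than writing the new value into the cache.
    """
    db.store_user(user_id, row)  # hypothetical persistent-store call
    r.delete(f"user:{user_id}")
```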
If you’re read-limited, you need tools to predict the cache hit rate you can (1) expect and (2) handle. If you don’t have these tools you’ll either have to cache everything (good luck to you – this gets tricky in corner cases like a fire drill) and periodically fill it up, or vastly overprovision it (or in my case, both).
At minimum, please measure your cache hit rate and alert on it. Understand the difference in cost between a cache hit and a read-through. If you don’t do these things, then on the day your cache hit rate changes and your system melts, you won’t be able to understand what’s happening.
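Measuring the hit rate doesn’t require anything fancy – redis already counts hits and misses for you. A minimal sketch (the 90% threshold is just an example, and the counters are cumulative since the last stats reset):

```python
import redis

r = redis.Redis()

def cache_hit_rate():
    """Overall hit rate from INFO's counters (cumulative since last reset)."""
    stats = r.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 1.0

if cache_hit_rate() < 0.90:  # example threshold; pick one you can defend
    print("cache hit rate below threshold - expect read-through load")
```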
If you’re caching to deal w/ write load, you need a way to delay the writes, i.e. caching requires queueing. (Now you have three problems).
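By “caching requires queueing” I mean something like a write-behind worker. A rough sketch using a redis list as the queue – the pending-writes key and the db.apply_write call are hypothetical:

```python
import json
import redis

r = redis.Redis(decode_responses=True)
QUEUE = "pending-writes"  # hypothetical key name

def enqueue_write(user_id, row):
    # Fast path: update the cache now, defer the expensive persistent write.
    r.set(f"user:{user_id}", json.dumps(row))
    r.lpush(QUEUE, json.dumps({"user_id": user_id, "row": row}))

def drain_writes(db):
    # Worker loop: apply deferred writes to the persistent store.
    while True:
        item = r.brpop(QUEUE, timeout=5)
        if item is None:
            continue
        _, payload = item
        write = json.loads(payload)
        db.apply_write(write["user_id"], write["row"])  # hypothetical call
```

(And now the queue itself can fall behind or lose entries, which is the third problem.)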
It’s critical to have high-level assumptions about your architectural components, how they should behave, and what they should be used for. This lets you make claims about why a workload is overwhelming a component, verify that in stats, and make a plan to fix it (i.e. scale infra, rewrite app, or rearchitect = add different components).
I’ve seen people get this right and solve big problems quickly, or get it wrong and hand-maintain a melting cluster for months (hint: I was in the latter category).
ORM stuff
Without going into too much detail, there are various norms we settled on for how to parameterize redis keys and how to organize stuff across separate physical redises. By ‘separate’ I mean shards in a cluster as well as entirely separate clusters / instances. And eventually we wrapped these norms in a simple (~150 line) ORM-like layer.
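Ours isn’t public, but the shape is roughly this kind of thing – a hedged sketch where the class, key names, and host are made up for illustration: each logical type declares a key template plus a handle to whichever redis it lives on, so nobody hand-formats keys in application code.

```python
import redis

class RedisType:
    """Tiny key-schema helper: one key template + one connection per type."""

    def __init__(self, conn, template):
        self.conn = conn
        self.template = template  # e.g. "item:{user_id}:{item_id}"

    def key(self, **params):
        return self.template.format(**params)

    def get(self, **params):
        return self.conn.get(self.key(**params))

    def set(self, value, ttl=None, **params):
        self.conn.set(self.key(**params), value, ex=ttl)

# Each logical type is pinned to the instance/cluster it belongs on.
items_redis = redis.Redis(host="items-cache", decode_responses=True)  # placeholder host
UserItem = RedisType(items_redis, "item:{user_id}:{item_id}")

UserItem.set("blue widget", ttl=3600, user_id=42, item_id=7)
print(UserItem.get(user_id=42, item_id=7))
```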
As with all ORMs, individual devs went through an adoption cycle:
- not liking it (denial)
- trying to use it and shooting themselves in the foot (anger)
- liking it a lot and attempting to use it to solve all problems (bargaining)
- hating it again as they discover use cases that are made harder (depression)
- understanding the narrow problem it solves and using it where it works (acceptance)
Monitoring
Things I wish I had done at the beginning to observe the health of the cluster – a rough polling sketch follows the list. (I won’t say how long it took to add these, it’s embarrassing).
- make sure that your load is balanced across machines – if not your sharding scheme may be wrong
- capture its logs and treat its warnings as high severity – redis doesn’t log much, and when it does, it’s important (usually a replication issue)
- alert on cpu and scale your cluster so you stay under some number. I like 50%
- don’t let RAM fill up
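The sort of checks I mean, as a rough polling sketch – the instance list and thresholds are placeholders, and in practice you’d feed this into whatever alerting you already have rather than printing:

```python
import redis

INSTANCES = ["redis-a:6379", "redis-b:6379"]  # placeholder shard list
RAM_THRESHOLD = 0.80                          # alert well before RAM fills up

def check_instance(addr):
    host, port = addr.split(":")
    r = redis.Redis(host=host, port=int(port))
    mem = r.info("memory")
    repl = r.info("replication")
    maxmemory = mem.get("maxmemory", 0)
    if maxmemory and mem["used_memory"] / maxmemory > RAM_THRESHOLD:
        print(f"{addr}: memory above {RAM_THRESHOLD:.0%} of maxmemory")
    if repl["role"] == "master" and repl["connected_slaves"] == 0:
        print(f"{addr}: master with no connected replicas")
    # key count per shard is a cheap proxy for whether load is balanced
    return r.dbsize()

sizes = {addr: check_instance(addr) for addr in INSTANCES}
print("keys per shard:", sizes)
```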
Managed vs unmanaged
I’ve used managed redis from two cloud vendors (both clustered and unclustered) and various methods of self-hosting. The managed hosts are not that bad if your workload doesn’t break the bank.
Questions to ask of your managed redis: does it support clustering? How exactly does failover work, and will your client library behave? How much capacity will you need to handle your workload? (You’ll probably need to loadtest – they won’t be able to quote you ops/sec.)
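Ballparking ops/sec yourself is worth the hour it takes. A crude sketch with redis-py pipelines – point it at a throwaway instance, the localhost target and key names are arbitrary, and this only measures one client’s pipelined throughput, not the server’s ceiling:

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)  # point at a throwaway instance

def rough_ops_per_sec(n=100_000, batch=1_000):
    """Blast n SETs through pipelines and report approximate throughput."""
    start = time.time()
    for i in range(0, n, batch):
        pipe = r.pipeline(transaction=False)
        for j in range(i, i + batch):
            pipe.set(f"loadtest:{j}", "x")
        pipe.execute()
    return n / (time.time() - start)

print(f"~{rough_ops_per_sec():,.0f} ops/sec from this one client")
```

Run several copies in parallel (and mix in reads shaped like your workload) to get closer to a real number.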
Conclusions
Operationally, SQL is creaking at the seams. Most KV stores shipped with clustering built in from a relatively early version; SQL has okay replication but not okay sharding. Citus is working on this in postgres-land but I haven’t tried it yet. Cockroachdb is also in this space. The cloud vendors have products addressing this problem (redshift, aurora, bigquery), but meh – vendor lock-in + extra costs + incomplete SQL compatibility.
The ability to control what’s cached in memory on a SQL box would be valuable. Postgres has a pg_prewarm extension and I think they’re working on LRU caching, but I probably want to control this in my application logic or with a query instead of something automatic.
A lot of sql’s creakiness comes from joins. The ability to store rows of different types next to each other would be valuable, and would get rid of the locality case for denormalizing data. If I could guarantee that all rows across all tables for a certain user were on the same few pages of storage, SQL would be real nice. The ability to query multiple tables without getting a Cartesian product’s worth of rows back would be great.
Most of my redis complaints are around the organization of data (or lack thereof).
Performance design for DBs is an area of active development and I’m excited to see what’s next. Mongo made some important points and the SQL world listened. I hope the next round of experimental DBs (datomic being the one I most want to try) also stretch the industry-standard feature set.