Improving Valkey with Madelyn Olson (2026)

Thomas Betts: Hello, and welcome to the InfoQ Podcast. I'm Thomas Betts. Today I'm speaking with Madelyn Olson. Madelyn is a maintainer of the Valkey project and a principal software development engineer at Amazon ElastiCache and Amazon MemoryDB, focusing on building secure and highly reliable features for the Valkey engine.

She recently gave a deep dive technical presentation at QCon San Francisco about recent changes to the Valkey hash table and the associated performance improvements. I found it fascinating. So Madelyn, welcome to the InfoQ Podcast.

Madelyn Olson: Thanks so much for having me and I'm glad you enjoyed the talk. The QCon conference was one of the best ones I've been to in a while. The audience was really great. They asked really informative questions.

The Valkey Origin Story [01:19]

Thomas Betts: Yes. QCon is definitely my favorite conference. So I think we need to start off with the origin story of Valkey. Maybe people haven't heard the name before. Where did Valkey come from?

Madelyn Olson: I'll start by giving a little bit of history before the actual creation of Valkey. I had been a maintainer of the open source Redis project since about 2020, and some of the other major contributors and I had built a pretty exciting development community. So when Redis decided to change their license, back in March 2024, moving from an open source permissive BSD license to a commercial license, SSPL, and a variant called RSAL, that community got together and said, "Hey, we want to continue building what we've been working on". So another Redis maintainer, Xiao, who works at Alibaba, and I got four other engineers from the Redis community, one each from Ericsson, Tencent, Huawei and Google. That group of folks went to the Linux Foundation and we were able to create Valkey.

So it was a very fast creation. From the time the license change happened to when Valkey was created was only about eight days, and that's because we had a tight-knit community. So we went and created the project, and that was about 18 months ago. Since then, Valkey's been doing a lot of stuff. We had a bunch of ongoing engineering work that we just continued. We did launch a Valkey that was just a fork of Redis; that was version 7.2. The first real release was 8.0. That was last year, and that was sort of a statement: "Hey, we can build stuff". Since then we've had two more major releases. We had Valkey 8.1 earlier this year (not really related to the fact that I just did a talk at QCon), and we had another major release, Valkey 9.0.

So that released in November. We've had a bunch of releases under our belt now. There are a lot of managed providers of Valkey now. Folks like Amazon ElastiCache, the service I work on, now support Valkey. Memorystore, one of GCP's offerings, has support for Valkey. There are also a lot of other third party providers, like Aiven and Percona, that have managed Valkey offerings. So we're seeing a lot of excitement around the community and it's going really well.

Getting Started with Valkey [03:40]

Thomas Betts: Yes. I think you segued into my next question, which is how does someone get started with this? Is it a Valkey-as-a-service offering? It sounds like it is. If you have Redis and you want to switch, is that something you can do? Is there anything developers need to know? There are a lot of getting-started and migration questions tied into that, from both the infrastructure and the engineering side. So start wherever you want.

Madelyn Olson: We like to say Valkey is a drop-in replacement for Redis open source 7.2, which was the last BSD version of Redis. So if you're on that version, you can always safely upgrade to any Valkey version; Valkey is fully backwards compatible in that sense. There are some newer versions of Redis. I mentioned Redis went to a proprietary license; they've actually since moved back to AGPL. So if you're on either the proprietary or AGPL versions of Redis, there might be some incompatibilities you have to look for. But we see that most users on old versions of Redis are able to move safely to Valkey. And so what does that look like? Valkey and Redis are typically used as a cache, so typically you can just delete the cache, move to Valkey and it should work just fine. But one of the reasons people really like Valkey is that it has high availability options.

So what you can do is basically attach replicas to your existing cluster, sync all the data, and then do failovers. This is what we call an online upgrade process. If you're using a managed service like ElastiCache or Memorystore or Aiven, they'll make that all seamless for you; you usually just click a button. I can speak a little for ElastiCache since I work on the service. It really is just that you click a button on the console and say, "Hey, I want to move to Valkey", and it does it all. It's all online, you don't really have any outage, and it's quite seamless. So the upgrade process is pretty straightforward. From a tooling and client-side application perspective, most clients that work with Redis will also work with Valkey. The big ones, like redis-py and the Spring Data Redis provider, all work just as well with Valkey as they do with Redis.

One thing I think I mentioned in the talk as well: what we've heard from a lot of users is that they move to Valkey and it's so seamless and easy. We've been trying to get people to write blog posts saying, "Hey, this is what we migrated and this is what we learned". And they're like, "We migrated and we learned nothing. We just clicked a button". So we're still working on getting user stories.

Thomas Betts: Yes, you're a victim of your own performance. You made it too easy.

Madelyn Olson: Right. Too easy. And part of that was because the fundamentals were really good, right? There was already an online upgrade story, and we are a fork, so it's not like we're trying to build compatibility. We start with compatibility for free and we just have to maintain it. So yes, the compatibility story's been pretty good.

Valkey is a hash map over TCP [06:26]

Thomas Betts: I do want to get into what was the focus of your QCon presentation. My naive understanding is Valkey's just a cache. It's a key value store: I stick stuff in, I pull stuff out. And you said it's a hash map over TCP. Can you say a little bit more about that? What is it underneath the covers? What's the magic of Valkey?

Madelyn Olson: Yes, I think you synthesized the simple version. The version that most people know about Valkey is that it's just a hash map, a key value store. But people know that the values can be more than strings, right? In a traditional key value store, something like Memcached, the key is a string and the value is a string. The real power of Valkey is that it has complex data types as values. For one of the use cases we like to talk about, you can store a set and ask, "Has this user logged in recently? Should we show an advertisement to this user?" You can store all that in set objects and then do very quick checks on that data type. So I think that's the first thing that really differentiates Valkey from just a simple hash map.
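To make that logged-in-users example concrete, here is a minimal sketch using the hiredis C client against Valkey's set commands (the key name and user ID are illustrative; Valkey speaks the same protocol as Redis, so the client works unchanged):

```c
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    /* Connect to a local Valkey server (default port shown). */
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) {
        fprintf(stderr, "connection failed\n");
        return 1;
    }

    /* Record that a user logged in: an O(1) insert into a set. */
    redisReply *r = redisCommand(c, "SADD users:logged_in %s", "user:42");
    freeReplyObject(r);

    /* Later: has this user logged in recently? An O(1) membership check. */
    r = redisCommand(c, "SISMEMBER users:logged_in %s", "user:42");
    printf("logged in recently? %s\n", r->integer ? "yes" : "no");
    freeReplyObject(r);

    redisFree(c);
    return 0;
}
```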

And then the other thing: the hash map itself is straightforward. It's all the stuff around it that's really complicated. Stuff like horizontal clustering, replication, durability, observability and statistics. That's where most of the work goes into Valkey, right? The actual core hash map and the data is straightforward, but it's all this other stuff around it that we have to maintain and build, and that's where we spend most of our engineering time.

Improving the performance without breaking anything [07:59]

Thomas Betts: That's what makes it a product and not just a hash map. Not that I couldn't implement a hash map myself; there's just no reason for me to do that. The reason for having it as a service on some scalable infrastructure is all the things you said. And once you have all those things, that adds layers of complexity. If I recall, your presentation was talking about basically changing everything under the covers, but keeping all of the horizontal clustering, the durability, everything else you mentioned. All that had to stay working while you tweaked around inside, right?

Madelyn Olson: Right. And that was really the gist of the talk. The talk was in the modern performance track at QCon, and we had recently rebuilt this hash table. That's exciting in itself, but what's really impressive is we didn't have any performance regressions. So I guess I can just start telling the story now. Back in 2022, we were thinking about this. This is actually back in the Redis days, pre-fork. Me and some of the other contributors were like, "Hey, we built a hash table in 2009". And although some of this stuff was known in 2009, people were still big into simplicity; they didn't want to over-engineer. So we built a hash table that was pretty good at the time, and looking back at it, we were like, "Hey, we can make stuff a lot better". And the big thing we realized was that we were doing a lot of independent memory allocations.

So when we wanted to take an object and put it inside Valkey, we were building container objects, we were using linked lists to keep track of objects that hash to the same bucket inside the hash table, and we had a relatively high load factor. Load factor is basically the ratio of how many entries are actually stored in a hash table versus how many places there are to put them. And that's because we were using older techniques and we weren't taking great advantage of modern hardware. One of the big things that's happened in the last 10 or so years is that hardware hasn't gotten that much faster, but it's gotten better at operating on multiple pieces of data at the same time.
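To picture the old design: each entry was its own heap allocation, and collisions were chased through next pointers. A simplified sketch of that kind of chaining hash table (field names are illustrative, not the actual Valkey source):

```c
#include <stddef.h>

/* One entry per key, individually allocated; entries that hash to the
 * same bucket form a linked list (chaining). */
typedef struct entry {
    char *key;            /* separately allocated key string */
    void *value;          /* separately allocated value object */
    struct entry *next;   /* next entry in this bucket's chain */
} entry;

typedef struct dict {
    entry **buckets;      /* array of bucket heads */
    size_t num_buckets;
    size_t used;          /* number of stored entries */
} dict;

/* Load factor: stored entries divided by available buckets. Chaining
 * tolerates a high load factor, but every extra hop in a chain is a
 * pointer chase to a scattered allocation, i.e., a likely cache miss. */
static double load_factor(const dict *d) {
    return (double)d->used / (double)d->num_buckets;
}
```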

So the hash table we built really wasn't aware of that functionality. Those were the things we were trying to fix by going over this hash table, and it took us a while. Let's see, I think we started working on this hash table modernization in 2023, and it took us until basically the end of last year to finish it all and make sure it all worked. And there were a lot of problems we had to solve, even problems I didn't talk about in the talk, because the talk had to be simplified to cover one specific area. For example, one of the problems we had to solve was in the clustering mode. When Valkey is horizontally distributed, you have multiple different servers. Each key is hashed to a specific slot, and that's deterministic. So the key foo will always be, say, slot 12,000.

And how these slots are distributed across the nodes is dynamic. The way you scale out is you move slots to different nodes, and you move those keys along with them. So one of the things we have to do is know which keys are in which slot on a node: we need an O(1)-ish way to iterate over the keys in a given slot when we want to migrate them. In the original version, when we forked Valkey off, the data structure we were using for this was basically a giant linked list of all the keys. We maintained this linked list, which costs 16 bytes of pointers per key, right? A pointer to the next entry and one to the previous, like a linked list. And that was really expensive. 16 bytes doesn't sound like a lot, but when your average data item is only around 100 bytes, that's a lot of overhead.

So one of the things we did, instead of having these linked lists, is we decomposed this giant hash table, which comprised all the data, into basically a dictionary per slot. Now, that sounds conceptually straightforward: you first compute the slot the key is in, and then you go look for it in the specific per-slot dictionary. But there are certain things Valkey does that operate on the whole data set, stuff like expiration and eviction. And the way those worked is they were sampling items from the dictionary to determine which ones to kick out. All of a sudden we needed to do this across upwards of thousands of these per-slot dictionaries. So we actually had to spend a bunch of time on research.
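For reference, the deterministic key-to-slot mapping works roughly like this: hash the key with CRC16 and reduce it modulo 16,384, the fixed number of cluster slots. A sketch (the real implementation also honors hash tags, so related keys can be forced into the same slot):

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_SLOTS 16384   /* the cluster keyspace is split into 16,384 slots */

/* CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), the
 * checksum the cluster spec uses for slot hashing. Bitwise version for
 * brevity; real implementations use a lookup table. */
static uint16_t crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i]) << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Same key, same slot, on every node: the mapping is pure and
 * deterministic. With a per-slot dictionary, migrating a slot means
 * iterating just that dictionary instead of scanning a global list. */
static uint16_t key_to_slot(const char *key, size_t len) {
    return crc16(key, len) & (NUM_SLOTS - 1);   /* same as % 16384 */
}
```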

We prototyped a bunch and we consolidated on this data structure called a binary index tree, which basically lets us sample randomly across all of these per-slot dictionaries proportionally to how much data is in them. A binary index tree is basically a binary tree that stores cumulative counts of the number of items in the leaf nodes. So we're able to pick a random number, and that tells us which slot dictionary we should be sampling to get a specific item out. That's a problem we had to solve, and we solved it for Valkey 8; that was sort of our first big jump.

And then the next big jump was, instead of having dedicated allocations for the keys and the value objects themselves, we started compacting all this memory together. In my talk, I talked about it as moving from static structures, a bunch of small fixed-size structures, to dynamically allocating bigger blocks of memory, which is more aligned with how high performance caches are built nowadays. We took a lot of inspiration from stuff like Segcache, which is based on top of Pelikan, a caching framework. I know I just talked about a lot of different low level details, but this is all the stuff that makes me really excited about everything that's going on in Valkey.
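A binary index tree (also called a Fenwick tree) keeps cumulative counts, so you can both update a slot's key count and draw a weighted random slot in O(log n). A minimal sketch of the sampling idea (illustrative, not the actual Valkey implementation):

```c
#define NUM_SLOTS 16384   /* one leaf per cluster slot */

/* Standard Fenwick layout: tree[i] holds a partial sum of the key
 * counts, 1-indexed internally. */
static long tree[NUM_SLOTS + 1];

/* Record that `delta` keys were added to (or removed from) a slot. */
static void bit_update(int slot, long delta) {
    for (int i = slot + 1; i <= NUM_SLOTS; i += i & (-i))
        tree[i] += delta;
}

/* Total keys across all slots (prefix sum up to the last slot). */
static long bit_total(void) {
    long sum = 0;
    for (int i = NUM_SLOTS; i > 0; i -= i & (-i))
        sum += tree[i];
    return sum;
}

/* Find the slot containing the r-th key, 0 <= r < bit_total(). Picking
 * r uniformly at random therefore selects each slot with probability
 * proportional to how many keys it holds. */
static int bit_sample(long r) {
    int pos = 0;
    for (int step = NUM_SLOTS; step > 0; step >>= 1) {  /* NUM_SLOTS is a power of two */
        if (pos + step <= NUM_SLOTS && tree[pos + step] <= r) {
            pos += step;
            r -= tree[pos];
        }
    }
    return pos;   /* 0-indexed slot */
}
```

In use, every insert would call bit_update(slot, +1) and every delete bit_update(slot, -1); expiration and eviction can then draw a random r below bit_total() and sample a key from whichever per-slot dictionary bit_sample(r) returns.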

What does performance mean for Valkey [13:58]

Thomas Betts: Well, I just love talking to someone who's this passionate about the specific thing they do. You couldn't do this if you weren't that excited about it. So you said the first jump was that binary index tree, and then you started to make other changes. And I think if I go back a couple minutes in the recording, you said there were no performance regressions. How are you measuring performance as you're making these changes? What does performance mean?

Madelyn Olson: It's funny, when I was talking with some folks at QCon, this was one of the soapboxes I like to get on. A lot of people, when they think about performance, usually think about latency: how long does it take to get a response back from when you send a request? But the thing with Valkey is that Valkey is so fast that the actual time to get a response back is almost entirely dominated by the network. Most simple commands in Valkey take about one microsecond. So if you're doing any network hops, which is the intended mode of Valkey, you're going to see at least hundreds of microseconds for a hop within an AZ, and up to a millisecond if you go across AZs. So when we talk about performance, we're almost always actually talking about throughput, because once you hit the limits of throughput, you start seeing these huge latency spikes because of contention within the engine itself.

So when I talk about performance, I'm always talking about throughput. So to your actual question, how do we measure throughput? Throughput's pretty easy to measure, right? You just send a lot of traffic to the engine and see how much it can actually process at a given time, which is, I mean, not trivial, but relatively straightforward. We have built-in tooling inside Valkey to do load testing and send lots of traffic, a tool called valkey-benchmark, and we're currently evaluating some other approaches. That's our end goal, the end thing we have to compare against. But we also do a lot of what we call micro benchmarking. It's nice: when you have a bunch of C code, you can just put it on a machine, run it 10,000 times and see how long it takes. So when we were rebuilding this hash table in a bunch of different ways, at every step along the way we were doing this micro benchmarking to see how long it took before versus how long it takes now. And that was the best way to guide our performance journey and make sure we weren't regressing in weird cases.
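That style of micro benchmark can be as simple as a monotonic-clock timing loop around the code under test. A sketch (the measured function is a hypothetical stand-in, not a Valkey API):

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the code under test; the volatile sink
 * keeps the compiler from optimizing the loop away. */
static volatile unsigned long sink;
static void operation_under_test(void) { sink++; }

int main(void) {
    enum { ITERATIONS = 10000 };
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERATIONS; i++)
        operation_under_test();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    printf("%.1f ns/op over %d iterations\n", ns / ITERATIONS, ITERATIONS);
    return 0;
}
```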

And then the other thing, which we don't do a lot but should do a lot more: in our world, a lot of the performance cost is time spent waiting for main memory access. In the same way that fetching something from disk takes a long time for other systems, in our world fetching something from main memory takes a long time. So we spend a lot of time looking at CPU counters, which tell us how much time we're waiting; the terminology is backend stalls, right? How long are we waiting for memory to be available to be processed? We look at those counters to see, "Hey, are we actually doing a good job prefetching memory?" That's another big thing we care a lot about: before we actually execute a command, we want to make sure all that memory has been pulled from main memory into the CPU caches so the command can execute very quickly. So we compare how much time we're spending in those different areas, like executing commands versus stalling for memory. And then there's also perf and flame graphs and things like that. Perf is a way to basically sample where the CPU is spending its time, and flame graphs let you visualize that.
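The prefetching idea, in miniature: before executing a batch of commands, issue prefetches for the memory each one will touch so the loads overlap instead of stalling one after another. A sketch using the GCC/Clang builtin (the batching structure is illustrative, not Valkey's actual pipeline):

```c
#include <stddef.h>

typedef struct entry entry;   /* opaque hash table entry */

/* Process a batch of lookups in two passes: first start pulling each
 * target entry into the CPU cache, then do the real work once the
 * loads have had time to complete. */
static void process_batch(entry **targets, size_t n,
                          void (*execute)(entry *)) {
    /* Pass 1: hint the hardware to fetch each entry's cache line.
     * Arguments: address, 0 = read, 3 = high temporal locality. */
    for (size_t i = 0; i < n; i++)
        __builtin_prefetch(targets[i], 0, 3);

    /* Pass 2: by now the lines are (hopefully) in cache, so each
     * dereference avoids a full round trip to main memory. */
    for (size_t i = 0; i < n; i++)
        execute(targets[i]);
}
```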
