Building A Log-Structured Merge-Tree: Part 3

December 13, 2020

Last time we looked at flushing the memtable to disk and by producting a series of Static Sorted Table (SST) files that held all of the memtable’s key value pairs. I think that maybe that blog came out of order, but we’re going to roll with it. We’ll just have to do some hand-waving this time around to progress through this blog. We’ll introduce an SSTManager to help implement the public interface of the LSMT, and we’ll fill it in next time. For now, we can understand the SSTManager to be abstract and not dwell on the implementation.

Internal State

The lsmt struct we will store the internal state for the running database. We’ll keep the user provided Options, the active memtable, any memtables waiting to flushed to disk (inactive memtables), the SSTManager which will manage our on disk SST files, a lock for flushing to disk (we’ll build out the plumbing for flushing to disk in a later blog), and a closed flag.

type lsmt struct {
  options           config.Options
  activeMemtable    *mt.Memtable
  inactiveMemtables []*mt.Memtable
  sstManager        sst.SSTManager
  flushLock         common.Semaphore
  closed            bool
}

We’ll go over how these are used in the public interface in this blog and get into more detail in subsequent blogs. For now know that we’re going to hand the user an instance of the lsmt struct when they open the database.

Public Interface

The public interface for the LSMT is going to be pretty small and straightforward. Over time we’ll piece together the components behind the interface, but from here on out this will be a stable interface. We’ll need to keep this interface in mind and remember that everything under the hood is in service of these five functions.

func (db *lsmt) Get(key []byte) ([]byte, error)
func (db *lsmt) Write(key, value []byte) error
func (db *lsmt) Delete(key []byte) error
func (db *lsmt) Iterator(start, end []byte) (common.Iterator, error)
func (db *lsmt) Close() error

Write and Delete

From previous posts we’ll recall that that only the memtable accepts writes. Since Write and Delete are the only non-read functions in the public interface it might be easiers to start here since the scope of the implementation is then limited to effecting the memtable.

The implementation for Write is here. We do some checks up front, namely, we ensure that the database isn’t closed, that the key and value aren’t nil or of size 0, and that the key and value conform to user specified options. If the write passes those checks then we simply invoke the Write method on the active memtable.

Delete is very similar, except it only takes a key instead of the key and value pair. The implementation for Delete is here. We’ll notice that Delete actually performs a Write to the active memtable and that it writes a Tombstone value. This means that in our functions that perform reads we’ll need to treat Tombstone values as deleted values. Since the LSMT works in levels we use Tombstone to track deletes through the levels. The special tombstone value can only be written during Deletes.

Get

The Get implementation is going to move through the available layers until it can find a value for the key being searched. There’s some special case logic in here for the Tombstone value. If, at any point, a Get to an underlying Memtable or the SSTManager returns a Tombstone value then we’ll return nil to the user to signify that the key has no value.

Iterator

At a high level, the Memtable and the SSTManager both provide Iterators to their own data. However, users of the LSMT don’t necessary want or need to know about the implementation details here, to the user there is only one Iterator type and it’s for the entire LSMT. Since this is the public interface’s Iterator implementation we’ll have to hide all of the implementation details.

In order to provide one single iterator then we’ll need an abstraction that allows many Iterators to compose into one Iterator instance. In service of this we’ll introduce the MergedIterator. The MergedIteraor will compose iterators, but has a few properties that are desirable for our use case. First, it will only produce one value per key. That means if that two or more of the underlying Iterators have the same key, only one key/value pair for that key will be returned.

Second, is that it resolves the multiple key problem by looking at Iterators in the order they are passed in. In our case, that means that we’ll order the Memtable before the SSTManager such that if both of their Iterators have the same key we’ll take the Memtable’s value for that key since it’s the most up to date.

Later on we’ll see that we might need to even compose MergedIterators into MergedIterators and we’ll see how powerful of an abstraction this can be. But, for now we’ll use a MergedIterator in our public interface’s Iterator implementation.

One thing to note in this implementation is that we have the inactiveMemtables which we’ll order behind the activeMemtable but in front of the SSTManager.

Close

Close will cause the database to stop accepting writes and do the background work to stop the LSMT. For now we’ll just set the closed struct value to true to signal that the database has been closed.

Next Up

Next time we’ll start to look at an implementation of the SSTManager for block based SST storage.