Vibe Coding In Anger: Part 6
June 17, 2025
Last time on our journey to build a vibe coded time series database we completed the Query Parser, including the lexer and the ability to generate an AST for the query language. This time we’ll be working on the Execution Framework which should wrap up Phase 4: Query Foundation from the TODO. The AI will be generating the query planner and the framework for executing the queries. It should be fun! As always, the code is available on github.
Jumping right in, I decide to give it a little warning about jumping ahead, since it has done that on a few occasions.
Please start on the ‘Query planner’ TODO items in the ‘Execution Framework’ section. Please don’t move on to any other tasks until ‘Query planner’ has been completed.
It generated two new pieces of code: an IndexInfo in the storage module and a QueryPlanner in the query module. Let’s start by looking at the IndexInfo, since it’s going to be used by the QueryPlanner.
pub struct IndexInfo {
    pub name: String,
    pub time_range: TimeRange,
    pub tag_keys: Vec<String>,
    pub estimated_rows: usize,
}
There’s an index over some TimeRange, in this case always an absolute time range, and within that time range are some known tag keys. You can ask some obvious questions of the IndexInfo, such as ‘does my provided time range intersect with yours?’ and ‘does the index satisfy this tag filter?’, which gives you the ability to skip querying certain indices (this is a new concept?) when query planning (I assume). The various IndexInfo are never populated or updated. They’re disconnected from the flushing process, and really it looks like a lower-fidelity version of the SSTableCatalog, which is also disconnected from everything. Nothing is connected. Well, it’s even worse: the QueryPlanner is connected to the IndexInfo, which is connected to nothing.
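Those two pruning questions could plausibly look like the sketch below. To be clear, this is my reconstruction, not the generated code: the method names (`overlaps`, `satisfies_tags`) and the simplified absolute-only `TimeRange` are assumptions for illustration.

```rust
// Hypothetical sketch of the pruning questions you could ask an IndexInfo.
// TimeRange is simplified to an absolute (start, end) pair.

#[derive(Debug, Clone, Copy)]
pub struct TimeRange {
    pub start: i64,
    pub end: i64,
}

pub struct IndexInfo {
    pub name: String,
    pub time_range: TimeRange,
    pub tag_keys: Vec<String>,
    pub estimated_rows: usize,
}

impl IndexInfo {
    /// "Does my provided time range intersect with yours?"
    pub fn overlaps(&self, query: &TimeRange) -> bool {
        self.time_range.start <= query.end && query.start <= self.time_range.end
    }

    /// "Does the index know about every tag key the filter touches?"
    pub fn satisfies_tags(&self, filter_keys: &[String]) -> bool {
        filter_keys.iter().all(|k| self.tag_keys.contains(k))
    }
}

fn main() {
    let idx = IndexInfo {
        name: "sst-001".to_string(),
        time_range: TimeRange { start: 0, end: 1000 },
        tag_keys: vec!["host".to_string(), "region".to_string()],
        estimated_rows: 10_000,
    };
    // An index that fails either check can be skipped during planning.
    let q = TimeRange { start: 1500, end: 2000 };
    println!("{}", idx.overlaps(&q)); // false -> this index can be pruned
}
```

If either check fails, the planner can drop the index from consideration, which is the whole point of keeping this metadata around.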
It would make much more sense to populate and actually use the SSTableCatalog and have the QueryPlanner make use of that. The SSTableCatalog doesn’t really know about keys, but it does know about series names, so I guess we really want some kind of combination where the SSTableCatalog can track tags. We also have the QueryRouter in the storage module, which is likewise hooked up to nothing, but can route to the correct memtable/SST based on a time range and a series name. We have concentric data encapsulation, starting with the LSMT core and losing fidelity outwards to the IndexInfo. None of it is actually wired together. As a vibe coder, none of this is apparent to me; everything I glance at looks right enough.
Looking at the QueryPlanner we have a thing which can take the Query from the AST and produce a QueryPlan which looks like this:
#[derive(Debug, Clone)]
pub struct QueryPlan {
    pub index_selections: Vec<IndexSelection>,
    pub group_by: Vec<String>,
    pub order_by: Vec<(String, bool)>,
    pub limit: Option<usize>,
    pub offset: Option<usize>,
}
The QueryPlan holds on to the indices that could contain data (I presume) and then holds what is essentially the same info that came in on the Query (the AST Query, not the storage Query), but without the type information. This all feels sloppy to me. I know that the entire project fits into context. It’s greenfield and not really referencing anything outside of the project, and yet nothing feels like it’s supposed to fit together. It’s many different jigsaw puzzles of nearly identical pictures mixed together. Which Query am I supposed to use? Do they fit together? No matter how hard you squint, you can’t see the pieces fitting correctly.
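For a feel of what the planning step seems to amount to, here is a minimal sketch: select candidate indices by time overlap and copy the query’s shaping clauses across. The names (`IndexSelection`, a bool-means-descending convention in `order_by`) are my assumptions about the generated code, not confirmed details.

```rust
// Minimal planning sketch: prune indices by time overlap, then carry the
// shaping clauses (group_by, order_by, limit, offset) into the plan.

#[derive(Debug, Clone, Copy)]
struct TimeRange { start: i64, end: i64 }

#[derive(Debug, Clone)]
struct IndexSelection { index_name: String }

#[derive(Debug, Clone)]
struct QueryPlan {
    index_selections: Vec<IndexSelection>,
    group_by: Vec<String>,
    order_by: Vec<(String, bool)>, // (column, descending?) -- an untyped bool
    limit: Option<usize>,
    offset: Option<usize>,
}

fn overlaps(a: TimeRange, b: TimeRange) -> bool {
    a.start <= b.end && b.start <= a.end
}

fn plan(query_range: TimeRange, indices: &[(String, TimeRange)]) -> QueryPlan {
    QueryPlan {
        // Keep only indices whose time range intersects the query's.
        index_selections: indices
            .iter()
            .filter(|(_, r)| overlaps(*r, query_range))
            .map(|(name, _)| IndexSelection { index_name: name.clone() })
            .collect(),
        group_by: Vec::new(),
        order_by: vec![("time".to_string(), false)],
        limit: None,
        offset: None,
    }
}

fn main() {
    let indices = vec![
        ("sst-old".to_string(), TimeRange { start: 0, end: 100 }),
        ("sst-new".to_string(), TimeRange { start: 90, end: 200 }),
    ];
    let p = plan(TimeRange { start: 150, end: 300 }, &indices);
    println!("{:?}", p.index_selections); // only "sst-new" survives pruning
}
```

Notice how little typed information survives into the plan; the `(String, bool)` order-by pairs are exactly the kind of stringly-typed flattening the paragraph above complains about.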
Anyway, moving on to the remainder of Execution Framework we have the Execution Framework and Tests sections left, so let’s knock those out.
Please complete the ‘Execution Framework’ section of the TODO
It then did something funny: it marked the TODO items done and said, in the extremely verbose LLM way, that all of the items were completed. I don’t think so! Since I’m vibe coding, maybe I’ll just remark on the fact that I didn’t see any code generated.
Have the ‘Parallel data fetching’ and ‘Early result pruning’ sections been completed? I didn’t see any new code written.
After some back and forth it acquiesces and writes some code, as well as checking the TODO items off again. We now have a QueryExecutor!
pub struct QueryExecutor {
    /// The active MemTable
    memtable: Arc<RwLock<MemTable>>,
    /// The SSTable catalog
    sstables: Arc<RwLock<Vec<Arc<SSTable>>>>,
    /// Execution configuration
    config: ExecutionConfig,
    /// Current memory usage
    memory_usage: Arc<Mutex<usize>>,
    /// Cancellation flag
    cancelled: Arc<Mutex<bool>>,
}
If you remember, the storage module has a QueryRouter which looks like this:
pub struct QueryRouter {
    /// The active MemTable
    memtable: Arc<RwLock<MemTable>>,
    /// The SSTable catalog
    sstables: Arc<RwLock<Vec<Arc<SSTable>>>>,
}
You’ll remember that the QueryRouter returns a collection of DataPoint. Looking at the new QueryExecutor, it also returns a collection of DataPoint, but wrapped in an ExecutionResult. The QueryRouter and QueryExecutor do not interact at all. In fact, the two more or less have the same logic, where they first consult the memtable and then walk the sstables. The logic to inspect each block in the sstables is almost exactly the same (it’s worth comparing the code for QueryRouter and QueryExecutor). All in all, the QueryExecutor is a more featureful QueryRouter. So, what else is happening in this thing? There is an ExecutionConfig which has the following knobs:
pub struct ExecutionConfig {
    /// Maximum number of concurrent tasks
    pub max_concurrent_tasks: usize,
    /// Memory limit in bytes
    pub memory_limit: usize,
    /// Timeout for query execution
    pub timeout: Duration,
}
The QueryExecutor is meant to be instantiated per individual query, so this config can be set per query. That’s neat and makes it easy to tweak these values in a live system. Now, looking at the knobs: max_concurrent_tasks is never referenced. There is parallel query execution against multiple SSTs in execute_query_internal, where it looks like this config would be used. You might wait for a task to finish once the max concurrent tasks have been reached before starting a new one. That would make sense to me, but it’s not set up that way. Currently, it just spawns an unbounded number of concurrent tasks regardless of the config. Similarly, the QueryExecutor has a memory_usage member which is initialized to 0. Inside of execute_query_internal we check the value of memory_usage against the configured limit:
let mut usage = memory_usage.lock().await;
if *usage > self.config.memory_limit {
    return Err(ExecutionError::MemoryLimitExceeded);
}
This would be interesting if memory_usage were ever updated. It’s not even clear what it’s supposed to be counting; I suppose it’s the total in-memory size of the DataPoints being collected. Finally, there’s a timeout which is handed off to Tokio (an asynchronous runtime for Rust). That part is neat because it is a timeout on the task that itself spawns n tasks. If the outermost task times out, it gives up and returns to the user. One out of three config values doing something ain’t bad.
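For what honoring max_concurrent_tasks could look like, here is a dependency-free sketch of the bounding idea: run work in batches of at most `max_concurrent` threads, joining each batch before starting the next. The project actually spawns Tokio tasks, where the idiomatic tool would be a `tokio::sync::Semaphore` rather than batching, so treat this purely as an illustration of the bound, not as the fix the codebase would use verbatim.

```rust
use std::thread;

// Run closures with at most `max_concurrent` running at once by spawning
// them in batches and joining each batch before starting the next.
fn run_bounded<T, F>(tasks: Vec<F>, max_concurrent: usize) -> Vec<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let mut results = Vec::new();
    let mut pending = tasks.into_iter();
    loop {
        // Spawn at most `max_concurrent` tasks at a time...
        let handles: Vec<_> = pending
            .by_ref()
            .take(max_concurrent.max(1))
            .map(|f| thread::spawn(f))
            .collect();
        if handles.is_empty() {
            break;
        }
        // ...and wait for the whole batch before starting more.
        for h in handles {
            results.push(h.join().unwrap());
        }
    }
    results
}

fn main() {
    let tasks: Vec<_> = (0..5).map(|i| move || i * 2).collect();
    let results = run_bounded(tasks, 2);
    println!("{:?}", results); // [0, 2, 4, 6, 8]
}
```

The generated code does neither of these things: it just fires off one task per SSTable and hopes for the best.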
Looking at one of the tests we can get a feel for the interface:
// Create executor
let config = ExecutionConfig {
    max_concurrent_tasks: 2,
    memory_limit: 1024 * 1024, // 1MB
    timeout: Duration::from_secs(5),
};
let executor = QueryExecutor::new(memtable, sstables, config);

// Execute query
let mut query = Query::new();
query.from = "test_series".to_string();
query.time_range = Some(TimeRange::Absolute { start: 400, end: 1100 });
let results = executor.execute_query(&query).await.unwrap();
Remember, this is the AST’s Query, not the storage’s Query. I think we have the components where, with some glue, we could actually run this thing to some extent. I could envision the glue code needed to accept metrics via the ingestion Parser, have those go into the storage, and then throw a string query at the lexing code, which can then invoke the QueryExecutor. I’m not saying that the glue code is trivial, or that these pieces fit together in a coherent way (the theme of this post), but you can see it from a distance.
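Seen from that distance, the imagined pipeline is just this hand-wavy shape, with one stub type standing in for the project’s real ingestion Parser, storage engine, and QueryExecutor. Every name here is hypothetical; the point is only text in, points stored, query results out.

```rust
// Stub pipeline: ingest text lines, store points, answer a time-range query.
struct TinyDb {
    points: Vec<(i64, f64)>, // (timestamp, value)
}

impl TinyDb {
    // Stand-in for the ingestion Parser: accepts "timestamp,value" lines.
    fn ingest(&mut self, line: &str) -> Option<()> {
        let (ts, v) = line.split_once(',')?;
        self.points
            .push((ts.trim().parse().ok()?, v.trim().parse().ok()?));
        Some(())
    }

    // Stand-in for lexer -> parser -> planner -> QueryExecutor; the "query"
    // here is reduced to a bare time range.
    fn query(&self, start: i64, end: i64) -> Vec<f64> {
        self.points
            .iter()
            .filter(|(ts, _)| *ts >= start && *ts <= end)
            .map(|(_, v)| *v)
            .collect()
    }
}

fn main() {
    let mut db = TinyDb { points: Vec::new() };
    let _ = db.ingest("100,1.5");
    let _ = db.ingest("200,2.5");
    println!("{:?}", db.query(150, 250)); // [2.5]
}
```

The real glue would thread through all the mismatched Query types along the way, which is exactly where I expect it to get painful.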
Next time, we’ll expand on the Query Foundation by starting on the Aggregation System. The power of a time series database is in aggregating the data points! That should be exciting! For reference, the code up to this point can be found here.