Go back

Database 101: Understanding B-Trees

Created at: 2024-03-09 19:43

Last updated at: 2024-08-04 22:29 (2a4b223)

Reading time: 3 min read (581 words)

Tags:

#database

Table of Contents

In the last months, I’m really interested in how database works internally. One of my questions was: “How databases manages indexes? And how they are organized into the storage?”.

I tried to find some resources to understand it, but I didn’t find a good resource that could explain to me in a clear way exactly how it works.

Then, I started to read the Database Internals book, inside it they have some chapters related to B-Tree and how they work. I really recommend this book if you want to learn better how databases work internally.

In this post, I’ll explain to you how B-Trees works as a way I would like to learn about it.

As a way to a better communication through the post, I’ll be calling B-Tree, but I think that it’s important to mention that it will be in notion of B-Tree as an “umbrella” for a given particular family of structures. In case, for our scenario of explanation, I’ll be using the “on-disk B+Tree” as our reference.

If you want to understand better the differences between B-Tree and B+Tree, I recommend you to read this post here.

The structure of a B-Tree

B-Trees are a kind of self-balancing tree data structure that maintains sorted data over a number of levels. It’s composed by nodes that have two specific properties: a key and a value, where the key is the data that will be sorted to let the tree be balanced and handle the binary search through it.

This tree are composed by three specific kind of nodes:

The structure of a B-Tree with the nodes and their kinds

A core property of B-Trees is what we can call it as “fanout”, which is the maximum number of childrens that each non-leaf node can have. The fact that the fanout is how much pointers targeting the children nodes

a node can have, it’s a key property to understand how B-Trees are efficient to search through the data.

In the example of the scenario above, the fanout of the tree is 3. This means that each non-leaf node can have up to 3 children. For a production-ready B-Tree, this fanout number would be much higher, in the order of 100 or more, but for the sake of simplicity, we’ll use 2 as a reference.

The B-Tree Operations

Why we need the balancing?

How B-Tree are being structured in storages?

How Database applies B-Tree structure?

Conclusion

Resources

----- < eof > ----- \ ^__^ \ (oo)\_______ (__)\ )\/\ ||----w | || ||