szpak

Indexes

When amount of data becomes huge, indexes help search it efficiently.


Index in a database means the same thing as an index in a book. Its main purpose is to quickly find data during query execution.

Managing Big Collections

Imagine you have one million followers on social media. For some reason, you need to contact five of them. You know their names, but that’s all the information you have. How do you deal with this situation? You could scroll through the entire list of users, sure, it’s only 15 minutes of moving your thumb up and down. However, you’re a smart person, so you use the search option. Thanks to this, you find all five followers in seconds. You type the name, hit the search button, and the list is filtered to show only the names matching your criteria.

The word instant is key here. Your expectation of an immediate result is why search functionality exists. The same applies to an RDBMS when querying data. The database system needs a way to efficiently index and retrieve values stored in columns. Whenever a query contains comparators like:

  • >
  • <
  • =
  • BETWEEN
  • IN

The database engine may use an index to quickly find the relevant records, just like you use the search option when looking for specific followers. “May” is important. The query planner can still choose a sequential scan when that is cheaper. In projects with hundreds of tables and billions of records, maintaining fast access to data is crucial. Indexes are one of the key features that help keep an RDBMS performant at scale.

Index Types in PostgreSQL 14

PostgreSQL supports several types of indexes, each optimized for different query patterns. If an index type isn’t explicitly specified, PostgreSQL defaults to B-Tree indexing.

B-Tree Index

B-Tree indexes are the most commonly used and work well in most scenarios unless the data type requires a more specialized approach.

The structure of a B-Tree index is based on a balanced tree. PostgreSQL’s query planner will use this index type whenever a query involves one of the following operators:

  • >
  • <
  • =
  • BETWEEN
  • IN
  • IS NULL
  • IS NOT NULL
  • LIKE (for left-anchored patterns, e.g., name LIKE 'John%')

A major advantage of B-Tree indexes is their ability to efficiently support sorting, making them ideal for queries that use ORDER BY. Additionally, they are well-suited for indexing primary keys and foreign keys, ensuring fast lookups in relational databases.

Hash Index

Hash indexes store 32-bit hash codes of values rather than the actual data. Because hashes are fixed in size, the only operator that can be used with this index type is:

  • =

PostgreSQL may use a hash index for simple equality comparisons when the planner estimates that it is cheaper than other options. However, B-Tree indexes often remain the default choice because they support more operators and more query shapes. One downside of hash indexes is that they cannot be used for range queries or sorting.

Generalized Inverted Index (GIN)

This type of index is primarily used for full-text search and other advanced indexing needs, such as JSONB and array fields.

Unlike standard indexes that map row locations to values, a GIN index inverts this relationship by mapping values to their locations. This makes it particularly efficient for searching within complex data structures.

How It Works

The best way to understand an Inverted Index (II) is with an example. Let’s take two text documents:

LocationNameContent
1FirstMan is the vainest of all creatures that have their being upon earth
2SecondThere are many creatures on the earth, but human is one of the vainest

The first step in creating an II is removing meaningless words (e.g., is, a, the, of). These are called stop words. After this operation, the content becomes:

New Content
Man vainest all creatures that have their being upon earth
There many creatures earth, but human one vainest

Next, we extract individual words and map them to their locations:

WordDocumentPosition in Text
man11
vainest12
vainest28
all13
creatures14
creatures23
that15
have16
their17
being18
upon19
earth110
earth24
there21
many22
but25
human26
one27

This structure is essentially a reverse index, allowing fast lookups of words across multiple documents. Real full-text search engines add more machinery: normalization, ranking, compression, and many implementation details. The principle is still the same: instead of scanning every document, jump from a term to the places where that term appears.

Operators Using GIN

The following operators trigger GIN index usage:

  • <@ (contained within)
  • @> (contains)
  • = (exact match)
  • && (overlap)

However, this efficiency comes at a cost. GIN indexes require more storage than B-Tree indexes and have higher maintenance overhead. While they significantly speed up full-text searches, JSONB queries, and array lookups, they add complexity to database management. Additionally, building and updating GIN indexes takes longer compared to B-Tree indexes, making them less suited for frequently updated datasets.

Conclusion

Indexes are a fundamental tool for optimizing database performance. B-Tree indexes handle most queries efficiently, hash indexes improve equality comparisons, and GIN indexes power full-text searches and complex data retrieval. Understanding these index types allows you to design performant database schemas that scale effectively. By choosing the right indexing strategy, you can significantly improve query performance, reduce server load, and enhance the user experience of your applications.