Transactions

What is the most important part of a database? If your answer is “the data,” you’re absolutely right. RDBMSs like PostgreSQL are designed to handle all aspects of processing and retrieving data, ensuring reliability and validity. They ensure that every read and write operation adheres to a strict set of rules that preserve the integrity of the stored information. Without these rules, data would quickly become unreliable and inconsistent, making the entire system untrustworthy. In this article, you’ll learn about the problem RDBMSs solve best: data races between concurrent operations, and how transactions provide a robust solution to them.

Problematic Operations

All of them except SELECT: reading never changes the values it reads, so it can be repeated safely. Manipulating data, on the other hand, introduces data races, which arise because RDBMSs allow multiple connections to access the same data simultaneously. If only one connection could be established at a time, this wouldn’t be an issue, but it would also make the database impractical for real-world applications.

The solution to this problem is called a Transaction. A transaction is a mechanism that ensures a group of operations runs as a single unit of work. Such a unit may contain one or more queries that alter data. The key principle is that all queries within the transaction must succeed; otherwise, the entire transaction is rolled back. Four key properties determine whether an RDBMS is transactional, and they are represented by the acronym ACID.
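As a minimal sketch of a unit of work, here is a runnable example using Python’s built-in sqlite3 module (the table and values are made up for illustration; the same SQL idea applies to PostgreSQL): both statements commit together, or neither does.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # if anything failed, the whole unit of work was rolled back

print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# both updates applied together: [(70,), (80,)]
```

With a driver like psycopg for PostgreSQL, the same pattern appears as BEGIN / COMMIT around the statements.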

“A” Is for Atomicity

Each transaction is a set of SQL queries that is treated as a single unit. If even one query fails, the whole operation is considered failed. Consider a set of queries responsible for creating a new user in the system. The first query inserts a record into the users table, the second inserts data into the user_profiles table, and the last one updates the user_count table, which tracks the total number of created accounts. If, after the second query, the PostgreSQL server crashes, the third query won’t execute, leaving the system in an inconsistent state. To prevent this, the entire transaction is rolled back to the initial state, undoing the user and profile inserts so that user_count remains accurate. In short, Atomicity ensures an all-or-nothing approach.

Atomicity is particularly useful in financial applications. Imagine transferring money between two bank accounts. The transaction should debit one account and credit another. If the debit is successful but the credit fails, the system would have an incorrect balance. Thanks to atomicity, the entire transaction rolls back, ensuring that no partial operations affect the data.
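The transfer above can be sketched as follows (again with Python’s sqlite3 and made-up table names; the simulated failure stands in for any error between the debit and the credit): because the debit and the failing credit step share one transaction, the rollback leaves both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

try:
    with conn:  # all-or-nothing unit of work
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")  # debit
        raise RuntimeError("credit step failed")  # simulate a crash mid-transfer
except RuntimeError:
    pass

# The debit was rolled back along with the rest of the transaction.
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(100,), (50,)]
```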

“C” Is for Consistency

Another critical property of a transaction is Consistency. It ensures that all constraints on tables and columns are respected. Suppose the users table enforces constraints requiring the username, email, and password columns to be strings with a minimum length of six characters. If an insertion attempt violates any of these constraints, the transaction fails. This prevents corrupt or invalid data from entering the system.

Consistency is maintained through constraints, such as foreign keys, unique indexes, and checks. Without these, inaccurate or contradictory data could be stored, leading to significant issues in reporting and business logic. Consistency also guarantees that every transaction moves the database from one valid state to another.
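A small sketch of the minimum-length rule from the example above, enforced with a CHECK constraint (column names are assumptions for this illustration; sqlite3 again stands in for PostgreSQL): the invalid row is rejected and never enters the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK and UNIQUE constraints keep invalid or contradictory rows out.
conn.execute("""
    CREATE TABLE users (
        username TEXT NOT NULL CHECK (length(username) >= 6),
        email    TEXT NOT NULL UNIQUE
    )
""")

try:
    conn.execute("INSERT INTO users VALUES ('bob', 'bob@example.com')")  # too short
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the constraint violation aborts the statement

conn.execute("INSERT INTO users VALUES ('robert1', 'bob@example.com')")  # valid
print(conn.execute("SELECT count(*) FROM users").fetchone()[0])  # 1
```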

“I” Is for Isolation

A transaction is considered isolated when it runs independently of other operations affecting the same tables. This does not mean that only one operation runs globally; only those that could interfere with the transaction’s context are restricted. To manage this, the RDBMS queues transactions, executing them sequentially or in parallel while ensuring data integrity.

However, such queuing impacts performance: conflicting transactions must wait for one another, so the time to complete a transaction grows with the work queued ahead of it. This trade-off is necessary to ensure data validity. Some systems choose lower isolation levels to optimize performance, depending on the business requirements.

There are two general strategies for achieving Isolation: Optimistic Locking and Pessimistic Locking.

Optimistic Locking

This approach ensures that data remains unchanged between selection and processing within a thread. It involves adding an extra column to the table to track the current revision of a record. The WHERE clause of the UPDATE query then checks the revision number that was read earlier. If the stored revision no longer matches, indicating that the data has been modified in the meantime, the update affects no rows, and the application can retry or report a conflict instead of silently overwriting someone else’s change. This method is useful in systems without persistent connections, such as HTTP-based backends.

Optimistic locking is well-suited for scenarios with low contention, where the likelihood of conflicts is minimal. However, if multiple clients attempt to modify the same data frequently, the chances of retries increase, reducing efficiency.
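The revision-column pattern can be sketched like this (Python/sqlite3 for a runnable example; for brevity one connection plays both clients, and the table is made up): the stale update matches zero rows, which is the signal to retry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, revision INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 'widget', 1)")
conn.commit()

# Client A reads the row and remembers its revision.
name, revision = conn.execute("SELECT name, revision FROM items WHERE id = 1").fetchone()

# Meanwhile another client updates the same row, bumping the revision.
conn.execute("UPDATE items SET name = 'gadget', revision = revision + 1 WHERE id = 1")
conn.commit()

# Client A's update checks the remembered revision in the WHERE clause.
cur = conn.execute(
    "UPDATE items SET name = ?, revision = revision + 1 WHERE id = 1 AND revision = ?",
    ("doohickey", revision),
)
print(cur.rowcount)  # 0 -> the row changed under us; retry or report a conflict
```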

Pessimistic Locking

This strategy locks the entire record for exclusive use by the current thread. Other processes must wait until the lock is released before they can access the row. Many RDBMSs, including PostgreSQL, have built-in support for this approach. However, it comes with a downside: Deadlocks. A deadlock occurs when multiple transactions wait for each other to release locks, so none of them can make progress. PostgreSQL detects deadlocks and aborts one of the waiting transactions; avoiding them in the first place, for example by always acquiring locks in a consistent order, requires additional care on the application side.

Pessimistic locking is ideal for high-conflict scenarios, such as inventory management systems where concurrent updates are frequent. However, it requires careful handling to avoid unnecessary performance bottlenecks.
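In PostgreSQL this is typically done with SELECT ... FOR UPDATE inside a transaction. As a rough, runnable analogue in Python/sqlite3 (which locks at database granularity rather than per row; the file path and table are made up), a second writer cannot proceed until the first lock is released:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER)")
writer.execute("INSERT INTO stock VALUES (1, 5)")
writer.commit()

# Pessimistically take a write lock up front.
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE stock SET qty = qty - 1 WHERE id = 1")

# A second connection (timeout=0 so it fails fast instead of waiting)
# cannot acquire the lock while the first transaction holds it.
other = sqlite3.connect(path, timeout=0)
try:
    other.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError as e:
    print("blocked:", e)  # e.g. "database is locked"

writer.commit()  # releasing the lock lets other writers proceed
other.execute("BEGIN IMMEDIATE")  # now succeeds
other.rollback()
```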

“D” Is for Durability

The final requirement for an ACID-compliant transaction is Durability, ensuring that committed changes are permanent. A transaction is not considered successful until its changes have been flushed to persistent storage, for example via an fsync acknowledged by the operating system. This guarantees that data remains intact even after system failures such as power outages, machine restarts, or network disruptions.

Durability relies on mechanisms like write-ahead logging (WAL), which records changes in a durable log before they are applied to the data files. This allows databases to recover quickly from crashes without losing committed data.

Summary

ACID transactions provide essential safeguards to ensure data integrity, consistency, and reliability in an RDBMS. It’s crucial to remember that data is the most valuable asset in any system. While application code can be fixed if it contains bugs, losing even a small amount of data can render an entire system unusable. Understanding how ACID properties work helps developers design robust applications that protect data integrity.

In the next article, I’ll dive deeper into Transaction Isolation Levels, covering different levels like Read Uncommitted, Read Committed, Repeatable Read, and Serializable, so stay tuned.