ACID in Database – A for Atomicity

Atomicity

Overview

Imagine a busy office where employees are working on multiple tasks at once. Each task needs to be completed fully for the office to run smoothly. If one task is only partially done, it can disrupt the whole workflow, causing confusion and inefficiency. Similarly, in the world of databases, maintaining data accuracy and consistency is crucial.

This is where atomicity plays a key role. Atomicity is a foundational principle in database management that ensures each transaction is handled as a single, complete unit, as known as a unit of work. This means that a transaction must either be fully completed or not executed at all. If any part of the transaction fails or encounters a problem, atomicity guarantees that all changes are undone, returning the database to its previous state. Just as completing each office task fully ensures smooth operations, atomicity ensures that databases remain accurate and reliable, even if something goes wrong.

Transaction everywhere and in MySQL

Everything in databases relates to transactions. Every statement such as INSERT, UPDATE, DELETE is executed in transactions. Whenever we run a query, the database management system will do it within an implicit transaction context.

And explicitly, it will start with BEGIN TRANSACTION and end with COMMIT or ROLLBACK

In MySQL, particularly with the InnoDB storage engine, there is an internal field called transaction ID associated with each row. This field is physically changed by INSERT, UPDATE and DELETE operations to record which transaction has locked the row. You can also check the transaction id (trx_id) with the below command

SELECT trx_id, trx_state, trx_started, trx_requested_lock_id, trx_wait_started, trx_mysql_thread_id, trx_query

FROM INFORMATION_SCHEMA.INNODB_TRX;

It may display nothing because there is no transaction running.

So how can databases understand when to commit or rollback if errors occur?. In MySQL databases using InnoDB engine, there are three types of log to perform atomicity:

  • Undo log
  • Redo log
  • Binary log

So, what are they?

For Undo log

According to MySQL documentation, Undo log is a collection of undo log records associated with a single read-write transaction. It is used to undo (rollback) changes if a transaction is aborted or if there is a need to restore the database to a consistent state. Then where are these logs stored in the database? From MySQL 8.0 and later, it will create a storage segment on disk called `undo tablespace` with at least 16MiB by default to store them. Notice that there is nothing said we can have only 1 undo tablespace. We can create it by ourselves by CREATE UNDO TABLESPACE.

Therefore, another transaction needs to see the original data as part of a consistent read, it will retrieve the unmodified data from this segment. In addition, if a transaction contains data more than the undo tablespaces can store, MySQL will automatically reallocate and assign more spaces to it.

For Redo log

What happens if your database crashes or shuts down accidentally while a transaction is in progress? When your database is up again, it will use the redo logs during crash recovery to correct data which are written by incomplete transaction. Modifications that did not finish updating data files before an unexpected shutdown are replayed automatically during initialization and before connections are accepted.

For Binary log

We have undo log to track actions in a transaction, redo log to track the whole database server if it shuts down due to a certain reason and how about the last ones?

The binary log contains “events” that describe database changes such as table creation operations or changes to table data. It also contains events for statements that potentially could have made changes (for example, a DELETE which matched no rows), unless row-based logging is used. The binary log also contains information about how long each statement took that updated data. The binary log has two important purposes:

1/ For replication, the binary log on a replication source server provides a record of the data changes to be sent to replicas. The source sends the information contained in its binary log to its replicas, which reproduce those transactions to make the same data changes that were made on the source.

2/ Certain data recovery operations require use of the binary log. After a backup has been restored, the events in the binary log that were recorded after the backup was made are re-executed. These events bring databases up to date from the point of the backup.

Furthermore, the binary log is not used for statements such as SELECT or SHOW that do not modify data.

Conclusion

In theory, everything in the database is performed in a transaction, all or nothing. This will make your data correct as expected. However, in practice, there are a lot of things to deal with. For instance, we often face popular cases such as retrieving data in a transaction by another transaction, locking tables or rows, missing data, etc. All of these relate to C in ACID – Consistency which the next blog will explain and dive deep into it.

Thank you for reading.

関連記事

カテゴリー:

ブログ未分類

情シス求人

  1. チームメンバーで作字やってみた#1

ページ上部へ戻る