How does a database maintain resiliency in the case of multiple failures?

Namrata
4 min read · Jun 25, 2024


In our previous discussion, How database maintains resiliency?, we explored how databases handle single points of failure. In this post, we will delve into managing more complex failures and ensuring consistency within a distributed system.


Scenario Overview

  • Generation 1: Node A is the leader. Bob sends a request to A. A accepts the request but crashes before replicating it to other nodes.
  • Generation 2: Node B is elected as the new leader. Alice sends a request to B. B accepts the request but crashes before replicating it to other nodes.
  • Generation 3: Nodes A and B recover, but Node C crashes. A is elected as the leader once again. The resulting log state is sketched below.
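To make the divergence concrete, here is a minimal, hypothetical snapshot of each node's log at the start of Generation 3 (one-based log indexes; the request strings are illustrative):

```python
# Hypothetical log state at the start of Generation 3.
# Each slot holds (generation, request); index 1 is the first log position.
logs = {
    "A": {1: (1, "Bob's request")},    # accepted in Generation 1, never replicated
    "B": {1: (2, "Alice's request")},  # accepted in Generation 2, never replicated
    "C": {},                           # crashed; its log is unavailable
}
```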

Challenges

  • Nodes A and B have different log entries at index 1 (Bob’s request and Alice’s request). A must decide which log entry to commit and replicate to maintain consistency and integrity in the distributed system.

Generation Clock Solution

Generation Incrementation:

  • Each time a new leader is elected, the generation number is incremented. Requests are marked with the generation number of the leader that accepted them.

Log Entries with Generations:

  • Bob’s request is marked with Generation 1.
  • Alice’s request is marked with Generation 2 (see the sketch after this list).
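A minimal sketch of how a leader could stamp each accepted request with its generation number; the LogEntry structure and function names here are illustrative, not from any particular library:

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    index: int        # position of the entry in the log
    generation: int   # generation of the leader that accepted the request
    request: str      # the client request payload

def append_as_leader(log: list, generation: int, request: str) -> LogEntry:
    # The leader marks every accepted request with its own generation number.
    entry = LogEntry(index=len(log) + 1, generation=generation, request=request)
    log.append(entry)
    return entry

log_a, log_b = [], []
append_as_leader(log_a, generation=1, request="Bob's request")    # Generation 1, leader A
append_as_leader(log_b, generation=2, request="Alice's request")  # Generation 2, leader B
```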

Leader’s Role After Election:

  • When A is elected as the leader for Generation 3, it checks the logs of all available nodes (A and B) for uncommitted entries. Uncommitted entries are those that have not been replicated to a quorum of nodes.

Conflict Resolution:

  • The leader finds two different entries at index 1: one from Bob (Generation 1) and one from Alice (Generation 2). It resolves the conflict by picking the entry with the higher generation number (Alice’s request, Generation 2), then overwrites its own log with Alice’s request, marking it with the current generation number (Generation 3).
  • Leader replicates the updated log entry to other nodes.
  • This ensures that both the leader and the followers have a consistent view of the log, as sketched below.
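A rough sketch of this conflict-resolution rule, assuming each entry is a plain dict with generation and request fields (illustrative, not a real database API):

```python
# For a conflicting log index, keep the entry accepted in the highest
# generation and re-mark it with the current leader's generation.
def resolve_conflict(entries, current_generation):
    winner = max(entries, key=lambda e: e["generation"])   # higher generation wins
    return {**winner, "generation": current_generation}    # re-marked for replication

conflicting_at_index_1 = [
    {"generation": 1, "request": "Bob's request"},    # from A's own log
    {"generation": 2, "request": "Alice's request"},  # found in B's log
]
chosen = resolve_conflict(conflicting_at_index_1, current_generation=3)
# chosen == {"generation": 3, "request": "Alice's request"}; the leader overwrites
# index 1 with this entry and replicates it to the other nodes.
```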

Handling Temporary Disconnections

  • If a leader temporarily disconnects but doesn’t crash, it might try to send requests to other nodes when it comes back online.
  • Each request is sent with the generation number.
  • Nodes can compare the generation number of incoming requests with the current generation number they know.
  • Nodes reject requests carrying a lower (stale) generation number and accept those carrying the current or a higher generation number, as sketched below.
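A small sketch of this check on the receiving node, assuming every request carries the sender's generation number (the class and method names are illustrative):

```python
class Node:
    def __init__(self):
        self.current_generation = 0   # highest generation this node has seen

    def handle_request(self, request_generation: int) -> bool:
        if request_generation < self.current_generation:
            return False                                # stale leader: reject
        self.current_generation = request_generation    # adopt the newer generation
        return True                                     # accept the request

node = Node()
node.handle_request(3)   # True: request from the current (Generation 3) leader
node.handle_request(2)   # False: a reconnected Generation 2 leader is ignored
```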

What does a quorum do?

A quorum is a majority of nodes in the cluster. In a three-node cluster, a quorum is achieved with at least two nodes. Decisions, including committing updates and electing a new leader, require a quorum to ensure consistency.

When the leader receives an update, it adds it to its own log as uncommitted and sends replication messages to other nodes. Once a quorum of nodes acknowledges the update, it becomes committed.
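As a quick sketch, the commit decision can be reduced to counting acknowledgments against the majority size (the function names here are illustrative):

```python
def quorum_size(cluster_size: int) -> int:
    return cluster_size // 2 + 1            # e.g. 2 of 3 nodes, 3 of 5 nodes

def is_committed(ack_count: int, cluster_size: int) -> bool:
    return ack_count >= quorum_size(cluster_size)

# Three-node cluster: the leader's own copy plus one follower ack is enough.
assert quorum_size(3) == 2
assert is_committed(ack_count=2, cluster_size=3)
assert not is_committed(ack_count=1, cluster_size=3)
```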

During a leader election, nodes vote for a new leader. The node with the most up-to-date log is often preferred to minimize the cost of log synchronization. The new leader must apply any uncommitted updates before accepting new ones to ensure consistency.

Raft Consensus Algorithm

The Raft algorithm is designed to manage replicated logs efficiently and ensure consistency. Here are some key points about Raft:

Leader Election:

  • Raft prefers the node with the most up-to-date log during leader election: a node grants its vote only to a candidate whose log is at least as up-to-date as its own.
  • This minimizes the cost of log synchronization and ensures the new leader can quickly start accepting new requests. The up-to-date comparison is sketched below.
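In Raft, "up-to-date" is decided by comparing the last log entry's term first and its index second. A minimal sketch of the check a voter could apply before granting its vote (variable names are illustrative):

```python
def candidate_log_is_up_to_date(candidate_last_term, candidate_last_index,
                                voter_last_term, voter_last_index) -> bool:
    # Compare the last entries: higher term wins; on a tie, the longer log wins.
    if candidate_last_term != voter_last_term:
        return candidate_last_term > voter_last_term
    return candidate_last_index >= voter_last_index

# A voter whose log ends at (term 2, index 1) rejects a candidate ending at (term 1, index 1).
assert not candidate_log_is_up_to_date(1, 1, 2, 1)
assert candidate_log_is_up_to_date(2, 1, 2, 1)
```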

Log Replication:

  • The leader appends new entries to its log and sends them to followers. Followers append the entries to their logs and send acknowledgments.
  • Once a quorum of followers acknowledges the entries, the leader commits them, as sketched below.
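A simplified, in-memory sketch of this flow, ignoring networking, retries, and term checks (illustrative only):

```python
def replicate(leader_log, followers, entry, cluster_size):
    leader_log.append(entry)                  # leader appends the entry as uncommitted
    acks = 1                                  # the leader counts its own copy
    for follower in followers:
        if follower["alive"]:                 # only reachable followers append and ack
            follower["log"].append(entry)
            acks += 1
    return acks >= cluster_size // 2 + 1      # committed once a quorum holds the entry

followers = [{"log": [], "alive": True}, {"log": [], "alive": False}]
committed = replicate([], followers, {"term": 3, "request": "x = 1"}, cluster_size=3)
assert committed   # 2 of 3 nodes (leader plus one follower) is a quorum
```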

Safety and Consistency:

  • Raft ensures that committed entries are durable and will be applied by any future leader.
  • The generation clock (term number in Raft) helps manage conflicting entries and ensures that the cluster maintains a consistent state.

How do followers know which entries are committed?

The High-Water Mark (HWM) is a crucial concept in distributed systems, particularly in consensus algorithms like Raft. It addresses several key challenges in maintaining consistency and ensuring that all nodes in the cluster have a synchronized view of the committed log entries.

The HWM is a value maintained by the leader, representing the index of the latest committed log entry. The leader includes the HWM in its heartbeat messages to followers. Followers use the HWM to determine which log entries can be committed.
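On the leader side, the HWM can be derived from how far each node's log has been replicated. A small sketch, assuming the leader tracks the highest log index known to be stored on each node (the names are illustrative):

```python
def high_water_mark(match_index: dict, cluster_size: int) -> int:
    # The HWM is the highest index that a quorum of nodes has replicated.
    indexes = sorted(match_index.values(), reverse=True)
    quorum = cluster_size // 2 + 1
    return indexes[quorum - 1]          # the quorum-th highest replicated index

# Leader A and follower B hold up to index 5, follower C only up to index 3:
# a quorum (2 of 3) has index 5, so entries up to index 5 are committed.
assert high_water_mark({"A": 5, "B": 5, "C": 3}, cluster_size=3) == 5
```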

When a follower initially syncs logs from the leader, it receives the log entries and the current HWM. The follower stores these log entries in its Write-Ahead Log (WAL) but does not immediately commit them.

The leader periodically sends heartbeats to followers, and each heartbeat carries the current HWM, i.e., the highest index among the committed log entries. Upon receiving a heartbeat, the follower commits all log entries in its WAL up to the index specified by the HWM. This ensures that the follower’s committed state stays consistent with the leader’s.
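On the follower side, a minimal sketch of committing WAL entries up to the HWM received in a heartbeat (the Follower class here is illustrative):

```python
class Follower:
    def __init__(self):
        self.wal = []              # entries appended from the leader, not yet committed
        self.committed_index = 0   # highest index this follower has committed

    def append(self, entry):
        self.wal.append(entry)     # stored durably in the WAL, still uncommitted

    def on_heartbeat(self, leader_hwm: int):
        # Commit everything up to the leader's high-water mark.
        self.committed_index = min(leader_hwm, len(self.wal))

f = Follower()
f.append({"request": "x = 1"})
f.append({"request": "y = 2"})
f.on_heartbeat(leader_hwm=1)
assert f.committed_index == 1      # only the first entry is committed so far
```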
