Byzantine Fault Tolerance is the ability of a distributed computer network to correctly reach a sufficient consensus despite malicious nodes in the system failing or sending out incorrect information. The goal of BFT is to protect against catastrophic system failures by reducing the influence of these malicious nodes.
BFT is derived from the Byzantine Generals’ Problem, a computer science term for a situation where involved parties must agree on a single strategy to avoid a complete failure. However, it assumes some of the involved parties might be corrupt or otherwise unreliable.
The diagrams below illustrate the basic concept of the Byzantine Generals’ Problem with different network sizes:
Source: Kiran Vaidya
These examples have only a few “lieutenants”, but one could imagine the increase in complexity for achieving consensus with hundreds of lieutenants (nodes) in a distributed computer network.
Byzantine Fault Tolerance is the characteristic that defines a system that tolerates the class of failures belonging to the Byzantine Generals’ Problem. BFT has been needed in airplane engine systems, nuclear power plants, and almost any system with actions that depend on many sensors. It also applies to blockchain-based systems, where trust must be established in a distributed network of nodes.
Practical Byzantine Fault Tolerance (pBFT) is an algorithm that optimizes aspects of Byzantine Fault Tolerance (in other words, protection against Byzantine faults) and has been implemented in several modern distributed computer systems, including some blockchain platforms. These blockchains typically use a combination of pBFT and other consensus mechanisms.
Miguel Castro and Barbara Liskov introduced the Practical Byzantine Fault Tolerance (pBFT) algorithm in a paper released in 1999. It provided high-performance Byzantine state machine replication, processing thousands of requests per second with sub-millisecond increased in latency.
After the release of pBFT, more BFT protocols were introduced in attempts to improve robustness and performance. Examples of these included Q/U, HQ, Zyzzyva, and ABsTRACTs, which addressed the performance and cost issues, as well as Aardvark and RBFT, which addressed robustness issues.
How does it work?
pBFT focuses on providing a practical Byzantine state machine replication that tolerates Byzantine faults (i.e. malicious nodes) by assuming there are independent node failures and manipulated messages sent through specific nodes.
Nodes in a pBFT system are sequentially ordered with one node being the leader and others referred to as backup nodes. All nodes in the system communicate with one another with the goal being that all honest nodes will come to an agreement of the state of the system using a majority rule.
Communication between nodes has two functions: nodes must prove that messages came from a specific peer node, and they must verify that the message was not modified during transmission.
For the pBFT system to function, the number of malicious nodes must not equal or exceed one third of all nodes in the system in a given vulnerability window. Similar to the proof of work consensus mechanism, the more nodes there are in a pBFT network, the more secure it becomes.
pBFT consensus rounds are called views and are broken into 4 phases:
- A client sends a request to the leader node to invoke a service operation.
- The leading node broadcasts the request to the backup nodes.
- The nodes execute the request, then send a reply to the client.
- The client awaits f+1 replies from different nodes with the same result, where f represents the maximum number of potentially faulty nodes.
The leading node is changed during every view and can be replaced with a protocol called a view change if a certain amount of time has passed without the leading node broadcasting the request. Also, a supermajority of honest nodes can determine when a leader is faulty and replace them with the next leader in line.
Two projects currently using Practical Byzantine Fault Tolerance are Hyperledger Fabric and Zilliqa.
Hyperledger Fabric is an open-source collaborative environment for blockchain projects hosted by the Linux Foundation using a permissioned version of pBFT for the platform. Since permissioned chains use small consensus groups and don’t need the same decentralization as public blockchains, pBFT is effective for providing high-throughput transactions.
Also, permissioned blockchains are private and operated by invite with known identities, meaning there is already an element of trust between parties. This mitigates the need for a trustless environment and allows the network to reap the benefits of pBFT without the downsides.
More information on Hyperledger Fabric and pBFT can be found here.
Zilliqa uses an optimized version of classical pBFT to achieve consensus about data on the blockchain. Zilliqa also uses a proof of work consensus round every ~100 blocks to perform network sharding, where miners are split into smaller groups referred to as shards. Each shard is capable of processing transactions in parallel, yielding a high throughput for the network.
The network uses multi-signatures to reduce the communication overhead of classical pBFT and have reached a few thousand transactions per second in their testing environments.
For more information on consensus in the Zilliqa blockchain, click here.
- Transaction finality: The nature of pBFT means that transactions can be agreed upon and finalized without needing multiple confirmations. There is no waiting period to ensure a transaction is secure after including it in a block.
- Energy efficiency: Unlike proof of work consensus mechanisms, pBFT can achieve network consensus without requiring energy intensive computations. Some pBFT systems use proof-of-work to prevent Sybil attack (where a single adversary is controlling multiple nodes on a network, pretending to be multiple parties), but only after a set number of blocks (i.e. 100) and not for every block.
- Low reward variance: pBFT requires collective decision through voting on records by signing messages, unlike proof-of-work where only the leader proposes the next block. Thus, every node in a pBFT system can be incentivized, lowering reward variance for miners.
- Scaling: pBFT is a promising consensus solution when the group of nodes is small but becomes inefficient for large networks. This is because each node must talk to every other node to keep the network secure, which can quickly grow into a huge communication cost as the amount of nodes scales upwards.
- Sybil attacks: The pBFT models is susceptible to Sybil attacks, where a single party creates or manipulates a large number of nodes in the network and compromises security. This threat is reduced with larger network sizes but considering the scalability issue of pBFT it typically needs to be used in combination with another consensus mechanism.
* The information contained in this article is for education purpose only and not financial advice. Do your own research before making any investment decisions.
This article is contributed by Victor Lai with the help of our Senior Analyst Kieran O'Day.