Ning Li, Co-founder & CTO

Blockchain. Distributed database. These terms are often used carelessly, and, more often than not, incorrectly. Both blockchains and distributed databases have a similar goal of maintaining a consistent copy of a particular dataset across a number of nodes. Maintaining consensus on the data that is stored, as well as keeping redundant copies of this dataset, are the major similarities between the technologies.

On the surface, their fundamental technology is quite similar, but that’s as deep as it goes. Similar does not mean interchangeable.

This article will explore the nuanced differences between blockchain and distributed databases by focusing on three important aspects: intrinsic nature, core value proposition and storage technology.

  • The differences in nature:
  1. Centralized vs decentralized management.

Public blockchains are a collaborative creation. Their ultimate goal is to create a world that is completely decentralized, where the ownership of digital assets is protected and transferable at all times. Distributed databases, on the other hand, are centrally managed by a service provider. Their goal is to create a logical center that can provide efficient, low-cost services with great scalability.

  2. Trilemma:

Both technologies face technical trilemmas, that is, the difficulty of optimizing a technology while balancing competing tradeoffs. The blockchain trilemma is the challenge of achieving high security, decentralization and scalability at the same time. For example, it is easier to achieve high security and scalability by sacrificing decentralization. Distributed databases face a fundamentally different set of issues. As service providers, distributed database operators must balance business support, engineering implementation complexity and evolving hardware requirements.

  3. Consensus mechanisms:

Blockchain systems attempt to solve the Byzantine Generals Problem with clever algorithms, thereby becoming Byzantine Fault Tolerant, or BFT. In short, this is how blockchains reach verifiable decentralized consensus, even in the presence of malicious nodes. The most commonly used consensus algorithms are Proof of Work and Proof of Stake (probability-based algorithms) and Practical BFT, or PBFT (a deterministic algorithm). The consensus produced by the probabilistic PoW/PoS class of algorithms is temporary, meaning it can be rewritten. As time goes by and additional blocks are added to the chain, the probability of overturning earlier blocks becomes smaller, approaching zero. Byzantine fault-tolerant algorithms often have poor performance, and they only work while fewer than 1/3 of the nodes are faulty.

With deterministic algorithms such as PBFT, a decision is irreversible once consensus is reached. That is, the consensus result is always final.
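
To put a number on the probabilistic finality of PoW described above, the sketch below follows the calculation given in the Bitcoin whitepaper (section 11): the chance that an attacker controlling a share q of the hash power ever catches up from z blocks behind. The function name and the sample values of q and z are illustrative assumptions, not part of any particular client.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Chance that an attacker with hash-power share q catches up from z blocks behind.

    Follows the Poisson-based calculation in the Bitcoin whitepaper (section 11).
    """
    if not 0.0 <= q < 0.5:
        raise ValueError("q must satisfy 0 <= q < 0.5 (a minority attacker)")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q                # share of hash power held by honest nodes
    lam = z * (q / p)          # expected attacker progress while z blocks are mined
    prob = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob


if __name__ == "__main__":
    # Illustrative values only: a 10% attacker and a growing number of confirmations.
    for confirmations in (0, 1, 2, 6):
        print(confirmations, attacker_success_probability(0.10, confirmations))
```

The probability falls off roughly exponentially with the number of confirmations, which is why the consensus is treated as practically final after enough blocks, even though it is never final in the deterministic sense.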

Distributed database systems rarely have to solve the Byzantine Generals Problem, since a central point of control coordinates the whole system, but they do have to handle system failures such as crashed nodes. Mainstream algorithms used by distributed databases include Paxos and Raft. These fault-tolerant algorithms tend to perform better and process faster, and they keep working as long as fewer than 1/2 of the nodes in the network have failed.
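
The two tolerance thresholds can be compared with a little arithmetic. The sketch below assumes the classic bounds: a BFT protocol such as PBFT needs n >= 3f + 1 nodes to survive f Byzantine nodes, while a majority-quorum protocol such as Paxos or Raft needs n >= 2f + 1 nodes to survive f crashed nodes. The function names are hypothetical.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 (classic BFT bound, e.g. PBFT)."""
    if n < 1:
        raise ValueError("cluster must have at least one node")
    return (n - 1) // 3


def max_crash_faults(n: int) -> int:
    """Largest f such that n >= 2f + 1 (majority quorum, e.g. Paxos or Raft)."""
    if n < 1:
        raise ValueError("cluster must have at least one node")
    return (n - 1) // 2


if __name__ == "__main__":
    for nodes in (3, 4, 7, 10):
        print(nodes, max_byzantine_faults(nodes), max_crash_faults(nodes))
```

For the same cluster size, the crash-tolerant protocol survives more failed nodes, which is one reason distributed databases can be both cheaper to run and faster to reach agreement.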

  • The difference in value propositions

The core value of blockchain technology is not to provide rudimentary data services (as a distributed database does), but to build a new ecosystem of digitized data assets and automated trust services. The global blockchain updates its state autonomously, and data is traceable to its source.

On the other hand, the core value of a distributed database is to provide data storage and access services to business systems. The database is designed to provide operational support, mainly for business products and development projects, and the data is stored with a focus on supporting analysis and retrieval.

  • Understanding blockchains and distributed databases through storage technology

From the birth of Bitcoin in 2008 to the emerging generation of blockchain 3.0, the fundamental storage technology of blockchain has not drastically changed.

The main data structures in blockchain are divided into two categories – transactions and blocks:
• Transactions trigger updates to the blockchain’s world state. The transaction itself contains two types of data: input and output. The transaction input indicates the source of the data, and the transaction output indicates the destination of the data.
• Blocks store transaction data. They are composed of a block header and a body. The block header records important metadata, such as the hash of the previous block and the creation timestamp. The body contains the transaction count and the complete transaction data.
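
The two structures above can be sketched in a few lines. This is a deliberately simplified model, not Bitcoin's actual serialization: the class names, the JSON encoding, and the single body digest (standing in for a Merkle root) are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Transaction:
    inputs: List[str]    # where the data comes from
    outputs: List[str]   # where the data goes


@dataclass
class Block:
    prev_hash: str                    # header: hash of the previous block
    timestamp: int                    # header: creation time
    transactions: List[Transaction]   # body: complete transaction data

    def body_digest(self) -> str:
        # Simplified stand-in for a Merkle root: one hash over all transactions.
        payload = json.dumps([[tx.inputs, tx.outputs] for tx in self.transactions])
        return sha256_hex(payload.encode())

    def block_hash(self) -> str:
        header = json.dumps(
            {
                "prev_hash": self.prev_hash,
                "timestamp": self.timestamp,
                "tx_count": len(self.transactions),
                "body_digest": self.body_digest(),
            },
            sort_keys=True,
        )
        return sha256_hex(header.encode())


def verify_chain(chain: List[Block]) -> bool:
    """Check that every block points at the hash of the block before it."""
    return all(
        chain[i].prev_hash == chain[i - 1].block_hash() for i in range(1, len(chain))
    )
```

Because each header commits to the previous block's hash and to its own body, changing any historical transaction changes that block's hash and breaks every link after it. This is what makes the data traceable and tamper-evident.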

Let's look at an example using the storage principles of Bitcoin.

The Bitcoin/blocks/ folder contains files named like blk00000.dat, as shown in Figure 1. Each of these files stores block data and has a maximum size of about 128 MiB.

Figure 1

The Bitcoin/blocks/index/ folder stores the index data of all blocks in a LevelDB key/value database (.ldb files), as shown in Figure 2.

Figure 2

Each block is limited in size: the original consensus rule capped a block at 1 MB, and since the SegWit upgrade the limit is 4 million weight units. Block data is stored in a block file (such as blk00000.dat in Figure 1). Within the file, each block is preceded by a "magic number" (such as 0xF9BEB4D9 in Figure 3). A block file holds the data of multiple blocks, up to the maximum file size. Once a file would exceed 128 MiB, a new block file is created (such as blk00001.dat).

Figure 3
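
As a rough illustration of the layout described above, the sketch below walks through the contents of a block file held in memory. It assumes the usual record layout of a 4-byte magic number, a 4-byte little-endian length, and then the block itself, whose first 80 bytes are the header. It also assumes the file is not obfuscated on disk. The function names are hypothetical, and the code is a reading aid rather than a full parser.

```python
import struct
from typing import Dict, Iterator

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # the 0xF9BEB4D9 separator mentioned above
HEADER_FORMAT = "<i32s32sIII"              # version, prev hash, merkle root, time, bits, nonce
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 80 bytes


def iter_block_records(data: bytes, magic: bytes = MAINNET_MAGIC) -> Iterator[bytes]:
    """Yield the raw bytes of each block stored in a blk*.dat style buffer."""
    offset = 0
    while offset + 8 <= len(data):
        if data[offset:offset + 4] != magic:
            break  # padding or unrecognized data: stop scanning
        (size,) = struct.unpack_from("<I", data, offset + 4)
        start, end = offset + 8, offset + 8 + size
        if end > len(data):
            break  # truncated record
        yield data[start:end]
        offset = end


def parse_header(block: bytes) -> Dict[str, object]:
    """Decode the 80-byte header at the start of a raw block."""
    if len(block) < HEADER_SIZE:
        raise ValueError("block is shorter than an 80-byte header")
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack_from(
        HEADER_FORMAT, block, 0
    )
    return {
        "version": version,
        # Hashes are conventionally displayed in reversed byte order.
        "prev_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "timestamp": timestamp,
        "bits": bits,
        "nonce": nonce,
    }
```

A typical use would be to read a block file with open(path, "rb").read() and pass the bytes to iter_block_records, then call parse_header on each block to follow the previous-block hashes from one header to the next.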

Distributed databases do not organize their data into the hash-linked blocks and input/output transactions described above.

The first distributed databases were implemented around 2005, with the first wave of NoSQL. The primary problem these databases solved was that individual machines of the time lacked the capacity to store all of the data on a single machine. Since the invention of relational database management systems, there have been many adaptations as business requirements have changed. New middleware products and database-sharding schemes have continued to be developed, along with systems such as HBase, Cassandra, and MongoDB. Google's publication of the Spanner and F1 papers in 2012 and 2013 showed the industry that NoSQL-style scalability could be integrated into a large-scale production system.

As mentioned in my previous article on the evolution of blockchain technology, I will continue with a series of articles that further explain the world of blockchain. My goal in this article was to break down the difference in nature between blockchains and distributed databases. In my upcoming article I will elaborate on how smart contracts can be perceived as “nuclear power” for the world of blockchain, rather than an interface for external services. I will also describe a series of valuable applications for smart contracts.

Ultrain, as a leader in blockchain technology, has a clear goal to share knowledge with different audiences and attract more participants into our growing community. Together, we wish to promote the large scale adoption of blockchain technology in our existing economy, and create a programmable business society.