Andrés
February 22, 2024

Ethereum’s Data Availability Problem

Data Availability in Ethereum refers to the guarantee that all the data needed to verify a block is available to all network participants. This has become a problem for the network as demand for scalability and decentralisation has grown.

Ethereum has quickly become crowded with scalability solutions such as Layer 2s (L2s), which makes it harder to ensure that the details of transactions are available on demand.

The demand for Data Availability solutions has incentivised innovation in the space, as the need for network integrity and security has increased.

How Scalability Solutions, Ironically, Created More Congestion

The challenge is indeed ironic: the introduction of scalability solutions such as rollups has increased network congestion, and the constant flow of data has shown that Ethereum’s infrastructure is not able to handle such traffic.

Imagine: Ethereum is a big city where the main blockchain is the main road that connects to all parts of the city.

As the city has grown, more neighbourhoods have been connected to the main road, causing traffic.

In an attempt to solve this problem, the city government ordered the creation of new districts situated apart from the current city.

Each district was initially designed to handle its own local traffic, theoretically easing congestion on the main road.

However, as the districts grew, they needed to communicate with the main city and transfer goods to it, which led to more traffic being directed through the main city’s roads.

The main road, once again, had to support the main city’s (Ethereum) and the districts’ (rollups) traffic.

This analogy helps explain how the introduction of multiple rollups, initially created to solve the city’s traffic problem, ironically led to increased congestion on Ethereum.

Not All Network Participants Can Access Transaction Data

Full nodes on Ethereum always have all the information they need, because they are required to hold it at all times.

Full nodes download a copy of all the data in each block. This is inefficient: storage requirements grow with the chain, synchronisation times increase (new nodes joining the network can take days to catch up to the current state), and the hardware entry barrier for running a node rises.

Data Availability in L2 solutions can be classified into two methods: on-chain data availability and off-chain data availability.

On-chain data availability stores all transaction data on the L1, offering higher security but at greater cost, while off-chain data availability stores data off-chain, with only summaries (hashes) on-chain. This ends up being more cost-effective but it relies on external entities for data retrieval.
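The off-chain approach can be illustrated with a minimal sketch. It assumes the on-chain "summary" is a plain SHA-256 hash of the batch; the function names `commit` and `verify_retrieved` and the sample batch are hypothetical and not taken from any specific rollup.

```python
import hashlib
from typing import Optional


def commit(batch: bytes) -> str:
    """Return the short summary (here a SHA-256 hash) that would be posted on-chain."""
    return hashlib.sha256(batch).hexdigest()


def verify_retrieved(batch: Optional[bytes], onchain_commitment: str) -> bool:
    """Check data fetched from an off-chain provider against the on-chain hash."""
    if batch is None:
        # The provider withheld the data; the hash alone cannot rebuild it.
        return False
    return commit(batch) == onchain_commitment


# Hypothetical rollup batch, stored off-chain; only its hash goes on-chain.
batch = b"alice->bob:5;bob->carol:2"
onchain_commitment = commit(batch)

print(verify_retrieved(batch, onchain_commitment))        # honest provider
print(verify_retrieved(b"tampered", onchain_commitment))  # altered data
print(verify_retrieved(None, onchain_commitment))         # withheld data
```

The hash lets anyone detect tampered data, but if the external provider refuses to serve the batch there is nothing to check, which is the reliance on external entities described above.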

Having the data available at all times is essential for transaction verification: it ensures that the summarised blockchain data truly represents a valid set of transactions.

One main strategy used to address this challenge is the light node, which downloads only block headers and relies on full nodes for more detailed data. Light nodes and Layer 2 rollups are perfect examples of network participants that require strong data availability assurances but are not able to download and process all transaction data for themselves. Why? Because avoiding the process of downloading data is what makes light nodes “light” and it’s what enables rollups to efficiently scale blockchains.

Several projects are working on the data availability problem.
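As a rough illustration of how a light node can rely on a header plus a small proof from a full node, here is a sketch of a Merkle inclusion check. It assumes a simplified binary SHA-256 Merkle tree over a non-empty list of transactions, which is not Ethereum’s actual data structure; `merkle_root`, `merkle_proof` and `verify_inclusion` are hypothetical helper names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root hash a block header would carry (full node side)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, built by a full node that has all data."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_inclusion(leaf, proof, root):
    """Light node side: needs only the leaf, the proof and the header's root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


txs = [b"tx-a", b"tx-b", b"tx-c"]
root = merkle_root(txs)        # the only thing the light node stores
proof = merkle_proof(txs, 2)   # supplied by a full node on request
print(verify_inclusion(b"tx-c", proof, root))
```

A proof like this shows that one transaction is included under the header. It says nothing about whether the rest of the block was ever published, which is why light nodes still need separate data availability assurances.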

Celestia

Celestia acts as a helper for Ethereum, which is getting crowded with data from many rollups, by providing a place where transaction data can be published and accessed by anyone. Celestia is able to handle a great amount of data and makes it easier for light nodes to check that the data is available.

Explained in simpler terms, imagine there is an exclusive library (Ethereum) that has become very popular because everyone wants to store their own book there. Imagine a group of smart people comes up with the idea of creating special shelves (rollups) in the library where you are still able to place your books easily and quickly, without having to find space in the main shelves. However, although these special shelves magically hold a lot of books, they’re not very efficient when retrieving books from them.

Celestia acts as a magical bookshelf that helps organise these special shelves. It also has a super efficient storage system that keeps everything in order, holding neatly organised summaries or important pages from all the books.

Celestia is very efficient and smart because it doesn’t make librarians check every single stored item to know it’s there; they only need to look at a few items to be sure everything else is in the right place. This is possible because of Celestia’s Data Availability Sampling (DAS) that lets small computers (light nodes) confirm data is there without having to download everything.

- DAS allows nodes to verify the availability of an entire block by checking only a small number of randomly chosen pieces of it. It is a key technique for scalability, as it ensures the data needed to verify transactions is available without requiring every node to download or store all of it.
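The reason a few random samples are enough can be sketched with some simple arithmetic. The sketch assumes that, thanks to erasure coding, an attacker must withhold at least half of a block’s chunks to make it unrecoverable, and that samples are drawn independently with replacement. The 0.5 threshold and the function names are illustrative assumptions, not Celestia’s actual parameters.

```python
import random


def sampling_confidence(num_samples: int, withheld_fraction: float = 0.5) -> float:
    """Probability that at least one of num_samples random samples hits a withheld chunk."""
    return 1.0 - (1.0 - withheld_fraction) ** num_samples


def light_node_accepts(total_chunks, withheld, num_samples, rng):
    """Simulate one light node: accept the block only if every sampled chunk is served."""
    picks = [rng.randrange(total_chunks) for _ in range(num_samples)]
    return all(p not in withheld for p in picks)


print(sampling_confidence(10))  # 0.9990234375
print(sampling_confidence(20))

# One simulated node sampling a 100-chunk block with half of the chunks withheld.
rng = random.Random(0)
print(light_node_accepts(100, set(range(50, 100)), 20, rng))
```

Under these assumptions, ten samples already give a light node better than 99.9% confidence of catching a block whose data is being withheld, while downloading only a tiny fraction of it.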

Avail

Avail’s main focus is data availability, with a modular approach that allows it to decouple data hosting, execution, and verification to optimise each component’s efficiency.

Avail addresses the limitations of the on-chain and off-chain approaches to Data Availability explained above. Instead of relying on either, it offers scalable data hosting without transaction execution. To ensure high data availability, it implements several technologies:

- Erasure Coding

A technique that protects data from corruption or loss by adding redundancy and spreading the information over a distributed set of shards. Even if some shards are lost or corrupted, the complete data can still be reconstructed from the remaining ones.

- KZG Polynomial Commitments

A type of commitment scheme (a cryptographic technique that allows a sender to commit to a chosen value) that helps ensure integrity and availability by proving that specific pieces of data are available and correct, without needing to reveal all the data every time.

- Data Availability Sampling (also used by Celestia, explained above)
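To make the erasure coding idea from the list above concrete, here is a toy Reed-Solomon-style sketch: the data symbols define a polynomial over a small prime field, extra evaluations of that polynomial become redundant shards, and any `k` surviving shards are enough to rebuild the original `k` symbols. The field size `P = 257`, the function names and the sample data are illustrative assumptions; this is not Avail’s actual encoding, and it only handles missing shards, not silently corrupted ones.

```python
P = 257  # small prime field, large enough to hold one byte per symbol (illustrative)


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # modular inverse, Python 3.8+
    return total


def encode(data, n):
    """Extend k data symbols to n shards (x, y); the first k shards carry the data itself."""
    points = list(enumerate(data))
    return [(x, lagrange_eval(points, x)) for x in range(n)]


def decode(shards, k):
    """Recover the k original symbols from any k surviving shards."""
    subset = shards[:k]
    return [lagrange_eval(subset, x) for x in range(k)]


data = [72, 105, 33, 7]
shards = encode(data, 8)          # 4 data shards + 4 redundant shards
surviving = shards[4:]            # pretend the first four shards were lost
print(decode(surviving, 4) == data)
```

Doubling four symbols to eight shards means any half of the shards can be lost without losing the data, which is also what forces an attacker to withhold a large share of a block before it becomes unrecoverable.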

EigenDA

EigenDA is a secure, high throughput, and decentralised data availability (DA) service built on top of Ethereum using the EigenLayer restaking primitive. It was announced recently, so there is less information available about it than about Celestia and Avail, but, like Avail, it will also use KZG commitments.

Conclusion

Data Availability is a key aspect of blockchain security and scalability, and its importance is growing with the increasing presence of rollups and other scalability solutions.

Solutions like the ones mentioned above address these challenges with distinct mechanisms to ensure that data is always accessible and verifiable across the network.

In this article, we looked at projects that aim to solve this problem. It is also something that Ethereum developers are constantly working on, so new concepts and methods will continue to emerge.