Deep Dive into Storage Proofs

A technical deep dive into how Ethereum stores and verifies its state

Ethereum's primary goal is to be accessible to all. Its platform, brimming with potential, allows anyone to build decentralized applications with smart contracts.

Every day, Ethereum has:

  • ~1.2 million transactions
  • ~0.5 GB of new data stored
  • Thousands of contract deployments and state changes

This sustained growth means that while Ethereum's core architecture must remain standardized and transparent for accessibility, it also needs to handle:

  • Efficient data retrieval at scale
  • Optimized storage patterns
  • Simple and secure ways to verify data
  • Minimal duplicate storage across the network
  • Smart ways to access and process historical data

Ethereum is built around cryptographic storage proofs, which provide mathematical evidence that a specific piece of data exists on the blockchain, eliminating the need to trust an oracle or download the entire chain. This fundamental architecture extends beyond primary transaction verification, enabling the secure verification of any data stored on Ethereum, including token balances, contract states, and transaction histories.

Let's dive into how Ethereum's proof systems work, and how we can leverage them to verify any historical blockchain data.

Understanding Ethereum's Data Architecture

To effectively navigate the examples of extracting and using data on Ethereum, it is essential to first understand how the network structures and manages its vast and complex data infrastructure. Given the billions of transactions and state changes, Ethereum's data architecture must strike a delicate balance between accessibility, efficiency, and scalability.

The Two Layers of Ethereum

Ethereum has two main layers that work together: the Consensus Layer and the Execution Layer.

While the consensus layer is responsible for network security, validator coordination, and state agreement, our focus will be on the Execution Layer. This is where all the heavy lifting of data processing takes place.

The Execution Layer: The Heart of Ethereum's Data

The Execution Layer handles all blockchain data by utilizing:

  • Merkle Patricia Tries for structured storage, which enables the generation of proofs through its hash-linked structure
  • RLP (Recursive Length Prefix) for standardized data encoding

Together, these components allow us to verify any piece of blockchain data through cryptographic proofs.

The execution layer is where:

  • Smart contracts run
  • Account balances and data are stored
  • Transactions are recorded and processed

State Storage: Tree Structures

Ethereum faces a considerable challenge: an archive node storing the entire history requires approximately 12 TB, while even a basic node needs about 1,000 GB. Let's see how Ethereum manages this scale while ensuring efficient access and verification.

First, we need to understand the foundational data structures underlying Ethereum's storage system. Each one builds on the previous, from plain trees up to the trie Ethereum actually uses.

1. Bare Trees to Efficient Tries

A tree is a hierarchical data structure composed of nodes connected by edges, where each node (except the root) has exactly one parent node. This structure is fundamental in computer science, as it naturally represents hierarchical relationships while enabling efficient data organization and retrieval.

Tries (pronounced "try," from retrieval) are specialized trees optimized for storing strings and similar sequential data. Unlike regular trees, they share common prefixes between entries, making them particularly efficient for storing similar data.

To better understand this concept, let's look at a simple example storing three words: "slow," "slower" and "water":

Basic trie structure showing storage of words 'slow', 'slower', and 'water'

While tries offer efficient lookups, they have a notable drawback: they can waste space with single-path chains. Notice how "slower" creates individual nodes after "slow", each storing just one character. For our example, this inefficiency is manageable. However, this becomes significant for a blockchain that stores millions of accounts.
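To make the node count concrete, here is a minimal sketch of a basic trie in Python. The class and function names (`TrieNode`, `insert`, `contains`, `count_nodes`) are our own illustration and not part of any Ethereum client.

```python
class TrieNode:
    """One node per character; children are keyed by the next character."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def insert(root, word):
    """Walk down the trie, creating a node for each missing character."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def contains(root, word):
    """Return True only if `word` was inserted (a bare prefix does not count)."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


def count_nodes(node):
    """Count every node below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


root = TrieNode()
for word in ("slow", "slower", "water"):
    insert(root, word)

print(contains(root, "slow"), contains(root, "slo"))  # True False
print(count_nodes(root))  # 11
```

Three short words already need 11 nodes: "slow" and "slower" share four of them, but "water" alone takes five single-child nodes.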

2. Optimizing with Radix Tries

To address the storage inefficiency of basic tries, a compressed version is necessary. Radix tries achieve this by compressing single-path chains into more compact representations.

Here's how our previous example looks in a radix trie:

Radix trie structure showing storage of words 'slow', 'slower', and 'water'

This optimization provides several key benefits:

  • Fewer nodes
  • Shared prefixes (like "slow")
  • Efficient lookups while using less storage
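The compression step can be sketched in a few lines. This is an illustration under our own assumptions: the trie is represented as nested dicts, and `"$"` is a hypothetical end-of-word marker we chose for the example.

```python
END = "$"  # hypothetical end-of-word marker, not part of any word


def build_trie(words):
    """Basic trie as nested dicts: one edge per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def compress(node):
    """Merge each chain of single-child nodes into one multi-character edge."""
    compressed = {}
    for label, child in node.items():
        # Keep absorbing the next edge while the path cannot branch or end here
        while label != END and len(child) == 1 and END not in child:
            next_label, next_child = next(iter(child.items()))
            label += next_label
            child = next_child
        compressed[label] = compress(child)
    return compressed


def count_edges(node):
    """Number of edges (equivalently, non-root nodes), ignoring end markers."""
    return sum(1 + count_edges(c) for label, c in node.items() if label != END)


basic = build_trie(["slow", "slower", "water"])
radix = compress(basic)

print(radix)
# {'slow': {'$': {}, 'er': {'$': {}}}, 'water': {'$': {}}}
print(count_edges(basic), "->", count_edges(radix))  # 11 -> 3
```

The same three words now need 3 nodes instead of 11, and a lookup still follows a single path from the root.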

3. Adding Security through Merkle Trees

In a decentralized system, proving that our data remains untampered is vital. For this, Merkle trees introduce cryptographic security to our data structure, enabling verifiable proofs of data integrity.

In a Merkle tree, each piece of data is hashed, and these hashes are combined up the tree:

Merkle tree structure showing storage of words 'slow', 'slower', and 'water'

This structure provides crucial features:

  • Each level represents a layer of hashing
  • Every parent node contains the combined hash of its children
  • Any modification to leaf data propagates up to the root hash
  • Verification requires only the branch path to the root
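The following sketch builds a small binary Merkle tree over our three words and verifies one branch. It makes two simplifying assumptions that differ from Ethereum: it uses SHA-256 from the standard library instead of keccak256, and it duplicates the last hash on odd-sized levels. All function names are illustrative.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for Ethereum's keccak256
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every level of the tree, from hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = []
    while len(level) > 1:
        if len(level) % 2:  # odd level: duplicate the last hash
            level.append(level[-1])
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)  # final level holds only the root
    return levels


def make_proof(levels, index):
    """Collect the sibling hash at each level on the way up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from one leaf and its branch."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


words = [b"slow", b"slower", b"water"]
levels = build_levels(words)
root = levels[-1][0]
proof = make_proof(levels, 2)  # branch for "water"

print(verify(b"water", proof, root))  # True
print(verify(b"wateR", proof, root))  # False
```

The verifier never sees "slow" or "slower": the two sibling hashes in the branch are enough to recompute the root.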

4. The Patricia Merkle Trie Solution

Ethereum has adopted the Patricia Merkle Trie, a hybrid structure that combines radix tries and Merkle trees to enhance efficiency and security. This trie serves as the foundational structure for Ethereum's state storage.

Applying this to our previous example, we get:

Patricia Merkle Trie structure showing storage of words 'slow', 'slower', and 'water'

The Patricia Merkle Trie combines two key benefits:

  • Path Compression for Efficiency
    • Common prefixes share paths, reducing redundancy (e.g., "slow")
    • Single paths are combined, optimizing storage (e.g., "water")
    • Minimizes overall storage requirements
  • Hash Chains for Security
    • Each node is cryptographically hashed
    • Hashes are combined upwards to form the root hash
    • Any changes in the data affect the root hash
    • Enables efficient proof generation and verification

Ethereum's State Organization

With the Patricia Merkle Trie as our foundation, let's examine how Ethereum uses this structure to organize its entire state. This hierarchical organization enables efficient access and verification of any account or contract data.

The entire state is organized as:

Ethereum state organization

This structure has two main components:

  1. State Trie: The top level

    • Root hash stored in each block
    • Maps addresses to account data
    • Any change to any account affects this root
    • Enables global state verification
  2. Storage Tries: For each contract

    • Individual trie for contract data
    • Maps storage slots to values
    • Changes in contract data only affect its trie
    • Enables efficient contract state verification

As a result, any account or storage value can be verified against the state root in the block header.
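The two-level commitment can be illustrated with a toy model. This is not how a client stores state: here a "root" is simply a SHA-256 hash over sorted key/value pairs (a stand-in for a real trie root), the code hash is left out, and the addresses are placeholders.

```python
import hashlib
from dataclasses import dataclass, field


def digest(items: dict) -> str:
    """Stand-in for a trie root: hash of the sorted key/value pairs."""
    payload = repr(sorted(items.items())).encode()
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Account:
    nonce: int = 0
    balance: int = 0
    storage: dict = field(default_factory=dict)  # slot -> value (contracts only)

    def storage_root(self) -> str:
        return digest(self.storage)


def state_root(accounts: dict) -> str:
    """Commit to every account, including each account's storage root."""
    return digest({
        address: (acct.nonce, acct.balance, acct.storage_root())
        for address, acct in accounts.items()
    })


# Placeholder addresses, for illustration only
accounts = {
    "0xToken": Account(storage={9: 100}),
    "0xOther": Account(storage={0: 7}),
    "0xUser": Account(balance=5),
}

before = state_root(accounts)
other_before = accounts["0xOther"].storage_root()

accounts["0xToken"].storage[9] = 90  # a single storage write in one contract

after = state_root(accounts)
print(before != after)  # True: the global root changes
print(accounts["0xOther"].storage_root() == other_before)  # True: other tries untouched
```

One storage write changes that contract's storage root and therefore the state root, while every other contract's storage root stays the same.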

Data Serialization: Understanding RLP

While Patricia Merkle Trie provides the structure for Ethereum's state, we need a standardized way to encode this data. This is where RLP (Recursive Length Prefix) comes in.

RLP is Ethereum's primary serialization protocol, designed to encode complex data structures into a standardized format. Key characteristics that make it suitable for Ethereum:

  • Structure-focused: Encodes data structure without assuming data types
  • Deterministic: The same input always produces the same output
  • Nested data support: Can handle complex, nested structures like account states

For example, a basic account state in Ethereum contains:

# Account state: [nonce, balance, storageRoot, codeHash]
account = [1, 100, "0x1234", "0x5678"]
 
# After RLP encoding, we get something like:
encoded = 0xc80164821234825678  # This is what actually gets stored in the trie
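To see where those bytes come from, here is a minimal RLP encoder sketch using only the standard library. It covers integers, byte strings and lists, and is meant for illustration rather than production use. Note that in a real account the storage root and code hash are 32-byte hashes; the 2-byte values below only mirror the shortened example above.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Prefix for a payload of `length` bytes (offset 0x80 for strings, 0xc0 for lists)."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    if isinstance(item, int):
        # Integers become big-endian bytes without leading zeros; 0 is the empty string
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single byte below 0x80 is its own encoding
        return encode_length(len(item), 0x80) + item
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(x) for x in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


# [nonce, balance, storageRoot, codeHash] with the shortened values from above
account = [1, 100, bytes.fromhex("1234"), bytes.fromhex("5678")]
print(rlp_encode(account).hex())  # c80164821234825678
```

Reading the output left to right: `c8` announces a list with an 8-byte payload, `01` and `64` are the nonce and balance, and each `82` prefixes a 2-byte string.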

This data is encoded using RLP and stored in our Patricia Merkle Tries, ensuring that:

  • All nodes interpret the data consistently
  • Information can be efficiently verified through proofs
  • The execution layer maintains standardized data encoding, which is crucial for reliable data management.

Practical Example: Verifying USDC Balances

Now that we understand the structure of Ethereum's state, let's explore how to verify a USDC balance with a proof. USDC, one of the most widely used tokens on Ethereum, serves as an excellent example of how to utilize Ethereum's state architecture.

The Verification Challenge

At its core, this process relies on two essential elements:

  1. Block Header: The primary source of trust from Ethereum, containing the state root hash, which serves as the entry point to all account states for the block.
  2. State Proofs: Cryptographic proofs that link our data to the trusted block.

To verify a USDC balance, we utilize:

  1. Account Proof: Confirms the existence of USDC's contract and its storage root in the state trie.
  2. Storage Proof: Validates that the balance exists at the specified slot in USDC's storage trie.

The authenticity of the block header is crucial: if we verify against a manipulated block header, the proofs are meaningless.

Proof verification flow

Breaking Down USDC Storage

Every smart contract on Ethereum has its own persistent storage area, functioning as a vast key-value store where:

  • Each slot is 32 bytes long
  • The contract's implementation determines where each variable is placed
  • Data is persistent across transactions

While ERC20 tokens adhere to a standard interface, the standard does not prescribe a storage layout. The layout varies with each implementation, so the slot holding balances must be identified for every token individually.

The storage slot holding a specific user's balance in USDC is computed by concatenating the user's address (left-padded to 32 bytes) with the index of USDC's balance mapping (slot 9, also encoded as 32 bytes) and then applying keccak256, Ethereum's cryptographic hash function. The resulting hash is deterministic and gives the precise location of the balance in the contract's storage.

USDC storage slots
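The sketch below shows the 64 bytes that get hashed, following Solidity's layout rule for mappings (key padded to 32 bytes, then the mapping's slot index as 32 bytes). The function name and the example address are placeholders of ours. The final keccak256 step is left out because the Python standard library does not ship keccak256; with web3.py it could be done with `Web3.keccak(preimage)`.

```python
def mapping_slot_preimage(key_address: str, mapping_slot: int) -> bytes:
    """Return the 64 bytes to hash: pad32(address) followed by pad32(slot index)."""
    raw = key_address[2:] if key_address.startswith("0x") else key_address
    address = bytes.fromhex(raw)
    if len(address) != 20:
        raise ValueError("an Ethereum address is 20 bytes")
    return address.rjust(32, b"\x00") + mapping_slot.to_bytes(32, "big")


# Placeholder address, used only to show the layout
user = "0x" + "ab" * 20
preimage = mapping_slot_preimage(user, 9)  # 9 is USDC's balance slot

print(len(preimage))        # 64
print(preimage[:32].hex())  # 12 zero bytes, then the 20-byte address
print(preimage[32:].hex())  # 31 zero bytes, then 09
```

Hashing this pre-image with keccak256 gives the storage key to request in the storage proof.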

Extracting and Building Proofs

To verify a balance, we need to:

  1. Find the Balance Location
# Compute the exact storage slot for the user's balance
# (a sketch, assuming `from eth_abi import encode` and `from web3 import Web3`)
storage_slot = Web3.keccak(encode(["address", "uint256"], [user_address, 9]))  # 9 is USDC's balance slot
  2. Extract the Complete Proof Chain
# Get both account and storage proofs
proof = web3.eth.get_proof(
    USDC_ADDRESS,          # Contract address
    [storage_slot],        # Storage slot we want
    block_number           # Block we're interested in
)

The eth_getProof call returns:

  • Account proof (path from state root to USDC contract)
  • Storage proof (path from contract storage root to balance)
  • Both proofs are lists of RLP-encoded trie nodes

This decoding and verification process can be implemented both off-chain and on-chain. Various libraries and tools exist to handle these operations, making it easier now that we understand the underlying concepts. Whether working with Python for proof generation or Solidity for on-chain verification, the process follows the same logical flow:

  • Decode each piece of data via RLP
  • Verify that its hash matches the hash referenced by its parent node (or the trusted root, for the first node)
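The hash-linking part of that flow can be sketched as follows. This is deliberately not a full Merkle Patricia proof verifier: a real one also RLP-decodes each node, follows the key's nibbles to pick the right child, and handles nodes shorter than 32 bytes that are embedded inline. The function name is ours, and the demo uses SHA-256 and made-up node contents because keccak256 is not in the standard library.

```python
import hashlib
from typing import Callable, List


def check_hash_links(
    trusted_root: bytes,
    proof_nodes: List[bytes],
    hash_fn: Callable[[bytes], bytes],
) -> bool:
    """Check that node 0 hashes to the root and each later node's hash appears in its parent."""
    if not proof_nodes or hash_fn(proof_nodes[0]) != trusted_root:
        return False
    for parent, child in zip(proof_nodes, proof_nodes[1:]):
        if hash_fn(child) not in parent:
            return False
    return True


def sha(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for keccak256 in this demo
    return hashlib.sha256(data).digest()


# Made-up node contents: each parent embeds the hash of its child
leaf = b"leaf: balance=100"
branch = b"branch:" + sha(leaf)
top = b"top:" + sha(branch)
trusted_root = sha(top)

print(check_hash_links(trusted_root, [top, branch, leaf], sha))  # True
print(check_hash_links(trusted_root, [top, branch, b"leaf: balance=999"], sha))  # False
```

Changing any node breaks the link to its parent, which is why a single trusted root is enough to anchor the whole chain.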

Taking it Further: Usage on L2 Blockchains

When working off-chain, we can check the block we were given (its number and state root) directly against an Ethereum node to ensure it is correct. On other blockchains, however, we need a trusted source of Ethereum block headers for these proofs to be useful. Optimism provides an elegant solution through its L1 Block Oracle, which:

  • Maintains synchronized and verified Ethereum block headers
  • Acts as a trusted source of state roots
  • Enables proof verification directly on L2

With this trusted block header source, here is the whole process:

  1. Get the block header from the oracle
  2. Decode and verify the proofs against it
    • Each hash must match its parent
    • Any mismatch invalidates the entire proof

The security stems from cryptographic properties: once we have a trusted block header, forging any part of the proof chain is computationally infeasible.

Implementing Block Header Verification

To implement this verification process, we need to properly handle block headers through RLP encoding/decoding. Here's how:

Encoding on L1 (Python)

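from typing import Any, Dict

import rlp
from hexbytes import HexBytes
from web3 import Web3

# Header fields in the order they appear in the RLP-encoded header.
# Fields introduced by later forks are simply absent from older blocks.
# Newer forks may append further fields (e.g. requestsHash from EIP-7685);
# extend the tuple accordingly.
BLOCK_HEADER = (
    "parentHash", "sha3Uncles", "miner", "stateRoot",
    "transactionsRoot", "receiptsRoot", "logsBloom",
    "difficulty", "number", "gasLimit", "gasUsed",
    "timestamp", "extraData", "mixHash", "nonce",
    "baseFeePerGas", "withdrawalsRoot", "blobGasUsed",
    "excessBlobGas", "parentBeaconBlockRoot"
)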
 
def encode_block_header(block: Dict[str, Any]) -> bytes:
    """Encode a block header -> RLP encoded"""
    block_header = [
        (
            HexBytes("0x")
            if isinstance(block.get(k), int) and block.get(k) == 0
            else HexBytes(block.get(k))
        )
        for k in BLOCK_HEADER
        if k in block
    ]
    return rlp.encode(block_header)
 
def get_rlp_header(web_3: Web3, block_number: int) -> str:
    """Get and encode block header"""
    block = web_3.eth.get_block(block_number)
    encoded_header = encode_block_header(block)
    return "0x" + encoded_header.hex()

Decoding on L2 (Solidity)

// Assumes Hamdi Allam's RLPReader library is imported, with:
//   using RLPReader for bytes;
//   using RLPReader for RLPReader.RLPItem;

// Structs must be declared at contract (or file) level, not inside a function
struct BlockHeader {
    bytes32 parentHash;
    bytes32 unclesHash;
    address coinbase;
    bytes32 stateRoot;  // This is our anchor for proof verification
    // ... other fields
}

function decodeBlockHeader(bytes memory rlpData) public pure returns (BlockHeader memory) {
    // Before trusting the decoded fields, the caller should check that
    // keccak256(rlpData) equals a trusted block hash.

    // Wrap the raw bytes as an RLP item, then split the header into its fields
    RLPReader.RLPItem[] memory decodedList = rlpData.toRlpItem().toList();

    BlockHeader memory header = BlockHeader({
        parentHash: bytes32(decodedList[0].toUint()),
        unclesHash: bytes32(decodedList[1].toUint()),
        coinbase: address(uint160(decodedList[2].toUint())),
        stateRoot: bytes32(decodedList[3].toUint()),  // Extract state root
        // ... other fields
    });

    return header;
}

Once decoded, we can use the state root to verify our proofs:

  1. Block header provides the trusted state root
  2. State root validates the account proof
  3. Account proof confirms the storage root
  4. Storage root verifies the final balance

Leveraging State Proofs: Beyond Simple Verification

Our example of verifying the USDC balance provides a clear understanding of how Ethereum's proof system operates. Additionally, these proofs can be utilized for a variety of applications:

  1. Historical Data Access

    • Token balances at any block
    • Contract states at any point
    • Transaction verification
  2. Cross-chain Systems

    • Block header verification for bridges
    • L2 systems using L1 data
    • Cross-chain oracles
  3. Efficient Data Solutions

    • Snapshot systems
    • State synchronization
    • Light client support

These cryptographic proofs make Ethereum's data both accessible and verifiable. Any developer can build applications that access historical state, verify data trustlessly, and create cross-chain interactions taking advantage of Ethereum's security guarantees.

Looking Ahead: Evolution of Ethereum's State Verification

Ethereum's execution layer, while currently built on RLP and Patricia Merkle Tries, continues to evolve...

Emerging Solutions

Several promising developments are being explored:

  1. Simple Serialize (SSZ)

    • A unified approach to replace RLP, as SSZ is already used in Ethereum's consensus layer
    • Aims for better efficiency and type safety in data handling
  2. Verkle Trees

    • An evolution of Merkle trees based on vector commitments, which allow much smaller proofs than hash-based Merkle trees
    • Could dramatically reduce proof sizes and improve light client capabilities: for a tree with a billion data points, a Verkle proof would take less than 150 bytes, while a typical binary Merkle proof would need around 1 kilobyte
  3. STARK-Based Verification

    • Scalable Transparent Arguments of Knowledge
    • Offers enhanced privacy through zero-knowledge proofs
    • Maintains strong security guarantees without relying on elaborate setup ceremonies
    • Utilizes post-quantum-secure cryptography, ensuring long-term security
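The "around 1 kilobyte" figure for a binary Merkle tree can be checked with a quick back-of-the-envelope calculation. The sketch assumes a balanced tree, 32-byte hashes, and one sibling hash per level; the function name is ours.

```python
import math

HASH_SIZE = 32  # bytes per hash (output size of keccak256 or SHA-256)


def binary_merkle_proof_size(num_leaves: int) -> int:
    """One sibling hash per level of a balanced binary tree."""
    depth = math.ceil(math.log2(num_leaves))
    return depth * HASH_SIZE


print(binary_merkle_proof_size(10**9))  # 960
```

A billion leaves give a depth of 30, so a proof carries 30 hashes, or 960 bytes, which matches the roughly 1 kilobyte quoted above.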

While the fundamental concepts explored in this article remain valuable, their implementation might look quite different in the future. The verification processes could become simpler and more efficient, with smaller proofs and easier access for light clients. What's crucial is that these improvements will maintain the security and trustlessness that make Ethereum's state verification as robust as it is today. When those changes arrive, it will be time for another dive!


Real-World Application: Votemarket V2

At Stake DAO, we leverage these mechanisms extensively: we use storage proofs to extract and verify votes. Check out more of our work here.

Further Reading