The Data Directory
Everything Bitcoin Core persists lives in the data directory (default: ~/.bitcoin/ on Linux, ~/Library/Application Support/Bitcoin/ on macOS). Here's what's inside:
Block Storage: FlatFile
Raw block data is stored in a sequence of flat files named blk00000.dat, blk00001.dat, etc. This design is managed by the FlatFileSeq class.
How It Works
- Pre-allocated chunks: each file is pre-allocated in 16 MiB chunks to reduce filesystem fragmentation.
- Append-only: new blocks are appended to the current file. When a file is full, a new one begins.
- Position tracking: each block's position is recorded as a FlatFilePos (file number + byte offset within that file).
- Network magic prefix: each block on disk is preceded by the 4-byte network magic (0xf9beb4d9 for mainnet) and a 4-byte size field, so blocks can be located even if the index is lost.
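To illustrate why the magic-plus-size framing makes blocks recoverable without an index, here is a minimal Python sketch that frames records and scans a byte buffer for them. The function names frame_block and scan_blocks are hypothetical, the size field is assumed to be little-endian, and the sketch ignores details such as magic bytes occurring inside block data.

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # network magic as laid out on disk


def frame_block(raw_block: bytes, magic: bytes = MAINNET_MAGIC) -> bytes:
    """Prefix a serialized block with the magic and a 4-byte size (assumed little-endian)."""
    return magic + struct.pack("<I", len(raw_block)) + raw_block


def scan_blocks(data: bytes, magic: bytes = MAINNET_MAGIC):
    """Yield (data_offset, raw_block) for every framed record found in data."""
    pos = 0
    while True:
        pos = data.find(magic, pos)
        if pos == -1 or pos + 8 > len(data):
            return  # no further magic, or not enough bytes left for a size field
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        if start + size > len(data):
            return  # truncated record: stop rather than guess
        yield start, data[start:start + size]
        pos = start + size  # continue scanning after this record
```

The offset yielded for each record plays the role of the byte offset in a FlatFilePos; pre-allocated zero padding between records is skipped because it never matches the magic.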
BlockManager
The BlockManager class (in node/blockstorage.h) sits between validation and the raw files. It handles:
- Reading blocks: ReadBlock() takes a FlatFilePos and returns the deserialized CBlock.
- Writing blocks: SaveBlockToDisk() serializes a block to the current blk file and returns the position.
- File management: tracks which files have space, creates new files when needed.
- Block Tree DB: maintains a LevelDB database (BlockTreeDB) that maps block hashes to their FlatFilePos locations, plus metadata like height, header fields, and validation status. Cumulative work is not stored; it is recomputed when the index is loaded.
The Block Index
The block index is an in-memory tree of CBlockIndex objects (one per known block header). It's loaded from the blocks/index/ LevelDB at startup.
What CBlockIndex Stores
Each CBlockIndex records everything known about a block header without needing the full block data:
- Block hash and height
- Previous block pointer: pprev links to the parent, forming a tree
- Cumulative chain work: nChainWork (total PoW from genesis to this block)
- Validation status: which stages of validation this block has passed (nStatus)
- Disk position: nFile + nDataPos pointing into the blk files
The block index contains entries for every header the node has ever seen, including orphaned or invalid branches. A header can exist in the index before its full block data has been downloaded or validated. The nStatus flags track which stages each block has completed.
Best Chain Selection
The "best chain" (active chain) is the chain of blocks with the most cumulative proof of work from genesis. ChainstateManager maintains a pointer to the tip of this chain. When a new block arrives with more work, the node switches to the new chain (potentially triggering a reorganization).
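The tree structure and the most-work rule can be sketched in a few lines of Python. IndexEntry is a hypothetical stand-in for CBlockIndex that keeps only a parent pointer, height, and cumulative work; best_tip and find_fork are illustrative helpers, not Bitcoin Core functions, and validity checks and tie-breaking are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    """Hypothetical stand-in for CBlockIndex: header metadata only."""
    block_hash: str
    work: int                              # work contributed by this block alone
    pprev: Optional["IndexEntry"] = None   # parent entry, None for genesis
    height: int = 0
    chain_work: int = 0                    # cumulative work from genesis

    def __post_init__(self):
        if self.pprev is not None:
            self.height = self.pprev.height + 1
            self.chain_work = self.pprev.chain_work + self.work
        else:
            self.chain_work = self.work


def best_tip(candidates):
    """Pick the tip with the most cumulative work (not the greatest height)."""
    return max(candidates, key=lambda e: e.chain_work)


def find_fork(a, b):
    """Walk two branches back to their last common ancestor (assumes a shared genesis)."""
    while a is not b:
        if a.height >= b.height:
            a = a.pprev
        else:
            b = b.pprev
    return a
```

In this model a reorganization from tip a to tip b would disconnect the blocks between a and find_fork(a, b), then connect the blocks from the fork point up to b.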
The UTXO Database
The UTXO set (Unspent Transaction Output set) tracks every transaction output that has been created and not yet spent. This is the single most performance-critical data structure, accessed on every transaction validation.
CCoinsView Hierarchy
Bitcoin Core uses a layered cache architecture for UTXO access, built from classes that implement the CCoinsView interface.
Coin Representation
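UTXOs are keyed by COutPoint (txid + output index), and each entry is stored in LevelDB using a compact serialization. Stored values are obfuscated (XORed with a random per-database key) so that byte patterns chosen by whoever created a transaction never appear verbatim on disk, which avoids false positives from antivirus scanners.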
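As a rough illustration of the keying and the XOR obfuscation, the Python sketch below builds a lookup key from an outpoint and obfuscates a stored value with a repeating key. The key layout used here (a prefix byte, the txid, and a fixed 4-byte index) is a simplification chosen for readability, not the real serialization, and both function names are hypothetical.

```python
import struct


def outpoint_key(txid: bytes, vout: int) -> bytes:
    """Simplified database key: prefix byte + txid + 4-byte little-endian output index."""
    return b"C" + txid + struct.pack("<I", vout)


def xor_obfuscate(value: bytes, key: bytes) -> bytes:
    """XOR value with key repeated to length; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(value))
```

Because XOR is its own inverse, the same function is used when writing and when reading, so the cost is one pass over each value.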
Undo Data
For each block file (blk?????.dat), there's a corresponding undo file (rev?????.dat). The undo data stores the information needed to reverse a block: the inputs that were consumed when the block was connected.
Why Undo Data Exists
When a chain reorganization happens, the node needs to "disconnect" blocks and restore the UTXO set to its previous state. The undo data for each block contains the Coin objects for every input spent in that block, so they can be put back.
Each CTxUndo contains a vector of Coin objects: one for each input of the transaction, representing the UTXO that was consumed. To disconnect a block, Bitcoin Core replays these coins back into the UTXO set.
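The connect/disconnect cycle can be modelled with a dictionary standing in for the UTXO set. This is a toy sketch: Coin, connect_block, and disconnect_block are hypothetical names, transactions are plain dicts, and a coinbase is represented as a transaction with no inputs (so its undo entry is simply empty).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coin:
    value: int
    script: bytes


def connect_block(utxos, block):
    """Apply a block to utxos and return undo data: one list of spent coins per tx."""
    undo = []
    for tx in block:
        # Remove each spent output and remember the coin that was consumed.
        spent = [utxos.pop(outpoint) for outpoint in tx["inputs"]]
        undo.append(spent)
        for n, coin in enumerate(tx["outputs"]):
            utxos[(tx["txid"], n)] = coin
    return undo


def disconnect_block(utxos, block, undo):
    """Reverse connect_block: drop the block's outputs and put spent coins back."""
    # Transactions are undone in reverse order so in-block spends unwind correctly.
    for tx, spent in zip(reversed(block), reversed(undo)):
        for n in range(len(tx["outputs"])):
            del utxos[(tx["txid"], n)]
        for outpoint, coin in zip(tx["inputs"], spent):
            utxos[outpoint] = coin
```

The undo list is the only extra state needed: the block itself says which outpoints were spent, and the undo data says what those outpoints contained.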
Pruning
A fully synced Bitcoin node stores ~600+ GB of block data. Pruning allows nodes to delete old block and undo files while keeping the UTXO set, which is all that's needed for validation going forward.
How Pruning Works
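- Automatic pruning: -prune=N (with N of at least 550) keeps block and undo data under a target of N MiB by deleting the oldest files first. Setting -prune=1 instead enables manual pruning through the pruneblockchain RPC.
- File-level deletion: entire blk/rev file pairs are deleted, not individual blocks within a file.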
- Minimum kept: the last 288 blocks (~2 days) are always kept to handle potential reorganizations.
- What's preserved: the block index (headers), the UTXO set, and data already written by optional indexes remain intact. Only raw block/undo data is deleted. Note that -txindex cannot be combined with pruning.
A pruned node cannot serve historical blocks to peers, rescan the blockchain for new wallet keys, or rebuild indexes from block data it has deleted. It can still fully validate new blocks and transactions.
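The file-level selection described above can be sketched as follows. This is a simplified model, not Bitcoin Core's actual pruning routine: BlockFileInfo and files_to_prune are hypothetical names, each entry stands for one blk/rev pair, and the real logic also reserves headroom for upcoming writes.

```python
from dataclasses import dataclass

MIN_BLOCKS_TO_KEEP = 288  # most recent blocks that must never be pruned


@dataclass
class BlockFileInfo:
    number: int       # file number, e.g. 0 for blk00000.dat / rev00000.dat
    size_bytes: int   # combined size of the blk and rev file
    max_height: int   # highest block height stored in this file


def files_to_prune(files, tip_height, target_bytes):
    """Choose whole files to delete, oldest first, until usage fits the target."""
    total = sum(f.size_bytes for f in files)
    last_prunable = tip_height - MIN_BLOCKS_TO_KEEP
    doomed = []
    for f in sorted(files, key=lambda f: f.number):
        if total <= target_bytes:
            break  # already under the target
        if f.max_height > last_prunable:
            continue  # file holds a block inside the protected window
        doomed.append(f.number)
        total -= f.size_bytes
    return doomed
```

Because a file is only eligible once every block in it is older than the protected window, actual disk usage can stay above the target for a while.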
Optional Indexes
Bitcoin Core provides an indexing framework (BaseIndex) for building secondary indexes on top of the blockchain. These are optional and configurable via CLI flags.
BaseIndex Framework
All indexes inherit from BaseIndex, which provides:
- Sequential processing: blocks are indexed in order, following the active chain.
- Reorg handling: when the active chain changes, the index automatically rolls back and re-indexes.
- Background threading: indexing runs on its own thread, so it doesn't block the main validation pipeline.
- Sync tracking: stores a CBlockLocator recording how far the index has progressed.
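A toy model of the sequential-processing and reorg-handling behaviour is sketched below. ToyIndex is hypothetical and much simpler than BaseIndex: the active chain is passed in as a list of block hashes, the "index" just maps each hash to its height, and there is no background thread or persistent locator.

```python
class ToyIndex:
    """Hypothetical secondary index mapping block hash -> height."""

    def __init__(self):
        self.indexed = []  # block hashes processed so far, in chain order
        self.data = {}     # the index contents

    def sync(self, active_chain):
        """Bring the index in line with active_chain, rolling back on a reorg."""
        # Count how many leading blocks still agree with the active chain.
        common = 0
        while (common < len(self.indexed) and common < len(active_chain)
               and self.indexed[common] == active_chain[common]):
            common += 1
        # Roll back entries for blocks that are no longer on the active chain.
        for block_hash in reversed(self.indexed[common:]):
            del self.data[block_hash]
        del self.indexed[common:]
        # Index the new blocks in order, starting at the fork point.
        for height in range(common, len(active_chain)):
            block_hash = active_chain[height]
            self.data[block_hash] = height
            self.indexed.append(block_hash)
```

Calling sync repeatedly with the current active chain keeps the index consistent whether the chain merely grew or switched to a different branch.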
TxIndex (Transaction Index)
Enabled with -txindex. Maps every transaction ID to its position on disk (CDiskTxPos: the block's file and offset, plus the transaction's offset within the block). Without this index, the getrawtransaction RPC can only return a confirmed transaction when the caller also supplies the hash of the block that contains it.
CoinStatsIndex
Enabled with -coinstatsindex. Tracks cumulative UTXO statistics at each block height using MuHash (a rolling hash). Powers the gettxoutsetinfo RPC to return the total UTXO count, total amount, and a hash of the entire UTXO set, without scanning the UTXO database each time.
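The property that makes a multiplicative hash suitable here is that elements can be added and removed in any order without rehashing the whole set. The sketch below is a toy version under stated assumptions: it uses the small Mersenne prime 2**127 - 1 and a plain SHA-256 mapping, whereas the production MuHash uses a 3072-bit modulus and a different element mapping. ToyMuHash and element_to_int are hypothetical names.

```python
import hashlib

# Toy modulus for illustration only (a Mersenne prime); not the production parameter.
PRIME = 2**127 - 1


def element_to_int(data: bytes) -> int:
    """Map an element to a nonzero residue modulo PRIME."""
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % PRIME
    return h or 1


class ToyMuHash:
    """Set hash: the running product of element residues modulo PRIME."""

    def __init__(self):
        self.acc = 1

    def insert(self, data: bytes) -> None:
        self.acc = self.acc * element_to_int(data) % PRIME

    def remove(self, data: bytes) -> None:
        # Removing an element multiplies by its modular inverse (Python 3.8+).
        inverse = pow(element_to_int(data), -1, PRIME)
        self.acc = self.acc * inverse % PRIME
```

For a per-block index this means each block only needs to insert the coins it creates and remove the coins it spends, starting from the previous block's accumulator.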
Compact Block Filters (BIP 157/158)
Enabled with -blockfilterindex, which builds and stores the filters; adding -peerblockfilters also serves them to peers over the P2P network. This is the modern mechanism for light client support, replacing the deprecated BIP 37 bloom filters.
How It Works
- Filter construction (BIP 158): for each block, a compact probabilistic filter is built using a Golomb-Coded Set (GCS). The filter encodes all scriptPubKeys spent or created in the block.
- Filter serving (BIP 157): light clients download these small filters (a few KB per block instead of the full ~1-2 MB block), check if any of their addresses match, and only download the full block if there's a match.
- Filter chain: filters are chained via header hashes (getcfheaders) so clients can verify they have the correct filter for each block.
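The filter chain can be sketched in Python as follows, based on the BIP 157 rule that a filter header is the double-SHA256 of the filter hash concatenated with the previous filter header, with 32 zero bytes standing in for the header before genesis. The function names are hypothetical, and the filters are arbitrary byte strings rather than real Golomb-Coded Sets.

```python
import hashlib

GENESIS_PREV_HEADER = bytes(32)  # previous header used for the first filter


def dsha256(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def filter_headers(filters):
    """Compute the chained filter header for each filter, in block order."""
    prev = GENESIS_PREV_HEADER
    headers = []
    for serialized_filter in filters:
        prev = dsha256(dsha256(serialized_filter) + prev)
        headers.append(prev)
    return headers


def verify_filter(serialized_filter, prev_header, expected_header) -> bool:
    """Check a downloaded filter against a header the client already trusts."""
    return dsha256(dsha256(serialized_filter) + prev_header) == expected_header
```

Since every header commits to all earlier filters, a client that agrees with several peers on a recent header can detect a tampered filter anywhere earlier in the chain.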
Why Not Bloom Filters?
BIP 37 bloom filters had the client send its filter to the server, which revealed information about the client's addresses (privacy leak). BIP 157/158 reverses this: the server builds one filter per block and serves it to all clients identically. The client tests the filter locally, revealing nothing to the server about which addresses it's interested in.
P2P Messages for Block Filters
- getcfilters: request filters for a range of blocks
- cfilter: response containing one block filter
- getcfheaders: request filter header chain (for verification)
- cfheaders: response containing filter headers
- getcfcheckpt: request evenly-spaced checkpoints for efficient sync