Skip to content

GFS (The Google File System)

Touches several topics, related to distributed systems:

  • parallel computation
  • fault tolerance
  • replication
  • consistency

GFS is a good example of successful real-world system, that have documented design and implementation.

Context

  • Many Google services needed a fast unified storage system

MapReduce, Crawling, Indexing, Log storage/analysis

  • Shared across many applications

1000s of clients, 100s of chunkservers, 1 coordinator;

  • Huge capacity
  • High performance
  • Fault tolerance
  • Aimed at batch processing, not interactive use

Capacity story

  • big files split into 64MB chunks
  • each chunk sharded into 3 replicas
  • each chunk is a Linux file

Throughput story

  • many clients, each reading/writing a few large files
  • huge parallelism

Fault tolerance story

  • each file chunk replicated on 3 chunkservers
  • client writes sends to all of chunk's copies
  • read just needs to consult one copy

MIT notes: https://pdos.csail.mit.edu/6.824/notes/l-gfs.txt with more details.