Cassandra Storage Engine

seungh0 2023. 3. 4. 17:13

Storage Engine

  • Cassandra uses a storage structure similar to a Log-Structured Merge Tree (LSM tree).
    • Cassandra avoids reading before writing. In a large distributed system, read-before-write adds a read to every write, which can increase latency and open the door to race conditions.
      • For example, two clients read the same row at the same time; one overwrites the row to apply update A, and the other overwrites the row to apply update B, discarding update A.
      • This race condition leads to ambiguous query results: which update is correct?
  • To avoid read-before-write for most writes in Cassandra, the storage engine groups inserts and updates in memory and, at intervals, writes the data to disk sequentially in append mode. Once written to disk, the data is immutable and is never overwritten.
  • Reading data involves combining the immutable, sequentially written data to discover the correct query result.
  • Lightweight transactions (LWT) can be used to check the state of the data before writing.
    • However, this feature is recommended only for limited use.
  • A log-structured engine that avoids overwrites and updates data with sequential I/O suits both solid-state disks (SSD) and hard disks (HDD). On HDD, random writes require more seek operations than sequential writes, and the seek penalty can be substantial. On SSD, the concern is write amplification: because Cassandra writes immutable files sequentially, it limits write amplification and the wear it causes, so the database accommodates inexpensive consumer SSDs extremely well. For many other databases, write amplification is a problem on SSDs.
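
The write and read paths described above can be sketched with a small in-memory model. This is only an illustration, not Cassandra's implementation: the names `ToyEngine`, `SSTable`, and `flush_threshold` are hypothetical, a counter stands in for write timestamps, and a tuple stands in for an on-disk file. It shows writes going into a memtable without any prior read, periodic flushes into immutable sorted tables, and reads combining all tables to pick the most recent value.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class SSTable:
    """Immutable, key-sorted snapshot of a flushed memtable (toy stand-in for a file)."""
    rows: tuple  # ((key, (timestamp, value)), ...) sorted by key

    def get(self, key):
        for k, cell in self.rows:
            if k == key:
                return cell
        return None


@dataclass
class ToyEngine:
    flush_threshold: int = 2                      # hypothetical: flush after N keys
    memtable: dict = field(default_factory=dict)  # key -> (timestamp, value)
    sstables: list = field(default_factory=list)  # append-only list of SSTable
    _clock: count = field(default_factory=lambda: count(1))

    def write(self, key, value):
        # No read of existing data: record the value with a fresh timestamp.
        self.memtable[key] = (next(self._clock), value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a new immutable table; never touch old ones.
        if self.memtable:
            self.sstables.append(SSTable(tuple(sorted(self.memtable.items()))))
            self.memtable = {}

    def read(self, key):
        # Combine every source and keep the cell with the newest timestamp.
        cells = [t.get(key) for t in self.sstables] + [self.memtable.get(key)]
        cells = [c for c in cells if c is not None]
        return max(cells, key=lambda c: c[0])[1] if cells else None


engine = ToyEngine(flush_threshold=2)
engine.write("row1", "update A")
engine.write("row2", "x")         # threshold reached: both rows flushed together
engine.write("row1", "update B")  # newer value lives only in the memtable
latest = engine.read("row1")      # merges the flushed table and the memtable
```

In this sketch the older value for `row1` stays in the flushed table and is simply outvoted at read time by the newer timestamp, which is why the write path never needs to look at existing data.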