This article gives a brief introduction to erasure coding, along with a few pointers to tutorials and to current solutions, both commercial and free open source.
Erasure codes can be used to build very reliable storage. The coding adds redundancy that protects against losing data when storage components fail or become disconnected. Some coding methods need the least possible amount of extra storage for a given level of redundancy. A well-known example of erasure-coded storage is RAID-5, which uses XOR coding. An alternative technique is to create multiple identical copies of the data, which is often called replication or mirroring. For a similar level of protection, mirroring uses significantly more storage space and is thus more expensive than erasure coding. For example, a typical RAID-5 setup that creates one parity unit for every 4 data disks consumes 125% of the size of the stored data. The same goal, surviving the loss of one disk, can be achieved by creating a single replica, which consumes 200% of the size of the stored data.
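To make the comparison concrete, here is a minimal Python sketch of XOR parity in the style of RAID-5. The function names, the four-byte blocks, and the choice of which block is lost are all made up for illustration; real RAID implementations work on disk stripes and rotate the parity across disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """The parity unit is the XOR of all data units."""
    return xor_blocks(data_blocks)

def recover_block(surviving_blocks, parity):
    """XOR of the survivors and the parity yields the single missing unit."""
    return xor_blocks(surviving_blocks + [parity])

def storage_ratio(data_units, redundancy_units):
    """Total space used, relative to the size of the stored data."""
    return (data_units + redundancy_units) / data_units

# Four hypothetical data units, one per "disk".
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(data)

# Pretend the third disk is lost and rebuild it from the rest.
lost_index = 2
survivors = data[:lost_index] + data[lost_index + 1:]
rebuilt = recover_block(survivors, parity)

print(rebuilt == data[lost_index])
print(storage_ratio(4, 1))  # 4 data disks + 1 parity disk
print(storage_ratio(1, 1))  # 1 data disk + 1 mirror
```

The two ratios correspond to the 125% and 200% figures above: both layouts survive the loss of one disk, but the parity layout does so with far less extra space.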
There are many erasure coding techniques, and they are based on different mathematics. Codes built on simple XOR operations are usually very fast to compute. However, they often lack flexibility, consume more space, or require more complex management of data units. Other codes, such as Reed-Solomon, are based on matrix calculations. They are optimal in their space requirement and allow any combination of enough surviving data units to be used to recover the original data. An excellent introduction by Jim Plank to the theory behind erasure codes can be found here.
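The "any combination" property can be illustrated with a toy code in the spirit of Reed-Solomon. The sketch below is a simplification, not a production codec: it treats the data symbols as values of a polynomial over the prime field GF(257) and uses Lagrange interpolation in place of explicit matrix arithmetic, whereas practical implementations typically work in GF(2^8) so that every coded symbol fits in a byte. All names and parameters are assumptions chosen for the example.

```python
from itertools import combinations

P = 257  # prime modulus assumed for this sketch; symbols are integers 0..256

def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n):
    """Extend k data symbols to n coded units (requires n <= P).
    The first k units are the data itself; the rest are redundancy."""
    k = len(data)
    points = list(enumerate(data))
    return data + [interpolate_at(points, x) for x in range(k, n)]

def decode(units, k):
    """Recover the k data symbols from any k surviving units,
    given as a dict mapping unit index to unit value."""
    points = sorted(units.items())[:k]
    return [interpolate_at(points, x) for x in range(k)]

# 4 data symbols extended to 6 units: any 2 units may be lost.
data = [72, 101, 108, 108]
coded = encode(data, 6)

# Try every possible choice of 4 surviving units out of 6.
all_ok = all(
    decode({i: coded[i] for i in subset}, 4) == data
    for subset in combinations(range(6), 4)
)
print(all_ok)
```

Here 6 units protect 4 units of data against any two losses, using 150% of the data size. Mirroring would need three full copies, or 300%, to tolerate the same number of losses.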
A good illustration of the feasibility of using erasure codes for storage is the recent emergence of a company called CleverSafe. It uses coding technologies to provide very reliable storage to its customers with a reasonable amount of resources. The company also hosts an open source project built around its technology. Another coded storage project that is finally starting to mature and become stable in operation is GridBlocks DISK. GridBlocks is written entirely in Java by the authors of JStorage.com and is licensed under the liberal BSD open source license. DISK has been run successfully in both LAN and WAN environments.