Apache Hadoop Ecosystem
Hadoop HDFS - 2007 - A distributed file system for reliably storing huge amounts of unstructured, semi-structured and structured data in the form of files. Hadoop MapReduce - 2007 - A distributed algorithm framework for the parallel processing of large datasets on HDFS filesystem. It runs on Hadoop cluster but also supports other database formats like Cassandra and HBase. Cassandra - 2008 - A key-value pair NoSQL database, with column family data representation and asynchronous masterless replication. HBase - 2008 - A key-value pair NoSQL database, with column family data representation, with master-slave replication. It uses HDFS as underlying storage. Zookeeper - 2008 - A distributed coordination service for distributed applications. It is based on Paxos algorithm variant called Zab. Pig - 2009 - Pig is a scripting interface over MapReduce for developers who prefer scripting interface over native Java MapReduce programming. Hive ...