Apache Hadoop Ecosystem

  • Hadoop HDFS - 2007 - A distributed file system for reliably storing huge amounts of unstructured, semi-structured and structured data in the form of files. 
  • Hadoop MapReduce - 2007 - A framework for processing large datasets in parallel across a Hadoop cluster, typically reading its input from HDFS. Besides HDFS, it can also read from and write to other data stores such as Cassandra and HBase.
  • Cassandra - 2008 - A NoSQL key-value database with a column-family data model and asynchronous, masterless (peer-to-peer) replication.
  • HBase - 2008 - A NoSQL key-value database with a column-family data model and a master/slave architecture (an HMaster coordinating RegionServers). It uses HDFS as its underlying storage, and HDFS takes care of replicating the data files.
  • ZooKeeper - 2008 - A coordination service for distributed applications. It keeps its replicas consistent using Zab (ZooKeeper Atomic Broadcast), a consensus protocol closely related to Paxos.
  • Pig - 2009 - A high-level scripting language (Pig Latin) that compiles to MapReduce jobs, for developers who prefer scripting over writing native Java MapReduce code.
  • Hive - 2009 - A SQL interface (HiveQL) over MapReduce, for developers and analysts who prefer SQL over writing native Java MapReduce code.
  • Mahout - 2009 - A library of machine learning algorithms, implemented on top of MapReduce, for finding meaningful patterns in HDFS datasets. 
  • Sqoop - 2010 - A tool to import data from relational databases and data warehouses into HDFS/HBase, and to export it back.
  • YARN - 2011 - A system to schedule applications and services on a Hadoop cluster and to manage cluster resources such as memory and CPU.
  • Flume - 2011 - A tool to collect, aggregate, reliably move and ingest large amounts of data into HDFS. 
  • Storm - 2011 - A system to process high-velocity streaming data with 'at least once' message semantics. 
  • Spark - 2012 - An in-memory data processing engine that executes jobs as a DAG (directed acyclic graph) of operations. It provides libraries for machine learning (MLlib), SQL (Spark SQL) and near-real-time stream processing.
  • Kafka - 2012 - A distributed messaging system with partitioned topics for very high scalability. 
  • SolrCloud - 2012 - A distributed search engine with a REST-like interface for full-text search. It uses the Apache Lucene library for indexing and search.
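
To make the MapReduce entry above more concrete, here is a minimal single-process sketch in plain Python of the classic word-count job. It only imitates the three phases (map, shuffle/group-by-key, reduce) in memory; it does not use Hadoop's Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine every value seen for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the phases; keys are sorted, mimicking the sorted input a reducer receives.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["HDFS stores files", "MapReduce processes files on HDFS"]
    print(word_count(docs))
    # {'files': 2, 'hdfs': 2, 'mapreduce': 1, 'on': 1, 'processes': 1, 'stores': 1}
```

In a real Hadoop job, map tasks run in parallel on the nodes holding each HDFS block, and the shuffle moves intermediate pairs over the network to reducer tasks; this sketch keeps everything in one process only to show the data flow.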

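Kafka's partitioned topics scale because records with the same key are routed to the same partition (as long as the partition count does not change), which preserves their order while spreading different keys across partitions hosted on different brokers. The sketch below imitates that routing in plain Python. It is an illustration under stated assumptions, not Kafka's client API: the names partition_for and produce are hypothetical, and zlib.crc32 stands in for the murmur2 hash that the Java client's default partitioner uses, so the partition numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    # Hash the key deterministically and map it onto one partition.
    # Assumption: zlib.crc32 stands in for the murmur2 hash used by
    # Kafka's Java client, so real partition numbers will differ.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    # Append each record to the log of its key's partition. Records in one
    # partition keep the order in which they were produced.
    topic: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        topic[partition_for(key, num_partitions)].append((key, value))
    return dict(topic)


if __name__ == "__main__":
    events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"), ("user-1", "logout")]
    for partition, log in sorted(produce(events, num_partitions=3).items()):
        print(partition, log)
```

Hashing the key keeps all of one key's records together and in order; the trade-off is that adding partitions later changes which partition existing keys map to.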