| 1 | Hadoop | Distributed, clustered data processing system. You can think of it as a container for jobs, much like a web/EJB container. |
| 2 | Map/Reduce, execution on cluster | A programming model (algorithm) for processing large data sets. Related tools:
1. Hive - Hadoop jobs are written in a SQL-like language (HiveQL)
2. Pig - Hadoop jobs are written in a data processing language (Pig Latin)
3. Cascading - Hadoop jobs are written as workflows (using a Java API)
4. Cascalog - Hadoop jobs are written in Clojure
5. mrjob - Hadoop jobs are written in Python
6. Caffeine - developed by Google as an alternative to the MapReduce concept
7. S4
8. MapR - a commercial distribution of Hadoop for enterprises, with its own file system |
| 3 | NoSQL | Non-relational database systems (document, key-value, and wide-column stores) used as alternatives to an RDBMS: MongoDB, Cassandra, Redis, BigTable, CouchDB, HBase, Hypertable, Riak |
| 4 | File System | 1. S3 from Amazon
2. HDFS (Hadoop Distributed File System) |
| 5 | Server | 1. EC2
2. Google App Engine
3. Elastic Beanstalk
4. Heroku - Ruby web application hosting. |
| 6 | Visualization | 1. Gephi - Java-based tool for visually creating nodes and the connecting lines between them. Used by LinkedIn
2. Graphviz
3. Processing
4. Protovis - a JavaScript framework
5. Fusion Tables - from Google
6. Tableau |
| 7 | Data Acquisition | 1. Google Refine
2. Needlebase
3. ScraperWiki |
| 8 | Serialization | 1. JSON
2. BSON
3. Thrift
4. Avro
5. Protocol Buffers |
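
To make the Map/Reduce entry (row 2) more concrete, here is a minimal sketch of the idea as a word count in plain Python. It runs in a single process and does not use Hadoop or any of the tools listed in the table; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative assumptions, not part of any library. On a real cluster the framework would run the map and reduce steps on many machines and perform the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = [
        "Hadoop runs MapReduce jobs",
        "Hive and Pig compile to MapReduce jobs",
    ]
    print(word_count(sample))
```

The map and reduce functions are kept free of shared state on purpose: that independence is what lets a cluster run many copies of each step in parallel on different slices of the data.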