| # | Category | Tools and description |
|---|---|---|
| 1 | Hadoop | Distributed (clustered) data processing system. You can think of it as a container, similar to a web or EJB container. |
| 2 | MapReduce (execution on a cluster) | A programming model for processing large data sets. Hadoop-related tools:<br>1. Hive: Hadoop jobs are written in SQL (HiveQL)<br>2. Pig: Hadoop jobs are written in a data-processing language (Pig Latin)<br>3. Cascading: Hadoop jobs are written as workflows using a Java API<br>4. Cascalog: Hadoop jobs are written in Clojure<br>5. mrjob: Hadoop jobs are written in Python<br>6. Caffeine: developed by Google for web indexing as an alternative to the batch MapReduce approach<br>7. S4: distributed stream-processing platform, originally from Yahoo!<br>8. MapR: commercial Hadoop distribution for the enterprise, with its own file system |
| | | 2. HDFS (Hadoop Distributed File System) |
| 5 | Server | 1. EC2<br>2. Google App Engine<br>3. Elastic Beanstalk<br>4. Heroku: Ruby web application hosting |
| 6 | Visualization | 1. Gephi: open-source, Java-based tool for visually drawing nodes and the lines connecting them<br>2. Graphviz<br>3. Processing<br>4. Protovis: JavaScript framework<br>5. Fusion Tables: from Google<br>6. Tableau |
| 7 | Data Acquisition | 1. Google Refine<br>2. Needlebase<br>3. ScraperWiki |
| 8 | Serialization | 1. JSON<br>2. BSON<br>3. Thrift<br>4. Avro<br>5. Protocol Buffers |
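
The MapReduce model listed in the table is easiest to see with the classic word-count example. Below is a minimal, single-process Python sketch of the three phases (map, shuffle, reduce); it is not Hadoop's or mrjob's API. The function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and on a real cluster the framework would run many mapper and reducer calls in parallel on different nodes and perform the shuffle itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical helper names: this simulates MapReduce in one process,
# it does not use Hadoop or mrjob.


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each input split would be mapped on a different node;
    # here all phases run sequentially in this process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # Reduce each key independently; sorting just makes the output ordered.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce jobs", "MapReduce jobs run on Hadoop"]
    print(word_count(docs))
```

The mapper only ever sees one line and the reducer only ever sees one key with its values, which is what allows a framework such as Hadoop to spread both phases across many machines.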

