HiC2010: Introduction to Hadoop C++ Extension @ Baidu

Introduction to Hadoop C++ Extension
肖康 (Xiao Kang), xiaokang@baidu.com

Outline
- Big Picture
- Why HCE
- HCE Implementation
- HCE Usage
- HCE Reference
- Other Works

Baidu Statistics
Current:
- >10 clusters, 4,000 nodes
- Largest cluster: 1,000 nodes
- Per node: 8 cores, 16 GB RAM, 12 × 1 TB disks
- Data per day: >3 PB
- Jobs per day: >30,000
Soon:
- >10,000 nodes
- Data per day: 10 PB

Big Picture (layered architecture diagram)
- Computing Resource Management Layer
- Scheduling Layer: HPC Scheduler and Agent for communication-intensive workloads (Communication Intensive – HPC …); DC Scheduler and Agent for data- and computing-intensive workloads (Data & Computing Intensive – DC...)
- Computing Model: MapReduce, DAG computing model
- Algorithm Description Layer: Classification, Regression, Vector Clustering
- SQL-like Representation Layer

Why HCE
Current APIs: Java, Streaming/Bistreaming, Pipes
- Java efficiency: sort and compress/decompress run 10%–40% faster in C++
- Memory control in Java
- A full-featured C++ API

HCE Implementation
韩富晟 (Han Fusheng), hanfusheng@baidu.com
On each TaskTracker node, the TaskTracker launches a Child JVM, which runs the MapTask or ReduceTask and launches the C++ wrapper library. The two sides exchange commands and status/progress over a socket. The wrapper library hosts the user's C++ Map, Reduce, Reader, Writer, Partitioner, Combiner, or Committer class, and data is read and written on the HCE side.

HCE Implementation (data-flow diagram)
- Java side: HceSubmitter, RunTask, HceMapRunner, HceOutputCommitter, HceNoJavaInputFormat; Shuffle & MSort stay in Hadoop
- C++ map side: LineRecordReader (input from HDFS), Mapper, MapOutputCollector, IFileWriter (map output file.out/map.out on the local FS), reporting status/progress/counters
- C++ reduce side: IFileReader (input from the local FS), ReduceInputReader, Reducer, LineRecordWriter, HadoopOutputCommitter

HCE Usage – basic interface
- setup(), cleanup(), map() and reduce() are not optional; each returns 0 for success.
- emit() outputs a K/V pair.
- TaskContext provides the job configuration and counters.

HCE Usage – wordcount map

HCE Usage – wordcount reduce

HCE Usage – wordcount run
Components: Partitioner, Combiner, OutputCommitter, RecordReader, RecordWriter

    $HADOOP_HOME/bin/hadoop hce \
        -mapper wordcount-demo \
        -reducer wordcount-demo \
        -file ./wordcount-demo \
        -jobconf mapred.reduce.tasks=1 \
        -input /user/test/sample_input \
        -output /user/test/sample_output

HCE Usage – advanced interface
Interface: function
- JobConf: get the job configuration from the object
- Counter: user-defined counters
- Partitioner: HashPartitioner by default; utilities: IntHashPartitioner and MapIntPartitioner
- Combiner: user-defined combiner
- RecordReader: LineRecordReader by default; SequenceRecordReader for SequenceFile
- RecordWriter: LineRecordWriter by default; SequenceRecordWriter for SequenceFile

HCE Reference
JIRA: MAPREDUCE-1270, https://issues.apache.org/jira/browse/MAPREDUCE-1270
- patch
- demo tarball
- design doc
- install manual
- tutorial
- performance test

Other Works
JobHistory Server
- Separate JobHistory from the JobTracker
- Query job history from a DB
TaskScheduler
- Queue scheduling based on CapacityTaskScheduler
- Queue priority
- Queue updates at run time
- Preemption

Other Works: Shuffle (in plan)
Problems:
- Random IO
- Number of connections
- Shuffle occupies reduce slots
Plan:
- Separate shuffle from the TaskTracker and the reduce task
- C++ implementation in HCE
- Random IO vs. sequential IO
- Pull vs. push model
- Data distribution service: shuffle, bt

Thanks
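The basic interface slide covers four required methods: setup(), map(), reduce() and cleanup(), each returning 0 on success. It also lists emit() for output and TaskContext for configuration and counters. The C++ sketch below illustrates that contract with wordcount. The TaskContext struct, incrCounter(), runWordCount(), and every signature shown are hypothetical mocks, not the real HCE headers from MAPREDUCE-1270. The local driver uses a std::map to stand in for Hadoop's shuffle and sort.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for HCE's TaskContext; the real HCE headers differ.
struct TaskContext {
    std::map<std::string, std::string> conf;               // job configuration (mock)
    std::map<std::string, std::int64_t> counters;          // user-defined counters (mock)
    std::vector<std::pair<std::string, std::string>> out;  // records passed to emit()

    void emit(const std::string& key, const std::string& value) { out.emplace_back(key, value); }
    void incrCounter(const std::string& name, std::int64_t delta) { counters[name] += delta; }
};

// Mapper: emits <word, "1"> for every whitespace-separated token; methods return 0 on success.
class WordCountMapper {
public:
    int setup(TaskContext&) { return 0; }
    int map(TaskContext& ctx, const std::string& line) {
        std::istringstream in(line);
        std::string word;
        while (in >> word) {
            ctx.emit(word, "1");
            ctx.incrCounter("map.words", 1);  // hypothetical counter name
        }
        return 0;
    }
    int cleanup(TaskContext&) { return 0; }
};

// Reducer: sums the counts collected for one key.
class WordCountReducer {
public:
    int setup(TaskContext&) { return 0; }
    int reduce(TaskContext& ctx, const std::string& key, const std::vector<std::string>& values) {
        long long sum = 0;
        for (const std::string& v : values) sum += std::stoll(v);
        ctx.emit(key, std::to_string(sum));
        return 0;
    }
    int cleanup(TaskContext&) { return 0; }
};

// Local driver (hypothetical): map, group by key (std::map keeps keys sorted, standing in
// for Hadoop's shuffle and sort), then reduce. Returns an empty map if any step fails.
std::map<std::string, std::string> runWordCount(const std::vector<std::string>& lines) {
    TaskContext mapCtx;
    WordCountMapper mapper;
    if (mapper.setup(mapCtx) != 0) return {};
    for (const std::string& line : lines)
        if (mapper.map(mapCtx, line) != 0) return {};
    if (mapper.cleanup(mapCtx) != 0) return {};

    std::map<std::string, std::vector<std::string>> grouped;
    for (const auto& kv : mapCtx.out) grouped[kv.first].push_back(kv.second);

    TaskContext reduceCtx;
    WordCountReducer reducer;
    if (reducer.setup(reduceCtx) != 0) return {};
    for (const auto& group : grouped)
        if (reducer.reduce(reduceCtx, group.first, group.second) != 0) return {};
    if (reducer.cleanup(reduceCtx) != 0) return {};

    return std::map<std::string, std::string>(reduceCtx.out.begin(), reduceCtx.out.end());
}
```

For example, runWordCount({"hello hadoop", "hello hce"}) returns hadoop → 1, hce → 1, hello → 2. Keys and values are kept as strings only for simplicity. The slide does not say whether HCE's emit() takes strings or raw byte buffers, so treat those types as an assumption.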
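The advanced interface slide names HashPartitioner as the default Partitioner. The sketch below shows hash partitioning under one assumption: a hypothetical Partitioner base class with a partition(key, numReduces) method, whose real HCE signature may differ. The key is hashed, the sign bit is masked off, and the result is taken modulo the number of reduce tasks. This is the scheme Hadoop's Java HashPartitioner uses (`(hashCode & Integer.MAX_VALUE) % numReduceTasks`). The 31-multiplier byte hash is an assumption. IntHashPartitioner and MapIntPartitioner are not sketched because the slide does not describe their behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical partitioner interface; real HCE class names and signatures may differ.
class Partitioner {
public:
    virtual ~Partitioner() = default;
    // Returns the reduce task index in [0, numReduces) for the given key.
    virtual int partition(const std::string& key, int numReduces) const = 0;
};

// Hash partitioning: equal keys always map to the same reduce task.
class HashPartitioner : public Partitioner {
public:
    int partition(const std::string& key, int numReduces) const override {
        assert(numReduces > 0);
        std::uint32_t h = 1;
        for (unsigned char c : key) h = 31u * h + c;  // unsigned arithmetic wraps, no overflow UB
        // Mask off the sign bit, then take the remainder, as Java's HashPartitioner does.
        return static_cast<int>((h & 0x7fffffffu) % static_cast<std::uint32_t>(numReduces));
    }
};
```

With this hash, the key "a" hashes to 31 × 1 + 97 = 128, so it goes to reduce task 0 of 4 or task 2 of 3. Every occurrence of a key lands on the same reduce task, so one reduce() call sees all of that key's values.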