Google Dumps MapReduce in Favor of New Hyper-Scale Analytics System

Urs Hölzle, senior vice president of technical infrastructure at Google, announced a new cloud analytics system at the 2014 Google I/O conference in San Francisco.

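    Google has abandoned MapReduce in favor of a new cloud analytics system called Cloud Dataflow. MapReduce, a system for running data analytics jobs spread across many servers, was originally developed by Google and later open sourced.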

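    MapReduce has become a widely known infrastructure and programming model for parallelized distributed computing on server clusters. It is the basis of Apache Hadoop, the Big Data infrastructure platform that has been widely deployed and has become the core of many companies’ commercial products.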
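
    For readers unfamiliar with the programming model, the following is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is not Google's or Hadoop's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the phases in parallel across many servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run the three phases sequentially (hypothetical, single process)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

    For example, `word_count(["to be or not to be"])` gives a count of 2 for "to" and "be" and 1 for "or" and "not". Because each `map_phase` call depends only on its own record, and each `reduce_phase` call only on its own key, both steps can in principle be distributed across a cluster.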

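    However, the technology can no longer handle the amounts of data Google wants to analyze. Urs Hölzle, senior vice president of technical infrastructure at the Mountain View, California-based company, said it became too cumbersome once the size of the data reached a few petabytes.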

    “We don’t really use MapReduce anymore,” Hölzle said in his keynote presentation at the Google I/O conference in San Francisco on Wednesday. The company stopped using the system “years ago.”

    Cloud Dataflow, which Google will also offer as a service to developers using its cloud platform, does not have the scaling restrictions of MapReduce.

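    “Cloud Dataflow is the result of over a decade of experience in analytics,” Hölzle said. “It will run faster and scale better than pretty much any other system out there.”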

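    He said it is a fully managed service that is automatically optimized, deployed, managed, and scaled, and that it lets developers easily create complex pipelines using unified programming for both batch and streaming services.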

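    All of these characteristics address what Google thinks does not work in MapReduce: it is hard to ingest data rapidly, it requires many different technologies, batch and streaming processing are unrelated, and MapReduce clusters always have to be deployed and operated.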

Hölzle announced several other new services on Google’s cloud platform at the show:

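    Cloud Save is an API that lets an application save an individual user’s data in the cloud or elsewhere and use it without any server-side code. Users of App Engine, Google’s Platform-as-a-Service offering, and Compute Engine, its Infrastructure-as-a-Service offering, can build apps using this feature.

    Cloud Debugging makes it easier to sift through code deployed across many servers in the cloud to identify software bugs.

    Cloud Tracing provides latency statistics across different groups (for example, the latency of database service calls) and provides analysis reports.

    Cloud Monitoring is an intelligent monitoring system that resulted from integration with Stackdriver, a cloud monitoring startup Google bought in May. It monitors cloud infrastructure resources, such as disks and virtual machines, as well as service levels for Google’s services and for more than a dozen non-Google open source packages.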


Original article: Google Dumps MapReduce in Favor of New Hyper-Scale Analytics System

Original English text:

Google has abandoned MapReduce, the system for running data analytics jobs spread across many servers the company developed and later open sourced, in favor of a new cloud analytics system it has built called Cloud Dataflow.

MapReduce has been a highly popular infrastructure and programming model for doing parallelized distributed computing on server clusters. It is the basis of Apache Hadoop, the Big Data infrastructure platform that has enjoyed widespread deployment and become the core of many companies’ commercial products.

The technology is unable to handle the amounts of data Google wants to analyze these days, however. Urs Hölzle, senior vice president of technical infrastructure at the Mountain View, California-based giant, said it got too cumbersome once the size of the data reached a few petabytes.

“We don’t really use MapReduce anymore,” Hölzle said in his keynote presentation at the Google I/O conference in San Francisco Wednesday. The company stopped using the system “years ago.”

Cloud Dataflow, which Google will also offer as a service for developers using its cloud platform, does not have the scaling restrictions of MapReduce.

“Cloud Dataflow is the result of over a decade of experience in analytics,” Hölzle said. “It will run faster and scale better than pretty much any other system out there.”

It is a fully managed service that is automatically optimized, deployed, managed and scaled. It enables developers to easily create complex pipelines using unified programming for both batch and streaming services, he said.

All these characteristics address what Google thinks does not work in MapReduce: it is hard to ingest data rapidly, it requires a lot of different technology, batch and streaming are unrelated, and deployment and operation of MapReduce clusters is always required.

Hölzle announced other new services on Google’s cloud platform at the show:

  • Cloud Save is an API that enables an application to save an individual user’s data in the cloud or elsewhere and use it without requiring any server-side coding. Users of Google’s Platform-as-a-Service offering App Engine and Infrastructure-as-a-Service offering Compute Engine can build apps using this feature.
  • Cloud Debugging makes it easier to sift through lines of code deployed across many servers in the cloud to identify software bugs.
  • Cloud Tracing provides latency statistics across different groups (latency of database service calls for example) and provides analysis reports.
  • Cloud Monitoring is an intelligent monitoring system that is a result of integration with Stackdriver, a cloud monitoring startup Google bought in May. The feature monitors cloud infrastructure resources, such as disks and virtual machines, as well as service levels for Google’s services and for more than a dozen non-Google open source packages.

Permalink: http://bookshadow.com/weblog/2014/06/26/google-dumps-mapreduce-in-favor-of-new-hyper-scale-analytics-system/
Please respect the author’s work and credit the source when reposting. 书影博客 reserves all rights to this article.
