
Cloudera’s Commitment to Open Source and Open Standards

A Cloudera White Paper


CLOUDERA’S COMMITMENT TO OPEN SOURCE AND OPEN STANDARDS

Executive Summary

The Benefits of Open Source Software

Cloudera and the Hadoop Software Ecosystem

The Cloudera Software Platform Lifecycle

The Value of Support Subscriptions

About Cloudera


WHITE PAPER

Executive Summary

Today, software for every layer of the enterprise stack is available under a permissive open source license. In fact, the world’s most popular OS (Linux), Web server (Apache HTTP Server), relational database (MySQL), and distribution of Apache Hadoop (CDH from Cloudera, downloaded more than all alternatives combined) are all open source software.

Many people intuitively recognize the surface benefits of source code being available for inspection and modification. However, not all open source platforms are the same. Buyers of Apache Hadoop-based enterprise data hubs in particular should be aware that deep, direct involvement in the open source development process, in a way designed to help customers solve business problems, has a tangible impact beyond the simple availability of source code. Otherwise, the “open source” label has limited practical benefit beyond its surface appeal.

Furthermore, it’s important to understand that open licenses and governance do not necessarily lead to industry standards. Even an open license won’t prevent lock-in risk if the software involved is shipped by a single vendor. So, a vendor’s commitment to standards is just as important as its commitment to open source.

“If Apache Hadoop had been created as proprietary software it would not have spread as rapidly. We’ve seen incredible growth in the use of Hadoop, partly because it’s useful. But many would have been cautious to make a vendor-controlled platform part of their infrastructure, useful or not.”

Doug Cutting,


The Benefits of Open Source Software

First, it’s important to document the baseline, generic business benefits of any open source platform:

• Freedom from lock-in. Thanks to permissive licensing, when using an open source distribution of Hadoop, you’re free to use the software without paying royalties and free to switch to a different platform without moving your data. Furthermore, unlike with proprietary software, there is no acquisition cost for open source software.

• Extended evaluation and testing, with no obligation. One of the open source movement’s greatest contributions was to make source code freely available for inspection under a permissive license. For that reason, you are free to install, test/evaluate, or even deploy an open source platform in production for any length of time without any obligation to the distro vendor.

• Rapid innovation on a global scale. As famously evidenced by the Apache and Linux movements (and documented by Eric S. Raymond in his seminal 1997 essay, “The Cathedral and the Bazaar”), no single vendor can out-innovate a global, diverse community of contributors. The rapid evolution, and widespread adoption, of the Hadoop codebase since 2006 over proprietary alternatives is yet more evidence of this vision in action.

• Community-driven development across the ecosystem - to extend, modify, and enhance the platform collaboratively. One of the original purposes of open source licenses was to allow users to improve software themselves, as well as to ensure future compatibility. (Otherwise, the typical result is UNIX: a tangle of incompatible, slowly evolving proprietary offerings.) Fortunately for Hadoop users, one of its greatest strengths is a dedicated network of loosely affiliated developers—including employees of Hadoop platform vendors and platform users across a variety of industries—who are constantly collaborating to improve the code.

These benefits are powerful, time-tested, and supported by research (by Gartner Group, Black Duck Software, and others). That said, they are just “table stakes” when deploying a strategic open source platform like Hadoop. There are other considerations that make a selection process necessary and important.

Cloudera and the Hadoop Ecosystem

Every vendor of an open source Hadoop distribution will deliver the generic benefits described above. However, it’s also important to ask:

• Does the vendor have sufficiently deep and wide involvement in the Apache community, as well as expertise for all components, to support the entire stack (not just the core)?

• Does the vendor have sufficient impact on the platform roadmap to align it with customer needs?

• Is the vendor committed to shipping an open platform based on open standards?

Cloudera’s commitment to meeting each of the needs described above has made it the partner of choice for Hadoop users since its founding in 2008. Since that date, Cloudera has helped more customers deploy Hadoop-based enterprise data hubs to production than all other distribution vendors combined.

Community Involvement for Across-the-Stack Support

Across-the-stack support describes the vendor’s ability to help a customer keep their system running, available, and performing for their use case(s) across the entire platform, not just for the Hadoop core (HDFS, MapReduce, and YARN). This process comprises a continuum of diagnostics and root cause analysis; workarounds (immediate/temporary fixes); patches, bug fixes, and enhancements (permanent fixes); and tuning and optimization.

To be effective, this process requires not only deep familiarity with all components across the stack but also the ability to implement any necessary code changes across it. This process ensures that:

• Your critical systems are available and optimally tuned at all times.

• Your operations team needn’t spend extensive cycles (or make resource investments) to become proficient with the platform.

• Your issues are resolved efficiently, comprehensively, and permanently.

Cloudera is uniquely qualified to provide the above because we employ more code contributors and committers across the Hadoop stack than any other vendor, and because they collectively contribute more code to upstream Hadoop ecosystem projects than any other vendor’s employees. Our deep understanding of each component, combined with our ability to effect code-level changes across the platform, gives us a unique ability to provide comprehensive, production-grade support to our customers. (Furthermore, Cloudera Manager, the most mature, extensible, and complete cluster management suite in the industry, makes ongoing maintenance and support much easier.)

As an example of how this process works, consider a Cloudera Enterprise customer that has documented and reported a problem with HDFS. In parallel, Cloudera engineers reproduce the issue and raise and move a JIRA through the Apache commit process, and provide a patched CDH build to the customer that may be deployed immediately via rolling upgrade. After the patch is committed upstream, Cloudera includes that patch in the next quarterly CDH release (see “The Cloudera Software Platform Lifecycle”), which the customer subsequently uses to replace their custom build at a time of their choosing (again, via rolling upgrade) and without any fear of breaking existing applications.

As a byproduct of this process, more than half of all Hadoop-related tickets that are closed/resolved by a platform vendor employee are assigned to Cloudera employees (source: Apache JIRA), and our support engineers are omnipresent on project mailing lists (and in some cases, write patches themselves).

In contrast, with a different approach, the customer would either break upstream compatibility or have to wait for their patch via the next upstream Apache release. In either case, the customer will be deprived of all the benefits of a stable platform over the long term.

Impact on the Roadmap

Any claim on impact on the roadmap has a very specific implication: The vendor’s ability to drive the strategic direction of the open source platform to meet the needs of its customers. The requirements for meeting this expectation are relatively straightforward: the vendor must have a leadership (committer or PMC member) position within each component’s project in order to represent customer interests as well as implement code changes, and the vendor must have the skillsets, credibility, and experience to create and integrate new projects and encourage external contributions as needed.

Cloudera takes this mission seriously, with employee committers holding approximately 90 seats across all of Apache’s Hadoop projects. Thanks to this leadership position, in the constant effort to align the platform roadmap with customer needs, Cloudera has the best track record of contributing key enterprise features (examples: HDFS NameNode HA, MR1 HA, HttpFS, network encryption, HBase snapshots, HDFS caching, HDFS encryption) to the Apache open source codebase, as well as shipping and supporting those features in our platform.

Furthermore, more than a dozen ecosystem projects have been founded by our employees to fill functionality gaps, and consequently adopted by other platform vendors, including:

| Project | Function | Shipped by |
| --- | --- | --- |
| Hue | Graphical UI / Web App Framework | Cloudera, Hortonworks, MapR |
| Impala | Interactive SQL query | Cloudera, MapR, Amazon |
| Parquet (co-founder) | Columnar file format | Cloudera, IBM, MapR, Pivotal |
| Apache Flume | Streaming data ingest | Cloudera, Hortonworks, IBM, MapR |
| Apache Sentry (incubating) | Role-based authorization and control | Cloudera, IBM, MapR |
| Apache Sqoop | RDBMS connectivity | Cloudera, Hortonworks, MapR |

...and others, including Apache Avro, Apache Bigtop, Apache Crunch, and Kite SDK.

No other vendor can match this combined portfolio of successful ecosystem projects and contributed features that are in production use with customers, today. Furthermore, Cloudera brings community-driven innovations to customers in the form of a platform that has been battle-tested for business-critical production workloads since 2008.

Commitment to Open Standards

Even with this deep and broad involvement in the open ecosystem, freedom from platform lock-in would not be guaranteed without an equally strong commitment to open standards. An “open standard” can be defined in the context of Hadoop as a platform component that is shipped and supported by multiple vendors. These standards emerge on the basis of their widespread adoption by users and other open support projects such that commercial vendors then prioritize support and certification for them. For that reason, open standards:

• Have a track record of continuous support and investment across vendors, ensuring that architectures built on them today will be sustainable for the future.

• Enable customers to choose the best support partner for their needs, and have the confidence that they can find support elsewhere if they choose to make a change.

• Ensure compatibility within and across the ecosystem.

Cloudera is the main shipper and supporter of open standards in the ecosystem; in fact, every major component in CDH is shipped by at least one other vendor in addition to Cloudera. It’s important to note that multivendor support is NOT a feature of all components in the Hadoop ecosystem, and that the use of an Apache-licensed, Apache-governed component is not a guarantee of freedom from lock-in or sustained, long-term investment.


The Cloudera Software Platform Lifecycle

The only “official” releases of Apache components are those that are voted as such by their respective developer communities; “Apache Hadoop” is simply that. (Any platform vendor that would have you believe otherwise is being disingenuous.) But thanks to the magic of the Apache License, deep and wide involvement in upstream development across the Hadoop ecosystem, and continual customer and partner feedback, the path is clear for Cloudera to bring users new production-ready Apache code regularly and predictably, while maintaining a stable, consistent platform across releases.

With that approach, users get the best-of-both-worlds benefits of a platform that is both stable and continually refreshed with new innovations. But how?

Major Releases

Each major release of CDH (aka CDH X) begins with inclusion of the latest stable releases of Apache components after extensive testing, integration, and tuning (fit-and-finish). In cases where functionality is not production-ready or compatibility is broken across those major releases, we’ll often skip the problematic parts—choosing instead to curate critical bug fixes and features and backport them into whatever release is already present in CDH. (For example, due to backward incompatibility across Apache Hive 0.10 and 0.11, Cloudera never shipped the latter in its entirety.)

Minor and Point Releases

Thanks to the broadest customer and partner feedback channels in the industry, Cloudera’s Apache committers are also continually writing new bug fixes and contributing them to the project trunks upstream. (Cloudera has an “upstream-first” policy; patches always go there as a first step.) In some cases, they are writing and committing entire features (some of which were described in the previous section) to plug functionality gaps.

Users who rely on Apache exclusively have to wait for an official Apache release to get access to those patches (in some cases, forever), and when they do, their only option is to consume the entire patchset, regardless of its impact on existing applications. In contrast, for CDH users, critical patches are selectively aggregated and backported to CDH every three months and made available in the form of minor releases (aka CDH X.Y), with some very critical ones shipping as point releases as well. In all cases, Cloudera is diligent about ensuring that these patches don’t alter application behavior (or worse, break applications entirely).

[Figure: CDH relative to trunk development over time, combining stable, released code with critical new bug fixes and features.]


For these reasons, CDH is always straddling the present and future of trunk development. Users get the best of both worlds: stable, released code in combination with curated, forward-looking features and bug fixes. The advantages are:

• Users can confidently access new Apache releases after extensive testing and integration work.

• Users can count on their issues being fixed permanently upstream.

• Users can access the most critical new upstream bug fixes and innovations at a regular cadence, between Apache releases.

• Compatibility and stability are ensured across releases, as well as with the upstream project trunks.

• Upgrades are significantly easier.

This approach has been validated time and time again by Cloudera’s customers as the best option for enterprise-class deployments. And if they’re successful, so are we.

The Value of Support Subscriptions

Support in the form of an annual subscription is one of the most important services that Cloudera provides. With a Cloudera Enterprise subscription, you get the benefits of:

• Support as a strategic advantage. Unique to Cloudera, our Predictive Support model means we’re regularly monitoring the status of your environment (via Cloudera Manager), allowing us to isolate and prevent issues before they even occur. We also ensure that customers are optimizing their use of Cloudera’s technical resources, starting with the onboarding process, by analyzing support cases and platform usage across all deployments proactively.

• Dedicated experts across the globe. Cloudera employs a team of engineers around the world that are dedicated to customer success. Each team member has deep expertise across the enterprise data hub, as well as extensive experience with IT and data management infrastructures. Our team is unmatched in its ability to provide timely issue resolution and effective systems integration and optimization.

• Leadership in the Hadoop ecosystem. As described previously, Cloudera’s team of project committers and founders plays a leading role in planning and development across the ecosystem. In addition to extensive knowledge and experience with Hadoop, Cloudera’s support and engineering teams can go beyond troubleshooting and workarounds to provide enhancements that matter to customers.

• Access to the full spectrum of Cloudera Manager features. Cloudera Enterprise support customers have access to enterprise-class Cloudera Manager features such as LDAP support, rolling upgrades, automated disaster recovery, and advanced monitoring and reporting.

Freedom From Lock-in in Practice and Principle

Portability is defined as the ability to migrate from one vendor’s open source platform to a competing platform or one built internally, in a non-disruptive way – allowing you to make purchasing (or extension) decisions completely based on merit. Portability pertains to technical architecture and the ability to obtain support from other sources: Unless the components of your platform that store or process data are truly portable, switching costs will be prohibitive regardless of license permissiveness.

Because the Apache components in CDH contain the same code that is found in the upstream Apache projects (as described above), those components are fully portable to their Apache counterparts. Furthermore, whether you are a paying support customer or a self-supporting user, you are using precisely the same CDH code.

Consequently, customers have the freedom to choose a Cloudera Enterprise subscription solely based on the value it provides. If they choose, they can either discontinue their subscription and self-support on CDH, or move their data out of CDH to an internally built platform based on stock Apache Hadoop or to another Apache-derived platform (albeit with the loss of differentiating features of CDH, such as interactive SQL query, as a byproduct of the migration process).

Summary: Commitment to Standards and Customer Success

Bring Open Source Benefits Home

You should now thoroughly understand not only why open source software makes a positive difference for customers in a generic sense, but also the requirements that a Hadoop platform vendor specifically has to meet to ensure a long-term, successful deployment.

You now also have a good understanding of why Cloudera, because of its total commitment to meeting those requirements and to supporting and shipping standards, is uniquely qualified to bring you that success with a Hadoop-based enterprise data hub.

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified Platform for Big Data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera’s open source Big Data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 22,000 individuals worldwide. Over 1,200 partners and a seasoned professional services team help deliver faster time to value. Finally, only Cloudera provides proactive and predictive support to run an enterprise data hub with confidence. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production. www.cloudera.com.

1-888-789-1488 or 1-650-362-0488

Cloudera, Inc. 1001 Page Mill Road, Palo Alto, CA 94304, USA

cloudera.com
