Hortonworks for Hadoop – The Open Source Leader
Cloudera, Hortonworks, and MapR are among the leading Hadoop distributions for big data analytics. At DesignMind, our team of big data consultants holds certifications across a range of big data analytics tools, so we can build solutions that meet each client’s unique needs.
Today, let’s discuss some distinguishing features of Hortonworks Data Platform (HDP) and Hortonworks Data Flow (HDF).
Big Data Analytics Tools Offer Similar Functionality, With Important Differences
HDP includes the essential Hadoop ecosystem tools to store, process, and interact with your data. This collection of open source tools offers broad options for integration and has proven attractive to technology vendors: Microsoft, Rackspace, SAP, and Teradata are among the strong group of partners Hortonworks has built. You probably already know that the competing distributions offer broadly similar functionality while making slightly different choices in how they provide it, so let’s look at some of the differences.
Hortonworks Is All About Open Source
Among big data analytics vendors, Hortonworks stands out for its commitment to open source. Hortonworks’ Justin Sears explains the company’s philosophy: “Community-driven open source to fundamentally change the landscape of the enterprise software market. There is ultimately no way for proprietary development to outpace or outthink the legions of amazing developers contributing to open-source projects every minute of every day.”
More than any other player in the landscape, Hortonworks is committed to keeping everything in its distribution fully open. You can download it, compile it, and run it yourself, without worrying that certain features are available only in a paid proprietary edition. When Hortonworks builds something new or acquires proprietary software, it open sources the result. (For example, in 2014 it acquired the proprietary XA Secure and open sourced it as Apache Argus, now known as Apache Ranger.)
Another example of Hortonworks’ commitment to improving Hadoop through open standards is its role as a founding member of the Open Data Platform Initiative (ODPi), along with Pivotal, IBM, and numerous other partners from the ecosystem.
According to the ODPi FAQ: ODPi is a nonprofit organization committed to simplification and standardization of the big data ecosystem with a common reference specification called ODPi Core.
As a shared industry effort and Linux Foundation project, ODPi is focused on promoting and advancing the state of Apache Hadoop® and big data technologies for the enterprise.
Hortonworks Offers Unique Tools Including Ambari, Ranger, and HDF
Since I focus on the admin side of things, the first example of a distinctive Hortonworks approach that I’ll discuss is cluster management. Hortonworks uses Apache Ambari as its primary interface for managing a Hadoop cluster and its components. Ambari lets administrators monitor cluster health, view job logs, start and stop services, add and remove services, and more. Competitors offer similar proprietary tools (Cloudera Manager, MapR Control System), but among the most widely used distributions, only Ambari is open source.
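Beyond its web UI, Ambari exposes a REST API (the same one the UI is built on), so routine operations like stopping or starting a service can be scripted. The sketch below only builds the HTTP request and does not send it. It assumes the commonly documented Ambari v1 pattern: a PUT to `/api/v1/clusters/<cluster>/services/<SERVICE>` that sets `ServiceInfo.state` to `INSTALLED` (stopped) or `STARTED`, with HTTP basic auth and the `X-Requested-By` header Ambari expects on modifying requests. The host `ambari.example.com`, cluster `mycluster`, credentials, and port 8080 are hypothetical placeholders; check the Ambari API documentation for your version before relying on it.

```python
import base64
import json
import urllib.request

# Assumed ServiceInfo.state values in Ambari's v1 REST API:
# "INSTALLED" means installed but stopped; "STARTED" means running.
VALID_STATES = {"INSTALLED", "STARTED"}


def build_service_state_request(host, cluster, service, state,
                                user, password, port=8080):
    """Build (but do not send) a request that changes a service's state.

    host, cluster, user, and password are caller-supplied placeholders;
    port 8080 is Ambari's usual default and may differ on your cluster.
    """
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state}")
    url = f"http://{host}:{port}/api/v1/clusters/{cluster}/services/{service}"
    body = {
        "RequestInfo": {"context": f"Set {service} to {state} via REST"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            # Ambari rejects modifying requests that lack this header.
            "X-Requested-By": "ambari",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical host and cluster; nothing is sent over the network here.
    req = build_service_state_request("ambari.example.com", "mycluster",
                                      "HDFS", "INSTALLED", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually send it against a real cluster: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect or log what would be sent before touching a production cluster.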
I’ve already mentioned Apache Ranger, formerly proprietary software that Hortonworks purchased and open sourced. Authorization is another area where Hortonworks and its competitors have diverged. For enhanced authorization, Cloudera and MapR both went with Sentry (originally a proprietary Cloudera project that was later open sourced), which provides similar functionality. Apache Ranger also provides capabilities such as encryption, key management, and auditing, which in Cloudera’s stack come from the proprietary Cloudera Navigator.
Another differentiator is HDF (Hortonworks Data Flow), which Hortonworks introduced recently. HDF is currently anchored by Apache NiFi, an open source Apache project focused on data flow. NiFi is designed to simplify the flow of data within an organization while letting you interact with that data at a very granular level in near real time. You can pull records from diverse sources (Twitter, MongoDB, or almost anywhere else), analyze them, clean them up, and push them into one or more backend data stores for storage and analysis. NiFi also includes a web UI that offers a “seamless experience between design, control, feedback, and monitoring.”
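NiFi flows are assembled visually from processors in its web UI rather than written as code, so the following is only a conceptual sketch in plain Python, not NiFi’s API, of the pull, clean, and fan-out pattern described above. The sample records, the `clean` rule, and the two in-memory “stores” are hypothetical stand-ins for real sources such as Twitter or MongoDB and real backend data stores.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]


def clean(record: Record) -> Optional[Record]:
    """Hypothetical cleanup step: trim whitespace, drop records with no text."""
    cleaned = {k: v.strip() for k, v in record.items()}
    return cleaned if cleaned.get("text") else None


def run_flow(sources: Iterable[Iterable[Record]],
             sinks: List[Callable[[Record], None]]) -> int:
    """Pull each record from every source, clean it, and send it to all sinks.

    Returns the number of records delivered (records dropped by clean()
    are not counted).
    """
    delivered = 0
    for source in sources:
        for raw in source:
            record = clean(raw)
            if record is None:
                continue  # filtered out during cleanup
            for sink in sinks:
                sink(record)
            delivered += 1
    return delivered


if __name__ == "__main__":
    # Stand-ins for a tweet feed and a MongoDB export.
    tweets = [{"source": "twitter", "text": "  hello hadoop  "}]
    docs = [{"source": "mongodb", "text": " "},
            {"source": "mongodb", "text": "order 42"}]
    archive: List[Record] = []
    analytics: List[Record] = []
    count = run_flow([tweets, docs], [archive.append, analytics.append])
    print(count, archive)
```

In NiFi itself, each of these steps would roughly correspond to a separate processor, with the hand-offs between them handled by queued connections that NiFi manages and monitors for you.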
A Great Data Platform for On-Premises or Cloud
There are many more areas where the competing distributions make different choices (Tez, Navigator, Impala, MapR-FS, etc.), and each has its strengths and weaknesses. What can’t be denied is that Hortonworks has created a strong data (and data flow) platform. It lets you manage both data in motion and data at rest and turn it into actionable intelligence, whether your data is on-premises or in the cloud.
So is Hortonworks the right choice for you? The answer won’t be the same for every organization, and making the call requires understanding your data and your goals. You might want to check out “Five Distinguishing Features of Hortonworks Big Data Analytics” by my colleague Akshay Iyengar.
If you need help getting the most out of Hortonworks or other big data analytics tools, consider partnering with DesignMind on your journey to better insights from your data. DesignMind’s big data consultants bring deep, senior expertise, and with certifications from multiple vendors, they can help with Hortonworks or any other tool in the Hadoop ecosystem to craft the right solution for your organization.
Mike Wilcox is Principal Big Data Consultant at DesignMind, and a Hortonworks Data Platform Certified Administrator. DesignMind consultants are experts in BI, Big Data Analytics Tools, Databases, and Cloud Solutions. Contact us in San Francisco to learn how you can harness your data.