
DesignMind

Big Data, Business Intelligence, and Data Analytics Consultants

 
July 24, 2014, by Mark Kidwell

Hadoop Speeds Data Delivery at Bloomberg

Many organizations are adopting Hadoop as their data platform for two fundamental reasons:

  1. They have large volumes of data to store, analyze, and make sense of, on the order of tens of terabytes or more.
  2. Doing so in Hadoop is significantly less expensive than the alternatives.

But those organizations are finding other good reasons to use Hadoop and NoSQL data stores such as HBase and Cassandra. Hadoop has rapidly become the dominant distributed data platform, much as Linux quickly came to dominate the Unix operating system market. With that platform comes a rich ecosystem of applications for building data products, whether for the growing SQL-on-Hadoop movement or for real-time data access with HBase.

At the latest Hadoop SF meetup, held at Bloomberg’s office, two presenters discussed how Bloomberg is taking advantage of this converged platform to power their data products. Bloomberg is the leading provider of securities data to financial companies, but they describe their data problem as “medium data”: their data volumes aren’t especially large, but they have strict requirements on how quickly they must deliver that data to their users. Thousands of Bloomberg developers work on all aspects of these data products, especially custom low-latency data delivery systems.

When Bloomberg explored using HBase as the backend of their portfolio pricing lookup tool, they faced quite a challenge: supporting an average query that looks up 10M+ cells in around 60 ms. Initial efforts with HBase were promising but not quite fast enough. Through several rounds of optimization, including parallel client queries, scheduled garbage collection, and even further enhancements to high availability to minimize the impact of a failed server (HBASE-10070), they hit their targets, which allowed the move from custom data server products to HBase.
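
The post doesn’t describe Bloomberg’s client code, so the following is only a minimal sketch of what “parallel client queries” can look like, assuming the common fan-out pattern: group the requested row keys by the region (a contiguous row-key range) that serves them, query each region concurrently, and merge the partial results. `FakeRegion`, `route`, and `parallel_lookup` are hypothetical names, and the in-memory dictionaries stand in for a real HBase table; this is not the HBase client API. For a sense of scale, 10 million cells in 60 ms works out to roughly 167 million cells per second, which is why spreading one query across many servers at once matters.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


class FakeRegion:
    """Hypothetical stand-in for one HBase region server's key range."""

    def __init__(self, cells):
        self.cells = cells  # row key -> value, all within this region's range

    def multi_get(self, keys):
        # One simulated round trip per region: return only keys that exist here.
        return {k: self.cells[k] for k in keys if k in self.cells}


def route(keys, split_points):
    """Group keys by region index, given the sorted start keys of regions 1..n."""
    groups = {}
    for key in keys:
        # Keys below split_points[0] go to region 0, and so on.
        idx = bisect_right(split_points, key)
        groups.setdefault(idx, []).append(key)
    return groups


def parallel_lookup(keys, regions, split_points, max_workers=8):
    """Fan a large lookup out to all involved regions concurrently, then merge."""
    groups = route(keys, split_points)
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(regions[idx].multi_get, batch)
                   for idx, batch in groups.items()]
        for fut in futures:
            results.update(fut.result())  # merge partial results from each region
    return results


if __name__ == "__main__":
    split_points = ["h", "p"]  # region 0: < "h", region 1: "h".."p", region 2: >= "p"
    regions = [
        FakeRegion({"apple": 1, "fig": 2}),
        FakeRegion({"kiwi": 3, "lime": 4}),
        FakeRegion({"pear": 5, "plum": 6}),
    ]
    print(parallel_lookup(["apple", "kiwi", "plum", "grape"], regions, split_points))
    # {'apple': 1, 'kiwi': 3, 'plum': 6}  ("grape" routes to region 0 but is absent)
```

Grouping by region keeps the sketch to one request per region rather than one per key, and the thread pool overlaps those round trips. A production client would also need timeouts, retries, and handling for regions that split or move, none of which this sketch attempts.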

With the move to Hadoop, Bloomberg also needed better cluster management capabilities. Open-source tools already dominate this space. Bloomberg uses a combination of Apache Bigtop for Hadoop, Chef for configuration management, and Zabbix for monitoring, but many other good tools exist (personally, I’m most fond of Ansible, Monit, and the proprietary Cloudera Manager). Combining the Hadoop platform’s strengths for developing and running large-scale data products with more efficient provisioning and operational models gives Bloomberg exactly what they need. It’s a model that will play out repeatedly at many organizations in the coming years as Hadoop proves its capabilities as a modern data platform.

Mark Kidwell is a Principal Big Data Consultant at DesignMind. He specializes in Hadoop, data warehousing, and technical and project leadership.

 

© 2021 DesignMind. All rights reserved.