
DesignMind

Big Data, Business Intelligence, and Data Analytics Consultants

 
January 28, 2016 | Mark Ginnebaugh

Choosing a Data Lake Storage System

Data Lake Storage System: Which One Should You Use?

When you create a Data Lake, one of the most overlooked questions is, “What storage technology should back the lake?” Most companies simply go with whatever tech stack they already know or are being sold. In reality, the Data Lake storage system should be chosen by asking the same questions you would ask when building out any other piece of the system:


1. Does the system cover all requirements and SLAs that are currently known?
2. Can the system be easily expanded if more functionality (or space) is needed?
3. Is the system in line with budgetary and engineering talent constraints?

Once these questions have been reviewed and answered, you can begin selecting a storage technology.

Five widely accepted storage systems are in use for Data Lakes. Each has both pros and cons as the basis for a lake.

Data Storage System Pros and Cons

| Type of System | Pro | Con |
| Hadoop Based System | Easily expandable and cheaper storage | Slower data retrieval times |
| Non-Hadoop Based Storage + Hadoop / non-Hadoop Compute, e.g. S3 + Hive / Spark | Decouples storage and compute, optimized for cloud platforms | More difficult to implement on-prem |
| Massively Parallel Processing System (MPP), e.g. HP Vertica or IBM Netezza | Fast record retrieval and ease of setup | High cost |
| NoSQL System (Cassandra, HBase) | Easily expandable and fast | Tech community less familiar with NoSQL systems |
| SQL Database (SQL Server, Oracle, MySQL) | Well defined technology | Cannot handle large amounts of data without high cost |
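
As a rough illustration of how the pros and cons above might feed into a selection process, here is a minimal Python sketch. It is not part of any DesignMind tooling. The `StorageOption` class, the `con_tag` labels, and the `shortlist` function are all hypothetical names introduced for this example; the only data it uses is the pros and cons listed above. It simply drops any option whose main drawback is a deal-breaker for your project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    pro: str
    con: str
    con_tag: str  # hypothetical short label for the drawback, invented for this sketch


# The five options and their pros/cons, copied from the comparison above.
OPTIONS = [
    StorageOption("Hadoop Based System",
                  "Easily expandable and cheaper storage",
                  "Slower data retrieval times", "slow_retrieval"),
    StorageOption("Non-Hadoop Based Storage + Hadoop / non-Hadoop Compute",
                  "Decouples storage and compute, optimized for cloud platforms",
                  "More difficult to implement on-prem", "hard_on_prem"),
    StorageOption("Massively Parallel Processing System (MPP)",
                  "Fast record retrieval and ease of setup",
                  "High cost", "high_cost"),
    StorageOption("NoSQL System",
                  "Easily expandable and fast",
                  "Tech community less familiar with NoSQL systems", "unfamiliar_tech"),
    StorageOption("SQL Database",
                  "Well defined technology",
                  "Cannot handle large amounts of data without high cost",
                  "costly_at_scale"),
]


def shortlist(options, deal_breakers):
    """Return the options whose listed drawback is not a deal-breaker."""
    return [o for o in options if o.con_tag not in deal_breakers]


if __name__ == "__main__":
    # Example: a team that cannot accept high cost or slow retrieval.
    for option in shortlist(OPTIONS, {"high_cost", "slow_retrieval"}):
        print(f"{option.name}: {option.pro} (watch out for: {option.con})")
```

A filter like this only narrows the field. The three questions on requirements, expandability, and budget and talent constraints still have to be answered for each remaining candidate.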

DesignMind Data Lake Storage System White Paper

At DesignMind, we have developed a proprietary pattern that not only ingests large amounts of data, but also:

  1. Makes data available to users at all levels of the system
  2. Allows data to be accessed in multiple formats
  3. Simplifies schema evolution management

Read more in our white paper, “Data Lake Storage Systems That Work”. Questions? Contact us and we’ll get back to you promptly.
