July 2015 San Francisco – Silicon Valley SQL Server and BI Meetings

Our Silicon Valley SQL Server User Group is on a roll!  If learning more about SQL Server and BI is on your bucket list, you’ll find some terrific speakers at the Bay Area user group meetings this month.

On July 2, 2015 in Mountain View, we’ll welcome well-known BI expert and author James Taylor, who specializes in Decision Management. His talk is titled “A New Approach to Defining BI Requirements”. You’ll learn about decision modeling with DMN, a new standards-based approach to modeling decisions. A proper approach to specifying decision requirements can maximize your ability to take advantage of advanced analytics.

Are you considering a career shift to the world of Big Data or Data Science? On July 8th, we have “Big Data & Data Science, How To Get Past The Hype and Into a Team”. Big Data and Data Science are two of the most hyped areas in high tech today. But what do they really mean, and how can you move into these hot career paths?  Andrew Eichenbaum gave this talk at SQLSaturday to a standing-room-only crowd.


And lastly, on July 21st at the Silicon Valley SQL Server User Group, SQL Server guru Ami Levin will deliver Part 2 of his eternally popular topic, “Relational Databases: Where Are My Primary Keys?”.  Ami will revisit some of the fundamental design principles of relational databases: normalization rules, key selection, and the controversies associated with these issues, all from a practical perspective.




DesignMind Showcases Food And Beverage Data Services at FoodBytes

DesignMind joined the excitement at FoodBytes in San Francisco on June 25th to showcase our data analytics and data science work in the food and beverage field. At FoodBytes, entrepreneurs gathered to show off how they want to change the food industry.  FoodBytes 2.0 was a world-class, half-day event dedicated to helping investors in the food industry meet new companies that are disrupting and innovating in food-related software and applications, distribution, and manufacturing.

SF New Tech founder Myles Weissleder is known for his company, which runs events that work like speed dating for startups.  At FoodBytes, the speakers shared that famous five-minute timer.

The demand for food and beverage data analytics is growing rapidly.  With the right tools and expertise, it’s now cost-effective for companies to gather their own data, analyze it, and develop speedy insights.  These insights may drive product development, operations, marketing, or customer service. The challenge facing most companies is developing the internal competencies to analyze the data effectively and develop actionable insights across an entire organization.

Andrew Eichenbaum, DesignMind’s Chief Data Scientist, had his five minutes of fame early on in the day. You can catch Andrew in this ABC7 News clip.

DesignMind To Sponsor FoodBytes 2.0 Summit on June 25th

DesignMind has signed on as a technology sponsor of the FoodBytes 2.0 Summit, a unique half-day conference about new ideas in food, technology, and capital. The event is on June 25th in San Francisco at the Bluxome Winery.  FoodBytes is hosted by SF New Tech and Rabobank, and is designed to bring together early-stage food and agriculture companies with funders and the general public.  DesignMind is a technology partner to Columbus Foods, Jamba Juice, and Kendall-Jackson wines, and has specialized expertise in data science in the food industry.  Andrew Eichenbaum, who heads DesignMind’s Data Science group, will represent DesignMind at the event.

Ten companies will demonstrate their business models, ranging from desktop farms to homebrew beer delivered to your door. There will also be discussions with industry leaders around the state of food-based funding and the future of ag tech. The FoodBytes 2.0 Summit is dedicated to helping investors in the food industry meet new companies that are disrupting and innovating in food-related software and applications, distribution, manufacturing, production, and more.

FoodBytes attendees – investors, technologists, media, and others – will view live demonstrations from innovators who are disrupting the food industry, meet food industry leaders, and network. The event will feature live demos from pioneers in food and agriculture, and a fireside chat on the implications of technology on the future of food.

Tickets can be purchased on the FoodBytes 2.0 Summit website.  You can get a 20% discount using the code “DesignMind20”, or just use this direct link.  See you there!

June 2015 Bay Area User Group Meetings

You’ll find some terrific speakers at the Bay Area user group meetings in June.

In “Zero To Dashboard In 60 Minutes” on June 4th in San Francisco, you’ll learn all about Power BI and the incredible dashboards you can create. Josh Vickery gave this presentation to the Bay Area Microsoft Business Intelligence User Group last month in Mountain View, and it was fantastic.


On June 10th at the San Francisco SQL Server User Group, we have “The Analytics of Growth Hacking” with Jack Mardack of the hot mobile data-sharing app Chartcube.

And lastly, SQL Server guru Ami Levin, who recently joined energy technology company Tachyus, will speak on June 16th about the fundamentally important topic “Relational Databases: Where Are My Primary Keys?”  This meeting of the Silicon Valley SQL Server User Group will be at the Microsoft campus in Mountain View.

Mark Ginnebaugh is CEO of DesignMind.


Power BI: 5 Things You Need to Know Now

Power BI has been out for a while. It’s a reporting solution that Microsoft developed to enable analysts, BI developers, and power users to create ad hoc analysis in the most popular BI tool in the world: Excel. Recently, however, Microsoft has been putting serious focus on Power BI, making it more accessible, easier to use, and visually more powerful than any analytics tool they’ve ever built before.

So what is Power BI? What do you need to buy and how can you make it work for you? Here are five things you need to know so you can embark down the path of data discovery.  This latest version of Power BI is an incredibly useful toolset that helps you ingest data, prepare it, analyze it, and present it effectively to your team or customers.

1. Power BI is available for free with Office 2013 Pro Plus or a business license for Office 365.
To have access to all the Power BI tools in Excel, including Power View, you will need at least the Office Pro Plus version. Excel 2010 users can download Power Pivot for free, but will not have access to the rest of the Power BI tools. It’s also worth mentioning that if your company has a business license for Office 365, you not only have the right version of Excel 2013 to run all the Power BI tools, but you also get to install Office 2013 Pro Plus on up to five machines. Christmas may have come early for some of you just after I said that.

2. Don’t have Office? You can still get Power BI Public Preview for free.
The latest Power BI service works without Office, Excel, or Office 365. It includes a free Power BI Dashboard that allows you to import data from databases, web apps, or flat files and then create visually compelling dashboards full of interactivity. You can download it here for free.

3. Power BI for Office 365 can refresh data from Azure or your on-premises servers.
Power BI can refresh data from Azure without any additional setup. For customers with on-premises SQL Server, Oracle, or other databases, you can set up a Data Management Gateway service on your server to automate data refreshes from your OLTP systems. Whether you use Power Pivot or Power Query to develop your Power BI creation, you can have a hybrid environment that’s secure, backed by encryption, and easy to use.

4. You can create impressive dashboards and visualizations with Power BI Designer.
The days of setting up drivers and writing custom code to ingest data from popular SaaS applications are past.  The new Power BI Designer can bring in data from Salesforce, Google Analytics, Marketo, and more, and visualize that data instantly.  It has easy connectors to pull data, do mashups, and create impressive dashboards in minutes.

5. Don’t have Office 365 or the right version of SharePoint? No problem. We have Power Update.
Now with Power Update, you don’t need to worry about refreshing your workbooks. This wonderfully simple utility can refresh your Power BI workbooks without a Power BI tenant in the cloud, or without putting your data in the cloud at all. You can refresh your workbooks right on a file server for your viewers to see.

Angel Abundez is VP, Business Intelligence at DesignMind. He specializes in Microsoft SQL Server BI tools, SharePoint and ASP.NET.  Angel heads DesignMind’s Business Intelligence group.

Big Data, BI, and SQL Server at SQLSaturday Silicon Valley

Are you a business analytics or SQL Server professional in the Bay Area? If you’re trying to keep up with the cutting edge developments in the data world, or just hone your skills, join us at SQLSaturday Silicon Valley on March 28, 2015.

It’s a free, all-day event at the Microsoft Technology Center in Mountain View featuring 30+ in-depth sessions with top SQL Server experts.  You’ll learn valuable skills and best practices for analyzing, visualizing, and reporting on data, as well as unlocking the power and promise of Big Data.

Top experts will cover Power BI, DBA topics, data visualization, DocumentDB, Excel, new developments in Flash Storage, and Power Pivot. They’ll share how to get the most from your business data and make better data-driven decisions.

Here are just a few of the sessions you can attend:

  • Intro to Time Series Forecasting – Peter Myers
  • A Practical Guide To Using Charts & Graphs – Dan Bulos
  • Storage For the DBA – Denny Cherry
  • Automating Power BI Creations – Angel Abundez
  • Becoming a Top DBA – Automation in SQL Server – Joseph D’Antoni
  • Roles on a Big Data Team – Andrew Eichenbaum
  • Intro to Azure DocumentDB – Ike Ellis
  • Automating Your Database Deployments – Grant Fritchey
  • Flash & SQL Server – Re-Thinking Best Practices – Jimmy May
  • Common SQL Server Mistakes and How to Avoid Them – Tim Radney

You can see all the sessions here.  Come join us on the 28th – and tell your SQL Server and BI colleagues about this incredible day of learning and networking.

Big Data: How To Do It Right

If you’re beginning your first foray into analyzing your organization’s Big Data, you need to spend some time thinking about the big picture. The payoffs of effectively analyzing your data can be enormous, but you need to plan carefully in order to achieve the optimal outcome.

Here’s a checklist of questions to consider:

  • What data do you have, and what can you obtain?
  • What problems do you have that might be solvable, given sufficient understanding of your data in a perfect-world scenario (without yet trying to determine what is technically possible with today’s tools)?
  • Do you have the resources and executive buy-in to pursue high ROI opportunities uncovered by your initiative?
  • Have you evaluated the various Hadoop vendors, such as Cloudera, MapR, Qubole, or Hortonworks, to see what sets each apart from its competitors?
  • Do you have the internal resources to oversee your big data project and continue to reap the benefits going forward? Read our whitepaper on Building Your Big Data Team to see what kind of manpower you’ll need.

Big Data is a popular buzzword these days, but it really is important: companies that know how to extract valuable information from their data have a huge competitive advantage. However, without proper planning, a Big Data project can become a money pit. You should definitely do your homework first!

Hiring a Data Scientist: Interviewing Basics

With a set of candidate resumes in hand, you now have the task of the interview…

Let me start with a personal view: the hiring system is completely screwed up. A candidate is judged fit or unfit for a position with only about one day of interaction. These interactions are split up amongst a handful of people, so each person has about an hour to say whether they want to spend more waking time with this person than they do with their own family.

First, you must agree on the interviewing basics.  Here’s the basic advice for all tech hiring that I gave my team here at DesignMind in San Francisco:

1. You should have a mid-size group of people interviewing the candidate. Five to eight is a good range of formal interviewers. There can be more if the candidate goes out to lunch with a group, or if you do some pair interviewing. But after six to eight interview sessions, almost any candidate will burn out.

2. A range of people need to interview the candidate. Having people who do similar work is definitely necessary, but people outside of the core group should also be on the interviewing team. Knowing whether a candidate can talk to people of different backgrounds is a requirement if this person will be on cross-functional teams. Also, knowing how a candidate will interact with a perceived “subordinate” gives great insight into how that person works inside an organization.

3. Interview feedback should happen in an interview-team pow-wow within a day of the interview. The first round of responses should be “yes”, “no”, or “maybe”, where:

  •   No means no
  •   Maybe means no
  •   Yes means maybe

There can be mitigating circumstances where a maybe can be turned into a yes. But if not, you need to pass on the candidate.
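The first-round voting rule above can be boiled down to a tiny function. This is just an illustrative sketch of the rule, not something from the original process:

```python
def initial_verdict(votes):
    """Apply the first-round rule: no means no, maybe means no,
    and even a unanimous yes only means maybe, pending reference
    checks and any mitigating-circumstance discussion."""
    if all(v == "yes" for v in votes):
        return "maybe"  # proceed to reference checks
    return "no"         # pass on the candidate, absent mitigation
```

For example, `initial_verdict(["yes", "maybe", "yes"])` returns `"no"`: a single "maybe" sinks the candidate unless the team upgrades it.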

4. If the hiring team gives the candidate a yes, it is time to check references. If there are any dubious responses, you need to dig into them and find the reason. Often, finding someone in your extended network who has worked with this person is a great way to get an honest, unbiased answer.

5. Last, and most important, the candidate is interviewing you and your company during the interview process. Don’t forget to sell your company and its people during the interview.

Andrew Eichenbaum is VP, Data Science Solutions at DesignMind. He specializes in data mining, data modeling, and artificial intelligence. Andrew heads DesignMind’s Data Science division.

Hiring a Data Science Team: Hiring Your First Data Scientist

In the first blog post, Hiring a Data Science Team: Types of Data Scientists, we discussed the four major categories of Data Scientists. Now you’ve decided that you need your first data scientist, and you need to decide whom to hire.

Most groups who are hiring their first Data Scientist are in one of two situations:

1. You have data and you have questions, but you do not know how to answer those questions given your data.
2. You need help understanding your business in a data-centric format.

If you’re in group one, I suggest looking for an Algorithms Expert or Statistician. They’re very good at finding known or novel ways of answering the questions you have at your fingertips.

However, if you’re in group two, a bit more thought needs to go into your selection. Let’s start with three simple yes/no questions:

1. Do you have a good handle on your data flows, storage, and reporting?
2. Do you fully understand the data you’re currently acquiring and have stored in your databases?
3. Do you have a set of well-defined questions you want answered?

If you answered “no” to #1, you need a Data Wrangler or a Data Miner/Algorithms Expert with significant data management experience. This answer overrides both other answers, because if you don’t have a good handle on the raw data, you won’t have a handle on anything downstream.

If you answered yes to #1 and no to #2, you need a Data Miner. The first job of your new Data Scientist will be to come in and validate all of your raw data and base assumptions.

If you answered yes to #1 and #2, but no to #3, you need an experienced Algorithms Expert or Data Miner. Your Data Scientist is there to help you define your data driven path. Their first job is to understand the current status of your analytics systems and make suggestions on where and how to make improvements to the current systems.

Finally, if you answered no to #1 and yes to #2, you are lying to yourself. When you don’t have a good understanding of what data is coming into your system and how it gets there, you can never be sure of the quality of your results.
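Taken together, the three answers map to a suggested first hire. Here is the decision logic as a small Python sketch, purely for illustration, using the role names from the previous post:

```python
def first_hire(flows_ok: bool, data_understood: bool, questions_defined: bool) -> str:
    """Map the three yes/no questions to a suggested first Data Science hire."""
    if not flows_ok:
        if data_understood:
            # No to #1 but yes to #2 is inconsistent: you can't trust your
            # stored data without understanding how it arrives.
            return "inconsistent answers: revisit #2"
        # The answer to #1 overrides everything downstream.
        return "Data Wrangler (or Data Miner/Algorithms Expert with data management experience)"
    if not data_understood:
        return "Data Miner"
    if not questions_defined:
        return "Algorithms Expert or Data Miner"
    return "Algorithms Expert or Statistician"
```

Note that answering yes to all three questions puts you back in "group one" from the start of the post, where an Algorithms Expert or Statistician is the right fit.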

In the next installment, we’ll talk about the interview process for hiring a Data Scientist.

Andrew Eichenbaum is VP, Data Science Solutions at DesignMind. He specializes in data mining, data modeling, and artificial intelligence. Andrew heads DesignMind’s Data Science division.

Hiring a Data Science Team: Types of Data Scientists

Hiring a new member of your team is always a daunting task. Now combine that with looking to fill “The Sexiest Job of the Century”, Data Scientist, and you have quite the conundrum on your hands:

  • Who and what is a Data Scientist?
  • How do you hire one?
  • Where do you find them?
  • How do you vet them?

Over the next few weeks we’ll discuss these topics in a series on the DesignMind blog. To start, let’s discuss the four major categories of Data Scientists:

Algorithms Expert: These are the people who will ask you what questions you want answered. They then try to answer these questions by matching the form and format of your available data to a set of Machine Learning or Optimization techniques. Algorithms Experts usually come from a Computer Science, Electrical Engineering, or Mathematics background.

Data Miner: Data Miners are “why” scientists who ask why you’re asking the questions you are asking. They then try to find patterns in the data and build individual or derived performance metrics that help focus the business on its direction and outcomes. Data Miners usually come from a science-based background like Physics, Biology, or Chemistry.

Data Wrangler: Just like a cowboy, a Data Wrangler will manage your data flows and make sure your data is internally consistent. They look at your raw data, ask what you need from it and what you are missing, and then architect and build the systems to accomplish this. Data Wranglers come from a diverse set of backgrounds.

Statistician: This is a mathematician who looks for patterns in your raw data. They are the classic actuary: given a set of possible outcomes, they look for patterns in your data stream that can predict those outcomes. Statisticians usually come from an Applied Math or Statistics background.

It’s interesting to note that most Data Scientists are a blend of more than one category, and that’s a good thing as Data Scientists are required to fill multiple roles.

In the next installment, we’ll talk about hiring your first Data Scientist and how they fit into your team.


Andrew Eichenbaum is Principal Data Science Consultant at DesignMind. He specializes in data mining, data modeling, and artificial intelligence. Andrew heads DesignMind’s Data Science division.

Hadoop Speeds Data Delivery at Bloomberg

Many organizations are adopting Hadoop as their data platform because of two fundamental issues:

  1. They have a lot of data that they need to store, analyze, and make sense of, on the order of tens of terabytes or greater.
  2. The cost of doing the above in Hadoop is significantly less expensive than the alternatives.

But those organizations are finding there are other good reasons for using Hadoop and other NoSQL data stores (HBase, Cassandra). Hadoop has rapidly become the dominant distributed data platform, much as Linux quickly dominated the Unix operating system market. With that platform comes a rich ecosystem of applications for building data products, whether for the growing SQL-on-Hadoop movement or real-time data access with HBase.

At the latest Hadoop SF meetup at Bloomberg’s office, two presenters discussed how Bloomberg was taking advantage of this converged platform to power their data products. Bloomberg is the leading provider of securities data to financial companies, but they describe their data problem as “medium data” – they don’t have as much data to deal with, but they do have strong requirements around how quickly they need to deliver it to their users. They have thousands of developers working on all aspects of these data products, but especially custom low-latency data delivery systems.

When Bloomberg explored the use of HBase as the backend of their portfolio pricing lookup tool, they had quite a challenge: support an average query lookup of 10M+ cells in around 60 ms. Initial efforts to use HBase were promising, but not quite fast enough. Through several iterations of optimization, including parallel client queries, scheduled garbage collection, and even further enhancing high availability to minimize the impact of a failed server (HBASE-10070), they were able to hit their targets and move from custom data server products to HBase.
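One of those optimizations, parallel client queries, splits a large lookup into independent scans issued concurrently, so per-request latencies overlap instead of adding up. Here is a minimal Python sketch of the idea; `fetch_range` is a hypothetical stand-in for a real HBase client call (for example, a happybase table scan), not Bloomberg's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_range(key_range):
    # Hypothetical stand-in for one HBase scan over a key range;
    # a real client would issue something like a row-start/row-stop
    # scan against the table here.
    start, stop = key_range
    return [f"row-{k}" for k in range(start, stop)]

def parallel_lookup(key_ranges, max_workers=8):
    """Issue one scan per key range concurrently and merge the results
    in input order. Overlapping the scans hides per-request latency,
    which is the core of the parallel-client-query optimization."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as key_ranges
        chunks = list(pool.map(fetch_range, key_ranges))
    merged = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged
```

The same pattern applies whatever the client library: partition the key space, fan out the scans, and merge, letting the server-side regions do the work in parallel.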

With the move to Hadoop, Bloomberg also needed better cluster management capabilities. Open-source tools are already dominant in this space: Bloomberg leverages a combination of Apache Bigtop for Hadoop, Chef for configuration management, and Zabbix for monitoring, though many other good tools exist (personally, I’m most fond of Ansible, Monit, and the proprietary Cloudera Manager). Combining the abilities of the Hadoop platform for developing and running large-scale data products with more efficient provisioning and operational models gives Bloomberg exactly what they need. It’s a model that’s going to play out repeatedly in the coming years at many organizations as Hadoop proves its capabilities as a modern data platform.

Mark Kidwell is Principal Big Data Consultant at DesignMind. He specializes in Hadoop, Data Warehousing, and Technical and Project Leadership. 


Online Learning: Next-Gen Education

Technologists, venture capitalists, educators, policy makers, investors, and edtech entrepreneurs gathered in San Francisco on June 24th to discuss the productive use of technology to transform education – from pre-K to life-long learning.  Sponsored by SVForum, law firm Orrick, Herrington & Sutcliffe, and Microsoft, the Next-Gen Education conference brought together some of the most progressive education companies in the world.

Discussions about the K-12 world included experts from Kidaptive, KIPP, Rethink Education, Clever, and EdSurge.  Higher education experts were from NovoEd, Pathbrite, InsideTrack, Learn Capital, Minerva Project, and Udemy.

Using leading edge Big Data and Business Intelligence technologies, these organizations are able to bring online learning to students around the world. Just one example of these groundbreaking companies is NovoEd. Founded in 2012 by Stanford University professor Amin Saberi, NovoEd creates online courses that foster more social interactions between students and teachers. NovoEd’s list of partners now includes Stanford, Princeton, University of Michigan, University of Virginia Darden School of Business, Wharton, and the Carnegie Foundation, among others.

Worldwide, what are the major markets for online classes? The United States leads by a landslide, followed by online learners in India, the United Kingdom, Canada, and China.


Joy Mundy of Kimball Group on Dimensional Modeling


Joy Mundy of the renowned IT consulting firm Kimball Group focuses on Data Warehouse and Business Intelligence solutions.

She spoke at SQLSaturday Silicon Valley on designing dimensional models. Joy also emphasized the importance of consulting with business users during the design process. “Get as close to the business users as possible and make them part of the design team,” she advised. Before joining Kimball Group, Joy worked on Microsoft’s BI Best Practices Team.

“The biggest problem, I think, is that we tend not to talk enough to the business users about what their requirements are, and instead build our designs from a technology perspective rather than from a business perspective. It’s a very old problem; there is nothing new here.

The problem exists mainly because we, the people who are in charge of building these systems, are technical people. There is a lot of technology involved and a lot of moving parts that are difficult to put together, but the underlying problem is a business problem, and it’s out of the normal comfort zone and skill set of the technological people who are building the solutions.”