A/B Experiment with Microsoft Windows Experimentation

Unleashing the Power of Data to Drive B2B Success 

Meet Mary Hu, Principal Data Science Manager at Microsoft, leading data science for Windows A/B Testing. Beyond her analytical expertise, she's also the co-founder of the Women in Data Science community at Microsoft, a platform she and Connie created to empower individuals in the data-driven world. Today, Mary shares her insights on the transformative power of A/B testing in our latest blog post. 

A/B Experimentation and Microsoft Windows

At DesignMind, we understand the evolving needs of businesses seeking data-driven solutions. In this collaborative post with Mary Hu, we delve into the world of A/B experimentation and its transformative role in Microsoft Windows. Together we explore how Microsoft’s Windows Experimentation Team leverages A/B testing to enhance the Windows operating system, and we share real-world case studies that highlight the remarkable impact of data-driven decision-making. 

What is A/B Testing? 

An A/B experiment, also known as split testing, is a methodology that allows businesses to compare two variations of a specific element within their marketing, website, or product. It involves dividing a target audience into two segments and presenting each segment with a different version of the element being tested. By measuring user behavior and engagement, companies can determine the most effective version and optimize their strategies accordingly. 
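To make the comparison step concrete, here is a minimal sketch of how the two segments might be evaluated once the experiment has run. It uses a standard two-proportion z-test on conversion counts; the numbers are illustrative, not from any Microsoft experiment.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B.

    conv_a / conv_b are conversion counts; n_a / n_b are segment sizes.
    Returns the z statistic and a two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via the error function).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Illustrative data: variant B converts 2.6% vs. 2.0% for variant A.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
```

With these illustrative numbers the difference is statistically significant at the usual 5% level, so the team would favor variant B; with a smaller lift or smaller segments, the same test would tell them the data is inconclusive.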


Source: Microsoft Teams Experimentation Team. 

The Essence of A/B Testing in Business: Microsoft Windows Client Side  

The OS differs from apps and online services in many ways that impact the engineering of an experimentation system as well as the execution of real-world experiments. For example, the interconnectedness of components, the criticality of their functionality, the possibility of offline scenarios, the complexity of data collection, and the process of OS updates all affect experimentation.  

For example, consider an experiment designed to improve the OS component that manages the lifetime of Windows processes (e.g. suspending idle applications). The feature has no UI, yet it may deeply impact users’ experiences. Determining the causal impact of changes (e.g. on quality) is critical, but assessments must occur while devices are potentially offline. Furthermore, there needs to be a way to stop the experiment and return devices to the previous behavior faster than shipping code updates. 
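One common way to get that fast revert path is a config-driven kill switch that devices evaluate locally. The sketch below is hypothetical (the flag names and config shape are illustrative, not Windows internals), but it shows the pattern: an operator flips a remote flag, and every device that sees it reverts to the known-good default without waiting for a code update, while offline devices keep their last cached config until they reconnect.

```python
# Known-good default behavior the device falls back to.
DEFAULT_CONFIG = {"suspend_idle_apps": False}

def effective_config(remote_config, cached_config):
    """Decide which behavior a device should run.

    remote_config is the latest flag set fetched from the experimentation
    service, or None when the device is offline; cached_config is the last
    config the device successfully fetched.
    """
    cfg = remote_config if remote_config is not None else cached_config
    if cfg.get("kill_switch", False):
        # Operator halted the experiment: revert to the known-good default
        # immediately, with no code update required.
        return DEFAULT_CONFIG
    return cfg

# An online device sees the kill switch and reverts at once.
cfg = effective_config({"suspend_idle_apps": True, "kill_switch": True}, {})

# An offline device keeps its cached treatment until it reconnects.
offline = effective_config(None, {"suspend_idle_apps": True})
```

The design choice worth noting is that the revert decision runs on the device, so even a partially connected fleet converges back to safe behavior as devices check in.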

Why isn't classic software testing sufficient to validate the quality and impact of a code change in Windows? Here are five major motivations: 

  1. Myriad software, hardware, and scenario combinations: The surface area of an OS can be enormous. Various hardware and software configurations, as well as the scenarios in which they are used (e.g. combinations of apps), can cause unforeseen behaviors such as crashes and hangs. Complete testing of all combinations in-house is impractical; experimentation can help to assess the impact of changes in more contexts during development.
  2. Isolating the impact of one change amidst many concurrent changes: Many features, improvements, and fixes go into a release of an OS. The concurrent nature of these changes creates difficulties in attributing issues to a specific change. This can be particularly problematic when an innovation has the potential to destabilize the system. Experimentation can help engineers isolate the impact of a single change amidst concurrent changes and make necessary adjustments during development. 
  3. Assessing trade-offs: While many features and behaviors of the OS may be deterministic, their impact may not be clear. Many entail trade-offs between multiple desirable attributes (e.g. performance and battery life), which given real-world constraints may not all be achievable simultaneously. Experimentation allows teams to assess the trade-offs and to make informed decisions about their features. 
  4. Limiting the impact of failures: Unlike online services, the impact of failures can be very high in the OS. Online services can revert to known good behavior quickly and effectively; problems in the OS can be highly impactful and difficult to remediate. Experimentation can act as an exposure-control and recall tool, limiting the potential downsides of innovations until the impact of changes is thoroughly understood. 
  5. User-facing OS features: While parts of the OS (e.g. networking and the app platform) are hidden from the user, many other parts are user-facing. From navigation menus, to system alerts, to settings, many OS features are user-facing, and changes in these features can drastically impact user experiences. Consequently, some OS features have as much need for experimentation as user-facing online services or apps. 
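A practical building block behind motivations like these is deterministic device bucketing: each device decides locally, and repeatably, whether it is in control or treatment. The sketch below uses a simple hash-based scheme; the function and parameter names are illustrative, not the actual Windows Experimentation Platform API.

```python
import hashlib

def assign_variant(device_id: str, experiment: str,
                   treatment_pct: float = 50.0) -> str:
    """Deterministically bucket a device into control or treatment.

    Hashing the device ID together with the experiment name yields a
    stable, roughly uniform assignment that needs no server round-trip
    (useful for offline devices) and is independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{device_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 100.0   # a value in [0.00, 99.99]
    return "treatment" if bucket < treatment_pct else "control"
```

Because the assignment is a pure function of the device ID and experiment name, a device lands in the same bucket every time it evaluates the flag, and ramping exposure up or down is just a matter of changing `treatment_pct`.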

A/B Testing is a Game Changer

A/B experiment case studies clearly demonstrate the immense potential of data-driven decision-making for businesses. By employing A/B experiments, B2B companies can optimize various aspects of their operations, such as website design elements, email marketing campaigns, and pricing strategies. These experiments help them gain valuable insights into customer behaviors, preferences, and expectations, enabling them to refine their strategies and achieve superior outcomes. 

At DesignMind, we encourage businesses to embrace the power of A/B experiments. With our expertise and guidance, you can unlock the full potential of your data and revolutionize your decision-making process. Reach out to us today and embark on a transformative journey towards data-driven success. 

Connie Yang is Principal, AI and Data Science at DesignMind. She is an accomplished AI and data science leader with a strong background in data engineering. Learn about DesignMind's AI and Data Science solutions.


P. L. Li et al., "Experimentation in the Operating System: The Windows Experimentation Platform," 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), Montreal, QC, Canada, 2019, pp. 21-30, doi: 10.1109/ICSE-SEIP.2019.00011.