Real-Time Machine Learning at Headspace

Introduction

  • Current session bounce rates for sleep content
  • Semantic embeddings for recent user search terms (if a user recently searched for “preparing for big exam”, the ML model can assign more weight to Focus-themed meditations; see the sketch after this list)
  • Users’ biometric data (e.g., if step counts and heart rate have been increasing over the last 10 minutes, we can recommend Move or Exercise content)
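To make the search-term signal above concrete, here is a minimal sketch of mapping a recent search query to the closest content theme. It assumes a generic, publicly available sentence-embedding model (all-MiniLM-L6-v2 from the sentence-transformers library) and a toy theme list; these are illustrative assumptions, not Headspace’s actual embedding model or content taxonomy.

# Illustrative only: map a recent search query to the closest content theme.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed generic embedding model

content_themes = ["Sleep", "Focus", "Move", "Stress"]  # toy theme list
theme_embeddings = model.encode(content_themes, convert_to_tensor=True)

query_embedding = model.encode("preparing for big exam", convert_to_tensor=True)

# Cosine similarity between the search query and each theme; the highest-scoring
# theme can be given extra weight by the downstream ranking model.
scores = util.cos_sim(query_embedding, theme_embeddings)[0]
boosted_theme = content_themes[int(scores.argmax())]
print(boosted_theme)  # expected: "Focus"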

Real-Time Inference Requirements

  • Ingest, process, and forward along relevant events (actions) that our users perform on our client apps (iOS, Android, web).
  • Quickly compute, store, and fetch online features (millisecond latency) that enrich the feature set used by a real-time inference model (see the feature-lookup sketch after this list).
  • Serve and reload the real-time inference model in a way that synchronizes the served model with online feature stores while minimizing (and ideally avoiding) any downtime.
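A minimal sketch of the second requirement, assuming a DynamoDB table keyed by user_id serves as the online feature store (an assumption for illustration; the table name is hypothetical, and any millisecond-latency key-value store would fit):

# Sketch of a millisecond-latency online feature lookup.
import boto3

dynamodb = boto3.resource("dynamodb")
online_features = dynamodb.Table("online_user_features")  # hypothetical table name

def fetch_online_features(user_id: str) -> dict:
    # Single-key reads from DynamoDB typically return in single-digit milliseconds.
    response = online_features.get_item(Key={"user_id": user_id})
    # Fall back to an empty dict if the user has no recent online features.
    return response.get("Item", {})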

Challenges Adapting the Traditional ML Model Workflow

  1. Pull relevant raw data from upstream data stores: For Headspace, this involves using Spark SQL to query from the upstream data lake maintained by our Data Engineering team.
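As a sketch of this step (table and column names are hypothetical, not the actual schema of the Headspace data lake):

# Pull raw events from the upstream data lake with Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("offline-feature-pull").getOrCreate()

# Hypothetical table/columns; the real query targets the Data Engineering team's tables.
raw_events = spark.sql("""
    SELECT user_id, event_type, content_id, event_timestamp
    FROM data_lake.user_events
    WHERE event_date >= date_sub(current_date(), 30)
""")
raw_events.show(5)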

High Level Architecture Overview

The Publishing and Serving Layer: Model Training and Lifecycle Deployments

Updating and Reloading Real-Time Models

Blue-Green Architecture

  • Maintain two parallel pieces of infrastructure (two copies of feature stores, in this case).
  • Designate one as the production environment (let’s call it the “green” environment, to start) and route requests for features and predictions towards it via our Lambda.
  • Every time we wish to update our model, we use a batch process/script to update the complementary infrastructure (the “blue” environment) with the latest features. Once this update is complete, switch the Lambda to point towards the blue environment for features/predictions (see the sketch after this list).
  • Repeat this each time we want to update the model (and its corresponding feature store).
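A minimal sketch of the switch, assuming the active environment name is stored in an SSM parameter that the Lambda reads on each request (an assumption for illustration; an environment variable or a small config table would work equally well, and all names below are hypothetical):

import boto3

ssm = boto3.client("ssm")

FEATURE_TABLES = {
    "green": "online_features_green",  # hypothetical feature-store names
    "blue": "online_features_blue",
}

def active_feature_table() -> str:
    # The Lambda resolves the currently live environment before fetching features.
    param = ssm.get_parameter(Name="/ml/realtime/active_environment")  # hypothetical parameter
    return FEATURE_TABLES[param["Parameter"]["Value"]]

def promote(new_env: str) -> None:
    # After the batch refresh of new_env completes, route feature/prediction traffic to it.
    ssm.put_parameter(
        Name="/ml/realtime/active_environment",
        Value=new_env,
        Type="String",
        Overwrite=True,
    )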

The Receiver Layer: Event Stream Ingestion with Apache Spark Structured Streaming Scheduled Job

  • It leverages the same unified language (Python/Scala) and framework (Spark) shared by data scientists, data engineers, and analytics team members, allowing multiple Headspace teams to reason about user data using the familiar Dataset / DataFrame APIs and abstractions.
  • It allows our teams to implement custom micro-batching logic to meet business requirements (for example, we could trigger and define micro-batches based on custom event-time windows and per-user session watermark logic); a sketch of such a streaming job follows this list.
  • It comes with existing Databricks infrastructure tools (Scheduled Jobs, automatic retries, efficient DBU credit pricing, email notifications for process failure events, built-in Spark Streaming dashboards, ability to quickly auto-scale to meet spikes in user app event activity) that significantly reduce the infrastructure administration burden on ML engineers.
  • 1 Maximum Concurrent Runs
  • Unlimited retries
  • New Scheduled Job clusters (as opposed to an All-Purpose cluster)
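A compact sketch of such a Structured Streaming job is below. The event source, topic, schema, queue URL, and checkpoint path are all hypothetical stand-ins; the real job reads Headspace’s client-app event stream and applies richer sessionization logic.

import json
import boto3
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("event-stream-receiver").getOrCreate()

# Hypothetical event schema.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_timestamp", TimestampType()),
])

events = (
    spark.readStream.format("kafka")                   # assumed source for illustration
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "client-app-events")          # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withWatermark("event_timestamp", "10 minutes")    # tolerate late-arriving events
)

def forward_to_prediction_queue(batch_df, batch_id):
    # Send each micro-batch of relevant events to the downstream prediction SQS queue.
    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/prediction-queue"  # hypothetical
    for row in batch_df.toLocalIterator():
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(row.asDict(), default=str))

query = (
    events.writeStream
    .foreachBatch(forward_to_prediction_queue)
    .trigger(processingTime="30 seconds")  # micro-batch cadence
    .option("checkpointLocation", "/mnt/checkpoints/event-receiver")  # hypothetical path
    .start()
)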

The Orchestration Layer: Decoupled Event Queues and Lambdas as Feature Transformers

  • Unpacking the message and fetching the corresponding online and offline features for the user for whom we want to make a recommendation (Step 5 in the diagram above; a sketch of the full handler follows this list). For instance, we may augment the event’s temporal/session-based features with user tenure level, gender, locale, etc.
  • Performing any final pre-processing business logic (for instance, mapping our Headspace user IDs to user sequence IDs to fit the interface expected by certain collaborative filtering models that we deploy).
  • Selecting the appropriate served SageMaker model and invoking it with the input features (Step 6 in the diagram above).
  • Forwarding the recommendation to its downstream destination (Step 7 in the diagram above). The destination depends on whether users pull down content recommendations or we push recommendations out to them:
  • Pull: Persisting the final recommended content to our internal Prediction Service, which ultimately supplies users with their updated personalized content for many of the Headspace app’s tabs upon client app request. One example experiment used this real-time inference infrastructure to let users fetch personalized recommendations from the Headspace app’s Today tab.
  • Push: Placing the recommendation onto another SQS push notifications queue, which results in a push notification or in-app modal content recommendation being pushed to the user. Examples include an in-app modal recommendation triggered by a user’s recent search for sleep content and an iOS push notification following a recent content completion.
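Putting the steps together, here is a minimal sketch of such a Lambda handler. The endpoint, table, and queue names are hypothetical, and only the push path is shown for brevity.

import json
import boto3

dynamodb = boto3.resource("dynamodb")
sagemaker_runtime = boto3.client("sagemaker-runtime")
sqs = boto3.client("sqs")

ONLINE_FEATURES_TABLE = dynamodb.Table("online_user_features")  # hypothetical
SAGEMAKER_ENDPOINT = "realtime-recommender"                     # hypothetical
PUSH_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/push-notifications"  # hypothetical

def handler(event, context):
    for record in event["Records"]:  # SQS batch of prediction requests
        message = json.loads(record["body"])
        user_id = message["user_id"]

        # Step 5: enrich the event with the user's online features.
        online = ONLINE_FEATURES_TABLE.get_item(Key={"user_id": user_id}).get("Item", {})
        features = {**message, **online}

        # Step 6: invoke the served SageMaker model with the assembled features.
        response = sagemaker_runtime.invoke_endpoint(
            EndpointName=SAGEMAKER_ENDPOINT,
            ContentType="application/json",
            Body=json.dumps(features, default=str),
        )
        recommendation = json.loads(response["Body"].read())

        # Step 7: forward the recommendation downstream (push path shown).
        sqs.send_message(
            QueueUrl=PUSH_QUEUE_URL,
            MessageBody=json.dumps({"user_id": user_id, "recommendation": recommendation}),
        )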

Summary

  • What format should the message payload that our Structured Streaming jobs send to the prediction SQS queue use?
  • What are the model signature and HTTP POST payload that each SageMaker model expects?
  • How do we synchronize the model image and the online feature stores so that we can safely and reliably update retrained models once in production?
