Edge authentication and rule engine: Netflix engineers should know about Daml

I recently came across two posts on the Netflix engineering blog showing that they are putting a lot of effort into challenges that Daml is very well suited to. No doubt they tackle them well; still, it could be more effective for them to use a platform that solves, in a uniform and elegant way, problems that are currently treated separately.

Or, if not Netflix, then their competitors should know about Daml, so that they can catch up…

Edge authentication

Edge Authentication and Token-Agnostic Identity Propagation

The old identity platform was token-based, and the tokens were propagated deep into their systems.

As they write: "There are several protocols and tokens in use across the Netflix streaming product. These tokens were consumed by, and potentially mutated by, several systems within the Netflix streaming ecosystem. To complicate things further, there were multiple methods for transmitting these tokens, or the data contained therein, from system to system." And so on.

The solution they came up with was edge authentication: tokens are validated at the edge, and they don’t travel deep into their systems.

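Just like in a Daml ledger, where the access token is checked when a command is submitted through the Ledger API, and from then on the ledger deals only with parties, not tokens.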
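To make the pattern concrete, here is a minimal Python sketch of edge authentication under simplified assumptions: a toy token format (`<member_id>.<HMAC-SHA256 signature>`), a hypothetical shared secret, and hypothetical names (`Identity`, `authenticate_at_edge`, `billing_service`). It is not Netflix's implementation; it only shows the shape of the idea: the token is validated once at the edge, and internal services receive a token-agnostic identity object instead.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical token-agnostic identity handed to internal services."""
    member_id: str
    authenticated: bool


SECRET = b"edge-demo-secret"  # hypothetical signing key, for illustration only


def sign_token(member_id: str) -> str:
    """Issue a toy token of the form '<member_id>.<hex HMAC-SHA256>'."""
    sig = hmac.new(SECRET, member_id.encode(), hashlib.sha256).hexdigest()
    return f"{member_id}.{sig}"


def authenticate_at_edge(token: str) -> Identity:
    """Validate the external token once, at the edge; the token goes no further."""
    member_id, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, member_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison of the presented and expected signatures.
    if member_id and hmac.compare_digest(sig.encode(), expected.encode()):
        return Identity(member_id=member_id, authenticated=True)
    return Identity(member_id="anonymous", authenticated=False)


def billing_service(identity: Identity) -> str:
    """A hypothetical internal service: it sees only the Identity, never the token."""
    if not identity.authenticated:
        return "denied"
    return f"billing for {identity.member_id}"
```

Because `billing_service` never handles the raw token, changing or adding token protocols only touches the edge, not every downstream system.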

Rule engine

Building a Rule-Based Platform to Manage Netflix Membership SKUs at Scale

Their summary of the challenges:

Membership Engineering at Netflix is responsible for the plan and pricing configurations for every market worldwide. Our team is also the primary source of truth for various offers and promotions. Internally, we use the term SKU (Stock Keeping Unit) to represent these entities. The original SKU catalog is a logic-heavy client library packaged with complex metadata configuration files and consumed by various services. However, with our rapid product innovation speed, the whole approach experienced significant challenges:

  • Business Complexity: The existing SKU management solution was designed years ago when the engagement rules were simple — three plans and one offer homogeneously applied to all regions. As the business expanded globally, the complexity around pricing, plans, and offers increased exponentially.
  • Operational Efficiency: The majority of the changes require metadata configuration files and library code changes, usually taking days of testing and service release to adopt the updates.
  • Reliability: It is exceptionally challenging to effectively gauge the impact of metadata changes in the current form. With 50+ services consuming the SKU catalog library, a small change could inadvertently result in a significant outage with a global blast radius. Additionally, the business implications for pricing-related errors are enormous.
  • Maintainability: With the increase in ongoing experimentation around SKUs, the configuration files have exploded exponentially. Besides, the mixed-use of the metadata files and business logic code adds another layer of maintenance complexity.

The core principles they required of the new solution:

  • Ownership Clarity: Membership Engineering team owns the SKU catalog data and provides a platform for stakeholders to configure SKUs based on their needs.
  • Self Service: SKU changes need to be flexibly configurable, validated comprehensively, and released rapidly. In comparison, the API interface for consumer services should be consistent and static regardless of the business requirement iteration.
  • Auditability: SKU changes workflow would require engineers’ review and approval. Bad changes can quickly revert to mitigate issues and provide history for auditing.
  • Observability: SKU resolution insight is critical and helpful for engineers to diagnose what went wrong in the change lifecycle.

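This also sounds like a good use case for a Daml-based contract package: clear data ownership, reviewed and approved changes, and an auditable history map naturally onto signatories, choices, and the ledger’s transaction history.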
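The Auditability principle (review and approval, quick reverts, a history of changes) resembles Daml's propose-accept pattern, where a proposal only takes effect once the controlling party exercises an accept choice, and the ledger retains every transaction. Below is a minimal Python sketch of that workflow, not Daml code, using hypothetical names (`Proposal`, `SkuCatalog`) and made-up SKU and price values. It assumes a single rule: the author of a change cannot approve it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposal:
    """A hypothetical proposed SKU change awaiting review."""
    sku: str
    new_price_cents: int
    author: str


@dataclass
class SkuCatalog:
    """Toy catalog: a change lands only after a different engineer approves it."""
    prices: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # append-only audit log
    pending: list = field(default_factory=list)

    def propose(self, sku: str, new_price_cents: int, author: str) -> Proposal:
        proposal = Proposal(sku, new_price_cents, author)
        self.pending.append(proposal)
        return proposal

    def approve(self, proposal: Proposal, reviewer: str) -> None:
        # Four-eyes rule (an assumption): authors cannot approve their own change.
        if reviewer == proposal.author:
            raise PermissionError("author cannot approve own change")
        self.pending.remove(proposal)
        old = self.prices.get(proposal.sku)
        self.prices[proposal.sku] = proposal.new_price_cents
        self.history.append(
            (proposal.sku, old, proposal.new_price_cents, proposal.author, reviewer)
        )

    def revert_last(self, reviewer: str) -> None:
        """Undo the latest change; the revert is recorded, never erased."""
        sku, old, new, _, _ = self.history[-1]
        if old is None:
            del self.prices[sku]
        else:
            self.prices[sku] = old
        self.history.append((sku, new, old, "revert", reviewer))
```

In Daml the same roles would be expressed by which party may exercise which choice, and the history would come from the ledger itself rather than from an explicit list.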

The approach they chose:

After evaluating multiple open-source and commercial rule evaluation frameworks, we chose our internal Rules Management and Evaluation Framework — Hendrix. Hendrix is a simple interpreted language that expresses how configuration values should be computed. These expressions (rules) are evaluated in the current request session context and can access data such as A/B test assignments, necessary member information, customized input, etc. We’ll skip over Hendrix’s specific details and focus on the SKU platform adoption in this article for brevity.
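Since the post skips Hendrix's details, the following is not Hendrix. It is a minimal Python sketch of the general idea, under stated assumptions: rules are an ordered list of hypothetical `Rule` entries, each a predicate over the request context (here an A/B test cell and a country) plus the SKU it yields. The first matching rule wins, and the names of the rules checked are returned as a trace, in the spirit of the Observability principle.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Sequence, Tuple

# Request-session context, e.g. A/B test assignments and member info (assumed keys).
Context = Mapping[str, object]


@dataclass(frozen=True)
class Rule:
    """A hypothetical rule: a name, a predicate over the context, and the SKU it yields."""
    name: str
    when: Callable[[Context], bool]
    sku: str


def resolve_sku(rules: Sequence[Rule], ctx: Context, default: str) -> Tuple[str, List[str]]:
    """Return the SKU of the first matching rule and the names of the rules checked."""
    trace: List[str] = []
    for rule in rules:
        trace.append(rule.name)
        if rule.when(ctx):
            return rule.sku, trace
    return default, trace


# Illustrative rules only; cell numbers, countries and SKU names are made up.
RULES = [
    Rule("promo-test-cell-2", lambda c: c.get("ab_cell") == 2, "standard-promo"),
    Rule("country-in", lambda c: c.get("country") == "IN", "mobile"),
]
```

For example, `resolve_sku(RULES, {"ab_cell": 2, "country": "US"}, "standard")` returns `("standard-promo", ["promo-test-cell-2"])`. For brevity the predicates are Python lambdas; in an interpreted rule language they would be data loaded at runtime, which is what lets rule changes ship without releasing a new client library.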