Posts

Showing posts from January, 2020

Uber's Michelangelo vs. Netflix's Metaflow

  Uber's Michelangelo vs. Netflix's Metaflow Michelangelo Pain point Without michelangelo, each team at uber that uses ML (that’s all of them - every interaction with the ride or eats app involves ML) would need to build their own data pipelines, feature stores, training clusters, model storage, etc.  It would take each team copious amounts of time to maintain and improve their systems, and common patterns/best practices would be hard to learn.  In addition, the highest priority use cases (business critical, e.g. rider/driver matching) would themselves need to ensure they have enough compute/storage/engineering resources to operate (outages, scale peaks, etc.), which would results in organizational complexity and constant prioritization battles between managers/directors/etc. Solution Michelangelo provides a single platform that makes the most common and most business critical ML use cases simple and intuitive for builders to use, while still allowing self-serve extensibi...

Cluster Management at LinkedIn

Image
In 2014 LinkedIn released a cluster management solution called Helix. Helix solves some problems that arise when a system scales to be too large to manage even on just a few hosts. A successful system will start to go through a few transition states that, when large enough, will become frequent enough to require an automated solution. First, your system will become too large to host on a single machine. So now you need to shard it. Then your system will either have hosts fail once in a while, or some shards might start getting too big or taking too much load. So then you can start using replication to solve for that. As your cluster grows, the average size of shards also grows - sometimes you’ll have to split shards because they become too big, or more broadly redistribute. So now you need something to allow for that. Partitioning/sharding, fault tolerance and scalability - these are the higher level concepts just described, and the problems Helix solves for. If you can sol...