Uber's Michelangelo vs. Netflix's Metaflow

Michelangelo

Pain point

Without Michelangelo, each team at Uber that uses ML (that’s all of them - every interaction with the ride or eats app involves ML) would need to build its own data pipelines, feature stores, training clusters, model storage, etc. Each team would spend copious amounts of time maintaining and improving its systems, and common patterns and best practices would be hard to learn. In addition, the highest-priority use cases (business critical, e.g. rider/driver matching) would themselves need to ensure they have enough compute, storage, and engineering resources to operate (outages, scale peaks, etc.), which would result in organizational complexity and constant prioritization battles between managers, directors, etc.

Solution

Michelangelo provides a single platform that makes the most common and most business critical ML use cases simple and intuitive for builders to use, while still allowing self-serve extensibi...

Cassandra: A Case Study

Cassandra was developed at Facebook and is often described as a cross between Amazon's Dynamo and Google's Bigtable. It's an open source, distributed, active-active NoSQL wide-column data store with tuning options that optimize for write-heavy workloads. It uses quorum reads and writes to balance consistency with availability and automatically manages replication of data - if a server fails, there is no availability loss, assuming you've configured enough replicas. When a new server is brought in to replace it, all Cassandra needs to know is the IP of the server being replaced, and it will manage getting the new server up to speed. Since Cassandra uses append-only writes, one tradeoff is that it doesn't allow for fast deletions: a delete is itself written as a marker (a tombstone), so deletions can actually increase the size of the data until compaction runs.
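To make the deletion tradeoff concrete, here is a minimal sketch of an append-only store in Python. It is a toy model, not Cassandra's storage engine: the `AppendOnlyStore` class, the `TOMBSTONE` marker, and the single in-memory log are all assumptions made for illustration.

```python
# Toy append-only store; an illustration, not Cassandra's actual storage engine.
TOMBSTONE = object()  # hypothetical marker standing in for a delete record


class AppendOnlyStore:
    def __init__(self):
        self.log = []  # (key, value) records, oldest first

    def put(self, key, value):
        self.log.append((key, value))

    def delete(self, key):
        # A delete is just another appended record, so the log grows.
        self.log.append((key, TOMBSTONE))

    def get(self, key):
        # The newest record for a key wins.
        for k, v in reversed(self.log):
            if k == key:
                return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Keep only the newest record per key, then drop tombstoned keys.
        latest = {}
        for k, v in self.log:
            latest[k] = v
        self.log = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]


store = AppendOnlyStore()
store.put("a", 1)
store.put("b", 2)
size_before_delete = len(store.log)
store.delete("a")
size_after_delete = len(store.log)   # larger than before the delete
store.compact()
size_after_compact = len(store.log)  # space is reclaimed only here
```

Real Cassandra also keeps tombstones around for a grace period (`gc_grace_seconds`) before compaction may purge them, so that replicas that missed the delete can still learn about it. The sketch skips that detail.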

Cassandra differs from the traditional picture of an active-active database. Most active-active setups don't scale well for higher write throughput because every write has to go to every primary (master) server. With Cassandra, a client can connect to any node in the cluster, and that node acts as the coordinator for the requests it receives. Each node is assigned to store data for some key range, and all nodes know the mapping from row key to key range to server. When a request comes in, the coordinator forwards it (whether it's a read or a write) to the correct set of replicas for the requested row key. It uses quorum reads and writes and sends the response back to the client. For any write that can't be delivered to a replica (e.g. because the server is down), the write is buffered and delivered once the server is back up - this is called hinted handoff.
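The routing described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `Node` and `Cluster` classes, the MD5-based `token` function, the token range of 0-999, and hints held in memory on the coordinator are all hypothetical and chosen for readability, not taken from Cassandra's implementation.

```python
import bisect
import hashlib


def token(key):
    # Hypothetical hash onto a 0..999 ring; Cassandra's partitioner differs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 1000


class Node:
    def __init__(self, name, node_token):
        self.name = name
        self.token = node_token
        self.up = True
        self.data = {}
        self.hints = []  # (target_node, key, value) buffered for down replicas


class Cluster:
    def __init__(self, nodes, replication_factor=3):
        self.nodes = sorted(nodes, key=lambda n: n.token)
        self.tokens = [n.token for n in self.nodes]
        self.rf = replication_factor

    def replicas_for(self, key):
        # First node whose token is >= the key's token, then the next ones clockwise.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        quorum = self.rf // 2 + 1
        acks = 0
        for replica in self.replicas_for(key):
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                # Hinted handoff: the coordinator keeps the write for later.
                coordinator.hints.append((replica, key, value))
        return acks >= quorum

    def replay_hints(self, coordinator):
        remaining = []
        for target, key, value in coordinator.hints:
            if target.up:
                target.data[key] = value
            else:
                remaining.append((target, key, value))
        coordinator.hints = remaining


nodes = [Node(f"n{i}", i * 200) for i in range(5)]
cluster = Cluster(nodes)
replicas = cluster.replicas_for("rider:42")
down = replicas[0]
down.up = False  # simulate a failed replica
coordinator = next(n for n in nodes if n not in replicas)

ok = cluster.write(coordinator, "rider:42", "matched")  # 2 of 3 replicas ack
missed_before_replay = "rider:42" not in down.data

down.up = True  # the replica comes back
cluster.replay_hints(coordinator)
```

With a replication factor of 3, the quorum is 2, so the write succeeds even though one replica is down, and that replica receives the value once the hint is replayed. Any node could have been the coordinator here; the sketch picks a non-replica only to show that the coordinator doesn't need to own the data.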

SQL databases are no longer the default choice for many of today's most common workloads, and even NoSQL on its own is starting to be challenged - more and more systems rely on polyglot persistence, combining several storage technologies. With this in mind, Cassandra is a powerful and tunable system that can be used on its own or in concert with other storage technologies.
