The Best Intro to Distributed Systems
I recently switched teams at Facebook. I went from working on user-facing search technologies to running the distributed systems that let other teams at my company move fast. To put it another way: I used to bake the bread, and now I build the bakers an oven! Although I had some background in these systems, things change quickly, and the field is deep, with decades of knowledge to draw from and build upon. I needed a refresher!
I found one - a friend recommended a book called 'Designing Data-Intensive Applications'. It sounded a bit off-topic - the title didn't have 'distributed' or 'systems' in it! It turned out to give me exactly what I needed. DDIA is the best introduction to, and deep dive into, all the areas of expertise that the distributed systems (DS) specialty has to offer - and wow, is this field deep! The book requires a bit of background: you'll need a bachelor's degree in computer science or equivalent experience, as the first few chapters build on that knowledge. Once the foundation is set, the book covers concepts that anyone working in a software infrastructure environment like AWS, Google Cloud or Facebook infra needs to know. The foundational chapters cover reliability, maintainability and scalability, and then go on to discuss how to think about and handle data - for example, how do you decide which encoding technology to use when starting a software project?
Then, in Part II, the book gets into the real meaty parts of distributed systems - the things you absolutely need to know to be successful if you're going on call for AWS or for a team that owns and maintains a ZooKeeper deployment. Replication, transactions and the fundamental problems that the field of distributed systems is trying to solve are a few of the topics Part II covers.
Last comes the most fascinating part of the book, and it's really the reward for getting through all of the previous material: Part III covers the newest and most interesting concepts the field has to offer right now. It makes a deep comparison between batch and stream processing and argues that stream processing is the Next Big Thing in software architecture. If batch processing was one of the most important developments at Google's inception, then stream processing will be the most important development at the inception of...well, whatever the next Google is!
After finishing the book and making a few notes and flashcards (which I'll post somewhere for others to use when I get around to it), I was much more confident going into meetings and discussions about how our systems work, how they're currently broken and how they could be better. The book was a watershed moment in my career, and I'd encourage anyone in a software engineering role on a DS-focused team to read it. This one needs to go to the top of your 'software career-ing' reading list.
And please do send me a note if you end up reading it (or have already) and tell me your thoughts!