Showing posts from 2019

ChatGPT - How Long Till They Realize I’m a Robot?

I tried it first on December 2nd... and slowly the meaning of it started to sink in. It's January 1st, and as the new year begins, my future has never felt so hazy. It helps me write code. At my new company I'm writing golang, which is new for me, and one day on a whim I think, "Hmmm, maybe ChatGPT will give me some ideas about the library I need to use." Lo and behold, it knew the library. It wrote example code. It explained each section in just enough detail. I'm excited... It assists my users. I got a question about Dockerfiles in my team's on-call channel. "Hmmm, I don't know the answer to this either"... ChatGPT did. It knew the commands to run. It knew details of how it worked. It explained it better and faster than I could have. Now I'm nervous... It writes my code for me. Now I'm hearing how great GitHub Copilot is - and it's built on OpenAI's models too... OK, I guess I should give it a shot. I install it, and within minutes it'

The Curious Case of the Document Database

Let’s talk about an oft-overlooked NoSQL database type. It’s got all the best parts of a k-v store and allows limited SQL-like query-ability too! They’re called document databases and they’re all the rage. A document database stores objects that can be serialized to JSON or a similar serialization format. These JSON ‘documents’ are keyed by an ID, similar to how k-v stores work. When you want to fetch an entire document, all you need is its key. But the magic of document databases is that you can also fetch only pieces of a document, or fetch data from multiple documents using selection criteria that mirror basic SQL query functionality. This is all made possible by the tree-like structure that documents in a doc DB must conform to. JSON data can contain keyed fields, nested structures, and lists. Using this structure, a doc DB can extract specific pieces of a doc so that the entire doc doesn’t have to be returned and parsed in the application layer. Query-ability
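
As a rough illustration of the excerpt above, here is a minimal in-memory sketch of the three access patterns it describes: fetch a whole document by key, fetch one piece of a document, and select pieces from several documents by a criterion. The store, the sample documents, and the helpers `get_path` and `find` are all hypothetical names made up for this sketch; a real document database exposes its own query API.

```python
# Hypothetical in-memory "document store": JSON-like documents keyed by ID,
# just like values in a k-v store.
docs = {
    "u1": {"name": "Ada", "address": {"city": "London"}, "tags": ["admin", "dev"]},
    "u2": {"name": "Linus", "address": {"city": "Helsinki"}, "tags": ["dev"]},
    "u3": {"name": "Grace", "address": {"city": "New York"}, "tags": ["ops"]},
}


def get_path(doc, path):
    """Walk a dotted path such as 'address.city' through nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def find(store, predicate, fields):
    """Return only the requested fields of every document matching predicate."""
    return {
        doc_id: {field: get_path(doc, field) for field in fields}
        for doc_id, doc in store.items()
        if predicate(doc)
    }


# 1. Whole document by key (plain k-v behaviour).
whole = docs["u1"]

# 2. One piece of a document, without handing back the rest.
city = get_path(docs["u1"], "address.city")

# 3. Roughly "SELECT name, address.city WHERE 'dev' IN tags".
devs = find(docs, lambda d: "dev" in d["tags"], ["name", "address.city"])
```

The tree shape of each document is what makes `get_path` possible: every piece of data has an address made of keys.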

Consistency in Redis

Most uses of Redis focus more on latency and availability than on consistency - that’s because at its core, Redis is essentially a cache. Generally speaking, you store things in Redis in memory and you update or read them extremely quickly. You need to make sure that the cache is always available, so in most cases you’d only choose Redis if you’re leaning towards an A class system (A for Availability) rather than a C class system (C for Consistency). However, it’s important to know that a replicated instance of Redis is capable of giving you different levels of consistency up to and including read-after-write consistency - the kind of consistency that guarantees that any read, from anywhere, that happens after a successful response to a write will see that write, even if the read goes to a different replica than the write did. What Redis can’t give you is linearizability - the guarantee that any set of observers of the system will only be able to see a single copy of the syste

CAP Theorem Explained

When building large-scale software systems today, you have to make tradeoffs. You can't have a free, ACID-compliant data store with infinite storage/throughput/connections that's always available in any part of the world with super low latency and lets clients read/write concurrently without any risk of inconsistencies. If you could, the problem would be solved and our industry could go build spaceships at SpaceX or retire and make sourdough every Sunday. Instead, we need to make tradeoffs. Does our product/system need ACID semantics? Is latency more important? Can we allow certain types of data inconsistencies for a short time in favor of availability? How much are we able to spend so that we don't have to sacrifice as much? These are some of the questions that everyone building a large-scale software system has to grapple with in the design phase. A great way to begin your thinking is using CAP Theorem - or at least what it's slowly been crystallize

XFN Development - What's it all About?

XFN (cross-functional) work is one of the challenges of being a senior engineer at most tech companies. Broadly, it means working with members of teams other than your own. Concretely, this can mean anything from aligning goals or gathering feedback from other teams to inform your roadmap, to pair programming to flesh out the design of a new interface. Your team wants you to make progress on its goals through XFN, but also to make the team look good (i.e. competent, smart, motivated) to those who are judging. In my organization, this type of work is reserved for more seasoned engineers - although you're interacting with others on system designs, a lot of it is not the type of stuff taught in CS classes. It's about personal interactions. When you first start talking to a member of a different team, it's important to ensure they feel that you're someone they want to work with. You should be able to describe the system you own or are building on a white

Redis!...Huh? What ISN'T it good for?

Redis is an in-memory key-value data store that allows you to store your actual data structures rather than having a mapping layer between your application and your storage. Support exists for most data types you'd need, including lists, sets and hashes/maps. It's in-memory but also has options to push to disk - you can push to disk on every write at a huge performance cost, or at some regular interval. Writes can be configured to happen via an append-only log, which makes them lightning fast. Pushing to disk every second has comparable performance to never pushing to disk at all. Redis supports replication in a few different ways. By default it's asynchronous, but can be configured to be synchronous for safety. Combined with append-only logging on every write, you can have 100% consistency of your data on any successful write. Redis Cluster allows automatic sharding and handling of many different failure scenarios, so if a small number of the hosts in your cluster

Don't NOT Repeat Yourself!

Sometimes duplicate code is good!! ...what? What do you mean? You're 'on to me'? .... Ok ok ok hold on, just hear me out! Picture it: you're writing unit tests. Headphones on, hoodie blowing in the cold wind from your AC unit in your dark apartment. You've written all your tests, they're passing and you're feeling great. You refactor. The tests have a lot of duplicate setup code, so having the DRY sense of humor you've got, you mop up. Get it all lookin' fine and tidy. Common methods for all the setup, some parameters to handle the different configurations of the unit tests. Freshhhhhhhhhhhh :D You push your code to the test env - it breaks - OH SHEEIT. You missed a few edge cases. No prob, no prob! Just add a few unit tests, slip in an if-else here and there in your production code, and you can get your changes in and make it to the office before the 2pm happy hour! Nice - easy peasy. But WAIT. The edge cases you missed require

Dependency Injection Hell-o World!!

When new, wide-eyed engineers first start out writing maintainable, readable, extensible {insert-everything-good}ible software, they will quickly stumble upon the concepts of Dependency Injection (DI) and interfaces. Combined with interface programming, DI has software engineers inject dependencies into a class via a constructor or setter method. For example, you might inject 2 operands '1' and '2' into an instance of the Add class: class Add { // ... Add(IntClass op1, IntClass op2) { this.op1 = op1; this.op2 = op2; } // ... }; IntClass is an interface, and Add doesn't care what the implementation of it is. It just cares that it exposes the methods that it wants: class Add { // ... int execute() { return this.op1.getVal() + this.op2.getVal(); } // ... }; Add cares that IntClass exposes getVal(), which should return an int, but doesn't care how it's implemented. Now if you want to write a unit tes
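
For readers who want something runnable, here is a loose Python rendering of the Add/IntClass shape from the excerpt above. It is a sketch, not the post's own code: `get_val` stands in for `getVal()`, and `FixedInt` is a hypothetical stand-in implementation of the kind a test could inject in place of a real one.

```python
from typing import Protocol


class IntClass(Protocol):
    """The interface Add depends on: anything with get_val() returning an int."""

    def get_val(self) -> int: ...


class Add:
    def __init__(self, op1: IntClass, op2: IntClass) -> None:
        # Dependencies arrive through the constructor; Add never builds them.
        self.op1 = op1
        self.op2 = op2

    def execute(self) -> int:
        return self.op1.get_val() + self.op2.get_val()


class FixedInt:
    """Hypothetical implementation that always returns the value it was given."""

    def __init__(self, val: int) -> None:
        self.val = val

    def get_val(self) -> int:
        return self.val


# Inject the operands '1' and '2' into an Add instance.
result = Add(FixedInt(1), FixedInt(2)).execute()
```

Because Add only knows about the interface, swapping `FixedInt` for any other object with a `get_val()` method requires no change to Add itself.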

How to Frame Metric Collection

Depending on the type of software development you’re doing, it can be tough to figure out what metrics you need to collect. An iterative process (aka fancy words for trial-and-error) will work to get you to where you need to be eventually, but along the way the MTTR of outages will suffer and you might lose users/revenue. There’s a simple way to think about metrics that will help you build an intuition on what to monitor and what to measure. It’s this: Measure the business, measure the software. But don’t conflate the two. Measuring the business is critical to being able to notify and escalate to the proper personnel when there is an outage. Business metrics include things that impact your bottom line - user signins per minute, items added to the cart per second, items sold per day. Anything that directly and immediately affects customers is a business metric. Software metrics are signals about how your software is running. There are 3 categories of software metrics: OS me

Cassandra’s Data Model

I did a previous post on Cassandra, but that one focused on its fault tolerance, network architecture and scalability. This one focuses on the structure of data stored in Cassandra. Cassandra is a wide-columnar data store. Logically, you can think of data stored in Cassandra like a compound index in a conventional SQL data store. If you know the row key and the column names you want, you can get the data you need. If you are ok searching through ALL the columns, then you just need the row key. And if you don’t have either but your data is stored somewhere in a Cassandra table, you’re in for an expensive full table scan. Unlike SQL data stores, however, you can essentially have an unlimited number of columns, and each row can choose to have whatever columns it wants. That’s why it’s called a wide-columnar data store - you could have millions of columns if you wanted to! Within each row, the columns are stored in sorted order, so finding a specific column can be d

The case for caching

Though the concept of caching seems quite simple to most engineers, there is actually a lot of intriguing nuance to it. The choices for caching and the reasons to use it vs. not are varied, but let’s try to simplify. First thing to ask is ‘what for?’ Caching is useful when you want to decrease latency and/or decrease load on components of your system. You can use it in places where there is a separate, durable source of truth and it’s not terrible if the data in the cache expires or is lost some other way. Caches are not good tools for request buffering or source-of-truth data - data will be lost from time to time. Second thing to ask is ‘where should we put it?’ There are essentially 4 options: the end user’s client/browser, a CDN, a reverse proxy in front of your own web servers, or on your own web servers. If the data you’re caching is specific to the user and not too large, it can be put into the client/browser - this is the most effective approach for latency and for reducin
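
The "separate, durable source of truth" point in the excerpt above can be sketched as a small read-through cache with expiry. This is an illustrative sketch only: `TTLCache`, `load_from_source`, and the fake clock are hypothetical names for this example, not part of any library.

```python
import time


class TTLCache:
    """Read-through cache: on a miss or expiry, reload from the source of truth."""

    def __init__(self, load, ttl_seconds, clock=time.monotonic):
        self._load = load          # function that reads the durable source of truth
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # fresh hit: the source is not touched
        value = self._load(key)    # miss or expired: losing the entry cost nothing
        self._entries[key] = (value, now + self._ttl)
        return value


# Stand-in for a slow, durable store; records every call it receives.
source_calls = []


def load_from_source(key):
    source_calls.append(key)
    return key.upper()


fake_now = [0.0]
cache = TTLCache(load_from_source, ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get("a")    # miss: goes to the source
second = cache.get("a")   # hit: served from memory
fake_now[0] = 61.0
third = cache.get("a")    # expired: goes to the source again
```

Three reads cost only two trips to the source, and the expired entry was simply reloaded, which is why losing cached data is tolerable when a durable store sits behind it.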

Cassandra: A Case Study

Cassandra was developed at Facebook, and some would say it's an intersection between Amazon's Dynamo and Google's BigTable. It's an open-source, distributed, active-active NoSQL column-oriented data store with tuning capabilities that optimize for write-heavy workloads. It uses quorum reads and writes to balance consistency with availability and automatically manages replication of data - if a server fails, there is no availability loss assuming you've configured the right number of replicas. And when a new server is brought in to replace it, all Cassandra needs to know is the IP of the server it's replacing, and it'll manage getting the new server up to speed. Since Cassandra uses append-only writes, one of the tradeoffs made is that it doesn't allow for fast deletions - and deletions can actually increase the size of the data until compaction time. Cassandra differs from what we were taught active-active databases to be. Most active-active setu
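
The quorum trade-off mentioned in the excerpt above comes down to simple arithmetic: with N replicas, a read that contacts R of them is guaranteed to overlap a write acknowledged by W of them whenever R + W > N. The sketch below only illustrates that arithmetic; the function names are hypothetical and this is not Cassandra's API.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(n, w, r):
    """True when every read set of size r must share a replica with every write set of size w."""
    return r + w > n


def tolerated_failures(n, w, r):
    """Replicas that can be down while both reads and writes still reach enough nodes."""
    return n - max(w, r)


n = 3
q = quorum(n)                                   # majority of 3 replicas
strong = read_overlaps_write(n, w=q, r=q)       # quorum writes + quorum reads
weak = read_overlaps_write(n, w=1, r=1)         # single-replica reads and writes
spare = tolerated_failures(n, w=q, r=q)         # servers that can fail at quorum
```

With three replicas, quorum reads and writes still succeed with one server down, while single-replica reads and writes tolerate two failures at the cost of possibly reading stale data.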

Paxos vs. Blockchain: A User's Perspective

A robust distributed system can tolerate partial failures - that means the system should continue to work as expected even if parts of it are failing. There are 3 main 'partial failures' that the field of Distributed Systems tries to solve for: 1.) Out-of-sync system clocks 2.) Process pauses 3.) RPC requests with no response. Paxos is one algorithm that tries to solve for these by using consensus among nodes to decide on the order of events, but system administrators must tune the timeout parameter to find the best middle ground between waiting too long for a node (one computer in the network of systems) and not waiting long enough. The blockchain algorithm gets around the idea of time entirely by deciding on an order of events once, and then getting as many nodes as possible to agree on and persist the actual ordering itself. Here's an excerpt straight from Satoshi's paper: The solution we propose begins with a timestamp

The Best Intro to Distributed Systems

I recently switched teams at Facebook. I went from working on user-facing search technologies to running the distributed systems that allow other teams at my company to move fast. To be clear, I used to bake the bread, now I build the bakers an oven! Although I had some background in these systems, things change quickly and the field is deep, with decades of knowledge to pull from and build upon. I needed a refresher! I found one - a friend recommended a book called 'Designing Data-Intensive Applications'. It sounded a bit off-topic - the title didn't have 'distributed' or 'systems' in it! Turns out it gave me exactly what I needed. DDIA is the best introduction to, and deep dive into, all the areas of expertise that the DS specialty has to offer - and wow is this field deep! The book requires a bit of background - you'll need to have a bachelor's in CPSC or similar experience, as the first few chapters build on that knowledge. Once the foundati