
Redis!...Huh? What ISN'T it good for?

Redis is an in-memory key-value data store that allows you to store your actual data structures rather than having a mapping layer between your application and your storage.  It supports the data types you're most likely to need, including lists, sets and hashes/maps.

It's in-memory but also has options to persist to disk - you can sync to disk on every write, at a significant performance cost, or at some regular interval.  Writes can be logged to an append-only file, so each write is a sequential append rather than a random update, which keeps them fast.  Syncing to disk every 1 second has comparable performance to never syncing at all, at the risk of losing roughly the last second of writes in a crash.
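As a rough illustration of the persistence options described above, a redis.conf fragment might look like the following.  The directive names are standard Redis settings; the values chosen here are only an example, not a recommendation for any particular workload.

```conf
# Enable the append-only file (AOF) so every write is logged.
appendonly yes

# How often the log is flushed to disk:
#   always   - fsync on every write (safest, slowest)
#   everysec - fsync once per second (the usual compromise)
#   no       - leave flushing to the operating system
appendfsync everysec
```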

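Redis supports replication in a few different ways.  By default it's asynchronous, but a client can use the WAIT command to block until a given number of replicas have acknowledged a write.  Combined with syncing the append-only log on every write, this makes it very unlikely that a successful write is lost, although the Redis documentation is explicit that WAIT alone does not make Redis a strongly consistent store.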

Redis Cluster allows automatic sharding and handling of many different failure scenarios, so if a small number of the hosts in your cluster are experiencing failures, the cluster as a whole can continue to operate.

When configured this way, losing a single Redis host should not lose acknowledged writes, since each write is replicated and written to disk before the write call responds with a success.  Redis works for so many different applications that it raises the question - what ISN'T it good for?


Large amounts of Data

Redis is an in-memory data store, which means your data has to fit in memory.  If the use case requires a lot of data and you don't have money for many machines with expensive RAM, then Redis might be the wrong choice.  You can make it work, but only with more cost and complexity in the form of more granularly sharded data.
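A back-of-the-envelope estimate can show when "fits in memory" stops being cheap.  The sketch below is only that: the per-key overhead and the usable RAM per node are illustrative assumptions, not measured Redis figures, and `estimate_nodes` is a hypothetical helper.

```python
import math


def estimate_nodes(num_keys, avg_key_bytes, avg_value_bytes,
                   per_key_overhead_bytes=64, usable_ram_gib_per_node=48):
    """Rough count of nodes needed to hold a dataset entirely in RAM.

    per_key_overhead_bytes and usable_ram_gib_per_node are assumed
    values for illustration; measure your own data before relying on them.
    """
    bytes_per_entry = avg_key_bytes + avg_value_bytes + per_key_overhead_bytes
    total_bytes = num_keys * bytes_per_entry
    node_bytes = usable_ram_gib_per_node * 1024 ** 3
    return total_bytes, math.ceil(total_bytes / node_bytes)


# Example: two billion small records.
total, nodes = estimate_nodes(num_keys=2_000_000_000,
                              avg_key_bytes=40,
                              avg_value_bytes=400)
print(f"{total / 1024 ** 3:.0f} GiB across at least {nodes} nodes")
```

Replicas multiply the node count again, which is where the cost and sharding complexity mentioned above come from.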

Though Redis is in-memory, early versions shipped a virtual memory feature that could swap values out to disk once there wasn't enough space.  Keys always remained in memory by design (and to ensure fast lookups), while values for the most rarely-used keys were swapped out.  That feature has since been deprecated and removed from open-source Redis; today, when the configured memory limit is reached, Redis either rejects writes or evicts keys according to its eviction policy.  So, fine - if access patterns of your data mean a few keys are accessed frequently and the rest can be dropped and rebuilt like a cache, then maybe you can still make the case for using Redis.

If your keys are rarely accessed and/or non-latency-sensitive, you should consider using something different.  Redis is meant for use cases where you need high performance lookups and the data you need to keep can fit into memory.

Relational Data

If your data access patterns require a lot of relations between keys, then Redis will require you to make many network calls before you can get to the piece of data you want for any particular query.  It's not a graph database and it's not a replacement for SQL.  Key-value stores are strong in use cases where a single key can be used to get the exact piece(s) of data you want.

Use something like MySQL or PostgreSQL instead - or if your data looks like a graph with vertices and edges, use a graph database like Neo4j.
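To make the round-trip cost concrete, here is a toy sketch.  `CountingKV` is a hypothetical in-memory stand-in for a key-value store, not a Redis client, and the key naming scheme is made up for the example; each `get()` represents one network call.

```python
class CountingKV:
    """In-memory stand-in for a key-value store that counts lookups.

    Each get() models one network round trip to the server.
    """

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key)


def order_totals_for_user(kv, user_id):
    """Follow user -> order ids -> orders, one lookup per hop."""
    order_ids = kv.get(f"user:{user_id}:orders") or []
    return [kv.get(f"order:{oid}")["total"] for oid in order_ids]


kv = CountingKV({
    "user:1:orders": [10, 11, 12],
    "order:10": {"total": 25},
    "order:11": {"total": 40},
    "order:12": {"total": 5},
})
totals = order_totals_for_user(kv, 1)
print(totals, kv.round_trips)
```

In SQL this would be a single join.  With Redis, pipelining or MGET can batch the second hop, but each additional level of relation still adds at least one more round trip.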

Range Queries

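Redis has functionality to query ranges, for example over the scores of a sorted set, but it is limited to the one ordering each structure maintains and there are no secondary indexes to fall back on. If range queries over arbitrary fields are one of your key needs, you should know Redis often falls short. A database with built-in indexes, such as MongoDB, is likely to perform better for that workload.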

ACID guarantees

Redis is NoSQL - one thing that means is that you don't get full ACID guarantees.  In Redis Cluster, updates to multiple keys can only be transactional when every key maps to the same hash slot (by default this usually means the same key, unless you use hash tags to group related keys).  So Redis can't do distributed transactions across slots.  However, you could build a 2-phase commit on top of Redis and do it yourself.  It just wouldn't be strongly consistent no matter what you do.
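For readers wondering what "same hash slot" means in practice, the sketch below follows the slot calculation described in the Redis Cluster specification: CRC16 of the key modulo 16384, using only the part inside `{...}` when a non-empty hash tag is present.  The key names are made up for the example, and `key_slot` is a local helper rather than a library call.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the cluster hash slot for a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return crc_hqx(key, 0) % HASH_SLOTS


# Two different keys sharing the tag "user:1" land in the same slot,
# so they can take part in one multi-key operation.
same = key_slot(b"{user:1}:profile") == key_slot(b"{user:1}:cart")
print(same)
```

Grouping keys with a hash tag gets you multi-key operations within one slot, at the cost of concentrating those keys on a single shard.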

Conclusion

Redis' strength lies in low-latency, non-relational use cases where data consistency is a secondary priority.  High availability is also a high priority for Redis, and Redis Cluster improves on that model.  For what it does best, low latency, Redis is absolutely best in class for storage.  If, however, your use case prioritizes large amounts of data, relational data and/or ACID guarantees, don't go to war with your own architecture - steer clear of Redis.
