Our official quarantine in Argentina began on my birthday. Without many other options to communicate, I've been texting a lot more than usual.
As is increasingly common, I wanted to avoid uploading very personal data directly to a website, so I built a simple API that serves statistics about my chats while keeping the CSV files safe and sound on my personal machine.
Here's that repository, and here's a quick view of how I use it, if you're so curious.
Great, with that out of the way, we can start making some visualizations. All of the below come from a single conversation, mostly in English.
Tug of War
Who sends more of what?
Depending on which conversation I load, these naturally move. But general patterns seem to emerge: I often send fewer messages, but they're more verbose, both absolutely and relatively (more unique words, more words overall, longer words).
I tend not to use as many emojis, but I do like to communicate by sending audio, videos, and images (and stickers, which deserve an entire analysis of their own that I won't get to here).
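For the curious, here is a minimal sketch of how numbers like these could be computed. It is not the code from my repository: the `(sender, text)` input format, the `sender_stats` name, and the whitespace tokenizer are all assumptions made for illustration.

```python
from collections import defaultdict


def sender_stats(messages):
    """Per-sender verbosity stats.

    `messages` is an iterable of (sender, text) pairs. This format is
    hypothetical; a real chat export would need parsing first.
    """
    counts = defaultdict(int)
    words_by_sender = defaultdict(list)
    for sender, text in messages:
        counts[sender] += 1
        # Naive tokenizer: lowercase and split on whitespace.
        words_by_sender[sender].extend(text.lower().split())

    stats = {}
    for sender, n_messages in counts.items():
        words = words_by_sender[sender]
        stats[sender] = {
            "messages": n_messages,
            "words": len(words),
            "unique_words": len(set(words)),
            "words_per_message": len(words) / n_messages,
            "avg_word_length": (
                sum(len(w) for w in words) / len(words) if words else 0.0
            ),
        }
    return stats


# Made-up sample conversation, only to show the shape of the result.
sample = [("me", "hello there friend"), ("her", "ok"), ("her", "ok")]
stats = sender_stats(sample)
```

Comparing `words_per_message` and `unique_words` side by side is one way to see the "fewer but more verbose" pattern, though a real analysis would want a better tokenizer that handles punctuation and emojis.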
My Response Times
Their Response Times
Remarkably similar, as one would expect in a healthy friendship or relationship. Within my generation, it's practically blasphemy to assume that how long you have to wait for someone's response isn't a message in itself.
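What counts as a "response time" is a judgment call. The sketch below assumes one possible definition: the gap between a message and the previous one, counted only when the sender changes. The `(timestamp, sender)` format and the `response_times` name are hypothetical, not taken from my repository.

```python
from datetime import datetime


def response_times(messages):
    """Seconds each person took to reply.

    `messages` is a list of (timestamp, sender) pairs sorted by time
    (an assumed format). A reply is any message whose sender differs
    from the sender of the message right before it.
    """
    gaps = {}
    for (prev_ts, prev_sender), (ts, sender) in zip(messages, messages[1:]):
        if sender != prev_sender:
            gaps.setdefault(sender, []).append((ts - prev_ts).total_seconds())
    return gaps


# Made-up timestamps, only to illustrate the calculation.
sample = [
    (datetime(2020, 3, 20, 10, 0), "me"),
    (datetime(2020, 3, 20, 10, 2), "her"),
    (datetime(2020, 3, 20, 10, 3), "her"),
    (datetime(2020, 3, 20, 10, 8), "me"),
]
gaps = response_times(sample)
```

Under this definition, consecutive messages from the same person are ignored, and overnight silences count as very long replies, so a histogram of the gaps would probably want a log scale or a cutoff.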
How about when we chatted, in general? Well, unless it's between 5 and 7 am, we're likely chatting.
Hourly Activity
Average messages per minute of the day.
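A chart like this boils down to bucketing timestamps by time of day and averaging over the days covered. Here is a rough sketch under stated assumptions: the `activity_by_time_of_day` name and the `bucket_minutes` parameter are made up for illustration, and the average is taken over days that have at least one message.

```python
from collections import Counter
from datetime import datetime


def activity_by_time_of_day(timestamps, bucket_minutes=60):
    """Average messages per day in each time-of-day bucket.

    `bucket_minutes` is a hypothetical knob: 60 gives 24 hourly
    buckets, 1 gives 1440 per-minute buckets.
    """
    n_buckets = 24 * 60 // bucket_minutes
    per_bucket = Counter(
        (ts.hour * 60 + ts.minute) // bucket_minutes for ts in timestamps
    )
    # Average over the number of distinct days that have any message.
    n_days = len({ts.date() for ts in timestamps}) or 1
    return [per_bucket.get(b, 0) / n_days for b in range(n_buckets)]


# Made-up timestamps spanning two days.
sample = [
    datetime(2020, 4, 1, 9, 15),
    datetime(2020, 4, 1, 9, 45),
    datetime(2020, 4, 2, 9, 5),
    datetime(2020, 4, 2, 23, 30),
]
activity = activity_by_time_of_day(sample)
```

Dividing by active days rather than calendar days is a design choice; dividing by the full date range would lower every bar for conversations with long silences.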
Barcodes
A weekly view of message frequency.
Saving my favorite for last, here's a view of our word usage...
Word Frequency and Parity
Who says what and how often?
This chart is just amazing to me; I wish I could figure out a better form factor to communicate more data in a more compact space. I use a lot of ambiguous language like "pretty," "okay," "little," "sort [of]," and "around." She tends to use more direct, firm language such as "definitely," "exactly," "true," "least," and "perfect."
Some appearances are a simple question of how we use English differently. She prefers to say "ok," whereas I say "okay." I say "cause," she says "because"; I say "yeah," she says "yes." Going deeper into the data (words 250 to 1000), you can start to see sentiments emerge.
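To make "frequency and parity" concrete, here is one possible way to compute it. This is only a sketch: defining parity as my share of a word's total uses, the `word_parity` name, and the simple regex tokenizer are all assumptions, not necessarily what the chart uses.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive English tokenizer: lowercase letters and apostrophes only.
    return re.findall(r"[a-z']+", text.lower())


def word_parity(my_texts, their_texts, top_n=250):
    """Return (word, total_count, my_share) for the most common words.

    `my_share` is an assumed definition of parity: 1.0 means only I
    use the word, 0.0 means only the other person does, 0.5 is even.
    """
    mine = Counter(w for t in my_texts for w in tokenize(t))
    theirs = Counter(w for t in their_texts for w in tokenize(t))
    total = mine + theirs
    return [(w, n, mine[w] / n) for w, n in total.most_common(top_n)]


# Made-up messages, only to illustrate the output.
rows = word_parity(["okay yeah okay"], ["ok yes okay"])
share = {word: my_share for word, _, my_share in rows}
```

Because one person may simply write more overall, raw shares can be skewed; dividing each count by that person's total word count before comparing would be a reasonable refinement.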
This is a first foray into deconstructing chat data, but I'm already fascinated. I'd like to push harder on certain angles: doing more analysis of word usage and sentiment, and finding tools that can still provide some standardized metrics across multiple languages, as most of my conversations flow between English and Spanish or Portuguese. Group chat data may be another interesting avenue as well.