Database 101: How social media “likes” are stored in a database

Daniel Reis - May 8 '23 - - Dev Community

Did you ever think about how Instagram, Twitter, Facebook or any social media platforms track who liked your posts? Let's figure it out in this post!

If you’re just starting working with databases, you might want to start off by reading my initial post, Database 101: Data Consistency for Beginners. That captures my own exploration of how many database paradigms exist as I look far beyond my previous experience with just SQL and MySQL. I’m keeping track on my studies in this Database 101 series.

Table of Contents

1. Prologue

Recently, I was invited to speak at an event called "CityJS." But here's the thing: I'm the PHP guy. I don't do JS at all, but I accepted the challenge. To pull it off, I needed to find a good example to show how a highly scalable and low latency database works.

So, I asked one of my coworkers for examples. He told me to look for high numbers inside any platform, like counters or something like that. At that point I realized that any type of metrics can fit this example. Likes, views, comments, follows, etc. could be queried as counters. In this article, you will find my studies of how do proper data modeling for these using ScyllaDB.

2. Let's Research

First things first, right? After deciding what to cover in my talk, I needed to understand how to build this data model.

We'll need a posts table and also a post_likes table that relates who liked each post. So far, it seems enough to do our likes counter.

My first bet for a query to count all likes was something like:

Ok and if I just do a query with SELECT count(*) FROM social.post_likes it can work, right?

Well, it worked but it was not as performant as expected when I did a test with a couple thousands of likes in a post. As the number of likes grows, the query becomes slower and slower...

"But ScyllaDB can handle thousands of rows easily… why isn’t it performant?" That's probably what you're thinking right now (or maybe not).

ScyllaDB – even as a cool database with cool features – will not solve the problem of bad data modeling. We need to think about how to make things faster.

3. Researching Data Types

Ok, let's think straight: the data needs to be stored and we need the relation between who liked our post, but we can't use it for count. So what if I create a new row as integer in the posts table and increment/decrement it every time?

Well, that seems like a good idea, but there's a problem: we need to keep track of every change on the posts table and if we start to INSERT or UPDATE data there, we'll probably create a bunch of nonsense records in our database.

Using ScyllaDB, every time that you need to update something, you actually create new data.

scylla@cqlsh:socials> INSERT INTO socials.posts (id, user_id, description, image_url, created_at, likes) VALUES (4d18bb8c-9c57-44fe-827a-4a2d65f331e5, 3edd5f1d-67e9-4a3e-af1a-9adbb41e2129, 'Such a cool event P99 Conf!', 'https://i.imgur.com/Xp8gi7t.jpg', '2023-04-23 15:02:49', 1);

scylla@cqlsh:socials> INSERT INTO socials.posts (id, user_id, description, image_url, created_at, likes) VALUES (4d18bb8c-9c57-44fe-827a-4a2d65f331e5, 3edd5f1d-67e9-4a3e-af1a-9adbb41e2129, 'Such a cool event P99 Conf!', 'https://i.imgur.com/Xp8gi7t.jpg', '2023-04-23 15:02:50', 2);

scylla@cqlsh:socials> INSERT INTO socials.posts (id, user_id, description, image_url, created_at, likes) VALUES (4d18bb8c-9c57-44fe-827a-4a2d65f331e5, 3edd5f1d-67e9-4a3e-af1a-9adbb41e2129, 'Such a cool event P99 Conf!', 'https://i.imgur.com/Xp8gi7t.jpg', '2023-04-23 15:02:51', 3);
Enter fullscreen mode Exit fullscreen mode
scylla@cqlsh:socials> SELECT * from posts;

 id                                 | user_id                           | created_at                    | description               | image_url                     | likes
--------------------------------------+--------------------------------------+---------------------------------+-----------------------------+---------------------------------+-------
 4d18bb8c-9c57-44fe-827a-4a2d65f331e5 | 3edd5f1d-67e9-4a3e-af1a-9adbb41e2129 | 2023-04-23 15:02:48.000000+0000 | Such a cool event P99 Conf! | https://i.imgur.com/Xp8gi7t.jpg |    1
 4d18bb8c-9c57-44fe-827a-4a2d65f331e5 | 3edd5f1d-67e9-4a3e-af1a-9adbb41e2129 | 2023-04-23 15:02:50.000000+0000 | Such a cool event P99 Conf! | https://i.imgur.com/Xp8gi7t.jpg |    2
 4d18bb8c-9c57-44fe-827a-4a2d65f331e5 | 3edd5f1d-67e9-4a3e-af1a-9adbb41e2129 | 2023-04-23 15:02:51.000000+0000 | Such a cool event P99 Conf! | https://i.imgur.com/Xp8gi7t.jpg |    3
Enter fullscreen mode Exit fullscreen mode

You will have to track everything that changes in your data. So, for each increase, there will be one more row unless you don't change your clustering keys or don't care about timestamps (a really stupid idea).

After that, I went into ScyllaDB docs and found out that there's a type called counter that fit our needs and is also ATOMIC!

Ok, it fit our needs but not our data modeling. To use this type, we have to follow a few rules but let's focus on the ones that are causing trouble for us right now:

  • The only other columns in a table with a counter column can be columns of the primary key (which cannot be updated).
  • No other kinds of columns can be included.
  • You need to use UPDATE queries to handle tables that own a counter data type.
  • You only can INCREMENT or DECREMENT values, setting a specific value is not permitted.

This limitation safeguards correct handling of counter and non-counter updates by not allowing them in the same operation.

So, we can use this counter but not on the posts table... Ok then, it seems that we're finding a way to get it done.

4. Properly Modeling

With the information that counter type should not be "mixed" with other data types in a table, the only option that is left to us is create a NEW TABLE and store this type of data.

So, I made a new table called post_analytics that will hold only counter types. For the moment, let's work with only likes since we have a Many to Many relation (post_likes) created already.

These next queries are what you probably will run for this example that we created:

## Social when you like a post

UPDATE socials.post_analytics SET likes = likes + 1 WHERE post_id = 4d18bb8c-9c57-44fe-827a-4a2d65f331e5;

INSERT INTO socials.post_likes (post_id, user_id, liked_at) VALUES (4d18bb8c-9c57-44fe-827a-4a2d65f331e5, 3edd5f1d-67e9-4a3e-af1a-9adbb41e2129, '2023-04-23 15:02:50');

# Social when you dislike a post

DELETE FROM socials.post_likes WHERE post_id = 4d18bb8c-9c57-44fe-827a-4a2d65f331e5 AND user_id = 3edd5f1d-67e9-4a3e-af1a-9adbb41e2129;

UPDATE socials.post_analytics SET likes = likes - 1 WHERE post_id = 4d18bb8c-9c57-44fe-827a-4a2d65f331e5;
Enter fullscreen mode Exit fullscreen mode

Now you might have new unanswered questions in your mind like: "So every time that I need a new counter related to some data, I'll need a new table?" Well, it depends on your use case. In the social media case, if you want to store who saw the post, you will probably need a post_viewers table with session_id and a bunch of other stuff.

Having these simple queries that can be done without joins can be way faster than having count(*) queries.

5. Final Considerations

Me talking at CityJS stage

Me at CityJS stage saying a bunch of nonse data modeling using TS

I learned a lot not only by studying new ways of data modeling but also having to learn TypeScript to create the CityJS presentation and build this use case.

As everything is brand new for me, I'll do my best to keep sharing my studies. Please feel free to correct me in the comments! Discussing is the best way to learn new things.

Don't forget to like this post, follow me on the socials and fill your water bottle xD

Twitter DanielHe4rt PT-BR
Twitter DanielHe4rt EN
Twitch Channel

. . . . . . . . . . . . . . . . . .
Terabox Video Player