Choosing database in system design

Prashant Mishra - Oct 2 - - Dev Community

When Image/videos are involved:
For immutable data such as image and video we use blob storage like amazon s3
Along with s3, CDN is used
So, Blob -> s3 + CDN

When Searching is involved:
If the app involves/needs searching service then elastic search/solana is used which are built on top of Apache lucine and provide search capabilities.
Fuzzy search: If the use meant to search “airport” but instead searched “airprot”, We can say that EDIT DISTANCE IS 2 or fuzzyfactor is 2.
Note: elastic search/solan are not databases they are search engines.
They don’t guarantee that the data present in them will not be lost, So these should not be used as primary source of data, Primary source of data should be somewhere else and should be loaded in these search engines.
These are special cases of Document DB

When Metric kind of data is involved:
If you are building some metric tracking system like grafana, prometheus, etc
Time series database can be used (extension of RDB, with added features of its own)
Regular RDB: Gives ability to access/update random record(s), but whenever we are building metric kind of system we won’t do random updates, they will be sequential update in append only, like one update happened at T1 next will happen at T2 where T2> T1.
Also read won’t be random, it will be like get details between tx and ty time, or between month jan-march. Time series databases are optimised for such kind of queriers.
Example: influxDB, OpenTsDb

When Analytics kind of data is involved:
Data store on which analytics is to be performed.
When we want to do some analysis based on operations performed.
For such things datawarehouse is used where you can dump you data and perform various queries querying capabilities on top of that data to serve a lot of reports.
These are not used for transactional systems. These are used for offline reporting
Example: hadoop

SQL:

SQL should be used when you have structured data of entities having standard attributes like Employee, having Id, name, address ,phno, etc.

no-sql can also be used for storing structured/un-structured/somewhat structured data

flowchart

No-SQL
DocumentDB
If you are trying to build catalog kind of a system like amazon having info. of items present on the system.
eg. shirt (size,color), Fridge(size, powerSavingMode).
On amazon you not only need to see the items but also want to query that, querying on json in traditional db is bit tricky but there are some databases that are optimized for such kind of data which are large in volume but also somewhat structured where a lot of attributes that can come in, and wide variety of queries can come in.
Example: mongoDb

When ever increasing data but limited query is involved:
Ever increasing data like in UBER.
All drivers of Uber are continuously sending location, say x number of drivers resulting in y number of location in a day( y includes various locations of all the drivers throughout the day).
This is a growing record as the no. of drivers on uber are increasing day by day.
So when you have finite no. of queries like finding location of a driver and large no. of data then columnnar db is great choice like: Casandra, HBase, etc.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player