TL;DR-style notes from articles I read today.
SQL or NoSQL, that is the question!
- The answer depends on your context of usage.
- Use a SQL database when you need ACID compliance (Atomicity, Consistency, Isolation, Durability).
- Use SQL when your data requirements are discrete, logically related and identifiable upfront, and when data integrity is essential: in short, when your data is structured and unchanging.
- Use NoSQL for data requirements that are indeterminate, unrelated and evolving, or when you need to store large volumes of data without structure.
- Use NoSQL when you need to get started faster, for projects with simpler or looser objectives and where speed and/or scalability are paramount.
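To make the ACID point concrete, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table, the names in it and the `transfer` helper are made up for illustration; they do not come from the article.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one all-or-nothing unit (hypothetical helper)."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement inside it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


conn = sqlite3.connect(":memory:")
# Illustrative schema: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

transfer(conn, "alice", "bob", 30)  # succeeds

try:
    # The credit to bob runs first, then the debit violates the CHECK,
    # so the whole transfer (including the credit) is rolled back.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

The failed transfer leaves no half-applied state behind, which is the kind of guarantee you would otherwise have to build yourself in application code.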
Full post here, 8 mins read
SQL is 43 years old — here’s 8 reasons we still use it today
- It excels at accessing and organizing relational databases.
- RDBMS and SQL are battle-tested for many different scenarios, including those where the loss of data, corruption and failure are catastrophic.
- SQL is easy to learn, and with half of all developers using SQL and RDBMS, skill sets transfer easily between companies and industries.
- Though not completely interoperable, SQL syntax varies only slightly between vendors.
- It lets you bring the computation to the data rather than the data to the computation.
- SQL/RDBMS is the best option for most systems, especially where data integrity is essential.
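As a rough illustration of "computation to the data", the sketch below computes the same per-customer totals twice: once by pulling every row into the application, and once by letting the database aggregate. It uses Python's built-in `sqlite3` module; the `orders` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative table and data, not taken from the article.
conn.execute(
    "CREATE TABLE orders (customer TEXT NOT NULL, amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 20), ("bob", 5), ("ann", 15), ("bob", 10), ("cy", 7)],
)
conn.commit()

# Data to the computation: every row crosses into the application,
# which then does the aggregation itself.
totals_in_app = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals_in_app[customer] = totals_in_app.get(customer, 0) + amount

# Computation to the data: the database aggregates and returns
# only one row per customer.
totals_in_db = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

print(totals_in_app)
print(totals_in_db)
```

With five rows the difference is invisible, but with a large table on a remote server the second query moves far less data over the wire.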
Full post here, 8 mins read
Modern data practice and the SQL tradition
- Most RDBMSs today offer some schema-less support, so you can keep structured and unstructured data in a single database without sacrificing ACID compliance.
- ETL is a necessity for most modern data-driven endeavours, but data cleaning and transformation are often decentralized, and this distorts data.
- Push data cleaning to the DB level for a smoother, cleaner data pipeline. Focus on good data type definitions.
- Postgres, SQLite and other RDBMSs offer text manipulation and free-text search functions that are good enough for most applications. Bring in NLTK or Elasticsearch only when you need more sophistication, rather than starting there.
- Relational databases are cost-effective compared to distributed systems. Performance and stability are also easier to achieve as complexity builds up.
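One way to read "push data cleaning to the DB level" is to encode the rules in the column definitions so that bad rows are rejected on the way in. The sketch below assumes a hypothetical `signups` table and uses Python's built-in `sqlite3` module; the specific rules are only examples, not the article's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: types and constraints carry the cleaning rules.
conn.execute(
    """
    CREATE TABLE signups (
        email   TEXT    NOT NULL UNIQUE CHECK (email LIKE '%_@_%'),
        age     INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150),
        country TEXT    NOT NULL
                CHECK (length(country) = 2 AND country = upper(country))
    )
    """
)

# Made-up incoming rows, some of them dirty.
rows = [
    ("ann@example.com", 34, "DE"),  # clean
    ("not-an-email", 29, "US"),     # fails the email CHECK
    ("bob@example.com", -5, "FR"),  # fails the age CHECK
    ("cy@example.com", 41, "uk"),   # fails the country CHECK (lowercase)
    ("ann@example.com", 50, "DE"),  # fails UNIQUE (duplicate email)
]

accepted, rejected = [], []
for row in rows:
    try:
        # One transaction per row: commit on success, roll back on error.
        with conn:
            conn.execute("INSERT INTO signups VALUES (?, ?, ?)", row)
        accepted.append(row)
    except sqlite3.IntegrityError:
        rejected.append(row)

stored = conn.execute("SELECT COUNT(*) FROM signups").fetchone()[0]
print(f"accepted={len(accepted)} rejected={len(rejected)} stored={stored}")
```

Because the rules live in the schema, every pipeline that writes to the table gets the same checks, which is the centralization the notes argue for.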
Full post here, 13 mins read