Big data, machine learning, and artificial intelligence have raised new questions about data storage. Believe it or not, this affects web developers in some interesting ways. The biggest way you could be affected is if you end up working with NoSQL databases, because they work very differently from SQL databases.
For now you're safe if you don't know anything about NoSQL databases, but you don't want to fall behind. Knowing about NoSQL databases can actually open up more opportunities for you as a developer. Take a few minutes to learn about these two types of databases and their differences.
SQL
Most web developers have some kind of experience with SQL databases. These are the relational databases that we typically deal with. In a SQL database, there is a schema that sets the rules for how tables are related to each other. This relational model is there to help keep the integrity of the references between tables.
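To make the idea of reference integrity concrete, here is a minimal sketch using the SQLite module that ships with Python. The `authors` and `posts` tables are made up for illustration, and SQLite only enforces foreign keys after you turn them on, which other database servers usually do by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless you ask for it.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: every post must point at an existing author.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES authors(id))"
)

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO posts (title, author_id) VALUES ('Hello', 1)")  # valid reference

try:
    # No author has id 99, so the database refuses this row.
    conn.execute("INSERT INTO posts (title, author_id) VALUES ('Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(orphan_rejected, post_count)
```

The database itself blocks the orphaned post, so your application code doesn't have to remember to check.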
The schema has to be defined before you can add data to the database. That means when you want to add a new column to a table, you have to update the schema first. This keeps all of your data consistent: the schema sets the data type of each column, holds any restrictions you need on the data, like a maximum character length, and makes sure every piece of data follows the rules you set.
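Here's a rough sketch of what that looks like in practice, again with Python's built-in SQLite module. The `users` table and its 50-character limit are assumptions for the example. SQLite is looser about column types than most database servers, so the length rule is written as a `CHECK` constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema comes first: column names, types, and restrictions.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL CHECK (length(name) <= 50))"
)
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

try:
    # The schema has no email column yet, so this insert fails.
    conn.execute("INSERT INTO users (name, email) VALUES ('Grace', 'grace@example.com')")
    insert_before_migration = "accepted"
except sqlite3.OperationalError:
    insert_before_migration = "rejected"

# Update the schema, then the same insert goes through.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute("INSERT INTO users (name, email) VALUES ('Grace', 'grace@example.com')")

try:
    # A 51-character name breaks the length restriction.
    conn.execute("INSERT INTO users (name) VALUES (?)", ("x" * 51,))
    long_name = "accepted"
except sqlite3.IntegrityError:
    long_name = "rejected"

rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(insert_before_migration, long_name, rows)
```

Rows that existed before the new column was added simply get an empty value for it.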
You'll have rows and columns in each table. Each column has its own data type, and each row is usually identified by a primary key. The schema also puts a lot of restrictions on the scalability of the database. Since the tables are all related and the data entered has to conform to the database schema, it's more expensive to scale up. Some common ways to scale SQL databases include getting better servers or adding copies that have read-only privileges.
We use these databases the most in web development mainly because they've been around for so long. Relational databases date back to the 1970s and have been the standard approach to data storage for decades. Since they use such a rigid schema, they give you some safeguards against bad data being entered into your database.
NoSQL
These are the databases that rose to prominence in the 2000s. The main reason NoSQL exists is that we have a lot more data now and there needs to be a different way of handling it. NoSQL is basically an umbrella term for non-relational databases, which can use different data models such as search, graph, and document.
With a NoSQL database, you don't need a predefined schema. That means you'll be able to add new fields to your records (documents in a MongoDB collection, for example) without changing the whole database. That's nice when you're building a web app iteratively, as in agile development, and you aren't sure what kind of data you'll be dealing with yet.
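As a small illustration of the schemaless idea, here's a sketch where a plain Python list of dicts stands in for a document collection. This is not a real database client: the `products` list and the `find` helper are hypothetical stand-ins, and a real document store adds persistence, indexing, and a proper query language.

```python
# A plain list of dicts stands in for a document collection (illustration only).
products = []

products.append({"_id": 1, "name": "Keyboard", "price": 49.0})
# A later document adds a field no earlier document has; nothing to migrate.
products.append({"_id": 2, "name": "Monitor", "price": 199.0, "tags": ["4k", "hdmi"]})


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Code that reads the data has to cope with fields missing on older documents.
tags_by_name = {doc["name"]: doc.get("tags", []) for doc in products}
monitors = find(products, name="Monitor")
print(tags_by_name, monitors)
```

The flexibility moves the responsibility for consistency from the database into your application code, which is the trade-off to keep in mind.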
Scaling with a NoSQL database is relatively simple because you don't have much of a schema to deal with. You can take the data in one database and split it across multiple databases on separate servers, a technique often called sharding. There's a chance you might even notice your web app working more efficiently because of the way you can query the databases.
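Breaking data up across databases can be sketched in a few lines. In this toy version each "database" is just a Python dict, and a hash of the key decides where a record lives. The shard count, key format, and the `put`/`get` helpers are all assumptions for the example, not how any particular NoSQL product does it.

```python
import hashlib

NUM_SHARDS = 3  # assumed number of servers for this sketch
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict plays the role of one database


def shard_for(key: str) -> int:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key: str, doc: dict) -> None:
    shards[shard_for(key)][key] = doc


def get(key: str):
    # Only the shard that owns the key is consulted.
    return shards[shard_for(key)].get(key)


for i in range(100):
    put(f"user:{i}", {"id": i})

shard_sizes = [len(shard) for shard in shards]
print(shard_sizes, get("user:42"))
```

One caveat with this simple modulo approach is that changing the number of shards moves most keys to a different shard, which is why real systems tend to use something like consistent hashing or range-based partitioning instead.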
You have a lot of options to choose from for NoSQL databases. There are MongoDB, Couchbase, and Cassandra just to name a few. They have their own terminologies and subtleties that make it tricky to use multiple NoSQL databases, but the underlying concepts are about the same.
You'll mainly see NoSQL databases used when there is a lot of unstructured data that needs to be put to use, or when there is so much data that NoSQL makes the database faster to query. An example of when you might use a NoSQL database is if you have something like Amazon. Having billions of items that need to be queried in your database wouldn't do well in a relational setup.
Key Differences
Here's a table with the key differences between SQL databases and NoSQL databases.
SQL | NoSQL |
---|---|
Relational database | Non-relational database |
Rigid, predefined schema that needs to be updated for new columns | Super flexible schema that can be updated on the fly |
Scaling requires better hardware or read-only copies of the database | Scaling can be done by breaking up the database and putting it on separate servers |
Data will remain consistent because of schema rules and restrictions | Data will be able to take on any form it needs to |
Has the data model that sets up tables, rows, and columns | Has multiple data models with different uses like document and search |
Speed and efficiency are limited by the schema | Fast and efficient but data could have many inconsistencies |
Both SQL and NoSQL databases are great, but you really have to know something about your application and its potential in the future. Most web apps will be fine using SQL databases forever. When you start dealing with massive amounts of data that need to be analyzed using different tools like machine learning, that's when you'll likely need NoSQL databases.
I briefly touched the NoSQL world in my machine learning days, but that was with HBase, which involves Hadoop, which is a demon to deal with. MongoDB isn't that bad to get started with if you're interested because they have a lot of documentation and examples. I like it so far.
Hey! You should follow me on Twitter because reasons: https://twitter.com/FlippedCoding