Paradoxically, the main reason behind the popularity of NoSQL data stores is their lack of support for advanced queries (joins, groupings, ranking and analytics): it is precisely this limitation that allows these data stores to scale much, much more easily than any RDBMS, which is a very valuable feature in today's world of massively distributed systems. Follow along and refresh your knowledge of the 25 most advanced NoSQL interview questions and answers you should learn for your next developer interview in 2020.
Originally published on FullStack.Cafe - Kill Your Next Tech Interview
Q1: What are NoSQL databases? What are the different types of NoSQL databases? ★
Topics: NoSQL
Answer:
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases (like SQL, Oracle, etc.).
Types of NoSQL databases:
- Document Oriented
- Key Value
- Graph
- Column Oriented
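A minimal sketch, using plain Python dicts as stand-ins for real stores (all names and data are hypothetical), of how the same user record might be shaped under each model:

```python
# Hypothetical user record shaped for each NoSQL model (illustrative only).

# Key-value: an opaque value addressed by a single key; the store can't look inside.
kv_store = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document-oriented: a nested, queryable structure.
doc_store = {"users": [{"_id": 42, "name": "Ada", "address": {"city": "London"}}]}

# Column-oriented: values addressed row-by-column, updatable individually.
column_store = {"users": {42: {"name": "Ada", "city": "London"}}}

# Graph: nodes plus explicit edges (relationships) between them.
graph = {"nodes": {42: {"name": "Ada"}}, "edges": [(42, "LIVES_IN", "London")]}

print(column_store["users"][42]["city"])  # → London
```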
Source: interviewbubble.com
Q2: What do you understand by NoSQL databases? Explain. ★
Topics: NoSQL
Answer:
At the present time, the internet is loaded with big data, big numbers of users, big complexity, and is becoming more complex day by day. NoSQL is an answer to all of these problems; it is not a traditional database management system, and not a relational database management system (RDBMS) either. NoSQL stands for "Not Only SQL". NoSQL is a type of database that can handle and sort all kinds of unstructured, messy and complicated data. It is just a new way to think about databases.
Source: medium.com/@hub4tech
Q3: Explain the difference between scaling horizontally and vertically for databases ★★
Topics: NoSQL
Answer:
- Horizontal scaling means that you scale by adding more machines into your pool of resources whereas
- Vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine.
In a database world horizontal-scaling is often based on the partitioning of the data i.e. each node contains only part of the data, in vertical-scaling the data resides on a single node and scaling is done through multi-core i.e. spreading the load between the CPU and RAM resources of that machine.
Good examples of horizontal scaling are Cassandra, MongoDB and Google Cloud Spanner; a good example of vertical scaling is MySQL on Amazon RDS (the cloud version of MySQL).
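Horizontal scaling by partitioning can be sketched in a few lines of Python; the node names and hash-based routing below are an assumption for illustration, not any particular database's scheme:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical shard servers

def node_for(key: str) -> str:
    """Route a key to one node by hashing, so each node holds only part of the data."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# The same key always routes to the same node; adding machines adds capacity.
assert node_for("user:42") == node_for("user:42")
print({k: node_for(k) for k in ("user:1", "user:2", "user:3")})
```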
Source: stackoverflow.com
Q4: What are the advantages of NoSQL over traditional RDBMS? ★★
Topics: NoSQL Databases
Answer:
NoSQL is preferable to an RDBMS for the following reasons/properties of NoSQL:
- It supports semi-structured and volatile data
- It does not require a schema
- Read/write throughput is very high
- Horizontal scalability can be achieved easily
- It supports Big Data in volumes of terabytes and petabytes
- It provides good support for analytic tools on top of Big Data
- It can be hosted on cheaper hardware
- An in-memory caching option is available to increase query performance
- Faster development life cycles for developers
Still, an RDBMS is better than NoSQL for the following reasons/properties of RDBMS:
- Transactions with ACID properties (Atomicity, Consistency, Isolation and Durability)
- Adherence to a strong schema for the data being written/read
- Real-time query management (for data sizes below roughly 10 terabytes)
- Execution of complex queries involving JOIN and GROUP BY clauses
Source: stackoverflow.com
Q5: When should we embed one document within another in MongoDB? ★★
Topics: MongoDB
Answer:
You should consider embedding documents for:
- "Contains" relationships between entities
- One-to-many relationships
- Performance reasons
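A sketch of the embedded one-to-many case, with hypothetical field names; the whole aggregate comes back in a single read:

```python
# Comments embedded inside their post: a "contains" one-to-many relationship.
post = {
    "_id": 1,
    "title": "Hello",
    "comments": [  # denormalized into the parent for read performance
        {"author": "joe", "text": "Nice"},
        {"author": "ann", "text": "+1"},
    ],
}

# One lookup returns the post and all of its comments together.
print(len(post["comments"]))  # → 2
```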
Source: tutorialspoint.com
Q6: Define ACID Properties ★★★
Topics: SQL Databases
Answer:
- Atomicity: It ensures all-or-none rule for database modifications.
- Consistency: Data values are consistent across the database.
- Isolation: Concurrent transactions execute independently of one another.
- Durability: Data is not lost even at the time of server failure.
Source: github.com/chetansomani
Q7: Does MongoDB support ACID transaction management and locking functionalities? ★★★
Topics: MongoDB
Answer:
ACID means that any update is:
- Atomic: it either fully completes or it does not
- Consistent: no reader will see a "partially applied" update
- Isolated: no reader will see a "dirty" read
- Durable: the update will not be lost (given the appropriate write concern)
Historically, MongoDB did not support multi-document ACID transactions by default (multiple-document updates that can be rolled back and are ACID-compliant); it did, however, provide atomic operations on a single document. MongoDB 4.0 added support for multi-document transactions, combining the speed, flexibility and power of the document model with ACID data integrity guarantees.
Source: tutorialspoint.com
Q8: Explain the advantages of BSON over JSON in MongoDB ★★★
Topics: MongoDB JSON
Answer:
- BSON is designed to be efficient in space, but in some cases is not much more efficient than JSON. In some cases BSON uses even more space than JSON. The reason for this is another of the BSON design goals: traversability. BSON adds some "extra" information to documents, like length of strings and subobjects. This makes traversal faster.
- BSON is also designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.
- In addition to compactness, BSON adds additional data types unavailable in JSON, notably the BinData and Date data types.
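The integer point can be sketched with Python's standard library; `struct` stands in for a BSON encoder here, which is an assumption for illustration:

```python
import json
import struct

n = 1234567890

# JSON stores the integer as text: one byte per digit, parsed back from text.
as_json = json.dumps(n).encode()

# BSON stores an int32 as a fixed 4-byte little-endian value (sketched via struct).
as_int32 = struct.pack("<i", n)

print(len(as_json), len(as_int32))  # → 10 4
assert struct.unpack("<i", as_int32)[0] == n  # decoded without text parsing
```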
Source: stackoverflow.com
Q9: How can you achieve primary key - foreign key relationships in MongoDB? ★★★
Topics: MongoDB
Answer:
By default, MongoDB does not support such primary key - foreign key relationships. However, we can achieve this concept by embedding one document inside another (aka subdocuments). For example, an address document can be embedded inside a customer document.
Source: tutorialspoint.com
Q10: How do I perform the SQL JOIN equivalent in MongoDB? ★★★
Topics: MongoDB
Answer:
Mongo is not a relational database, and the devs are careful to recommend specific use cases for $lookup, but at least as of 3.2 doing a join is now possible with MongoDB. The new $lookup operator added to the aggregation pipeline is essentially identical to a left outer join:
{
  $lookup:
    {
      from: <collection to join>,
      localField: <field from the input documents>,
      foreignField: <field from the documents of the "from" collection>,
      as: <output array field>
    }
}
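The left-outer-join semantics of $lookup can be simulated in plain Python over hypothetical collections (documents with no match still appear, with an empty array):

```python
orders = [
    {"_id": 1, "item": "pen", "customer_id": 10},
    {"_id": 2, "item": "ink", "customer_id": 99},  # no matching customer
]
customers = [{"_id": 10, "name": "Ada"}]

def lookup(docs, from_coll, local_field, foreign_field, as_field):
    """Mimic $lookup: attach all matching foreign docs as an array field."""
    joined = []
    for doc in docs:
        matches = [f for f in from_coll if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined

result = lookup(orders, customers, "customer_id", "_id", "customer")
print(result[0]["customer"])  # → [{'_id': 10, 'name': 'Ada'}]
print(result[1]["customer"])  # → []  (left outer join keeps the unmatched order)
```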
Source: stackoverflow.com
Q11: How does column-oriented NoSQL differ from document-oriented? ★★★
Topics: NoSQL
Answer:
The main difference is that document stores (e.g. MongoDB and CouchDB) allow arbitrarily complex documents, i.e. subdocuments within subdocuments, lists with documents, etc. whereas column stores (e.g. Cassandra and HBase) only allow a fixed format, e.g. strict one-level or two-level dictionaries.
For example a document-oriented database (like MongoDB) inserts whole documents (typically JSON), whereas in Cassandra (column-oriented db) you can address individual columns or supercolumns, and update these individually, i.e. they work at a different level of granularity. Each column has its own separate timestamp/version (used to reconcile updates across the distributed cluster).
The Cassandra column values are just bytes, but can be typed as ASCII, UTF8 text, numbers, dates etc. You could use Cassandra as a primitive document store by inserting columns containing JSON - but you wouldn't get all the features of a real document-oriented store.
Source: stackoverflow.com
Q12: What does Document-oriented vs. Key-Value mean in the context of NoSQL? ★★★
Topics: NoSQL
Answer:
A key-value store provides the simplest possible data model and is exactly what the name suggests: it's a storage system that stores values indexed by a key. You're limited to query by key and the values are opaque, the store doesn't know anything about them. This allows very fast read and write operations (a simple disk access) and I see this model as a kind of non volatile cache (i.e. well suited if you need fast accesses by key to long-lived data).
A document-oriented database extends the previous model: values are stored in a structured format (a document, hence the name) that the database can understand. For example, a document could be a blog post with its comments and tags stored in a denormalized way. Since the data is transparent to the store, it can do more work (like indexing fields of the document) and you're not limited to querying by key. As hinted, such databases allow you to fetch an entire page's data with a single query and are well suited for content-oriented applications (which is why big sites like Facebook or Amazon like them).
Other kinds of NoSQL databases include column-oriented stores, graph databases and even object databases.
Source: stackoverflow.com
Q13: What is Denormalization? ★★★
Topics: SQL Databases
Answer:
It is the process of improving the performance of the database by adding redundant data.
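A small contrast, with hypothetical data: normalized reads need a second lookup, denormalized reads don't, but they carry redundant copies:

```python
# Normalized: the author's name is stored once; reading a post needs a second lookup.
authors = {1: {"name": "Ada"}}
posts = [{"id": 100, "author_id": 1, "title": "CAP"}]
name = authors[posts[0]["author_id"]]["name"]  # extra hop

# Denormalized: the name is copied into every post; one read, but the copies
# must now be kept in sync by the application.
posts_denorm = [{"id": 100, "author_name": "Ada", "title": "CAP"}]
print(posts_denorm[0]["author_name"] == name)  # → True
```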
Source: github.com/dhaval1406
Q14: What is Sharding in MongoDB? ★★★
Topics: MongoDB
Answer:
Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
Source: tutorialspoint.com
Q15: When should I use a NoSQL database instead of a relational database? ★★★
Topics: NoSQL Databases
Answer:
Relational databases enforce ACID, so you will have schema-based, transaction-oriented data stores. They are proven and suitable for 99% of real-world applications; you can do practically anything with relational databases.
But there are limitations on speed and scaling when it comes to massive, highly available data stores. For example, Google and Amazon have terabytes of data stored in big data centers. Querying and inserting is not performant in these scenarios because of the blocking/schema/transaction nature of RDBMSs. That's the reason they have implemented their own databases (actually, key-value stores) for massive performance gains and scalability.
If you need a NoSQL db you usually know about it, possible reasons are:
- client wants 99.999% availability on a high traffic site.
- your data makes no sense in SQL, you find yourself doing multiple JOIN queries for accessing some piece of information.
- you are breaking the relational model, you have CLOBs that store denormalized data and you generate external indexes to search that data.
Source: stackoverflow.com
Q16: When would you use NoSQL? ★★★
Topics: NoSQL
Answer:
It depends on some general points:
- NoSQL is typically good for unstructured/"schemaless" data - usually, you don't need to explicitly define your schema up front and can just include new fields without any ceremony
- NoSQL typically favours a denormalised schema because there is no support for JOINs as in the RDBMS world. So you would usually have a flattened, denormalized representation of your data.
- Using NoSQL doesn't mean you could lose data. Different DBs have different strategies. e.g. MongoDB - you can essentially choose what level to trade off performance vs potential for data loss - best performance = greater scope for data loss.
- It's often very easy to scale out NoSQL solutions. Adding more nodes to replicate data to is one way to a) offer more scalability and b) offer more protection against data loss if one node goes down. But again, it depends on the NoSQL DB/configuration. NoSQL does not necessarily mean "data loss" as you might infer.
- IMHO, complex/dynamic queries/reporting are best served from an RDBMS. Often the query functionality for a NoSQL DB is limited.
- It doesn't have to be a one-or-the-other choice. My experience has been using an RDBMS in conjunction with NoSQL for certain use cases.
- NoSQL DBs often lack the ability to perform atomic operations across multiple "tables".
Source: stackoverflow.com
Q17: Explain BASE terminology in the context of NoSQL ★★★★
Topics: NoSQL
Answer:
The BASE acronym is used to describe the properties of certain databases, usually NoSQL databases. It's often referred to as the opposite of ACID. The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP theorem.
The CAP theorem states that a distributed computer system cannot guarantee all of the following three properties at the same time:
- Consistency
- Availability
- Partition tolerance
A BASE system gives up on consistency.
- Basically available indicates that the system does guarantee availability, in terms of the CAP theorem.
- Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
- Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
Source: stackoverflow.com
Q18: Explain eventual consistency in the context of NoSQL ★★★★
Topics: NoSQL
Answer:
Think about eventual consistency (as opposed to strict consistency/ACID compliance) like this:
- Your data is replicated on multiple servers
- Your clients can access any of the servers to retrieve the data
- Someone writes a piece of data to one of the servers, but it wasn't yet copied to the rest
- A client accesses the server with the data, and gets the most up-to-date copy
- A different client (or even the same client) accesses a different server (one which didn't get the new copy yet), and gets the old copy
Basically, because it takes time to replicate the data across multiple servers, requests to read the data might go to a server with a new copy, and then go to a server with an old copy. The term "eventual" means that eventually the data will be replicated to all the servers, and thus they will all have the up-to-date copy.
Eventual consistency is a must if you want low latency reads, since the responding server must return its own copy of the data, and doesn't have time to consult other servers and reach a mutual agreement on the content of the data.
The reason why so many NoSQL systems have eventual consistency is that virtually all of them are designed to be distributed, and with fully distributed systems there is super-linear overhead to maintaining strict consistency (meaning you can only scale so far before things start to slow down, and when they do you need to throw exponentially more hardware at the problem to keep scaling).
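The read-after-write scenario above can be simulated in a few lines of Python; the in-memory replicas and replication queue are stand-ins for real servers:

```python
class Replica:
    def __init__(self):
        self.data = {}

primary, secondary = Replica(), Replica()
replication_queue = []  # writes waiting to be copied to the secondary

def write(key, value):
    primary.data[key] = value
    replication_queue.append((key, value))  # applied later, not immediately

def replicate():
    """Drain the queue -- 'eventually' the secondary catches up."""
    while replication_queue:
        key, value = replication_queue.pop(0)
        secondary.data[key] = value

write("x", 1)
print(primary.data.get("x"), secondary.data.get("x"))  # → 1 None  (stale read)
replicate()
print(primary.data.get("x"), secondary.data.get("x"))  # → 1 1    (consistent again)
```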
Source: stackoverflow.com
Q19: Explain how you would keep document change history in a NoSQL DB ★★★★
Topics: NoSQL
Answer:
There are several solutions for that:
- Create a new version of the document on each change - Add a version number to each document on change. The major drawback is that the entire document is duplicated on each change, which will result in a lot of duplicate content being stored when you're dealing with large documents. This approach is fine though when you're dealing with small-sized documents and/or don't update documents very often.
- Only store changes in a new version - For that store only the changed fields in a new version. Then you can 'flatten' your history to reconstruct any version of the document. This is rather complex though, as you need to track changes in your model and store updates and deletes in a way that your application can reconstruct the up-to-date document. This might be tricky, as you're dealing with structured documents rather than flat SQL tables.
- Store changes within the document - Each field can also have an individual history. Reconstructing documents to a given version is much easier this way. In your application you don't have to explicitly track changes, but just create a new version of the property when you change its value.
{
  _id: "4c6b9456f61f000000007ba6",
  title: [
    { version: 1, value: "Hello world" },
    { version: 6, value: "Foo" }
  ],
  body: [
    { version: 1, value: "Is this thing on?" },
    { version: 2, value: "What should I write?" },
    { version: 6, value: "This is the new body" }
  ],
  tags: [
    { version: 1, value: [ "test", "trivial" ] },
    { version: 6, value: [ "foo", "test" ] }
  ],
  comments: [
    {
      author: "joe", // unversioned field
      body: [
        { version: 3, value: "Something cool" }
      ]
    },
    {
      author: "xxx",
      body: [
        { version: 4, value: "Spam" },
        { version: 5, deleted: true }
      ]
    },
    {
      author: "jim",
      body: [
        { version: 7, value: "Not bad" },
        { version: 8, value: "Not bad at all" }
      ]
    }
  ]
}
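Reconstructing a field at a given version from per-field histories like the one above is a short function; this plain-Python sketch assumes entries carry a monotonically increasing version number:

```python
def value_at(history, version):
    """Return the field's value at `version`: the latest entry not newer than it."""
    current = None
    for entry in sorted(history, key=lambda e: e["version"]):
        if entry["version"] <= version:
            current = None if entry.get("deleted") else entry.get("value")
    return current

title_history = [
    {"version": 1, "value": "Hello world"},
    {"version": 6, "value": "Foo"},
]
print(value_at(title_history, 3))  # → Hello world  (version 6 not yet applied)
print(value_at(title_history, 6))  # → Foo

spam_history = [{"version": 4, "value": "Spam"}, {"version": 5, "deleted": True}]
print(value_at(spam_history, 9))  # → None  (deleted at version 5)
```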
- A variation on "store changes within the document" - Instead of storing versions against each key pair, the current key pairs in the document always represent the most recent state, and a "log" of changes is stored within a history array. Only those keys which have changed since creation will have an entry in the log.
{
  _id: "4c6b9456f61f000000007ba6",
  title: "Bar",
  body: "Is this thing on?",
  tags: [ "test", "trivial" ],
  comments: [
    { key: 1, author: "joe", body: "Something cool" },
    { key: 2, author: "xxx", body: "Spam", deleted: true },
    { key: 3, author: "jim", body: "Not bad at all" }
  ],
  history: [
    {
      who: "joe",
      when: 20160101,
      what: { title: "Foo", body: "What should I write?" }
    },
    {
      who: "jim",
      when: 20160105,
      what: { tags: ["test", "test2"], comments: { key: 3, body: "Not baaad at all" } }
    }
  ]
}
Source: stackoverflow.com
Q20: Explain the use of transactions in NoSQL ★★★★
Topics: NoSQL
Answer:
NoSQL covers a diverse set of tools and services, including key-value, document, graph and wide-column stores. They usually try to improve the scalability of the data store, typically by distributing data processing. Transactions require the ACID properties of how DBs perform user operations, and ACID restricts how scalability can be improved: most NoSQL tools relax the consistency criteria of operations to gain fault tolerance and availability for scaling, which makes implementing ACID transactions very hard.
A commonly cited theoretical result for distributed data stores is the CAP theorem: consistency, availability and partition tolerance cannot all be achieved at the same time.
A new, weaker set of requirements replacing ACID is BASE ("basically available, soft state, eventual consistency"). However, eventually consistent tools ("eventually all accesses to an item will return the last updated value") are hardly acceptable in transactional applications like banking.
Generally speaking, NoSQL solutions have lighter weight transactional semantics than relational databases, but still have facilities for atomic operations at some level. Generally, the ones which do master-master replication provide less in the way of consistency, and more availability. So one should choose the right tool for the right problem.
Many offer transactions at the single document (or row etc.) level. For example with MongoDB there is atomicity at the single document - but documents can be fairly rich so this usually works.
Source: stackoverflow.com
Q21: How do you track record relations in NoSQL? ★★★★
Topics: NoSQL
Answer:
All the answers for how to store many-to-many associations in the "NoSQL way" reduce to the same thing: storing data redundantly.
In NoSQL, you don't design your database based on the relationships between data entities. You design your database based on the queries you will run against it. Use the same criteria you would use to denormalize a relational database: if it's more important for data to have cohesion (think of values in a comma-separated list instead of a normalized table), then do it that way.
But this inevitably optimizes for one type of query (e.g. comments by any user for a given article) at the expense of other types of queries (comments for any article by a given user). If your application has the need for both types of queries to be equally optimized, you should not denormalize. And likewise, you should not use a NoSQL solution if you need to use the data in a relational way.
There is a risk with denormalization and redundancy that redundant sets of data will get out of sync with one another. This is called an anomaly. When you use a normalized relational database, the RDBMS can prevent anomalies. In a denormalized database or in NoSQL, it becomes your responsibility to write application code to prevent anomalies.
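A sketch of that responsibility, with hypothetical collections: the application must update every redundant copy itself, or the denormalized sets drift apart:

```python
# Two query-optimized, redundant views of the same comments.
comments_by_article = {"a1": [{"user": "joe", "text": "hi"}]}
comments_by_user = {"joe": [{"article": "a1", "text": "hi"}]}

def rename_user(old, new):
    """Application-level anomaly prevention: touch every redundant copy together."""
    for comments in comments_by_article.values():
        for comment in comments:
            if comment["user"] == old:
                comment["user"] = new
    if old in comments_by_user:
        comments_by_user[new] = comments_by_user.pop(old)

rename_user("joe", "joseph")
print(comments_by_article["a1"][0]["user"])  # → joseph
print("joseph" in comments_by_user)          # → True
```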
One might think that it'd be great for a NoSQL database to do the hard work of preventing anomalies for you. There is a paradigm that can do this - the relational paradigm.
Source: stackoverflow.com
Q22: How does MongoDB ensure high availability? ★★★★
Topics: MongoDB
Answer:
MongoDB automatically maintains replica sets, multiple copies of data that are distributed across servers, racks and data centers. Replica sets help prevent database downtime using native replication and automatic failover.
A replica set consists of multiple replica set members. At any given time one member acts as the primary member, and the other members act as secondary members. If the primary member fails for any reason (e.g., hardware failure), one of the secondary members is automatically elected to primary and begins to process all reads and writes.
Source: mongodb.com
Q23: MongoDB relationships. What to use - embed or reference? ★★★★
Topics: MongoDB
Problem:
I want to design a question structure with some comments, but I don't know which relationship to use for comments: embed or reference? Explain the pros and cons of both solutions.
Solution:
In general,
- embed is good if you have one-to-one or one-to-many relationships between entities, and
- reference is good if you have many-to-many relationships.
Also consider as a general rule, if you have a lot of [child documents] or if they are large, a separate collection might be best. Smaller and/or fewer documents tend to be a natural fit for embedding.
Source: stackoverflow.com
Q24: Explain the differences in conceptual data design with NoSQL databases ★★★★★
Topics: NoSQL Databases
Problem:
What's easier, what's harder, what can't be done at all?
Solution:
I'm answering this with MongoDB in the back of my mind, but I would presume most would be true for other DBs also.
Harder:
- Consistency is not handled by the database but must be dealt with in the application. Less guarantees means easier migration, fail-over and better scalability at the cost of a more complicated application. An application has to deal with conflicts and inconsistencies.
- Links which cross documents (or key/value) have to be dealt with on application level also.
- SQL-style databases have much more mature IDEs. You get a lot of support libraries (although the layering of those libraries makes things much more complex than needed for SQL).
- Keeping related data together in the same document can be tricky, since there is nothing corresponding to a join.
- Map/reduce as a means of querying a database is unfamiliar, and requires a lot more thinking than writing SQL.
Easier:
- Faster if you know your data access patterns (views or specific queries).
- Migration/fail-over is easier for the database since no promises are made to you as an application programmer, although you get only eventual consistency.
- One key / value is much easier to understand than one row from a table. All the (tree) relations are already in, and complete objects can be recognized.
- No designing DB tables
- No ODBC/JDBC intermediate layer, all queries and transactions over http
- Simple DB-to-object mapping from JSON, which is almost trivial compared to the same in SQL
Source: stackoverflow.com
Q25: Where does MongoDB stand in the CAP theorem? ★★★★★
Topics: MongoDB
Answer:
MongoDB is strongly consistent by default: if you do a write and then do a read, assuming the write was successful you will always be able to read the result of the write you just performed. This is because MongoDB is a single-master system and all reads go to the primary by default.
On the other hand, you can't simply say that MongoDB is CP/AP/CA, because it is actually a trade-off between C, A and P, depending on both the database/driver configuration and the type of disaster. Here's a visual recap, with a more detailed explanation below:
| Scenario | Main Focus | Description |
| --- | --- | --- |
| No partition | CA | The system is available and provides strong consistency |
| Partition, majority connected | AP | Not-synchronized writes from the old primary are ignored |
| Partition, majority not connected | CP | Only read access is provided, to avoid separated and inconsistent systems |
Consistency -
MongoDB is strongly consistent when you use a single connection or the correct Write/Read Concern Level (which will cost you execution speed). As soon as you don't meet those conditions (especially when you are reading from a secondary replica), MongoDB becomes eventually consistent.
Availability -
MongoDB achieves high availability through replica sets. As soon as the primary goes down or otherwise becomes unavailable, the secondaries elect a new primary so the set becomes available again. There is a disadvantage to this: every write that was performed by the old primary but not yet synchronized to the secondaries will be rolled back and saved to a rollback file as soon as the old primary reconnects to the set (as a secondary now). So in this case some consistency is sacrificed for the sake of availability.
Partition Tolerance -
Through the use of those replica sets, MongoDB also achieves partition tolerance: as long as more than half of the servers of a replica set are connected to each other, a new primary can be chosen. Why? To ensure that two separated networks cannot both choose a new primary. When not enough secondaries are connected to each other, you can still read from them (but consistency is not ensured), but not write. The set is practically unavailable for the sake of consistency.
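The majority rule can be captured in one line; this is a hypothetical sketch of the invariant, not MongoDB's actual election protocol:

```python
def can_elect_primary(visible_members: int, set_size: int) -> bool:
    """A strict majority is required, so two partitions can never both elect a primary."""
    return visible_members > set_size // 2

# A 5-member replica set split 3 / 2 by a network partition:
print(can_elect_primary(3, 5))  # → True   (majority side stays writable)
print(can_elect_primary(2, 5))  # → False  (minority side becomes read-only)
```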
Source: stackoverflow.com
Thanks for reading and good luck on your interview! Please share this article with your fellow devs if you like it! Check out more FullStack Interview Questions & Answers on www.fullstack.cafe