Questions Index

Difference between Data and Information?
What is metadata & Data Dictionary?
What is Database and DBMS?
What are the 3 levels of ANSI Database?
Define Data Abstraction?
Define Mapping and Data Independence?
Types of Database Users?
Define ER Diagram?
Generalisation vs Specialization
Types of Keys?
Types of Joins?
Types of Indexing?
What is RDMS?
Advantages of DBMS over File System
Explain Different Languages Present in DBMS
What is Meant by ACID Properties in DBMS?
What is Meant by Normalization and Denormalization?
What is a Lock? Explain the Major Difference Between Shared Lock and Exclusive Lock
What is Data Warehousing?
Types of Normalization
Various Types of Anomalies
Difference Between Having and Where Clause
What is a Trigger?
What is a Stored Procedure?
What is Clustered and Non-Clustered Index?
What is a Live Lock?
What is the Purpose of SQL?
What is a Clause in Terms of SQL?
What is the Main Goal of RAID Technology?
State vs Schema
Describe the Role of a Transaction Log in a DBMS
Define SQL Injection
What is data mining in DBMS?
Char vs Varchar
What happens when u reach the max value of auto increment in SQL?
Default port for MySQL server:
How to store images in database?
Serial vs Non Serial Schedule
Serializability and Serialization
Conflict Serialization
View Serializable
Recoverable Schedule
Cascading Schedule
Cascadeless Schedule
Strict Schedule

Difference between Data and Information?

Data is raw, unprocessed, unorganized facts that are seemingly random and do not yet carry any significance or meaning.

Information refers to data that has been organized, interpreted, and contextualized by a human or machine so that it possesses relevance and purpose.

What is metadata & Data Dictionary?

A data dictionary in a Database Management System (DBMS) serves the vital purpose of providing a comprehensive repository of metadata, defining data structures, attributes, relationships, and constraints within a database. This enhances data integrity, facilitates efficient data management, and aids in database schema design.

For instance, in a healthcare DBMS, a data dictionary would detail patient records, specifying fields like ‘patient ID’, ‘name’, ‘DOB’, and associated constraints, streamlining data organization and retrieval.

What is Database and DBMS?

Database: A collection of logically related data.

DBMS (Database Management System) is a set of applications or programs that enable users to create and maintain a database. DBMS provides a tool or an interface for performing various operations such as inserting, deleting, updating, etc., into a database.

What are the 3 levels of ANSI Database?

Internal level (Physical level):

Describes how data is stored on the storage device.
Deals with physical storage of data.
Structure of records on disk - files, pages, blocks, indexes, and ordering of records.
Internal view is described by the internal schema.

Conceptual level (Logical level):

Describes what data is stored and the relationships among those data.
Hides low-level complexities of physical storage.
For example, a STUDENT database may contain STUDENT and COURSE tables, visible to users but without revealing storage details.
Database administrator works at this level to determine what data to keep in the database.

External level (View level):

It describes only part of the entire database that an end user concern or how data are viewed by each user.
Different user needs different views of the database, so there can be many views in a view level abstraction of the database. Used by end users and application programmers.
End users need to access only part of the database rather than the entire database.

Define Data Abstraction?

Data abstraction is the process of hiding unwanted and irrelevant details from the end user. It helps to store information in such a way that the end user can access data which is necessary, the user will not be able to see what data is stored or how it is stored in a database.

Define Mapping and Data Independence?

Mapping: The process of transforming requests and results between the three levels is called mapping.

Data Independence: The ability to modify a schema definition at one level without affecting schema definitions at the next higher level.

Physical Data Independence

Physical Data Independence is the ability to modify the physical schema without requiring any change in logical (conceptual) schema and application programs.
Modifications at the internal levels are occasionally necessary to improve performance.
Possible modifications at internal levels are changes in file structures, compression techniques, hashing algorithms, storage devices, etc.

Logical Data Independence

Logical data independence is the ability to modify the conceptual schema without requiring any change in application programs.
Modification at the logical levels is necessary whenever the logical structure of the database is changed.
Application programs are heavily dependent on logical structures of the data they access. So any change in logical structure also requires programs to change.

Types of Database Users?

Naive Users (End Users): Unsophisticated users who have zero knowledge of database system
End user interacts to database via sophisticated software or tools
e.g. Clerk in bank
Application Programmers : Programmers who write software using tools such as Java, .Net, PHP etc…
e.g. Software developers
Sophisticated Users : Interact with database system without using an application program
Use query tools like SQL
e.g. Analyst
Specialized Users (DBA): User write specialized database applications program. Use administration tools
e.g. Database Administrator

Define ER Diagram?

ER Diagram is a graphical representation of Database.The Entity Relational Model is a model for identifying entities to be represented in the database and representation of how those entities are related.

Generalisation vs Specialization

Generalisation: The process of taking the union of two or more lower-level entity sets to create a higher-level entity set (Bottom-up approach).
Specialization: The process of creating lower-level entity sets from a higher-level entity set (Top-down approach).

Types of Keys?

There are mainly seven types of keys in a database:
1) Candidate Key: The candidate key represents a set of properties that can uniquely identify a table.
2) Super Key: The super key defines a set of attributes that can uniquely identify a tuple.
3) Primary Key: The primary key defines a set of attributes that are used to uniquely identify every tuple.
4) Unique Key: The unique key is very similar to the primary key except that primary keys don’t allow NULL values in the column but unique keys allow them. So essentially unique keys are primary keys with NULL values.
5) Alternate Key: All the candidate keys which are not chosen as primary keys are considered as alternate Keys.
6) Foreign Key: The foreign key defines an attribute that can only take the values present in one table common to the attribute present in another table.
7) Composite Key: A composite key refers to a combination of two or more columns that can uniquely identify each tuple in a table.

Types of Joins?

A Join is one of the SQL statements which is used to join the data or the rows from 2 or more tables on the basis of a common field/column among them.

There are 4 types of SQL Joins:

Inner Join: This type of join is used to fetch the data among the tables which are common in both tables.
Left Join: This returns all the rows from the table which is on the left side of the join but only the matching rows from the table which is on the right side of the join.
Right Join: This returns all the rows from the table which is on the right side of the join but only the matching rows from the table which is on the left side of the join.
Full Join: This returns the rows from all the tables on which the join condition has been put and the rows which do not match hold null values.

Types of Indexing?

Indexing is used for the faster retrieval of the data and get the query result faster.

There are various types of indexing. They are :

Cluster Indexing: Clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary key columns which may not be unique for each record. In this case, to identify the record faster, we will group two or more columns to get the unique value and create index out of them. This method is called a clustering index.
Primary Indexing: This is a type of Clustered Indexing wherein the data is sorted according to the search key and the primary key of the database table is used to create the index.
Secondary Indexing: gives us a list of virtual pointers or references to the location where the data is actually stored.
Dense Key: There is an index record for every search key in the data file.

Spare Key: Index record are not created for every search key. Here index record contain the search key and pointer to the actual address where data has been stored. And if the data has not been found then it will start the sequential search.

What is RDMS?

RDBMS stands for Relational Database Management System. RDBMS stores data in the form of tables as compared to DBMS which stores data as files and tables are related to each other result in the faster accessing of data.

Advantages of DBMS over File System

Data redundancy and inconsistency: The file system cannot control the redundancy of data as each user defines and maintains the needed files for a specific application to run. There may be a possibility that two users are maintaining the data of the same file for different applications. Hence changes made by one user do not reflect in files used by second users, which leads to inconsistency of data. Whereas DBMS controls redundancy by maintaining a single repository of data that is defined once and is accessed by many users. As there is no or less redundancy, data remains consistent.
Data sharing: The file system does not allow sharing of data or sharing is too complex. Whereas in DBMS, data can be shared easily due to a centralized system.
Data searching: For every search operation performed on the file system, a different application program has to be written. While DBMS provides inbuilt searching operations. The user only has to write a small query to retrieve data from the database.
Data integrity: There may be cases when some constraints need to be applied to the data before inserting it into the database. The file system does not provide any procedure to check these constraints automatically. Whereas DBMS maintains data integrity by enforcing user-defined constraints on data by itself.
System crashing: In some cases, systems might have crashed due to various reasons. It is a bane in the case of file systems because once the system crashes, there will be no recovery of the data that’s been lost. A DBMS will have the recovery manager which retrieves the data making it another advantage over file systems.

Explain Different Languages Present in DBMS

Following are various languages present in DBMS:

DDL(Data Definition Language): It contains commands which are required to define the database.
E.g., CREATE, ALTER, DROP, TRUNCATE, RENAME, etc.
DML(Data Manipulation Language): It contains commands which are required to manipulate the data present in the database.
E.g., SELECT, UPDATE, INSERT, DELETE, etc.
DCL(Data Control Language): It contains commands which are required to deal with the user permissions and controls of the database system.
E.g., GRANT and REVOKE.
TCL(Transaction Control Language): It contains commands which are required to deal with the transaction of the database.
E.g., COMMIT, ROLLBACK, and SAVEPOINT.

What is Meant by ACID Properties in DBMS?

ACID stands for Atomicity, Consistency, Isolation, and Durability in a DBMS these are those properties that ensure a safe and secure way of sharing data among multiple users.

Atomicity: This property reflects the concept of either executing the whole query or executing nothing at all, which implies that if an update occurs in a database then that update should either be reflected in the whole database or should not be reflected at all.
Consistency: This property ensures that the data remains consistent before and after a transaction in a database.
Isolation: This property ensures that each transaction is occurring independently of the others. This implies that the state of an ongoing transaction doesn’t affect the state of another ongoing transaction.
Durability: This property ensures that the data is not lost in cases of a system failure or restart and is present in the same state as it was before the system failure or restart.

What is Meant by Normalization and Denormalization?

Normalization is a process of reducing redundancy by organizing the data into multiple tables. Normalization leads to better usage of disk spaces and makes it easier to maintain the integrity of the database.
Denormalization is the reverse process of normalization as it combines the tables which have been normalized into a single table so that data retrieval becomes faster. JOIN operation allows us to create a denormalized form of the data by reversing the normalization.

What is a Lock? Explain the Major Difference Between Shared Lock and Exclusive Lock

A database lock is a mechanism to protect a shared piece of data from getting updated by two or more database users at the same time.

Shared Lock: A shared lock is required for reading a data item and many transactions may hold a lock on the same data item in a shared lock. Multiple transactions are allowed to read the data items in a shared lock.
Exclusive lock: An exclusive lock is a lock on any transaction that is about to perform a write operation. This type of lock doesn’t allow more than one transaction and hence prevents any inconsistency in the database.

What is Data Warehousing?

A data warehouse can be considered as a central repository where data flows from transactional systems and other relational databases and is used for data analytics. A data warehouse comprises a wide variety of an organization’s historical data that supports the decision-making process in an organization.

Types of Normalization

First Normal Form (1NF): In 1NF, each table cell should contain only a single value, and each column should have a unique name.
Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each non-key attribute be dependent on the primary key. This means that there should be no partial dependency which means (prime -> nonprime)
Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key attributes are independent of each other. There should be no transitive dependency (nonprime -> nonprime)
Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that each determinant in a table is a candidate key. This means that every prime attribute must be derived from the super key
(super key -> prime)

Various Types of Anomalies

Anomalies in DBMS arise primarily due to flawed database design, particularly when normalization rules are not properly applied. These issues manifest in three main forms:

Insertion Anomaly: Occurs when you cannot add data to the database due to the absence of other related data. For instance, being unable to add a new employee because their department does not exist in the database yet.
Deletion Anomaly: Happens when removing a record also unintentionally removes other valuable data. For example, deleting the last employee in a department might inadvertently remove the entire department from the database.
Update Anomaly: Arises when a piece of information that appears in multiple places is updated in one location but not the others, leading to inconsistent data across the database.

Difference Between Having and Where Clause

HAVING is used to specify a condition for a group or an aggregate function used in a select statement. The WHERE clause selects before grouping. The HAVING clause selects rows after grouping. Unlike the HAVING clause, the WHERE clause cannot contain aggregate functions.

What is a Trigger?

Trigger in DBMS is a special type of stored procedure that is automatically executed in response to certain database events such as an INSERT, UPDATE, or DELETE operation. Triggers can be used to perform actions such as data validation, enforcing business rules, or logging.

What is a Stored Procedure?

A stored procedure is like a function that contains a set of operations compiled together. It contains a set of operations that are commonly used in an application to do some common database tasks.

What is Clustered and Non-Clustered Index?

Clustered Index : A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only way, therefore, there can be only one clustered index per table. In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.
Non-Clustered Indexes : A non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored at one place and table data is stored in another place. This is similar to a textbook where the book content is located in one place and the index is located in another. This allows for more than one non-clustered index per table.

What is a Live Lock?

Livelock situation can be defined as when two or more processes continually repeat the same interaction in response to changes in the other processes without doing any useful work These processes are not in the waiting state, and they are running concurrently. This is different from a deadlock because in a deadlock all processes are in the waiting state.

What is the Purpose of SQL?

SQL stands for Structured Query Language whose main purpose is to interact with the relational databases in the form of inserting, deleting and updating/modifying the data in the database.

What is a Clause in Terms of SQL?

This is used with the SQL queries to fetch specific data as per the requirements on the basis of the conditions that are put in the SQL. This is very helpful in picking the selective records from the complete set of records.

What is the Main Goal of RAID Technology?

RAID stands for Redundant Array of Inexpensive (or sometimes “Independent”)Disks.

RAID is a method of combining several hard disk drives into one logical unit (two or more disks grouped together to appear as a single device to the host system). RAID technology was developed to address the fault-tolerance and performance limitations of conventional disk storage.

State vs Schema

The collection of information stored in a database at a particular moment in time is called database state while the overall design of the database is called the database schema.

Describe the Role of a Transaction Log in a DBMS

A transaction log (also known as a redo log or audit trail) is a file that records all changes made to the database during transactions. It serves several important purposes:

Recovery: In the event of a system failure, the transaction log can recover the database to a consistent state by replaying or undoing transactions.
Concurrency Control: The transaction log can support concurrency control mechanisms such as locking and rollback, ensuring that transactions are isolated and maintaining data integrity.
Audit Trail: The transaction log records all changes made to the database, enabling auditing and compliance with regulatory requirements.

Define SQL Injection

SQL injection, also known as SQLI, is a common attack vector that uses malicious SQL code for backend database manipulation to access information that was not intended to be displayed.

What is data mining in DBMS?

Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis.

Char vs Varchar

Char:

CHAR datatype is used to store character strings of fixed length. In CHAR, if the length of the string is less than set or fixed-length then it is padded with extra memory space.

Varchar:

In VARCHAR, if the length of the string is less than the set or fixed-length then it will store as it is without padded with extra memory spaces. VARCHAR datatype is used to store character strings of variable length.

What happens when u reach the max value of auto increment in SQL?

When the column reaches the upper limit of the data type, the next attempt to generate a sequence number fails.

Default port for MySQL server:

3306

How to store images in database?

There are several ways to store images in a database, including as binary data, file paths, or using cloud storage. The best method depends on the specific requirements and constraints of the project. Other way can be storing images in SQL databases using BLOB (Binary Large Object) data types.

Serial vs Non Serial Schedule

Serial: Schedules in which the transactions are executed non-interleaved, i.e., a serial schedule is one in which no transaction starts until a running transaction has ended are called serial schedules.

Non Serial: The type of scheduling in which transaction is interleaved and more than one transaction execute concurrently then such type of scheduling is called Non Serial.

Serializability and Serialization

Serializability in DBMS ensures that multiple transactions can access and modify the same data without interfering with each other's operations. It helps to prevent data inconsistencies and anomalies that can occur when multiple transactions try to access and modify the same data concurrently.

Serialization refers to the process of managing transactions in such a way that their operations are executed in a serial (one after another) order, ensuring the consistency and isolation of data. This process is crucial for maintaining database integrity when multiple transactions are executed concurrently.

Conflict Serialization

A schedule is called conflict serializable if non-serial schedule can be transformed into a serial schedule by swapping non-conflicting operations.

View Serializable

A Schedule is called view serializable if it is view equal to a serial schedule (no overlapping transactions). A conflict schedule is a view serializable but if the serializability contains blind writes, then the view serializable does not conflict serializable.

Recoverable Schedule

Even if the schedule is conflict or view serializable still it does guarantee that the transaction is consistent considering the case of system failure. Schedules in which transactions commit only after all transactions whose changes they read commit are called recoverable schedules. In other words, if some transaction Tj is reading value updated or written by some other transaction Ti, then the commit of Tj must occur after the commit of Ti.

Cascading Schedule

When there is a failure in one transaction and this leads to the rolling back or aborting other dependent transactions, then such scheduling is referred to as Cascading rollback or cascading abort.

Cascadeless Schedule

Schedules in which transactions read values only after all transactions whose changes they are going to read commit are called cascadeless schedules.

Strict Schedule

Tj can read or write updated or written value of Ti only after Ti commits/aborts.

DBMS Interview Question for Freshers 📝