INTRODUCTION
SQL stands for Structured Query Language. Structured query language (SQL) is a programming language for storing and processing information in a relational database. A data analyst uses SQL to manipulate and gain insights from the data. In this project, we will analyze data and for this purpose we will use sakila sample database.
For this project, I will be using MySQL Workbench. You can download it from here. MySQL is an open-source relational database management system. You can use any other RDBMS software. Once your software is set up we shall import Data into our database.
IMPORTING DATA
After downloading sakila database extract the files from the zip file. Create database on workbench and name it Sakila.
Open MYSQL workbench and on file click on open SQL script
Open folder you extracted the data to and click on schema then open and run script
Repeat the same process to import data and run the SQL script
Data Analysis
- Find all films with PG-13 films with rental rate of 2.99 or lower
SELECT *
FROM (
SELECT *
FROM film
) x
WHERE x.rating = 'PG-13' AND x.rental_rate <= 2.99;
In my filter condition I used and since both of the conditions must be satisfied. The sub-query helped optimize our SQL query run time to save on time and memory.
- All films that have deleted scenes
SELECT *
FROM film f
WHERE f.special_features like '%Deleted Scenes%';
special_features field contains description about the film and that is where it specifies whether a film has deleted scene. LIKE operator is used to retrieve the data in a column of a table, based on a specified pattern.
- All active customers
SELECT *
FROM customer c
WHERE c.active = 1;
Active field in customers table has a boolean data type of 1 or 0 where 1 is True and 0 is a False. We filtered customers where active is 1 or True.
- Distinct names of customers who rented a movie on 26th July 2005
SELECT DISTINCT(concat(c.first_name, ' ', c.last_name)) as full_name
FROM customer c
inner join rental r USING(customer_id)
WHERE r.rental_date like '2005-07-26%';
I decided to concatenate first and last name just incase two or more customers have the same first name but different last name are not left out
- How many rentals we do on each day?
select date(rental_date),
count(rental_id) as num_of_rentals
from rental
group by date(rental_date)
order by num_of_rentals desc;
rental_date column is a datetime data type therefore I extracted date from rental date so that we had to group by date only.
- What are the three top earning days so far?
select date(payment_date) as date,
count(rental_id) as num_of_rentals,
sum(amount) as money
FROM payment
GROUP BY date(payment_date)
ORDER BY sum(amount) desc
limit 3;
After grouping our data we need to sort the data in a descending manner by use of order by and to choose the top 3 best days so far we use limit clause.
With this analysis of the data we are now aware of which type of movies to keep in stock, active customers to keep and can also further create a predictive model to predict the sales of film but for that, we need some more data.
Thanks for reading! For full project check it out on Github
If you want to get in touch with me, feel free to reach me at carsonallan71@gmail.com or my LinkedIn Profile.