Basic SQL Techniques For Data Analysis
You will learn common SQL commands
What you will learn from this post:
- Introduction to Relational Database & SQL (Structured Query Language)
- SQL Queries With Examples
Introduction to Relational Database & SQL:
The relational database is a type of database. It is a collection of tables with a predefined relationship between them. The relational database is compact, well structured, and efficient.
A relational management platform that allows you to manage the relational databases is called a relational database management system(RDBMS). Some examples of relational database management systems are MySQL, Oracle DB, Microsoft SQL Server, PostgreSQL.
SQL is the structured query language specially used to define, manage, manipulate, or retrieve the data inside the relational database management system(RDBMS).
Before going into examples download the sample database from this website. Open the file in MySQL Workbench and execute the code. Under schemas, you will see the new schema classicmodels.
SQL Queries With Examples:
1.SQL Select & From: Used to retrieve data from the table, SELECT indicates which columns you want to view and FROM indicates from which table.
#SQL select & From statement examples1. SELECT * FROM employees;
#returns all data from employees table2. SELECT firstName, lastName FROM employees;
#returns particular columns
2. Distinct: Distinct allows you to view the unique values in a column. One or more columns can be used in the expression.
#SQL distinct clause examples1. SELECT distinct paymentDate FROM payments;
#returns unique paymentdates
3. SQL Where: This allows you to specify conditions while retrieving data from the table. To specify conditions you need to use comparison or logical operators.
#SQL where clause examples1. SELECT * FROM customers WHERE city="NYC";
#returns customers data who belongs to NYC city
2. SELECT * FROM employees WHERE firstName="Mary" AND
jobTitle="VP Sales";
#returns employee data (two conditions should satisfy)
3. SELECT * FROM products WHERE quantityInStock>5000;
#returns products info whose quantity is greater than 5000
4. SELECT * FROM products WHERE productLine LIKE '%Cars';
#returns products info whose productline ends with Cars
5. SELECT * FROM offices WHERE city IN('London','Sydney','Tokyo');
#returns office address for three cities
6. SELECT * FROM payments WHERE amount BETWEEN 6000 AND 9000;
#returns payment information between 6000 to 9000, includes the
intervals
7. SELECT * FROM customers WHERE NOT city='Singapore';
#returns customer information except for city Singapore
8. SELECT * FROM orders WHERE comments IS NULL;
#returns order info(comment column is null)
4.SQL Aggregation: Aggregate functions aggregate across the entire column. You will use the aggregate functions all the time.
Min, Max, Count functions can be used with numerical and non-numerical columns and whereas sum, avg functions are used with numerical columns only.
COUNT
: returns the total number of rowsSUM
: returns the sum of all the valuesMAX
: returns the maximum valueMIN
: returns the minimum valueAVG
: returns the average value
#SQL aggregate functions examples1. SELECT COUNT(*) FROM employees;
#returns total number of rows (non null + null values)2. SELECT COUNT(customerName) FROM customers;
#counts particular column rows(only non null values)3. SELECT MAX(amount), MIN(amount), SUM(amount) FROM payments;
#returns maximum, minimum and total payment amount4. SELECT MAX(firstName) FROM employees;
#firstname is non numerical column so returns as close
alphabetically to "Y" as possible because of MAX function
5. AS (Alias): This allows you to rename a column or table. Note that it won’t alter the names in their original columns or tables.
#SQL Alias examplesSELECT MAX(amount) AS max_amount, MIN(amount) AS min_amount, SUM(amount) AS total_amount FROM payments;
6.SQL Group By: group by clause allows you to separate data into groups, which can be aggregated independently of one another.
You can group by multiple columns, but you have to separate column names with commas same as order by clause.
#SQL group bySELECT productline, sum(quantityInStock) FROM products group by productline;
#data is divided into groups (In our case it is productline) and then aggregates each group independently based on aggregation function you specified.See the below image to see how the groupby works
7.SQL Having: Having clause allows you to filter the data that has been aggregated. In where clause you cannot specify aggregate functions but in having clause you can specify aggregate functions.
Most of the time Having clause is used with the group by clause. Having can be used without group by clause, in an aggregate function, in that case it behaves like where clause.
#SQL HavingSELECT productline, sum(quantityInStock) FROM products group by productline having sum(quantityInStock)>60000;
#Grouped data is filtered using having clauseSee the below image to see how the having clause works
8.SQL Order By: Order by clause allows you to reorder your results in ascending or descending order based on the data in one or more columns. By default orders in ascending order
#SQL order by1. SELECT * FROM payments ORDER BY amount;
#sorts result in ascending order2. SELECT * FROM offices ORDER BY City Desc;
#sorts result in descending order by desc keyword3. SELECT * FROM employees ORDER BY firstName, employeeNumber desc;
#first sorts the result by firstName in ascending order (A to Z) and then by employeeNumber in descending order within those already sorted recordsTo understand sorting data by two columns(3rd example), see below image.
9. SQL Limit: Limit allows you to specify the number of records to return
#SQL limitSELECT * from orders limit 5;
#returns first 5 records
10. Joins: Before we discuss the concept of joins, I think it is important that we first distinguish the difference between a primary key and a foreign key.
The primary key is a unique identifier of the table. It must contain unique values and cannot have null values. Each table can have only one primary key. The table containing the primary key is called the parent table.
The foreign key is used to link two tables together. It is a column in one table that refers to the primary key in another table. Each table can have multiple foreign keys. The table containing the foreign key is called the child table.
Joins is used to combine rows from two or more tables based on the related column between them (Primary key and Foreign key)
Types of joins:
Inner Join
: Returns records that have matching values in both tablesLeft Join
: Returns all records from the left table, and the matched records from the right tableRight Join
: Returns all records from the right table, and the matched records from the left tableOuter Join
: Returns all records when there is a match in either left or right table
The most common type of join is left join
customerNumber in the customers table is a primary key and customerNumber in the orders table is a foreign key. (See ER diagram here)
SELECT c.customerName, c.phone, c.city, o.orderDate, o.status
FROM customers AS c
LEFT JOIN
orders AS o
ON c.customerNumber = o.customerNumber;
Let me wrap up the select command with the full syntax. The below sequence is important while writing queries.
#Syntax
SELECT column1, column2 .... column(n)
FROM table_name
WHERE condition1, condition2 .... condition(n)
GROUP BY column1, column2 .... column(n)
HAVING condition1, condition2 .... condition(n)
ORDER BY column1, column2 .... column(n)
LIMIT number;#example
SELECT productLine, sum(QuantityInStock)
FROM products
WHERE buyPrice>50
GROUP BY productLine
HAVING sum(QuantityInStock)>50000
To download the SQL script follow this link
SQL techniques learned through this post are slicing, sorting, grouping, filtering, counting, unique, aggregating data. All these techniques are useful while performing data analysis.
I hope you have gained a better understanding of what SQL is, its significance in data analytics but more importantly, realize that it is actually relatively easy to learn once you get hold of the fundamentals.
Until my next article, feel free to check out my other content here.