Basic SQL Techniques For Data Analysis

You will learn common SQL commands

Ektamanvar
7 min readMar 2, 2022
Photo by Kevin Ku on Unsplash

What you will learn from this post:

  1. Introduction to Relational Database & SQL (Structured Query Language)
  2. SQL Queries With Examples

Introduction to Relational Database & SQL:

The relational database is a type of database. It is a collection of tables with a predefined relationship between them. The relational database is compact, well structured, and efficient.

A relational management platform that allows you to manage the relational databases is called a relational database management system(RDBMS). Some examples of relational database management systems are MySQL, Oracle DB, Microsoft SQL Server, PostgreSQL.

SQL is the structured query language specially used to define, manage, manipulate, or retrieve the data inside the relational database management system(RDBMS).

Before going into examples download the sample database from this website. Open the file in MySQL Workbench and execute the code. Under schemas, you will see the new schema classicmodels.

SQL Queries With Examples:

1.SQL Select & From: Used to retrieve data from the table, SELECT indicates which columns you want to view and FROM indicates from which table.

#SQL select & From statement examples1. SELECT * FROM employees;
#returns all data from employees table
2. SELECT firstName, lastName FROM employees;
#returns particular columns

2. Distinct: Distinct allows you to view the unique values in a column. One or more columns can be used in the expression.

#SQL distinct clause examples1. SELECT distinct paymentDate FROM payments;
#returns unique paymentdates

3. SQL Where: This allows you to specify conditions while retrieving data from the table. To specify conditions you need to use comparison or logical operators.

#SQL where clause examples1. SELECT * FROM customers WHERE city="NYC";
#returns customers data who belongs to NYC city
where — equal operator
2. SELECT * FROM employees WHERE firstName="Mary" AND 
jobTitle="VP Sales";
#returns employee data (two conditions should satisfy)
where — AND operator
3. SELECT * FROM products WHERE quantityInStock>5000;
#returns products info whose quantity is greater than 5000
where — greater than operator
4. SELECT * FROM products WHERE productLine LIKE '%Cars';
#returns products info whose productline ends with Cars
where — LIKE operator
5. SELECT * FROM offices WHERE city IN('London','Sydney','Tokyo');
#returns office address for three cities
where — IN operator
6. SELECT * FROM payments WHERE amount BETWEEN 6000 AND 9000;
#returns payment information between 6000 to 9000, includes the
intervals
where — BETWEEN operator
7. SELECT * FROM customers WHERE NOT city='Singapore';
#returns customer information except for city Singapore
where — NOT operator
8. SELECT * FROM orders WHERE comments IS NULL;
#returns order info(comment column is null)
where — NULL operator

4.SQL Aggregation: Aggregate functions aggregate across the entire column. You will use the aggregate functions all the time.

Min, Max, Count functions can be used with numerical and non-numerical columns and whereas sum, avg functions are used with numerical columns only.

  • COUNT: returns the total number of rows
  • SUM: returns the sum of all the values
  • MAX: returns the maximum value
  • MIN: returns the minimum value
  • AVG: returns the average value
#SQL aggregate functions examples1. SELECT COUNT(*) FROM employees; 
#returns total number of rows (non null + null values)
2. SELECT COUNT(customerName) FROM customers;
#counts particular column rows(only non null values)
3. SELECT MAX(amount), MIN(amount), SUM(amount) FROM payments;
#returns maximum, minimum and total payment amount
4. SELECT MAX(firstName) FROM employees;
#firstname is non numerical column so returns as close
alphabetically to "Y" as possible because of MAX function

5. AS (Alias): This allows you to rename a column or table. Note that it won’t alter the names in their original columns or tables.

#SQL Alias examplesSELECT MAX(amount) AS max_amount, MIN(amount) AS min_amount, SUM(amount) AS total_amount FROM payments;

6.SQL Group By: group by clause allows you to separate data into groups, which can be aggregated independently of one another.

You can group by multiple columns, but you have to separate column names with commas same as order by clause.

#SQL group bySELECT productline, sum(quantityInStock) FROM products group by productline;
#data is divided into groups (In our case it is productline) and then aggregates each group independently based on aggregation function you specified.
See the below image to see how the groupby works

7.SQL Having: Having clause allows you to filter the data that has been aggregated. In where clause you cannot specify aggregate functions but in having clause you can specify aggregate functions.

Most of the time Having clause is used with the group by clause. Having can be used without group by clause, in an aggregate function, in that case it behaves like where clause.

#SQL HavingSELECT  productline, sum(quantityInStock) FROM products group by productline having sum(quantityInStock)>60000;
#Grouped data is filtered using having clause
See the below image to see how the having clause works

8.SQL Order By: Order by clause allows you to reorder your results in ascending or descending order based on the data in one or more columns. By default orders in ascending order

#SQL order by1. SELECT * FROM payments ORDER BY amount;
#sorts result in ascending order
2. SELECT * FROM offices ORDER BY City Desc;
#sorts result in descending order by desc keyword
3. SELECT * FROM employees ORDER BY firstName, employeeNumber desc;
#first sorts the result by firstName in ascending order (A to Z) and then by employeeNumber in descending order within those already sorted records
To understand sorting data by two columns(3rd example), see below image.

9. SQL Limit: Limit allows you to specify the number of records to return

#SQL limitSELECT * from orders limit 5;
#returns first 5 records

10. Joins: Before we discuss the concept of joins, I think it is important that we first distinguish the difference between a primary key and a foreign key.

The primary key is a unique identifier of the table. It must contain unique values and cannot have null values. Each table can have only one primary key. The table containing the primary key is called the parent table.

The foreign key is used to link two tables together. It is a column in one table that refers to the primary key in another table. Each table can have multiple foreign keys. The table containing the foreign key is called the child table.

Joins is used to combine rows from two or more tables based on the related column between them (Primary key and Foreign key)

Types of joins:

  • Inner Join: Returns records that have matching values in both tables
  • Left Join: Returns all records from the left table, and the matched records from the right table
  • Right Join: Returns all records from the right table, and the matched records from the left table
  • Outer Join: Returns all records when there is a match in either left or right table

The most common type of join is left join

customerNumber in the customers table is a primary key and customerNumber in the orders table is a foreign key. (See ER diagram here)

SELECT c.customerName, c.phone, c.city, o.orderDate, o.status 
FROM customers AS c
LEFT JOIN
orders AS o
ON c.customerNumber = o.customerNumber;
SQL Joins

Let me wrap up the select command with the full syntax. The below sequence is important while writing queries.

#Syntax
SELECT column1, column2 .... column(n)
FROM table_name
WHERE condition1, condition2 .... condition(n)
GROUP BY column1, column2 .... column(n)
HAVING condition1, condition2 .... condition(n)
ORDER BY column1, column2 .... column(n)
LIMIT number;
#example
SELECT productLine, sum(QuantityInStock)
FROM products
WHERE buyPrice>50
GROUP BY productLine
HAVING sum(QuantityInStock)>50000

To download the SQL script follow this link

SQL techniques learned through this post are slicing, sorting, grouping, filtering, counting, unique, aggregating data. All these techniques are useful while performing data analysis.

I hope you have gained a better understanding of what SQL is, its significance in data analytics but more importantly, realize that it is actually relatively easy to learn once you get hold of the fundamentals.

Until my next article, feel free to check out my other content here.

--

--