SQL Training

Purpose & Definitions

Purpose

The goal of this content is to provide A good foundational understanding of Oracle SQL for querying databases to get you at a level to perform analysis on data.

It won't cover management of data within databases, or any way to automate your scripts and code, that is beyond the scope of this content.

Why Use a Database?

What can Databases do? What can't they do?

Our databases are designed to work with relational data – meaning they are great and efficient for bringing data together and combining it.

Databases are an effective way to store and share information, to make sure everyone has the same and up-to-date versions.

Database tables can have millions of rows, more than excel can handle!

Databases are fast at working with large volumes of data.

The same script can be used to bring back the same output every time, no more having to manually work with the data each time!

Make graphs (at least not that look anywhere near good)

Drag and drop interface (again, technically yes but not recommended)

Many things that R/Python is better for

How to access a database?

SQL Developer

A free software produced by Oracle it has a basic and clean interface but more complex features for power users.

Toad

An alternative SQL IDE, a license is required for usage.

Many more!

SQL is just instructions that are sent to the database, so you can edit that plain text code in any text editor, and many tools offer database connections, including VSCode, RStudio, and many more!

Some terminology...

What is Google?

In our context it is a search engine

Used to find all sorts of information and help

Learning how to Google something is a very important skill!

What is SQL?

Structured Query Language

A language used to access and manipulate information stored in a database

Technically it is used to interact with the database, not just the data stored within it

A set of keywords, rules, and inputs that produce an action as an output.

What is Oracle SQL?

SQL has many different variants, like colloquialisms or accents

Most of the words and rules are the same, but some bits may be slightly different. Somethings will work in one variant, but not another or at least have to be done a different way

Oracle SQL is the database product and variant sold and supported by Oracle. Oracle SQL has traditionally been the main industry supplier of large business databases, but this is changing in favour of the big 3 cloud vendors

There are other differences such as speed, costs, support, and much more that goes into choosing a database variant

Other key players in the market are MySQL, SQLServer (Microsoft), SQLite, and many new cloud based solutions such as Google’s BigQuery

What is Oracle SQL?

The important thing is when you google something about SQL, always include the word “Oracle” in your search

NEVER use the Oracle site for help, it’s designed for database experts and is useless to 99% of people

What is a database?

A data storage solution, traditionally represented by 3 disks on top of each other, representing different harddisks, designed to be optimised for certain operations.

We will focus on relational databases that are organised into tables with rows and columns

Within the database there are Schemas, small workspaces for a user or set of users

A Schema contains tables (and other things)

A Table contains records (rows) and variables (columns)

In Excel terms, a database if a folder on your laptop, the schema is an excel file (workbook), a table is a sheet, and the rows and columns are rows and columns

Tables within a schema are only accessible to that schema’s users, unless rights are granted to other schemas to access them

What is a database?

Display of schema structure in a datbase

What is a query?

A single complete piece of SQL code

It must have a certain minimum level of detail (e.g. contain reference to at least one table and a list of columns to return)

Returns a table

Are executed one at a time

Many queries can be stored in a single .sql file, or script

Are a subset of statements, these however include code that will create tables, or update information i.e. perform an action that doesn’t have to return a table

What is a table?

It is the foundation of relational databases, a collection of records (rows) and variables (columns)

All entried in a given column are of the same type

Tables are stored within a schema and have a name – usually the name has no spaces, doesn’t start with a number, and is in all caps (but not always because some people are sadists)

Tables are the output/return type of a query or SELECT statement

Can have 0 rows, must have at least 1 column

What is a data type?

The type of a data column determines what it can contain and what operations/functions you can perform on it

Numeric objects in Oracle are usually either integer (whole number) or number (decimal number). Databases are usually pretty good at automatically converting integers to decimal numbers if required

What is a data type?

The type of a data column determines what it can contain and what operations/functions you can perform on it

Character objects in Oracle are usually either char/nchar (a fixed length, padded with usually a space) or varchar2/nvarchar2 (a variable length). These are also referred to as strings, as in a string of characters.

Strings as data in Oracle are written using single quotes ‘string’ Strings as database objects (e.g. tables, column names etc) in Oracle are written using double quotes “table_name”

What is a data type?

The type of a data column determines what it can contain and what operations/functions you can perform on it

Date objects in Oracle are usually stored as date, sometimes timestamp. In Oracle ALL date data has a time component – there is no data type without a time component – you can set it to zero but it’s still there. Date formatting is also based on local settings so there are special functions to interact with dates, but we’ll see more about this later on…

What is a data type?

The type of a data column determines what it can contain and what operations/functions you can perform on it

Boolean objects do not exist as a data type for a variable in Oracle; these are TRUE and FALSE values so you must store any column for this as either a string or a number (1/0)

What is a function?

Generally, a function is something that takes an input (arguments), does a calculation/computation, and in all our cases will return an output

For the purposes of this content, functions will usually take a variable (column) and potentially some additional information as arguments

For the purposes of this content, functions will usually return a variable (column) as an output, with the same number of rows as the input

Many functions will look like function_name(arg1, arg2, …, argn) but not all as we will see…

Part 0:

Comments,

Syntax,

& Error messages

Part 0:

By the end of this section you will be able to:

Explain the use of comments in code

Explain the use of whitespace and case in SQL

Explain the benefit of error messages

Explain what a null value is

Comments

A comment is the most important thing in any language.

They let you write notes in and around your code that the computer ignores.

Comments

A comment is the most important thing in any language.

They let you write notes in and around your code that the computer ignores.


							-- Comments in SQL are done like this

Comments

A comment is the most important thing in any language.

They let you write notes in and around your code that the computer ignores.


							-- Comments in SQL are done like this
							/* Or multi-line comments
								are done like this */

Comments

A comment is the most important thing in any language.

They let you write notes in and around your code that the computer ignores.


							-- Comments in SQL are done like this
							/* Or multi-line comments
								are done like this */
							select * from table; -- you can comment at the end of a line
							select * /* or even in the middle*/ table;

Syntax

Keywords in SQL are case insensitive so you can type them however you like.

Syntax

Keywords in SQL are case insensitive so you can type them however you like.


							SELECT ORIGIN, DEST FROM FLIGHTS;

Syntax

Keywords in SQL are case insensitive so you can type them however you like.

Whiteshape is also ignored in SQL, so you can layout your code however you like.


							SELECT ORIGIN, DEST FROM FLIGHTS;

Syntax

Keywords in SQL are case insensitive so you can type them however you like.

Whiteshape is also ignored in SQL, so you can layout your code however you like.


							SELECT ORIGIN, DEST FROM FLIGHTS;

							select 
								origin, 
								dest 
							from 
								flights;

Syntax

However, being consistent and making your code easy to read is important.


							SELECT ORIGIN, DEST FROM FLIGHTS;

							select 
								origin, 
								dest 
							from 
								flights;

Syntax

However, being consistent and making your code easy to read is important.


							/* Three equivalent queries despite case and whitespace */
							SELECT ORIGIN, DEST FROM FLIGHTS;

							select 
								origin, 
								dest 
							from 
								flights;

							SeLeCt      OrIgIn,DeSt        fRoM      fLiGhTs;

Error Messages

Error messages try to tell you what you've done wrong in your code when it doesn't work.

Learning to read them and understand what they mean is invaluable, and we will see some as we go through.

Nulls

Finally, empty values in SQL have a special type call null.

This isn't an empty string or 0 or NA, it is a special thing all by itself.

Our Data

Part 1:

Variables

Selecting and creating them

Part 1:

By the end of this section you will be able to:

Return all or some columns from a table

Create and return your own columns based on functions

Create columns with conditional values

Alias columns and tables

Use common string functions

Variables

When you write a query, you are asking the database to produce you a table. This table need to have columns, so we need to tell the database which variables we need.

Some of these variables might already exist, some might need to be created - we'll learn how to do both.

Variables

All SQL queries take the following minimum form.


							SELECT
								variables
							FROM
								DATABASE.SCHEMA.TABLE;

Variables

All SQL queries take the following minimum form.

Every query has a FROM keyword followed by a reference to a table, this tells the database where to look for our variables.


							SELECT
								variables
							FROM
								DATABASE.SCHEMA.TABLE;

Variables

All SQL queries take the following minimum form.

Every query has a SELECT keyword followed by list of variables; these might be existing columns in the table or ones created.


							SELECT
								variables
							FROM
								DATABASE.SCHEMA.TABLE;

Variables

All SQL queries take the following minimum form.

Every query ends with a semi-colon. This is like a full stop for the database so it knows where one query stops and the next one begins.


							SELECT
								variables
							FROM
								DATABASE.SCHEMA.TABLE;

Variables - Existing

We can select every variable from our table using the * special character.


							SELECT
								*
							FROM
								schema_name.flights;

Variables - Existing

We can select every veriable from our table using the * special character.

Or we can swap out the * and list the column names ourselves, seperated by a comma


							SELECT
								*
							FROM
								schema_name.flights;

Variables - Existing

Or we can swap out the * and list the column names ourselves, seperated by a comma


							SELECT
								year, 
								month, 
								day, 
								dep_time, 
								sched_dep_time, 
								dep_delay, 
								arr_time, 
								sched_arr_time, 
								arr_delay, 
								carrier, 
								flight, 
								tailnum, 
								origin, 
								dest, 
								air_time, 
								distance, 
								hour, 
								minute, 
								time_hour
							FROM
								schema_name.flights;

Variables - Existing

Or we can just select some of them


							SELECT
								year, 
								month, 
								day, 
								dep_time, 
								sched_dep_time, 
								carrier, 
								flight, 
								tailnum, 
								origin, 
								dest
							FROM
								schema_name.flights;

Live View

Run some of these queries so far and see the results.

Variables - New

We can ask the database for variables that aren't in our table, we just need to tell it what the data is. In this case, that data is the value 123.


							SELECT
								year, 
								month, 
								day, 
								dep_time, 
								sched_dep_time, 
								carrier, 
								flight, 
								tailnum, 
								origin, 
								dest,
								123
							FROM
								schema_name.flights;

Note that we had to add a comma to the end of the list before our new variable.

Variables - New

It would be nice to give this variable a name, we can do that using the as keyword and call it whatever we like (within reason). This is known as aliasing.


							SELECT
								year, 
								month, 
								day, 
								dep_time, 
								sched_dep_time, 
								carrier, 
								flight, 
								tailnum, 
								origin, 
								dest,
								123 as new_column
							FROM
								schema_name.flights;

Column names should start with a letter, have only _ as punctuation and be less than 30 characters long. Some of these rules can but should not be broken.

Variables - New

Technically we don't need to use as but for column names it is good practice.


							SELECT
								year, 
								month, 
								day, 
								dep_time, 
								sched_dep_time, 
								carrier, 
								flight, 
								tailnum, 
								origin, 
								dest,
								123 new_column
							FROM
								schema_name.flights;

We'll see in a second you can alias a table as well, but you cannot use the as word there!

Live View

Run some of these queries so far and see the results.

Variables - New

NOTE that we can't mix * and extra variables.


							-- This code will error
							SELECT 
								*,
								123 as new_column
							FROM
								schema_name.flights;

Variables - New

NOTE that we can't mix * and extra variables.

This is because * doesn't mean everything from our table, it means EVERYTHING, and we can't ask for more than everything (it's like taking infinity+1).


							-- This code will error
							SELECT 
								*,
								123 as new_column
							FROM
								schema_name.flights;

Variables - New

What we need to do it tell the database we want everything from the table and note the universe. So first, it's important to understand the following are the same query, one is shorthand for the other.


							SELECT 
								tailnum
							FROM
								schema_name.flights;

							SELECT 
								schema_name.flights.tailnum
							FROM
								schema_name.flights;

Variables - New

So if we want everything from the table, we could write the following, and this will work, but is pretty annoying to type.


							SELECT 
								schema_name.flights.*,
								123 as new_column
							FROM
								schema_name.flights;

Variables - New

So if we want everything from the table, we could write the following, and this will work, but is pretty annoying to type.

What if the same way we aliased columns we alias our table? Let's call it a


							SELECT 
								a.*,
								123 as new_column
							FROM
								schema_name.flights a;

Variables - New

SIDENOTE: You can break some of those column name rules using double quotes ("), these are used when you talk about database objects (tables, columns etc). Use single quotes (') for text data in the database.


							SELECT 
								a.*,
								123 as "new column"
							FROM
								schema_name.flights a;

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 1-4.

Variables - Functions

Having a column with all the same value isn't really that useful.

Let's create create some variables using existing ones next.

From now on for simplicity we will only show the relevant parts of the code, not all the variables as shown in the training file.

Variables - Functions

We can start by doing some basic maths on a numeric column.


							SELECT 
								month,
								month + 1 AS next_month, 
								...
							FROM
								schema_name.flights;

Variables - Functions

We can start by doing some basic maths on a numeric column.

We don't even have to keep the original column.


							SELECT 
								month + 1 AS next_month, 
								...
							FROM
								schema_name.flights;

Variables - Functions

We CAN'T use our new column in another calculation (we'll see way to do this later).


							-- This code will error
							SELECT 
								month + 1 AS next_month,
								next_month + 1 AS next_next_month 
								...
							FROM
								schema_name.flights;

Variables - Functions

Now let's take a look at some string functions that will be useful for when we're working with text.


							SELECT
								carrier, 
								...
							FROM
								schema_name.flights;

Variables - Functions

lower takes a string and makes it lowercase.


							SELECT
								carrier, 
								lower(carrier),
								...
							FROM
								schema_name.flights;

Variables - Functions

upper takes a string and makes it uppercase.


							SELECT
								carrier, 
								lower(carrier),
								upper(lower(carrier)),
								...
							FROM
								schema_name.flights;

Notice how we've nested functions here, this is a useful trick.

Variables - Functions

substr takes 3 arguments, the first 2 are required and are the string to cut and the start position. The third argument is optional and is the length of the substring.


							SELECT
								carrier, 
								lower(carrier),
								upper(lower(carrier)),
								tailnum,
								substr(tailnum, 3),
								substr(tailnum, 1, 3),
								substr(tailnum, 3, 2),
								...
							FROM
								schema_name.flights;

This is like LEFT, MID, and RIGHT from Excel all in one.

Variables - Functions

length tells you have many characters are in a string.


							SELECT
								carrier, 
								lower(carrier),
								upper(lower(carrier)),
								tailnum,
								substr(tailnum, 3),
								substr(tailnum, 1, 3),
								substr(tailnum, 3, 2),
								length(tailnum)
								...
							FROM
								schema_name.flights;

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 5-7.

Variables - Case Statements

Sometimes we don't want to do the same thing to every record, we might want to do different calculations based on some conditions.

We do this by something called a case statement.

Variables - Case Statements

Case statements take this form:


							CASE 
								WHEN condition THEN value_1
								WHEN condition THEN value_2
								...
								WHEN condition THEN value_n
								ELSE value_default
							END

Variables - Case Statements

Case statements take this form:

condition is some boolean statement (TRUE or FALSE test)


							CASE 
								WHEN condition THEN value_1
								WHEN condition THEN value_2
								...
								WHEN condition THEN value_n
								ELSE value_default
							END

Variables - Case Statements

Case statements take this form:

condition is some boolean statement (TRUE or FALSE test)

value is the output for record if the test is TRUE


							CASE 
								WHEN condition THEN value_1
								WHEN condition THEN value_2
								...
								WHEN condition THEN value_n
								ELSE value_default
							END

Variables - Case Statements

Case statements take this form:

condition is some boolean statement (TRUE or FALSE test)

value is the output for record if the test is TRUE

Test are evaluated in order, if the first test is true it won't check the rest.


							CASE 
								WHEN condition THEN value_1
								WHEN condition THEN value_2
								...
								WHEN condition THEN value_n
								ELSE value_default
							END

Variables - Case Statements

Case statements take this form:

condition is some boolean statement (TRUE or FALSE test)

value is the output for record if the test is TRUE

Test are evaluated in order, if the first test is true it won't check the rest.

If none are true, the ELSE condition triggers (id you don't provide one, it defaults to NULL.


							CASE 
								WHEN condition THEN value_1
								WHEN condition THEN value_2
								...
								WHEN condition THEN value_n
								ELSE value_default
							END

Variables - Case Statements

REMEMBER: every CASE must have an END and every WHEN must have a THEN.


							CASE 
								WHEN condition THEN value_1
								WHEN condition THEN value_2
								...
								WHEN condition THEN value_n
								ELSE value_default
							END

Variables - Case Statements

To write these boolean/logical tests we use comparison operators like the following:

Symbol	Comparison
=	Equal
!=, <>, ^=	Unequal (not equal)
<, <=	Less than, Less than or equal to
>, >=	Greater than, Greater than or equal to



[NOT] IN	Set membership (is the element in the set/list provided). Set provided in ('Elem1', 'Elem2', ..., 'ElemN') format


[NOT] LIKE	Pattern match (is the element like the provided pattern). Uses % as 0-Inf character wildcard, _ for single character wildcard


[NOT] BETWEEN	Shorthand for x <= y and y <= z becomes y between x and z

The [NOT] is optional

Variables - Case Statements

And we can combine them using the logical operators:

Operation	Meaning
AND
OR
NOT
()

Truth Table Quiz!

A	B	A AND B	A OR B	NOT A
TRUE	TRUE	TRUE	TRUE	FALSE
TRUE	FALSE	FALSE	TRUE	-
FALSE	TRUE	FALSE	TRUE	TRUE
FALSE	FALSE	FALSE	FALSE	-

Variables - Case Statements

And we can combine them using the logical operators:

Operation	Meaning
AND	Logical AND, both statements must be TRUE
OR	Logical OR, at least one statement must be TRUE
NOT	Logical negation, converts TRUE to FALSE and vice versa
()	Brackets, force the evaluation of the logical statement inside before any outside, i.e. (2+7)5 != 2+75

Variables - Case Statements

Let's look at some examples


							CASE 
								WHEN sched_dep_time < 1200 THEN 'AM'
								ELSE 'PM'
							END AS DEP_AM_PM,

							CASE 
								WHEN sched_dep_time BETWEEN 0 AND 0800 THEN 'Early Morning'
								WHEN sched_dep_time BETWEEN 0801 AND 1159 THEN 'Morning'
								WHEN sched_dep_time BETWEEN 1200 AND 1800 THEN 'Afternoon'
								WHEN sched_dep_time BETWEEN 1801 AND 2100 THEN 'Evening'
								WHEN sched_dep_time BETWEEN 2100 AND 2359 THEN 'Night'
							END AS DEP_TIME_DETAIL, -- Notice no ELSE

							CASE 
								WHEN length(sched_arr_time) != 4 THEN 'AM'
								WHEN substr(sched_arr_time, 1, 2) < 12 THEN 'AM'
								WHEN substr(sched_arr_time, 1, 2) > 12 THEN 'PM'
								ELSE 'Midday'
							END AS ARR_AM_PM,
							...

Variables - Case Statements

Now examples using in


							CASE
								WHEN dest in ('JFK', 'LGA', 'EWR') THEN 'NYC'
								ELSE 'Other'
							END AS dest_nyc_check_in,

							-- see how much easier it is than writing ors
							CASE
								WHEN dest = 'JFK' or dest = 'LGA' or dest = 'EWR' THEN 'NYC'
								ELSE 'Other'
							END AS dest_nyc_check_ors
							...

Notice that comparisons always have to have a left and a right hand side, even when it's the same LHS

Variables - Case Statements

String values are case sensitive


							CASE
								WHEN origin in ('JFK', 'LGA', 'EWR') THEN 'NYC'
								ELSE 'Other'
							END AS origin_nyc_check1,

							CASE
								WHEN lower(origin) in ('JFK', 'LGA', 'EWR') THEN 'NYC'
								ELSE 'Other'
							END AS origin_nyc_check2,

							CASE
								WHEN origin in ('JFK', 'lga', 'ewr') THEN 'NYC'
								ELSE 'Other'
							END AS origin_nyc_check3
							...

Variables - Case Statements

And finally the like matches.


							-- Any number of character (including 0) before, and after a 7
							CASE
								WHEN tailnum LIKE '%7%' THEN 1
								ELSE 0
							END AS contains_7,

							-- Any number of characters before a capital Q
							CASE 
								WHEN tailnum LIKE '%Q'	THEN 1 
								ELSE 0
							END AS ends_in_q,

							-- A NOT acting on the output of a logical statement
							CASE 
								WHEN NOT (tailnum LIKE '%Q') THEN 1 
								ELSE 0
							END AS ends_in_not_q,

							-- A NOT LIKE, specifically any record that doesn't match the pattern, 
							-- slightly different to above
							CASE WHEN tailnum NOT LIKE 'N5_5%' THEN 1 
								ELSE 0
							END AS weird_test
							...

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 8-11.

Variables - Dates

Finally let's look at dates in Oracle SQL.

Date data types include both a date and a time, there is no time only value in Oracle.

There is another type called a timestamp that contains milliseconds and timezones, most functions that work on dates also work on timestamps.

Variables - Dates

Just like in Excel, dates in Oracle are just numbers - meaning a difference of 24 hours is 1, 12 hours is 0.5 etc.

However unlike excel because they are a specific datatype there's some specific functions to work with them.

Variables - Dates

Function	Description
TO_DATE(variable, fmt)	Converts the variable (or string) into a date by giving it the format the variable is in
TRUNC(date [, fmt])	Sets the time component of the date to 00:00:00 by default, or will round down to the format if provided
sysdate	A system variable that is the current (local) date and time
NEXT_DAY(date, day)	Returns the next date of the given day, after the date provided e.g. takes a Saturday at 00:00:01 to the NEXT Saturday
TO_CHAR(variable, fmt)	Converts the variable (usually a date or a number) into the given format. This is the method to format a date in a given way NOT substr
EXTRACT(value FROM date)	Returns the given value e.g. year from the date and returns it as a number. To extract units less than a day you must use a timestamp
TO_TIMESTAMP(variable, fmt)	Converts the variable (or string) into a timestamp by giving it the format the variable is in

Variables - Dates

I'm going to use || in the code which concatenates (pastes) 2 strings together.


						SELECT 
							time_hour,
							time_hour + 1, -- add 1 day
							time_hour + 5/24/60, -- add 5 minutes
							TRUNC(time_hour), -- set time component to 0
							TRUNC(time_hour, 'MM'), -- set to first of month
							NEXT_DAY(time_hour, 'Saturday') - 1, -- Set to Friday inclusive for a week
							TO_CHAR(time_hour, 'HH24:MI DAY DD MONTH YYYY'), -- format date in a specific format
							TO_CHAR(time_hour, 'HH24:MI Day dd month yyyy'), -- notice how case changes the output
							-- convert our columns into a date
							TO_DATE(year||'-'||month||'-'||day||' '||hour||':'||minute, 'YYYY-MM-DD HH24:MI') -- convert our columns into a date
						FROM
							schema_name.flights;

Variables - Dates

Using TO_DATE every time can be annoying, so we can specify a date in another way. This is easier to read and more efficient for the database as well.


						SELECT 
							DATE '2019-01-20' AS fixed_date
						FROM
							schema_name.flights;

The format is ALWAYS YYYY-MM-DD and is a string.

Variables - Dates

You should NEVER use substr on a date because the result you gets depend on how the user has their system set up.


						-- Set a specific date format for your local session and sub-string our dates
						ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MM-YYYY HH24:MI:SS';
						SELECT 
							time_hour,
							substr(time_hour, 9, 2)
						FROM
							schema_name.flights;
						-- returns 2 digit year
						
						-- Change the format and check the results
						ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-DD-MM HH24:MI:SS';
						SELECT 
							time_hour,
							substr(time_hour, 9, 2)
						FROM
							schema_name.flights;
						-- returns 2 digit month

Always use to_char instead.

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 12-15.

Part 1:

In this section you learned to:

Return all or some columns from a table

Create and return your own columns based on functions

Create columns with conditional values

Alias columns and tables

Use common string functions

Don't forget to run the error message examples to see common examples of variable based errors!

Part 2:

Records

Filtering and ordering them

Part 2:

By the end of this section you will be able to:

Filter records based on conditions

Order and only return a fixed number of records

Combine record sets into a single output

Records - Filtering

We will start by learning how to filter records, we do this using a WHERE clause, which goes after the FROM clause.


					SELECT
						columns
					FROM
						tables
					WHERE
						condition
					;

The condition in this clause is the same type as those we saw in the CASE statements.

Records - Filtering

Let's take a look at some examples:


					SELECT 
						*
					FROM
						schema_name.flights
					WHERE
						arr_delay = 0
						AND dep_delay = 0
					;

Records - Filtering

Let's take a look at some examples:

Bonus tip: You can use an always true condition to make it easier to comment out individual checks when you're still developing your code


					SELECT 
						*
					FROM
						schema_name.flights
					WHERE
						1 = 1
						--AND arr_delay = 0
						AND dep_delay = 0;
					;

Records - Filtering

Let's take a look at some examples:

and of course we can use our favourite boolean logic!


						SELECT
						*
					FROM
						schema_name.flights
					WHERE
						1 = 1
						-- flights scheduled to depart in the first 10 or last 5 hours of a day
						AND (hour < 10 OR hour > 19)
						AND dep_delay = 0 -- left on time
					;

Records - Filtering

Let's take a look at some examples:

A reminder that we can use IS [NOT] NULL as a comparison. Useful when we want to check how many records have missing (or not) data


					SELECT
						*
					FROM
						schema_name.flights
					WHERE
						dep_delay IS NULL
					;

Records - Filtering

Just like before, we can't use a renamed column in a WHERE clause.


					-- Both of these will error
					SELECT 
						TRUNC(time_hour) AS flight_day
					FROM
						schema_name.flights
					WHERE
						flight_day > DATE '2013-06-01';
					
					SELECT 
						tailnum AS tailnumber
					FROM
						schema_name.flights
					WHERE
						tailnumber LIKE 'N%';

Records - Filtering

Just like before, we can't use a renamed column in a WHERE clause.

The reason for this is because the WHERE clause runs before the SELECT does (even though it's written after).

Records - Filtering

An upside of this is that you can filter on a column that you don't actually output.


					SELECT 
						tailnum 
					FROM
						schema_name.flights
					WHERE
						origin = 'JFK'; -- only flights out of JFK airport

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 16-19.

Records - Ordering

Next let's look at how we can order our data

Ordering data is one of the most inefficient things a computer can do and databases are not designed to be fast at it, try to avoid ordering unless you have to, only do it once, or consider doing it in some other tool after exporting the data


					SELECT
						columns
					FROM
						tables
					WHERE
						condition
					ORDER BY 
						columns;

Records - Ordering

We can order by a single column


					SELECT 
						*
					FROM
						schema_name.flights
					WHERE 
						dep_delay > 0
					ORDER BY 
						dep_delay;

Records - Ordering

Or multiple columns for when there is a tie


					SELECT 
						*
					FROM
						schema_name.flights
					WHERE 
						dep_delay > 0
					ORDER BY 
					-- sort by departure delay THEN arrival delay if there is a tie.
						dep_delay, 
						arr_delay;

Records - Ordering

By default records are ordered in ascending order


					SELECT 
						*
					FROM
						schema_name.flights
					WHERE 
						dep_delay > 0
					ORDER BY 
					-- Least delayed departing, ties are sorted by least delayed when arriving.
						dep_delay, 
						arr_delay;

Records - Ordering

By default records are ordered in ascending order

but we can change that


					SELECT 
						*
					FROM
						schema_name.flights
					WHERE 
						dep_delay > 0
					ORDER BY 
					-- MOST delayed departing, ties are sorted by least delayed when arriving.
						dep_delay DESC, 
						arr_delay;

Records - Ordering

By default records are ordered in ascending order

but we can change that

and we have to specify it for each column.


					SELECT 
						*
					FROM
						schema_name.flights
					WHERE 
						dep_delay > 0
					ORDER BY 
					-- MOST delayed departing, ties are sorted by MOST delayed when arriving.
						dep_delay DESC, 
						arr_delay DESC;

Records - Ordering

Finally, we can be a bit lazy and because the ORDER BY runs after the SELECT we can actually just use numbers instead of column names.

This is also why we can use column aliases in the ORDER BY and can't use columns we didn't SELECT


					SELECT 
						flight,
						dep_delay,
						arr_delay
					FROM
						schema_name.flights
					WHERE 
						dep_delay > 0
					ORDER BY 
						2 desc, 
						3;

Generally this is a bad idea, it's hard to read and if you change the order in your SELECT it will break.

Records - Distinct

It's rare that you want duplicates in your data. Usually it means you've done a join (next section) wrong, or your source data has an issue, or you removed too many variables.

If you are comfortable as to why you have duplicates but want to remove them you can use a SELECT DISTINCT to just get unique records.


					SELECT DISTINCT 
						origin,
						dest
					FROM
						schema_name.flights;

DISTINCT acts on all selected columns, not just the first one.

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 20-21.

Records - Combining

Sometimes we want to combine our records in one way or another, if they are from the same table there is often a clever way to do this using ORs and CASE statements, but not always. Other times we might want to:

Combine records (with or without checking for duplicates) and return them all (UNION [ALL])

Find records that exist in both record sets and just return those (INTERSECT)

Find records that exist in one set, but not in another and return those from the first (MINUS)

Records - Combining

Records - Combining

We'll use the airlines table with some specific filters to illustrate these concepts


						/* UNION records without duplicates */
						SELECT 
							*
						FROM 
							schema_name.airlines
						WHERE
							carrier in ('9E', 'AA', 'AS')
						UNION
						SELECT 
							*
						FROM 
							schema_name.airlines
						WHERE
							carrier in ('AS', 'B6', 'DL');
						
						
						/* UNION ALL records with duplicates */
						SELECT 
							*
						FROM 
							schema_name.airlines
						WHERE
							carrier in ('9E', 'AA', 'AS')
						UNION ALL
						SELECT 
							*
						FROM 
							schema_name.airlines
						WHERE
							carrier in ('AS', 'B6', 'DL');
						
						
						/* Find INTERESECTion of records */
						SELECT 
							*
						FROM 
							schema_name.airlines
						WHERE
							carrier in ('9E', 'AA', 'AS')
						INTERSECT
						SELECT 
							*
						FROM 
							schema_name.airlines
						WHERE
							carrier in ('AS', 'B6', 'DL');
						
						
						/* MINUS any matching records from the second table from the first */
						SELECT 
							*
						FROM 
							schema_name.airlines
						WHERE
							carrier in ('9E', 'AA', 'AS')
						MINUS
						SELECT 
							*
						FROM 
							schema_name.airlines
						WHERE
							carrier in ('AS', 'B6', 'DL');

Records - Fetch

Though it is very rare, sometimes you might only want to return a specific number of rows.

For this we use the keyword FETCH and it's associated terms.

FETCH is the very last thing a database will run as it has to complete ordering first, so it would be any quicker to just return 5 rows instead of 500


						SELECT
							columns
						FROM
							tables
						WHERE
							condition
						ORDER BY 
							columns
						[OFFSET number ROWS]
						FETCH NEXT number [PERCENT] ROWS ONLY/WITH TIES;

Records - Fetch

Though it is very rare, sometimes you might only want to return a specific number of rows.

For this we use the keyword FETCH and it's associated terms.


						SELECT 
							*
						FROM
							schema_name.flights
						ORDER BY 
							distance desc
						OFFSET 5 ROWS
						FETCH NEXT 5 ROWS WITH TIES;

This will return more than 5 records because of the option with ties.

Records - Fetch

Though it is very rare, sometimes you might only want to return a specific number of rows.

For this we use the keyword FETCH and it's associated terms.


						SELECT 
							*
						FROM
							schema_name.flights
						ORDER BY 
							distance desc
						OFFSET 5 ROWS
						FETCH NEXT 5 ROWS ONLY;

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 22-24

Part 2:

In this section you learned to:

Filter records based on conditions

Order and only return a fixed number of records

Combine record sets into a single output

Don't forget to run the error message examples to see common examples of variable based errors!

Part 3:

Tables

Reusing and joinging them

Part 3:

By the end of this section you will be able to:

Use the dummy table for testing queries

Use the table output by one query in another

Join tables together

Tables - Dual

Sometimes we want to test our query or a function and don't want to query a whole table to do it.

To do this we can use a special table called dual which has a single row and a single column, with a value of 'X'.


						SELECT
							* 
						FROM
							dual;

Tables - Dual

One good use for this is often to test out some date or string functions on dummy data to make sure you get it right


					SELECT
						--Some random date manipulation as a test
						TRUNC(SYSDATE) - 5 + 1/24 + 14/24/60, 
						-- should return Friday
						NEXT_DAY(TRUNC(SYSDATE), 'SAT') - 1 AS week_end_date 
					FROM
						dual;

Tables - Reusing

We've seen multiple times that we can't use a column we create/alias in a query within the same query (except in ordering). This was because of the order the query actualy runs in vs how we see it written.

But we know that a query returns a table, so we could use that table as the input to another query!


					SELECT
						columns
					FROM
						table
					WHERE
						condition;

Tables - Reusing

But we know that a query returns a table, so we could use that table as the input to another query!


					SELECT
						columns
					FROM
						(SELECT
							columns
						FROM
							tables
						WHERE
							condition
						)
					WHERE
						condition;

Tables - Reusing

This is known as a nested query because the inner query is nested entirely within the outer one.

We can do this as many times as we like, but the database does have to complete an inner query before it can run the outer one, so filter as early (more nested) as possible, and order as late as possible.


					SELECT
						columns
					FROM
						(SELECT
							columns
						FROM
							tables
						WHERE
							condition
						)
					WHERE
						condition;

Tables - Reusing

We need a way to refer to the attributes in the output of this inner query, we do that by aliasing the inner query - just like we would a table.


					SELECT
						b.*
					FROM
						(
						SELECT
							a.*,
							distance * 1.60934 AS distance_km
						FROM
							schema_name.flights a
						) b
					WHERE
						distance_km > 1000;

People often use a/b/c etc. for aliases - for short code this is fine but for longer code try to use a short but descriptive alias for each table instead.

Tables - Reusing

Nested queries are great when the inner query is short and you only need to use it once. But what about if you want to use that inner query in multiple places, or the nesting gets very deep and complex?

The solution is Common Table Expressions (CTEs) which allow you to alias a whole query and refer to it at any later point in your code without having to write it out in full every time. These use the WITH keyboard


						WITH alias AS (QUERY),

						alias2 AS (QUERY),

						alias3 AS (QUERY)
						
						SELECT
							columns
						FROM
							tables;

Tables - Reusing

The database will optimise if it needs to reuse a table and will temporarily store it vs just running it once when needed - CTEs can be more efficient than nested queries, not just easier to read!


						WITH km_table as 
						(
							SELECT
								a.*,
								distance * 1.60934 AS distance_km
							FROM
								schema_name.flights a
						)

						SELECT
							*
						FROM
							km_table
						WHERE
							distance_km > 1000
						UNION
						SELECT
							*
						FROM
							km_table
						WHERE
							distance_km < 1000;

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 25-28

Tables - Reusing

Finally we can keep a table for a longer period by storing it permanently in the database. To do this we use the CREATE TABLE statement.

Remember that a statement is different to a a query - a query reads the database and returns a table, a statement can interact with the database in other ways.


						-- Nologging optionally turns off logs for the table creation
						CREATE TABLE your_table_name [NOLOGGING] AS
							query;

It is possible to use CREATE TABLE to create tables without a query by just specifying column names and types, but that is beyond the scope of this training.

Tables - Reusing

Once created you can query the table like any other. Because you created it in your schema, you don't have to specify the schema name.

Remember only you can see this table, to share it with others you have to grant them access.


						CREATE TABLE flights_km NOLOGGING AS
						SELECT
							a.*,
							distance * 1.60934 AS distance_km
						FROM
							schema_name.flights a;

						SELECT
							*
						FROM 
							flights_km;

Tables - Reusing

Once we're done we want to DROP the table to delete it from the database.


						DROP TABLE your_table_name [PURGE];

PURGE removes it from your recycling bin, you can leave it out but make sure to empty the bin in the case!

Tables - Reusing


						DROP TABLE flights_km PURGE;

						SELECT
							*
						FROM 
							flights_km; -- the table no longer exist.

Tables - Reusing

There are plenty of other statements in SQL that interact with tables, you can use TRUNCATE to empty a table, INSERT to add new records, DELETE to remove specific records, and UPDATE to alter existing records.

However, these are beyond the scope of this training.

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 29

Tables - Joining

We now come to one of the most powerful parts of using a relational databases, joining tables.

Joins take place in the FROM part of the query and have the following form:


						SELECT
							columns
						FROM
							table1
							join_type JOIN table2
							ON condition;

Tables - Joining

There are 4 types of joins you can do:

Join type	Description
INNER JOIN	Only return records where they exist in both tables
LEFT/RIGHT JOIN	Return all records in the left/right table and any match in the right/left one if it exists
FULL OUTER JOIN	Return all records in both tables and any match if it exists
CROSS JOIN	Return all combinations of rows across both tables (the Cartesian product of the rows). Not often useful.

Note that CROSS JOIN does not have an ON condition.

Tables - Joining

There are 4 types of joins you can do:

Tables - Joining

WARNING:

When you specify a condition in your join that defines where a match is returned to the record or not, when you specify it in the WHERE this specifies whether the whole record is returned or not.

In the case of an inner join these are the same thing, but for other joins they are not.

First define your join logic, then define your filter criteria separately.

Tables - Joining

For the join examples we're going to use a slim version of the planes table so we can see how each join behaves; let's create that first.


						CREATE TABLE planes_short NOLOGGING AS
						SELECT
							* 
						FROM 
							schema_name.planes
						WHERE
							tailnum = 'N108UW';

Tables - Joining

First let's look at an inner join, where only records that find a match are returned


						SELECT 
							a.*,
							b.*
						FROM 
							schema_name.flights a
							INNER JOIN planes_short b
							/* In this case the columns had the same names, 
							but that won't always (in fact will rarely) be the case */
							ON a.tailnum = b.tailnum;

Tables - Joining

Next the left join, where all records from the first table are returned, plus any matching data from the second


						SELECT 
							a.*,
							b.*
						FROM 
							schema_name.flights a
							LEFT JOIN planes_short b
							ON a.tailnum = b.tailnum;

Tables - Joining

And a right join, just the other way round to the left.


						SELECT 
							a.*,
							b.*
						FROM 
							schema_name.flights a
							RIGHT JOIN planes_short b
							ON a.tailnum = b.tailnum;

Despire the planes_short table having 1 row per plane, because of the join we now have multiple - we'll talk about this in a bit!

Tables - Joining

And a full outer join - we'll use a dummy table to create a fake plane to show that it returns all records from both tables regardless of if they match.


						SELECT 
							a.*,
							b.*
						FROM 
							schema_name.flights a
							FULL OUTER JOIN (SELECT 'FAKETAIL' AS tailnum_2 FROM dual) b
						ON a.tailnum = b.tailnum_2
						-- Defaults to nulls first for descending.
						ORDER BY a.flight DESC;

Tables - Joining

Lastly the cross join - the least used of them all. We'll use 2 tables with 3 rows each to show we get all matches back in 9 rows (3x3).


						SELECT
							*
						FROM 
							(SELECT '1' AS num FROM DUAL
								UNION
							SELECT '2' AS num FROM DUAL
								UNION
							SELECT '3' AS num FROM DUAL) A
						CROSS JOIN
							(SELECT 'A' AS let FROM DUAL
								UNION
							SELECT 'B' AS let FROM DUAL
								UNION
							SELECT 'C' AS let FROM DUAL) B;

Tables - Joining

A note about join conditions

Joins are the place you are most likely to make a mistake that leads to duplicates in your data and going from millions of records to billions.

The reason is if your join is not unique on at least one of the tables (i.e. the condition you give can be used to define each record and distinguish it from others) then each record in both tables will join to multiple in the other, getting out of hand very fast.

Always ask yourself; does my join link 1 row to 1/many rows in my other table? If the answer is no, you need to rethink your logic.

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 30-32

Part 3:

In this section you learned to:

Use the dummy table for testing queries

Use the table output by one query in another

Join tables together

Don't forget to run the error message examples to see common examples of variable based errors!

Part 4:

Aggregation

Grouping and window functions

Part 4:

By the end of this section you will be able to:

Summarise data in defined groups

Add aggregated information to existing tables

Aggregation - Grouping

Often we want to get aggregate information of our data and group it by values in another column - this is similar to what we do in PivotTables in Excel.

To do this we use the GROUP BY clause and summary functions. This goes after WHERE but before ORDER BY.


						SELECT
							group_columns,
							summary_columns
						FROM
							tables
						WHERE
							condition
						GROUP BY 
							group_columns
						ORDER BY 
							columns;

Any non-summarised variable must be listed in your GROUP BY otherwise it will error. You can group by something you don't select but it's a bad idea.

Aggregation - Grouping

There are plenty of aggergation functions and the most common are listed below:

Agg type	Description
COUNT(*)	The total number of records per group
COUNT(column)	The total number of records that are non-null in that variable
SUM(column)	The sum of that variable, must be numeric
AVG(column)	The mean of that variable, must be numeric
MAX(column)	The maximum of that variable
MIN(column)	The minimum of that variable

For a full list visit the Oracle site.

Aggregation - Grouping

Let's look at information about flights out of each airport


						SELECT
							origin,
							COUNT(*) AS num_flights,
							COUNT(dep_time) as num_departed_flights, -- doesn't count records with a null departure time
							SUM(air_time) AS total_airtime,
							AVG(air_time) AS mean_airtime,
							MAX(dep_delay) AS latest_dep,
							MIN(dep_delay) AS earliest_dep
						FROM 
							schema_name.flights
						GROUP BY 
							origin;

Aggregation - Grouping

Let's look at information about flights out of each airport

AND by each destination airport.


						SELECT
							origin,
							dest,
							COUNT(*) AS num_flights,
							COUNT(dep_time) as num_departed_flights,
							SUM(air_time) AS total_airtime,
							AVG(air_time) AS mean_airtime,
							MAX(dep_delay) AS latest_dep,
							MIN(dep_delay) AS earliest_dep
						FROM 
							schema_name.flights
						GROUP BY 
							origin,
							dest;

Aggregation - Grouping

We don't have to group by everything if we want details about the whole dataset.


						SELECT 
							COUNT(*) AS num_flights,
							MAX(dep_delay) AS latest_dep,
							MIN(dep_delay) AS earliest_dep
						FROM
							schema_name.flights;

Aggregation - Grouping

These aggergations becomes even more powerful when combined with a CASE statement.


						SELECT 
							origin,
							SUM( -- SUM uses 1 and 0 as the values
								CASE 
									WHEN dep_delay = 0 AND arr_delay = 0 
										THEN 1 
									ELSE 0 
								END 
								) AS perfect_flights,
							COUNT( -- COUNT uses anything and null as the values
								CASE 
									WHEN dep_delay = 0 AND arr_delay = 0 
										THEN 1 
									ELSE null 
								END 
								) AS perfect_flights2 
						FROM
							schema_name.flights
						GROUP BY
							origin;

Aggregation - Grouping

You can filter on the results on an aggretation by using the HAVING keyword (as a WHERE filters records before they are aggregated).


						SELECT
							tailnum,
							COUNT(*)
						FROM
							schema_name.flights
						GROUP BY
							tailnum
						HAVING
						-- only planes that had at least 50 flights in 2013
							COUNT(*) > 50;

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 33-34

Aggregation - Windowing

There are times we want to add aggregate information to our data but keep it at the record level. We could group and then join back on, but this is inefficient.

Instead we can use something called WINDOW FUNCTIONS (also known as ANALYTICAL FUNCTIONS).

Pretty much all of them have the same format:


						WINDOW_FUNCTION(column) OVER (PARTITION BY columns ORDER BY columns)

Not all of the functions have an argument to the main function, or an ORDER BY clause, but most do.

The PARTITION BY defines your group for that record, and the ORDER BY defines the order within that group.

Aggregation - Windowing

Most GROUP BY functions have an equivalent window function, but there are a few extras as well:

Window Function	Description
ROW_NUMBER()	Numbers records in group from 1 to N (number in group) based on order by
RANK(column)	Rank value within that group, same value for ties (with gaps)
DENSE_RANK(column)	Rank value within that group, same value for ties (with no gaps)
FIRST/LAST_VALUE(column)	The first/last value within that group.
LEAD/LAG(column)	The next/previous value within that group.

For a full list visit the Oracle site. Not covered here is a few extra (optional) arguments to these functions that let you do the calculation to only some range of the rows (e.g. only previous rows for cumulative values, or the previous and next X records for rolling values etc.) For more info see the documentation.

Aggregation - Windowing

Let's look at an example.


						SELECT
							a.*,
							ROW_NUMBER() OVER (
								PARTITION BY year, month, day, carrier 
								ORDER BY sched_dep_time
								) AS day_carrier_flight,
							COUNT(*) OVER (
								PARTITION BY year, month, day, carrier
								) AS day_carrier_flight_total,
							AVG(dep_delay) OVER (
								PARTITION BY year, month, day, carrier
								) AS day_carrier_avg_dep_delay
						FROM
							schema_name.flights a;

Live View

Run some of these queries so far and see the results.

Exercise time!

Attempt questions 35-36

Part 4:

In this section you learned to:

Summarise data in defined groups

Add aggregated information to existing tables

Don't forget to run the error message examples to see common examples of variable based errors!

Order of writing vs execution.

We have seen that often the order we write our code in is not the same order that the database runs it in. Let's quickly review the order of writing a full query, then compare it with how the database will (usually) execute it.

A murder mystery!?

A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.

All the clues to this mystery are buried in a huge database, and you need to use SQL to navigate through this vast network of information. Your first step to solving the mystery is to retrieve the corresponding crime scene report from the police department’s database.

Thank you, and I hope you found this useful!