SQL Training

Purpose & Definitions

Purpose

The goal of this content is to provide A good foundational understanding of Oracle SQL for querying databases to get you at a level to perform analysis on data.

It won't cover management of data within databases, or any way to automate your scripts and code, that is beyond the scope of this content.

Why Use a Database?

What can Databases do? What can't they do?

  • Our databases are designed to work with relational data – meaning they are great and efficient for bringing data together and combining it.
  • Databases are an effective way to store and share information, to make sure everyone has the same and up-to-date versions.
  • Database tables can have millions of rows, more than excel can handle!
  • Databases are fast at working with large volumes of data.
  • The same script can be used to bring back the same output every time, no more having to manually work with the data each time!
  • Make graphs (at least not that look anywhere near good)
  • Drag and drop interface (again, technically yes but not recommended)
  • Many things that R/Python is better for
  • How to access a database?

    SQL Developer

    A free software produced by Oracle it has a basic and clean interface but more complex features for power users.

    Screenshot of SQL developer

    Toad

    An alternative SQL IDE, a license is required for usage.

    Screenshot of Toad SQL IDE

    Many more!

    SQL is just instructions that are sent to the database, so you can edit that plain text code in any text editor, and many tools offer database connections, including VSCode, RStudio, and many more!

    Some terminology...

    What is Google?

  • In our context it is a search engine
  • Used to find all sorts of information and help
  • Learning how to Google something is a very important skill!
  • What is SQL?

  • Structured Query Language
  • A language used to access and manipulate information stored in a database
  • Technically it is used to interact with the database, not just the data stored within it
  • A set of keywords, rules, and inputs that produce an action as an output.
  • What is Oracle SQL?

  • SQL has many different variants, like colloquialisms or accents
  • Most of the words and rules are the same, but some bits may be slightly different. Somethings will work in one variant, but not another or at least have to be done a different way
  • Oracle SQL is the database product and variant sold and supported by Oracle. Oracle SQL has traditionally been the main industry supplier of large business databases, but this is changing in favour of the big 3 cloud vendors
  • There are other differences such as speed, costs, support, and much more that goes into choosing a database variant
  • Other key players in the market are MySQL, SQLServer (Microsoft), SQLite, and many new cloud based solutions such as Google’s BigQuery
  • What is Oracle SQL?

  • The important thing is when you google something about SQL, always include the word “Oracle” in your search
  • NEVER use the Oracle site for help, it’s designed for database experts and is useless to 99% of people
  • What is a database?

  • A data storage solution, traditionally represented by 3 disks on top of each other, representing different harddisks, designed to be optimised for certain operations.
  • We will focus on relational databases that are organised into tables with rows and columns
  • Within the database there are Schemas, small workspaces for a user or set of users
  • A Schema contains tables (and other things)
  • A Table contains records (rows) and variables (columns)
  • In Excel terms, a database if a folder on your laptop, the schema is an excel file (workbook), a table is a sheet, and the rows and columns are rows and columns
  • Tables within a schema are only accessible to that schema’s users, unless rights are granted to other schemas to access them
  • What is a database?

    Display of schema structure in a datbase

    What is a query?

  • A single complete piece of SQL code
  • It must have a certain minimum level of detail (e.g. contain reference to at least one table and a list of columns to return)
  • Returns a table
  • Are executed one at a time
  • Many queries can be stored in a single .sql file, or script
  • Are a subset of statements, these however include code that will create tables, or update information i.e. perform an action that doesn’t have to return a table
  • What is a table?

  • It is the foundation of relational databases, a collection of records (rows) and variables (columns)
  • All entried in a given column are of the same type
  • Tables are stored within a schema and have a name – usually the name has no spaces, doesn’t start with a number, and is in all caps (but not always because some people are sadists)
  • Tables are the output/return type of a query or SELECT statement
  • Can have 0 rows, must have at least 1 column
  • What is a data type?

  • The type of a data column determines what it can contain and what operations/functions you can perform on it
  • Numeric objects in Oracle are usually either integer (whole number) or number (decimal number). Databases are usually pretty good at automatically converting integers to decimal numbers if required
  • What is a data type?

  • The type of a data column determines what it can contain and what operations/functions you can perform on it
  • Character objects in Oracle are usually either char/nchar (a fixed length, padded with usually a space) or varchar2/nvarchar2 (a variable length). These are also referred to as strings, as in a string of characters.
  • Strings as data in Oracle are written using single quotes ‘string’ Strings as database objects (e.g. tables, column names etc) in Oracle are written using double quotes “table_name”

    What is a data type?

  • The type of a data column determines what it can contain and what operations/functions you can perform on it
  • Date objects in Oracle are usually stored as date, sometimes timestamp. In Oracle ALL date data has a time component – there is no data type without a time component – you can set it to zero but it’s still there. Date formatting is also based on local settings so there are special functions to interact with dates, but we’ll see more about this later on…
  • What is a data type?

  • The type of a data column determines what it can contain and what operations/functions you can perform on it
  • Boolean objects do not exist as a data type for a variable in Oracle; these are TRUE and FALSE values so you must store any column for this as either a string or a number (1/0)
  • What is a function?

  • Generally, a function is something that takes an input (arguments), does a calculation/computation, and in all our cases will return an output
  • For the purposes of this content, functions will usually take a variable (column) and potentially some additional information as arguments
  • For the purposes of this content, functions will usually return a variable (column) as an output, with the same number of rows as the input
  • Many functions will look like function_name(arg1, arg2, …, argn) but not all as we will see…
  • Part 0:

    Comments,

    Syntax,

    & Error messages

    Part 0:

    By the end of this section you will be able to:

  • Explain the use of comments in code
  • Explain the use of whitespace and case in SQL
  • Explain the benefit of error messages
  • Explain what a null value is
  • Comments

    A comment is the most important thing in any language.

    They let you write notes in and around your code that the computer ignores.

    Comments

    A comment is the most important thing in any language.

    They let you write notes in and around your code that the computer ignores.

    
    							-- Comments in SQL are done like this
    						

    Comments

    A comment is the most important thing in any language.

    They let you write notes in and around your code that the computer ignores.

    
    							-- Comments in SQL are done like this
    							/* Or multi-line comments
    								are done like this */
    						

    Comments

    A comment is the most important thing in any language.

    They let you write notes in and around your code that the computer ignores.

    
    							-- Comments in SQL are done like this
    							/* Or multi-line comments
    								are done like this */
    							select * from table; -- you can comment at the end of a line
    							select * /* or even in the middle*/ table;
    						

    Syntax

    Keywords in SQL are case insensitive so you can type them however you like.

    Syntax

    Keywords in SQL are case insensitive so you can type them however you like.

    
    							SELECT ORIGIN, DEST FROM FLIGHTS;
    						

    Syntax

    Keywords in SQL are case insensitive so you can type them however you like.

    Whiteshape is also ignored in SQL, so you can layout your code however you like.

    
    							SELECT ORIGIN, DEST FROM FLIGHTS;
    						

    Syntax

    Keywords in SQL are case insensitive so you can type them however you like.

    Whiteshape is also ignored in SQL, so you can layout your code however you like.

    
    							SELECT ORIGIN, DEST FROM FLIGHTS;
    
    							select 
    								origin, 
    								dest 
    							from 
    								flights;
    						

    Syntax

    However, being consistent and making your code easy to read is important.

    
    							SELECT ORIGIN, DEST FROM FLIGHTS;
    
    							select 
    								origin, 
    								dest 
    							from 
    								flights;
    						

    Syntax

    However, being consistent and making your code easy to read is important.

    
    							/* Three equivalent queries despite case and whitespace */
    							SELECT ORIGIN, DEST FROM FLIGHTS;
    
    							select 
    								origin, 
    								dest 
    							from 
    								flights;
    
    							SeLeCt      OrIgIn,DeSt        fRoM      fLiGhTs;
    						

    Error Messages

    Error messages try to tell you what you've done wrong in your code when it doesn't work.

    Learning to read them and understand what they mean is invaluable, and we will see some as we go through.

    Nulls

    Finally, empty values in SQL have a special type call null.

    This isn't an empty string or 0 or NA, it is a special thing all by itself.

    Our Data

    Display of schema structure in a datbase

    Part 1:

    Variables

    Selecting and creating them

    Part 1:

    By the end of this section you will be able to:

  • Return all or some columns from a table
  • Create and return your own columns based on functions
  • Create columns with conditional values
  • Alias columns and tables
  • Use common string functions
  • Variables

    When you write a query, you are asking the database to produce you a table. This table need to have columns, so we need to tell the database which variables we need.

    Some of these variables might already exist, some might need to be created - we'll learn how to do both.

    Variables

    All SQL queries take the following minimum form.

    
    							SELECT
    								variables
    							FROM
    								DATABASE.SCHEMA.TABLE;
    						

    Variables

    All SQL queries take the following minimum form.

    Every query has a FROM keyword followed by a reference to a table, this tells the database where to look for our variables.

    
    							SELECT
    								variables
    							FROM
    								DATABASE.SCHEMA.TABLE;
    						

    Variables

    All SQL queries take the following minimum form.

    Every query has a SELECT keyword followed by list of variables; these might be existing columns in the table or ones created.

    
    							SELECT
    								variables
    							FROM
    								DATABASE.SCHEMA.TABLE;
    						

    Variables

    All SQL queries take the following minimum form.

    Every query ends with a semi-colon. This is like a full stop for the database so it knows where one query stops and the next one begins.

    
    							SELECT
    								variables
    							FROM
    								DATABASE.SCHEMA.TABLE;
    						

    Variables - Existing

    We can select every variable from our table using the * special character.

    
    							SELECT
    								*
    							FROM
    								schema_name.flights;
    						

    Variables - Existing

    We can select every veriable from our table using the * special character.

    Or we can swap out the * and list the column names ourselves, seperated by a comma

    
    							SELECT
    								*
    							FROM
    								schema_name.flights;
    						

    Variables - Existing

    Or we can swap out the * and list the column names ourselves, seperated by a comma

    
    							SELECT
    								year, 
    								month, 
    								day, 
    								dep_time, 
    								sched_dep_time, 
    								dep_delay, 
    								arr_time, 
    								sched_arr_time, 
    								arr_delay, 
    								carrier, 
    								flight, 
    								tailnum, 
    								origin, 
    								dest, 
    								air_time, 
    								distance, 
    								hour, 
    								minute, 
    								time_hour
    							FROM
    								schema_name.flights;
    						

    Variables - Existing

    Or we can just select some of them

    
    							SELECT
    								year, 
    								month, 
    								day, 
    								dep_time, 
    								sched_dep_time, 
    								carrier, 
    								flight, 
    								tailnum, 
    								origin, 
    								dest
    							FROM
    								schema_name.flights;
    						

    Live View

    Run some of these queries so far and see the results.

    Variables - New

    We can ask the database for variables that aren't in our table, we just need to tell it what the data is. In this case, that data is the value 123.

    
    							SELECT
    								year, 
    								month, 
    								day, 
    								dep_time, 
    								sched_dep_time, 
    								carrier, 
    								flight, 
    								tailnum, 
    								origin, 
    								dest,
    								123
    							FROM
    								schema_name.flights;
    						
    Note that we had to add a comma to the end of the list before our new variable.

    Variables - New

    It would be nice to give this variable a name, we can do that using the as keyword and call it whatever we like (within reason). This is known as aliasing.

    
    							SELECT
    								year, 
    								month, 
    								day, 
    								dep_time, 
    								sched_dep_time, 
    								carrier, 
    								flight, 
    								tailnum, 
    								origin, 
    								dest,
    								123 as new_column
    							FROM
    								schema_name.flights;
    						
    Column names should start with a letter, have only _ as punctuation and be less than 30 characters long. Some of these rules can but should not be broken.

    Variables - New

    Technically we don't need to use as but for column names it is good practice.

    
    							SELECT
    								year, 
    								month, 
    								day, 
    								dep_time, 
    								sched_dep_time, 
    								carrier, 
    								flight, 
    								tailnum, 
    								origin, 
    								dest,
    								123 new_column
    							FROM
    								schema_name.flights;
    						
    We'll see in a second you can alias a table as well, but you cannot use the as word there!

    Live View

    Run some of these queries so far and see the results.

    Variables - New

    NOTE that we can't mix * and extra variables.

    
    							-- This code will error
    							SELECT 
    								*,
    								123 as new_column
    							FROM
    								schema_name.flights;
    						

    Variables - New

    NOTE that we can't mix * and extra variables.

    This is because * doesn't mean everything from our table, it means EVERYTHING, and we can't ask for more than everything (it's like taking infinity+1).

    
    							-- This code will error
    							SELECT 
    								*,
    								123 as new_column
    							FROM
    								schema_name.flights;
    						

    Variables - New

    What we need to do it tell the database we want everything from the table and note the universe. So first, it's important to understand the following are the same query, one is shorthand for the other.

    
    							SELECT 
    								tailnum
    							FROM
    								schema_name.flights;
    
    							SELECT 
    								schema_name.flights.tailnum
    							FROM
    								schema_name.flights;
    						

    Variables - New

    So if we want everything from the table, we could write the following, and this will work, but is pretty annoying to type.

    
    							SELECT 
    								schema_name.flights.*,
    								123 as new_column
    							FROM
    								schema_name.flights;
    						

    Variables - New

    So if we want everything from the table, we could write the following, and this will work, but is pretty annoying to type.

    What if the same way we aliased columns we alias our table? Let's call it a

    
    							SELECT 
    								a.*,
    								123 as new_column
    							FROM
    								schema_name.flights a;
    						

    Variables - New

    SIDENOTE: You can break some of those column name rules using double quotes ("), these are used when you talk about database objects (tables, columns etc). Use single quotes (') for text data in the database.

    
    							SELECT 
    								a.*,
    								123 as "new column"
    							FROM
    								schema_name.flights a;
    						

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 1-4.

    Variables - Functions

    Having a column with all the same value isn't really that useful.

    Let's create create some variables using existing ones next.

    From now on for simplicity we will only show the relevant parts of the code, not all the variables as shown in the training file.

    Variables - Functions

    We can start by doing some basic maths on a numeric column.

    
    							SELECT 
    								month,
    								month + 1 AS next_month, 
    								...
    							FROM
    								schema_name.flights;
    						

    Variables - Functions

    We can start by doing some basic maths on a numeric column.

    We don't even have to keep the original column.

    
    							SELECT 
    								month + 1 AS next_month, 
    								...
    							FROM
    								schema_name.flights;
    						

    Variables - Functions

    We CAN'T use our new column in another calculation (we'll see way to do this later).

    
    							-- This code will error
    							SELECT 
    								month + 1 AS next_month,
    								next_month + 1 AS next_next_month 
    								...
    							FROM
    								schema_name.flights;
    						

    Variables - Functions

    Now let's take a look at some string functions that will be useful for when we're working with text.

    
    							SELECT
    								carrier, 
    								...
    							FROM
    								schema_name.flights;
    						

    Variables - Functions

    lower takes a string and makes it lowercase.

    
    							SELECT
    								carrier, 
    								lower(carrier),
    								...
    							FROM
    								schema_name.flights;
    						

    Variables - Functions

    upper takes a string and makes it uppercase.

    
    							SELECT
    								carrier, 
    								lower(carrier),
    								upper(lower(carrier)),
    								...
    							FROM
    								schema_name.flights;
    						
    Notice how we've nested functions here, this is a useful trick.

    Variables - Functions

    substr takes 3 arguments, the first 2 are required and are the string to cut and the start position. The third argument is optional and is the length of the substring.

    
    							SELECT
    								carrier, 
    								lower(carrier),
    								upper(lower(carrier)),
    								tailnum,
    								substr(tailnum, 3),
    								substr(tailnum, 1, 3),
    								substr(tailnum, 3, 2),
    								...
    							FROM
    								schema_name.flights;
    						
    This is like LEFT, MID, and RIGHT from Excel all in one.

    Variables - Functions

    length tells you have many characters are in a string.

    
    							SELECT
    								carrier, 
    								lower(carrier),
    								upper(lower(carrier)),
    								tailnum,
    								substr(tailnum, 3),
    								substr(tailnum, 1, 3),
    								substr(tailnum, 3, 2),
    								length(tailnum)
    								...
    							FROM
    								schema_name.flights;
    						

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 5-7.

    Variables - Case Statements

    Sometimes we don't want to do the same thing to every record, we might want to do different calculations based on some conditions.

    We do this by something called a case statement.

    Variables - Case Statements

    Case statements take this form:

    
    							CASE 
    								WHEN condition THEN value_1
    								WHEN condition THEN value_2
    								...
    								WHEN condition THEN value_n
    								ELSE value_default
    							END
    						

    Variables - Case Statements

    Case statements take this form:

  • condition is some boolean statement (TRUE or FALSE test)
  • 
    							CASE 
    								WHEN condition THEN value_1
    								WHEN condition THEN value_2
    								...
    								WHEN condition THEN value_n
    								ELSE value_default
    							END
    						

    Variables - Case Statements

    Case statements take this form:

  • condition is some boolean statement (TRUE or FALSE test)
  • value is the output for record if the test is TRUE
  • 
    							CASE 
    								WHEN condition THEN value_1
    								WHEN condition THEN value_2
    								...
    								WHEN condition THEN value_n
    								ELSE value_default
    							END
    						

    Variables - Case Statements

    Case statements take this form:

  • condition is some boolean statement (TRUE or FALSE test)
  • value is the output for record if the test is TRUE
  • Test are evaluated in order, if the first test is true it won't check the rest.
  • 
    							CASE 
    								WHEN condition THEN value_1
    								WHEN condition THEN value_2
    								...
    								WHEN condition THEN value_n
    								ELSE value_default
    							END
    						

    Variables - Case Statements

    Case statements take this form:

  • condition is some boolean statement (TRUE or FALSE test)
  • value is the output for record if the test is TRUE
  • Test are evaluated in order, if the first test is true it won't check the rest.
  • If none are true, the ELSE condition triggers (id you don't provide one, it defaults to NULL.
  • 
    							CASE 
    								WHEN condition THEN value_1
    								WHEN condition THEN value_2
    								...
    								WHEN condition THEN value_n
    								ELSE value_default
    							END
    						

    Variables - Case Statements

    REMEMBER: every CASE must have an END and every WHEN must have a THEN.

    
    							CASE 
    								WHEN condition THEN value_1
    								WHEN condition THEN value_2
    								...
    								WHEN condition THEN value_n
    								ELSE value_default
    							END
    						

    Variables - Case Statements

    To write these boolean/logical tests we use comparison operators like the following:

    Symbol Comparison
    = Equal
    !=, <>, ^= Unequal (not equal)
    <, <= Less than, Less than or equal to
    >, >= Greater than, Greater than or equal to
    [NOT] IN Set membership (is the element in the set/list provided). Set provided in ('Elem1', 'Elem2', ..., 'ElemN') format
    [NOT] LIKE Pattern match (is the element like the provided pattern). Uses % as 0-Inf character wildcard, _ for single character wildcard
    [NOT] BETWEEN Shorthand for x <= y and y <= z becomes y between x and z
    The [NOT] is optional

    Variables - Case Statements

    And we can combine them using the logical operators:

    Operation Meaning
    AND
    OR
    NOT
    ()

    Truth Table Quiz!

    A B A AND B A OR B NOT A
    TRUE TRUE TRUE TRUE FALSE
    TRUE FALSE FALSE TRUE -
    FALSE TRUE FALSE TRUE TRUE
    FALSE FALSE FALSE FALSE -

    Variables - Case Statements

    And we can combine them using the logical operators:

    Operation Meaning
    AND Logical AND, both statements must be TRUE
    OR Logical OR, at least one statement must be TRUE
    NOT Logical negation, converts TRUE to FALSE and vice versa
    () Brackets, force the evaluation of the logical statement inside before any outside, i.e. (2+7)*5 != 2+7*5

    Variables - Case Statements

    Let's look at some examples

    
    							CASE 
    								WHEN sched_dep_time < 1200 THEN 'AM'
    								ELSE 'PM'
    							END AS DEP_AM_PM,
    
    							CASE 
    								WHEN sched_dep_time BETWEEN 0 AND 0800 THEN 'Early Morning'
    								WHEN sched_dep_time BETWEEN 0801 AND 1159 THEN 'Morning'
    								WHEN sched_dep_time BETWEEN 1200 AND 1800 THEN 'Afternoon'
    								WHEN sched_dep_time BETWEEN 1801 AND 2100 THEN 'Evening'
    								WHEN sched_dep_time BETWEEN 2100 AND 2359 THEN 'Night'
    							END AS DEP_TIME_DETAIL, -- Notice no ELSE
    
    							CASE 
    								WHEN length(sched_arr_time) != 4 THEN 'AM'
    								WHEN substr(sched_arr_time, 1, 2) < 12 THEN 'AM'
    								WHEN substr(sched_arr_time, 1, 2) > 12 THEN 'PM'
    								ELSE 'Midday'
    							END AS ARR_AM_PM,
    							...
    						

    Variables - Case Statements

    Now examples using in

    
    							CASE
    								WHEN dest in ('JFK', 'LGA', 'EWR') THEN 'NYC'
    								ELSE 'Other'
    							END AS dest_nyc_check_in,
    
    							-- see how much easier it is than writing ors
    							CASE
    								WHEN dest = 'JFK' or dest = 'LGA' or dest = 'EWR' THEN 'NYC'
    								ELSE 'Other'
    							END AS dest_nyc_check_ors
    							...
    						
    Notice that comparisons always have to have a left and a right hand side, even when it's the same LHS

    Variables - Case Statements

    String values are case sensitive

    
    							CASE
    								WHEN origin in ('JFK', 'LGA', 'EWR') THEN 'NYC'
    								ELSE 'Other'
    							END AS origin_nyc_check1,
    
    							CASE
    								WHEN lower(origin) in ('JFK', 'LGA', 'EWR') THEN 'NYC'
    								ELSE 'Other'
    							END AS origin_nyc_check2,
    
    							CASE
    								WHEN origin in ('JFK', 'lga', 'ewr') THEN 'NYC'
    								ELSE 'Other'
    							END AS origin_nyc_check3
    							...
    						

    Variables - Case Statements

    And finally the like matches.

    
    							-- Any number of character (including 0) before, and after a 7
    							CASE
    								WHEN tailnum LIKE '%7%' THEN 1
    								ELSE 0
    							END AS contains_7,
    
    							-- Any number of characters before a capital Q
    							CASE 
    								WHEN tailnum LIKE '%Q'	THEN 1 
    								ELSE 0
    							END AS ends_in_q,
    
    							-- A NOT acting on the output of a logical statement
    							CASE 
    								WHEN NOT (tailnum LIKE '%Q') THEN 1 
    								ELSE 0
    							END AS ends_in_not_q,
    
    							-- A NOT LIKE, specifically any record that doesn't match the pattern, 
    							-- slightly different to above
    							CASE WHEN tailnum NOT LIKE 'N5_5%' THEN 1 
    								ELSE 0
    							END AS weird_test
    							...
    						

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 8-11.

    Variables - Dates

    Finally let's look at dates in Oracle SQL.

    Date data types include both a date and a time, there is no time only value in Oracle.

    There is another type called a timestamp that contains milliseconds and timezones, most functions that work on dates also work on timestamps.

    Variables - Dates

    Just like in Excel, dates in Oracle are just numbers - meaning a difference of 24 hours is 1, 12 hours is 0.5 etc.

    However unlike excel because they are a specific datatype there's some specific functions to work with them.

    Variables - Dates

    Function Description
    TO_DATE(variable,   fmt) Converts the variable (or string) into a date   by giving it the format the variable is in
    TRUNC(date [, fmt]) Sets the time component of the date to 00:00:00 by default, or will round   down to the format if provided
    sysdate A system variable that is the current (local) date and time
    NEXT_DAY(date, day) Returns the next date of the given day, after the date provided e.g.   takes a Saturday at 00:00:01 to the NEXT Saturday
    TO_CHAR(variable, fmt) Converts the variable (usually a date or a number) into the given format.   This is the method to format a date in a given way NOT substr
    EXTRACT(value FROM date) Returns the given value e.g. year from the date and returns it as a   number. To extract units less than a day you must use a timestamp
    TO_TIMESTAMP(variable, fmt) Converts the variable (or string) into a timestamp by giving it the   format the variable is in

    Variables - Dates

    I'm going to use || in the code which concatenates (pastes) 2 strings together.
    
    						SELECT 
    							time_hour,
    							time_hour + 1, -- add 1 day
    							time_hour + 5/24/60, -- add 5 minutes
    							TRUNC(time_hour), -- set time component to 0
    							TRUNC(time_hour, 'MM'), -- set to first of month
    							NEXT_DAY(time_hour, 'Saturday') - 1, -- Set to Friday inclusive for a week
    							TO_CHAR(time_hour, 'HH24:MI DAY DD MONTH YYYY'), -- format date in a specific format
    							TO_CHAR(time_hour, 'HH24:MI Day dd month yyyy'), -- notice how case changes the output
    							-- convert our columns into a date
    							TO_DATE(year||'-'||month||'-'||day||' '||hour||':'||minute, 'YYYY-MM-DD HH24:MI') -- convert our columns into a date
    						FROM
    							schema_name.flights;
    						

    Variables - Dates

    Using TO_DATE every time can be annoying, so we can specify a date in another way. This is easier to read and more efficient for the database as well.

    
    						SELECT 
    							DATE '2019-01-20' AS fixed_date
    						FROM
    							schema_name.flights;
    						
    The format is ALWAYS YYYY-MM-DD and is a string.

    Variables - Dates

    You should NEVER use substr on a date because the result you gets depend on how the user has their system set up.

    
    						-- Set a specific date format for your local session and sub-string our dates
    						ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MM-YYYY HH24:MI:SS';
    						SELECT 
    							time_hour,
    							substr(time_hour, 9, 2)
    						FROM
    							schema_name.flights;
    						-- returns 2 digit year
    						
    						-- Change the format and check the results
    						ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-DD-MM HH24:MI:SS';
    						SELECT 
    							time_hour,
    							substr(time_hour, 9, 2)
    						FROM
    							schema_name.flights;
    						-- returns 2 digit month
    					
    Always use to_char instead.

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 12-15.

    Part 1:

    In this section you learned to:

  • Return all or some columns from a table
  • Create and return your own columns based on functions
  • Create columns with conditional values
  • Alias columns and tables
  • Use common string functions
  • Don't forget to run the error message examples to see common examples of variable based errors!

    Part 2:

    Records

    Filtering and ordering them

    Part 2:

    By the end of this section you will be able to:

  • Filter records based on conditions
  • Order and only return a fixed number of records
  • Combine record sets into a single output
  • Records - Filtering

    We will start by learning how to filter records, we do this using a WHERE clause, which goes after the FROM clause.

    
    					SELECT
    						columns
    					FROM
    						tables
    					WHERE
    						condition
    					;
    					
    The condition in this clause is the same type as those we saw in the CASE statements.

    Records - Filtering

    Let's take a look at some examples:

    
    					SELECT 
    						*
    					FROM
    						schema_name.flights
    					WHERE
    						arr_delay = 0
    						AND dep_delay = 0
    					;
    					

    Records - Filtering

    Let's take a look at some examples:

    Bonus tip: You can use an always true condition to make it easier to comment out individual checks when you're still developing your code

    
    					SELECT 
    						*
    					FROM
    						schema_name.flights
    					WHERE
    						1 = 1
    						--AND arr_delay = 0
    						AND dep_delay = 0;
    					;
    					

    Records - Filtering

    Let's take a look at some examples:

    and of course we can use our favourite boolean logic!

    
    						SELECT
    						*
    					FROM
    						schema_name.flights
    					WHERE
    						1 = 1
    						-- flights scheduled to depart in the first 10 or last 5 hours of a day
    						AND (hour < 10 OR hour > 19)
    						AND dep_delay = 0 -- left on time
    					;
    					

    Records - Filtering

    Let's take a look at some examples:

    A reminder that we can use IS [NOT] NULL as a comparison. Useful when we want to check how many records have missing (or not) data

    
    					SELECT
    						*
    					FROM
    						schema_name.flights
    					WHERE
    						dep_delay IS NULL
    					;
    					

    Records - Filtering

    Just like before, we can't use a renamed column in a WHERE clause.

    
    					-- Both of these will error
    					SELECT 
    						TRUNC(time_hour) AS flight_day
    					FROM
    						schema_name.flights
    					WHERE
    						flight_day > DATE '2013-06-01';
    					
    					SELECT 
    						tailnum AS tailnumber
    					FROM
    						schema_name.flights
    					WHERE
    						tailnumber LIKE 'N%';
    					

    Records - Filtering

    Just like before, we can't use a renamed column in a WHERE clause.

    The reason for this is because the WHERE clause runs before the SELECT does (even though it's written after).

    Records - Filtering

    An upside of this is that you can filter on a column that you don't actually output.

    
    					SELECT 
    						tailnum 
    					FROM
    						schema_name.flights
    					WHERE
    						origin = 'JFK'; -- only flights out of JFK airport
    					

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 16-19.

    Records - Ordering

    Next let's look at how we can order our data

    Ordering data is one of the most inefficient things a computer can do and databases are not designed to be fast at it, try to avoid ordering unless you have to, only do it once, or consider doing it in some other tool after exporting the data
    
    					SELECT
    						columns
    					FROM
    						tables
    					WHERE
    						condition
    					ORDER BY 
    						columns;
    					

    Records - Ordering

    We can order by a single column

    
    					SELECT 
    						*
    					FROM
    						schema_name.flights
    					WHERE 
    						dep_delay > 0
    					ORDER BY 
    						dep_delay;
    					

    Records - Ordering

    Or multiple columns for when there is a tie

    
    					SELECT 
    						*
    					FROM
    						schema_name.flights
    					WHERE 
    						dep_delay > 0
    					ORDER BY 
    					-- sort by departure delay THEN arrival delay if there is a tie.
    						dep_delay, 
    						arr_delay; 
    					

    Records - Ordering

    By default records are ordered in ascending order

    
    					SELECT 
    						*
    					FROM
    						schema_name.flights
    					WHERE 
    						dep_delay > 0
    					ORDER BY 
    					-- Least delayed departing, ties are sorted by least delayed when arriving.
    						dep_delay, 
    						arr_delay; 
    					

    Records - Ordering

    By default records are ordered in ascending order

    but we can change that

    
    					SELECT 
    						*
    					FROM
    						schema_name.flights
    					WHERE 
    						dep_delay > 0
    					ORDER BY 
    					-- MOST delayed departing, ties are sorted by least delayed when arriving.
    						dep_delay DESC, 
    						arr_delay; 
    					

    Records - Ordering

    By default records are ordered in ascending order

    but we can change that

    and we have to specify it for each column.

    
    					SELECT 
    						*
    					FROM
    						schema_name.flights
    					WHERE 
    						dep_delay > 0
    					ORDER BY 
    					-- MOST delayed departing, ties are sorted by MOST delayed when arriving.
    						dep_delay DESC, 
    						arr_delay DESC; 
    					

    Records - Ordering

    Finally, we can be a bit lazy and because the ORDER BY runs after the SELECT we can actually just use numbers instead of column names.

    This is also why we can use column aliases in the ORDER BY and can't use columns we didn't SELECT

    
    					SELECT 
    						flight,
    						dep_delay,
    						arr_delay
    					FROM
    						schema_name.flights
    					WHERE 
    						dep_delay > 0
    					ORDER BY 
    						2 desc, 
    						3; 
    					
    Generally this is a bad idea, it's hard to read and if you change the order in your SELECT it will break.

    Records - Distinct

    It's rare that you want duplicates in your data. Usually it means you've done a join (next section) wrong, or your source data has an issue, or you removed too many variables.

    If you are comfortable as to why you have duplicates but want to remove them you can use a SELECT DISTINCT to just get unique records.

    
    					SELECT DISTINCT 
    						origin,
    						dest
    					FROM
    						schema_name.flights;
    					
    DISTINCT acts on all selected columns, not just the first one.

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 20-21.

    Records - Combining

    Sometimes we want to combine our records in one way or another, if they are from the same table there is often a clever way to do this using ORs and CASE statements, but not always. Other times we might want to:

  • Combine records (with or without checking for duplicates) and return them all (UNION [ALL])
  • Find records that exist in both record sets and just return those (INTERSECT)
  • Find records that exist in one set, but not in another and return those from the first (MINUS)
  • Records - Combining

    Union types in SQL

    Records - Combining

    We'll use the airlines table with some specific filters to illustrate these concepts
    
    						/* UNION records without duplicates */
    						SELECT 
    							*
    						FROM 
    							schema_name.airlines
    						WHERE
    							carrier in ('9E', 'AA', 'AS')
    						UNION
    						SELECT 
    							*
    						FROM 
    							schema_name.airlines
    						WHERE
    							carrier in ('AS', 'B6', 'DL');
    						
    						
    						/* UNION ALL records with duplicates */
    						SELECT 
    							*
    						FROM 
    							schema_name.airlines
    						WHERE
    							carrier in ('9E', 'AA', 'AS')
    						UNION ALL
    						SELECT 
    							*
    						FROM 
    							schema_name.airlines
    						WHERE
    							carrier in ('AS', 'B6', 'DL');
    						
    						
    						/* Find INTERESECTion of records */
    						SELECT 
    							*
    						FROM 
    							schema_name.airlines
    						WHERE
    							carrier in ('9E', 'AA', 'AS')
    						INTERSECT
    						SELECT 
    							*
    						FROM 
    							schema_name.airlines
    						WHERE
    							carrier in ('AS', 'B6', 'DL');
    						
    						
    						/* MINUS any matching records from the second table from the first */
    						SELECT 
    							*
    						FROM 
    							schema_name.airlines
    						WHERE
    							carrier in ('9E', 'AA', 'AS')
    						MINUS
    						SELECT 
    							*
    						FROM 
    							schema_name.airlines
    						WHERE
    							carrier in ('AS', 'B6', 'DL');
    					

    Records - Fetch

    Though it is very rare, sometimes you might only want to return a specific number of rows.

    For this we use the keyword FETCH and it's associated terms.

    FETCH is the very last thing a database will run as it has to complete ordering first, so it would be any quicker to just return 5 rows instead of 500
    
    						SELECT
    							columns
    						FROM
    							tables
    						WHERE
    							condition
    						ORDER BY 
    							columns
    						[OFFSET number ROWS]
    						FETCH NEXT number [PERCENT] ROWS ONLY/WITH TIES;
    					

    Records - Fetch

    Though it is very rare, sometimes you might only want to return a specific number of rows.

    For this we use the keyword FETCH and it's associated terms.

    
    						SELECT 
    							*
    						FROM
    							schema_name.flights
    						ORDER BY 
    							distance desc
    						OFFSET 5 ROWS
    						FETCH NEXT 5 ROWS WITH TIES;
    					
    This will return more than 5 records because of the option with ties.

    Records - Fetch

    Though it is very rare, sometimes you might only want to return a specific number of rows.

    For this we use the keyword FETCH and it's associated terms.

    
    						SELECT 
    							*
    						FROM
    							schema_name.flights
    						ORDER BY 
    							distance desc
    						OFFSET 5 ROWS
    						FETCH NEXT 5 ROWS ONLY;
    					

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 22-24

    Part 2:

    In this section you learned to:

  • Filter records based on conditions
  • Order and only return a fixed number of records
  • Combine record sets into a single output
  • Don't forget to run the error message examples to see common examples of variable based errors!

    Part 3:

    Tables

    Reusing and joinging them

    Part 3:

    By the end of this section you will be able to:

  • Use the dummy table for testing queries
  • Use the table output by one query in another
  • Join tables together
  • Tables - Dual

    Sometimes we want to test our query or a function and don't want to query a whole table to do it.

    To do this we can use a special table called dual which has a single row and a single column, with a value of 'X'.

    
    						SELECT
    							* 
    						FROM
    							dual;
    					

    Tables - Dual

    One good use for this is often to test out some date or string functions on dummy data to make sure you get it right

    
    					SELECT
    						--Some random date manipulation as a test
    						TRUNC(SYSDATE) - 5 + 1/24 + 14/24/60, 
    						-- should return Friday
    						NEXT_DAY(TRUNC(SYSDATE), 'SAT') - 1 AS week_end_date 
    					FROM
    						dual;
    					

    Tables - Reusing

    We've seen multiple times that we can't use a column we create/alias in a query within the same query (except in ordering). This was because of the order the query actualy runs in vs how we see it written.

    But we know that a query returns a table, so we could use that table as the input to another query!

    
    					SELECT
    						columns
    					FROM
    						table
    					WHERE
    						condition;
    					

    Tables - Reusing

    We've seen multiple times that we can't use a column we create/alias in a query within the same query (except in ordering). This was because of the order the query actualy runs in vs how we see it written.

    But we know that a query returns a table, so we could use that table as the input to another query!

    
    					SELECT
    						columns
    					FROM
    						(SELECT
    							columns
    						FROM
    							tables
    						WHERE
    							condition
    						)
    					WHERE
    						condition;
    					

    Tables - Reusing

    This is known as a nested query because the inner query is nested entirely within the outer one.

    We can do this as many times as we like, but the database does have to complete an inner query before it can run the outer one, so filter as early (more nested) as possible, and order as late as possible.

    
    					SELECT
    						columns
    					FROM
    						(SELECT
    							columns
    						FROM
    							tables
    						WHERE
    							condition
    						)
    					WHERE
    						condition;
    					

    Tables - Reusing

    We need a way to refer to the attributes in the output of this inner query, we do that by aliasing the inner query - just like we would a table.

    
    					SELECT
    						b.*
    					FROM
    						(
    						SELECT
    							a.*,
    							distance * 1.60934 AS distance_km
    						FROM
    							schema_name.flights a
    						) b
    					WHERE
    						distance_km > 1000;
    					
    People often use a/b/c etc. for aliases - for short code this is fine but for longer code try to use a short but descriptive alias for each table instead.

    Tables - Reusing

    Nested queries are great when the inner query is short and you only need to use it once. But what about if you want to use that inner query in multiple places, or the nesting gets very deep and complex?

    The solution is Common Table Expressions (CTEs) which allow you to alias a whole query and refer to it at any later point in your code without having to write it out in full every time. These use the WITH keyboard

    
    						WITH alias AS (QUERY),
    
    						alias2 AS (QUERY),
    
    						alias3 AS (QUERY)
    						
    						SELECT
    							columns
    						FROM
    							tables;
    					

    Tables - Reusing

    The database will optimise if it needs to reuse a table and will temporarily store it vs just running it once when needed - CTEs can be more efficient than nested queries, not just easier to read!
    
    						WITH km_table as 
    						(
    							SELECT
    								a.*,
    								distance * 1.60934 AS distance_km
    							FROM
    								schema_name.flights a
    						)
    
    						SELECT
    							*
    						FROM
    							km_table
    						WHERE
    							distance_km > 1000
    						UNION
    						SELECT
    							*
    						FROM
    							km_table
    						WHERE
    							distance_km < 1000;
    					

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 25-28

    Tables - Reusing

    Finally we can keep a table for a longer period by storing it permanently in the database. To do this we use the CREATE TABLE statement.

    Remember that a statement is different to a a query - a query reads the database and returns a table, a statement can interact with the database in other ways.
    
    						-- Nologging optionally turns off logs for the table creation
    						CREATE TABLE your_table_name [NOLOGGING] AS
    							query;
    					
    It is possible to use CREATE TABLE to create tables without a query by just specifying column names and types, but that is beyond the scope of this training.

    Tables - Reusing

    Once created you can query the table like any other. Because you created it in your schema, you don't have to specify the schema name.

    Remember only you can see this table, to share it with others you have to grant them access.
    
    						CREATE TABLE flights_km NOLOGGING AS
    						SELECT
    							a.*,
    							distance * 1.60934 AS distance_km
    						FROM
    							schema_name.flights a;
    
    						SELECT
    							*
    						FROM 
    							flights_km;
    					

    Tables - Reusing

    Once we're done we want to DROP the table to delete it from the database.

    
    						DROP TABLE your_table_name [PURGE];
    					
    PURGE removes it from your recycling bin, you can leave it out but make sure to empty the bin in the case!

    Tables - Reusing

    
    						DROP TABLE flights_km PURGE;
    
    						SELECT
    							*
    						FROM 
    							flights_km; -- the table no longer exist.
    					

    Tables - Reusing

    There are plenty of other statements in SQL that interact with tables, you can use TRUNCATE to empty a table, INSERT to add new records, DELETE to remove specific records, and UPDATE to alter existing records.

    However, these are beyond the scope of this training.

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 29

    Tables - Joining

    We now come to one of the most powerful parts of using a relational databases, joining tables.

    Joins take place in the FROM part of the query and have the following form:

    
    						SELECT
    							columns
    						FROM
    							table1
    							join_type JOIN table2
    							ON condition;
    					

    Tables - Joining

    There are 4 types of joins you can do:

    Join type Description
    INNER JOIN Only return records where they exist in both tables
    LEFT/RIGHT JOIN Return all records in the left/right table and any match in the right/left one if it exists
    FULL OUTER JOIN Return all records in both tables and any match if it exists
    CROSS JOIN Return all combinations of rows across both tables (the Cartesian product of the rows). Not often useful.
    Note that CROSS JOIN does not have an ON condition.

    Tables - Joining

    There are 4 types of joins you can do:

    Join types in SQL

    Tables - Joining

    WARNING:

    When you specify a condition in your join that defines where a match is returned to the record or not, when you specify it in the WHERE this specifies whether the whole record is returned or not.

    In the case of an inner join these are the same thing, but for other joins they are not.

    First define your join logic, then define your filter criteria separately.

    Tables - Joining

    For the join examples we're going to use a slim version of the planes table so we can see how each join behaves; let's create that first.

    
    						CREATE TABLE planes_short NOLOGGING AS
    						SELECT
    							* 
    						FROM 
    							schema_name.planes
    						WHERE
    							tailnum = 'N108UW';
    					

    Tables - Joining

    First let's look at an inner join, where only records that find a match are returned

    
    						SELECT 
    							a.*,
    							b.*
    						FROM 
    							schema_name.flights a
    							INNER JOIN planes_short b
    							/* In this case the columns had the same names, 
    							but that won't always (in fact will rarely) be the case */
    							ON a.tailnum = b.tailnum; 
    					

    Tables - Joining

    Next the left join, where all records from the first table are returned, plus any matching data from the second

    
    						SELECT 
    							a.*,
    							b.*
    						FROM 
    							schema_name.flights a
    							LEFT JOIN planes_short b
    							ON a.tailnum = b.tailnum; 
    					

    Tables - Joining

    And a right join, just the other way round to the left.

    
    						SELECT 
    							a.*,
    							b.*
    						FROM 
    							schema_name.flights a
    							RIGHT JOIN planes_short b
    							ON a.tailnum = b.tailnum; 
    					
    Despire the planes_short table having 1 row per plane, because of the join we now have multiple - we'll talk about this in a bit!

    Tables - Joining

    And a full outer join - we'll use a dummy table to create a fake plane to show that it returns all records from both tables regardless of if they match.

    
    						SELECT 
    							a.*,
    							b.*
    						FROM 
    							schema_name.flights a
    							FULL OUTER JOIN (SELECT 'FAKETAIL' AS tailnum_2 FROM dual) b
    						ON a.tailnum = b.tailnum_2
    						-- Defaults to nulls first for descending.
    						ORDER BY a.flight DESC; 
    					

    Tables - Joining

    Lastly the cross join - the least used of them all. We'll use 2 tables with 3 rows each to show we get all matches back in 9 rows (3x3).

    
    						SELECT
    							*
    						FROM 
    							(SELECT '1' AS num FROM DUAL
    								UNION
    							SELECT '2' AS num FROM DUAL
    								UNION
    							SELECT '3' AS num FROM DUAL) A
    						CROSS JOIN
    							(SELECT 'A' AS let FROM DUAL
    								UNION
    							SELECT 'B' AS let FROM DUAL
    								UNION
    							SELECT 'C' AS let FROM DUAL) B;
    					

    Tables - Joining

    A note about join conditions

    Joins are the place you are most likely to make a mistake that leads to duplicates in your data and going from millions of records to billions.

    The reason is if your join is not unique on at least one of the tables (i.e. the condition you give can be used to define each record and distinguish it from others) then each record in both tables will join to multiple in the other, getting out of hand very fast.

    Always ask yourself; does my join link 1 row to 1/many rows in my other table? If the answer is no, you need to rethink your logic.

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 30-32

    Part 3:

    In this section you learned to:

  • Use the dummy table for testing queries
  • Use the table output by one query in another
  • Join tables together
  • Don't forget to run the error message examples to see common examples of variable based errors!

    Part 4:

    Aggregation

    Grouping and window functions

    Part 4:

    By the end of this section you will be able to:

  • Summarise data in defined groups
  • Add aggregated information to existing tables
  • Aggregation - Grouping

    Often we want to get aggregate information of our data and group it by values in another column - this is similar to what we do in PivotTables in Excel.

    To do this we use the GROUP BY clause and summary functions. This goes after WHERE but before ORDER BY.

    
    						SELECT
    							group_columns,
    							summary_columns
    						FROM
    							tables
    						WHERE
    							condition
    						GROUP BY 
    							group_columns
    						ORDER BY 
    							columns;
    					
    Any non-summarised variable must be listed in your GROUP BY otherwise it will error. You can group by something you don't select but it's a bad idea.

    Aggregation - Grouping

    There are plenty of aggergation functions and the most common are listed below:

    Agg type Description
    COUNT(*) The total number of records per group
    COUNT(column) The total number of records that are non-null in that variable
    SUM(column) The sum of that variable, must be numeric
    AVG(column) The mean of that variable, must be numeric
    MAX(column) The maximum of that variable
    MIN(column) The minimum of that variable
    For a full list visit the Oracle site.

    Aggregation - Grouping

    Let's look at information about flights out of each airport

    
    						SELECT
    							origin,
    							COUNT(*) AS num_flights,
    							COUNT(dep_time) as num_departed_flights, -- doesn't count records with a null departure time
    							SUM(air_time) AS total_airtime,
    							AVG(air_time) AS mean_airtime,
    							MAX(dep_delay) AS latest_dep,
    							MIN(dep_delay) AS earliest_dep
    						FROM 
    							schema_name.flights
    						GROUP BY 
    							origin;
    					

    Aggregation - Grouping

    Let's look at information about flights out of each airport

    AND by each destination airport.

    
    						SELECT
    							origin,
    							dest,
    							COUNT(*) AS num_flights,
    							COUNT(dep_time) as num_departed_flights,
    							SUM(air_time) AS total_airtime,
    							AVG(air_time) AS mean_airtime,
    							MAX(dep_delay) AS latest_dep,
    							MIN(dep_delay) AS earliest_dep
    						FROM 
    							schema_name.flights
    						GROUP BY 
    							origin,
    							dest;
    					

    Aggregation - Grouping

    We don't have to group by everything if we want details about the whole dataset.

    
    						SELECT 
    							COUNT(*) AS num_flights,
    							MAX(dep_delay) AS latest_dep,
    							MIN(dep_delay) AS earliest_dep
    						FROM
    							schema_name.flights;
    					

    Aggregation - Grouping

    These aggergations becomes even more powerful when combined with a CASE statement.

    
    						SELECT 
    							origin,
    							SUM( -- SUM uses 1 and 0 as the values
    								CASE 
    									WHEN dep_delay = 0 AND arr_delay = 0 
    										THEN 1 
    									ELSE 0 
    								END 
    								) AS perfect_flights,
    							COUNT( -- COUNT uses anything and null as the values
    								CASE 
    									WHEN dep_delay = 0 AND arr_delay = 0 
    										THEN 1 
    									ELSE null 
    								END 
    								) AS perfect_flights2 
    						FROM
    							schema_name.flights
    						GROUP BY
    							origin;
    					

    Aggregation - Grouping

    You can filter on the results on an aggretation by using the HAVING keyword (as a WHERE filters records before they are aggregated).

    
    						SELECT
    							tailnum,
    							COUNT(*)
    						FROM
    							schema_name.flights
    						GROUP BY
    							tailnum
    						HAVING
    						-- only planes that had at least 50 flights in 2013
    							COUNT(*) > 50; 
    					

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 33-34

    Aggregation - Windowing

    There are times we want to add aggregate information to our data but keep it at the record level. We could group and then join back on, but this is inefficient.

    Instead we can use something called WINDOW FUNCTIONS (also known as ANALYTICAL FUNCTIONS).

    Pretty much all of them have the same format:

    
    						WINDOW_FUNCTION(column) OVER (PARTITION BY columns ORDER BY columns)
    					
    Not all of the functions have an argument to the main function, or an ORDER BY clause, but most do.

    The PARTITION BY defines your group for that record, and the ORDER BY defines the order within that group.

    Aggregation - Windowing

    Most GROUP BY functions have an equivalent window function, but there are a few extras as well:

    Window Function Description
    ROW_NUMBER() Numbers records in group from 1 to N (number in group) based on order by
    RANK(column) Rank value within that group, same value for ties (with gaps)
    DENSE_RANK(column) Rank value within that group, same value for ties (with no gaps)
    FIRST/LAST_VALUE(column) The first/last value within that group.
    LEAD/LAG(column) The next/previous value within that group.
    For a full list visit the Oracle site. Not covered here is a few extra (optional) arguments to these functions that let you do the calculation to only some range of the rows (e.g. only previous rows for cumulative values, or the previous and next X records for rolling values etc.) For more info see the documentation.

    Aggregation - Windowing

    Let's look at an example.

    
    						SELECT
    							a.*,
    							ROW_NUMBER() OVER (
    								PARTITION BY year, month, day, carrier 
    								ORDER BY sched_dep_time
    								) AS day_carrier_flight,
    							COUNT(*) OVER (
    								PARTITION BY year, month, day, carrier
    								) AS day_carrier_flight_total,
    							AVG(dep_delay) OVER (
    								PARTITION BY year, month, day, carrier
    								) AS day_carrier_avg_dep_delay
    						FROM
    							schema_name.flights a;
    					

    Live View

    Run some of these queries so far and see the results.

    Exercise time!

    Attempt questions 35-36

    Part 4:

    In this section you learned to:

  • Summarise data in defined groups
  • Add aggregated information to existing tables
  • Don't forget to run the error message examples to see common examples of variable based errors!

    Order of writing vs execution.

    We have seen that often the order we write our code in is not the same order that the database runs it in. Let's quickly review the order of writing a full query, then compare it with how the database will (usually) execute it.

    A murder mystery!?

    A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder​ that occurred sometime on ​Jan.15, 2018​ and that it took place in SQL City​. Start by retrieving the corresponding crime scene report from the police department’s database.

    All the clues to this mystery are buried in a huge database, and you need to use SQL to navigate through this vast network of information. Your first step to solving the mystery is to retrieve the corresponding crime scene report from the police department’s database.

    Thank you, and I hope you found this useful!