Advanced SQL

History of SQL

Originally “SEQUEL” from IBM’s System R prototype.

  • Structured English Query Language.

  • Adopted by Oracle in the 1970s.

ANSI Standard in 1986. ISO in 1987 –> Structured Query Language

Current standard is SQL 2016

  • SQL 2016 : JSON, Polymorphic tables
  • SQL 2011 : Temporal DBs, Pipelined DML
  • SQL 2008: TRUNCATE, Fancy ORDER
  • SQL 2003: XML, windows, sequences, auto-generated IDs
  • SQL 1999: Regex, triggers, OO

Relational Languages:

  • Data Manipulation Language (DML)
  • Data Definition Language(DDL)
  • Data Control Language(DCL) –> security authentication

Important thing to know:

Different from in relational algebra that is totally based on sets, SQL is based on bags(duplicate) algebra. Sets–>(no duplicate)


Main Content of This Lecture

Back to our main content.

In this lecture you will learn:

  • Aggregations + Group By
  • String / Date / Time operations
  • Output Control + Redirection
  • Nested Queries
  • Common Table Expressions
  • Window Functions

Aggregates + Group By

Def: Functions that return a single value from a bag of tuples:

  • AVG(col), MIN(col), MAX(col), SUM(col), COUNT(col)

Aggregate functions can only be used in the SELECT output list.

SELECT COUNT(login) AS cnt
	FROM student WHERE login LIKE '%@cs'

And since we just want to get # of tuples, the login in COUNT does not mean anything, so we can just write

SELECT COUNT(*) AS cnt
	FROM student WHERE login LIKE '%@cs'
SELECT COUNT(1) AS cnt
	FROM student WHERE login LIKE '%@cs'
  • Multiple Aggregation
SELECT AVG(gpa), COUNT(sid)
	FROM student WHERE login LIKE '%@cs'
  • Distinct Aggregation

COUNT, SUM, AVG support DISTINCT

SELECT COUNT(DISTINCT login)
	FROM student WHERE login LIKE '%@cs'

Note that the output of other columns outside of an aggregate is undefined.

Now we have this example:

SELECT AVG(s.gpa), e.cid
	FROM enrolled AS e, student AS s
  WHERE e.sid = s.sid

In some DB app we will get the error message: column ‘e.cid’ must appear in the GROUP BY clause or be used in an aggregate function.

Some will just choose a random e.cid as the output.

Quick fix of this:

The error message has told us how to fix this:

SELECT AVG(s.gpa), e.cid
	FROM enrolled AS e, student AS s
   WHERE e.sid = s.sid
   GROUP BY e.cid

In GROUP BY, we project tuples into subsets and calculate aggregates against each subset.

Non-aggregated Values in SELECT output clause must appear in GROUP BY clause.


  • Filters results based on aggregation computation, like a WHERE clause for a GROUP BY

For example:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
	FROM enrolled AS e, student AS s
  WHERE e.sid = s.sid
    AND avg_gpa > 3.9
  GROUP BY e.cid

Now we introduce HAVING:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
	FROM enrolled AS e, student AS s
  WHERE e.sid = s.sid
  GROUP BY e.cid
  HAVING avg_gpa > 3.9;

String Operations

The requirements of string varies for different databases:

Name String Case String Quotes
SQL-92 Sensitive Single Only
Postgres Sensitive Single Only
MYSQL Insensitive Single/Double
SQLite Sensitive Single/Double
DB2 Sensitive Single Only
Oracle Sensitive Single Only

LIKE is used for string matching('%' matching any substr(including NULL), ‘_’ matching any one character)


DATE/TIME Operations

See more in manuals.


OUTPUT Redirection

If we want to store query results in another table, we need to know:

  • Table must not already be defined.
  • Table will have the same # of columns with the same types as the input.

Actually this lecture can be learned just using the manual. So I just gonna leave out the left part. RTFM instead.