Advanced SQL :: CS-Courses-Notes

Advanced SQL

History of SQL

Originally “SEQUEL” from IBM’s System R prototype.

Structured English Query Language.

Adopted by Oracle in the 1970s.

ANSI Standard in 1986. ISO in 1987 –> Structured Query Language

Current standard is SQL 2016

Relational Languages:

Important thing to know:

Different from in relational algebra that is totally based on sets, SQL is based on bags(duplicate) algebra. Sets–>(no duplicate)

Back to our main content.

In this lecture you will learn:

Def: Functions that return a single value from a bag of tuples:

Aggregate functions can only be used in the SELECT output list.

SELECT COUNT(login) AS cnt
	FROM student WHERE login LIKE '%@cs'

And since we just want to get # of tuples, the login in COUNT does not mean anything, so we can just write

SELECT COUNT(*) AS cnt
	FROM student WHERE login LIKE '%@cs'

SELECT COUNT(1) AS cnt
	FROM student WHERE login LIKE '%@cs'

SELECT AVG(gpa), COUNT(sid)
	FROM student WHERE login LIKE '%@cs'

COUNT, SUM, AVG support DISTINCT

SELECT COUNT(DISTINCT login)
	FROM student WHERE login LIKE '%@cs'

Note that the output of other columns outside of an aggregate is undefined.

Now we have this example:

SELECT AVG(s.gpa), e.cid
	FROM enrolled AS e, student AS s
  WHERE e.sid = s.sid

In some DB app we will get the error message: column ‘e.cid’ must appear in the GROUP BY clause or be used in an aggregate function.

Some will just choose a random e.cid as the output.

Quick fix of this:

The error message has told us how to fix this:

SELECT AVG(s.gpa), e.cid
	FROM enrolled AS e, student AS s
   WHERE e.sid = s.sid
   GROUP BY e.cid

In GROUP BY, we project tuples into subsets and calculate aggregates against each subset.

Non-aggregated Values in SELECT output clause must appear in GROUP BY clause.

Filters results based on aggregation computation, like a WHERE clause for a GROUP BY

For example:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
	FROM enrolled AS e, student AS s
  WHERE e.sid = s.sid
    AND avg_gpa > 3.9
  GROUP BY e.cid

Now we introduce HAVING:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
	FROM enrolled AS e, student AS s
  WHERE e.sid = s.sid
  GROUP BY e.cid
  HAVING avg_gpa > 3.9;

The requirements of string varies for different databases:

LIKE is used for string matching('%' matching any substr(including NULL), ‘_’ matching any one character)

See more in manuals.

If we want to store query results in another table, we need to know:

Actually this lecture can be learned just using the manual. So I just gonna leave out the left part. RTFM instead.