[TOC]

SQL基础

前言

SQL 语言是关系数据库的标准操作语言。不同的数据库有独特的扩展和修订，但是学习标准的 SQL 语言还是很有价值的。

标准 SQL 分三种类型的语句：

DDL：数据定义语言
1. create: 创建
2. drop：删除
3. alter：更新
DML：数据操纵语言
1. select:查询
2. insert：插入
3. update：更新
4. delete：删除
DCL：数据控制语言
1. commit：提交
2. rollback：回退
3. grant：授权
4. revoke：吊销权限

注意：标准 SQL 有一些实用功能是缺失的，这就需要使用特定数据库的 SQL 方言。

数据库的创建

create database xxx; --创建 xxx 数据库

创建表

create table yyy( --创建 yyy 表
a char(4) not null, -- 创建 a 列，类型 char(4),约束 not null(不为空)
b integer, -- 创建 b 列
primary key(a)); -- 设置主键为 a 列

列类型

integer：整数
char：字符
varchar：字符串（可变长度）
date：日期

约束

not null：不为空
primary key：主键（可以唯一标识一行数据的列或列的组合，即蕴含键值不重复）

删除表

drop table yyy;

更新表

alter table yyy add (
    c integer, -- 添加 c 列
    d integer  -- 添加 d 列
);

alter table yyy drop column b; -- 删除 b 列

插入数据

insert into yyy values('a',1,2); -- 添加一行数据到表格 yyy

insert into yyy(a,c,d) values('b',2,3); -- 指定对应的列
insert into yyy values('c',3,3);

插入其他表的数据

create table zzz(
    a char(4) not null,
    b integer default 0,
    c integer,
    primary key(a)
);

insert into zzz(a,c,d) 
select a,b,c from yyy; -- 插入其他表的数据

查询数据

整体框架如下：

查询操作书写顺序：select -> from -> where -> group by -> having -> order by

查询操作执行顺序：from → where → group by → having → select → order by

select a,c from yyy; -- 查询 yyy 表格，选取 a,c 两列返回

select * from yyy; -- 返回所有列

select 1 as "汉"; -- 选取常量 1，并重命名列名为"汉"，汉字需要用双引号包括。

删除重复行（查询结果中）

select distinct c,d from yyy; -- 查询 yyy 表中的 c,d 两列，并删除 c,d 两列中存在的相同行，合并为一条

查询条件

select * from yyy
where -- 添加查询条件
a like 'a%' 
and (c>=1 
or d - 1 > 1);

运算符

+、-、*、/、%：加减乘除、模
=、<>：等于、不等
>=、<=、>、<：大于等于、小于等于、大于、小于
is null、is not null：为空、不为空
not、and、or：不符合条件、两者皆符合条件、任一符合条件
like、glod: 匹配、区分大小写匹配
||: 字符串串联
<<、>>：左移、右移
&、|：与、或
in、not in、exist、not exist: 包含在、不包含在、存在、不存在
between x and y：从 x 到 y 的范围
case 表达式语法：

case 
    when /* 谓语 */ then /* 表达式 */
    when .. then ..
    else /* 表达式 */
end

聚合查询（汇总）

select count(*) from yyy; -- 返回 yyy 表的总行数

聚合函数

count：行数
sum：合计
avg：平均
max：最大值
min：最小值

注意：只有 count(*)计算 null 行，其他情况皆不计算 null 行。

分组（分类）

select count(*) from yyy
group by -- 按指定聚合键分组
c;

分组结果是无序的，可以使用聚合函数对每个小组进行汇总计算。

select 只能选择分组的聚合键或者聚合函数。（少数方言可以选择其他列）当选择聚合键时，它的结果类似 distinct 去重。

分组条件

select c,count(*) from yyy
group by c
having min(c) > 1; -- 分组条件

排序

select * from yyy
order by a; -- 根据 a 列升序排序

order by * asc：升序
order by * desc：降序

删除数据

drop table zzz; --删除表

delete from zzz; --删除表中的数据

delete from zzz
where -- 指定删除的行条件
a='a';

更新数据

update yyy set c=2,d=3
where -- 指定更新的行条件
a='a';

事务

事务具备（ACID）：

atomicity：原子性
consistency：一致性
isolation：隔离性
durability：持久性

begin transaction; --开始事务（SQL标准没有定义该语句的标准写法）
-- DML 语句段落
commit; -- 提交事务
rollback; -- 或者取消事务

-- 例子
begin transaction;
update yyy set c=100 where a='a';
select * from yyy; -- 预览更新效果
rollback; -- 取消事务
select * from yyy; -- 可以看到更新被取消

视图

视图的结果和表类似，但是视图不保存实际数据，而是保存对应的 select 语句。

create view vvv(id,sum) as -- 创建视图
select a,c+d from yyy;

drop view vvv; -- 删除视图

视图如果使用了 distinct、聚合函数、多张表、分组等操作将影响对视图进行更改操作。

子查询

类似一次性视图。

select * from (
    select a,c+d from yyy
) as vy; -- 子查询临时命名为 vy

标量子查询

查询结果是标量的子查询。

select * from yyy
where 
c > (select avg(c) from yyy); -- 子查询结果是标量，因此可以用在此处进行判断

关联子查询

对组内进行比较时使用关联子查询。

其目的是将非标量的子查询，转化为标量子查询，通过关联的匹配条件。

select * from yyy as p1
where d > (select avg(d) --每一组的平均值
from yyy as p2 
where p2.c = p1.c -- 关联条件，即获取指定组的平均值
group by c -- 分组，得到一个非标量的查询结果（即多个组）
); -- 最终结果形成一个标量子查询

函数

算术函数

abs：绝对值
mod：求余
round：四舍五入

字符串函数

length：长度
lower：转换成小写
replace：替换
substring：截取
upper：转换为大写

日期函数

current_date：当前日期
current_time：当前时间
current_timestamp：日期和时间
extract：截取时间的一部分

转换函数

cast：数据类型转换
coalesce：null 值转换

集合运算

设表 yyy =(1,2,3,4);zzz=(3,4,5,6);

并集

select * from yyy
union -- 并
select * from zzz;
-- 结果(1,2,3,4,5,6)

select * from yyy
union all-- 并(不去重)
select * from zzz;
-- 结果(1,2,3,4,3,4,5,6)

交集

select * from yyy
intersect -- 交
select * from zzz;
-- 结果(3,4)

差集

从集合 1 中去掉集合 2 中存在的元素

select * from yyy
except -- 交
select * from zzz;
-- 结果(1,2)

联接

join 联接操作是基于列进行关联（并集等操作是基于行记录）。

设表 yyy:

yid	class
1	2
2	3
3	4
4	1

设表 zzz:

zid	yid	age
1	1	17
2	1	15
3	2	17
4	3	19
5	2	18

内联接（inner join）

表 yyy 和表 zzz 有共同的 yid 键，通过 yid 作为桥梁，将两张表汇集到一起，这就是联接的过程。

内联接要求双发符合条件的行合并。

select z.zid,z.yid,z.age,y.class
from zzz as z
inner join -- 内联接
yyy as y
on z.yid = y.yid; -- 联接条件

结果：

zid	yid	age	class
1	1	17	2
2	1	15	2
3	2	17	3
4	3	19	4
5	2	18	3

外联接（outer join）

外联接包含内联接的所有行，并且还包含一张表有但另一张没有对应的行。

因为包含了没有对应行（即对应项为null）的信息，所以被视为表外（outer）信息。这也是其名字的由来。

外联接可以设置主表，主表的所有信息都会列出来。

left outer join：左外联接
right outer join：右外联接
full outer join：全外联接

select z.zid,y.yid,z.age,y.class
from yyy as y
left outer join -- 左外联接
zzz as z
on z.yid = y.yid; -- 联接条件

zid	yid	age	class
1	1	17	2
2	1	15	2
3	2	17	3
4	3	19	4
5	2	18	3
null	4	null	1

交叉联接(cross join)

交叉联接是所有联接的基础，它是一个基本的集合运算：笛卡尔积。

它是两张表所有行的交叉组合。如 yyy 有 4 行，zzz 有 5 行，交叉组合就有 4*5=20 行。内联接就是从这种交叉组合表中取匹配的行，而外联接就是存在不在该表中的行，所以叫表外。

select * from yyy 
cross join  -- 交叉联接
zzz;

结果：

yid	class	zid	yid	age
1	2	1	1	17
1	2	2	1	15
1	2	3	2	17
1	2	4	3	19
1	2	5	2	18
2	3	1	1	17
2	3	2	1	15
2	3	3	2	17
2	3	4	3	19
2	3	5	2	18
3	4	1	1	17
3	4	2	1	15
3	4	3	2	17
3	4	4	3	19
3	4	5	2	18
4	1	1	1	17
4	1	2	1	15
4	1	3	2	17
4	1	4	3	19
4	1	5	2	18

集合除法

设表 y.a(1,2,3)；表 x.b(4,5,6)；表 ycx:y cross join x;

想通过 ycx 除以 y 得到 x，范例：

-- 对重复的 b 去重
select distinct b from ycx as p1 
where not exists ( -- 存在不满足的 b 项去除
    select a from y 
    except -- 满足全部 y 的差集为空集
    select a from ycx as p2 
    where p1.b = p2.b -- 对当前 b 进行
    );

窗口函数

窗口函数用作对数据库的实时分析处理（OLAP）。

样例：rank() over(partition by b order by a rows 2 preceding and 1 following):

rank():窗口函数
partition by: 范围（分组）
order by：排位列（用于确定窗口函数计算次序，并非用于结果排序）
rows：限定行范围
1. 2 preceding：包含前2 行
2. 1 following：包含后1 行

函数：

rank():输出位次，跳过相同位次编号，如 2 位次存在相同项：1，2，2，4，5
dense_rank()：输出位次，不跳过相同位次编号，如：1，2，2，3，4
row_number(): 输出位次，连续位次（无视相同位次），如:1，2，3，4，5

聚合函数也可以用于此处，计算范围只包含当前位次（包括位次相同的项）和之前位次的项。

如 yyy.a(1,2,2,3,4):

select sum(a) over(order by a) as sum_a from yyy;

结果：

sum_a
1
5
5
8
12

范围分组后，这套逻辑分别应用到每个分组的内部：

select sum(a) over(partition by a order by a) as sum_a from yyy;

结果：

sum_a
1
4
4
3
4

这里有四个分组：(1),(2,2),(3),(4)。

指定行范围的结果：

select sum(a) over(order by a rows 1 preceding) as moving_sum from yyy;

只会计算当前行和前一行，一共两行的值：

moving_sum
1
3
4
5
7

合计和小计

sql 通过 group by 分组可以方便的计算小计，但是在此基础上无法计算各个分组的合计。因此引入了另一个运算符：grouping。

grouping 运算符实际有三种：

rollup
cube
grouping sets

要计算小计和合计，如 yyy.a(1,2,2,3,4):

select a,sum(a) from yyy
group by a -- 小计
union all -- 并集（不去除重复项）
select "合计" as a,sum(a) from yyy;--合计

a	sum(a)
1	1
2	4
3	3
4	4
合计	12

rollup 版：

select a,sum(a) from yyy
group by rollup(a);

rollup(a,b)，它相当于以下并集：

group by (a,b)
group by (a)
group by (): 即不分组

判断是否为合计的行，设置值为'合计';

select case when grouping(a) = 1 -- 是否为合计行
    then '合计'
    else cast(a as varchar(10)) -- 转换为字符串
    end as a,
    sum(a)
from yyy
group by rollup(a);

cube(a,b),相当于：

rollup(a,b)
1. group by(a,b)
2. group by(a)
3. group by()
group by (b) -- 多出来的行

grouping sets(a,b),相当于：

group by (a)
group by (b)