数据库笔记 | Ehoac の Site

概述

关系数据库

关系数据库管理系统(Relational DataBase Management System, RDBMS)是以集合理论为基础的系统，实现为具有行和列的二维表。
与 RDBMS 交互的标准方法，是结构化查询语言(Structured Query Language, SQL)编写查询。
数据值具有类型，可以是数字、字符串、日期、未解释的二进制大对象，或其他类型。系统强制使用类型。
重要的是，表可以联接并转化为新的、更复杂的表，因为它们的数学基础是关系(集合)理论。

键-值数据库

键-值(Key-Value, KV)存储库将键与值配对，类似于所有流行编程语言中的映射（或哈希表）。
因为 KV 存储库对资源的要求非常少，所以这种数据库类型在一些场景中有令人难以置信的高性能，但是当你有复杂的查询和聚合需求时，它一般不会有帮助。

列型数据库

列型（或面向列的数据库）的命名源自于其设计设计的一个重要方面，即来自一个给定的列（在二维表的意义上）的数据存放在一起。

文档型数据库

面向文档的数据库存储的就是文档。简而言之，文档就像是哈希，具有一个独一无二的标识符（ID）字段和值，值可能是任何类型，包括更多的哈希。
文档可以包含嵌套的结构，因此，它们表现出了调试的灵活性，允许有可变域。

图数据库

图数据库善于处理高度互联的数据。图数据库包含节点及节点之间的关系。
节点和关系可以有一些属性（一些键-值对），用来存储数据。
图数据库的真正实力是按照关系遍历节点。

PostgerSQL

简述

PostgreSQL 是一个关系数据库管理系统。
关系数据库流行的原因，不仅在于其庞大的特性集（触发器、存储过程、高级索引）、数据的安全性（符合 ACID），或符合大多数人的思维方式，还在于它们的查询灵活性。

基本使用

如果以管理员用户运行的话，PostgreSQL 的提示符后是#,如果是普通普通用户后面跟的是$。
输入\h，可以列出有关 SQL 命令的信息，\?列出以反斜杠开始的 psql 特有命令的帮助信息。

创建表

-- 新建表
create table countries (
   country_code char(2) primary key,
   country_name text unique
);
-- 从已有表创建(只有表结构)
create table countries_copy as select * from countries where 1=0;

-- 从已有表创建(表结构和数据)
create table countries_copy as select * from countries where 1=1;

基本表操作

-- 重命名表
alter table [表名A] rename to [表名B];
-- 添加字段
alter table [表名] add column [字段名] [类型];
-- 修改字段类型
alter table [表名] alter  COLUMN [字段名] type [类型];
-- 删除字段
alter table [表名] drop column [字段名];
-- 重命名字段
alter table [表名] rename column [字段名A] to [字段名B];
-- 给字段设置默认值
alter table [表名] alter column [字段名] set default [新的默认值];
-- 去除字段默认值
alter table [表名] alter column [字段名] drop default;
-- 更改列的名字
ALTER TABLE [表名] RENAME [旧字段名] to [新字段名];

插入数据

insert into countries (country_code, country_name) values
 ('us', 'United States'),
 ('mx', 'Mexico'),
 ('au', 'Australia'),
 ('gb', 'United Kingdom'),
 ('de', 'Germany'),
 ('ll', 'Loompaland');

查询数据

1	select * from countries;

删除数据

1	delete from countries where country_code='ll';

外键约束

create table cities(
 name text not null,
 postal_code varchar(9) check (postal_code <> ''),
 country_code char(2) references countries,
 primary key (country_code, postal_code)
);
insert into cities values ('Portland', '87200', 'us');

更新

1	update cities set postal_code='97205' where name='Portland';

内联接

只有列值匹配，inner join才会返回一行

1	select cities.*, country_name from cities inner join countries on cities.country_code=countries.country_code;

复合联接

create table venues (
  venue_id serial primay key,
  name varchar(255),
  street_address text,
  type char(7) check (type in ('public', 'private')) default 'public',
  postal_code varchar(9),
  country_code char(2),
  foreign key (country_code, postal_code)
  references cities (country_code, postal_code) match full
);
insert into venues (name, postal_code, country_code) values ('Crystal Ballroom', '97205', 'us');

select v.venue_id, v.name, c.name from venues v inner join cities c on v.postal_code=c.postal_code and v.country_code=c.country_code;

插入后返回

1
2
3

insert into venues (name, postal_code, country_code) values
('Voodoo Dounts', '97205', 'us') returning venue_id;

外联接

create table events (
  event_id serial primary key,
  title varchar(255),
  starts timestamp,
  ends timestamp,
  venue_id bigint,
  foreign key (venue_id) references venues (venue_id)
);
insert into events (title, starts, ends, venue_id) values
  ('Larp Club', '2012-02-15 17:30:00', '2020-02-15 19:30:00', 2),
  ('April Fools Day', '2012-04-01 00:00:00', '2020-04-01 23:59:00', null),
  ('Christmas Day', '2012-12-25 00:00:00', '2020-12-25 23:59:00', null);
-- 内联接
select e.title, v.name from events e join venues v on e.venue_id=v.venue_id;
-- 左外联接(left outer join/left join)
select e.title, v.name from events e left join venues v on e.venue_id=v.venue_id;

快速索引

RDBMS数据库的速度源于其高效的数据块管理、尽可能少的磁盘读取、查询优化等技术。
如果没有索引，就必须从磁盘读取每一行，都能知道是否是匹配行。
索引是一个特殊的数据结构，目的是避免执行查询时进行全表扫描。

-- 哈希索引
create index events_title on events using hash(title);
-- B树索引
create index events_starts on events using btree(starts);
-- 列出数据模式中的所有索引
\di

聚合查询

-- 准备数据
insert into countries values('zh', 'China');
insert into cities values ('ZhengZhou', '457100', 'zh');
insert into venues (name, postal_code, country_code) values ('My Place', '457100', 'zh');
insert into events (title, starts, ends, venue_id) values
('Moby', '2012-02-06 21:00:00', '2012-02-06 23:00:00', (select venue_id from venues where name='Crystal Ballroom')),
('Wedding', '2012-02-26 21:00:00', '2012-02-26 23:00:00', (select venue_id from venues where name='Voodoo Donuts')),
('Dinner with Mom', '2012-02-26 18:00:00', '2012-02-06 20:30:00', (select venue_id from venues where name='My Place')),
('Valentine''s Day', '2012-02-14 00:00:00', '2012-02-14 23:59:00', null);

-- count()
select count(title) from events where title like '%Day%';
-- max()、min()
select min(starts), max(ends) from events inner join venues on events.venue_id=venues.venue_id where venues.name='Crystal Ballroom';

分组

select venue_id, count(*) from events group by venue_id;
-- having
select venue_id, count(*) from events group by venue_id having count(*) >= 2 and venue_id is not null;
-- distinct
select distince venue_id from events;

-- 窗口函数
select title, count(*) over (partition by venue_id) from events;

事务

事务保障了关系数据库的一致性。
事务的准则是，要么全部成功，要么全部失败。
事务确保一组命令中的每一条命令都执行。
如果过程中间发生了任何失败，所有的命令将回滚，就像它们从未发生过一样。
PostgreSQL 的事务遵循 ACID
A：Atomic，原子性
C：Consistent，一致性
I：Isolated，隔离性
D：Durable,持久性

begin transaction;
delete from events;
rollback;
select * from events;

存储过程

存储过程可以通过巨大的架构代价取得巨大的性能优势。
使用存储过程可以避免将数千行数据发送到客户端应用程序，但也让应用程序代码与该数据库绑定，因此，不应该轻易决定使用存储过程。

create or replace function
  add_event(i_title text, i_starts timestamp, i_ends timestamp, venue text, postal varchar(9), country char(2))
  returns boolean
  as $$
  declare
    did_insert boolean := false;
    found_count integer;
    the_venue_id integer;
  begin select venue_id into the_venue_id from venues v where v.postal_code=postal and v.country_code=country and v.name ilike venue limit 1;
  if the_venue_id is null then
    insert into venues (name, postal_code, country_code) values (venue, postal, country) returning venue_id into the_venue_id;
    did_insert := true;
  end if;
  raise notice 'Venue found %', the_venue_id;
  insert into events( title, starts, ends, venue_id) values (i_title, i_starts, i_ends, the_venue_id);
  return did_insert;
end;
$$ language plpgsql;

-- 使用
select add_event('House Party', '2012-05-03 23:00:00', '2012-05-04 02:00:00', 'Run''s House', '97205','us');

-- 将外部文件导入当前数据库
\i filename.sql

-- 查看数据库中安装的语言
createlang dbname -l

触发器

当插入或更新这类事件发生时，触发器自动调用存储过程。

-- 日志表
create table logs (
  event_id integer,
  old_title varchar(255),
  old_starts timestamp,
  old_ends timestamp,
  logged_at timestamp default current_timestamp
);
-- function
create or replace function log_event() returns trigger
as $$
declare
begin
  insert into logs (event_id, old_title, old_starts, old_ends) values (OLD.event_id, OLD.title, OLD.starts, OLD.ends);
  raise notice 'someone just chage event #%', OLD.event_id;
  return NEW;
end;
$$
language plpgsql;
-- 创建触发器
create trigger log_events after update on events for each row execute procedure log_event();

-- 测试
update events set ends='2012-05-04 01:00:00' where title='House Party';

select * from logs;

视图

使复杂查询的结果用起来像其他任何表一样，是查询的别名。

-- 创建视图
create view holidays as
  select event_id as holiday_id, title as name, starts as date from events where title like '%Day%' and venue_id is NULL;

-- 查询
select name, to_char(date, 'MonthDD,YYYY') as date from holidays where date <= '2020-04-01';

-- 为视图增加新字段需要在源表增加
alter table events add colors text array;

-- 更新视图
create or replace view holidays as
  select event_id as holiday_id, title as name, starts as date,colors from events where title like '%Day%' and venue_id is NULL;

-- 通过视图更新数据
create rule update_holidays as on update to holidays
do instead update events set title = NEW.name,starts=NEW.date,colors=NEW.colors where title=OLD.name;

-- 直接更新视图
update holidays set colors='{"red", "green"}' where name='Christmas Day';

联表分析

-- extension
CREATE EXTENSION tablefunc;

-- extract()
select extract(year from starts) as year, extract(month from starts) as month, count(*) from events group by year,month;

--
create temporary table month_count(month int);
insert into month_count values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);

-- crosstab()
select * from crosstab(
  'select extract(year from starts) as year, extract(month from starts) as month, count(*) from events group by year,month',
  'select * from month_count') as (
     year int, jan int, feb int, mar int, apr int, may int, jun int, jul int, aug int, sep int, oct int, nov int, dec int
  ) order by year;
-- generate_series(a,b)
select * from crosstab(
  'select extract(year from starts) as year, extract(month from starts) as month, count(*) from events group by year,month',
  'select generate_series(1, 12)') as (
     year int, jan int, feb int, mar int, apr int, may int, jun int, jul int, aug int, sep int, oct int, nov int, dec int
  ) order by year;

全文和模糊查询

ilike表示不区分大小写
'%'和_是通配符，_表示匹配一个字符

-- 需要的扩展
create extension tablefunc;
create extension dict_xsyn;
create extension fuzzystrmatch;
create extension pg_trgm;
create extension cube;

-- 创建测试表
create table genres(
  name text unique,
  position integer
);
create table movies (
  movie_id serial primary key,
  title text,
  genre cube
);
create table actors (
  actor_id serial primary key,
  name text
);
create table movies_actors(
  movie_id integer references movies not null,
  actor_id initeger references actors not null,
  unique(movie_id, actor_id)
);
create index movies_actors_movie_id on movies_actors(movie_id);
create index movies_actors_actor_id on movies_actors(actor_id);
create index movies_genres_cube on movies using gist(genre);

正则表达式查询

select count(*) from movies where title !~*'^the.*';

-- 字符串索引类型 varchar_pattern_ops, bpchar_pattern_ops, name_pattern_ops
create index movies_title_pattern on movies (lower(title) text_pattern_ops);

字符串相似比较算法 levenshtein fuzzystrmatch 扩展模块提供

select levenshtein ('bat', 'fads');

select levenshtein('bat', 'fad') fad, levenshtein('bat', 'fat') fat, levenshtein('bat', bat') bat;

select movie_id, title from movies where levenshtein(lower(title), lower('aharddaynght')) <= 3;

三连词（trigram）,pg_trgm 扩展

select show_trgm('Avatar');

-- 创建三连词索引(通用索引搜索树[Generalized Index Search Tree, GIST])
create index movies_title_trigram on movies using gist(title gist_trgm_ops);

-- TSVector & TSQuery
select title from movies where title @@'night&day';

-- 查看已安装的语言配置
\dF

全文检索

-- @@ 全文查询运算
-- @@ 运算符将搜索内容转换成tsvector结构休，并将查询转换成一个tsquery结构体
select title movies where title @@ 'night&day';

-- 设置了语言的查询
select title from movies where to_tsvector(title) @@ to_tsquery('english', 'night&day');

-- tsvector上的单词称为词素（lexeme）
select to_tsvector('AHardDay''sNight'), to_tsquery('english', 'night&day');

-- explain 命令查看查询计划树
explain select * from countries where country_name @@ 'english&us';

-- 建立反射索引
create index countries_name_searchable on countries using gin(to_tsvector('english', country_name));

-- 发音码metaphone
select country_name from countries natural join cities natural join venues where metaphone(name, 6)=metaphone('UnitiStat', 6);

-- fuzzystrmatch模块其它函数：dmetaphone() (双发音码)、dmetaphone_alt()(替代发音)、soundex() (19世纪80年代的美国人口普查创立的、用来对比常见的美国人姓的古老算法)
select name, dmetaphone(name), dmetaphone_alt(name),metaphone(name, 8),soundex(name) from venues;

-- 组合使用字符串匹配方法
select * from venues where metaphone(name, 8) % metaphone('OwnPlace', 8) order by levenshtein(lower('OwnPlace'),lower(name));

select * from venues where dmetaphone(name) % dmetaphone('Min');

多维超立方体（cube）

select name cube_ur_coord('(0,7,0,0,0,0,0,0,0,7,0,0,0,0,10,0,0,0)', street_address) as score from venues v
  where cube_ur_coord('(0,7,0,0,0,0,0,0,0,7,0,0,0,0,10,0,0,0)', street_address) > 0;

select *, cube_distance(genre,'(0,7,0,0,0,0,0,0,0,7,0,0,0,0,10,0,0,0)') dist from movies order by dist;

-- 有界立方体(bounding cube)
select cube_enlarge('(1,1)', 1, 2);

-- cube运算符'@>',意思是包含
select title, cube_distance(genre, '(0,7,0,0,0,0,0,0,0,7,0,0,0,0,10,0,0,0)') dist from movies
where cube_enlarge('(0,7,0,0,0,0,0,0,0,7,0,0,0,0,10,0,0,0)'::cube, 5, 18) @> genre order by dist;

select m.movie_id, m.title from movies m, (select genre, title from movies where title='MadMax') s
  where cube_enlarge(s.genre, 5, 18) @> m.genre and s.title<>m.title order by cube_distance(m.genre, s.genre) limit 10;

MariaDB

管理

# 创建新的用户帐户
'newusername'@'localhost' IDENTIFIED BY 'userpassword';
# 添加权限
GRANT SELECT, INSERT, UPDATE, DELETE ON database1 TO 'newusername'@'localhost';
# 刷新授权表
FLUSH PRIVILEGES;

数据类型

数字

TINYINT: -128到127的有符号范围内的小整数或0到255的无符号整数
BOOLEAN: 0=false,1=true
SMALLINT: -32768到32768的有符号范围内的整数或0到65535的无符号整数
MEDIUMINT: 有符号范围-8388608到8388607中的整数，无符号范围0到16777215中的整数
INT(或 INTEGER): 默认范围跨越-2147483648到2147483647,标记为unsigned时范围跨越0到4294967295
BIGINT: 有符号范围9223372036854775808到9223372036854775807内的整数,无符号范围0到18446744073709551615内的整数
DECIMAL（DEC，NUMERIC，FIXED）: 精确的定点数，M 指定其数字，D 指定小数后的数字。 M 值不添加“ - ”或小数点。如果 D 设置为 0，则不会出现小数或小数部分，并且该值将舍入为最接近的 DECIMAL INSERT。最大允许位数为 65，小数位数的最大值为 30.默认值 M 的默认值为 10，省略时 D 为 0
FLOAT: -3.402823466E + 38 至-1.175494351E-38, 1.175494351E-38 至 3.402823466E + 38
DOUBLE(或 REAL、DOUBLE PRECISION): -1.7976931348623157E + 308至-2.2250738585072014E-308, 2.2250738585072014E-308至1.7976931348623157E + 308
BIT: 表示位字段,M 指定每个值的位数。省略 M 时，默认值为 1.位值可以通过“b’[value]’”应用，其中值表示 0 和 1 中的位值。零填充从左边自动发生全长;例如，“10”变为“0010”

日期和时间

DATE: 表示日期范围1000-01-01”到“9999-12-31,并使用YYYY-MM-DD日期格式
TIME: -838：59：59.999999到838：59：59.999999的时间范围
DATETIME: 表示范围1000-01-01 00：00：00.000000至9999-12-31 23：59：59.999999,使用YYYY-MM-DD HH：MM：SS格式
TIMESTAMP: 表示YYYY-MM-DD HH：MM：DD格式的时间戳
YEAR: 表示 4 位数格式的年份,四位数格式允许在1901到2155和0000范围内的值

字符串

String literals: 表示用引号括起来的字符序列
CHAR: 表示包含指定长度的空格的右侧带有固定长度的字符串,取值范围为0〜255,缺省值为1
VARCHAR: 表示一个可变长度字符串,范围（最大列长度）为0到65535
BINARY: 表示二进制字节字符串
VARBINARY: 表示可变长度的二进制字节字符串
TINYBLOB: 表示最大长度为255（28 - 1）个字节的blob列,在存储中，每个都使用一个字节长度的前缀，表示值中的字节数量
BLOB: 表示最大长度为65,535（216 - 1）个字节的 blob 列
MEDIUMBLOB: 表示最大长度为16,777,215（224 - 1）个字节的 blob 列
LONGBLOB: 表示最大长度为 4,294,967,295（232 - 1）个字节的 blob 列
TINYTEXT: 表示最大长度为 255（28 - 1）个字符的文本列
TEXT: 表示最大长度为 65,535（216 - 1）个字符的文本列
MEDIUMTEXT: 表示最大长度为 16,777,215（224 - 1）个字符的文本列
LONGTEXT: 表示最大长度为 4,294,967,295 或 4GB（232 - 1）个字符的文本列
ENUM: 表示一个列表中只有一个值的字符串对象
SET: 表示一个列表中具有零个或多个值的字符串对象，最多包含 64 个成员,SET 值在内部作为整数值存在

创建索引

-- 创建
CREATE [UNIQUE or FULLTEXT or...] INDEX index_name ON table_name column;
CREATE UNIQUE INDEX top_sellers ON products_tbl product;
-- 删除
DROP INDEX index_name ON table_name;
-- 重命名
ALTER TABLE table_name DROP INDEX index_name, ADD INDEX new_index_name;

序列

-- 创建
create sequence seq_int;
select nextval(seq_int);
alter sequence seq_int restart 1;