Elasticsearch 支持两种搜索方式:URI Query 和 Request Body Query(DSL)
::: details URI Query 示例
1 2 3
GET /kibana_sample_data_ecommerce/_search?q=customer_first_name:Eddie GET /kibana*/_search?q=customer_first_name:Eddie GET /_all/_search?q=customer_first_name:Eddie
:::
::: details Request Body Query(DSL)示例
1 2 3 4 5 6
POST /kibana_sample_data_ecommerce/_search { "query": { "match_all": {} } }
:::
当文档存储在 Elasticsearch 中时,它会在 1 秒内近乎实时地被索引和完全搜索。
Elasticsearch 基于 Lucene 开发,并引入了分段搜索的概念。分段类似于倒排索引,但 Lucene 中的单词 index 表示“段的集合加上提交点”。提交后,将向提交点添加新分段并清除缓冲区。
POST titles/_bulk { "index": { "_id": 1 }} { "title": "My dog barks" } { "index": { "_id": 2 }} { "title": "I see a lot of barking dogs on the road " }
POST titles/_bulk { "index": { "_id": 1 }} { "title": "My dog barks" } { "index": { "_id": 2 }} { "title": "I see a lot of barking dogs on the road " }
GET kibana_sample_data_ecommerce/_mapping #对 text 字段进行排序。默认会报错,需打开 fielddata POST /kibana_sample_data_ecommerce/_search { "size": 5, "query": { "match_all": {
PUT test_text PUT test_text/_mapping { "properties": { "intro":{ "type": "text", "doc_values":true } } }
DELETE test_text
DELETE temp_users PUT temp_users PUT temp_users/_mapping { "properties": { "name":{"type": "text","fielddata": true}, "desc":{"type": "text","fielddata": true} } }
Post temp_users/_doc {"name":"Jack","desc":"Jack is a good boy!","age":10} #打开 fielddata 后,查看 docvalue_fields 数据 POST temp_users/_search { "docvalue_fields": [ "name","desc" ] } #查看整型字段的 docvalues POST temp_users/_search { "docvalue_fields": [ "age" ] }
#查看索引相关信息 GET kibana_sample_data_ecommerce #查看索引的文档总数 GET kibana_sample_data_ecommerce/_count #查看前 10 条文档,了解文档格式 POST kibana_sample_data_ecommerce/_search { } #_cat indices API #查看 indices GET /_cat/indices/kibana*?v&s=index #查看状态为绿的索引 GET /_cat/indices?v&health=green #按照文档个数排序 GET /_cat/indices?v&s=docs.count:desc #查看具体的字段 GET /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt #How much memory is used per index? GET /_cat/indices?v&h=i,tm&s=tm:desc
get _cat/nodes?v GET /_nodes/es7_01,es7_02 GET /_cat/nodes?v GET /_cat/nodes?v&h=id,ip,port,v,m
GET _cluster/health GET _cluster/health?level=shards GET /_cluster/health/kibana_sample_data_ecommerce,kibana_sample_data_flights GET /_cluster/health/kibana_sample_data_flights?level=shards #### cluster state The cluster state API allows access to metadata representing the state of the whole cluster. This includes information such as GET /_cluster/state #cluster get settings GET /_cluster/settings GET /_cluster/settings?include_defaults=true
GET _cat/shards GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
文档的基本 CRUD 和批量操作
文档的 CRUD
create - 创建文档,如果 ID 已存在,会失败
update - 增量更新文档,且文档必须已存在
index - 若文档不存在,则创建新文档;若文档存在,则删除现有文档,再创建新文档,同时 version+1
# create document. 自动生成 _id POST users/_doc { "user" : "Mike", "post_date" : "2019-04-15T14:12:12", "message" : "trying out Kibana" } #create document. 指定 Id。如果 id 已经存在,报错 PUT users/_doc/1?op_type=create { "user" : "Jack", "post_date" : "2019-05-15T14:12:12", "message" : "trying out Elasticsearch" } #create document. 指定 ID 如果已经存在,就报错 PUT users/_create/1 { "user" : "Jack", "post_date" : "2019-05-15T14:12:12", "message" : "trying out Elasticsearch" } ### Get Document by ID #Get the document by ID GET users/_doc/1 ### Index & Update #Update 指定 ID (先删除,在写入) GET users/_doc/1
PUT users/_doc/1 { "user" : "Mike" } #GET users/_doc/1 #在原文档上增加字段 POST users/_update/1/ { "doc": { "post_date": "2019-05-15T14:12:12", "message": "trying out Elasticsearch" } } ### Delete by Id # 删除文档 DELETE users/_doc/1
#查看不同的 analyzer 的效果 #standard GET _analyze { "analyzer": "standard", "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening." } #simpe GET _analyze { "analyzer": "simple", "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening." }
GET _analyze { "analyzer": "stop", "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening." } #stop GET _analyze { "analyzer": "whitespace", "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening." } #keyword GET _analyze { "analyzer": "keyword", "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening." }
GET _analyze { "analyzer": "pattern", "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening." } #english GET _analyze { "analyzer": "english", "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening." }
POST _analyze { "analyzer": "icu_analyzer", "text": "他说的确实在理”" }
POST _analyze { "analyzer": "standard", "text": "他说的确实在理”" }
POST _analyze { "analyzer": "icu_analyzer", "text": "这个苹果不大好吃" }
SearchAPI 概览
ES Search 有两种类型:
URI 查询 - 在 URL 中使用查询
Request Body 查询 - 基于 JSON 格式的 DSL
语法
范围
/_search
集群上的所有索引
/index1/_search
index1
/index1,index2/_search
index1 和 index2
/index*/_search
以 index 开头的索引
【示例】
1 2 3 4 5 6 7 8 9 10 11 12 13
#URI Query GET kibana_sample_data_ecommerce/_search?q=customer_first_name:Eddie GET kibana*/_search?q=customer_first_name:Eddie GET /_all/_search?q=customer_first_name:Eddie #REQUEST Body POST kibana_sample_data_ecommerce/_search { "profile": true, "query": { "match_all": {} } }
var status = db.serverStatus()status.metrics.queryExecutor.scanned / status.metrics.document.returnedstatus.metrics.queryExecutor.scannedObjects / status.metrics.document.returned
Mysql 将每个数据库保存为数据目录下的一个子目录。建表时,MySQL 会在数据库子目录下创建一个和表同名的 .frm 文件保存表的定义。因为 MySQL 使用文件系统的目录和文件来保存数据库和表的定义,大小写敏感性和具体的平台密切相关:在 Windows 中,大小写不敏感;在 Linux 中,大小写敏感。
如果不是按照索引的最左列开始查找,则无法使用索引。例如上面例子中的索引无法用于查找名字为 Bill 的人,也无法查找某个特定生日的人,因为这两列都不是最左数据列。类似地,也无法查找姓氏以某个字母结尾的人。
不能跳过索引中的列。也就是说,前面所述的索引无法用于查找姓为 Smith 并且在某个特定日期出生的人。如果不指定名 (first_name),则 MySQL 只能使用索引的第一列。
如果查询中有某个列的范围查询,则其右边所有列都无法使用索引优化查找。例如有查询 WHERE last_name=' Smith' AND first_name LIKE 'J%' AND dob = '1976-12-23' ,这个查询只能使用索引的前两列,因为这里 LIKE 是一个范围条件(但是服务器可以把其余列用于其他目的)。如果范围查询列值的数量有限,那么可以通过使用多个等于条件来代替范围条件。
db.foo.find({"$where": function () { for (var current in this) { for (var other in this) { if (current != other && this[current] == this[other]) { return true; } } } return false; }});
游标
数据库会使用游标返回 find 的执行结果。
1 2 3 4
var cursor = db.people.find(); cursor.forEach(function(x) { print(x.name); });
SELECT prod_name, prod_price FROM Products WHERE vend_id ='DLL01'OR vend_id ='BRS01' AND prod_price >=10;
任何时候使用具有 AND 和 OR 操作符的 WHERE 子句,都应该使用圆括号明确地分组操作符。
1 2 3 4
SELECT prod_name, prod_price FROM Products WHERE (vend_id ='DLL01'OR vend_id ='BRS01') AND prod_price >=10;
IN 操作符
IN 操作符用来指定条件范围,范围中的每个条件都可以进行匹配。IN 取一组由逗号分隔、括在圆括号中的合法值。
1 2 3 4
SELECT prod_name, prod_price FROM Products WHERE vend_id IN ( 'DLL01', 'BRS01' ) ORDERBY prod_name;
和下面的示例作用相同
1 2 3 4
SELECT prod_name, prod_price FROM Products WHERE vend_id ='DLL01'OR vend_id ='BRS01' ORDERBY prod_name;
为什么要使用 IN 操作符?其优点如下。
在有很多合法选项时,IN 操作符的语法更清楚,更直观。
在与其他 AND 和 OR 操作符组合使用 IN 时,求值顺序更容易管理。
IN 操作符一般比一组 OR 操作符执行得更快。
IN 的最大优点是可以包含其他 SELECT 语句,能够更动态地建立 HERE 子句。
NOT 操作符
NOT 用来否定其后条件的关键字。
检索除 DLL01 之外的所有供应商制造的产品
1 2 3 4
SELECT prod_name FROM Products WHERENOT vend_id ='DLL01' ORDERBY prod_name;
和下面的示例作用相同
1 2 3 4
SELECT prod_name FROM Products WHERE vend_id <>'DLL01' ORDERBY prod_name;
第 6 课 用通配符进行过滤
通配符(wildcard)用来匹配值的一部分的特殊字符。
搜索模式(search pattern)由字面值、通配符或两者组合构成的搜索条件。
在搜索子句中使用通配符,必须使用 LIKE 操作符。LIKE 指示 DBMS,后跟的搜索模式利用通配符匹配而不是简单的相等匹配进行比较。
百分号(%)通配符
%表示任何字符出现任意次数。
检索所有产品名以 Fish 开头的产品
1 2 3
SELECT prod_id, prod_name FROM Products WHERE prod_name LIKE'Fish%';
匹配任何位置上包含文本 bean bag 的值, 不论它之前或之后出现什么字符。
检索产品名中包含 bean bag 的产品
1 2 3
SELECT prod_id, prod_name FROM Products WHERE prod_name LIKE'%bean bag%';
检索产品名中以 F 开头,y 结尾的产品
1 2 3
SELECT prod_name FROM Products WHERE prod_name LIKE'F%y';
下划线(_)通配符
下划线(_)的用途与%一样,但它只匹配单个字符。
1 2 3
SELECT prod_id, prod_name FROM Products WHERE prod_name LIKE'__ inch teddy bear';
方括号([ ])通配符
方括号([])通配符用来指定一个字符集,它必须匹配指定位置(通配符的位置)的一个字符。
说明:并不是所有 DBMS 都支持用来创建集合的 []。只有微软的 Access 和 SQL Server 支持集合。
找出所有名字以 J 或 M 开头的联系人:
1 2 3 4
SELECT cust_contact FROM Customers WHERE cust_contact LIKE'[JM]%' ORDERBY cust_contact;
第 7 课 创建计算字段
拼接字段
拼接字符串值:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
-- Access 和 SQL Server SELECT vend_name +' ('+ vend_country +')' FROM Vendors ORDERBY vend_name;
-- DB2、Oracle、PostgreSQL、SQLite 和 Open Office Base SELECT vend_name ||' ('|| vend_country ||')' FROM Vendors ORDERBY vend_name;
-- MySQL 或 MariaDB SELECT Concat(vend_name, ' (', vend_country, ')') FROM Vendors ORDERBY vend_name;
去除字符串中的空格
1 2 3 4 5 6 7 8 9
-- Access 和 SQL Server SELECT RTRIM(vend_name) +' ('+ RTRIM(vend_country) +')' FROM Vendors ORDERBY vend_name;
-- DB2、Oracle、PostgreSQL、SQLite 和 Open Office Base SELECT RTRIM(vend_name) ||' ('|| RTRIM(vend_country) ||')' FROM Vendors ORDERBY vend_name;
别名
使用别名
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
-- Access 和 SQL Server SELECT RTRIM(vend_name) +' ('+ RTRIM(vend_country) +')' AS vend_title FROM Vendors ORDERBY vend_name;
-- DB2、Oracle、PostgreSQL、SQLite 和 Open Office Base SELECT RTRIM(vend_name) ||' ('|| RTRIM(vend_country) ||')' AS vend_title FROM Vendors ORDERBY vend_name;
-- MySQL 和 MariaDB SELECT Concat(vend_name, ' (', vend_country, ')') AS vend_title FROM Vendors ORDERBY vend_name;
执行算术计算
汇总物品的价格(单价乘以订购数量):
1 2 3 4 5 6
SELECT prod_id, quantity, item_price, quantity*item_price AS expanded_price FROM OrderItems WHERE order_num =20008;
第 8 课 使用函数处理数据
大多数 SQL 实现支持以下类型的函数:
算术函数
文本处理函数
时间处理函数
聚合函数
返回 DBMS 正使用的特殊信息(如返回用户登录信息)的系统函数
文本处理函数
函数
说明
LEFT()(或使用子字符串函数)
返回字符串左边的字符
LENGTH()(也使用 DATALENGTH() 或 LEN())
返回字符串的长度
LOWER()(Access 使用 LCASE())
将字符串转换为小写
LTRIM()
去掉字符串左边的空格
RIGHT()(或使用子字符串函数)
返回字符串右边的字符
RTRIM()
去掉字符串右边的空格
SOUNDEX()
返回字符串的 SOUNDEX 值
UPPER()(Access 使用 UCASE())
将字符串转换为大写
UPPER() 将文本转换为大写
1 2 3
SELECT vend_name, UPPER(vend_name) AS vend_name_upcase FROM Vendors ORDERBY vend_name;
HAVING 非常类似于 WHERE。唯一的差别是,WHERE 过滤行,而 HAVING 过滤分组。
过滤两个以上订单的分组
1 2 3 4
SELECT cust_id, COUNT(*) AS orders FROM Orders GROUPBY cust_id HAVINGCOUNT(*) >=2;
列出具有两个以上产品且其价格大于等于 4 的供应商:
1 2 3 4 5
SELECT vend_id, COUNT(*) AS num_prods FROM Products WHERE prod_price >=4 GROUPBY vend_id HAVINGCOUNT(*) >=2;
检索包含三个或更多物品的订单号和订购物品的数目:
1 2 3 4
SELECT order_num, COUNT(*) AS items FROM orderitems GROUPBY order_num HAVINGCOUNT(*) >=3;
要按订购物品的数目排序输出,需要添加 ORDER BY 子句
1 2 3 4 5
SELECT order_num, COUNT(*) AS items FROM orderitems GROUPBY order_num HAVINGCOUNT(*) >=3 ORDERBY items, order_num;
在 SELECT 语句中使用时必须遵循的次序:
1
SELECT->FROM->WHERE->GROUPBY->HAVING->ORDERBY
第 11 课 使用子查询
子查询(subquery),即嵌套在其他查询中的查询。
假如需要列出订购物品 RGAN01 的所有顾客,应该怎样检索?下面列出具体的步骤。
(1) 检索包含物品 RGAN01 的所有订单的编号。
1 2 3
SELECT order_num FROM OrderItems WHERE prod_id ='RGAN01';
输出
1 2 3 4
order_num ----------- 20007 20008
(2) 检索具有前一步骤列出的订单编号的所有顾客的 ID。
1 2 3
SELECT cust_id FROM Orders WHERE order_num IN (20007,20008);
输出
1 2 3 4
cust_id ---------- 1000000004 1000000005
(3) 检索前一步骤返回的所有顾客 ID 的顾客信息。
1 2 3
SELECT cust_name, cust_contact FROM Customers WHERE cust_id IN ('1000000004','1000000005');
现在,结合这两个查询,把第一个查询(返回订单号的那一个)变为子查询。
1 2 3 4 5
SELECT cust_id FROM orders WHERE order_num IN (SELECT order_num FROM orderitems WHERE prod_id ='RGAN01');
再进一步结合第三个查询
1 2 3 4 5 6 7
SELECT cust_name, cust_contact FROM customers WHERE cust_id IN (SELECT cust_id FROM orders WHERE order_num IN (SELECT order_num FROM orderitems WHERE prod_id ='RGAN01'));
SELECT vend_name, prod_name, prod_price FROM vendors INNERJOIN products ON vendors.vend_id = products.vend_id;
联结多个表
下面两个 SQL 等价:
1 2 3 4 5 6 7 8 9 10 11
SELECT cust_name, cust_contact FROM customers, orders, orderitems WHERE customers.cust_id = orders.cust_id AND orderitems.order_num = orders.order_num AND prod_id ='RGAN01';
SELECT cust_name, cust_contact FROM customers WHERE cust_id IN (SELECT cust_id FROM orders WHERE order_num IN (SELECT order_num FROM orderitems WHERE prod_id ='RGAN01'));
第 13 课 创建高级联结
自联结
给与 Jim Jones 同一公司的所有顾客发送一封信件:
1 2 3 4 5 6 7 8 9 10 11
-- 子查询方式 SELECT cust_id, cust_name, cust_contact FROM customers WHERE cust_name = (SELECT cust_name FROM customers WHERE cust_contact ='Jim Jones');
-- 自联结方式 SELECT c1.cust_id, c1.cust_name, c1.cust_contact FROM customers AS c1, customers AS c2 WHERE c1.cust_name = c2.cust_name AND c2.cust_contact ='Jim Jones';
自然联结
1 2 3
SELECT c.*, o.order_num, o.order_date, oi.prod_id, oi.quantity, oi.item_price FROM customers AS c, orders AS o, orderitems AS oi WHERE c.cust_id = o.cust_id AND oi.order_num = o.order_num AND prod_id ='RGAN01';
左外联结
1 2 3 4 5 6 7 8 9
SELECT customers.cust_id, orders.order_num FROM customers INNERJOIN orders ON customers.cust_id = orders.cust_id;
SELECT customers.cust_id, orders.order_num FROM customers LEFTOUTERJOIN orders ON customers.cust_id = orders.cust_id;
右外联结
1 2 3 4
SELECT customers.cust_id, orders.order_num FROM customers RIGHTOUTERJOIN orders ON orders.cust_id = customers.cust_id;
全外联结
1 2 3 4
SELECT customers.cust_id, orders.order_num FROM orders FULLOUTERJOIN customers ON orders.cust_id = customers.cust_id;
注意:Access、MariaDB、MySQL、Open Office Base 和 SQLite 不支持 FULLOUTER JOIN 语法。
使用带聚集函数的联结
1 2 3 4 5 6
SELECT customers.cust_id, COUNT(orders.order_num) AS num_ord FROM customers INNERJOIN orders ON customers.cust_id = orders.cust_id GROUPBY customers.cust_id;
第 14 课 组合查询
主要有两种情况需要使用组合查询:
在一个查询中从不同的表返回结构数据;
对一个表执行多个查询,按一个查询返回数据。
把 Illinois、Indiana、Michigan 等州的缩写传递给 IN 子句,检索出这些州的所有行
1 2 3
SELECT cust_name, cust_contact, cust_email FROM Customers WHERE cust_state IN ('IL','IN','MI');
找出所有 Fun4All
1 2 3
SELECT cust_name, cust_contact, cust_email FROM Customers WHERE cust_name ='Fun4All';
组合这两条语句
1 2 3 4 5 6 7
SELECT cust_name, cust_contact, cust_email FROM customers WHERE cust_state IN ('IL', 'IN', 'MI') UNION SELECT cust_name, cust_contact, cust_email FROM customers WHERE cust_name ='Fun4All';
UNION 默认从查询结果集中自动去除了重复的行;如果想返回所有的匹配行,可使用 UNION ALL
1 2 3 4 5 6 7
SELECT cust_name, cust_contact, cust_email FROM customers WHERE cust_state IN ('IL', 'IN', 'MI') UNIONALL SELECT cust_name, cust_contact, cust_email FROM customers WHERE cust_name ='Fun4All';
CREATEVIEW ProductCustomers AS SELECT cust_name, cust_contact, prod_id FROM Customers, Orders, OrderItems WHERE Customers.cust_id = Orders.cust_id AND OrderItems.order_num = Orders.order_num;
检索订购了产品 RGAN01 的顾客
1 2 3
SELECT cust_name, cust_contact FROM ProductCustomers WHERE prod_id ='RGAN01';
第 19 课 使用存储过程
创建存储过程
对邮件发送清单中具有邮件地址的顾客进行计数
1 2 3 4 5 6 7 8 9 10 11 12
CREATEPROCEDURE MailingListCount ( ListCount OUTINTEGER ) IS v_rows INTEGER;
BEGIN SELECTCOUNT(*) INTO v_rows FROM customers WHERENOT cust_email ISNULL; ListCount := v_rows; END;
-- SQL Server CREATETRIGGER customer_state ON Customers FORINSERT, UPDATE AS UPDATE Customers SET cust_state =Upper(cust_state) WHERE Customers.cust_id = inserted.cust_id;
-- Oracle 和 PostgreSQL CREATETRIGGER customer_state AFTER INSERTORUPDATE FOREACHROW BEGIN UPDATE Customers SET cust_state =Upper(cust_state) WHERE Customers.cust_id = :OLD.cust_id END;