YugabyteDB Skip Scan, a.k.a. Loose Index Scan, on a Composite Index

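In the previous post, I tested how an Index Scan can skip over the first index column to access a range on the second column. That required the first column to be range-sharded.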

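Here I am testing something similar, but with hash sharding. I have an equality predicate on the hash column, and then skip over the second column. Here is what I ran; I will explain the results afterwards: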

\! curl -s https://raw.githubusercontent.com/FranckPachot/ybdemo/main/docker/yb-lab/client/ybwr.sql | grep -v '\watch' > ybwr.sql
\i ybwr.sql
create table tbl ( A int, B int, C int );
insert into  tbl select 0,mod(m,10),m from generate_series(1,1000000) m;
create index i1 on tbl(A hash,B asc) INCLUDE (C);
create index i2 on tbl(A hash,B asc,C asc);
set yb_enable_expression_pushdown=on;
execute snap_table;
/*+ indexonlyscan(tbl i1) */ explain analyze select C from tbl where A =  0 and C = 900042;
execute snap_table;
/*+ indexonlyscan(tbl i2) */ explain analyze select C from tbl where A =  0 and C = 900042;
execute snap_table;
/*+ indexonlyscan(tbl i1) */ explain analyze select C from tbl where A =  0 and C > 900042;
execute snap_table;
/*+ indexonlyscan(tbl i2) */ explain analyze select C from tbl where A =  0 and C > 900042;
execute snap_table;

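The idea is to compare two covering indexes: one (i2) has all columns in the index key, while the other (i1) adds the last column with INCLUDE, so that column is not part of the key. I query with an equality predicate on the first column (A) and a point or range predicate on the third column (C), with no predicate on the second column (B). The second column (B) has only a few distinct values, which is where a skip scan makes the most sense.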

Point query

The first test filters on A = 0 and C = 900042

Covering index with INCLUDE

yugabyte=# /*+ indexonlyscan(tbl i1) */ explain analyze select C from tbl where A =  0 and C = 900042;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i1 on tbl  (cost=0.00..15.75 rows=100 width=4) (actual time=3605.845..3605.848 rows=1 loops=1)
   Index Cond: (a = 0)
   Remote Filter: (c = 900042)
   Heap Fetches: 0
 Planning Time: 0.123 ms
 Execution Time: 3605.891 ms
 Peak Memory Usage: 8 kB
(7 rows)

yugabyte=# execute snap_table;
 rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------
            1 |              |                | yugabyte i1 10.0.0.61 9c0a2dd1193e... L
            1 |       999999 |                | yugabyte i1 10.0.0.62 359f44ea1136... L
            1 |              |                | yugabyte i1 10.0.0.63 61591ddbdd81... L
(3 rows)

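Here, with the INCLUDE(C) index, all rows for (a = 0) were read: C is not part of the index key, so there is nothing to seek on for (c = 900042), and the condition can only be applied as a filter on each entry.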

Covering index with all columns in the key

yugabyte=# /*+ indexonlyscan(tbl i2) */ explain analyze select C from tbl where A =  0 and C = 900042;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Only Scan using i2 on tbl  (cost=0.00..15.50 rows=100 width=4) (actual time=1.040..1.041 rows=1 loops=1)
   Index Cond: ((a = 0) AND (c = 900042))
   Heap Fetches: 0
 Planning Time: 0.108 ms
 Execution Time: 1.070 ms
 Peak Memory Usage: 8 kB
(6 rows)

yugabyte=# execute snap_table;
 rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------
           21 |           41 |                | yugabyte i2 10.0.0.63 135697ffc832... L
(1 row)

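When the column is in the index key, we can seek directly to it in the LSM-tree structure. Because column B has multiple values, there are multiple skip operations (21 seeks here for the 10 distinct values of B). This is efficient when the skipped column does not have many distinct values. Basically, the index access reads from multiple points in a single operation.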
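To make the skip operations more concrete, here is a minimal Python sketch of the idea. It is not YugabyteDB's implementation: it assumes the i2 entries for a = 0 can be modelled as a sorted list of (b, c) tuples, scaled down to 100,000 rows, and it uses `bisect_left` as a stand-in for a seek in the LSM-tree. The names `build_index` and `skip_scan_point` are made up for this illustration.

```python
from bisect import bisect_left

def build_index(n):
    """Sorted (b, c) keys for a = 0, a toy model of i2 on (A hash, B asc, C asc)."""
    return sorted((m % 10, m) for m in range(1, n + 1))

def skip_scan_point(keys, c_target):
    """Return the keys with c == c_target and the number of seeks used."""
    found, seeks, pos = [], 0, 0
    while pos < len(keys):
        b = keys[pos][0]                             # current distinct value of B
        pos = bisect_left(keys, (b, c_target), pos)  # seek to (b, c_target)
        seeks += 1
        if pos < len(keys) and keys[pos] == (b, c_target):
            found.append(keys[pos])
        pos = bisect_left(keys, (b + 1,), pos)       # seek to the next value of B
        seeks += 1
    return found, seeks

keys = build_index(100_000)                          # scaled-down assumption
found, seeks = skip_scan_point(keys, 90_042)
print(found, seeks)  # [(2, 90042)] 20
```

In this toy model there are two seeks per distinct value of B, so the work depends on the number of distinct B values rather than on the number of rows. The counters reported by the real engine above are not expected to match the model exactly.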

Range query

The second test filters on A = 0 and C > 900042

Covering index with INCLUDE

yugabyte=# /*+ indexonlyscan(tbl i1) */ explain analyze select C from tbl where A =  0 and C > 900042;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i1 on tbl  (cost=0.00..15.75 rows=100 width=4) (actual time=39.337..3799.021 rows=99958 loops=1)
   Index Cond: (a = 0)
   Remote Filter: (c > 900042)
   Heap Fetches: 0
 Planning Time: 0.106 ms
 Execution Time: 3808.669 ms
 Peak Memory Usage: 8 kB
(7 rows)

yugabyte=# execute snap_table;
 rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------
            1 |              |                | yugabyte i1 10.0.0.61 9c0a2dd1193e... L
           98 |      1000096 |                | yugabyte i1 10.0.0.62 359f44ea1136... L
            1 |              |                | yugabyte i1 10.0.0.63 61591ddbdd81... L
(3 rows)

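Because the condition on C cannot be optimized with skip operations, this is the same as above: the INCLUDE index allows an Index Only Scan, which does not go to the table, but the filtering happens only after all index entries have been read. Note that Remote Filter is an optimization where YugabyteDB pushes the filter down to the storage layer, but all index entries must still be read.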

Covering index with all columns in the key

yugabyte=# /*+ indexonlyscan(tbl i2) */ explain analyze select C from tbl where A =  0 and C > 900042;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i2 on tbl  (cost=0.00..15.50 rows=100 width=4) (actual time=6.563..595.860 rows=99958 loops=1)
   Index Cond: ((a = 0) AND (c > 900042))
   Heap Fetches: 0
 Planning Time: 0.109 ms
 Execution Time: 599.628 ms
 Peak Memory Usage: 8 kB
(6 rows)

yugabyte=# execute snap_table;
 rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+------------------------------------------------
          108 |       100074 |                | yugabyte i2 10.0.0.63 135697ffc832... L
(1 row)

When the covering column used by the predicate is in the index key, the access is optimized to read only what is necessary.
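The difference between the two range scans can be sketched with the same kind of toy model in Python. This is an illustration under assumptions, not the DocDB code path: index entries for a = 0 are modelled as a sorted list of (b, c) tuples scaled down to 100,000 rows, `bisect_left` stands in for a seek, and the function names are hypothetical. The model ignores result batching and other engine details, so its counters are only indicative.

```python
from bisect import bisect_left

def build_index(n):
    """Sorted (b, c) entries for a = 0 (toy model, scaled down)."""
    return sorted((m % 10, m) for m in range(1, n + 1))

def filter_scan(keys, c_min):
    """Model of i1: C is not in the key, so every entry is read, then filtered."""
    rows = [c for _, c in keys if c > c_min]
    return rows, len(keys)                      # matching rows, entries read

def skip_scan_range(keys, c_min):
    """Model of i2: per value of B, seek past c_min and read only matching entries."""
    rows, seeks, read, pos = [], 0, 0, 0
    while pos < len(keys):
        b = keys[pos][0]
        pos = bisect_left(keys, (b, c_min + 1), pos)  # first c > c_min (integers)
        seeks += 1
        while pos < len(keys) and keys[pos][0] == b:  # read the matching range
            rows.append(keys[pos][1])
            read += 1
            pos += 1
    return rows, seeks, read

keys = build_index(100_000)
rows_f, read_f = filter_scan(keys, 90_042)
rows_s, seeks, read_s = skip_scan_range(keys, 90_042)
print(len(rows_f), read_f)         # 9958 100000
print(len(rows_s), seeks, read_s)  # 9958 10 9958
```

Both functions return the same rows, but the filter model reads every entry for a = 0, while the skip model reads only the entries that match, at the cost of one seek per distinct value of B.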

Conclusion

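A covering index contains all required columns in the index entry, so there is no need to go to the table. This is efficient when many rows are read through the index, and especially when a filter on those columns can discard as many rows as possible before going to the table.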

The covered columns can then either be part of the index key or be added with INCLUDE.

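INCLUDE is required for unique indexes when these columns are not part of the unique key. It can also be preferable when these columns are updated, because the position of the index entry in the LSM-tree does not change.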

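However, when there is a point or range condition on these columns, they should be in the index key to benefit from the LSM-tree skip operations that seek to the right point. This holds even if there are other columns in between, ascending or descending, because YugabyteDB can skip over them.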

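This gives flexibility when defining indexes that can serve multiple queries. The usual advice is to put the most selective columns first, but if a column has few distinct values, whether ASC or DESC, it may be better to put it in front of the selective one so that a single index can serve multiple queries. In this example, in addition to efficiently serving ((a = 0) AND (c > 900042)), my index can also serve a query with where A=0 and B=0, which an index on (A HASH, C ASC) alone would not serve efficiently.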
