ClickHouse: Materialized Views

Official documentation

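What is a materialized view

In ClickHouse, a materialized view is a view whose result is computed ahead of time and cached. The result is stored on disk and updated automatically, a classic case of trading space for time. Materialized views are an optimization technique: queries read the precomputed result, which speeds them up and reduces the load on the system.

Creation syntax: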

CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...
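
As a rough illustration of the TO form in the syntax above, the sketch below writes the view's output into an explicitly created target table. The database `demo` and the names `events`, `events_daily`, and `events_daily_mv` are hypothetical, chosen only for this example.

```sql
-- Hypothetical names throughout: demo, events, events_daily, events_daily_mv.
CREATE DATABASE IF NOT EXISTS demo;

-- Source table that receives the raw rows.
CREATE TABLE demo.events
(
    ts      DateTime,
    user_id String
) ENGINE = MergeTree ORDER BY ts;

-- Target table that stores the pre-aggregated rows.
CREATE TABLE demo.events_daily
(
    day     Date,
    user_id String,
    n       UInt64
) ENGINE = SummingMergeTree ORDER BY (day, user_id);

-- With TO, the view has no storage of its own: each block inserted into
-- demo.events is transformed by the SELECT and written to demo.events_daily.
CREATE MATERIALIZED VIEW IF NOT EXISTS demo.events_daily_mv
TO demo.events_daily
AS SELECT toDate(ts) AS day, user_id, count() AS n
FROM demo.events
GROUP BY day, user_id;
```

When TO is used, the engine is defined on the target table rather than on the view, and POPULATE cannot be combined with it.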

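How materialized views work

When a materialized view is filled, either by POPULATE at creation time or by later inserts into the source table, ClickHouse computes the view's result and stores it on disk. When you then query the view, ClickHouse reads the stored result directly instead of recomputing it.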

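A materialized view is defined by a SQL SELECT statement and may reference one or more tables. It is updated automatically when new rows are inserted into its source table (the table in the FROM clause). Note that only INSERT triggers the view: changes to existing source data, such as ALTER ... UPDATE or ALTER ... DELETE, are not propagated to it, so the view stays consistent with the source only for inserted data.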

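Note: materialized views are a trade-off between query performance and data consistency. They make queries faster, but they add overhead to data ingestion and maintenance.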

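Usage example

This walkthrough uses the dataset provided in the official documentation (example-datasets).

Create the database and the tables. The SQL is given here; it can also be taken from the location mentioned above: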

DROP DATABASE IF EXISTS git;
CREATE DATABASE git;

CREATE TABLE git.commits
(
    hash String,
    author LowCardinality(String),
    time DateTime,
    message String,
    files_added UInt32,
    files_deleted UInt32,
    files_renamed UInt32,
    files_modified UInt32,
    lines_added UInt32,
    lines_deleted UInt32,
    hunks_added UInt32,
    hunks_removed UInt32,
    hunks_changed UInt32
) ENGINE = MergeTree ORDER BY time;

CREATE TABLE git.file_changes
(
    change_type Enum('Add' = 1, 'Delete' = 2, 'Modify' = 3, 'Rename' = 4, 'Copy' = 5, 'Type' = 6),
    path LowCardinality(String),
    old_path LowCardinality(String),
    file_extension LowCardinality(String),
    lines_added UInt32,
    lines_deleted UInt32,
    hunks_added UInt32,
    hunks_removed UInt32,
    hunks_changed UInt32,

    commit_hash String,
    author LowCardinality(String),
    time DateTime,
    commit_message String,
    commit_files_added UInt32,
    commit_files_deleted UInt32,
    commit_files_renamed UInt32,
    commit_files_modified UInt32,
    commit_lines_added UInt32,
    commit_lines_deleted UInt32,
    commit_hunks_added UInt32,
    commit_hunks_removed UInt32,
    commit_hunks_changed UInt32
) ENGINE = MergeTree ORDER BY time;

CREATE TABLE git.line_changes
(
    sign Int8,
    line_number_old UInt32,
    line_number_new UInt32,
    hunk_num UInt32,
    hunk_start_line_number_old UInt32,
    hunk_start_line_number_new UInt32,
    hunk_lines_added UInt32,
    hunk_lines_deleted UInt32,
    hunk_context LowCardinality(String),
    line LowCardinality(String),
    indent UInt8,
    line_type Enum('Empty' = 0, 'Comment' = 1, 'Punct' = 2, 'Code' = 3),

    prev_commit_hash String,
    prev_author LowCardinality(String),
    prev_time DateTime,

    file_change_type Enum('Add' = 1, 'Delete' = 2, 'Modify' = 3, 'Rename' = 4, 'Copy' = 5, 'Type' = 6),
    path LowCardinality(String),
    old_path LowCardinality(String),
    file_extension LowCardinality(String),
    file_lines_added UInt32,
    file_lines_deleted UInt32,
    file_hunks_added UInt32,
    file_hunks_removed UInt32,
    file_hunks_changed UInt32,

    commit_hash String,
    author LowCardinality(String),
    time DateTime,
    commit_message String,
    commit_files_added UInt32,
    commit_files_deleted UInt32,
    commit_files_renamed UInt32,
    commit_files_modified UInt32,
    commit_lines_added UInt32,
    commit_lines_deleted UInt32,
    commit_hunks_added UInt32,
    commit_hunks_removed UInt32,
    commit_hunks_changed UInt32
) ENGINE = MergeTree ORDER BY time;

Load the data with the s3 table function and INSERT INTO ... SELECT:

INSERT INTO git.commits SELECT *
FROM s3('https://datasets-documentation.s3.amazonaws.com/github/commits/clickhouse/commits.tsv.xz', 'TSV', 'hash String,author LowCardinality(String), time DateTime, message String, files_added UInt32, files_deleted UInt32, files_renamed UInt32, files_modified UInt32, lines_added UInt32, lines_deleted UInt32, hunks_added UInt32, hunks_removed UInt32, hunks_changed UInt32');

INSERT INTO git.file_changes SELECT *
FROM s3('https://datasets-documentation.s3.amazonaws.com/github/commits/clickhouse/file_changes.tsv.xz', 'TSV', 'change_type Enum(\'Add\' = 1, \'Delete\' = 2, \'Modify\' = 3, \'Rename\' = 4, \'Copy\' = 5, \'Type\' = 6), path LowCardinality(String), old_path LowCardinality(String), file_extension LowCardinality(String), lines_added UInt32, lines_deleted UInt32, hunks_added UInt32, hunks_removed UInt32, hunks_changed UInt32, commit_hash String, author LowCardinality(String), time DateTime, commit_message String, commit_files_added UInt32, commit_files_deleted UInt32, commit_files_renamed UInt32, commit_files_modified UInt32, commit_lines_added UInt32, commit_lines_deleted UInt32, commit_hunks_added UInt32, commit_hunks_removed UInt32, commit_hunks_changed UInt32');

INSERT INTO git.line_changes SELECT *
FROM s3('https://datasets-documentation.s3.amazonaws.com/github/commits/clickhouse/line_changes.tsv.xz', 'TSV', 'sign Int8, line_number_old UInt32, line_number_new UInt32, hunk_num UInt32, hunk_start_line_number_old UInt32, hunk_start_line_number_new UInt32, hunk_lines_added UInt32,\n    hunk_lines_deleted UInt32, hunk_context LowCardinality(String), line LowCardinality(String), indent UInt8, line_type Enum(\'Empty\' = 0, \'Comment\' = 1, \'Punct\' = 2, \'Code\' = 3), prev_commit_hash String, prev_author LowCardinality(String), prev_time DateTime, file_change_type Enum(\'Add\' = 1, \'Delete\' = 2, \'Modify\' = 3, \'Rename\' = 4, \'Copy\' = 5, \'Type\' = 6),\n    path LowCardinality(String), old_path LowCardinality(String), file_extension LowCardinality(String), file_lines_added UInt32, file_lines_deleted UInt32, file_hunks_added UInt32, file_hunks_removed UInt32, file_hunks_changed UInt32, commit_hash String,\n    author LowCardinality(String), time DateTime, commit_message String, commit_files_added UInt32, commit_files_deleted UInt32, commit_files_renamed UInt32, commit_files_modified UInt32, commit_lines_added UInt32, commit_lines_deleted UInt32, commit_hunks_added UInt32, commit_hunks_removed UInt32, commit_hunks_changed UInt32');

Create a materialized view that counts commits per author per day:
```
create materialized view git.commits_mv
engine SummingMergeTree
order by (dt, author)
as select
toDate(time) as dt, author, count() as n from git.commits group by dt, author order by dt asc;
```
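The SummingMergeTree table engine is intended for cases where only the aggregated data matters and the detail rows do not. When it merges data parts, it summarizes rows according to the predefined key, collapsing all rows of the same group into a single row. This can significantly reduce storage space and speed up queries.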
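
The merge-time aggregation described above can be observed on a small standalone table. This is a minimal sketch: the database `demo`, the table `daily_counts`, and the author `alice` are hypothetical and unrelated to the git dataset.

```sql
-- Hypothetical table used only to illustrate SummingMergeTree merging.
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE demo.daily_counts
(
    dt     Date,
    author String,
    n      UInt64
) ENGINE = SummingMergeTree ORDER BY (dt, author);

-- Two separate inserts create two separate data parts with the same key.
INSERT INTO demo.daily_counts VALUES ('2022-11-30', 'alice', 1);
INSERT INTO demo.daily_counts VALUES ('2022-11-30', 'alice', 2);

-- Until the parts are merged, this may still return two rows.
SELECT * FROM demo.daily_counts;

-- Force a merge (expensive on large tables; shown for illustration only).
OPTIMIZE TABLE demo.daily_counts FINAL;

-- After the merge, rows sharing (dt, author) are collapsed and n is summed.
SELECT * FROM demo.daily_counts;
```

In normal operation merges run in the background at times chosen by the server, so queries should not rely on the rows already being collapsed.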

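Note: the view created here contains no data yet. The data has to be written into it, as described below. POPULATE is not used because rows that arrive in the source table while the historical data is being filled are ignored, so it should be used with caution when accuracy requirements are high.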
```
-- POPULATE variant
create materialized view git.commits_mv
engine SummingMergeTree
order by (dt, author)
POPULATE as select
toDate(time) as dt, author, count() as n from git.commits group by dt, author order by dt asc;

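-- The ClickHouse documentation does not recommend POPULATE: rows inserted into the source table while the view is being created are not written to the view, so data is lost.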
```
If the view was created without POPULATE, load the existing data with INSERT INTO:
```
insert into git.commits_mv
select toDate(time) as dt, author, count() as n from git.commits group by dt, author order by dt asc;
```
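A quick sanity check after the backfill can be sketched as follows. It assumes the backfill was run exactly once and that no rows were inserted into git.commits between the backfill and the check; under those assumptions the two totals should match.

```sql
-- Total number of commits according to the source table.
SELECT count() AS source_commits FROM git.commits;

-- Total number of commits according to the view.
-- sum(n) is used because rows for the same (dt, author) may not be merged yet.
SELECT sum(n) AS view_commits FROM git.commits_mv;
```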
If no error is reported, the data should now be visible in the view. You can also verify that new rows in the source table are propagated to the view:
```
-- Insert one row to see whether the view is updated automatically
insert into git.commits (hash, author, time, message, files_added, files_deleted, files_renamed, files_modified, lines_added, lines_deleted, hunks_added, hunks_removed, hunks_changed) values ('488610bd96415bdb8a718135676cxdf6a665829922', 'Nikita Taranov', '2022-11-30 18:22:24', 'impl (#43709)', 2, 0, 0, 3, 50, 31, 5, 1, 1);
```
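One way to look at the effect of that insert, as a sketch, is to read the raw rows the view holds for the same author and day. The filter values are taken from the insert statement above.

```sql
-- Raw rows in the view for the author and day of the test insert.
-- More than one row may come back if the data parts have not been merged yet.
SELECT dt, author, n
FROM git.commits_mv
WHERE dt = toDate('2022-11-30') AND author = 'Nikita Taranov';
```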
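Result: the view is updated. However, if you insert several more rows, you will find that commits_mv has not summed them into a single row. When querying a materialized view backed by SummingMergeTree, you still need to write an aggregating query: SummingMergeTree does pre-aggregate the data, but not in real time, and the moment at which the merge performs the aggregation is not under your control.
So the query should be written as follows: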
```
select dt, author, sum(n) from git.commits_mv group by dt ,author order by dt desc;
```
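
The same pattern extends to coarser roll-ups over the view. As a sketch, the query below re-aggregates the daily rows into a total per author; the alias `commits` and the limit of 10 are arbitrary choices for this example.

```sql
-- Top 10 authors by total commits, re-aggregated from the daily rows.
SELECT author, sum(n) AS commits
FROM git.commits_mv
GROUP BY author
ORDER BY commits DESC
LIMIT 10;
```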