Configuring ClickHouse to Store Data on HDFS

Background

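Our company started out on the Hadoop stack, so we wanted ClickHouse to use HDFS as its storage as well.
According to the ClickHouse documentation, there are two ways to do this with HDFS. The first maps a table directly onto data files that already exist on HDFS, for example TSV files; as I understand it, this mode does not support inserting data. The second uses HDFS as the storage disk itself, much like cloud storage. This article covers how to configure the second approach.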
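
For contrast, the first approach can be sketched with the HDFS table engine, which takes a URI and a format name. This is only an illustration: the table name, column list, and file path below are hypothetical and not taken from our setup.

```sql
-- Approach 1, shown for contrast only: a table mapped onto an existing TSV file on HDFS.
-- hdfs_tsv_example, its columns, and the file path are hypothetical.
CREATE TABLE hdfs_tsv_example
(
    name String,
    value UInt32
)
ENGINE = HDFS('hdfs://hdfs1:9000/some_dir/data.tsv', 'TSV');

-- Read rows straight from the file on HDFS.
SELECT * FROM hdfs_tsv_example LIMIT 10;
```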

Official documentation: External Disks for Storing Data

Standalone Configuration

Edit config.xml, which is usually located at /etc/clickhouse-server/config.xml:

    <storage_configuration>
        <disks>
            <hdfs>
                <type>hdfs</type>
                <endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
            </hdfs>
        </disks>
        <policies>
            <hdfs>
                <volumes>
                    <main>
                        <disk>hdfs</disk>
                    </main>
                </volumes>
            </hdfs>
        </policies>
    </storage_configuration>

    <merge_tree>
        <min_bytes_for_wide_part>0</min_bytes_for_wide_part>
    </merge_tree>

Restart ClickHouse after changing the configuration.
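
After the restart, one way to check that the disk and policy were picked up is to query the system tables. This is a minimal sketch, assuming a server version in which system.disks exposes the type column; the names it should list (hdfs, main) are the ones from the config above.

```sql
-- The hdfs disk from config.xml should be listed next to the default disk.
SELECT name, path, type
FROM system.disks;

-- The hdfs policy should show one volume (main) that contains the hdfs disk.
SELECT policy_name, volume_name, disks
FROM system.storage_policies;
```

If the hdfs disk is missing from the first result, the server most likely did not load the storage_configuration section, and the server log is the next place to look.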

Configuring an HA HDFS Cluster

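Copy Hadoop's hdfs-site.xml configuration file into /etc/clickhouse-server/.
Then edit config.xml and replace the host and port inside the endpoint tag with the name of the HDFS cluster (cluster1 in this example):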

        <disks>
            <hdfs>
                <type>hdfs</type>
                <endpoint>hdfs://cluster1/clickhouse/</endpoint>
            </hdfs>
        </disks>

Note that the endpoint has no port in this configuration.

Copying hdfs-site.xml is not enough, because ClickHouse does not pick the file up on its own. Its path must also be declared in config.xml:

    <hdfs>
        <libhdfs3_conf>/etc/clickhouse-server/hdfs-site.xml</libhdfs3_conf>
    </hdfs>

This is mentioned in a different part of the official documentation.
Link: HDFS
Once the configuration is complete, restart ClickHouse.

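Getting cluster mode to work took some trial and error. I came across an earlier write-up that covers the hdfs-client.xml variant, which may be a useful reference; along the way I also tried setting an environment variable:
How do I use an HDFS engine in HA mode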

Performance Testing

To use HDFS as external storage, set the storage policy when creating the table, for example:

CREATE TABLE trait_term
(
	id UUID,
	termName String
)
ENGINE = MergeTree
PRIMARY KEY (id)
ORDER BY id
SETTINGS index_granularity = 1024, storage_policy='hdfs', index_granularity_bytes = 0;
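
To confirm that data written to this table actually lands on the HDFS disk, a quick smoke test could look like the sketch below. The inserted value is made up for illustration; system.parts and its disk_name column are standard ClickHouse.

```sql
-- Write one illustrative row (the term name is a made-up value).
INSERT INTO trait_term
SELECT generateUUIDv4(), 'example term';

-- Each active part of the table should report disk_name = 'hdfs'.
SELECT name, disk_name, rows
FROM system.parts
WHERE table = 'trait_term' AND active;
```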