Building PGroonga for Chinese Full-Text Search on HGDB-SEE V4.5.8

PGroonga official site: https://pgroonga.github.io/

Description: PGroonga (píːzí:lúnɡά) is a PostgreSQL extension that uses Groonga as its index. Out of the box, PostgreSQL supports full text search only for languages written with alphabetic characters and digits, which means it does not support full text search for Japanese, Chinese, and so on. By installing PGroonga into PostgreSQL, you get very fast full text search for all languages.

I. Install the dependency packages

yum install wget curl tar gzip gcc gcc-c++ make zlib zlib-devel msgpack msgpack-devel mecab mecab-devel lz4 lz4-devel

II. Download and install Git

Note: the build requires Git 2.7.4 or later.

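wget https://www.kernel.org/pub/software/scm/git/git-2.7.4.tar.gz --no-check-certificate
tar -vzxf git-2.7.4.tar.gz
cd git-2.7.4/
# Configure (assumes OpenSSL is installed under /usr/local/openssl)
./configure --with-openssl=/usr/local/openssl
# Build and install (the default prefix is /usr/local, so the binary lands in /usr/local/bin)
make && make install
# Open the system-wide profile and edit the PATH
vi /etc/profile
      # Append the Git setting at the bottom; /usr/local/bin goes first so the new git wins over an older system git
      export PATH=/usr/local/bin:$PATH
# Save with :wq, then reload the profile with source
source /etc/profile
# Check the Git version
git --version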
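
The version requirement above can also be checked in a script rather than by eye. The following is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available; `version_ge` is a hypothetical helper name, not part of Git.

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on GNU sort's -V (version sort): the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# "git --version" prints a line such as "git version 2.7.4";
# the third field is the version number.
if command -v git >/dev/null 2>&1; then
    installed=$(git --version | awk '{print $3}')
    if version_ge "$installed" "2.7.4"; then
        echo "git $installed is new enough"
    else
        echo "git $installed is older than 2.7.4"
    fi
fi
```

If the script reports an older version after installation, the old system git is probably still first on the PATH.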

III. Download, build, and install Groonga

Description: Groonga is an open-source fulltext search engine and column store. It lets you write high-performance applications that require fulltext search.

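Official site: https://groonga.org/

Build and install instructions: https://groonga.org/docs/install/centos.html#centos-7

PS: Download the source tarball and build from source. Installing the groonga-release-latest.noarch.rpm package from the official site directly on the local machine causes problems (some files may be missing).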

wget https://packages.groonga.org/source/groonga/groonga-13.0.9.tar.gz --no-check-certificate
tar -xvzf groonga-13.0.9.tar.gz
cd groonga-13.0.9
# Configure
./configure
# Build, using one job per CPU
make -j$(grep '^processor' /proc/cpuinfo | wc -l)
# Install
sudo make install
# After installing, list the libraries that pkg-config knows about
pkg-config --list-all
# Check whether groonga is listed. If it is not, the compiler cannot locate the Groonga headers and libraries, and PKG_CONFIG_PATH must be set
# Find where groonga.pc was installed
find / -name groonga.pc
# Set PKG_CONFIG_PATH to the directory that contains groonga.pc
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
# Check again that groonga is listed
pkg-config --list-all
# Expected entry: groonga   Groonga - An Embeddable Fulltext Search Engine
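
The export above replaces any existing PKG_CONFIG_PATH value. As a hedged alternative, the sketch below prepends the directory only when it is not already listed, then asks pkg-config directly whether groonga is visible. `prepend_dir` is a hypothetical helper, and /usr/local/lib/pkgconfig is assumed to be where `find` located groonga.pc.

```shell
# Hypothetical helper: print directory $2 prepended to the colon-separated
# list $1, unless $2 is already one of its entries.
prepend_dir() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${2}${1:+:$1}" ;;
    esac
}

# Assumption: groonga.pc was found in /usr/local/lib/pkgconfig.
PKG_CONFIG_PATH=$(prepend_dir "${PKG_CONFIG_PATH:-}" /usr/local/lib/pkgconfig)
export PKG_CONFIG_PATH

# --exists returns success only when the named module can be resolved.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists groonga; then
    echo "groonga $(pkg-config --modversion groonga) found"
else
    echo "groonga.pc is not visible to pkg-config"
fi
```

Adding the same export to /etc/profile would keep the setting for later shells, which matters because the PGroonga build in a later step needs it too.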

IV. Download and install xxHash

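Description: xxHash is an extremely fast hash algorithm, processing at RAM speed limits. Code is highly portable, and produces hashes identical across all platforms (little / big endian).

vcpkg manages C and C++ libraries on Windows, Linux, and macOS and greatly simplifies installing third-party libraries. It is open-sourced by Microsoft under the MIT license. The source is at https://github.com/Microsoft/vcpkg, and the latest release at the time of writing was 2023.04.15 Release.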

Building xxHash - Using vcpkg

You can download and install xxHash using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install xxhash

V. Download and install PGroonga

git clone --recursive https://github.com/pgroonga/pgroonga.git
cd pgroonga
make 
make install
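
Extensions built with make in this way generally locate the database server's headers and install directories through `pg_config`, and PGroonga also needs the Groonga settings from pkg-config. Before running make, a quick preflight check such as the sketch below can save a failed build. `missing_cmds` is a hypothetical helper, and it is an assumption that HGDB ships a `pg_config` in its bin directory that must be on the PATH.

```shell
# Hypothetical helper: print the name of every command in "$@"
# that cannot be found on the PATH, one per line.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Tools the build steps in this article rely on.
missing=$(missing_cmds git make gcc pkg-config pg_config)
if [ -n "$missing" ]; then
    echo "missing build tools:" $missing
else
    echo "all build tools found; pg_config reports: $(pg_config --version)"
fi
```

If `pg_config` is reported missing, adding the database's bin directory to the PATH before running make is the usual fix.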

VI. Log in to the database

psql highgo sysdba

highgo=# select * from pg_available_extensions where name like '%roon%';
       name        | default_version | installed_version |                                    comment
-------------------+-----------------+-------------------+--------------------------------------------------------------------------------
 pgroonga          | 3.1.6           | 3.1.6             | Super fast and all languages supported full text search index based on Groonga
 pgroonga_database | 3.1.6           |                   | PGroonga database management module
highgo=# create extension pgroonga;
ERROR:  extension "pgroonga" already exists
highgo=# \dx
                                                           List of installed extensions
        Name        | Version |       Schema       |                                          Description
--------------------+---------+--------------------+-----------------------------------------------------------------------------------------------
 hg_mac             | 1.0     | information_schema | hgdb mandatory access control without using selinux
 hg_permission      | 1.0     | information_schema | hg permission
 mysqlface          | 1.0     | public             | administrative functions for PostgreSQL
 orafce             | 3.9     | public             | Functions and operators that emulate a subset of functions and packages from the Oracle RDBMS
 passwordcheck      | 1.0     | information_schema | passwordcheck
 pg_buffercache     | 1.3     | public             | examine the shared buffer cache
 pg_stat_statements | 1.7     | public             | track execution statistics of all SQL statements executed
 pgroonga           | 3.1.6   | public             | Super fast and all languages supported full text search index based on Groonga
 plpgsql            | 1.0     | pg_catalog         | PL/pgSQL procedural language
 zhfts              | 1.1     | public             | RUM index access method
 zhparser           | 2.2     | public             | a parser for full-text search of Chinese
(11 rows)

VII. Usage

1. Enable full-text search on a text column

CREATE TABLE memos (
  id integer,
  content text
);
CREATE INDEX pgroonga_content_index ON memos USING pgroonga (content);
INSERT INTO memos VALUES (1, 'PostgreSQL is a relational database management system.');
INSERT INTO memos VALUES (2, 'Groonga is a fast full text search engine that supports all languages.');
INSERT INTO memos VALUES (3, 'PGroonga is a PostgreSQL extension that uses Groonga as index.');
INSERT INTO memos VALUES (4, 'There is groonga command.');
SET enable_seqscan = off;

The following operators perform full text search:

&@

&@~

LIKE

ILIKE

&@~ operator
You can use the &@~ operator to perform full text search with query syntax such as keyword1 OR keyword2:

highgo=# SELECT * FROM memos WHERE content &@~ 'PGroonga OR PostgreSQL';
 id |                            content                             
----+----------------------------------------------------------------
  1 | PostgreSQL is a relational database management system.
  3 | PGroonga is a PostgreSQL extension that uses Groonga as index.
(2 rows)
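
When the query text comes from a script rather than an interactive session, splicing it into the SQL by hand is error-prone. The sketch below uses psql's `-v` option together with the `:'q'` quoted-variable syntax, which only works for input read from a file or stdin (not with `-c`). The function names are hypothetical, and the database `highgo` and user `sysdba` are taken from the login shown earlier in this article.

```shell
# Hypothetical helper: print a psql script that searches memos.content
# with the &@~ operator. :'q' is psql's quoted-variable interpolation,
# so psql itself turns the search text into a properly quoted literal.
memos_search_sql() {
    cat <<'SQL'
SET enable_seqscan = off;
SELECT id, content FROM memos WHERE content &@~ :'q';
SQL
}

# Hypothetical wrapper: feed the script to psql on stdin and pass the
# search text as the psql variable q.
search_memos() {
    memos_search_sql | psql -v q="$1" highgo sysdba
}
```

A call such as `search_memos 'PGroonga OR PostgreSQL'` would then run the same query as the interactive example above.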

&@ operator
You can use the &@ operator to perform full text search with a single keyword:

highgo=# SELECT * FROM memos WHERE content &@ 'engine';
 id |                                content                                 
----+------------------------------------------------------------------------
  2 | Groonga is a fast full text search engine that supports all languages.
(1 row)


LIKE operator
PGroonga supports the LIKE operator, so you can get fast full text search from PGroonga without changing existing SQL.
column LIKE '%keyword%' is almost equivalent to column &@ 'keyword':

highgo=# SELECT * FROM memos WHERE content LIKE '%engine%';
 id |                                content                                 
----+------------------------------------------------------------------------
  2 | Groonga is a fast full text search engine that supports all languages.
(1 row)
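
The examples in this section use English text only. As a hedged sketch for Chinese text, the helper below prints a small psql script that adds one Chinese row to the memos table and searches it with the &@ operator. It assumes the memos table and pgroonga index created in this section and PGroonga's default tokenizer settings. The row with id 5 and its content are made-up sample data, and no output is shown because the query has not been verified on HGDB here.

```shell
# Hypothetical helper: print a psql script that inserts one Chinese row
# into memos and searches it with the &@ operator.
# Row id 5 and its text are sample data invented for this sketch.
chinese_memo_sql() {
    cat <<'SQL'
INSERT INTO memos VALUES (5, 'PGroonga支持中文全文检索。');
SET enable_seqscan = off;
SELECT id, content FROM memos WHERE content &@ '全文检索';
SQL
}

# To run it with the login used in this article:
#   chinese_memo_sql | psql highgo sysdba
```

If the search returns no row, the index definition and the database encoding (UTF-8 is assumed) are the first things to check.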

2. Score (ranking by match precision)

You can use the pgroonga_score function to get the precision of a match as a number. The more precisely a record matches the search query, the higher its score.

To use pgroonga_score, the pgroonga index must include the primary key column. If it does not, pgroonga_score always returns 0.

Here is a sample schema that includes the primary key in the indexed columns:

CREATE TABLE score_memos (
  id integer PRIMARY KEY,
  content text
);

CREATE INDEX pgroonga_score_memos_content_index
          ON score_memos
       USING pgroonga (id, content);
INSERT INTO score_memos VALUES (1, 'PostgreSQL is a relational database management system.');
INSERT INTO score_memos VALUES (2, 'Groonga is a fast full text search engine that supports all languages.');
INSERT INTO score_memos VALUES (3, 'PGroonga is a PostgreSQL extension that uses Groonga as index.');
INSERT INTO score_memos VALUES (4, 'There is groonga command.');
SET enable_seqscan = off;

-- Run a full text search and get the score
SELECT *, pgroonga_score(tableoid, ctid) FROM score_memos WHERE content &@ 'PGroonga' OR content &@ 'PostgreSQL';
 id |                            content                             | pgroonga_score 
----+----------------------------------------------------------------+----------------
  1 | PostgreSQL is a relational database management system.         |              1
  3 | PGroonga is a PostgreSQL extension that uses Groonga as index. |              2
(2 rows)

-- The pgroonga_score function can also be used in the ORDER BY clause to sort the matched records by precision in descending order:
highgo=# SELECT *, pgroonga_score(tableoid, ctid)
highgo-#   FROM score_memos
highgo-#  WHERE content &@ 'PGroonga' OR content &@ 'PostgreSQL'
highgo-#  ORDER BY pgroonga_score(tableoid, ctid) DESC;
 id |                            content                             | pgroonga_score 
----+----------------------------------------------------------------+----------------
  3 | PGroonga is a PostgreSQL extension that uses Groonga as index. |              2
  1 | PostgreSQL is a relational database management system.         |              1
(2 rows)