Lab 6: Getting Familiar with Basic Hive Operations


1. Lab Objectives

(1) Understand the role Hive plays as a data warehouse in the Hadoop architecture.
(2) Become proficient with commonly used HiveQL statements.

2. Lab Platform

Operating system: Ubuntu 18.04 (or Ubuntu 16.04);
Hadoop version: 3.1.3;
Hive version: 3.1.2;
JDK version: 1.8.

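3. Dataset

Preparation:

The dataset accompanies the book Programming Hive (O'Reilly; Chinese edition 《Hive编程指南》, Posts & Telecom Press). Download link:
https://raw.githubusercontent.com/oreillymedia/programming_hive/master/prog-hive-1st-ed-data.zip
Alternate download link:
https://www.cocobolo.top/FileServer/prog-hive-1st-ed-data.zip
If the download is slow, you can use the copy I uploaded: Lin Ziyu Hive dataset download

Extracting the archive yields the two files needed for this lab, stocks.csv and dividends.csv.

Go to your Downloads folder, right-click the downloaded archive to extract it, enter the prog-hive-1st-ed-data folder, then right-click and open a terminal: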

cd ~/Downloads/prog-hive-1st-ed-data
sudo cp ./data/stocks/stocks.csv /usr/local/hive
sudo cp ./data/dividends/dividends.csv /usr/local/hive

Go to the Hadoop directory and start Hadoop:

cd /usr/local/hadoop
sbin/start-dfs.sh

Start MySQL:

service mysql start

Switch to the Hive directory and start Hive:

cd /usr/local/hive
bin/hive

4. Lab Steps

(1) Create an internal table named stocks whose fields are delimited by commas, with the following structure:

stocks table structure:

col_name	data_type
exchange	string
symbol	string
ymd	string
price_open	float
price_high	float
price_low	float
price_close	float
volume	int
price_adj_close	float

Code:

create table if not exists stocks
(
`exchange` string,
`symbol` string,
`ymd` string,
`price_open` float,
`price_high` float,
`price_low` float,
`price_close` float,
`volume` int,
`price_adj_close` float
)
row format delimited fields terminated by ',';

Inspect the table:

hive> describe stocks;
OK
exchange            	string              	                    
symbol              	string              	                    
ymd                 	string              	                    
price_open          	float               	                    
price_high          	float               	                    
price_low           	float               	                    
price_close         	float               	                    
volume              	int                 	                    
price_adj_close     	float               	                    
Time taken: 0.062 seconds, Fetched: 9 row(s)
hive>
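
To confirm that stocks was created as an internal (managed) table, one option is to look at its extended metadata. This is a minimal sketch, assuming the table lives in the current database; the Table Type field in the output is expected to read MANAGED_TABLE for an internal table and EXTERNAL_TABLE for an external one.

```sql
-- Show extended metadata, including the Table Type and Location fields.
describe formatted stocks;
```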

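(2) Create an external partitioned table named dividends (partitioned by exchange and symbol) whose fields are delimited by commas, with the following structure:

dividends table structure: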

col_name	data_type
ymd	string
dividend	float
exchange	string
symbol	string

Code:

create external table if not exists dividends
(
`ymd` string,
`dividend` float
)
partitioned by(`exchange` string ,`symbol` string)
row format delimited fields terminated by ',';

Inspect the table:

hive> describe dividends;
OK
ymd                 	string              	                    
dividend            	float               	                    
exchange            	string              	                    
symbol              	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
exchange            	string              	                    
symbol              	string              	                    
Time taken: 0.106 seconds, Fetched: 9 row(s)
hive>

(3) Load data from stocks.csv into the stocks table:

Code:

load data local inpath '/usr/local/hive/stocks.csv' overwrite into table stocks;
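
A quick sanity check after loading can catch a wrong path or delimiter early. The sketch below only reads from stocks; the exact count and rows depend on the downloaded file, so no expected output is shown.

```sql
-- Count the loaded rows; a result of 0 suggests the load did not pick up the file.
select count(*) from stocks;

-- Peek at a few rows; columns that are all NULL usually point to a delimiter mismatch.
select * from stocks limit 5;
```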

(4) Create an unpartitioned external table named dividends_unpartitioned and load data into it from dividends.csv. The table structure is as follows:

dividends_unpartitioned table structure:

col_name	data_type
ymd	string
dividend	float
exchange	string
symbol	string

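Code (the columns are declared in the order exchange, symbol, ymd, dividend so that they line up with the fields in dividends.csv):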

create external table if not exists dividends_unpartitioned
(
`exchange` string ,
`symbol` string,
`ymd` string,
`dividend` float
)
row format delimited fields terminated by ',';

Load the data:

load data local inpath '/usr/local/hive/dividends.csv' overwrite into table dividends_unpartitioned;

(5) Using a query on dividends_unpartitioned, rely on Hive's dynamic partitioning to insert the data into the matching partitions of the dividends table.

Code:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
insert overwrite table dividends partition(`exchange`,`symbol`) select `ymd`,`dividend`,`exchange`,`symbol` from dividends_unpartitioned;
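
To check that the dynamic-partition insert produced one partition per (exchange, symbol) pair, the partitions can be listed and a single symbol can be sampled. This is only a sketch; IBM is used as the sample symbol because step (6) relies on its dividend records.

```sql
-- List the partitions created by the insert.
show partitions dividends;

-- Read the rows for one symbol; filtering on a partition column lets Hive prune the other partitions.
select `ymd`, `dividend`
from dividends
where `symbol` = 'IBM'
limit 5;
```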

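(6) Query the closing price (price_close) of IBM (symbol = IBM) on every trading day from 2000 onward on which a dividend was paid (that is, days with a matching record in the dividends table).

Statement: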

select s.ymd,s.symbol,s.price_close
from stocks s 
LEFT SEMI JOIN 
dividends d
ON s.ymd=d.ymd and s.symbol=d.symbol
where s.symbol='IBM' and year(s.ymd)>=2000;

Output (partially elided):

2010-02-08	IBM	121.88
2009-11-06	IBM	123.49
2009-08-06	IBM	117.38
...
2000-05-08	IBM	109.75
2000-02-08	IBM	118.81
Time taken: 8.75 seconds, Fetched: 41 row(s)
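
LEFT SEMI JOIN returns each row of stocks at most once when a matching row exists in dividends, which is the same idea as a correlated EXISTS subquery. As a sketch of that equivalent form (assuming a Hive version that supports subqueries in the WHERE clause, which should include the 3.1.2 release used here):

```sql
-- The subquery is correlated on date and symbol, mirroring the ON clause of the semi join.
select s.ymd, s.symbol, s.price_close
from stocks s
where s.symbol = 'IBM'
  and year(s.ymd) >= 2000
  and exists (
      select d.ymd
      from dividends d
      where d.ymd = s.ymd and d.symbol = s.symbol
  );
```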

(7) Query how Apple (symbol = AAPL) moved on each trading day in October 2008: show rise for a gain, fall for a loss, and unchanged for no change.

Statement:

select ymd,
case
    when price_close-price_open>0 then 'rise'
    when price_close-price_open<0 then 'fall'
    else 'unchanged'
end as situation
from stocks
where symbol='AAPL' and substring(ymd,1,7)='2008-10';

Output (partially elided):

2008-10-31	rise
2008-10-30	rise
...
2008-10-02	fall
2008-10-01	fall
Time taken: 0.1 seconds, Fetched: 23 row(s)
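
Hive's substring counts character positions from 1, so the year-month prefix of a yyyy-MM-dd string is characters 1 through 7. The sketch below checks that on a literal and shows a LIKE-based filter as another way to select the same month; both are illustrations rather than the required solution.

```sql
-- Extract the year-month prefix from a literal date string.
-- Expected result: 2008-10
select substring('2008-10-31', 1, 7);

-- Alternative filter for October 2008 using a prefix match.
select ymd, price_open, price_close
from stocks
where symbol = 'AAPL' and ymd like '2008-10-%';
```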

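(8) Find the record in stocks where the closing price (price_close) exceeds the opening price (price_open) by the largest amount, and return its exchange, stock symbol (symbol), date (ymd), closing price, opening price, and the difference between the two.

Statement: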

select `exchange`,`symbol`,`ymd`,price_close,price_open,price_close-price_open as `diff`
from
(
    select *
    from stocks
    order by price_close-price_open desc
    limit 1
)t;

Output:

NASDAQ	INFY	2000-02-11	670.06	534.5	135.56
Time taken: 4.476 seconds, Fetched: 1 row(s)
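
Because Hive lets ORDER BY refer to a column alias from the select list, the same lookup can be sketched without the subquery. This is an alternative formulation, not the statement used above; if several rows tie for the largest difference, which one is returned is not defined.

```sql
-- Sort by the computed difference and keep only the largest one.
select `exchange`, `symbol`, `ymd`, price_close, price_open,
       price_close - price_open as `diff`
from stocks
order by `diff` desc
limit 1;
```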

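(9) From the stocks table, find the years in which Apple's (symbol = AAPL) average adjusted closing price (price_adj_close) was above 50 US dollars, along with that yearly average.

Statement: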

select
    year(ymd) as `year`,
    avg(price_adj_close) as avg_price from stocks
where `exchange`='NASDAQ' and symbol='AAPL'
group by year(ymd)
having avg_price > 50;

Output:

2006	70.81063753105255
2007	128.27390423049016
2008	141.9790115054888
2009	146.81412711976066
2010	204.72159912109376
Time taken: 2.347 seconds, Fetched: 5 row(s)
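
Referring to the avg_price alias in HAVING works in Hive, as the output shows, but some SQL engines reject it. A more portable sketch repeats the aggregate expression instead:

```sql
-- Same query, with the aggregate repeated in HAVING instead of the alias.
select
    year(ymd) as `year`,
    avg(price_adj_close) as avg_price
from stocks
where `exchange` = 'NASDAQ' and symbol = 'AAPL'
group by year(ymd)
having avg(price_adj_close) > 50;
```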

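(10) For each year, find the stock symbols of the three companies with the highest yearly average adjusted closing price (price_adj_close), along with those averages.

Statement: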

select t2.`year`,symbol,t2.avg_price
from
(
    select
        *,row_number() over(partition by t1.`year` order by t1.avg_price desc) as `rank`
    from
    (
        select
            year(ymd) as `year`,
            symbol,
            avg(price_adj_close) as avg_price
        from stocks
        group by year(ymd),symbol
    )t1
)t2
where t2.`rank`<=3;

Output (partially elided):

NULL	stock_symbol	NULL
1962	IBM	2.0072222134423634
1962	GE	0.16876984293025638
...
2009	GTC	174.11607115609306
2010	ISRG	319.75360107421875
2010	AMEN	313.875
2010	GTC	214.36719848632814
Time taken: 7.715 seconds, Fetched: 140 row(s)
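
The first output row (NULL, stock_symbol, NULL) looks like the CSV header line, which was loaded as data: year() cannot parse the header text, and the numeric column becomes NULL. Two possible ways to keep it out of the result are sketched below, assuming the header is the first line of stocks.csv. The skip.header.line.count table property asks Hive to skip leading lines of each text file when reading, while the filter variant leaves the table definition unchanged.

```sql
-- Option 1: skip the first line of each data file whenever the table is read.
alter table stocks set tblproperties ('skip.header.line.count' = '1');

-- Option 2: drop rows whose date cannot be parsed (the header row among them).
select year(ymd) as `year`, symbol, avg(price_adj_close) as avg_price
from stocks
where year(ymd) is not null
group by year(ymd), symbol;
```

With either option in place, the inner query of step (10) should no longer produce a NULL year group.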