[Error Log] PySpark runtime error ( Did not find winutils.exe | HADOOP_HOME and hadoop.home.dir are unset )






I. Error Message



Key error messages:

  • WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException:
  • java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

Running a PySpark computation job from PyCharm produces the following output:

D:\001_Develop\022_Python\Python39\python.exe D:/002_Project/011_Python/HelloPython/Client.py
23/08/01 11:25:24 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/01 11:25:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
PySpark 版本号 :  3.4.1
查看文件内容 :  ['Tom Jerry', 'Tom Jerry Tom', 'Jack Jerry']
查看文件内容展平效果 :  ['Tom', 'Jerry', 'Tom', 'Jerry', 'Tom', 'Jack', 'Jerry']
转为二元元组效果 :  [('Tom', 1), ('Jerry', 1), ('Tom', 1), ('Jerry', 1), ('Tom', 1), ('Jack', 1), ('Jerry', 1)]
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
最终统计单词 :  [('Tom', 3), ('Jack', 1), ('Jerry', 3)]

Process finished with exit code 0






II. Solution ( Install the Hadoop Runtime Environment )



Key error messages:

  • WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException:
  • java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

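PySpark normally runs alongside a Hadoop environment. If no Hadoop runtime is installed on Windows, it logs the messages above at startup (in the run shown earlier they are warnings, and the job still finishes with exit code 0);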
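Before installing anything, it can help to confirm which part of the setup is missing. The following is a minimal sketch, assuming the message is triggered when HADOOP_HOME is unset or when bin\winutils.exe is absent under it; the function name find_winutils_problems is hypothetical and not part of PySpark or Hadoop.

```python
import os
from pathlib import Path


def find_winutils_problems(env=None):
    """Return a list of problems with the HADOOP_HOME layout (empty list = looks fine).

    Hypothetical helper: it only inspects the environment mapping and the file
    system, it does not call into Spark or Hadoop.
    """
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]

    home = Path(hadoop_home)
    if not home.is_dir():
        return [f"HADOOP_HOME points to a missing directory: {home}"]

    problems = []
    winutils = home / "bin" / "winutils.exe"
    if not winutils.is_file():
        problems.append(f"winutils.exe not found at: {winutils}")
    return problems


if __name__ == "__main__":
    # Print one line per problem found in the current process environment.
    for problem in find_winutils_problems():
        print(problem)
```

An empty result only means the directory layout looks right; it does not prove that the winutils build matches the Hadoop version in use.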

Hadoop releases can be downloaded from https://hadoop.apache.org/releases.html ;

At the time of writing the latest version is 3.3.6. Click the binary (checksum signature) link under Binary download to open the Hadoop 3.3.6 download page:


The download URL is:

https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

The official download is very slow;
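Because the release page offers a checksum next to the binary, a downloaded archive can be checked before it is unpacked, which matters more when the file comes from a mirror. The following is a minimal sketch, assuming the expected SHA-512 value has been copied by hand from the official checksum file; the helper name sha512_matches and the file name in the usage comment are illustrative only.

```python
import hashlib


def sha512_matches(path, expected_hex, chunk_size=1024 * 1024):
    """Return True if the SHA-512 digest of the file equals expected_hex.

    The file is read in chunks so a large archive is never held in memory.
    Whitespace and letter case in expected_hex are ignored.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    expected = "".join(expected_hex.split()).lower()
    return digest.hexdigest() == expected


# Usage (file name and digest are placeholders, not real values):
# ok = sha512_matches("hadoop-3.3.6.tar.gz", "<digest copied from the .sha512 file>")
```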


As an alternative, this article uses a Hadoop 3.3.4 + winutils bundle, offered as a 0-point download on CSDN.

After downloading, extract Hadoop. The install path used here is D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4 ;


In the Environment Variables dialog, add the following system variable:

HADOOP_HOME = D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4
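For a single script, the same variable can also be set from Python before Spark starts. The following is a minimal sketch, assuming the JVM that PySpark launches inherits the environment of the Python process; ensure_hadoop_home is a hypothetical helper, and the path is the install path used in this article.

```python
import os

# Install path used in this article; adjust to your own machine.
HADOOP_HOME = r"D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4"


def ensure_hadoop_home(hadoop_home, env=None):
    """Set HADOOP_HOME only if it is not already set; return the effective value."""
    env = os.environ if env is None else env
    return env.setdefault("HADOOP_HOME", hadoop_home)


# Usage: call this before the SparkContext is created, for example
#   ensure_hadoop_home(HADOOP_HOME)
#   sc = SparkContext(conf=conf)
```

Using setdefault keeps a system-wide HADOOP_HOME in charge when one exists, so the script-level value acts only as a fallback.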


Add the following two entries to the Path environment variable:

%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin
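Whether the two Path entries are actually visible to a process can be checked with a few lines of Python. The following is a minimal sketch, assuming Windows-style paths separated by semicolons and compared case-insensitively; missing_path_entries is a hypothetical helper name.

```python
import ntpath


def missing_path_entries(path_value, hadoop_home):
    """Return the Hadoop bin/sbin directories that are absent from a Path string."""

    def norm(p):
        # Drop surrounding quotes and trailing separators, then fold case,
        # because Windows paths are compared case-insensitively.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    present = {norm(p) for p in path_value.split(";") if p.strip()}
    wanted = [ntpath.join(hadoop_home, sub) for sub in ("bin", "sbin")]
    return [w for w in wanted if norm(w) not in present]


# Usage (on Windows, in a terminal opened after the variables were changed):
#   import os
#   print(missing_path_entries(os.environ["PATH"], os.environ["HADOOP_HOME"]))
```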


Edit the script D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4\etc\hadoop\hadoop-env.cmd and set JAVA_HOME to the real JDK path;

set JAVA_HOME=%JAVA_HOME%

change it to

set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_91
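A JDK path can be sanity-checked before it is written into hadoop-env.cmd. The following is a minimal sketch with a hypothetical helper, java_home_warnings; it assumes a valid JDK has bin\java.exe under it, and it flags spaces because Hadoop's Windows scripts are commonly reported to reject a JAVA_HOME such as one under "Program Files" (the 8.3 short form, typically C:\PROGRA~1, is the usual workaround where short names are enabled).

```python
import os


def java_home_warnings(java_home, file_exists=os.path.isfile):
    """Return a list of warnings for a candidate JAVA_HOME value.

    file_exists is injectable so the check can be exercised without a real JDK.
    """
    warnings = []
    if " " in java_home:
        warnings.append("path contains a space; Hadoop's .cmd scripts may reject it")
    # Build the expected location of the Java launcher under JAVA_HOME.
    java_exe = java_home.rstrip("\\/") + r"\bin\java.exe"
    if not file_exists(java_exe):
        warnings.append(f"{java_exe} not found")
    return warnings


# Usage:
#   for w in java_home_warnings(r"C:\Program Files\Java\jdk1.8.0_91"):
#       print(w)
```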


Copy hadoop.dll and winutils.exe from winutils-master\hadoop-3.3.0\bin into the C:\Windows\System32 directory;
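The copy step can also be scripted. The following is a minimal sketch with a hypothetical helper, copy_native_files, that copies the two files named above from a source bin directory to a destination directory and fails early if either is missing; writing into C:\Windows\System32 requires an elevated (administrator) process.

```python
import shutil
from pathlib import Path


def copy_native_files(src_bin, dest_dir, names=("hadoop.dll", "winutils.exe")):
    """Copy the named files from src_bin to dest_dir and return the new paths."""
    src_bin, dest_dir = Path(src_bin), Path(dest_dir)
    # Check everything first so a partial copy is never left behind.
    missing = [n for n in names if not (src_bin / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src_bin}: {', '.join(missing)}")
    return [Path(shutil.copy2(src_bin / n, dest_dir / n)) for n in names]


# Usage (paths taken from this article; run from an administrator prompt):
#   copy_native_files(r"winutils-master\hadoop-3.3.0\bin", r"C:\Windows\System32")
```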


Restart the computer; the restart is required.

Then, in a command prompt, run

hadoop version

to verify that Hadoop is installed correctly.
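The same check can be run from Python, which is convenient at the top of a PySpark script. The following is a minimal sketch, assuming the hadoop launcher is reachable through Path and that the version subcommand prints the version on its first output line; hadoop_version_line is a hypothetical helper.

```python
import shutil
import subprocess


def hadoop_version_line(executable="hadoop"):
    """Return the first output line of `<executable> version`, or None if unavailable."""
    # shutil.which honours PATHEXT on Windows, so it can resolve hadoop.cmd.
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(hadoop_version_line() or "hadoop not found on Path")
```

A None result usually means the Path change has not taken effect yet in the current session.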
