[Python] -- Counting How Often Character Names Appear in Dream of the Red Chamber (《红楼梦》)

This article shows how to use Python to count how often character names appear in Dream of the Red Chamber (《红楼梦》). If you spot an error or something I have overlooked, corrections are welcome.

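An earlier article explained in detail the procedure and reasoning for counting character names in Romance of the Three Kingdoms (《三国演义》); refer to it if needed.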

Basic implementation

import jieba
excludes = {"什么","一个","我们","那里","如今","你们","说道","起来",
            "姑娘","这里","出来","他们","众人","奶奶","自己","一面",
            "太太","只见","怎么","两个","没有","不是","不知","这个",
            "知道","听见","这样","进来","告诉","东西","咱们","就是",
            "回来","大家","只是","老爷","只得","丫头","这些","不敢",
            "出去","所以","不过"}
# Read the whole novel; the with-block closes the file afterwards
with open("红楼梦.txt", "r", encoding="gb18030") as f:
    txt = f.read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == '宝玉':
        rword = '贾宝玉'
    elif word == '凤姐':
        rword = '王熙凤'
    elif word == '老太太':
        rword = '贾母'
    elif word == '宝钗':
        rword = '薛宝钗'
    elif word == '黛玉':
        rword = '林黛玉'
    elif word == '二太太':
        rword = '王夫人'
    elif word == '琏二爷':
        rword = '贾琏'
    elif word == '平姐姐':
        rword = '平儿'
    elif word == '薛夫人':
        rword = '薛姨妈'
    else:
        rword = word
    counts[rword] = counts.get(rword,0) + 1
# Drop common non-name words; pop() avoids a KeyError if a word never occurred
for word in excludes:
    counts.pop(word, None)
items = list(counts.items())
items.sort(key = lambda x:x[1],reverse=True)
# Print the top 10; slicing is safe even if there are fewer than 10 entries
for word, count in items[:10]:
    print("{0:<10}{1:>5}".format(word, count))

Output

[Screenshot of the program output; image not preserved in this copy]

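Why call this the basic version? Because the count can be made more rigorous. A character is not always referred to by the same name throughout the text; 贾宝玉, for example, is also called "宝二爷". Such aliases should all be counted under one canonical name. Because jieba may not segment these aliases as single words, you can add them to jieba's dictionary manually, before segmenting the text.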
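The alias idea can also be expressed as a lookup table instead of a chain of elif branches. The following is only a minimal sketch under stated assumptions: the `ALIASES` table, the `count_names` helper and the toy token list are hypothetical names introduced for illustration, and the tokens stand in for what `jieba.lcut()` would return. Unlike the article's code, it filters excluded words while counting rather than deleting them afterwards.

```python
from collections import Counter

# Hypothetical alias table: each alias maps to one canonical name.
ALIASES = {
    "宝玉": "贾宝玉",
    "宝二爷": "贾宝玉",
    "黛玉": "林黛玉",
    "林妹妹": "林黛玉",
    "凤姐": "王熙凤",
    "凤丫头": "王熙凤",
}


def count_names(tokens, aliases, excludes=frozenset()):
    """Count tokens, folding aliases into their canonical names.

    Single-character tokens and tokens listed in `excludes` are skipped.
    """
    counts = Counter()
    for token in tokens:
        if len(token) == 1 or token in excludes:
            continue
        # Fall back to the token itself when it is not a known alias.
        counts[aliases.get(token, token)] += 1
    return counts


# Toy token list standing in for the output of jieba.lcut(); not real data.
tokens = ["宝玉", "说道", "林妹妹", "宝二爷", "黛玉", "的", "凤丫头", "宝玉"]
counts = count_names(tokens, ALIASES, excludes={"说道"})
for name, n in counts.most_common(3):
    print("{0:<10}{1:>5}".format(name, n))
```

Adding a new alias then only requires one more entry in the table, and `Counter.most_common()` replaces the manual sort.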

Enhanced implementation

import jieba
excludes = {"什么","一个","我们","那里","如今","你们","说道","起来",
            "姑娘","这里","出来","他们","众人","奶奶","自己","一面",
            "太太","只见","怎么","两个","没有","不是","不知","这个",
            "知道","听见","这样","进来","告诉","东西","咱们","就是",
            "回来","大家","只是","老爷","只得","丫头","这些","不敢",
            "出去","所以","不过"}
with open("红楼梦.txt", "r", encoding="gb18030") as f:
    txt = f.read()
# Register the aliases BEFORE segmenting: add_word() only affects later lcut() calls
for alias in ('宝二爷', '凤辣子', '凤哥儿', '凤丫头', '二太太', '林妹妹', '林姑娘',
              '琏二爷', '宝丫头', '宝姑娘', '宝姐姐', '平姐姐', '平姑娘', '薛夫人',
              '姨太太'):
    jieba.add_word(alias)
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word in ('宝玉', '宝二爷'):
        rword = '贾宝玉'
    elif word in ('凤姐', '凤姐儿', '凤丫头', '凤哥儿', '凤辣子'):
        rword = '王熙凤'
    elif word in ('老太太', '老祖宗'):
        rword = '贾母'
    elif word in ('宝钗', '宝姐姐', '宝姑娘', '宝丫头'):
        rword = '薛宝钗'
    elif word in ('黛玉', '林妹妹', '林姑娘'):
        rword = '林黛玉'
    elif word == '二太太':
        rword = '王夫人'
    elif word == '琏二爷':
        rword = '贾琏'
    elif word in ('平姐姐', '平姑娘'):
        rword = '平儿'
    elif word in ('薛夫人', '姨太太'):
        rword = '薛姨妈'
    else:
        rword = word
    counts[rword] = counts.get(rword,0) + 1
# Drop common non-name words; pop() avoids a KeyError if a word never occurred
for word in excludes:
    counts.pop(word, None)
items = list(counts.items())
items.sort(key = lambda x:x[1],reverse=True)
# Print the top 10; slicing is safe even if there are fewer than 10 entries
for word, count in items[:10]:
    print("{0:<10}{1:>5}".format(word, count))

Output

[Screenshot of the program output; image not preserved in this copy]

Comparing the two runs shows that the counts for most characters changed, and that even the ranking changed as a result.
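If you keep the `counts` dictionaries from both runs, the comparison can be done in code rather than by eye. This is a rough sketch, not part of the original scripts: `compare_counts` is a hypothetical helper, and the two small dictionaries contain made-up numbers used only to show the output shape, not actual counts from the novel.

```python
def compare_counts(before, after, top_n=10):
    """Return (name, before, after, delta) rows for names in either top-N list."""
    def top(counts):
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        return [name for name, _ in ranked[:top_n]]

    # Keep the order of the "after" ranking, then append names only in "before".
    names = []
    for name in top(after) + top(before):
        if name not in names:
            names.append(name)
    return [
        (n, before.get(n, 0), after.get(n, 0), after.get(n, 0) - before.get(n, 0))
        for n in names
    ]


# Made-up numbers for illustration only; they are NOT results from the novel.
basic = {"贾宝玉": 30, "贾母": 12, "王熙凤": 10}
enhanced = {"贾宝玉": 33, "贾母": 12, "王熙凤": 14}

rows = compare_counts(basic, enhanced)
for name, b, a, d in rows:
    print("{0:<10}{1:>5}{2:>5}{3:>+5}".format(name, b, a, d))
```

Each row lists a name, its count in the first run, its count in the second run, and the signed difference, so changes in both count and ranking are visible at a glance.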

