Notes on a Flink Performance Bottleneck


Preface

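I spent most of this week on Flink, building a simple example that reads data from a text file and writes it to a database. It works correctly, but I ran into a problem. I have four machines on which I set up a standalone cluster, and no matter what parallelism I choose, the run time is almost the same. I cannot figure out why; adding machines does not seem to help. I am recording the process here to see whether I can answer the question as I learn more.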

Fixes I Tried

Cluster Setup

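After hitting this problem, I first made some changes on the cluster side.
1. The machines have 2 cores each, but the number of slots was set to 6. I suspected this setting: with only 2 cores, too many slots would compete for resources and hurt performance. Changing it to 2 made no difference. The setting is: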
taskmanager.numberOfTaskSlots: 2
2. Adjusting memory: I increased the TaskManager memory from 2 GB to 4 GB, which also made no difference.
taskmanager.memory.process.size: 4000m
A note on this memory setting: the value we set is the total, that is, the Total Process Memory.
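Setting aside the relatively fixed components, the bulk of what remains is Task Heap and Managed Memory.
So when the total is increased, these two grow accordingly. From what I read, they can be understood roughly as on-heap and off-heap memory.
The former holds the objects our program creates and is reclaimed by the garbage collector; the latter is off-heap memory used, for example, by RocksDB and for caching intermediate results of sorts, hashes, and so on.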
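To make the breakdown concrete, here is a rough sketch of how a total such as 4000m might be split. The fractions and fixed sizes are assumptions based on Flink's documented default values, not values read from this cluster, and they may differ between Flink versions. The class and method names are made up for illustration.

```java
/** Back-of-the-envelope split of a TaskManager's Total Process Memory (all sizes in MB). */
final class TaskManagerMemorySketch {
    // Assumed defaults; verify them against the documentation of your Flink version.
    static final double JVM_OVERHEAD_FRACTION = 0.1;   // fraction of Total Process Memory
    static final long JVM_OVERHEAD_MIN = 192;
    static final long JVM_OVERHEAD_MAX = 1024;
    static final long JVM_METASPACE = 256;
    static final long FRAMEWORK_HEAP = 128;
    static final long FRAMEWORK_OFF_HEAP = 128;
    static final long TASK_OFF_HEAP = 0;
    static final double MANAGED_FRACTION = 0.4;        // fraction of Total Flink Memory
    static final double NETWORK_FRACTION = 0.1;        // fraction of Total Flink Memory
    static final long NETWORK_MIN = 64;
    static final long NETWORK_MAX = 1024;              // assumed cap; irrelevant at these sizes

    static long clamp(long value, long min, long max) {
        return Math.max(min, Math.min(max, value));
    }

    /** Returns {totalFlinkMemory, managedMemory, networkMemory, taskHeap}. */
    static long[] breakdown(long processSizeMb) {
        long overhead = clamp(Math.round(processSizeMb * JVM_OVERHEAD_FRACTION),
                JVM_OVERHEAD_MIN, JVM_OVERHEAD_MAX);
        long totalFlink = processSizeMb - overhead - JVM_METASPACE;
        long managed = Math.round(totalFlink * MANAGED_FRACTION);
        long network = clamp(Math.round(totalFlink * NETWORK_FRACTION), NETWORK_MIN, NETWORK_MAX);
        // Task Heap is whatever is left after all other components are carved out.
        long taskHeap = totalFlink - managed - network
                - FRAMEWORK_HEAP - FRAMEWORK_OFF_HEAP - TASK_OFF_HEAP;
        return new long[] {totalFlink, managed, network, taskHeap};
    }

    public static void main(String[] args) {
        long[] parts = breakdown(4000);
        System.out.printf("total flink=%d managed=%d network=%d task heap=%d%n",
                parts[0], parts[1], parts[2], parts[3]);
    }
}
```

Under these assumed defaults, 4000m works out to roughly 1338 MB of Managed Memory and 1416 MB of Task Heap, which matches the observation that these two components take most of any increase.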

Program Changes

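At first I put the database save logic in a MapFunction; later I moved it into a SinkFunction.
I also reworked the save method inside the SinkFunction several times, going from Spring's JdbcTemplate to plain JDBC. I hit a pitfall along the way: at first I used an injected JdbcTemplate, which worked when running locally, but on the cluster, once the function was shipped to other machines, the injected object was null.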
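One plausible explanation for the null JdbcTemplate (the cause was not confirmed here) is that Flink ships user functions to the TaskManagers using Java serialization, and fields that are transient or static do not travel with the object. The sketch below reproduces that effect with plain JDK classes; FakeSink and FakeTemplate are hypothetical stand-ins, not Flink or Spring types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationPitfallSketch {
    /** Stand-in for a non-serializable dependency such as an injected JdbcTemplate. */
    static final class FakeTemplate { }

    /** Stand-in for a sink function, which has to be Serializable to be shipped. */
    static final class FakeSink implements Serializable {
        private static final long serialVersionUID = 1L;
        // transient fields are skipped by Java serialization
        transient FakeTemplate template;

        // Plays the role of open(): runs on the worker, after deserialization
        void open() {
            template = new FakeTemplate();
        }
    }

    /** Serializes and deserializes the sink, imitating shipment to another machine. */
    static FakeSink roundTrip(FakeSink sink) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(sink);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FakeSink) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FakeSink local = new FakeSink();
        local.template = new FakeTemplate();          // "injected" on the client side
        FakeSink remote = roundTrip(local);
        System.out.println("template is null after shipping: " + (remote.template == null));
        remote.open();                                // re-create the resource on the worker
        System.out.println("template is null after open():   " + (remote.template == null));
    }
}
```

If this is what happened, creating such resources inside open() rather than relying on client-side injection avoids the problem, since open() runs on each worker after the function has been deserialized.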
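Switching to plain JDBC improved speed considerably. My guess is that JdbcTemplate does some extra work, whereas with JDBC the connection is opened once and each subsequent invoke just writes, which is more efficient.
Part of the code is shown here. The PreparedStatements are prepared in open(), and invoke() only binds the parameters and executes them.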

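import com.zaxxer.hikari.HikariDataSource;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;
import java.sql.PreparedStatement;

public class SinkToMySQL2 extends RichSinkFunction<MarketPrice> {
    private static final Logger log = LoggerFactory.getLogger(SinkToMySQL2.class);

    // Created in open() on each worker, so these never need to be serialized
    private transient PreparedStatement updatePS;
    private transient PreparedStatement insertPS;
    private transient Connection connection;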

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        HikariDataSource dataSource = new HikariDataSource();
        // getConnection(...) is a helper that is not shown in this excerpt
        connection = getConnection(dataSource);
        if(connection != null)
        {
            String updateSQL = " update MarketPrice set open_price=?,high_price=?,low_price=?,close_price=? where performance_id = ? and price_as_of_date = ?";
            updatePS = this.connection.prepareStatement(updateSQL);

            String insertSQL = " insert into MarketPrice(performance_id,price_as_of_date,open_price,high_price,low_price,close_price) values (?,?,?,?,?,?)";
            insertPS = this.connection.prepareStatement(insertSQL);
        }

    }

    @Override
    public void close() throws Exception {
        super.close();
        if (updatePS != null) {
            updatePS.close();
        }
        if (insertPS != null) {
            insertPS.close();
        }
        // Close the connection and release resources
        if (connection != null) {
            connection.close();
        }

    }

    /**
     * invoke() is called once for every record to be written.
     *
     * @param marketPrice
     * @param context
     * @throws Exception
     */
    @Override
    public void invoke(MarketPrice marketPrice, Context context) throws Exception {

        log.info("start save for {}", marketPrice.getPerformanceId().toString() );

        updatePS.setDouble(1,marketPrice.getOpenPrice());
        updatePS.setDouble(2,marketPrice.getHighPrice());
        updatePS.setDouble(3,marketPrice.getLowPrice());
        updatePS.setDouble(4,marketPrice.getClosePrice());
        updatePS.setString(5, marketPrice.getPerformanceId().toString());
        updatePS.setInt(6, marketPrice.getPriceAsOfDate());
        int result = updatePS.executeUpdate();


        log.info("finish update for {} result {}", marketPrice.getPerformanceId().toString(), result);

        if(result == 0)
        {
            // No row was updated, so insert one. Reuse the statement prepared in open()
            // instead of preparing (and leaking) a new one for every record.
            insertPS.setString(1, marketPrice.getPerformanceId().toString());
            insertPS.setInt(2, marketPrice.getPriceAsOfDate());
            insertPS.setDouble(3,marketPrice.getOpenPrice());
            insertPS.setDouble(4,marketPrice.getHighPrice());
            insertPS.setDouble(5,marketPrice.getLowPrice());
            insertPS.setDouble(6,marketPrice.getClosePrice());

            result = insertPS.executeUpdate();
            log.info("finish save for {} result {}", marketPrice.getPerformanceId().toString(), result);
        }
    }

}

Summary

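I tried improving things from several angles, but the result stayed the same: one machine and three machines take the same amount of time. The only remaining suspicion is that one machine has a problem, and the slowest machine determines the overall speed. When I was using the MapFunction I noticed that at times one machine had already processed over a thousand records while another had processed only a few dozen, yet by the end all of them had processed nearly the same number. That would explain why adding machines does not make the job faster. However, I have no way of identifying which machine it is. When I run locally, a higher parallelism does improve speed; only on the cluster do I see this behavior. I will see whether I can solve it later, and am recording it here for now.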
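The slowest-machine hypothesis can be illustrated with a little arithmetic. The sketch below assumes the records are split evenly across workers, which is consistent with the similar final counts observed above, so the job finishes when the slowest worker finishes. The throughput numbers are invented for illustration and are not measurements from this cluster.

```java
final class StragglerSketch {
    /** Seconds until the job completes when every worker gets an equal share of the records. */
    static double completionSeconds(long totalRecords, double[] recordsPerSecond) {
        double share = (double) totalRecords / recordsPerSecond.length;
        double slowest = 0;
        for (double rate : recordsPerSecond) {
            // The job is done only when the slowest worker has finished its share
            slowest = Math.max(slowest, share / rate);
        }
        return slowest;
    }

    public static void main(String[] args) {
        long records = 3000;
        System.out.println("one healthy worker:      "
                + completionSeconds(records, new double[] {30}) + " s");
        System.out.println("three, one of them slow: "
                + completionSeconds(records, new double[] {30, 30, 10}) + " s");
        System.out.println("three healthy workers:   "
                + completionSeconds(records, new double[] {30, 30, 30}) + " s");
    }
}
```

With these made-up rates, a single healthy worker and a three-worker cluster containing one worker at a third of the speed both take 100 seconds, so a single straggler is enough to cancel the benefit of the extra machines.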
