Quick Start with Unity Machine Learning: Part 3


Contents

1. Purpose

1.1 Keep a record so it is easy to refer back to this post later

2. References

1. My blog

3. Notes

4. Lesson 12: 109 - Randomize the Target's position and collect observations

5. Lesson 13: 110 - Collect observations and finish pre-training preparation

6. Lesson 14: 111 - Let the red ball keep eating the green ball

7. Lesson 15: 112 - Start training the model

7.1 Fixing the error: solved

7.2 Fixing the error: success

8. Lesson 16: 113 - Finish training the model

1. Purpose

1.1 Keep a record so it is easy to refer back to this post later

2. References

1. My blog

Quick Start with Unity Machine Learning: Part 1 (Smart_zy's blog - CSDN): https://blog.csdn.net/qq_40544338/article/details/124746037

https://blog.csdn.net/qq_40544338/article/details/124763962

3. Notes

Versions

  • Windows 10 Home
  • Anaconda Navigator 1.9.7
  • Unity 2019.3.15f1
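Not from the course itself, but before continuing it may help to confirm that the conda environment created earlier in this series exists and has ml-agents installed; a minimal check from the Anaconda Prompt:

conda env list
activate unity_py_3.6_siki
python --version
pip show mlagents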

4. Lesson 12: 109 - Randomize the Target's position and collect observations

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

using Unity.MLAgents;//[added in 104: Create the scene]
using Unity.MLAgents.Sensors;//[added in 104: Create the scene]

/// <summary>
/// [function: ball agent][Time: 2022-05-14][added in 104: Create the scene]
/// </summary>
public class MyRollerAgent : Agent
{
    /// <summary>Transform of the target[added in 105: The four functions in Agent]</summary>
    public Transform target;

    /// <summary>Rigidbody of the agent[added in 105: The four functions in Agent]</summary>
    private Rigidbody rBody;

    /// <summary>Movement speed[added in 106: Manually controlling the agent]</summary>
    private float speed = 10;

    void Start()
    {
        rBody = GetComponent<Rigidbody>();//[added in 105: The four functions in Agent]
    }

    /// <summary>
    /// [function: called whenever a new episode begins][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    public override void OnEpisodeBegin()
    {
        print("OnEpisodeBegin");

        //Restart: put the ball back at its initial position
        this.transform.position = new Vector3(0, 0.5f, 0);//[added in 107: The game reset function]
        this.rBody.velocity = Vector3.zero;//velocity[added in 107: The game reset function]
        this.rBody.angularVelocity = Vector3.zero;//angular velocity[added in 107: The game reset function]

        target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);//randomize the target's position[added in 109: Randomize the Target's position and collect observations]
    }

    /// <summary>
    /// [function: collect observations][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="sensor"></param>
    public override void CollectObservations(VectorSensor sensor)
    {
        base.CollectObservations(sensor);
    }

    /// <summary>
    /// [function: receive actions and decide whether to give a reward][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="vectorAction"></param>
    public override void OnActionReceived(float[] vectorAction)
    {
        print("Horizontal:"+vectorAction[0]);//[added in 106: Manually controlling the agent]
        print("Vertical:"+vectorAction[1]);//[added in 106: Manually controlling the agent]

        Vector3 control = Vector3.zero;//[added in 106: Manually controlling the agent]
        control.x = vectorAction[0];//[added in 106: Manually controlling the agent]
        control.z = vectorAction[1];//[added in 106: Manually controlling the agent]

        rBody.AddForce(control * speed);//move the ball[added in 106: Manually controlling the agent]

        //The agent fell off the platform; check using the y coordinate
        if (this.transform.position.y<0)
        {
            EndEpisode();//end this episode[added in 108: Setting the agent's reward]
        }

        //The agent reached the target
        float distance = Vector3.Distance(this.transform.position, target.position);//distance to the target[added in 108: Setting the agent's reward]
        if (distance<1.41f)
        {
            SetReward(1);//give a reward[added in 108: Setting the agent's reward]
            EndEpisode();//end this episode[added in 108: Setting the agent's reward]
        }

    }

    /// <summary>
    /// [function: control the agent manually][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="actionsOut"></param>
    public override void Heuristic(float[] actionsOut)
    {
        //Read the horizontal and vertical input axes
        actionsOut[0] = Input.GetAxis("Horizontal");//[added in 106: Manually controlling the agent]
        actionsOut[1] = Input.GetAxis("Vertical");//[added in 106: Manually controlling the agent]
    }

}


5. Lesson 13: 110 - Collect observations and finish pre-training preparation

A total of 8 float values are observed (two Vector3 positions = 6 floats, plus the x and z velocity components = 2 floats).

The observed values are fed into the neural network as one input vector.

Discrete: countable (the possible values can be enumerated)

Continuous: continuous (real-valued)

Default: the agent acts on its own

Heuristic Only: controlled manually by the player

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

using Unity.MLAgents;//[added in 104: Create the scene]
using Unity.MLAgents.Sensors;//[added in 104: Create the scene]

/// <summary>
/// [function: ball agent][Time: 2022-05-14][added in 104: Create the scene]
/// </summary>
public class MyRollerAgent : Agent
{
    /// <summary>Transform of the target[added in 105: The four functions in Agent]</summary>
    public Transform target;

    /// <summary>Rigidbody of the agent[added in 105: The four functions in Agent]</summary>
    private Rigidbody rBody;

    /// <summary>Movement speed[added in 106: Manually controlling the agent]</summary>
    private float speed = 10;

    void Start()
    {
        rBody = GetComponent<Rigidbody>();//[added in 105: The four functions in Agent]
    }

    /// <summary>
    /// [function: called whenever a new episode begins][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    public override void OnEpisodeBegin()
    {
        print("OnEpisodeBegin");

        //Restart: put the ball back at its initial position
        this.transform.position = new Vector3(0, 0.5f, 0);//[added in 107: The game reset function]
        this.rBody.velocity = Vector3.zero;//velocity[added in 107: The game reset function]
        this.rBody.angularVelocity = Vector3.zero;//angular velocity[added in 107: The game reset function]

        target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);//randomize the target's position[added in 109: Randomize the Target's position and collect observations]
    }

    /// <summary>
    /// [function: collect observations][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="sensor"></param>
    public override void CollectObservations(VectorSensor sensor)
    {
        //A total of 8 float values are observed

        //2 positions (the agent's current position and the Target's position)
        sensor.AddObservation(target.position);//[added in 110: Collect observations and finish pre-training preparation]
        sensor.AddObservation(this.transform.position);//[added in 110: Collect observations and finish pre-training preparation]

        //2 velocity components (x and z)
        sensor.AddObservation(rBody.velocity.x);//[added in 110: Collect observations and finish pre-training preparation]
        sensor.AddObservation(rBody.velocity.z);//[added in 110: Collect observations and finish pre-training preparation]
    }

    /// <summary>
    /// [function: receive actions and decide whether to give a reward][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="vectorAction"></param>
    public override void OnActionReceived(float[] vectorAction)
    {
        print("Horizontal:"+vectorAction[0]);//[added in 106: Manually controlling the agent]
        print("Vertical:"+vectorAction[1]);//[added in 106: Manually controlling the agent]

        Vector3 control = Vector3.zero;//[added in 106: Manually controlling the agent]
        control.x = vectorAction[0];//[added in 106: Manually controlling the agent]
        control.z = vectorAction[1];//[added in 106: Manually controlling the agent]

        rBody.AddForce(control * speed);//move the ball[added in 106: Manually controlling the agent]

        //The agent fell off the platform; check using the y coordinate
        if (this.transform.position.y<0)
        {
            EndEpisode();//end this episode[added in 108: Setting the agent's reward]
        }

        //The agent reached the target
        float distance = Vector3.Distance(this.transform.position, target.position);//distance to the target[added in 108: Setting the agent's reward]
        if (distance<1.41f)
        {
            SetReward(1);//give a reward[added in 108: Setting the agent's reward]
            EndEpisode();//end this episode[added in 108: Setting the agent's reward]
        }

    }

    /// <summary>
    /// [function: control the agent manually][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="actionsOut"></param>
    public override void Heuristic(float[] actionsOut)
    {
        //Read the horizontal and vertical input axes
        actionsOut[0] = Input.GetAxis("Horizontal");//[added in 106: Manually controlling the agent]
        actionsOut[1] = Input.GetAxis("Vertical");//[added in 106: Manually controlling the agent]
    }

}

6. Lesson 14: 111 - Let the red ball keep eating the green ball

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

using Unity.MLAgents;//[added in 104: Create the scene]
using Unity.MLAgents.Sensors;//[added in 104: Create the scene]

/// <summary>
/// [function: ball agent][Time: 2022-05-14][added in 104: Create the scene]
/// </summary>
public class MyRollerAgent : Agent
{
    /// <summary>Transform of the target[added in 105: The four functions in Agent]</summary>
    public Transform target;

    /// <summary>Rigidbody of the agent[added in 105: The four functions in Agent]</summary>
    private Rigidbody rBody;

    /// <summary>Movement speed[added in 106: Manually controlling the agent]</summary>
    private float speed = 10;

    void Start()
    {
        rBody = GetComponent<Rigidbody>();//[added in 105: The four functions in Agent]
    }

    /// <summary>
    /// [function: called whenever a new episode begins][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    public override void OnEpisodeBegin()
    {
        //print("OnEpisodeBegin");

        //Only reset the ball's position when it has fallen off the platform (so the ball can keep eating targets)
        if (this.transform.position.y<0)
        {
            //Restart: put the ball back at its initial position
            this.transform.position = new Vector3(0, 0.5f, 0);//[added in 107: The game reset function]
            this.rBody.velocity = Vector3.zero;//velocity[added in 107: The game reset function]
            this.rBody.angularVelocity = Vector3.zero;//angular velocity[added in 107: The game reset function]
        }

        target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);//randomize the target's position[added in 109: Randomize the Target's position and collect observations]
    }

    /// <summary>
    /// [function: collect observations][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="sensor"></param>
    public override void CollectObservations(VectorSensor sensor)
    {
        //A total of 8 float values are observed

        //2 positions (the agent's current position and the Target's position)
        sensor.AddObservation(target.position);//[added in 110: Collect observations and finish pre-training preparation]
        sensor.AddObservation(this.transform.position);//[added in 110: Collect observations and finish pre-training preparation]

        //2 velocity components (x and z)
        sensor.AddObservation(rBody.velocity.x);//[added in 110: Collect observations and finish pre-training preparation]
        sensor.AddObservation(rBody.velocity.z);//[added in 110: Collect observations and finish pre-training preparation]
    }

    /// <summary>
    /// [function: receive actions and decide whether to give a reward][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="vectorAction"></param>
    public override void OnActionReceived(float[] vectorAction)
    {
        //print("Horizontal:"+vectorAction[0]);//[added in 106: Manually controlling the agent]
        //print("Vertical:"+vectorAction[1]);//[added in 106: Manually controlling the agent]

        Vector3 control = Vector3.zero;//[added in 106: Manually controlling the agent]
        control.x = vectorAction[0];//[added in 106: Manually controlling the agent]
        control.z = vectorAction[1];//[added in 106: Manually controlling the agent]

        rBody.AddForce(control * speed);//move the ball[added in 106: Manually controlling the agent]

        //The agent fell off the platform; check using the y coordinate
        if (this.transform.position.y<0)
        {
            EndEpisode();//end this episode[added in 108: Setting the agent's reward]
        }

        //The agent reached the target
        float distance = Vector3.Distance(this.transform.position, target.position);//distance to the target[added in 108: Setting the agent's reward]
        if (distance<1.41f)
        {
            SetReward(1);//give a reward[added in 108: Setting the agent's reward]
            EndEpisode();//end this episode[added in 108: Setting the agent's reward]
        }

    }

    /// <summary>
    /// [function: control the agent manually][Time: 2022-05-14][added in 105: The four functions in Agent]
    /// </summary>
    /// <param name="actionsOut"></param>
    public override void Heuristic(float[] actionsOut)
    {
        //Read the horizontal and vertical input axes
        actionsOut[0] = Input.GetAxis("Horizontal");//[added in 106: Manually controlling the agent]
        actionsOut[1] = Input.GetAxis("Vertical");//[added in 106: Manually controlling the agent]
    }

}


behaviors:
  RollerBall:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 0.0003
      beta: 0.001
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 300000
    time_horizon: 1000
    summary_freq: 1000
    threaded: true
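A couple of things worth noting about this file (my own reading, not stated explicitly in the course): the top-level key RollerBall has to match the Behavior Name on the agent's Behavior Parameters component (the training log later connects to a brain named RollerBall?team=0), and the file is saved as config.yaml inside the Train folder that the commands in the next section cd into. A quick way to confirm it is in place:

cd D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train
type config.yaml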

Modifications to the base environment


7. Lesson 15: 112 - Start training the model

Enter the following commands so that training runs according to config.yaml.

However, mine reported an error:

activate unity_py_3.6_siki
D:
cd D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train
mlagents-learn config.yaml


7.1 Fixing the error: solved

Reference:

Unity ML-Agents: fixing "A compatible version of PyTorch was not installed" (YZW*威's blog - CSDN): https://blog.csdn.net/weixin_44813895/article/details/110312591

The cause is that torch was not installed; install it as follows.

pip install torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
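After the install finishes, a quick sanity check (my own addition, not part of the course) that the CPU build of torch can now be imported in this environment:

python -c "import torch; print(torch.__version__)"

It should print something like 1.7.0+cpu, which matches the version reported in the training logs further down.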

Download succeeded

Start again

mlagents-learn config.yaml

Then just click the Play button in the Unity editor.

7.2 Fixing the error: success

It would not run properly

mlagents.trainers.exception.UnityTrainerException: Previous data from this run ID was found. Either specify a new run ID, use --resume to resume this run, or use the --force parameter to overwrite existing data.
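The message itself lists the ways out. Any one of the following, run from the same Train directory, should get past it; RollerBall_02 below is just an arbitrary example run ID, not something from the course:

mlagents-learn config.yaml --resume
mlagents-learn config.yaml --force
mlagents-learn config.yaml --run-id=RollerBall_02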

Unity reported an error as well.

My guess was that TensorFlow had not been installed.

Enter the following command to install it:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==2.2.0
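To check whether TensorFlow is actually importable in this environment (again just my own sanity check, not from the course):

python -c "import tensorflow as tf; print(tf.__version__)"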

Still failed.

Re-enter:

mlagents-learn config.yaml --resume

Then click Play in Unity. If it errors out, run the command again and click Play in Unity once more; after a few attempts it succeeded for me, as shown in the transcript below.


(base) C:\Users\Lenovo>activate unity_py_3.6_siki

(unity_py_3.6_siki) C:\Users\Lenovo>D:

(unity_py_3.6_siki) D:\>cd D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml

            ┐  ╖
        ╓╖╬│╡  ││╬╖╖
    ╓╖╬│││││┘  ╬│││││╬╖
 ╖╬│││││╬╜        ╙╬│││││╖╖                               ╗╗╗
 ╬╬╬╬╖││╦╖        ╖╬││╗╣╣╣╬      ╟╣╣╬    ╟╣╣╣             ╜╜╜  ╟╣╣
 ╬╬╬╬╬╬╬╬╖│╬╖╖╓╬╪│╓╣╣╣╣╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╒╣╣╖╗╣╣╣╗   ╣╣╣ ╣╣╣╣╣╣ ╟╣╣╖   ╣╣╣
 ╬╬╬╬┐  ╙╬╬╬╬│╓╣╣╣╝╜  ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╣╙ ╙╣╣╣  ╣╣╣ ╙╟╣╣╜╙  ╫╣╣  ╟╣╣
 ╬╬╬╬┐     ╙╬╬╣╣      ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣     ╣╣╣┌╣╣╜
 ╬╬╬╜       ╬╬╣╣      ╙╝╣╣╬      ╙╣╣╣╗╖╓╗╣╣╣╜ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣╦╓    ╣╣╣╣╣
 ╙   ╓╦╖    ╬╬╣╣   ╓╗╗╖            ╙╝╣╣╣╣╝╜   ╘╝╝╜   ╝╝╝  ╝╝╝   ╙╣╣╣    ╟╣╣╣
   ╩╬╬╬╬╬╬╦╦╬╬╣╣╗╣╣╣╣╣╣╣╝                                             ╫╣╣╣╣
      ╙╬╬╬╬╬╬╬╣╣╣╣╣╣╝╜
          ╙╬╬╬╣╣╣╜
             ╙

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.0+cpu
Traceback (most recent call last):
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 75, in run_training
    checkpoint_settings.maybe_init_path,
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\directory_utils.py", line 26, in validate_existing_directories
    "Previous data from this run ID was found. "
mlagents.trainers.exception.UnityTrainerException: Previous data from this run ID was found. Either specify a new run ID, use --resume to resume this run, or use the --force parameter to overwrite existing data.

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>--resume
'--resume' is not recognized as an internal or external command,
operable program or batch file.

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml -resume
usage: mlagents-learn.exe [-h] [--env ENV_PATH] [--resume] [--deterministic]
                          [--force] [--run-id RUN_ID]
                          [--initialize-from RUN_ID] [--seed SEED]
                          [--inference] [--base-port BASE_PORT]
                          [--num-envs NUM_ENVS] [--num-areas NUM_AREAS]
                          [--debug] [--env-args ...]
                          [--max-lifetime-restarts MAX_LIFETIME_RESTARTS]
                          [--restarts-rate-limit-n RESTARTS_RATE_LIMIT_N]
                          [--restarts-rate-limit-period-s RESTARTS_RATE_LIMIT_PERIOD_S]
                          [--torch] [--tensorflow] [--results-dir RESULTS_DIR]
                          [--width WIDTH] [--height HEIGHT]
                          [--quality-level QUALITY_LEVEL]
                          [--time-scale TIME_SCALE]
                          [--target-frame-rate TARGET_FRAME_RATE]
                          [--capture-frame-rate CAPTURE_FRAME_RATE]
                          [--no-graphics] [--torch-device DEVICE]
                          [trainer_config_path]
mlagents-learn.exe: error: unrecognized arguments: -resume

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml --resume

            ┐  ╖
        ╓╖╬│╡  ││╬╖╖
    ╓╖╬│││││┘  ╬│││││╬╖
 ╖╬│││││╬╜        ╙╬│││││╖╖                               ╗╗╗
 ╬╬╬╬╖││╦╖        ╖╬││╗╣╣╣╬      ╟╣╣╬    ╟╣╣╣             ╜╜╜  ╟╣╣
 ╬╬╬╬╬╬╬╬╖│╬╖╖╓╬╪│╓╣╣╣╣╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╒╣╣╖╗╣╣╣╗   ╣╣╣ ╣╣╣╣╣╣ ╟╣╣╖   ╣╣╣
 ╬╬╬╬┐  ╙╬╬╬╬│╓╣╣╣╝╜  ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╣╙ ╙╣╣╣  ╣╣╣ ╙╟╣╣╜╙  ╫╣╣  ╟╣╣
 ╬╬╬╬┐     ╙╬╬╣╣      ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣     ╣╣╣┌╣╣╜
 ╬╬╬╜       ╬╬╣╣      ╙╝╣╣╬      ╙╣╣╣╗╖╓╗╣╣╣╜ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣╦╓    ╣╣╣╣╣
 ╙   ╓╦╖    ╬╬╣╣   ╓╗╗╖            ╙╝╣╣╣╣╝╜   ╘╝╝╜   ╝╝╝  ╝╝╝   ╙╣╣╣    ╟╣╣╣
   ╩╬╬╬╬╬╬╦╦╬╬╣╣╗╣╣╣╣╣╣╣╝                                             ╫╣╣╣╣
      ╙╬╬╬╬╬╬╬╣╣╣╣╣╣╝╜
          ╙╬╬╬╣╣╣╜
             ╙

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.0+cpu
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
Traceback (most recent call last):
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 132, in run_training
    tc.start_learning(env_manager)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 173, in start_learning
    self._reset_env(env_manager)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 105, in _reset_env
    env_manager.reset(config=new_config)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\env_manager.py", line 68, in reset
    self.first_step_infos = self._reset_env(config)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 446, in _reset_env
    ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 101, in recv
    raise env_exception
mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
         The environment does not need user interaction to launch
         The Agents' Behavior Parameters > Behavior Type is set to "Default"
         The environment and the Python interface have compatible versions.
         If you're running on a headless server without graphics support, turn off display by either passing --no-graphics option or build your Unity executable as server build.

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml --resume

            ┐  ╖
        ╓╖╬│╡  ││╬╖╖
    ╓╖╬│││││┘  ╬│││││╬╖
 ╖╬│││││╬╜        ╙╬│││││╖╖                               ╗╗╗
 ╬╬╬╬╖││╦╖        ╖╬││╗╣╣╣╬      ╟╣╣╬    ╟╣╣╣             ╜╜╜  ╟╣╣
 ╬╬╬╬╬╬╬╬╖│╬╖╖╓╬╪│╓╣╣╣╣╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╒╣╣╖╗╣╣╣╗   ╣╣╣ ╣╣╣╣╣╣ ╟╣╣╖   ╣╣╣
 ╬╬╬╬┐  ╙╬╬╬╬│╓╣╣╣╝╜  ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╣╙ ╙╣╣╣  ╣╣╣ ╙╟╣╣╜╙  ╫╣╣  ╟╣╣
 ╬╬╬╬┐     ╙╬╬╣╣      ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣     ╣╣╣┌╣╣╜
 ╬╬╬╜       ╬╬╣╣      ╙╝╣╣╬      ╙╣╣╣╗╖╓╗╣╣╣╜ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣╦╓    ╣╣╣╣╣
 ╙   ╓╦╖    ╬╬╣╣   ╓╗╗╖            ╙╝╣╣╣╣╝╜   ╘╝╝╜   ╝╝╝  ╝╝╝   ╙╣╣╣    ╟╣╣╣
   ╩╬╬╬╬╬╬╦╦╬╬╣╣╗╣╣╣╣╣╣╣╝                                             ╫╣╣╣╣
      ╙╬╬╬╬╬╬╬╣╣╣╣╣╣╝╜
          ╙╬╬╬╣╣╣╜
             ╙

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.0+cpu
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 1.2.0-preview and communication version 1.0.0
[INFO] Connected new brain: RollerBall?team=0
2022-05-16 17:10:06.687745: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2022-05-16 17:10:06.688146: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[INFO] Hyperparameters for behavior name RollerBall:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   64
          buffer_size:  12000
          learning_rate:        0.0003
          beta: 0.001
          epsilon:      0.2
          lambd:        0.99
          num_epoch:    3
          learning_rate_schedule:       linear
          beta_schedule:        linear
          epsilon_schedule:     linear
        network_settings:
          normalize:    True
          hidden_units: 128
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      300000
        time_horizon:   1000
        summary_freq:   1000
        threaded:       True
        self_play:      None
        behavioral_cloning:     None
[INFO] Resuming from results\ppo\RollerBall.
[INFO] Exported results\ppo\RollerBall\RollerBall-0.onnx
[INFO] Copied results\ppo\RollerBall\RollerBall-0.onnx to results\ppo\RollerBall.onnx.
Traceback (most recent call last):
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 132, in run_training
    tc.start_learning(env_manager)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 173, in start_learning
    self._reset_env(env_manager)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 107, in _reset_env
    self._register_new_behaviors(env_manager, env_manager.first_step_infos)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 268, in _register_new_behaviors
    self._create_trainers_and_managers(env_manager, new_behavior_ids)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 166, in _create_trainers_and_managers
    self._create_trainer_and_manager(env_manager, behavior_id)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 142, in _create_trainer_and_manager
    trainer.add_policy(parsed_behavior_id, policy)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\ppo\trainer.py", line 265, in add_policy
    self.model_saver.initialize_or_load()
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\model_saver\torch_model_saver.py", line 82, in initialize_or_load
    reset_global_steps=reset_steps,
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\model_saver\torch_model_saver.py", line 91, in _load_model
    saved_state_dict = torch.load(load_path)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\torch\serialization.py", line 581, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'results\\ppo\\RollerBall\\checkpoint.pt'

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml --resume

            ┐  ╖
        ╓╖╬│╡  ││╬╖╖
    ╓╖╬│││││┘  ╬│││││╬╖
 ╖╬│││││╬╜        ╙╬│││││╖╖                               ╗╗╗
 ╬╬╬╬╖││╦╖        ╖╬││╗╣╣╣╬      ╟╣╣╬    ╟╣╣╣             ╜╜╜  ╟╣╣
 ╬╬╬╬╬╬╬╬╖│╬╖╖╓╬╪│╓╣╣╣╣╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╒╣╣╖╗╣╣╣╗   ╣╣╣ ╣╣╣╣╣╣ ╟╣╣╖   ╣╣╣
 ╬╬╬╬┐  ╙╬╬╬╬│╓╣╣╣╝╜  ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╣╙ ╙╣╣╣  ╣╣╣ ╙╟╣╣╜╙  ╫╣╣  ╟╣╣
 ╬╬╬╬┐     ╙╬╬╣╣      ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣     ╣╣╣┌╣╣╜
 ╬╬╬╜       ╬╬╣╣      ╙╝╣╣╬      ╙╣╣╣╗╖╓╗╣╣╣╜ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣╦╓    ╣╣╣╣╣
 ╙   ╓╦╖    ╬╬╣╣   ╓╗╗╖            ╙╝╣╣╣╣╝╜   ╘╝╝╜   ╝╝╝  ╝╝╝   ╙╣╣╣    ╟╣╣╣
   ╩╬╬╬╬╬╬╦╦╬╬╣╣╗╣╣╣╣╣╣╣╝                                             ╫╣╣╣╣
      ╙╬╬╬╬╬╬╬╣╣╣╣╣╣╝╜
          ╙╬╬╬╣╣╣╜
             ╙

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.0+cpu
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 1.2.0-preview and communication version 1.0.0
[INFO] Connected new brain: RollerBall?team=0
2022-05-16 17:11:01.472214: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2022-05-16 17:11:01.472336: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[INFO] Hyperparameters for behavior name RollerBall:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   64
          buffer_size:  12000
          learning_rate:        0.0003
          beta: 0.001
          epsilon:      0.2
          lambd:        0.99
          num_epoch:    3
          learning_rate_schedule:       linear
          beta_schedule:        linear
          epsilon_schedule:     linear
        network_settings:
          normalize:    True
          hidden_units: 128
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      300000
        time_horizon:   1000
        summary_freq:   1000
        threaded:       True
        self_play:      None
        behavioral_cloning:     None
[INFO] Resuming from results\ppo\RollerBall.
[INFO] Resuming training from step 0.
[INFO] RollerBall. Step: 1000. Time Elapsed: 14.898 s. Mean Reward: 0.317. Std of Reward: 0.465. Training.
[INFO] RollerBall. Step: 2000. Time Elapsed: 21.509 s. Mean Reward: 0.128. Std of Reward: 0.334. Training.
[INFO] RollerBall. Step: 3000. Time Elapsed: 28.361 s. Mean Reward: 0.083. Std of Reward: 0.276. Training.
[INFO] RollerBall. Step: 4000. Time Elapsed: 35.126 s. Mean Reward: 0.189. Std of Reward: 0.392. Training.
[INFO] RollerBall. Step: 5000. Time Elapsed: 41.974 s. Mean Reward: 0.081. Std of Reward: 0.273. Training.
[INFO] RollerBall. Step: 6000. Time Elapsed: 48.794 s. Mean Reward: 0.114. Std of Reward: 0.318. Training.
[INFO] RollerBall. Step: 7000. Time Elapsed: 55.667 s. Mean Reward: 0.125. Std of Reward: 0.331. Training.
[INFO] RollerBall. Step: 8000. Time Elapsed: 62.408 s. Mean Reward: 0.114. Std of Reward: 0.318. Training.
[INFO] RollerBall. Step: 9000. Time Elapsed: 69.597 s. Mean Reward: 0.105. Std of Reward: 0.307. Training.
[INFO] RollerBall. Step: 10000. Time Elapsed: 76.625 s. Mean Reward: 0.118. Std of Reward: 0.322. Training.
[INFO] RollerBall. Step: 11000. Time Elapsed: 83.861 s. Mean Reward: 0.162. Std of Reward: 0.369. Training.
[INFO] RollerBall. Step: 12000. Time Elapsed: 91.244 s. Mean Reward: 0.233. Std of Reward: 0.422. Training.
[INFO] RollerBall. Step: 13000. Time Elapsed: 101.414 s. Mean Reward: 0.184. Std of Reward: 0.388. Training.
[INFO] RollerBall. Step: 14000. Time Elapsed: 108.521 s. Mean Reward: 0.220. Std of Reward: 0.414. Training.
[INFO] RollerBall. Step: 15000. Time Elapsed: 115.816 s. Mean Reward: 0.103. Std of Reward: 0.303. Training.
[INFO] RollerBall. Step: 16000. Time Elapsed: 123.151 s. Mean Reward: 0.214. Std of Reward: 0.410. Training.
[INFO] RollerBall. Step: 17000. Time Elapsed: 130.571 s. Mean Reward: 0.239. Std of Reward: 0.427. Training.
[INFO] RollerBall. Step: 18000. Time Elapsed: 137.849 s. Mean Reward: 0.200. Std of Reward: 0.400. Training.
[INFO] RollerBall. Step: 19000. Time Elapsed: 145.127 s. Mean Reward: 0.256. Std of Reward: 0.436. Training.
[INFO] RollerBall. Step: 20000. Time Elapsed: 152.521 s. Mean Reward: 0.300. Std of Reward: 0.458. Training.
[INFO] RollerBall. Step: 21000. Time Elapsed: 159.957 s. Mean Reward: 0.256. Std of Reward: 0.436. Training.
[INFO] RollerBall. Step: 22000. Time Elapsed: 167.344 s. Mean Reward: 0.200. Std of Reward: 0.400. Training.
[INFO] RollerBall. Step: 23000. Time Elapsed: 174.863 s. Mean Reward: 0.154. Std of Reward: 0.361. Training.
[INFO] RollerBall. Step: 24000. Time Elapsed: 182.129 s. Mean Reward: 0.244. Std of Reward: 0.430. Training.
[INFO] RollerBall. Step: 25000. Time Elapsed: 192.057 s. Mean Reward: 0.190. Std of Reward: 0.393. Training.
[INFO] RollerBall. Step: 26000. Time Elapsed: 199.477 s. Mean Reward: 0.304. Std of Reward: 0.460. Training.
[INFO] RollerBall. Step: 27000. Time Elapsed: 206.889 s. Mean Reward: 0.227. Std of Reward: 0.419. Training.
[INFO] RollerBall. Step: 28000. Time Elapsed: 214.260 s. Mean Reward: 0.209. Std of Reward: 0.407. Training.
[INFO] RollerBall. Step: 29000. Time Elapsed: 221.703 s. Mean Reward: 0.353. Std of Reward: 0.478. Training.
[INFO] RollerBall. Step: 30000. Time Elapsed: 229.239 s. Mean Reward: 0.358. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 31000. Time Elapsed: 236.568 s. Mean Reward: 0.289. Std of Reward: 0.454. Training.
[INFO] RollerBall. Step: 32000. Time Elapsed: 243.837 s. Mean Reward: 0.356. Std of Reward: 0.479. Training.
[INFO] RollerBall. Step: 33000. Time Elapsed: 251.407 s. Mean Reward: 0.302. Std of Reward: 0.459. Training.
[INFO] RollerBall. Step: 34000. Time Elapsed: 258.593 s. Mean Reward: 0.179. Std of Reward: 0.384. Training.
[INFO] RollerBall. Step: 35000. Time Elapsed: 266.037 s. Mean Reward: 0.377. Std of Reward: 0.485. Training.
[INFO] RollerBall. Step: 36000. Time Elapsed: 273.382 s. Mean Reward: 0.304. Std of Reward: 0.460. Training.
[INFO] RollerBall. Step: 37000. Time Elapsed: 283.443 s. Mean Reward: 0.333. Std of Reward: 0.471. Training.
[INFO] RollerBall. Step: 38000. Time Elapsed: 290.705 s. Mean Reward: 0.317. Std of Reward: 0.465. Training.
[INFO] RollerBall. Step: 39000. Time Elapsed: 298.170 s. Mean Reward: 0.474. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 40000. Time Elapsed: 305.522 s. Mean Reward: 0.300. Std of Reward: 0.458. Training.
[INFO] RollerBall. Step: 41000. Time Elapsed: 313.041 s. Mean Reward: 0.360. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 42000. Time Elapsed: 320.362 s. Mean Reward: 0.396. Std of Reward: 0.489. Training.
[INFO] RollerBall. Step: 43000. Time Elapsed: 327.828 s. Mean Reward: 0.364. Std of Reward: 0.481. Training.
[INFO] RollerBall. Step: 44000. Time Elapsed: 335.334 s. Mean Reward: 0.429. Std of Reward: 0.495. Training.
[INFO] RollerBall. Step: 45000. Time Elapsed: 342.619 s. Mean Reward: 0.346. Std of Reward: 0.476. Training.
[INFO] RollerBall. Step: 46000. Time Elapsed: 350.089 s. Mean Reward: 0.388. Std of Reward: 0.487. Training.
[INFO] RollerBall. Step: 47000. Time Elapsed: 357.409 s. Mean Reward: 0.380. Std of Reward: 0.485. Training.
[INFO] RollerBall. Step: 48000. Time Elapsed: 364.820 s. Mean Reward: 0.408. Std of Reward: 0.491. Training.
[INFO] RollerBall. Step: 49000. Time Elapsed: 375.179 s. Mean Reward: 0.383. Std of Reward: 0.486. Training.
[INFO] RollerBall. Step: 50000. Time Elapsed: 382.544 s. Mean Reward: 0.500. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 51000. Time Elapsed: 390.747 s. Mean Reward: 0.423. Std of Reward: 0.494. Training.
[INFO] RollerBall. Step: 52000. Time Elapsed: 398.292 s. Mean Reward: 0.517. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 53000. Time Elapsed: 405.611 s. Mean Reward: 0.362. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 54000. Time Elapsed: 412.743 s. Mean Reward: 0.417. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 55000. Time Elapsed: 420.042 s. Mean Reward: 0.418. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 56000. Time Elapsed: 426.894 s. Mean Reward: 0.465. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 57000. Time Elapsed: 433.696 s. Mean Reward: 0.463. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 58000. Time Elapsed: 440.624 s. Mean Reward: 0.453. Std of Reward: 0.498. Training.
[INFO] RollerBall. Step: 59000. Time Elapsed: 447.569 s. Mean Reward: 0.547. Std of Reward: 0.498. Training.
[INFO] RollerBall. Step: 60000. Time Elapsed: 454.539 s. Mean Reward: 0.480. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 61000. Time Elapsed: 463.862 s. Mean Reward: 0.465. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 62000. Time Elapsed: 470.829 s. Mean Reward: 0.511. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 63000. Time Elapsed: 478.034 s. Mean Reward: 0.476. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 64000. Time Elapsed: 485.041 s. Mean Reward: 0.415. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 65000. Time Elapsed: 491.911 s. Mean Reward: 0.489. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 66000. Time Elapsed: 498.539 s. Mean Reward: 0.500. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 67000. Time Elapsed: 505.428 s. Mean Reward: 0.583. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 68000. Time Elapsed: 512.027 s. Mean Reward: 0.522. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 69000. Time Elapsed: 518.833 s. Mean Reward: 0.596. Std of Reward: 0.491. Training.
[INFO] RollerBall. Step: 70000. Time Elapsed: 525.841 s. Mean Reward: 0.522. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 71000. Time Elapsed: 532.647 s. Mean Reward: 0.415. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 72000. Time Elapsed: 539.779 s. Mean Reward: 0.591. Std of Reward: 0.492. Training.
[INFO] RollerBall. Step: 73000. Time Elapsed: 549.495 s. Mean Reward: 0.532. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 74000. Time Elapsed: 556.365 s. Mean Reward: 0.636. Std of Reward: 0.481. Training.
[INFO] RollerBall. Step: 75000. Time Elapsed: 563.009 s. Mean Reward: 0.605. Std of Reward: 0.489. Training.
[INFO] RollerBall. Step: 76000. Time Elapsed: 569.911 s. Mean Reward: 0.564. Std of Reward: 0.496. Training.
[INFO] RollerBall. Step: 77000. Time Elapsed: 576.629 s. Mean Reward: 0.478. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 78000. Time Elapsed: 583.416 s. Mean Reward: 0.636. Std of Reward: 0.481. Training.
[INFO] RollerBall. Step: 79000. Time Elapsed: 589.926 s. Mean Reward: 0.591. Std of Reward: 0.492. Training.
[INFO] RollerBall. Step: 80000. Time Elapsed: 596.663 s. Mean Reward: 0.638. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 81000. Time Elapsed: 603.482 s. Mean Reward: 0.588. Std of Reward: 0.492. Training.
[INFO] RollerBall. Step: 82000. Time Elapsed: 610.259 s. Mean Reward: 0.571. Std of Reward: 0.495. Training.
[INFO] RollerBall. Step: 83000. Time Elapsed: 617.153 s. Mean Reward: 0.575. Std of Reward: 0.494. Training.
[INFO] RollerBall. Step: 84000. Time Elapsed: 623.906 s. Mean Reward: 0.638. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 85000. Time Elapsed: 632.944 s. Mean Reward: 0.698. Std of Reward: 0.459. Training.
[INFO] RollerBall. Step: 86000. Time Elapsed: 639.629 s. Mean Reward: 0.675. Std of Reward: 0.468. Training.
[INFO] RollerBall. Step: 87000. Time Elapsed: 646.264 s. Mean Reward: 0.588. Std of Reward: 0.492. Training.
[INFO] RollerBall. Step: 88000. Time Elapsed: 653.018 s. Mean Reward: 0.725. Std of Reward: 0.447. Training.
[INFO] RollerBall. Step: 89000. Time Elapsed: 660.053 s. Mean Reward: 0.795. Std of Reward: 0.403. Training.
[INFO] RollerBall. Step: 90000. Time Elapsed: 666.831 s. Mean Reward: 0.750. Std of Reward: 0.433. Training.
[INFO] RollerBall. Step: 91000. Time Elapsed: 673.867 s. Mean Reward: 0.756. Std of Reward: 0.429. Training.
[INFO] RollerBall. Step: 92000. Time Elapsed: 681.346 s. Mean Reward: 0.667. Std of Reward: 0.471. Training.
[INFO] RollerBall. Step: 93000. Time Elapsed: 688.432 s. Mean Reward: 0.830. Std of Reward: 0.375. Training.
[INFO] RollerBall. Step: 94000. Time Elapsed: 695.400 s. Mean Reward: 0.686. Std of Reward: 0.464. Training.
[INFO] RollerBall. Step: 95000. Time Elapsed: 702.263 s. Mean Reward: 0.721. Std of Reward: 0.449. Training.
[INFO] RollerBall. Step: 96000. Time Elapsed: 709.423 s. Mean Reward: 0.800. Std of Reward: 0.400. Training.
[INFO] RollerBall. Step: 97000. Time Elapsed: 718.726 s. Mean Reward: 0.880. Std of Reward: 0.325. Training.
[INFO] RollerBall. Step: 98000. Time Elapsed: 725.571 s. Mean Reward: 0.865. Std of Reward: 0.341. Training.
[INFO] RollerBall. Step: 99000. Time Elapsed: 732.557 s. Mean Reward: 0.882. Std of Reward: 0.322. Training.
[INFO] RollerBall. Step: 100000. Time Elapsed: 739.284 s. Mean Reward: 0.827. Std of Reward: 0.378. Training.
[INFO] RollerBall. Step: 101000. Time Elapsed: 745.954 s. Mean Reward: 0.854. Std of Reward: 0.353. Training.
[INFO] RollerBall. Step: 102000. Time Elapsed: 753.131 s. Mean Reward: 0.870. Std of Reward: 0.336. Training.
[INFO] RollerBall. Step: 103000. Time Elapsed: 759.850 s. Mean Reward: 0.900. Std of Reward: 0.300. Training.
[INFO] RollerBall. Step: 104000. Time Elapsed: 766.645 s. Mean Reward: 0.812. Std of Reward: 0.390. Training.
[INFO] RollerBall. Step: 105000. Time Elapsed: 773.365 s. Mean Reward: 0.820. Std of Reward: 0.384. Training.
[INFO] RollerBall. Step: 106000. Time Elapsed: 780.067 s. Mean Reward: 0.851. Std of Reward: 0.356. Training.
[INFO] RollerBall. Step: 107000. Time Elapsed: 787.011 s. Mean Reward: 0.804. Std of Reward: 0.397. Training.
[INFO] RollerBall. Step: 108000. Time Elapsed: 793.614 s. Mean Reward: 0.902. Std of Reward: 0.297. Training.
[INFO] RollerBall. Step: 109000. Time Elapsed: 803.009 s. Mean Reward: 0.906. Std of Reward: 0.292. Training.
[INFO] RollerBall. Step: 110000. Time Elapsed: 809.970 s. Mean Reward: 0.860. Std of Reward: 0.347. Training.
[INFO] RollerBall. Step: 111000. Time Elapsed: 816.523 s. Mean Reward: 0.833. Std of Reward: 0.373. Training.
[INFO] RollerBall. Step: 112000. Time Elapsed: 823.426 s. Mean Reward: 0.906. Std of Reward: 0.292. Training.
[INFO] RollerBall. Step: 113000. Time Elapsed: 830.236 s. Mean Reward: 0.948. Std of Reward: 0.221. Training.
[INFO] RollerBall. Step: 114000. Time Elapsed: 836.938 s. Mean Reward: 0.865. Std of Reward: 0.341. Training.
[INFO] RollerBall. Step: 115000. Time Elapsed: 843.833 s. Mean Reward: 0.925. Std of Reward: 0.264. Training.
[INFO] RollerBall. Step: 116000. Time Elapsed: 850.819 s. Mean Reward: 0.898. Std of Reward: 0.302. Training.
[INFO] RollerBall. Step: 117000. Time Elapsed: 857.805 s. Mean Reward: 0.942. Std of Reward: 0.233. Training.
[INFO] RollerBall. Step: 118000. Time Elapsed: 865.292 s. Mean Reward: 0.966. Std of Reward: 0.181. Training.
[INFO] RollerBall. Step: 119000. Time Elapsed: 872.311 s. Mean Reward: 0.922. Std of Reward: 0.269. Training.
[INFO] RollerBall. Step: 120000. Time Elapsed: 879.230 s. Mean Reward: 0.837. Std of Reward: 0.370. Training.
[INFO] RollerBall. Step: 121000. Time Elapsed: 888.609 s. Mean Reward: 0.940. Std of Reward: 0.237. Training.
[INFO] RollerBall. Step: 122000. Time Elapsed: 895.353 s. Mean Reward: 0.967. Std of Reward: 0.180. Training.
[INFO] RollerBall. Step: 123000. Time Elapsed: 902.055 s. Mean Reward: 0.906. Std of Reward: 0.292. Training.
[INFO] RollerBall. Step: 124000. Time Elapsed: 908.933 s. Mean Reward: 0.921. Std of Reward: 0.270. Training.
[INFO] RollerBall. Step: 125000. Time Elapsed: 915.936 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 126000. Time Elapsed: 922.447 s. Mean Reward: 0.951. Std of Reward: 0.216. Training.
[INFO] RollerBall. Step: 127000. Time Elapsed: 929.275 s. Mean Reward: 0.985. Std of Reward: 0.121. Training.
[INFO] RollerBall. Step: 128000. Time Elapsed: 936.035 s. Mean Reward: 0.969. Std of Reward: 0.174. Training.
[INFO] RollerBall. Step: 129000. Time Elapsed: 942.664 s. Mean Reward: 0.944. Std of Reward: 0.229. Training.
[INFO] RollerBall. Step: 130000. Time Elapsed: 949.341 s. Mean Reward: 0.962. Std of Reward: 0.191. Training.
[INFO] RollerBall. Step: 131000. Time Elapsed: 956.469 s. Mean Reward: 0.925. Std of Reward: 0.264. Training.
[INFO] RollerBall. Step: 132000. Time Elapsed: 962.963 s. Mean Reward: 0.926. Std of Reward: 0.262. Training.
[INFO] RollerBall. Step: 133000. Time Elapsed: 972.607 s. Mean Reward: 0.952. Std of Reward: 0.213. Training.
[INFO] RollerBall. Step: 134000. Time Elapsed: 979.378 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 135000. Time Elapsed: 986.197 s. Mean Reward: 0.957. Std of Reward: 0.203. Training.
[INFO] RollerBall. Step: 136000. Time Elapsed: 992.932 s. Mean Reward: 0.952. Std of Reward: 0.213. Training.
[INFO] RollerBall. Step: 137000. Time Elapsed: 999.726 s. Mean Reward: 0.984. Std of Reward: 0.127. Training.
[INFO] RollerBall. Step: 138000. Time Elapsed: 1006.405 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 139000. Time Elapsed: 1013.174 s. Mean Reward: 0.985. Std of Reward: 0.121. Training.
[INFO] RollerBall. Step: 140000. Time Elapsed: 1019.885 s. Mean Reward: 0.972. Std of Reward: 0.164. Training.
[INFO] RollerBall. Step: 141000. Time Elapsed: 1026.696 s. Mean Reward: 0.983. Std of Reward: 0.128. Training.
[INFO] RollerBall. Step: 142000. Time Elapsed: 1033.564 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 143000. Time Elapsed: 1040.442 s. Mean Reward: 0.971. Std of Reward: 0.167. Training.
[INFO] RollerBall. Step: 144000. Time Elapsed: 1047.188 s. Mean Reward: 0.986. Std of Reward: 0.119. Training.
[INFO] RollerBall. Step: 145000. Time Elapsed: 1056.882 s. Mean Reward: 0.987. Std of Reward: 0.115. Training.
[INFO] RollerBall. Step: 146000. Time Elapsed: 1063.685 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 147000. Time Elapsed: 1070.829 s. Mean Reward: 0.986. Std of Reward: 0.119. Training.
[INFO] RollerBall. Step: 148000. Time Elapsed: 1077.540 s. Mean Reward: 0.973. Std of Reward: 0.163. Training.
[INFO] RollerBall. Step: 149000. Time Elapsed: 1084.568 s. Mean Reward: 0.987. Std of Reward: 0.113. Training.
[INFO] RollerBall. Step: 150000. Time Elapsed: 1091.580 s. Mean Reward: 0.987. Std of Reward: 0.115. Training.
[INFO] RollerBall. Step: 151000. Time Elapsed: 1098.682 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 152000. Time Elapsed: 1105.768 s. Mean Reward: 0.975. Std of Reward: 0.156. Training.
[INFO] RollerBall. Step: 153000. Time Elapsed: 1112.896 s. Mean Reward: 0.985. Std of Reward: 0.120. Training.
[INFO] RollerBall. Step: 154000. Time Elapsed: 1119.381 s. Mean Reward: 0.955. Std of Reward: 0.208. Training.
[INFO] RollerBall. Step: 155000. Time Elapsed: 1126.167 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 156000. Time Elapsed: 1133.020 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 157000. Time Elapsed: 1142.865 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 158000. Time Elapsed: 1149.926 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 159000. Time Elapsed: 1157.413 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 160000. Time Elapsed: 1164.465 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 161000. Time Elapsed: 1171.384 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 162000. Time Elapsed: 1178.337 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 163000. Time Elapsed: 1185.140 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 164000. Time Elapsed: 1192.192 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 165000. Time Elapsed: 1199.004 s. Mean Reward: 0.987. Std of Reward: 0.113. Training.
[INFO] RollerBall. Step: 166000. Time Elapsed: 1205.923 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 167000. Time Elapsed: 1213.083 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 168000. Time Elapsed: 1220.061 s. Mean Reward: 0.963. Std of Reward: 0.188. Training.
[INFO] RollerBall. Step: 169000. Time Elapsed: 1229.890 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 170000. Time Elapsed: 1236.702 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 171000. Time Elapsed: 1243.546 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 172000. Time Elapsed: 1250.348 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 173000. Time Elapsed: 1257.201 s. Mean Reward: 0.989. Std of Reward: 0.107. Training.
[INFO] RollerBall. Step: 174000. Time Elapsed: 1264.062 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 175000. Time Elapsed: 1270.864 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 176000. Time Elapsed: 1277.759 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 177000. Time Elapsed: 1284.854 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 178000. Time Elapsed: 1291.846 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 179000. Time Elapsed: 1298.717 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 180000. Time Elapsed: 1305.445 s. Mean Reward: 0.987. Std of Reward: 0.115. Training.
[INFO] RollerBall. Step: 181000. Time Elapsed: 1315.365 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 182000. Time Elapsed: 1322.126 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 183000. Time Elapsed: 1328.928 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 184000. Time Elapsed: 1335.715 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 185000. Time Elapsed: 1342.726 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 186000. Time Elapsed: 1349.645 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 187000. Time Elapsed: 1356.639 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 188000. Time Elapsed: 1363.759 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 189000. Time Elapsed: 1370.695 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 190000. Time Elapsed: 1377.922 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 191000. Time Elapsed: 1384.934 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 192000. Time Elapsed: 1391.961 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 193000. Time Elapsed: 1402.132 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 194000. Time Elapsed: 1409.085 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 195000. Time Elapsed: 1415.912 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 196000. Time Elapsed: 1422.706 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 197000. Time Elapsed: 1429.484 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 198000. Time Elapsed: 1436.254 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 199000. Time Elapsed: 1442.973 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 200000. Time Elapsed: 1449.734 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 201000. Time Elapsed: 1456.486 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 202000. Time Elapsed: 1463.264 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 203000. Time Elapsed: 1470.408 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 204000. Time Elapsed: 1477.545 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 205000. Time Elapsed: 1488.081 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 206000. Time Elapsed: 1495.135 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 207000. Time Elapsed: 1502.245 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 208000. Time Elapsed: 1509.239 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 209000. Time Elapsed: 1516.758 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 210000. Time Elapsed: 1523.670 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 211000. Time Elapsed: 1530.765 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 212000. Time Elapsed: 1537.816 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 213000. Time Elapsed: 1544.803 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 214000. Time Elapsed: 1551.623 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 215000. Time Elapsed: 1558.425 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 216000. Time Elapsed: 1565.344 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 217000. Time Elapsed: 1575.214 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 218000. Time Elapsed: 1582.042 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 219000. Time Elapsed: 1588.804 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 220000. Time Elapsed: 1595.573 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 221000. Time Elapsed: 1602.292 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 222000. Time Elapsed: 1609.019 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 223000. Time Elapsed: 1615.688 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 224000. Time Elapsed: 1622.424 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 225000. Time Elapsed: 1629.219 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 226000. Time Elapsed: 1635.938 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 227000. Time Elapsed: 1642.774 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 228000. Time Elapsed: 1649.652 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 229000. Time Elapsed: 1659.548 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 230000. Time Elapsed: 1666.334 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 231000. Time Elapsed: 1673.136 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 232000. Time Elapsed: 1679.905 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 233000. Time Elapsed: 1686.683 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 234000. Time Elapsed: 1693.385 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 235000. Time Elapsed: 1700.121 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 236000. Time Elapsed: 1706.840 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 237000. Time Elapsed: 1713.752 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 238000. Time Elapsed: 1720.504 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 239000. Time Elapsed: 1727.682 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 240000. Time Elapsed: 1734.426 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 241000. Time Elapsed: 1744.295 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 242000. Time Elapsed: 1751.024 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 243000. Time Elapsed: 1757.835 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 244000. Time Elapsed: 1764.678 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 245000. Time Elapsed: 1771.465 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 246000. Time Elapsed: 1778.285 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 247000. Time Elapsed: 1785.037 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 248000. Time Elapsed: 1791.840 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 249000. Time Elapsed: 1798.643 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 250000. Time Elapsed: 1805.479 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 251000. Time Elapsed: 1812.331 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 252000. Time Elapsed: 1819.059 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 253000. Time Elapsed: 1829.052 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 254000. Time Elapsed: 1835.874 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 255000. Time Elapsed: 1842.718 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 256000. Time Elapsed: 1849.537 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 257000. Time Elapsed: 1856.315 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 258000. Time Elapsed: 1863.017 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 259000. Time Elapsed: 1869.770 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 260000. Time Elapsed: 1876.573 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 261000. Time Elapsed: 1883.308 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 262000. Time Elapsed: 1890.127 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 263000. Time Elapsed: 1896.914 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 264000. Time Elapsed: 1903.724 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 265000. Time Elapsed: 1913.611 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 266000. Time Elapsed: 1920.339 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 267000. Time Elapsed: 1927.133 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 268000. Time Elapsed: 1933.828 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 269000. Time Elapsed: 1940.555 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 270000. Time Elapsed: 1947.273 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 271000. Time Elapsed: 1953.911 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 272000. Time Elapsed: 1960.679 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 273000. Time Elapsed: 1967.425 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 274000. Time Elapsed: 1974.194 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 275000. Time Elapsed: 1980.929 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 276000. Time Elapsed: 1987.641 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 277000. Time Elapsed: 1997.551 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 278000. Time Elapsed: 2004.372 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 279000. Time Elapsed: 2011.273 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 280000. Time Elapsed: 2018.135 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 281000. Time Elapsed: 2024.980 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 282000. Time Elapsed: 2031.774 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 283000. Time Elapsed: 2038.426 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 284000. Time Elapsed: 2045.179 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 285000. Time Elapsed: 2051.956 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 286000. Time Elapsed: 2058.633 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 287000. Time Elapsed: 2065.396 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 288000. Time Elapsed: 2072.172 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 289000. Time Elapsed: 2081.992 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 290000. Time Elapsed: 2088.812 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 291000. Time Elapsed: 2095.606 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 292000. Time Elapsed: 2102.384 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 293000. Time Elapsed: 2109.187 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 294000. Time Elapsed: 2115.998 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 295000. Time Elapsed: 2122.851 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 296000. Time Elapsed: 2129.737 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 297000. Time Elapsed: 2136.564 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 298000. Time Elapsed: 2143.332 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 299000. Time Elapsed: 2150.127 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 300000. Time Elapsed: 2156.922 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] Exported results\ppo\RollerBall\RollerBall-300005.onnx
[INFO] Copied results\ppo\RollerBall\RollerBall-300005.onnx to results\ppo\RollerBall.onnx.

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>

At first an error appeared here too, but after simply trying again it succeeded.


In the end the training ran for 300,000 steps.
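Training stops automatically once the step count reaches the max_steps value in the trainer configuration, and the final model is exported as an .onnx file (see the last two log lines above). A rough sketch only, assuming an ML-Agents release that uses the newer behaviors-style YAML; the file name and the hyperparameters set up in the earlier training lesson may differ:

behaviors:
  RollerBall:
    trainer_type: ppo
    max_steps: 300000     # training ends here, matching the 300,000 steps in the log above
    summary_freq: 1000    # one [INFO] summary line every 1000 steps, as in the log above
    time_horizon: 64
    # ... remaining hyperparameters (batch_size, buffer_size, learning_rate, ...) as configured in the earlier lesson

Raising or lowering max_steps is the simplest way to control how long a run lasts.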

8. Lesson 16: 113 - Completing the model training

After placing the model in and running the scene, it kept throwing errors.


On inspection, the location and name of my trained model file were as shown below.

(screenshot: the exported model file under the results folder)

I renamed the file to RollerBall, but it failed.

I renamed the file and moved it to where the previous RollerBall model had been, but it still failed.

In the end I found that I had never successfully generated a .nn file; after importing the .nn file from the instructor's original project, it worked.
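The likely cause is a version mismatch: newer ML-Agents Python packages export only .onnx models, while older Unity-side packages (such as the one the course project appears to use) can only load .nn files; recent Unity packages import both formats as Barracuda NNModel assets. Whichever format is used, the model asset ends up on the Model field of the agent's Behavior Parameters component. As a minimal sketch (the ModelSwitcher helper below is hypothetical and not part of the course project), the same assignment can also be done from code:

using Unity.Barracuda;
using Unity.MLAgents;
using UnityEngine;

/// <summary>
/// Hypothetical helper: assigns a trained model to the agent at runtime.
/// Drag the imported model asset (the exported RollerBall.onnx, or a .nn file) onto trainedModel in the Inspector.
/// </summary>
public class ModelSwitcher : MonoBehaviour
{
    public NNModel trainedModel;

    void Start()
    {
        var agent = GetComponent<Agent>();//also finds MyRollerAgent, since it derives from Agent

        //SetModel assigns the trained policy and the behavior name;
        //use the same name as during training ("RollerBall")
        agent.SetModel("RollerBall", trainedModel);
    }
}

Assigning the model directly on the Behavior Parameters component in the Inspector achieves the same result without any extra script; the important thing is that a valid model asset exists at all, which matches the fix described above.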


That concludes this article on getting started with Unity machine learning (part three).
