No rendering output after customizing a Gym environment


Problem: after building a custom Gym environment by following the official tutorial, the code runs without errors, but no window appears showing the agent interacting with the environment.

For how the Gym environment was customized in the first place, see my earlier post.

Cause: a bug in the official example code means the render function is never actually executed in "human" mode.
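In the official example, all of the PyGame drawing lives in a private _render_frame method, and render itself produces output only in "rgb_array" mode, so calling env.render() while in "human" mode silently draws nothing. A minimal sketch of that original structure, reconstructed from the lines commented out in step 2 below rather than copied verbatim from the tutorial:

def render(self):
    # In "human" mode this falls through and returns None without drawing;
    # the tutorial relies on reset/step calling _render_frame directly.
    if self.render_mode == "rgb_array":
        return self._render_frame()

def _render_frame(self):
    ...  # all of the PyGame window setup and drawing code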

Solution:

1. Pass render_mode when the environment is created with make, as shown in the code below.


import gymnasium
import gym_examples

# "human" mode opens a PyGame window that is redrawn as the agent acts
env = gymnasium.make('gym_examples/GridWorld-v0', render_mode="human")

observation, info = env.reset(seed=42)

for _ in range(100000):
    action = env.action_space.sample()  # random policy, just to drive the environment
    observation, reward, terminated, truncated, info = env.step(action)

    env.render()  # explicitly draw the current frame

    if terminated or truncated:
        observation, info = env.reset()

env.close()

2. In your own environment file, grid_world.py, comment out a few lines of the render function, as shown in the code below. With the render-mode dispatch commented out, render() always executes the drawing code, so the self.render() calls added in reset and step actually open and update the window.


import numpy as np
import pygame

import gymnasium as gym
from gymnasium import spaces


class GridWorldEnv(gym.Env):
    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 4}

    def __init__(self, render_mode=None, size=5):
        self.size = size  # The size of the square grid
        self.window_size = 512  # The size of the PyGame window

        # Observations are dictionaries with the agent's and the target's location.
        # Each location is encoded as an element of {0, ..., `size`-1}^2, i.e. MultiDiscrete([size, size]).
        self.observation_space = spaces.Dict(
            {
                "agent": spaces.Box(0, size - 1, shape=(2,), dtype=int),
                "target": spaces.Box(0, size - 1, shape=(2,), dtype=int),
            }
        )

        # We have 4 actions, corresponding to "right", "up", "left", "down"
        self.action_space = spaces.Discrete(4)

        """
        The following dictionary maps abstract actions from `self.action_space` to
        the direction we will walk in if that action is taken.
        I.e. 0 corresponds to "right", 1 to "up" etc.
        """
        self._action_to_direction = {
            0: np.array([1, 0]),
            1: np.array([0, 1]),
            2: np.array([-1, 0]),
            3: np.array([0, -1]),
        }

        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode

        """
        If human-rendering is used, `self.window` will be a reference
        to the window that we draw to. `self.clock` will be a clock that is used
        to ensure that the environment is rendered at the correct framerate in
        human-mode. They will remain `None` until human-mode is used for the
        first time.
        """
        self.window = None
        self.clock = None

    # %%
    # Constructing Observations From Environment States
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    #
    # Since we will need to compute observations both in ``reset`` and
    # ``step``, it is often convenient to have a (private) method ``_get_obs``
    # that translates the environment’s state into an observation. However,
    # this is not mandatory and you may as well compute observations in
    # ``reset`` and ``step`` separately:

    def _get_obs(self):
        return {"agent": self._agent_location, "target": self._target_location}

    # %%
    # We can also implement a similar method for the auxiliary information
    # that is returned by ``step`` and ``reset``. In our case, we would like
    # to provide the manhattan distance between the agent and the target:

    def _get_info(self):
        return {
            "distance": np.linalg.norm(
                self._agent_location - self._target_location, ord=1
            )
        }

    # %%
    # Oftentimes, info will also contain some data that is only available
    # inside the ``step`` method (e.g. individual reward terms). In that case,
    # we would have to update the dictionary that is returned by ``_get_info``
    # in ``step``.

    # %%
    # Reset
    # ~~~~~
    #
    # The ``reset`` method will be called to initiate a new episode. You may
    # assume that the ``step`` method will not be called before ``reset`` has
    # been called. Moreover, ``reset`` should be called whenever a done signal
    # has been issued. Users may pass the ``seed`` keyword to ``reset`` to
    # initialize any random number generator that is used by the environment
    # to a deterministic state. It is recommended to use the random number
    # generator ``self.np_random`` that is provided by the environment’s base
    # class, ``gymnasium.Env``. If you only use this RNG, you do not need to
    # worry much about seeding, *but you need to remember to call
    # ``super().reset(seed=seed)``* to make sure that ``gymnasium.Env``
    # correctly seeds the RNG. Once this is done, we can randomly set the
    # state of our environment. In our case, we randomly choose the agent’s
    # location and then repeatedly sample the target's position until it
    # does not coincide with the agent's position.
    #
    # The ``reset`` method should return a tuple of the initial observation
    # and some auxiliary information. We can use the methods ``_get_obs`` and
    # ``_get_info`` that we implemented earlier for that:

    def reset(self, seed=None, options=None):
        # We need the following line to seed self.np_random
        super().reset(seed=seed)

        # Choose the agent's location uniformly at random
        self._agent_location = self.np_random.integers(0, self.size, size=2, dtype=int)

        # We will sample the target's location randomly until it does not coincide with the agent's location
        self._target_location = self._agent_location
        while np.array_equal(self._target_location, self._agent_location):
            self._target_location = self.np_random.integers(
                0, self.size, size=2, dtype=int
            )

        observation = self._get_obs()
        info = self._get_info()

        if self.render_mode == "human":
            #self._render_frame()
            self.render()

        return observation, info

    # %%
    # Step
    # ~~~~
    #
    # The ``step`` method usually contains most of the logic of your
    # environment. It accepts an ``action``, computes the state of the
    # environment after applying that action and returns the 5-tuple
    # ``(observation, reward, terminated, truncated, info)``. Once the new
    # state of the environment has been computed, we can check whether it is
    # a terminal state and set ``terminated`` accordingly. Since we are using
    # sparse binary rewards in ``GridWorldEnv``, computing ``reward`` is
    # trivial once we know ``terminated``. To gather ``observation`` and
    # ``info``, we can again make
    # use of ``_get_obs`` and ``_get_info``:

    def step(self, action):
        # Map the action (element of {0,1,2,3}) to the direction we walk in
        direction = self._action_to_direction[action]
        # We use `np.clip` to make sure we don't leave the grid
        self._agent_location = np.clip(
            self._agent_location + direction, 0, self.size - 1
        )
        # An episode is done iff the agent has reached the target
        terminated = np.array_equal(self._agent_location, self._target_location)
        reward = 1 if terminated else 0  # Binary sparse rewards
        observation = self._get_obs()
        info = self._get_info()

        if self.render_mode == "human":
            #self._render_frame()
            self.render()

        return observation, reward, terminated, False, info

    # %%
    # Rendering
    # ~~~~~~~~~
    #
    # Here, we are using PyGame for rendering. A similar approach to rendering
    # is used in many environments that are included with Gymnasium and you
    # can use it as a skeleton for your own environments:

    def render(self):
        # The official mode dispatch and the _render_frame wrapper are
        # commented out, so render() now always runs the drawing code:
        #if self.render_mode == "rgb_array":
        #   return self._render_frame()
        #def _render_frame(self):
        if self.window is None and self.render_mode == "human":
            pygame.init()
            pygame.display.init()
            self.window = pygame.display.set_mode(
                (self.window_size, self.window_size)
            )
        if self.clock is None and self.render_mode == "human":
            self.clock = pygame.time.Clock()

        canvas = pygame.Surface((self.window_size, self.window_size))
        canvas.fill((255, 255, 255))
        pix_square_size = (
            self.window_size / self.size
        )  # The size of a single grid square in pixels

        # First we draw the target
        pygame.draw.rect(
            canvas,
            (255, 0, 0),
            pygame.Rect(
                pix_square_size * self._target_location,
                (pix_square_size, pix_square_size),
            ),
        )
        # Now we draw the agent
        pygame.draw.circle(
            canvas,
            (0, 0, 255),
            (self._agent_location + 0.5) * pix_square_size,
            pix_square_size / 3,
        )

        # Finally, add some gridlines
        for x in range(self.size + 1):
            pygame.draw.line(
                canvas,
                0,
                (0, pix_square_size * x),
                (self.window_size, pix_square_size * x),
                width=3,
            )
            pygame.draw.line(
                canvas,
                0,
                (pix_square_size * x, 0),
                (pix_square_size * x, self.window_size),
                width=3,
            )

        if self.render_mode == "human":
            # The following line copies our drawings from `canvas` to the visible window
            self.window.blit(canvas, canvas.get_rect())
            pygame.event.pump()
            pygame.display.update()

            # We need to ensure that human-rendering occurs at the predefined framerate.
            # The following line will automatically add a delay to keep the framerate stable.
            self.clock.tick(self.metadata["render_fps"])
        else:  # rgb_array
            return np.transpose(
                np.array(pygame.surfarray.pixels3d(canvas)), axes=(1, 0, 2)
            )

    # %%
    # Close
    # ~~~~~
    #
    # The ``close`` method should close any open resources that were used by
    # the environment. In many cases, you don’t actually have to bother to
    # implement this method. However, in our example ``render_mode`` may be
    # ``"human"`` and we might need to close the window that has been opened:

    def close(self):
        if self.window is not None:
            pygame.display.quit()
            pygame.quit()
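As a side note, gymnasium.make can only resolve 'gym_examples/GridWorld-v0' if the environment has been registered first. A minimal sketch of that registration in gym_examples/__init__.py, following the official tutorial; the max_episode_steps value is an assumption (it controls when step returns truncated=True):

from gymnasium.envs.registration import register

register(
    id="gym_examples/GridWorld-v0",                # the id passed to gymnasium.make
    entry_point="gym_examples.envs:GridWorldEnv",  # module:class of the environment
    max_episode_steps=300,  # assumption: episodes truncate after 300 steps
)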

After these changes, running the script displays the environment:

[Figure: the PyGame window, showing the agent as a blue circle and the target as a red square on a white grid.]

