1. Test environment
CUDA environment: i7-8550U + 16 GB DDR4-2133 + NVIDIA MX150 2 GB
AMD DirectML environment: Ryzen 5 5600G + 32 GB DDR4-3200 + Vega 7 4 GB
AMD CPU-only environment: Ryzen 5 5600G + 32 GB DDR4-3200
All other hardware (disk, power supply) is identical across machines. PyTorch version 2.0.0, Python 3.7.11, Windows 10 LTSC.
2. Test code
The benchmark fits a function over 1,000,000 points and measures the time from the moment the network is moved into RAM/VRAM until the result is computed. Setup time beforehand and plotting time afterwards are excluded. Each configuration is run three times and the average is recorded manually. The code is as follows:
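The "run three times and average by hand" procedure above can also be automated. A minimal sketch (the helper name `mean_runtime` is hypothetical, not part of the original scripts):

```python
import time

def mean_runtime(workload, repeats=3):
    """Call workload() `repeats` times and return the mean elapsed seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)

# Example with a trivial stand-in workload:
avg = mean_runtime(lambda: sum(range(100000)), repeats=3)
print("Mean runtime: %0.4f s" % avg)
```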
CUDA test code:
```python
# -*- coding: utf-8 -*-
# @Time : 19/12/9 16:38
# @Author : JL
# @File : pytorchTest.py
# @Software: PyCharm
import matplotlib.pyplot as plt
import torch
import time

x = torch.unsqueeze(torch.linspace(-1, 1, 1000000), dim=1).cuda()
y = x.pow(2) + 0.3 * torch.rand(x.size()).cuda()

net1 = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1)
)

optimizer = torch.optim.SGD(net1.parameters(), lr=0.5)
lossFunc = torch.nn.MSELoss()

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Current device: " + str(torch.cuda.get_device_name(torch.cuda.current_device())))
print("CUDA / cuDNN versions: " + str(torch.version.cuda) + " / " + str(torch.backends.cudnn.version()))
print("PyTorch version: " + str(torch.__version__))

startTime = time.perf_counter()
net1.to(device)
for t in range(100):
    prediction = net1(x)
    loss = lossFunc(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(loss.data.cpu().numpy())
endTime = time.perf_counter()
delta = endTime - startTime
print("Treat a net in %0.2f s." % delta)

plt.scatter(x.data.cpu().numpy(), y.data.cpu().numpy())
plt.show()
```
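One caveat about this timing approach: CUDA kernel launches are asynchronous, so `time.perf_counter()` can stop the clock before the GPU has actually finished. A hedged sketch of a more robust measurement, calling `torch.cuda.synchronize()` before reading the timer (it falls back to CPU when no CUDA device is present, and the workload is an arbitrary stand-in):

```python
import time
import torch

# Small stand-in workload; runs on the GPU when available, otherwise the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(512, 512, device=device)

start = time.perf_counter()
for _ in range(10):
    x = x @ x.T / x.numel()  # arbitrary work to time
if torch.cuda.is_available():
    torch.cuda.synchronize()  # block until all queued GPU kernels finish
elapsed = time.perf_counter() - start
print("Elapsed: %0.4f s" % elapsed)
```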
DirectML / AMD CPU test code:
```python
# -*- coding: utf-8 -*-
# @Time : 19/12/9 16:38
# @Author : Jay Lam
# @File : pytorchTest.py
# @Software: PyCharm
import matplotlib.pyplot as plt
import torch
import torch_directml
import time

dml = torch_directml.device(0)   # target device for DirectML runs
cpuML = torch.device("cpu")      # target device for CPU-only runs
# Switch between dml and cpuML below as needed.

x = torch.unsqueeze(torch.linspace(-1, 1, 1000000), dim=1).to(dml)
y = x.pow(2) + 0.3 * torch.rand(x.size()).to(dml)

net1 = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1)
)

lossFunc = torch.nn.MSELoss()
print("PyTorch version: " + str(torch.__version__))

net1.to(dml)  # switch between dml and cpuML
startTime = time.perf_counter()
for t in range(100):
    # Note: with DirectML on AMD GPUs the optimizer must be recreated inside
    # the loop, otherwise the loss does not decrease.
    optimizer = torch.optim.SGD(net1.parameters(), lr=0.01)
    prediction = net1(x)
    loss = lossFunc(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(loss.data.cpu().numpy())
endTime = time.perf_counter()
delta = endTime - startTime
print("Treat a net in %0.2f s." % delta)

plt.scatter(x.data.cpu().numpy(), y.data.cpu().numpy())
plt.show()
```
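Since the two scripts differ only in which device the tensors and the network are moved to, the choice can be unified. A minimal sketch (the helper name `pick_device` is hypothetical): prefer CUDA, then DirectML if `torch_directml` is installed, otherwise fall back to the CPU.

```python
import torch

def pick_device():
    """Return the best available device: CUDA, then DirectML, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda:0")
    try:
        import torch_directml
        return torch_directml.device(0)
    except ImportError:
        return torch.device("cpu")

device = pick_device()
print("Selected device:", device)
```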
3. Results

| Test type | Time (seconds, lower is better) |
|---|---|
| CUDA | 3.57 |
| DirectML | 4.48 |
| CPU only | 5.31 |
DirectML does show some speedup, but it still trails CUDA, and this CUDA result comes from an MX150, one of the weakest laptop GPUs available. Microsoft has work to do. As for the AMD CPU, it is better left to gaming.