4-2 3D images: Volumetric data & Representing tabular data

这篇具有很好参考价值的文章主要介绍了4-2 3D images: Volumetric data & Representing tabular data。希望对大家有所帮助。如果存在错误或未考虑完全的地方,请大家不吝赐教,您也可以点击"举报违法"按钮提交疑问。

本文所用到的资料下载地址

By stacking individual 2D slices into a 3D tensor, we can build volumetric data representing the 3D anatomy of a
subject. We just have an extra dimension, depth, after the channel dimension, leading to a 5D tensor of shape N × C × D × H × W.

N 表示样本(图片)的数量
C 表示特征通道的数量
D 表示深度维度的大小(例如,对于三维图像,表示图像的深度)
H 表示图像的高度
W 表示图像的宽度
4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

Let’s load a sample CT scan using the volread function in the imageio module, which takes a directory as an argument and assembles all Digital Imaging and Communications in Medicine (DICOM) files in a series in a NumPy 3D array.

import imageio.v2 as imageio
dir_path = "D:/Deep-Learning/资料/dlwpt-code-master/data/p1ch4/volumetric-dicom/2-LUNG 3.0  B70f-04083"
# volread从文件中读取三维体积数据,返回结果为NumPy数组
vol_arr = imageio.volread(dir_path, 'DICOM')  # DICOM表示要使用的解码器格式,以确保正确解析DICOM文件
vol_arr.shape  # 结果的99表示该三维图像是由99张2维图像从bottom到top叠加得到的

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
我们可以使用以下代码打印其中的某张图片,例如第50张

%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(vol_arr[50])

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
纵切面

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

The layout is different from what PyTorch expects, due to having no channel information. So we’ll have to make room for the channel dimension using unsqueeze:

import torch
vol = torch.from_numpy(vol_arr).float()
vol = torch.unsqueeze(vol, 0)
vol.shape

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

The simplest form of data we’ll encounter on a machine learning job is sitting in a spreadsheet, CSV file, or database. Whatever the medium, it’s a table containing one row per sample (or record), where columns contain one piece of information about our sample.

Columns may contain numerical values, like temperatures at specific locations; or labels, like a string expressing an attribute of the sample, like “blue.” Therefore, tabular data is typically not homogeneous: different columns don’t have the same type. We might have a column showing the weight of apples and another encoding their color in a label.

PyTorch tensors, are homogeneous. Information in PyTorch is typically encoded as a number, typically floating-point (though integer types and Boolean are supported as well).

Our first job as deep learning practitioners is to encode heterogeneous, real-world data into a tensor of floating-point numbers, ready for consumption by a neural network.

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
The first 11 columns contain values of chemical variables, and the last column contains the sensory quality score from 0 (very bad) to 10 (excellent).A possible machine learning task on this dataset is predicting the quality score from chemical characterization alone.

We have to get the training data from somewhere! As we can see in figure 4.3, we’re hoping to find a relationship between one of the chemical columns in our data and the quality column. Here, we’re expecting to see quality increase as sulfur decreases.

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

Let’s load our file and turn the resulting NumPy array into a PyTorch tensor.

import csv
import numpy as np
wine_path = "D:/Deep-Learning/资料/dlwpt-code-master/data/p1ch4/tabular-wine/winequality-white.csv"
wineq_numpy = np.loadtxt(wine_path, dtype=np.float32, delimiter=";",skiprows=1)
# skiprows=1 表示跳过第一行的标题行
wineq_numpy

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

We can obtain column names (the first row of data) using this approach.

col_list = next(csv.reader(open(wine_path), delimiter=';'))
col_list

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

Let’s check that all the data has been read.

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

and proceed to convert the NumPy array to a PyTorch tensor:

wineq = torch.from_numpy(wineq_numpy)
wineq
wineq.shape, wineq.dtype

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
At this point, we have a floating-point torch.Tensor containing all the columns, including the last, which refers to the quality score.

Continuous, ordinal, and categorical values
continuous values: Stating that package A is 2 kilograms heavier than package B, or that package B came from 100 miles farther away than A has a fixed meaning. If you’re counting or measuring something with units, it’s probably a continuous value.
ordinal values: A good examplenof this is ordering a small, medium, or large drink, with small mapped to the value 1, medium 2, and large 3. The large drink is bigger than the medium, in the same way that 3 is bigger than 2, but it doesn’t tell us anything about how much bigger.
categorical values: Assigning water to 1, coffee to 2, soda to 3, and milk to 4 is a good example.

在处理一个包含scores(即quality)的问题时,可以将评分视为连续变量并进行回归任务,或将其视为标签并尝试从化学分析中猜测标签的分类任务。在这两种方法中,通常会从输入数据的张量中移除scores(下文data),并将其保存在一个单独的张量中(下文target),这样我们可以将score作为真实值使用,而不将其作为模型的输入。

将scores作为分类任务中的标签,而输入数据是基于化学分析的其他特征。这样的任务旨在根据给定的化学分析特征,通过模型预测和分类的方式来推断scores的标签或类别。通过这种方式,我们可以使用模型从输入数据中学习化学特征与评分之间的关联,并进行分类预测。

data = wineq[:, :-1]  # Selects all rows and all columns except the last
data,data.shape

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

target = wineq[:, -1]  # Selects all rows and the last column
target, target.shape

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

标签张量(Label tensor)是一个包含离散标签值的张量。在分类任务中,我们通常将目标值(target)转换为标签张量,其中每个样本的目标值表示为一个整数,代表样本所属的类别或标签。标签张量的形状通常与输入数据的形状相对应,每个样本都有一个对应的标签值。
例如,如果我们有一个包含10个样本的数据集(可以将其看成表格的第一列),每个样本有3个特征(可以将其看成表格的第一行),而目标是将样本分为3个类别(标签为0、1和2),那么标签张量的形状将是(10,),其中每个元素表示一个样本的类别标签。(简单说,每一列就是一个标签张量)
标签张量在训练和评估模型时起着重要的作用。它用于计算损失函数,评估模型的性能,并与模型的预测进行比较,以确定预测是否正确。

如果我们想把target转换成标签张量,一种方法是简单地将标签视为scores的整数向量:

target = wineq[:, -1].long()
target

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

另一种方法是构建分数的one-hot encoding
也就是说,在由10个元素组成的向量中对10个scores进行编码,所有元素都被设置为0(除了1),每个scores的索引都不同。这样,1的score可以映射到向量(1,0,0,0,0,0,0,0,0,0,0,0,0),5的score可以映射到(0,0,0,0,1,0,0,0,0,0),以此类推。

这两种方法有明显的区别。将葡萄酒质量scores保存在scores的整数向量中会对scores进行排序(例如1低于4)。它还能推导出分数之间的某种距离(1到3的距离=2到4的距离)。如果分数是纯粹离散的,就像葡萄品种,one-hot encoding将是一个更好的匹配,因为没有隐含的顺序或距离。

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

使用target.shape[0]即可表示样本数量
10表示类别总数
现在我们创建了一个全零张量target_onehot

target_onehot = torch.zeros(target.shape[0], 10)

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

我们可以使用scatter_ 方法实现one-hot encoding
名称以下划线结尾,表明该方法将不返回一个新的张量,而是在适当的地方修改张量
第一个参数1表示在维度1(列维度)上进行散布操作
unsqueeze使target从[4898]变为[4898, 1],扩展后的张量就可以作为索引张量,用于指定在维度1上进行散布操作的位置

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

1.0表示按照目标标签的索引位置进行散布操作,即每个样本的目标标签对应的位置会被设置为1.0,而其他位置仍保持为0.0

target_onehot.scatter_(1, target.unsqueeze(1), 1.0)

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

可以看到,第一行的第6个位置(从0起)被置为1(表示6)
倒数第二行的地7个位置(从0起)被置为1(表示7)

下面进行标准化
首先求各列的均值和标准差

data_mean=torch.mean(data,dim=0)  # 均值
data_mean

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

data_var=torch.var(data,dim=0)  # 方差
data_var

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

我们可以通过减去均值并除以标准差来标准化数据

data_normalized=(data-data_mean)/torch.sqrt(data_var)
data_normalized

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

我们来看看这些数据,是否有一种简单的方法可以一眼就分辨出葡萄酒的好坏。
First, we’re going to determine which rows in target correspond to a score less than or equal to 3:

bad_indexed=target<=3
bad_indexed
print(bad_indexed.shape)
print(bad_indexed.dtype)
print(bad_indexed.sum())

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

Note that only 20 of the bad_indexes entries are set to True.
For example,
4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
The bad_indexes tensor has the same shape as target, with values of False or True depending on the outcome of the comparison between our threshold and each element in the original target tensor:

bad_data=data[bad_indexed]
bad_data

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
Note that the new bad_data tensor has 20 rows, the same as the number of rows with True in the bad_indexes tensor. It retains all 11 columns.

Now we can start to get information about wines grouped into good, middling, and bad categories. Let’s take the .mean() of each column:

bad_data=data[target<=3]
mid_data=data[(target>3)&(target<7)]
good_data=data[target>=7]
bad_mean=torch.mean(bad_data,dim=0)
mid_mean=torch.mean(mid_data,dim=0)
good_mean=torch.mean(good_data,dim=0)

zip函数将col_list、bad_mean、mid_mean和good_mean按索引依次配对,生成一个迭代器,每次迭代输出一个四元组。enumerate函数用于遍历这个迭代器,并返回每个元素及其对应的索引。i是索引,args是一个四元组,分别对应每个列的名称和其对应的坏、中等、好类别的平均值。在遍历过程中,可以使用i和args来处理每一列的数据和平均值。

format方法用于格式化输出字符串。前面输出5个,分别是i和args四元组,使用大括号格式化
‘{:2}’ :使用两个字符的宽度对齐,并且右对齐
‘{:20}’ :使用二十个字符的宽度对齐,并且左对齐。
‘{:6.2f}’ :使用六个字符的宽度对齐,其中两位小数,并且右对齐。

*args用于将四元组中的元素解包作为参数传递给format方法,这样可以按顺序填充格式化字符串中的占位符。通过这种方式,代码将索引、列名和平均值按指定格式输出为表格形式。

for i,args in enumerate(zip(col_list,bad_mean,mid_mean,good_mean)):
    print('{:2} {:20} {:6.2f} {:6.2f} {:6.2f}'.format(i,*args))

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

It looks like we’re on to something here: at first glance, the bad wines seem to have higher total sulfur dioxide, among other differences. We could use a threshold on total sulfur dioxide as a crude criterion for discriminating good wines from bad ones.

torch.lt(x, y) 比较 x 和 y 的每个元素,并返回一个布尔张量。x<y true,否则false
import torch

x = torch.tensor([1, 2, 3, 4, 5])
y = torch.tensor([3, 3, 3, 3, 3])
result = torch.lt(x, y)
print(result) # Output: tensor([ True, True, False, False, False])`

total_sulfur_threshold=141.83  # total sulfur dioxide 的 mid
total_sulfur_data=data[:,6]  # total sulfur dioxide列
predicted_indexes=torch.lt(total_sulfur_data,total_sulfur_threshold)
predicted_indexes

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

predicted_indexes.shape,predicted_indexes.dtype,predicted_indexes.sum()

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
This means our threshold implies that just over half of all the wines are going to be high quality.
(total sulfur dioxide越小越好,有2727个小于threshold,即我们预测有2727瓶好酒)

Next, we’ll need to get the indexes of the actually good wines:

actual_indexes=target>5
actual_indexes.sum()

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习

Since there are about 500 more actually good wines than our threshold predicted, we already have hard evidence that it’s not perfect.(之前拿>=7定义好酒,现在又改成5了,emmmm)

Now we need to see how well our predictions line up with the actual rankings. We will perform a logical “and” between our prediction indexes and the actual good indexes (remember that each is just an array of zeros and ones) and use that intersection of wines-in-agreement to determine how well we did:

当我们从一个张量中提取单个元素时,得到的是一个张量,而不是一个标量。这时,如果我们希望得到这个张量的数值值(即标量),可以使用.item() 方法。
x = torch.tensor(3.5)
value = x.item()
print(value) # 输出 3.5

n_matches = torch.sum(actual_indexes & predicted_indexes).item()
n_predicted = torch.sum(predicted_indexes).item()
n_actual = torch.sum(actual_indexes).item()
n_matches, n_matches / n_predicted, n_matches / n_actual

4-2 3D images: Volumetric data & Representing tabular data,PyTorch,深度学习,pytorch,python,人工智能,机器学习
We got around 2,000 wines right! Since we predicted 2,700 wines, this gives us a 74% chance that if we predict a wine to be high quality, it actually is. Unfortunately, there are 3,200 good wines, and we only identified 61% of them. Well, we got what we signed up for; that’s barely better than random!

Of course, this is all very naive: we know for sure that multiple variables contribute to wine quality, and the relationships between the values of these variables and the outcome (which could be the actual score, rather than a binarized version of it) is likely more complicated than a simple threshold on a single value.文章来源地址https://www.toymoban.com/news/detail-600695.html

到了这里,关于4-2 3D images: Volumetric data & Representing tabular data的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处: 如若内容造成侵权/违法违规/事实不符,请点击违法举报进行投诉反馈,一经查实,立即删除!

领支付宝红包 赞助服务器费用

相关文章

  • On Data Scaling in Masked Image Modelin

    论文名称:On Data Scaling in Masked Image Modeling 发表时间:CVPR2023 作者及组织:Zhenda Xie, ZhengZhang, Hu Han等,来自清华,西安交大,微软亚洲研究院。  本文验证SIMMIM无监督预训练方法,是否会出现与NLP类似的拓展法则现象。  这篇论文做了大量的对比实验,因此,先说结论:  

    2024年01月22日
    浏览(33)
  • MATLAB Fundamentals>>Representing Discrete Categories

    MATLAB Fundamentals Specialized Data Types Representing Discrete Categories(1/6) Introduction When text labels are intended to represent a finite set of possibilities, a string array may use more memory than a  categorical  array. There are functions designed to work with categories of data, so if your text describes a group or category, this is often the

    2024年01月18日
    浏览(37)
  • Image Editing、3D Textured Mesh、Image Composition、SplattingAvatar

    本文首发于公众号:机器感知 Image Editing、3D Textured Mesh、Image Composition、SplattingAvatar An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control Building on the success of text-to-image diffusion models (DPMs), image editing is an important application to enable human interaction with AI-generated content. Among var

    2024年03月18日
    浏览(46)
  • 论文阅读:Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data

    目录 摘要 Motivation 整体架构流程 技术细节 雷达和图像数据的同步 小结 论文地址:  [2203.16258] Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data (arxiv.org) 论文代码: GitHub - valeoai/SLidR: Official PyTorch implementation of \\\"Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data\\\"    

    2024年02月08日
    浏览(48)
  • INFOBATCH: LOSSLESS TRAINING SPEED UP BY UNBIASED DYNAMIC DATA PRUNING 和Masked Image denoised

    即插即用的动态数据裁剪,加速网络训练. ICLR 2024 Oral | InfoBatch,三行代码,无损加速,即插即用! 论文题目: InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning 论文地址:https://arxiv.org/abs/2303.04947 代码地址:https://github.com/henryqin1997/InfoBatch 加速训练一个比较直接的方法

    2024年02月20日
    浏览(39)
  • TypeError: Cannot handle this data type: (1, 1, 3), <f8和Image.fromarray()函数转换后图像失真

    错误显示1. 错误显示2. (预测为RGB图像,结果失真为黑色) 原因: 采用 matplotlib.image 读入图片数组,注意这里读入的数组是 float32 型的,范围是 0-1,而 PIL.Image 数据是 uinit8 型的,范围是0-255 (可根据 python PIL图像处理 - jujua - 博客园 (cnblogs.com) 分析) 更改方案: image=Image

    2024年02月12日
    浏览(44)
  • PyTorch学习笔记:data.RandomSampler——数据随机采样

    功能:随即对样本进行采样 输入: data_source :被采样的数据集合 replacement :采样策略,如果为 True ,则代表使用替换采样策略,即可重复对一个样本进行采样;如果为 False ,则表示不用替换采样策略,即一个样本最多只能被采一次 num_samples :所采样本的数量,默认采全部

    2023年04月08日
    浏览(36)
  • 机器学习笔记 - PyTorch Image Models图像模型概览 (timm)

            PyTorch Image Models (timm)是一个用于最先进的图像分类的库,包含图像模型、优化器、调度器、增强等的集合;是比较热门的论文及代码库。         虽然越来越多的低代码和无代码解决方案可以轻松开始将深度学习应用于计算机视觉问题,但我们经常与希望寻求

    2024年02月12日
    浏览(33)
  • Unity3d 图片Image置灰shader

    置灰公式:    value=color.r x 0.299 +color.g x 0.587 + color.b * 0.114 color.rgb = lerp(color.rgb, Luminance(color.rgb), _Factor); 或 color.rgb = dot(color.rgb, fixed3(0.299, 0.587, 0.114));

    2024年02月11日
    浏览(46)
  • 完美解决PermissionError: [Errno 13] Permission denied: ‘./data\\mnist\\train-images-idx3-ubyte‘

    完美解决PermissionError: [Errno 13] Permission denied: ‘./datamnisttrain-images-idx3-ubyte’ 下滑查看解决方法 PermissionError: [Errno 13] Permission denied: ‘./datamnisttrain-images-idx3-ubyte‘ 这个错误通常是由于缺少对文件或目录的读写权限导致的。 下滑查看解决方法 确保你有足够的权限:检查你正

    2024年02月08日
    浏览(31)

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

博客赞助

微信扫一扫打赏

请作者喝杯咖啡吧~博客赞助

支付宝扫一扫领取红包,优惠每天领

二维码1

领取红包

二维码2

领红包