[Computer Vision] DINOv2 (Large Vision Model): Code Usage and Testing (Complete Source Code)

This article introduces how to use and test DINOv2 (a large vision model), with the complete source code. I hope it is helpful; if anything here is wrong or incomplete, corrections and suggestions are welcome.

1. Environment Setup

!git clone https://ghproxy.com/https://github.com/facebookresearch/dinov2.git

The output is:

Cloning into 'dinov2'...
remote: Enumerating objects: 141, done.
remote: Counting objects: 100% (96/96), done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 141 (delta 40), reused 31 (delta 22), pack-reused 45
Receiving objects: 100% (141/141), 101.01 KiB | 348.00 KiB/s, done.
Resolving deltas: 100% (42/42), done.

This is a Git command that clones the repository named "dinov2". It goes through a proxy, "ghproxy.com", which mirrors GitHub to speed up cloning.
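If your environment can reach GitHub directly, the proxy prefix is unnecessary and the repository can be cloned as-is:

!git clone https://github.com/facebookresearch/dinov2.git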

!pip install -r /kaggle/working/dinov2/requirements.txt

!pip install scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple

2. Loading the Input Image

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

image = mpimg.imread('/kaggle/input/demo-image/1 (4).png')

plt.imshow(image)
plt.axis('off')
plt.show()

# Print the image dimensions
print("Image size: {} x {} x {}".format(image.shape[0], image.shape[1], image.shape[2]))

(Output: the loaded image is displayed.)

Image size: 1376 x 920 x 3

We need to switch the working directory to the output path:

import os

input_path = "/kaggle/working/dinov2"
os.chdir(input_path)
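Changing the directory matters because torch.hub.load(..., source='local') below resolves the model entry points from hubconf.py relative to the current working directory. A quick sanity check (my own addition, not part of the original notebook):

# Confirm we are inside the cloned repository; torch.hub.load(..., source='local')
# needs to find the repository's hubconf.py from here.
print(os.getcwd())
print(os.path.exists("hubconf.py"))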

2.1 Using the vit_s14 model

import torch
import torchvision.transforms as T
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.image as mpimg 
from PIL import Image
from sklearn.decomposition import PCA
import matplotlib
 
patch_h = 75
patch_w = 50
feat_dim = 384
 
transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
 
dinov2_vits14 = torch.hub.load('', 'dinov2_vits14',source='local').cuda()
 
features = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()
 
img_path = f'/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor[0] = transform(img)[:3]
with torch.no_grad():
    features_dict = dinov2_vits14.forward_features(imgs_tensor)
    features = features_dict['x_norm_patchtokens']
    
features = features.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components=3)
pca.fit(features)
pca_features = pca.transform(features)
pca_features[:, 0] = (pca_features[:, 0] - pca_features[:, 0].min()) / (pca_features[:, 0].max() - pca_features[:, 0].min())
 
pca_features_fg = pca_features[:, 0] > 0.3
pca_features_bg = ~pca_features_fg
 
b = np.where(pca_features_bg)

pca.fit(features[pca_features_fg])
pca_features_rem = pca.transform(features[pca_features_fg])
for i in range(3):
    pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].min()) / (pca_features_rem[:, i].max() - pca_features_rem[:, i].min())
    # transform using mean and std, I personally found this transformation gives a better visualization
    # pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].mean()) / (pca_features_rem[:, i].std() ** 2) + 0.5

pca_features_rgb = pca_features.copy()
pca_features_rgb[pca_features_fg] = pca_features_rem
pca_features_rgb[b] = 0

pca_features_rgb = pca_features_rgb.reshape(4, patch_h, patch_w, 3)
plt.imshow(pca_features_rgb[0][...,::-1])
plt.savefig('features.png')
plt.show()
plt.close()

Here is a line-by-line walkthrough of the code:

import torch
import torchvision.transforms as T
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.image as mpimg 
from PIL import Image
from sklearn.decomposition import PCA
import matplotlib

# Set the patch grid height and width
patch_h = 75
patch_w = 50
# Feature dimension
feat_dim = 384

# Define the image transform pipeline
transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),  # Gaussian blur
    T.Resize((patch_h * 14, patch_w * 14)),  # resize the image
    T.CenterCrop((patch_h * 14, patch_w * 14)),  # center crop
    T.ToTensor(),  # convert to a tensor
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),  # normalize
])

# Load the dinov2_vits14 model via torch.hub and move it to the CUDA device
dinov2_vits14 = torch.hub.load('', 'dinov2_vits14', source='local').cuda()

# Allocate zero tensors to hold the features and the image batch
features = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()

# Image path
img_path = f'/kaggle/input/demo-image/1 (4).png'
# Open the image and convert it to RGB
img = Image.open(img_path).convert('RGB')
# Apply the transform and store the result in the first slot of imgs_tensor
imgs_tensor[0] = transform(img)[:3]

# Disable gradient computation
with torch.no_grad():
    # Pass the image batch to dinov2_vits14 to obtain the features
    features_dict = dinov2_vits14.forward_features(imgs_tensor)
    features = features_dict['x_norm_patchtokens']
    
# Reshape the features to (4 * patch_h * patch_w, feat_dim)
features = features.reshape(4 * patch_h * patch_w, feat_dim).cpu()

# Create a PCA object and fit it to the features
pca = PCA(n_components=3)
pca.fit(features)

# Min-max normalize the first PCA component of the transformed features
pca_features = pca.transform(features)
pca_features[:, 0] = (pca_features[:, 0] - pca_features[:, 0].min()) / (pca_features[:, 0].max() - pca_features[:, 0].min())

# Split foreground and background by thresholding the first component
pca_features_fg = pca_features[:, 0] > 0.3
pca_features_bg = ~pca_features_fg

# Indices of the background features
b = np.where(pca_features_bg)

# Run PCA again on the foreground features only
pca.fit(features[pca_features_fg])
pca_features_rem = pca.transform(features[pca_features_fg])

# Min-max normalize each component of the foreground features
for i in range(3):
    pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].min()) / (pca_features_rem[:, i].max() - pca_features_rem[:, i].min())
    # Transform using mean and std instead; I personally found this gives a better visualization
    # pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].mean()) / (pca_features_rem[:, i].std() ** 2) + 0.5

# Build the RGB feature array
pca_features_rgb = pca_features.copy()

# Replace the foreground entries with the re-projected features
pca_features_rgb[pca_features_fg] = pca_features_rem

# Set the background entries to 0
pca_features_rgb[b] = 0

# Reshape to (4, patch_h, patch_w, 3)
pca_features_rgb = pca_features_rgb.reshape(4, patch_h, patch_w, 3)

# Display the RGB features of the first image
plt.imshow(pca_features_rgb[0][...,::-1])
plt.savefig('features.png')
plt.show()
plt.close()

This code preprocesses the given image, extracts DINOv2 features, and reduces the feature dimensionality with PCA. It then separates foreground from background with a fixed threshold and finally visualizes the foreground features as an RGB image. Note that the specific values and paths may need to be adjusted for your own data and environment.
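As a rough guide for adjusting those values: DINOv2 ViT-S/14 uses 14x14-pixel patches and 384-dimensional patch tokens, so an input resized to (patch_h * 14, patch_w * 14) = 1050 x 700 yields patch_h * patch_w = 3750 tokens per image, and the batch of 4 images gives the 15000 rows printed below. Here is a small sketch for picking a patch grid that roughly preserves the aspect ratio of the source image (the helper name and its parameters are my own, illustrative only):

# Illustrative helper: choose a patch grid with roughly the same aspect ratio
# as the input image (1376 x 920 here), given DINOv2's 14-pixel patch size.
def choose_patch_grid(img_h, img_w, target_long_side=75):
    scale = target_long_side / max(img_h, img_w)
    patch_h = max(1, round(img_h * scale))
    patch_w = max(1, round(img_w * scale))
    return patch_h, patch_w  # feed (patch_h * 14, patch_w * 14) into Resize/CenterCrop above

print(choose_patch_grid(1376, 920))  # -> (75, 50), matching the values used above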

(Output: PCA feature visualization of the patch tokens, saved as features.png.)

print(features)
print(features.shape)

The output is:

tensor([[-1.3500, -4.8793, -1.4393,  ...,  2.3347,  1.6834, -2.9632],
        [-0.4650, -6.4163, -1.5503,  ...,  2.2055,  2.5527, -3.2553],
        [-0.6371, -6.2615, -0.7516,  ...,  3.1827,  2.3861, -2.6838],
        ...,
        [ 1.9385,  0.0726, -0.5395,  ...,  0.3876, -1.4914, -4.5422],
        [ 1.6399, -0.0860,  0.4701,  ...,  1.0180, -0.8897, -5.2614],
        [ 1.6084, -0.0669,  0.7341,  ...,  1.0633, -0.9713, -5.3548]])
torch.Size([15000, 384])
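The 15000 rows come from flattening 4 images x 75 x 50 patch tokens. Only the first slot of imgs_tensor was filled with a real image; the remaining three are all-zero placeholders, which is why their token blocks in features_dict below look nearly identical. A quick check (my own sketch, not in the original notebook):

# 4 images x (75 * 50) patch tokens each = 15000 rows of 384-dim features
assert features.shape == (4 * patch_h * patch_w, feat_dim)
print(features.shape[0] // (patch_h * patch_w))  # -> 4 images in the batch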

The features after dimensionality reduction:

print(pca_features)
print(pca_features.shape)

The output is:

[[  0.81004055   2.458559    12.11051576]
 [  0.79562888   5.65071716  10.84007045]
 [  0.82050109   5.55007889   9.05274001]
 ...
 [  0.27618588 -18.96898667  19.48198916]
 [  0.31861323 -12.21414371  14.19802898]
 [  0.34356016 -10.82144825  13.74648131]]
(15000, 3)

Let's take a look at how the features_dict dictionary is structured:

features_dict

{'x_norm_clstoken': tensor([[ 2.2549, -1.5661,  4.4978,  ...,  1.4984, -5.8642, -0.8560],
         [ 1.8816,  2.4343,  1.4931,  ..., -1.3401, -2.5460,  1.3967],
         [ 1.8816,  2.4343,  1.4931,  ..., -1.3401, -2.5460,  1.3967],
         [ 1.8816,  2.4343,  1.4931,  ..., -1.3401, -2.5460,  1.3967]],
        device='cuda:0'),
 'x_norm_patchtokens': tensor([[[-1.3500, -4.8793, -1.4393,  ...,  2.3347,  1.6834, -2.9632],
          [-0.4650, -6.4163, -1.5503,  ...,  2.2055,  2.5527, -3.2553],
          [-0.6371, -6.2615, -0.7516,  ...,  3.1827,  2.3861, -2.6838],
          ...,
          [-0.8778, -0.0251, -0.2867,  ...,  4.7801, -2.0887, -4.5910],
          [-1.2309,  0.2852,  0.7693,  ...,  5.0635, -1.1529, -6.0175],
          [-1.7551,  1.1333, -0.0898,  ...,  4.1885, -3.3197, -5.7227]],
 
         [[ 0.9131, -4.9736, -0.6238,  ...,  0.2835, -0.3494, -0.4916],
          [ 1.0967, -6.0392, -0.7900,  ...,  0.2323,  0.0510,  0.0176],
          [ 1.3852, -5.8056, -1.2573,  ...,  0.0549, -0.3270, -0.4510],
          ...,
          [ 1.9385,  0.0726, -0.5395,  ...,  0.3877, -1.4914, -4.5422],
          [ 1.6399, -0.0860,  0.4701,  ...,  1.0180, -0.8897, -5.2614],
          [ 1.6084, -0.0669,  0.7341,  ...,  1.0633, -0.9713, -5.3548]],
 
         [[ 0.9131, -4.9736, -0.6238,  ...,  0.2835, -0.3494, -0.4916],
          [ 1.0967, -6.0392, -0.7900,  ...,  0.2323,  0.0510,  0.0176],
          [ 1.3852, -5.8056, -1.2573,  ...,  0.0549, -0.3270, -0.4510],
          ...,
          [ 1.9385,  0.0726, -0.5395,  ...,  0.3877, -1.4914, -4.5422],
          [ 1.6399, -0.0860,  0.4701,  ...,  1.0180, -0.8897, -5.2614],
          [ 1.6085, -0.0669,  0.7341,  ...,  1.0633, -0.9713, -5.3548]],
 
         [[ 0.9131, -4.9736, -0.6238,  ...,  0.2835, -0.3494, -0.4916],
          [ 1.0967, -6.0392, -0.7900,  ...,  0.2323,  0.0510,  0.0176],
          [ 1.3852, -5.8056, -1.2573,  ...,  0.0549, -0.3270, -0.4511],
          ...,
          [ 1.9385,  0.0726, -0.5395,  ...,  0.3876, -1.4914, -4.5422],
          [ 1.6399, -0.0860,  0.4701,  ...,  1.0180, -0.8897, -5.2614],
          [ 1.6084, -0.0669,  0.7341,  ...,  1.0633, -0.9713, -5.3548]]],
        device='cuda:0'),
 'x_prenorm': tensor([[[ 4.7546e-01, -3.4794e-02,  1.1905e+00,  ...,  3.3896e-01,
           -1.2591e+00, -8.1724e-03],
          [-5.2994e-01, -3.0311e-01, -2.0162e-01,  ...,  9.4372e-01,
            8.7399e-01, -3.2527e-01],
          [-1.5728e-01, -3.9359e-01, -2.1482e-01,  ...,  9.0485e-01,
            1.2325e+00, -3.3923e-01],
          ...,
          [-4.9091e-01,  1.1081e-02,  1.9814e-01,  ...,  2.0630e+00,
           -8.5562e-01, -7.6588e-01],
          [-6.0861e-01,  5.2204e-02,  6.6299e-01,  ...,  2.1127e+00,
           -3.8590e-01, -9.7335e-01],
          [-9.3785e-01,  1.2485e-01,  3.0359e-01,  ...,  1.9137e+00,
           -1.5223e+00, -1.0352e+00]],
 
         [[ 4.4059e-01,  1.4807e-01,  5.9425e-01,  ..., -3.4851e-01,
           -6.1687e-01,  2.0463e-01],
          [ 3.1511e-01, -3.3073e-01,  9.0955e-02,  ...,  1.3627e-01,
            1.8562e-02,  4.2850e-02],
          [ 3.8695e-01, -4.1345e-01,  2.8734e-02,  ...,  1.1916e-01,
            1.8061e-01,  1.2469e-01],
          ...,
          [ 6.3855e-01,  1.9967e-03,  5.6187e-02,  ...,  1.0780e-01,
           -5.0606e-01, -6.6095e-01],
          [ 5.6617e-01,  4.9071e-03,  4.8375e-01,  ...,  3.7527e-01,
           -2.6194e-01, -7.9524e-01],
          [ 5.6790e-01,  1.4408e-02,  6.0538e-01,  ...,  4.0537e-01,
           -2.9182e-01, -8.1226e-01]],
 
         [[ 4.4059e-01,  1.4807e-01,  5.9424e-01,  ..., -3.4851e-01,
           -6.1687e-01,  2.0463e-01],
          [ 3.1511e-01, -3.3073e-01,  9.0957e-02,  ...,  1.3627e-01,
            1.8564e-02,  4.2850e-02],
          [ 3.8695e-01, -4.1345e-01,  2.8733e-02,  ...,  1.1916e-01,
            1.8061e-01,  1.2469e-01],
          ...,
          [ 6.3855e-01,  1.9971e-03,  5.6186e-02,  ...,  1.0780e-01,
           -5.0606e-01, -6.6095e-01],
          [ 5.6617e-01,  4.9067e-03,  4.8375e-01,  ...,  3.7527e-01,
           -2.6194e-01, -7.9524e-01],
          [ 5.6790e-01,  1.4408e-02,  6.0538e-01,  ...,  4.0536e-01,
           -2.9182e-01, -8.1226e-01]],
 
         [[ 4.4059e-01,  1.4807e-01,  5.9424e-01,  ..., -3.4851e-01,
           -6.1687e-01,  2.0463e-01],
          [ 3.1511e-01, -3.3073e-01,  9.0956e-02,  ...,  1.3627e-01,
            1.8562e-02,  4.2849e-02],
          [ 3.8695e-01, -4.1344e-01,  2.8735e-02,  ...,  1.1916e-01,
            1.8061e-01,  1.2469e-01],
          ...,
          [ 6.3855e-01,  1.9964e-03,  5.6189e-02,  ...,  1.0780e-01,
           -5.0607e-01, -6.6095e-01],
          [ 5.6617e-01,  4.9066e-03,  4.8375e-01,  ...,  3.7527e-01,
           -2.6194e-01, -7.9524e-01],
          [ 5.6790e-01,  1.4408e-02,  6.0538e-01,  ...,  4.0537e-01,
           -2.9182e-01, -8.1226e-01]]], device='cuda:0'),
 'masks': None}
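Rather than printing the whole dictionary, it is often enough to inspect the keys, shapes, and devices. A minimal sketch:

# Summarize the forward_features output: key, tensor shape, and device
for name, value in features_dict.items():
    if torch.is_tensor(value):
        print(f"{name}: {tuple(value.shape)} on {value.device}")
    else:
        print(f"{name}: {value}")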

Let's switch to a different visualization approach:

patch_h = 75
patch_w = 50
feat_dim = 384
 
transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
 
dinov2_vits14 = torch.hub.load('', 'dinov2_vits14',source='local').cuda()
 
features = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()
 
img_path = f'/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor[0] = transform(img)[:3]
with torch.no_grad():
    features_dict = dinov2_vits14.forward_features(imgs_tensor)
    features = features_dict['x_norm_patchtokens']
    
features = features.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components=3)
pca.fit(features)
pca_features = pca.transform(features)
pca_features[:, 0] = (pca_features[:, 0] - pca_features[:, 0].min()) / (pca_features[:, 0].max() - pca_features[:, 0].min())
 
pca_features_fg = pca_features[:, 0] > 0.3
pca_features_bg = ~pca_features_fg
 
b = np.where(pca_features_bg)

pca.fit(features[pca_features_fg])
pca_features_rem = pca.transform(features[pca_features_fg])
for i in range(3):
    # transform using mean and std, I personally found this transformation gives a better visualization
    pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].mean()) / (pca_features_rem[:, i].std() ** 2) + 0.5

pca_features_rgb = pca_features.copy()
pca_features_rgb[pca_features_fg] = pca_features_rem
pca_features_rgb[b] = 0

pca_features_rgb = pca_features_rgb.reshape(4, patch_h, patch_w, 3)
plt.imshow(pca_features_rgb[0][...,::-1])
plt.savefig('features.png')
plt.show()
plt.close()

(Output: PCA feature visualization with the mean/std normalization.)
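One caveat with the mean/variance transform above: the resulting values are not confined to [0, 1], so matplotlib clips them when rendering RGB data (and usually prints a clipping warning). If you prefer to clip explicitly before displaying, something like the following works (a sketch reusing the variables from the block above):

# Clip the mean/std-normalized features into the displayable [0, 1] range
pca_features_rgb = np.clip(pca_features_rgb, 0.0, 1.0)
plt.imshow(pca_features_rgb[0][..., ::-1])
plt.show()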

3. Using Other Models

3.1 Using the vit_b14 model

patch_h = 75
patch_w = 50
feat_dim = 768
 
transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
 
dinov2_vitb14 = torch.hub.load('', 'dinov2_vitb14',source='local').cuda()
 
features = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()
 
img_path = f'/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor[0] = transform(img)[:3]
with torch.no_grad():
    features_dict = dinov2_vitb14.forward_features(imgs_tensor)
    features = features_dict['x_norm_patchtokens']
    
features = features.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components=3)
pca.fit(features)
pca_features = pca.transform(features)
pca_features[:, 0] = (pca_features[:, 0] - pca_features[:, 0].min()) / (pca_features[:, 0].max() - pca_features[:, 0].min())
 
pca_features_fg = pca_features[:, 0] > 0.3
pca_features_bg = ~pca_features_fg
 
b = np.where(pca_features_bg)

pca.fit(features[pca_features_fg])
pca_features_rem = pca.transform(features[pca_features_fg])
for i in range(3):
    # transform using mean and std, I personally found this transformation gives a better visualization
    pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].mean()) / (pca_features_rem[:, i].std() ** 2) + 0.5

pca_features_rgb = pca_features.copy()
pca_features_rgb[pca_features_fg] = pca_features_rem
pca_features_rgb[b] = 0

pca_features_rgb = pca_features_rgb.reshape(4, patch_h, patch_w, 3)
plt.imshow(pca_features_rgb[0][...,::-1])
plt.savefig('features.png')
plt.show()
plt.close()

(Output: PCA feature visualization from dinov2_vitb14.)

3.2 Using the vit_l14 model

patch_h = 75
patch_w = 50
feat_dim = 1024
 
transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
 
dinov2_vitl14 = torch.hub.load('', 'dinov2_vitl14',source='local').cuda()
 
features = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()
 
img_path = f'/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor[0] = transform(img)[:3]
with torch.no_grad():
    features_dict = dinov2_vitl14.forward_features(imgs_tensor)
    features = features_dict['x_norm_patchtokens']
    
features = features.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components=3)
pca.fit(features)
pca_features = pca.transform(features)
pca_features[:, 0] = (pca_features[:, 0] - pca_features[:, 0].min()) / (pca_features[:, 0].max() - pca_features[:, 0].min())
 
pca_features_fg = pca_features[:, 0] > 0.3
pca_features_bg = ~pca_features_fg
 
b = np.where(pca_features_bg)

pca.fit(features[pca_features_fg])
pca_features_rem = pca.transform(features[pca_features_fg])
for i in range(3):
    # transform using mean and std, I personally found this transformation gives a better visualization
    pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].mean()) / (pca_features_rem[:, i].std() ** 2) + 0.5

pca_features_rgb = pca_features.copy()
pca_features_rgb[pca_features_fg] = pca_features_rem
pca_features_rgb[b] = 0

pca_features_rgb = pca_features_rgb.reshape(4, patch_h, patch_w, 3)
plt.imshow(pca_features_rgb[0][...,::-1])
plt.savefig('features.png')
plt.show()
plt.close()

(Output: PCA feature visualization from dinov2_vitl14.)

3.3 Using the vit_g14 model

patch_h = 75
patch_w = 50
feat_dim = 1536
 
transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
 
dinov2_vitg14 = torch.hub.load('', 'dinov2_vitg14',source='local').cuda()
 
features = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()
 
img_path = f'/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor[0] = transform(img)[:3]
with torch.no_grad():
    features_dict = dinov2_vitg14.forward_features(imgs_tensor)
    features = features_dict['x_norm_patchtokens']
    
features = features.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components=3)
pca.fit(features)
pca_features = pca.transform(features)
pca_features[:, 0] = (pca_features[:, 0] - pca_features[:, 0].min()) / (pca_features[:, 0].max() - pca_features[:, 0].min())
 
pca_features_fg = pca_features[:, 0] > 0.3
pca_features_bg = ~pca_features_fg
 
b = np.where(pca_features_bg)

pca.fit(features[pca_features_fg])
pca_features_rem = pca.transform(features[pca_features_fg])
for i in range(3):
    # transform using mean and std, I personally found this transformation gives a better visualization
    pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].mean()) / (pca_features_rem[:, i].std() ** 2) + 0.5

pca_features_rgb = pca_features.copy()
pca_features_rgb[pca_features_fg] = pca_features_rem
pca_features_rgb[b] = 0

pca_features_rgb = pca_features_rgb.reshape(4, patch_h, patch_w, 3)
plt.imshow(pca_features_rgb[0][...,::-1])
plt.savefig('features.png')
plt.show()
plt.close()

(Output: PCA feature visualization from dinov2_vitg14.)
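The four variants above differ only in the hub entry point and the feature dimension, so the whole feature-extraction step can be wrapped in a single parameterized function instead of repeating the block. Below is a rough refactor sketch under the same assumptions as the code above (imports from section 2.1 still in scope; the function name, the DINOV2_DIMS mapping, and the parameters are my own, not from the original post):

# Hub entry point -> patch-token feature dimension, as used in the sections above
DINOV2_DIMS = {
    "dinov2_vits14": 384,
    "dinov2_vitb14": 768,
    "dinov2_vitl14": 1024,
    "dinov2_vitg14": 1536,
}

def extract_patch_features(model_name, img_path, patch_h=75, patch_w=50):
    """Load a local DINOv2 model and return flattened patch tokens for one image."""
    tfm = T.Compose([
        T.GaussianBlur(9, sigma=(0.1, 2.0)),
        T.Resize((patch_h * 14, patch_w * 14)),
        T.CenterCrop((patch_h * 14, patch_w * 14)),
        T.ToTensor(),
        T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ])
    model = torch.hub.load('', model_name, source='local').cuda()
    imgs_tensor = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()
    imgs_tensor[0] = tfm(Image.open(img_path).convert('RGB'))[:3]
    with torch.no_grad():
        feats = model.forward_features(imgs_tensor)['x_norm_patchtokens']
    return feats.reshape(4 * patch_h * patch_w, DINOV2_DIMS[model_name]).cpu()

# Example usage:
# features = extract_patch_features('dinov2_vitb14', '/kaggle/input/demo-image/1 (4).png')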

That concludes this walkthrough of using and testing DINOv2 (a large vision model) with the complete source code.
