Leveraging Unlabeled Data for Crowd Counting by Learning to Rank-Toy模板网

这篇具有很好参考价值的文章主要介绍了Leveraging Unlabeled Data for Crowd Counting by Learning to Rank。希望对大家有所帮助。如果存在错误或未考虑完全的地方，请大家不吝赐教，您也可以点击"举报违法"按钮提交疑问。

无标签人群技术，作者引入了一种排名。
利用的是一个图的人群数量一定小于等于包含这个图的图

生成排名数据集

作者提出了一种自监督任务，利用的是一个图的人群数量一定小于等于包含这个图的图
Leveraging Unlabeled Data for Crowd Counting by Learning to Rank,深度学习,人群技术方法
流程：
1.以图像中心为中心，划分一个 $1/ r$ 图像大小的矩形（但是这里没写是面积的还是长宽的）
在这个矩形中，随机选择一个点当作锚点
2.以锚点为中心，找到一个不超过图像边界的正方形
3.重复 $k - 1$ 次，每次生成一个正方形，大小是上一个正方形的 $1/ s$ （也没说是面积还是长宽）
目测代码是这样写的

def generate_ranking(img, k, s, r):
    h, w, _ = img.shape
    center_h = h // 2
    center_w = w // 2
    region_h = int(h // (r**0.5))
    region_w = int(w // (r**0.5))
    left_h = center_h - region_h // 2
    left_w = center_w - region_w // 2
    right_h = left_h + region_h
    right_w = left_w + region_w

    anchor_h = np.random.randint(left_h, right_h)
    anchor_w = np.random.randint(left_w, right_w)
    radius = min(anchor_h, h - anchor_h, anchor_w, w - anchor_w)

    res = []
    for _ in range(k):
        res.append(img[anchor_h - int(radius):anchor_h + int(radius),
                   anchor_w - int(radius):anchor_w + int(radius)])
        radius *= float(s)

    return res

为了收集一个大的数据集，作者用了两种方法
Keyword query：google里搜索Crowded, Demonstration, Train station, Mall, Studio,
Beach

Query-by-example image retrieval：利用UCF CC 50，ShanghaiTech Part A， ShanghaiTech Part B，在google图搜图，每一张图选10张
Leveraging Unlabeled Data for Crowd Counting by Learning to Rank,深度学习,人群技术方法

Learning from ranked image sets

Crowd density estimation network

用的vgg16，去掉全连接，最后一个max pooling换成 $3 * 3$ 的卷积，把通道从512变为1，生成density map
模型就是图中的橙色部分
Leveraging Unlabeled Data for Crowd Counting by Learning to Rank,深度学习,人群技术方法

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
import torch
from torch import nn
from torchvision.models import vgg16, VGG16_Weights

class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        vgg_16 = vgg16(weights=VGG16_Weights.DEFAULT)
        self.features = vgg16(weights=VGG16_Weights.DEFAULT).features
        temp = nn.Conv2d(512, 1, 3, 1, 1)
        nn.init.normal_(temp.weight, std=0.01)
        if temp.bias is not None:
            nn.init.constant_(temp.bias, 0)
        self.features[-1] = temp

    def forward(self, x):
        return self.features(x)


if __name__ == '__main__':
    model = VGG()
    B = 2
    a = torch.rand((B, 3, 224, 224))
    b = model(a)
    c = b.view(B, 1, -1)
    M = c.size(2)
    d = torch.mean(c, dim=-1)
    print(M)
    print(b.shape) # torch.Size([2, 1, 14, 14])
    print(c.shape) # torch.Size([2, 1, 196])
    print(d.shape) # torch.Size([2, 1])

标签的density map就是每一个点分别做一个标准差为1，大小为15的高斯核，损失用的MSE
为了进一步提升效果，我们随机采样一个正方形（56-448像素）

Crowd ranking network

这里针对的是没有标注的部分
简单来说就是对density map做average pooling，得到 $\hat{c}_i$ , 人群数量就是 $\hat{C}\left(I_i\right) = M \times \hat{c}\left(I_i\right)$

损失是一个排名hinge loss
$L_r = \max \left(0, \hat{c}\left(I_2\right) - \hat{c}\left(I_1\right) + \varepsilon\right)$
这里的 $\varepsilon=0$
这个loss就是要大的图片比小的图片排名靠前（人数更多）

损失只针对比他小文章来源地址https://www.toymoban.com/news/detail-817816.html

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
import torch
from torch import nn
import torch.nn.functional as F

class RankingLoss(nn.Module):
    def __init__(self, k, eps=0, reduction='sum'):
        super(RankingLoss, self).__init__()
        self.k = k
        self.eps = eps
        self.reduction = reduction

    def forward(self, x):
        B = x.size(0)
        assert B % self.k == 0
        loss = 0.
        cnt = 0
        for start in range(0, B, self.k):
            end = start + self.k
            for i in range(start, end):
                for j in range(i + 1, end):
                    loss += F.relu(x[j] - x[i] + self.eps)
                cnt += 1
        if self.reduction == 'mean':
            return loss / cnt
        return loss