[Paper Notes] Dynamic Snake Convolution

Author: justld

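Accurate segmentation of topological tubular structures, such as blood vessels and roads, is crucial in medicine and other fields because it determines the accuracy and efficiency of downstream tasks. Many factors make the task hard, including thin, fragile local structures and complex, variable global morphology. To address this, the authors propose dynamic snake convolution, which performs very well on tubular segmentation tasks.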

Paper: Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation

Chinese title: 拓扑几何约束管状结构分割的动态蛇卷积

Code: https://github.com/yaoleiqi/dscnet

1. Applicable Scenarios

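Tubular targets are thin, elongated, and complex. Standard convolution and dilated convolution cannot adapt the region they attend to according to the features of the target. Deformable convolution can learn the region of interest adaptively from the features, but for tubular targets it cannot constrain the connectivity of that region. Dynamic snake convolution does constrain the connectivity of the attended region, which makes it better suited to tubular scenes.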

2. Dynamic Snake Convolution

A standard 3×3 2D convolution kernel K is expressed as:

$$
K = \{(x-1,\, y-1),\ (x-1,\, y),\ \cdots,\ (x+1,\, y+1)\} \tag{1}
$$
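As a small illustration of the fixed grid that a standard kernel covers, the positions can be enumerated directly. This is only a sketch; the helper name `standard_kernel_positions` is made up for this example and does not come from the paper or its code.

```python
def standard_kernel_positions(x, y, size=3):
    """Return the grid positions covered by a size x size kernel centred at (x, y)."""
    r = size // 2
    # Enumerate in the same order as the set above: (x-1, y-1), (x-1, y), ..., (x+1, y+1)
    return [(x + dx, y + dy) for dx in range(-r, r + 1) for dy in range(-r, r + 1)]


positions = standard_kernel_positions(5, 5)
print(len(positions), positions[0], positions[-1])
```

The grid is the same at every location, which is why a standard kernel cannot follow a thin, curved structure.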

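To give the kernel more flexibility, so that it can focus on the complex geometric features of the target, a deformation offset ∆ is introduced, inspired by deformable convolution. However, if the model is left completely free to learn the deformation offsets, the receptive field tends to drift away from the target, especially for thin tubular structures. The authors therefore adopt an iterative strategy (figure below) that selects, in turn, the next position to observe for each target being processed. This keeps the attention continuous and prevents large deformation offsets from spreading the receptive field too far.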

[Figure: the iterative strategy for selecting kernel positions]

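In dynamic snake convolution, the authors straighten the standard convolution kernel along both the x-axis and the y-axis. Consider a kernel of size 9 and take the x-axis direction as an example. The position of each grid in K is written K_{i±c} = (x_{i±c}, y_{i±c}), where c = 0, 1, 2, 3, 4 is the horizontal distance from the center grid. Choosing each grid position K_{i±c} in the kernel K is a cumulative process: starting from the center position K_i, each position further from the center depends on the position of the previous grid, so K_{i+1} adds an offset ∆ = {δ | δ ∈ [−1, 1]} relative to K_i. The offsets therefore have to be accumulated (Σ), which ensures that the kernel keeps a linear morphological structure. The change along the x-axis in the figure above is: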

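$$
K_{i\pm c} =
\begin{cases}
(x_{i+c},\, y_{i+c}) = \left(x_i + c,\ y_i + \sum_{i}^{i+c} \Delta y\right) \\
(x_{i-c},\, y_{i-c}) = \left(x_i - c,\ y_i + \sum_{i-c}^{i} \Delta y\right)
\end{cases}
\tag{2}
$$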

The change along the y-axis is:

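$$
K_{j\pm c} =
\begin{cases}
(x_{j+c},\, y_{j+c}) = \left(x_j + \sum_{j}^{j+c} \Delta x,\ y_j + c\right) \\
(x_{j-c},\, y_{j-c}) = \left(x_j + \sum_{j-c}^{j} \Delta x,\ y_j - c\right)
\end{cases}
\tag{3}
$$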
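The cumulative rule described above can be sketched in plain Python for the x-axis case. This is a minimal illustration rather than the authors' implementation: the function name `snake_positions_x` and the list-based offsets `dy` are hypothetical, and the offset of the center grid is treated as zero, which is what the code listing in section 3 does.

```python
def snake_positions_x(xi, yi, dy, kernel_size=9):
    """Positions of an x-axis snake kernel centred at (xi, yi).

    dy[k] is the raw offset predicted for the k-th grid, each in [-1, 1].
    The centre entry is ignored because the centre grid does not move.
    """
    center = kernel_size // 2
    acc = [0.0] * kernel_size  # accumulated y-offset per grid
    for c in range(1, center + 1):
        # Each grid builds on its neighbour that is closer to the centre
        acc[center + c] = acc[center + c - 1] + dy[center + c]
        acc[center - c] = acc[center - c + 1] + dy[center - c]
    return [(xi + (k - center), yi + acc[k]) for k in range(kernel_size)]


dy = [1.0, -1.0, 0.5, 0.25, 9.9, 0.5, -0.5, 1.0, 1.0]
for position in snake_positions_x(10, 20, dy):
    print(position)
```

Because every raw offset lies in [−1, 1] and each grid only adds its own offset to that of its neighbour, two adjacent grids never differ by more than 1 in y, so the sampled positions stay connected.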

Because the offset ∆ is usually fractional while coordinates are usually integers, bilinear interpolation is used:

$$
K = \sum_{K'} B(K', K) \cdot K' \tag{4}
$$

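Here K denotes a fractional position from Equation 2 or Equation 3, K′ enumerates all integer spatial positions, and B is the bilinear interpolation kernel, which can be factored into two one-dimensional kernels: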

$$
B(K, K') = b(K_x, K'_x) \cdot b(K_y, K'_y) \tag{5}
$$
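The separable kernel can be tried out on a tiny feature map. The sketch below assumes the usual one-dimensional bilinear kernel b(p, q) = max(0, 1 − |p − q|) known from deformable convolution; the post does not spell out b, so treat that definition, and the helper names `b` and `bilinear_sample`, as assumptions made for illustration.

```python
def b(p, q):
    """Assumed 1-D bilinear kernel: weight of integer position q for fractional p."""
    return max(0.0, 1.0 - abs(p - q))


def bilinear_sample(feature, x, y):
    """Sample feature[row][col] at the fractional position (x = col, y = row)."""
    total = 0.0
    for row, line in enumerate(feature):
        for col, value in enumerate(line):
            # The 2-D weight is the product of the two 1-D weights
            total += b(x, col) * b(y, row) * value
    return total


feature = [[0.0, 1.0],
           [2.0, 3.0]]
print(bilinear_sample(feature, 0.5, 0.5))
print(bilinear_sample(feature, 0.25, 0.0))
```

Only the integer positions within one pixel of (x, y) get a non-zero weight, so the sum effectively runs over at most four neighbours.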

And an overall diagram:

[Figure: overall diagram of dynamic snake convolution]

3. Code

The code for the snake convolution is as follows:

# -*- coding: utf-8 -*-
"""Dynamic Snake Convolution Module"""

from __future__ import annotations  # lets `str | torch.device` hints work before Python 3.10

import einops
import torch
from torch import nn


class DSConv_pro(nn.Module):
    def __init__(
        self,
        in_channels: int = 1,
        out_channels: int = 1,
        kernel_size: int = 9,
        extend_scope: float = 1.0,
        morph: int = 0,
        if_offset: bool = True,
        device: str | torch.device = "cuda",
    ):
        """
        A Dynamic Snake Convolution Implementation

        Based on:

            TODO

        Args:
            in_channels: number of input channels. Defaults to 1.
            out_channels: number of output channels. Defaults to 1.
            kernel_size: the size of kernel. Defaults to 9.
            extend_scope: the range to expand. Defaults to 1 for this method.
            morph: the morphology of the convolution kernel, either along the x-axis (0) or along the y-axis (1) (see the paper for details). Defaults to 0.
            if_offset: whether deformation is required; if False, it is the standard convolution kernel. Defaults to True. Note that this listing stores the flag but forward() never reads it.
            device: device on which the module and the coordinate maps live. Defaults to 'cuda'.

        """

        super().__init__()

        if morph not in (0, 1):
            raise ValueError("morph should be 0 or 1.")

        self.kernel_size = kernel_size
        self.extend_scope = extend_scope
        self.morph = morph
        self.if_offset = if_offset
        self.device = torch.device(device)

        # self.bn = nn.BatchNorm2d(2 * kernel_size)
        self.gn_offset = nn.GroupNorm(kernel_size, 2 * kernel_size)
        # GroupNorm needs at least one group; out_channels // 4 is 0 when out_channels < 4
        self.gn = nn.GroupNorm(max(out_channels // 4, 1), out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.tanh = nn.Tanh()

        self.offset_conv = nn.Conv2d(in_channels, 2 * kernel_size, 3, padding=1)

        self.dsc_conv_x = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=(kernel_size, 1),
            stride=(kernel_size, 1),
            padding=0,
        )
        self.dsc_conv_y = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=(1, kernel_size),
            stride=(1, kernel_size),
            padding=0,
        )

        # Move the parameters only after every submodule is registered;
        # calling self.to() before the layers exist has no effect on them.
        self.to(self.device)

    def forward(self, input: torch.Tensor):
        # Predict offset map between [-1, 1]
        offset = self.offset_conv(input)
        # offset = self.bn(offset)
        offset = self.gn_offset(offset)
        offset = self.tanh(offset)

        # Run deformative conv
        y_coordinate_map, x_coordinate_map = get_coordinate_map_2D(
            offset=offset,
            morph=self.morph,
            extend_scope=self.extend_scope,
            # Build the coordinate maps on the same device as the data
            device=offset.device,
        )
        deformed_feature = get_interpolated_feature(
            input,
            y_coordinate_map,
            x_coordinate_map,
        )

        if self.morph == 0:
            output = self.dsc_conv_x(deformed_feature)
        elif self.morph == 1:
            output = self.dsc_conv_y(deformed_feature)

        # Groupnorm & ReLU
        output = self.gn(output)
        output = self.relu(output)

        return output


def get_coordinate_map_2D(
    offset: torch.Tensor,
    morph: int,
    extend_scope: float = 1.0,
    device: str | torch.device = "cuda",
):
    """Computing 2D coordinate map of DSCNet based on: TODO

    Args:
        offset: offset predicted by the network, laid out as [B, 2*K, H, W] (the code below calls dims 2 and 3 `width` and `height`). Here K refers to kernel size.
        morph: the morphology of the convolution kernel is mainly divided into two types along the x-axis (0) and the y-axis (1) (see the paper for details).
        extend_scope: the range to expand. Defaults to 1 for this method.
        device: location of data. Defaults to 'cuda'.

    Return:
        y_coordinate_map: coordinate map along y-axis with shape [B, K_H * H, K_W * W]
        x_coordinate_map: coordinate map along x-axis with shape [B, K_H * H, K_W * W]
    """

    if morph not in (0, 1):
        raise ValueError("morph should be 0 or 1.")

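    # The offset tensor is laid out as [B, 2*K, H, W]. The names below follow the
    # original code: `width` is the size of dim 2 (rows, y-axis) and `height` is
    # the size of dim 3 (columns, x-axis).
    batch_size, _, width, height = offset.shape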
    kernel_size = offset.shape[1] // 2
    center = kernel_size // 2
    device = torch.device(device)

    y_offset_, x_offset_ = torch.split(offset, kernel_size, dim=1)

    y_center_ = torch.arange(0, width, dtype=torch.float32, device=device)
    y_center_ = einops.repeat(y_center_, "w -> k w h", k=kernel_size, h=height)

    x_center_ = torch.arange(0, height, dtype=torch.float32, device=device)
    x_center_ = einops.repeat(x_center_, "h -> k w h", k=kernel_size, w=width)

    if morph == 0:
        """
        Initialize the kernel and flatten the kernel
            y: only need 0
            x: -num_points//2 ~ num_points//2 (Determined by the kernel size)
        """
        y_spread_ = torch.zeros([kernel_size], device=device)
        x_spread_ = torch.linspace(-center, center, kernel_size, device=device)

        y_grid_ = einops.repeat(y_spread_, "k -> k w h", w=width, h=height)
        x_grid_ = einops.repeat(x_spread_, "k -> k w h", w=width, h=height)

        y_new_ = y_center_ + y_grid_
        x_new_ = x_center_ + x_grid_

        y_new_ = einops.repeat(y_new_, "k w h -> b k w h", b=batch_size)
        x_new_ = einops.repeat(x_new_, "k w h -> b k w h", b=batch_size)

        y_offset_ = einops.rearrange(y_offset_, "b k w h -> k b w h")
        y_offset_new_ = y_offset_.detach().clone()

        # The center position remains unchanged and the rest of the positions begin to swing
        # This part is quite simple. The main idea is that "offset is an iterative process"

        y_offset_new_[center] = 0

        for index in range(1, center + 1):
            y_offset_new_[center + index] = (
                y_offset_new_[center + index - 1] + y_offset_[center + index]
            )
            y_offset_new_[center - index] = (
                y_offset_new_[center - index + 1] + y_offset_[center - index]
            )

        y_offset_new_ = einops.rearrange(y_offset_new_, "k b w h -> b k w h")

        y_new_ = y_new_.add(y_offset_new_.mul(extend_scope))

        y_coordinate_map = einops.rearrange(y_new_, "b k w h -> b (w k) h")
        x_coordinate_map = einops.rearrange(x_new_, "b k w h -> b (w k) h")

    elif morph == 1:
        """
        Initialize the kernel and flatten the kernel
            y: -num_points//2 ~ num_points//2 (Determined by the kernel size)
            x: only need 0
        """
        y_spread_ = torch.linspace(-center, center, kernel_size, device=device)
        x_spread_ = torch.zeros([kernel_size], device=device)

        y_grid_ = einops.repeat(y_spread_, "k -> k w h", w=width, h=height)
        x_grid_ = einops.repeat(x_spread_, "k -> k w h", w=width, h=height)

        y_new_ = y_center_ + y_grid_
        x_new_ = x_center_ + x_grid_

        y_new_ = einops.repeat(y_new_, "k w h -> b k w h", b=batch_size)
        x_new_ = einops.repeat(x_new_, "k w h -> b k w h", b=batch_size)

        x_offset_ = einops.rearrange(x_offset_, "b k w h -> k b w h")
        x_offset_new_ = x_offset_.detach().clone()

        # The center position remains unchanged and the rest of the positions begin to swing
        # This part is quite simple. The main idea is that "offset is an iterative process"

        x_offset_new_[center] = 0

        for index in range(1, center + 1):
            x_offset_new_[center + index] = (
                x_offset_new_[center + index - 1] + x_offset_[center + index]
            )
            x_offset_new_[center - index] = (
                x_offset_new_[center - index + 1] + x_offset_[center - index]
            )

        x_offset_new_ = einops.rearrange(x_offset_new_, "k b w h -> b k w h")

        x_new_ = x_new_.add(x_offset_new_.mul(extend_scope))

        y_coordinate_map = einops.rearrange(y_new_, "b k w h -> b w (h k)")
        x_coordinate_map = einops.rearrange(x_new_, "b k w h -> b w (h k)")

    return y_coordinate_map, x_coordinate_map


def get_interpolated_feature(
    input_feature: torch.Tensor,
    y_coordinate_map: torch.Tensor,
    x_coordinate_map: torch.Tensor,
    interpolate_mode: str = "bilinear",
):
    """From coordinate map interpolate feature of DSCNet based on: TODO

    Args:
        input_feature: feature that to be interpolated with shape [B, C, H, W]
        y_coordinate_map: coordinate map along y-axis with shape [B, K_H * H, K_W * W]
        x_coordinate_map: coordinate map along x-axis with shape [B, K_H * H, K_W * W]
        interpolate_mode: the arg 'mode' of nn.functional.grid_sample, can be 'bilinear' or 'bicubic' . Defaults to 'bilinear'.

    Return:
        interpolated_feature: interpolated feature with shape [B, C, K_H * H, K_W * W]
    """

    if interpolate_mode not in ("bilinear", "bicubic"):
        raise ValueError("interpolate_mode should be 'bilinear' or 'bicubic'.")

    y_max = input_feature.shape[-2] - 1
    x_max = input_feature.shape[-1] - 1

    y_coordinate_map_ = _coordinate_map_scaling(y_coordinate_map, origin=[0, y_max])
    x_coordinate_map_ = _coordinate_map_scaling(x_coordinate_map, origin=[0, x_max])

    y_coordinate_map_ = torch.unsqueeze(y_coordinate_map_, dim=-1)
    x_coordinate_map_ = torch.unsqueeze(x_coordinate_map_, dim=-1)

    # Note here grid with shape [B, H, W, 2]
    # Where [:, :, :, 2] refers to [x ,y]
    grid = torch.cat([x_coordinate_map_, y_coordinate_map_], dim=-1)

    interpolated_feature = nn.functional.grid_sample(
        input=input_feature,
        grid=grid,
        mode=interpolate_mode,
        padding_mode="zeros",
        align_corners=True,
    )

    return interpolated_feature


def _coordinate_map_scaling(
    coordinate_map: torch.Tensor,
    origin: list,
    target: tuple = (-1, 1),
):
    """Map the value of coordinate_map from origin=[min, max] to target=[a,b] for DSCNet based on: TODO

    Args:
        coordinate_map: the coordinate map to be scaled
        origin: original value range of coordinate map, e.g. [coordinate_map.min(), coordinate_map.max()]
        target: target value range of coordinate map. Defaults to (-1, 1)

    Return:
        coordinate_map_scaled: the coordinate map after scaling
    """
    # Avoid shadowing the built-in min() and max()
    origin_min, origin_max = origin
    a, b = target

    # Clamp first so that out-of-range coordinates land on the border
    coordinate_map_scaled = torch.clamp(coordinate_map, origin_min, origin_max)

    scale_factor = (b - a) / (origin_max - origin_min)
    coordinate_map_scaled = a + scale_factor * (coordinate_map_scaled - origin_min)

    return coordinate_map_scaled
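
The last step of the listing, `_coordinate_map_scaling`, converts pixel coordinates to the [-1, 1] range that `nn.functional.grid_sample` expects. With `align_corners=True`, -1 and 1 refer to the centers of the first and last pixels, so pixel index 0 maps to -1 and index `size - 1` maps to 1. The scalar version below is a rough sketch of the same arithmetic without PyTorch; the name `scale_coordinate` is hypothetical.

```python
def scale_coordinate(value, origin_min, origin_max, a=-1.0, b=1.0):
    """Clamp value to [origin_min, origin_max], then map it linearly onto [a, b]."""
    clamped = min(max(value, origin_min), origin_max)
    return a + (b - a) / (origin_max - origin_min) * (clamped - origin_min)


# A feature map that is 9 pixels wide has x indices 0..8
for x in (-2.0, 0.0, 2.0, 4.0, 8.0, 11.0):
    print(x, "->", scale_coordinate(x, 0, 8))
```

Coordinates that the accumulated offsets push outside the feature map are clamped to the border before scaling, so they sample the edge pixel rather than the zero padding.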
