ECCV 2022 Paper List (English and Chinese Titles)

This article lists ECCV 2022 papers with each title given in English and in Chinese. Corrections for any errors or omissions are welcome.

Paper ID Paper Title Chinese Title (论文标题)
8 Learning Uncoupled-Modulation CVAE for 3D Action-Conditioned Human Motion Synthesis 学习用于 3D 动作条件人体运动合成的非耦合调制 CVAE
16 Generative Domain Adaptation for Face Anti-Spoofing 人脸反欺骗的生成域自适应
19 Learning Depth from Focus in the Wild 在自然场景中从对焦学习深度
34 Relighting4D: Neural Relightable Human from Videos Relighting4D：来自视频的神经可重新照明人类
46 PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation PPT：token-Pruned Pose Transformer 用于单目和多视图人体姿态估计
52 Understanding the Dynamics of DNNs Using Graph Modularity 使用图模块化理解 DNN 的动态
59 Contrastive Deep Supervision 对比深度监督
65 Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective 可辨别性-可转移性权衡：信息论视角
69 Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World 基于学习的点云配准，用于现实世界中的 6D 对象姿态估计
74 AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing AvatarPoser：来自稀疏运动传感的铰接式全身姿势跟踪
75 Knowledge Condensation Distillation 知识浓缩蒸馏
83 CAR: Class-aware Regularizations for Semantic Segmentation CAR：语义分割的类感知正则化
86 Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation 用于域泛化语义分割的风格幻觉双重一致性学习
88 Reducing Information Loss for Spiking Neural Networks 减少脉冲神经网络的信息丢失
95 Real-Time Intermediate Flow Estimation for Video Frame Interpolation 视频帧插值的实时中间流估计
101 Class-incremental Novel Class Discovery 类增量新类别发现
103 PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation PixelFolder：用于图像生成的高效渐进式像素合成网络
107 Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion 最小神经图谱：使用最小图表和失真参数化复杂表面
116 Towards Grand Unification of Object Tracking 迈向目标跟踪的大统一
121 Contrastive Prototypical Network with Wasserstein Confidence Penalty 具有 Wasserstein 置信度惩罚的对比原型网络
123 Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain 具有可学习的频域隐私预算的隐私保护人脸识别
125 SeqFormer: Sequential Transformer for Video Instance Segmentation SeqFormer：用于视频实例分割的顺序转换器
127 An End-to-End Transformer Model for Crowd Localization 用于人群定位的端到端 Transformer 模型
132 Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection 用于动态多模态 3D 对象检测的可变形特征聚合
140 Masked Generative Distillation 掩蔽生成蒸馏
145 Saliency Hierarchy Modeling via Generative Kernels for Salient Object Detection 通过生成核进行显著性层次建模以实现显著性目标检测
154 Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification Tip-Adapter：用于 Few-shot 分类的 CLIP 的免训练自适应
160 Temporal Lift Pooling for Continuous Sign Language Recognition 用于连续手语识别的时间提升池化
162 Estimating Spatially-Varying Lighting in Urban Scenes with Disentangled Representation 用解耦表示估计城市场景中的空间变化照明
167 MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes MORE：用于 3D 场景中密集字幕的多阶关系挖掘
168 In Defense of Online Models for Video Instance Segmentation 为视频实例分割的在线模型辩护
171 JPEG Artifacts Removal via Contrastive Representation Learning 通过对比表示学习去除 JPEG 伪影
180 Tackling Long-Tailed Category Distribution Under Domain Shifts 解决领域转移下的长尾类别分布
184 WeLSA: Learning To Predict 6D Pose From Weakly Labeled Data Using Shape Alignment WeLSA：学习使用形状对齐从弱标记数据中预测 6D 姿势
185 HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling HuMMan：用于多功能传感和建模的多模态 4D 人体数据集
190 Fine-grained Data Distribution Alignment for Post-Training Quantization 训练后量化的细粒度数据分布对齐
192 Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network 基于记忆先验对比网络的 Few-shot 单视图 3D 重建
193 Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph Graph R-CNN：使用语义装饰的局部图实现准确的 3D 对象检测
194 ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing ExtrudeNet：用于形状解析的无监督逆向草图和拉伸
196 P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation P-STMO：用于 3D 人体姿势估计的预训练空间时间多对一模型
205 Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast Contrast-Phys：基于时空对比的无监督视频远程生理测量
213 PointScatter: Point Set Representation for Tubular Structure Extraction PointScatter：用于管状结构提取的点集表示
222 Panoptic Scene Graph Generation 全景场景图生成
229 D&D: Learning Human Dynamics from Dynamic Camera D&D：从动态相机学习人体动力学
247 StyleSwap: Style-Based Generator Empowers Robust Face Swapping StyleSwap：基于样式的生成器支持强大的人脸交换
248 Boosting Event Stream Super-Resolution with A Recurrent Neural Network 使用循环神经网络提升事件流超分辨率
249 Unknown-Oriented Learning for Open Set Domain Adaptation 面向未知的开放集域自适应学习
255 Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning 使用对比解缠结学习的非配对深度图像去雾
263 Check and Link: Pairwise Lesion Correspondence Guides Mammogram Mass Detection Check and Link：成对病灶对应引导乳腺 X 线影像肿块检测
265 Generative Subgraph Contrast for Self-Supervised Graph Representation Learning 自监督图表示学习的生成子图对比
267 DVS-Voltmeter: Stochastic Process-based Event Simulator for Dynamic Vision Sensors DVS-Voltmeter：用于动态视觉传感器的基于随机过程的事件模拟器
268 Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation 用于类增量无监督域自适应的原型引导持续自适应
283 SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding SiRi：基于 Transformer 的视觉定位的简单选择性再训练机制
287 Benchmarking Omni-Vision Representation through the Lens of Visual Realms 通过视觉领域的镜头对 Omni-Vision 表示进行基准测试
291 Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing Paint2Pix：基于交互式绘画的渐进式图像合成和编辑
296 BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis BEAT：用于会话手势合成的大规模语义和情感多模态数据集
300 Active Pointly-Supervised Instance Segmentation 主动点监督实例分割
303 DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation DecoupleNet：用于域自适应语义分割的解耦网络
315 ByteTrack: Multi-Object Tracking by Associating Every Detection Box ByteTrack：通过关联每个检测框进行多对象跟踪
317 Robust Multi-Object Tracking by Marginal Inference 边际推理的鲁棒多目标跟踪
322 Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation 双融合 ViT：将来自 Vision Transformer 的信息与本地表示双重融合
326 CATRE: Iterative Point Clouds Alignment for Category-level Object Pose Refinement CATRE：用于类别级对象姿势细化的迭代点云对齐
334 Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition 用于动作识别的时空自注意力建模与时间补丁移位
339 Efficient Long-Range Attention Network for Image Super-resolution 用于图像超分辨率的高效远程注意力网络
343 DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection DID-M3D：单目 3D 对象检测的解耦实例深度
349 FlowFormer: A Transformer Architecture for Optical Flow FlowFormer：用于光流的 Transformer 架构
357 Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction 用于高光谱图像重建的粗到细稀疏变换器
358 An Embedded Feature Whitening Approach to Deep Neural Network Optimization 一种用于深度神经网络优化的嵌入式特征白化方法
361 Optimization over Disentangled Encoding: Unsupervised Cross-Domain Point Cloud Completion via Occlusion Factor Manipulation 解纠缠编码的优化：通过遮挡因子操控实现无监督跨域点云补全
362 Source-Free Domain Adaptation with Contrastive Domain Alignment and Self-supervised Exploration for Face Anti-Spoofing 具有对比域对齐和自监督探索的人脸反欺骗的无源域自适应
368 MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection MPPNet：用于 3D 时间对象检测的多帧特征与代理点交织
379 SdAE: Self-distillated Masked Autoencoder SdAE：自蒸馏掩码自动编码器
383 A Transformer-based Decoder for Semantic Segmentation with Multi-level Context Mining 基于 Transformer 的多级上下文挖掘语义分割解码器
399 Graph-constrained Contrastive Regularization for Semi-weakly Volumetric Segmentation 半弱体积分割的图约束对比正则化
401 Improving Vision Transformers by Revisiting High-frequency Components 通过重新审视高频分量来改进视觉 Transformer
405 Adaptive Co-Teaching for Unsupervised Monocular Depth Estimation 无监督单目深度估计的自适应协同教学
408 FurryGAN: High quality foreground-aware image synthesis FurryGAN：高质量的前景感知图像合成
413 On Mitigating Hard Clusters for Face Clustering 关于减轻人脸聚类的硬聚类
415 Recurrent Bilinear Optimization for Binary Neural Networks 二值神经网络的循环双线性优化
433 An Efficient Spatio-Temporal Pyramid Transformer for Action Detection 一种用于动作检测的高效时空金字塔变换器
434 LocVTP: Video-Text Pre-training for Temporal Localization LocVTP：时间定位的视频文本预训练
444 Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects 融合局部相似性用于基于检索的未见对象 3D 方向估计
458 Online Segmentation of LiDAR Sequences: Dataset and Algorithm LiDAR 序列的在线分割：数据集和算法
460 MVSTER: Epipolar Transformer for Efficient Multi-View Stereo MVSTER：用于高效多视图立体的对极 Transformer
463 Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction 具有相互重构的 3D 语义关键点的无监督学习
482 Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration 通过随机幅度混合和特定领域图像恢复的可泛化医学图像分割
499 Demystifying Unsupervised Semantic Correspondence Estimation 揭开无监督语义对应估计的神秘面纱
513 Learning Shadow Correspondence for Video Shadow Detection 学习视频阴影检测的阴影对应
514 PolarMOT: How far can geometric relations take us in 3D multi-object tracking? PolarMOT：几何关系在 3D 多目标跟踪中能带我们走多远？
516 Few-Shot End-to-End Object Detection via Constantly Concentrated Encoding across Heads 通过跨头的持续集中编码进行少样本端到端对象检测
525 MVDECOR: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation MVDECOR：用于细粒度 3D 分割的多视图密集对应学习
537 Implicit Neural Representations for Image Compression 图像压缩的隐式神经表示
541 Cross-modal Prototype Driven Network for Radiology Report Generation 用于放射学报告生成的跨模式原型驱动网络
556 Scene Text Recognition with Permuted Autoregressive Sequence Models 具有置换自回归序列模型的场景文本识别
561 Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories 重新审视粒子视频：使用点轨迹通过遮挡进行跟踪
568 XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model XMem：使用 Atkinson-Shiffrin 记忆模型进行长期视频对象分割
570 SUPR: A Sparse Unified Part-Based Human Body Model SUPR：一种基于稀疏统一部位的人体模型
575 SCAM! Transferring humans between images with Semantic Cross Attention Modulation SCAM!：使用语义交叉注意调制在图像之间转移人物
583 Q-FW: A Hybrid Classical-Quantum Frank-Wolfe for Quadratic Binary Optimization Q-FW：用于二次二进制优化的混合经典量子 Frank-Wolfe
584 Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach 重新审视点云简化：一种可学习的特征保留方法
599 Neural Architecture Search for Spiking Neural Networks 脉冲神经网络的神经架构搜索
601 Neuromorphic Data Augmentation for Training Spiking Neural Networks 用于训练脉冲神经网络的神经形态数据增强
602 RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild RelPose：预测野外单个物体的概率相对旋转
609 Human Trajectory Prediction via Neural Social Physics 基于神经社会物理学的人体轨迹预测
615 Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation 多人 3D 人体姿态估计的显式遮挡推理
617 Open-Set Semi-Supervised Object Detection 开放集半监督目标检测
626 R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis R2L：将神经辐射场蒸馏到神经光场以实现高效的新视图合成
629 Towards Open Set Video Anomaly Detection 迈向开放集视频异常检测
631 Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation 语义感知隐式神经音频驱动的视频人像生成
634 Object-Compositional Neural Implicit Surfaces 对象组合神经隐式曲面
636 Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields Sem2NeRF：将单视图语义掩码转换为神经辐射场
640 Long-tail Detection with Effective Class-Margins 具有有效类别边距的长尾检测
641 WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation WaveGAN：用于高保真少样本图像生成的频率感知 GAN
642 Class-Agnostic Object Counting Robust to Intraclass Diversity 与类无关的对象计数对类内多样性具有鲁棒性
650 TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts TM2T：用于相互生成 3D 人体运动和文本的随机和标记化建模
652 Self-Distillation for Robust LiDAR Semantic Segmentation in Autonomous Driving 自动驾驶中鲁棒 LiDAR 语义分割的自蒸馏
654 Semi-Supervised Monocular 3D Object Detection by Multi-View Consistency 基于多视图一致性的半监督单目 3D 目标检测
655 Lidar Point Cloud Guided Monocular 3D Object Detection 激光雷达点云引导的单目 3D 目标检测
656 Structural Causal 3D Reconstruction 结构因果 3D 重建
669 SeqTR: A Simple yet Universal Network for Visual Grounding SeqTR：一个简单而通用的视觉定位网络
671 KD-MVS: Knowledge Distillation Based Self-supervised Learning for Multi-view Stereo KD-MVS：基于知识蒸馏的多视图立体自监督学习
685 When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition 当计数遇到 HMER：手写数学表达式识别的计数感知网络
689 Shape Matters: Deformable Patch Attack 形状很重要：可变形补丁攻击
690 PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection PTSEFormer：面向视频对象检测的渐进式时空增强 Transformer
694 BEVFormer: Learning Bird-Eye-View Representations from Multi-View Images via Spatiotemporal Transformer BEVFormer：通过时空变换器从多视图图像中学习鸟瞰图表示
696 Detecting Tampered Scene Text in the Wild 在野外检测被篡改的场景文本
702 Projective Parallel Single-pixel Imaging to Overcome Global Illumination in 3D Structure Light Scanning 投影并行单像素成像克服 3D 结构光扫描中的全局照明
709 CelebV-HQ: A Large-Scale Video Facial Attributes Dataset CelebV-HQ：大规模视频面部属性数据集
710 Open-world Semantic Segmentation for LIDAR Point Clouds 激光雷达点云的开放世界语义分割
721 Burn After Reading: Online Adaptation for Cross-domain Streaming Data 阅后即焚：跨域流数据的在线适配
728 CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS CLOSE：基于共享程度的课程学习，以实现更好的单次 NAS
734 RigNet: Repetitive Image Guided Network for Depth Completion RigNet：用于深度补全的重复图像引导网络
735 ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound ECLIPSE：使用视觉和声音的高效远程视频检索
744 Streamable Neural Fields 可流式神经场
755 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds 2DPASS：激光雷达点云上的二维先验辅助语义分割
762 Where to Focus: Investigating Hierarchical Attention Relationship for Fine-Grained Visual Classification 关注点：研究细粒度视觉分类的层次注意关系
776 Mind the Gap in Distilling StyleGANs 注意提炼 StyleGAN 的差距
784 End-to-End Active Speaker Detection 端到端主动说话人检测
785 Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing 用于弱监督视听视频解析的联合模态标签去噪
790 Learn-to-Decompose: Cascaded Decomposition Network for Cross-Domain Few-Shot Facial Expression Recognition Learn-to-Decompose：用于跨域 Few-Shot 面部表情识别的级联分解网络
798 Learning with Recoverable Forgetting 用可恢复的遗忘学习
800 Masked Autoencoders for Point Cloud Self-supervised Learning 用于点云自监督学习的掩码自动编码器
803 RamGAN: Region Attentive Morphing GAN for Region-Level Makeup Transfer RamGAN：用于区域级化妆转移的区域注意变形 GAN
807 Efficient One Pass Self-distillation with Zipf’s Label Smoothing 使用 Zipf 的标签平滑实现高效的单程自蒸馏
812 DaViT: Dual Attention Vision Transformers DaViT：双注意力视觉 Transformer
815 OneFace: One Threshold for All OneFace：所有人的一个门槛
820 Semantic-Sparse Colorization Network for Deep Exemplar-based Colorization 用于基于深度示例的着色的语义稀疏着色网络
822 Vibration-based Uncertainty Estimation for Learning from Limited Supervision 从有限监督中学习的基于振动的不确定性估计
824 SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition SOS!：自我中心动作识别中手持对象集合的自监督学习
829 FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic Upsampling FADE：融合解码器和编码器的资产以进行与任务无关的上采样
833 VTC: Improving Video-Text Retrieval with User Comments VTC：使用用户评论改进视频文本检索
839 Less than Few: Self-Shot Video Instance Segmentation 比少样本更少：自样本（Self-Shot）视频实例分割
841 End-to-End Visual Editing with a Generatively Pre-Trained Artist 通过生成式预训练的艺术家进行端到端的视觉编辑
843 KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients KING：通过运动学梯度为鲁棒模仿生成安全关键驾驶场景
852 COUCH: Towards Controllable Human-chair Interactions COUCH：迈向可控的人椅交互
859 MovieCuts: A New Dataset and Benchmark for Cut Type Recognition MovieCuts：剪辑类型识别的新数据集和基准
877 High-fidelity GAN Inversion with Padding Space 具有填充空间的高保真 GAN 反演
893 LiDAL: Inter-frame Uncertainty Based Active Learning for 3D LiDAR Semantic Segmentation LiDAL：基于帧间不确定性的 3D LiDAR 语义分割主动学习
897 Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning 最优框：通过强化学习调整带注释的边界框来提升端到端场景文本识别
910 Extract Free Dense Labels from CLIP 从 CLIP 中免费提取密集标签
912 Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation 无监督无源域适应的并发附属监督
913 Designing One Unified Framework for High-Fidelity Face Reenactment and Swapping 设计一个统一的高保真人脸再现和交换框架
919 Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks 使用自监督深度先验变形网络的类别级 6D 对象姿势和大小估计
927 Intrinsic Neural Fields: Learning Functions on Manifolds 内在神经场：流形上的学习函数
930 LaMAR: Benchmarking Localization and Mapping for Augmented Reality LaMAR：增强现实的基准定位和映射
933 3D Compositional Zero-shot Learning with DeCompositional Consensus 具有分解共识的 3D 组合零样本学习
939 Video Mask Transfiner for High-Quality Video Instance Segmentation 用于高质量视频实例分割的视频掩码转换器
940 FashionViL: Fashion-Focused Vision-and-Language Representation Learning FashionViL：以时尚为中心的视觉和语言表征学习
945 Adaptive Face Forgery Detection in Cross Domain 跨域自适应人脸伪造检测
958 LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space LiP-Flow：通过潜在空间中的归一化流学习编解码器头像的推理时先验
961 Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection 密集教师：用于半监督目标检测的密集伪标签
968 Metric Learning based Interactive Modulation for Real-World Super-Resolution 基于度量学习的真实世界超分辨率交互调制
971 Optimal Transport for Label-Efficient Visible-Infrared Person Re-Identification 用于标签高效的可见光-红外行人重识别的最优传输
974 Frequency Domain Model Augmentation for Adversarial Attack 对抗性攻击的频域模型增强
977 Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning 基于全局分割掩码学习的无提议时间动作检测
979 Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives 具有近似图像导数的隐式神经表示的 Sobolev 训练
982 Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression 无监督夜间图像增强：当层分解遇到光效抑制时
986 Point-to-Box Network for Accurate Object Detection via Single Point Supervision 通过单点监督实现精确目标检测的点对盒网络
989 Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks 超低精度超分辨率网络的动态双可训练边界
993 Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors Make-A-Scene：具有人类先验的基于场景的文本到图像生成
999 Locality Guidance for Improving Vision Transformers on Tiny Datasets 在微小数据集上改进视觉转换器的局部性指导
1002 Weakly Supervised Object Localization through Inter-class Feature Similarity and Intra-class Appearance Consistency 通过类间特征相似性和类内外观一致性的弱监督目标定位
1003 Semi-Supervised Temporal Action Detection with Proposal-Free Masking 具有无提议掩蔽的半监督时间动作检测
1005 Neighborhood Collective Estimation for Noisy Label Identification and Correction 用于噪声标签识别和校正的邻域集体估计
1010 Zero-Shot Temporal Action Detection via Vision-Language Prompting 通过视觉语言提示进行零次时间动作检测
1011 Weakly Supervised Grounding for VQA in Vision-Language Transformers 视觉语言 Transformer 中 VQA 的弱监督定位
1016 Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval 用于无监督视频检索的双流知识保留哈希
1018 Discover and Mitigate Unknown Biases with Debiasing Alternate Networks 使用去偏备用网络发现和减轻未知偏差
1020 Hierarchical Memory Learning for Fine-Grained Scene Graph Generation 用于细粒度场景图生成的分层记忆学习
1026 Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes 通过与 Shift 无关的权重正则化和最近源原型改进测试时间适应
1028 Automatic dense annotation of large-vocabulary sign language videos 大词汇量手语视频的自动密集注释
1029 Few-shot Class-incremental Learning via Entropy-regularized Data-free Replay 通过熵正则化无数据重放的少样本类增量学习
1035 Learning Instance-Specific Adaptation for Cross-Domain Segmentation 跨域分割的学习实例特定适应
1039 SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas SALVe：从稀疏全景图重建平面图的语义对齐验证
1044 Active Learning Strategies for Weakly-Supervised Object Detection 弱监督目标检测的主动学习策略
1049 3D Human Pose Estimation Using Möbius Graph Convolutional Networks 使用莫比乌斯图卷积网络的 3D 人体姿势估计
1055 Real-time Online Video Detection with Temporal Smoothing Transformers 使用时间平滑 Transformer 进行实时在线视频检测
1060 3D-FM GAN: Towards 3D-Controllable Face Manipulation 3D-FM GAN：迈向 3D 可控人脸操作
1064 SinNeRF: Training Neural Radiance Field on Complex Scene from a Single Image SinNeRF：在单个图像的复杂场景上训练神经辐射场
1069 Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation 用于条件扩散生成的熵驱动采样和训练方案
1076 Identity-aware Hand Mesh Estimation and Personalization from RGB Images 来自 RGB 图像的身份感知手网格估计和个性化
1083 Practical and Scalable Desktop-based High-Quality Facial Capture 实用且可扩展的基于桌面的高质量面部捕捉
1084 TALLFormer: Temporal Action Localization with a Long-memory Transformer TALLFormer：使用长记忆转换器进行时间动作定位
1086 Unsupervised and Semi-supervised Bias Benchmarking in Face Recognition 人脸识别中的无监督和半监督偏差基准测试
1100 Domain Adaptive Hand Keypoint and Pixel Localization in the Wild 自然场景中的域自适应手部关键点与像素定位
1103 Skeleton-free Pose Transfer for Stylized 3D Characters 风格化 3D 角色的无骨架姿势转移
1105 Differentiable Raycasting for Self-supervised Occupancy Forecasting 用于自监督占用预测的可微光线投射
1109 InAction: Interpretable Action Decision Making for Autonomous Driving InAction：可解释的自动驾驶行动决策
1114 CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection CramNet：具有光线约束交叉注意的相机-雷达融合，用于稳健的 3D 对象检测
1118 CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video CycDA：从图像到视频学习的无监督循环域适应
1119 Latent Discriminant deterministic Uncertainty 潜在判别确定性不确定性
1129 Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation Auto-FedRL：多机构医学图像分割的联合超参数优化
1135 Image-based CLIP-Guided Essence Transfer 基于图像的 CLIP 引导的本质转移
1136 Prune Your Model Before Distill It 在蒸馏之前修剪你的模型
1155 S2N: Suppression-Strengthen Network for Event-based Recognition under Variant Illuminations S2N：变体照明下基于事件的识别的抑制强化网络
1159 MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval MILES：使用注入语言语义进行视频文本检索的视觉 BERT 预训练
1161 PASS: Part-Aware Self-Supervised Pre-Training for Person Re-Identification PASS：用于人员重新识别的部分感知自我监督预训练
1165 RegionCL: Exploring Contrastive Region Pairs for Self-supervised Representation Learning RegionCL：探索用于自监督表示学习的对比区域对
1174 Towards Data-Efficient Detection Transformers 迈向数据高效的检测 Transformer
1175 Label2Label: A Language Modeling Framework for Multi-Attribute Learning Label2Label：多属性学习的语言建模框架
1179 Anti-Retroactive Interference for Lifelong Learning 终身学习的反追溯干扰
1181 Emotion Recognition for Multiple Context Awareness 多语境感知的情绪识别
1182 Box-supervised Instance Segmentation with Level Set Evolution 具有水平集进化的框监督实例分割
1185 Tracking Objects as Pixel-wise Distributions 跟踪对象作为像素分布
1197 mc-BEiT: Multi-choice Discretization for Image BERT Pre-training mc-BEiT：图像 BERT 预训练的多项选择离散化
1198 Adaptive Cross-Domain Learning for Generalizable Person Re-Identification 用于可泛化人员重新识别的自适应跨域学习
1202 MetaGait: Learning to Learn an Omni Sample Adaptive Representation for Gait Recognition MetaGait：学习学习用于步态识别的全样本自适应表示
1203 Bootstrapped Masked Autoencoders for Vision BERT Pretraining 用于视觉 BERT 预训练的自举掩码自动编码器
1209 Masked Discrimination for Self-Supervised Learning on Point Clouds 点云自监督学习的掩蔽判别
1212 CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation CMD：具有跨模态相互蒸馏的自我监督 3D 动作表示学习
1214 GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval GEB+：通用事件边界字幕、定位和检索的基准
1225 FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling FAST-VQA：使用片段采样进行高效的端到端视频质量评估
1235 Learning to train a point cloud reconstruction network without matching 学习在没有匹配的情况下训练点云重建网络
1243 Long-Tailed Class Incremental Learning 长尾类增量学习
1247 CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving CODA：用于自动驾驶目标检测的真实道路极端案例（corner case）数据集
1248 Open-Vocabulary DETR with Conditional Matching 带条件匹配的开放词汇 DETR
1250 Towards Calibrated Hyper-sphere Representation via Distribution Overlap Coefficient for Long-tailed Learning 通过用于长尾学习的分布重叠系数实现校准的超球面表示
1253 CMT: Context-Matching-Guided Transformer for 3D Tracking in Point Clouds CMT：用于点云中 3D 跟踪的上下文匹配引导转换器
1257 Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving 运动启发自动驾驶中的无监督感知和预测
1259 Unitail: Detecting, Reading, and Matching in Retail Scene Unitail：零售场景中的检测、读取和匹配
1272 FBNet: Feedback Network for Point Cloud Completion FBNet：点云补全的反馈网络
1275 DODA: Data-oriented Sim-to-Real Domain Adaptation for 3D Semantic Segmentation DODA：面向数据的仿真到真实域自适应，用于 3D 语义分割
1276 Physically-Based Editing of Indoor Scene Lighting from a Single Image 从单个图像中基于物理的室内场景照明编辑
1277 Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining 通过观看 YouTube 视频学习驾驶：以动作为条件的对比策略预训练
1278 Multi-Curve Translator for High-Resolution Photorealistic Image Translation 用于高分辨率照片级图像转换的多曲线转换器
1280 Dynamic Metric Learning with Cross-Level Concept Distillation 跨层次概念蒸馏的动态度量学习
1287 Deep Bayesian Video Frame Interpolation 深度贝叶斯视频帧插值
1300 PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation PanoFormer：用于室内 360° 深度估计的全景 Transformer
1312 Cross Attention Based Style Distribution for Controllable Person Image Synthesis 可控人物图像合成的基于交叉注意的风格分布
1315 Generative Meta-Adversarial Network for Unseen Object Navigation 用于看不见的对象导航的生成元对抗网络
1316 Unsupervised Visual Representation Learning by Synchronous Momentum Grouping 同步动量分组的无监督视觉表示学习
1317 OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers OSFormer：使用 Transformers 进行单阶段伪装实例分割
1321 Highly Accurate Dichotomous Image Segmentation 高精度二分图像分割
1322 KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints KeypointNeRF：使用关键点的相对空间编码来泛化基于图像的体积化身
1326 MENet: a Memory-Based Network with Dual-Branch for Efficient Event Stream Processing MENet：具有双分支的基于内存的网络，用于高效的事件流处理
1330 Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals 正面或反面：走向语义一致的视觉反事实
1331 LEDNet: Joint Low-light Enhancement and Deblurring in the Dark LEDNet：在黑暗中联合弱光增强和去模糊
1336 RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering RC-MVSNet：具有神经渲染的无监督多视图立体
1342 StretchBEV: Stretching Future Instance Prediction Spatially and Temporally StretchBEV：在空间和时间上扩展未来实例预测
1344 AgeTransGAN for Facial Age Transformation with Rectified Performance Metrics AgeTransGAN 用于具有校正性能指标的面部年龄转换
1346 Boosting Supervised Dehazing Methods via Bi-level Patch Reweighting 通过双层补丁重加权提升监督去雾方法
1347 Detecting and Recovering Sequential DeepFake Manipulation 检测和恢复连续的 DeepFake 操作
1353 MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning MTFormer：通过 Transformer 和跨任务推理进行多任务学习
1356 Prediction-Guided Distillation for Dense Object Detection 用于密集对象检测的预测引导蒸馏
1358 Towards Generic 3D Tracking in RGBD Videos: Benchmark and Baseline 走向 RGBD 视频中的通用 3D 跟踪：基准和基线
1364 C3P: Cross-domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation C3P：用于弱监督 3D 人体姿态估计的跨域姿态先验传播
1366 Adaptive Fine-Grained Sketch-Based Image Retrieval 基于自适应细粒度草图的图像检索
1376 Learning Ego 3D Representation as Ray Tracing 学习 Ego 3D 表示作为光线追踪
1380 Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling 使用预条件扩散采样加速基于分数的生成模型
1382 RCLane: Relay Chain Prediction for Lane Detection RCLane：车道检测的中继链预测
1384 GLASS: Global to Local Attention for Scene-Text Spotting GLASS：场景文本识别的全局到局部注意力
1394 Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding 用于长期 4D 点云视频理解的 Point Primitive Transformer
1395 Towards Efficient Adversarial Training on Vision Transformers 迈向视觉 Transformer 的高效对抗训练
1396 Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation Drive&Segment：通过跨模态蒸馏对城市场景进行无监督语义分割
1397 Adaptive Agent Transformer for Few-shot Segmentation 用于少样本分割的自适应代理 Transformer
1398 Expanding Language-Image Pretrained Models for General Video Recognition 扩展通用视频识别的语言-图像预训练模型
1407 Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes Box2Mask：使用边界框的弱监督 3D 语义实例分割
1408 Improving Few-Shot Part Segmentation using Coarse Supervision 使用粗监督改进 Few-Shot 零件分割
1412 Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation 视频语义分割的跨帧亲和度挖掘关系
1413 Pose-NDF: Modelling Human Pose Manifolds with Neural Distance Fields Pose-NDF：使用神经距离场对人体姿势流形建模
1414 Out-of-distribution Detection with Boundary Aware Learning 边界感知学习的分布外检测
1415 NeILF: Neural Incident Light Field for Physically-based Material Estimation NeILF：用于基于物理的材料估计的神经入射光场
1417 ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers ViewFormer：使用 Transformers 从少量图像中进行无 NeRF 神经渲染
1421 L-Tracing: Fast Light Visibility Estimation on Neural Surfaces by Sphere Tracing L-Tracing：通过球体追踪对神经表面进行快速光能见度估计
1424 ARF: Artistic Radiance fields ARF：艺术辐射场
1425 Multiview Stereo with Cascaded Epipolar RAFT 具有级联对极 RAFT 的多视图立体
1439 What to Hide from Your Students: Attention-Guided Masked Image Modeling 对学生隐瞒什么：注意力引导的蒙版图像建模
1441 Static and Dynamic Concepts for Self-supervised Video Representation Learning 自监督视频表示学习的静态和动态概念
1447 Deep Partial Updating: Towards Communication Efficient Updating for On-device Inference 深度部分更新：面向设备上推理的通信高效更新
1448 Multimodal Object Detection via Probabilistic Ensembling 基于概率集成的多模态目标检测
1455 Gradient-based Uncertainty for Monocular Depth Estimation 单目深度估计的基于梯度的不确定性
1456 Flow-Guided Transformer for Video Inpainting 用于视频修复的流引导 Transformer
1468 Relationformer: A Unified Framework for Image-to-Graph Generation Relationformer：图像到图生成的统一框架
1469 ARAH: Animatable Volume Rendering of Articulated Human SDFs ARAH：关节式人体 SDF 的动画体积渲染
1471 Learning Hierarchy Aware Features for Reducing Mistake Severity 学习层次感知特征以降低错误严重性
1474 Exploiting Unlabeled Data with Vision and Language Models for Object Detection 使用视觉和语言模型利用未标记数据进行对象检测
1479 A Simple and Robust Correlation Filtering method for text-based person search 一种用于基于文本的人员搜索的简单且鲁棒的相关过滤方法
1482 Hunting Group Clues with Transformers for Social Group Activity Recognition 用 Transformer 寻找群体线索以识别社会群体活动
1493 Quantized GAN for Complex Music Generation from Dance Videos 从舞蹈视频中生成复杂音乐的量化 GAN
1506 Not Just Streaks: Towards Ground Truth for Single Image Deraining 不只是条纹：迈向单张图像去雨的真值
1511 HIVE: Evaluating the Human Interpretability of Visual Explanations HIVE：评估视觉解释的人类可解释性
1512 GAMa: Cross-view Video Geo-localization GAMa：跨视图视频地理定位
1516 Meta-Sampler: Almost-Universal yet Task-Oriented Sampling for Point Clouds 元采样器：点云的几乎通用但面向任务的采样
1517 Multi-Query Video Retrieval 多查询视频检索
1525 Waymo Open Dataset: Panoramic Video Panoptic Segmentation Waymo 开放数据集：全景视频全景分割
1531 MIME: Minority Inclusion for Majority Group Enhancement of AI Performance MIME：纳入少数群体以提升多数群体的 AI 性能
1534 Self-supervised Human Mesh Recovery with Cross-Representation Alignment 具有交叉表示对齐的自我监督人体网格恢复
1541 TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency TL;DW?：利用任务相关性和跨模态显著性总结教学视频
1542 A Perceptual Quality Metric for Video Frame Interpolation 视频帧插值的感知质量度量
1543 Adaptive Feature Interpolation for Low-Shot Image Generation 用于少样本图像生成的自适应特征插值
1544 Rethinking Learning Approaches for Long-Term Action Anticipation 重新思考长期行动预期的学习方法
1545 CenterFormer: Center-based Transformer for 3D Object Detection CenterFormer：用于 3D 对象检测的基于中心的 Transformer
1546 Object Manipulation via Visual Target Localization 通过视觉目标定位进行对象操作
1549 AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction AlignSDF：用于手对象重建的姿势对齐有符号距离场
1551 Shift-tolerant Perceptual Similarity Metric 移位容错感知相似度度量
1552 Revisiting a kNN-based Image Classification System with High-capacity Storage 重新审视具有大容量存储的基于 kNN 的图像分类系统
1557 Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing 充分利用文本语义来改进生物医学视觉语言处理
1561 Self-Supervised Sparse Representation for Video Anomaly Detection 用于视频异常检测的自监督稀疏表示
1567 CPO: Change Robust Panorama to Point Cloud Localization CPO：对场景变化鲁棒的全景图到点云定位
1569 MonoPLFlowNet: Permutohedral Lattice FlowNet for Real-Scale 3D Scene Flow Estimation with Monocular Images MonoPLFlowNet：用于单目图像的实尺度 3D 场景流估计的 Permutohedral Lattice FlowNet
1576 DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning DLCFT：用于一般增量学习的深度线性持续微调
1578 Contrastive Positive Mining for Unsupervised 3D Action Representation Learning 无监督 3D 动作表示学习的对比正向挖掘
1580 Patch Similarity Aware Data-Free Quantization for Vision Transformers 视觉转换器的补丁相似性感知无数据量化
1586 Perception-Distortion Balanced ADMM Optimization for Single-Image Super-Resolution 单图像超分辨率的感知失真平衡 ADMM 优化
1588 TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation TransFGU：一种自上而下的细粒度无监督语义分割方法
1596 DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition DualFormer：用于高效视频识别的局部-全局分层 Transformer
1606 Hierarchical Contrastive Inconsistency Learning for Deepfake Video Detection Deepfake 视频检测的分层对比不一致性学习
1616 Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal 水印疫苗：防止水印去除的对抗性攻击
1617 VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder VQFR：使用矢量量化字典和并行解码器的盲人脸恢复
1620 CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation CLIFF：将全帧位置信息携带到人体姿势和形状估计中
1625 ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO ECCV Caption：通过为 MS-COCO 收集机器和人工验证的图像标题关联来纠正假阴性
1626 Personalizing Federated Medical Image Segmentation via Local Calibration 通过局部校准个性化联合医学图像分割
1628 Learning to Detect Every Thing in an Open World 学习检测开放世界中的每一个物体
1637 Pointly-Supervised Panoptic Segmentation 点监督全景分割
1648 MVP: Multimodality-guided Visual Pre-training MVP：多模态引导的视觉预训练
1649 Uncertainty Learning in Kernel Estimation for Multi-Stage Blind Image Super-Resolution 多阶段盲图像超分辨率核估计中的不确定性学习
1666 Physical Attack on Monocular Depth Estimation in Autonomous Driving with Optimal Adversarial Patches 具有最优对抗补丁的自动驾驶中单目深度估计的物理攻击
1670 KVT: $k$-NN Attention for Boosting Vision Transformers KVT：用于提升视觉 Transformer 的 $k$-NN 注意力
1673 Locally Varying Distance Transform for Unsupervised Visual Anomaly Detection 用于无监督视觉异常检测的局部变化距离变换
1676 Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation 用于无监督视频对象分割的分层特征对齐网络
1677 PalGAN: Image Colorization with Palette Generative Adversarial Networks PalGAN：使用调色板生成对抗网络进行图像着色
1687 Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis Fast-Vid2Vid：视频到视频合成的时空压缩
1693 Generative Negative Text Replay for Continual Vision-Language Pretraining 用于持续视觉语言预训练的生成负文本重放
1697 Learning Spatio-Temporal Downsampling for Effective Video Upscaling 学习时空下采样以实现有效的视频放大
1698 Geometric Representation Learning for Document Image Rectification 文档图像校正的几何表示学习
1701 ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer ASpanFormer：使用自适应跨度 Transformer 进行无检测器图像匹配
1709 Egocentric Activity Recognition and Localization on a 3D Map 3D 地图上的以自我为中心的活动识别和定位
1710 Generative Adversarial Network for Future Hand Segmentation from Egocentric Video 从以自我为中心的视频中用于未来手部分割的生成对抗网络
1712 One-Shot Medical Landmark Localization by Edge-Guided Transform and Noisy Landmark Refinement 通过边缘引导变换和噪声地标细化的一次性医学地标定位
1721 Learning Prior Feature and Attention Enhanced Image Inpainting 学习先验特征和注意力增强图像修复
1729 Registration based Few-Shot Anomaly Detection 基于配准的少样本异常检测
1730 AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-shot Interactions AdaAfford：学习通过少量交互来调整 3D 铰接物体的操作可供性
1735 Video Graph Transformer for Video Question Answering 用于视频问答的视频图转换器
1737 A Reliable Online Method for Joint Estimation of Focal Length and Camera Rotation 一种可靠的焦距和相机旋转联合估计在线方法
1738 Learning Local Implicit Fourier Representation for Image Warping 学习用于图像变形的局部隐式傅里叶表示
1740 SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement SepLUT：用于实时图像增强的可分离图像自适应查找表
1742 A Level Set Theory for Neural Implicit Evolution under Explicit Flows 显式流下神经隐式进化的水平集理论
1744 Temporal-MPI: Enabling Multi-Plane Images for Dynamic Scene Modelling via Temporal Basis Learning 时间-MPI：通过时间基础学习为动态场景建模启用多平面图像
1746 Blind Image Decomposition 盲图像分解
1751 INT: Towards Infinite-frames 3D Detection with An Efficient Framework INT：使用高效框架实现无限帧 3D 检测
1756 MuLUT: Cooperating Multiple Look-Up Tables for Efficient Image Super-Resolution MuLUT：协作多个查找表以实现高效的图像超分辨率
1757 NDF: Neural Deformable Fields for Dynamic Human Modelling NDF：用于动态人体建模的神经可变形场
1759 MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects MPIB：基于 MPI 的散景渲染框架，用于逼真的部分遮挡效果
1761 Neural Density-Distance Fields 神经密度-距离场
1762 MoDA: Map style transfer for self-supervised Domain Adaptation of embodied agents MoDA：用于具身智能体自监督域适应的地图风格迁移
1766 L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training L3：用于高分辨率、高吞吐量 DNN 训练的加速器友好无损图像格式
1780 Prior-Guided Adversarial Initialization for Fast Adversarial Training 用于快速对抗训练的先验引导对抗初始化
1790 Housekeep: Tidying Virtual Households using Commonsense Reasoning 管家：使用常识推理整理虚拟家庭
1791 Improving Robustness by Enhancing Weak Subnets 通过增强弱子网提高鲁棒性
1792 TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes TO-Scene：用于理解 3D 桌面场景的大规模数据集
1804 Real-RawVSR: Real-World Raw Video Super-Resolution with a Benchmark Dataset Real-RawVSR：具有基准数据集的真实世界原始视频超分辨率
1807 ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning ST-P3：通过时空特征学习的端到端基于视觉的自动驾驶
1810 NeXT: Towards High Quality Neural Radiance Fields via Multi-Skip Transformer NeXT：通过 Multi-Skip Transformer 实现高质量神经辐射场
1814 Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution 学习压缩视频超分辨率的时空频率变换器
1817 PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark PersFormer：通过 Perspective Transformer 和 OpenLane 基准进行 3D 车道检测
1819 Adversarial Partial Domain Adaptation by Cycle Inconsistency 循环不一致的对抗性部分域适应
1824 BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks BayesCap：冻结神经网络中校准不确定性的贝叶斯身份上限
1831 Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects 用于感知和抓取镜面和透明对象的域随机化增强深度模拟和恢复
1832 PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo PS-NeRF：多视图光度立体的神经逆向渲染
1845 DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation DeciWatch：10 倍高效 2D 和 3D 姿势估计的简单基线
1846 Hierarchical Latent Structure for Multi-Modal Vehicle Trajectory Forecasting 多模式车辆轨迹预测的分层潜在结构
1848 SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos SmoothNet：用于优化视频中人体姿势的即插即用网络
1851 Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency 分享给你的邻居：跨实例一致性的单视图重构
1852 End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution 具有稀疏提议演化的端到端弱监督目标检测
1853 PAC-Net: Highlight Your Video via History Preference Modeling PAC-Net：通过历史偏好建模突出显示您的视频
1859 Efficient Point Cloud Analysis Using Hilbert Curve 使用希尔伯特曲线进行高效点云分析
1860 Learning Online Multi-Sensor Depth Fusion 在线学习多传感器深度融合
1866 Self-Support Few-Shot Semantic Segmentation 自支持小样本语义分割
1868 Few-Shot Object Detection with Model Calibration 带有模型校准的小样本目标检测
1870 S2-VER: Semi-Supervised Visual Emotion Recognition S2-VER：半监督视觉情绪识别
1882 Self-Supervision Can Be a Good Few-Shot Learner 自监督可以成为优秀的小样本学习器
1886 My View is the Best View: Procedure Learning from Egocentric Videos 我的观点是最好的观点：从以自我为中心的视频中学习的过程
1894 Trace Controlled Text to Image Generation 轨迹控制的文本到图像生成
1925 Towards Comprehensive Representation Enhancement in Semantics-guided Self-supervised Monocular Depth Estimation 面向语义引导的自监督单目深度估计中的综合表示增强
1929 Calibration-free Multi-view Crowd Counting 免校准多视图人群计数
1930 Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training 通过自训练进行单目 3D 目标检测的无监督域自适应
1940 Online Continual Learning with Contrastive Vision Transformer 使用对比视觉转换器进行在线持续学习
1946 COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts COO：用于识别任意或截断文本的漫画拟声词数据集
1947 BungeeNeRF: Progressive Neural Radiance Field for Extreme Multiscale Scene Rendering BungeeNeRF：用于极端多尺度场景渲染的渐进式神经辐射场
1951 AiATrack: Attention in Attention for Transformer Visual Tracking AiATrack：用于 Transformer 视觉跟踪的注意力中的注意力
1952 Learning Invariant Visual Representations for Compositional Zero-Shot Learning 学习组合零样本学习的不变视觉表示
1954 Image Coding for Machines with Omnipotent Feature Learning 具有全能特征学习的机器的图像编码
1958 Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting 语言很重要：用于场景文本检测和识别的弱监督视觉语言预训练方法
1959 MOTCOM: The Multi-Object Tracking Dataset Complexity Metric MOTCOM：多对象跟踪数据集复杂性度量
1980 How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? 视频自监督学习中的基准敏感度有多严重？
1982 Rethinking Robust Representation Learning Under Fine-grained Noisy Faces 重新思考细粒度噪声人脸下的鲁棒表示学习
1986 Feature Representation Learning for Unsupervised Cross-domain Image Retrieval 无监督跨域图像检索的特征表示学习
1987 Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation 用于小样本分割的 4D 卷积 Swin Transformer 代价聚合
1988 Spatial-Frequency Domain Information Integration for Pan-sharpening 用于全色锐化的空间频域信息集成
1991 TOCH: Spatio-Temporal Object-to-Hand Correspondence for Motion Refinement TOCH：用于运动细化的时空对象与手的对应
1999 HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation HRDA：上下文感知高分辨率域自适应语义分割
2012 Combating Label Distribution Shift for Active Domain Adaptation 对抗主动域适应的标签分布转移
2016 GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation GIPSO：3D LiDAR 分割中在线自适应的几何信息传播
2021 Adaptive Patch Exiting for Scalable Single Image Super-Resolution 可扩展单图像超分辨率的自适应补丁退出
2025 SuperLine3D: Self-supervised Line Segmentation and Description for LiDAR Point Cloud SuperLine3D：激光雷达点云的自监督线分割和描述
2031 Efficient Meta-Tuning for Content-aware Neural Video Delivery 内容感知神经视频交付的高效元调整
2033 PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation PoseTrans：用于人体姿势估计的简单而有效的姿势变换增强
2039 3D-Aware Semantic-Guided Generative Model for Human Synthesis 用于人体合成的 3D 感知语义引导生成模型
2041 Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality 通过正交改进 SVD 元层的协方差条件
2050 CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation CoSMix：3D LiDAR 分割中域自适应的组合语义混合
2054 Streaming Multiscale Deep Equilibrium Models 流式多尺度深度平衡模型
2057 AvatarCap: Animatable Avatar Conditioned Monocular Human Volumetric Capture AvatarCap：可动画化身条件单目人体体积捕捉
2061 Hierarchical Average Precision Training for Pertinent Image Retrieval 相关图像检索的分层平均精度训练
2087 Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition Fashionformer：人类时尚分割和识别的简单、有效和统一的基线
2088 Out-of-Distribution Detection with Semantic Mismatch under Masking 掩蔽下语义不匹配的分布外检测
2104 Target-absent Human Attention 目标缺失的人类注意力
2105 Reference-based Image Super-Resolution with Deformable Attention Transformer 具有可变形注意变换器的基于参考的图像超分辨率
2116 Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers 使用 Transformers 进行 3D 人体网格恢复的解开模态的交叉注意
2118 Learning to Generate Realistic LiDAR Point Cloud 学习生成逼真的 LiDAR 点云
2124 GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping GeoRefine：用于精确密集映射的自我监督在线深度细化
2134 Transform your Smartphone into a DSLR Camera: Learning the ISP in the Wild 将您的智能手机变成数码单反相机：在野外学习 ISP
2138 Uncertainty-Based Spatial-Temporal Attention for Online Action Detection 用于在线动作检测的基于不确定性的时空注意
2144 Video Question Answering with Iterative Video-Text Co-Tokenization 使用迭代视频-文本联合标记化的视频问答
2145 LaTeRF: Label and Text Driven Object Radiance Fields LaTeRF：标签和文本驱动的对象辐射场
2146 Temporally Consistent Semantic Video Editing 时间一致的语义视频编辑
2149 SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation 用于异常检测和分割的 Spot-the-Difference 自监督预训练
2151 Exploring Plain Vision Transformer Backbones for Object Detection 探索用于目标检测的普通视觉 Transformer 骨干网络
2152 Fine-grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications 细粒度以自我为中心的手对象分割：数据集、模型和应用
2153 Perceptual Artifacts Localization for Inpainting 用于修复的感知伪影定位
2154 Is It Necessary to Transfer Temporal Knowledge for Domain Adaptive Video Semantic Segmentation? 域自适应视频语义分割是否需要传递时间知识？
2162 GIMO: Gaze-Informed Human Motion Prediction in Context GIMO：上下文中基于注视的人体运动预测
2166 Error Compensation Framework for Flow-Guided Video Inpainting 流引导视频修复的误差补偿框架
2170 Decomposing The Tangent of Occluding Boundaries According to Curvatures and Torsions 根据曲率和扭转分解遮挡边界的切线
2171 CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution CPrune：编译器通知模型修剪以实现高效的目标感知 DNN 执行
2179 Adversarially-Aware Robust Object Detector 对抗性感知鲁棒目标检测器
2180 Scraping Textures from Natural Images for Synthesis and Editing 从自然图像中抓取纹理以进行合成和编辑
2203 Self-supervised Learning of Visual Graph Matching 视觉图匹配的自监督学习
2206 Disentangling Architecture and Training for Optical Flow 解开光流的架构和训练
2217 PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation PointFix：学习修复域偏差以实现强大的在线立体适应
2218 Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition 教在哪里看：用于低分辨率人脸识别的注意力相似性知识蒸馏
2219 Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows Iwin：通过带有不规则窗口的 Transformer 进行人-物交互检测
2221 Single Stage Virtual Try-on via Deformable Attention Flows 通过可变形注意力流进行单阶段虚拟试穿
2222 Learning Deep Non-Blind Image Deconvolution Without Ground Truths 在没有真值的情况下学习深度非盲图像反卷积
2233 Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions 重新思考零样本动作识别：从潜在原子动作中学习
2234 NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors NeuRIS：使用法线先验的室内场景神经重建
2237 Rethinking Data Augmentation for Robust Visual Question Answering 重新思考数据增强以实现强大的视觉问答
2240 Dual-Domain Self-Supervised Learning and Model Adaption for Deep Compressive Imaging 深度压缩成像的双域自监督学习和模型自适应
2242 Explicit Image Caption Editing 显式图像标题编辑
2255 SphereFed: Hyperspherical Federated Learning SphereFed：超球面联邦学习
2257 Local Color Distributions Prior for Image Enhancement 图像增强的局部颜色分布先验
2267 Teaching with Soft Label Smoothing for Mitigating Noisy Labels in Facial Expressions 使用软标签平滑来减轻面部表情中的嘈杂标签的教学
2269 Multi-Modal Masked Pre-Training for Monocular Panoramic Depth Completion 单目全景深度完成的多模态掩蔽预训练
2272 2D Amodal Instance Segmentation Guided by 3D Shape Prior 由 3D 形状先验引导的 2D Amodal 实例分割
2280 How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset? 如何合成大规模且可训练的微表情数据集？
2282 RFNet-4D: Joint Object Reconstruction and Flow Estimation from 4D Point Clouds RFNet-4D：基于 4D 点云的联合对象重建和流估计
2285 HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors HEAD：异构物体检测器的 HEtero 辅助蒸馏
2290 Generalizable Patch-Based Neural Rendering 可泛化的基于补丁的神经渲染
2293 Meta Spatio-Temporal Debiasing for Video Scene Graph Generation 用于视频场景图生成的元时空去偏
2307 A Sliding Window Scheme for Online Temporal Action Localization 一种在线时间动作定位的滑动窗口方案
2310 Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization 通过核化实例归一化实现超高分辨率非配对染色转换
2311 SESS: Saliency Enhancing with Scaling and Sliding SESS：通过缩放和滑动增强显着性
2312 Data Efficient 3D Learner via Knowledge Transferred from 2D Model 通过从 2D 模型转移的知识实现数据高效的 3D 学习器
2319 MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis MeshMAE：用于 3D 网格数据分析的掩码自动编码器
2327 ERA: Expert Retrieval and Assembly for Early Action Prediction ERA：早期行动预测的专家检索和组装
2328 Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection 在 HOI 检测中挖掘身体部位交互性学习的跨人线索
2334 Improving GANs for Long-Tailed Data through Group Spectral Regularization 通过群谱正则化改进长尾数据的 GAN
2336 Hierarchical Semantic Regularization of Latent Spaces in StyleGANs StyleGAN 中潜在空间的分层语义正则化
2337 Symmetry Regularization and Saturating Nonlinearity for Robust Quantization 稳健量化的对称正则化和饱和非线性
2350 IntereStyle: Encoding an Interest Region for Robust StyleGAN Inversion IntereStyle：为鲁棒 StyleGAN 反转编码兴趣区域
2369 Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation 通过学习多尺度局部线性变换改进 RGB-D 点云配准
2373 Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis 学习动态面部辐射场以进行少量说话头部合成
2378 StyleLight: HDR Panorama Generation for Lighting Estimation and Editing StyleLight：用于照明估计和编辑的 HDR 全景图生成
2379 You Should Look at All Objects 您应该查看所有对象
2384 BRNet: Exploring Comprehensive Features for Monocular Depth Estimation BRNet：探索单目深度估计的综合特征
2385 A Perturbation-Constrained Adversarial Attack for Evaluating the Robustness of Optical Flow 一种用于评估光流鲁棒性的扰动约束对抗攻击
2403 CoupleFace: Relation Matters for Face Recognition Distillation CoupleFace：关系对人脸识别蒸馏很重要
2404 Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition 用于跨域 3D 动作识别的协作域共享和特定于目标的特征聚类
2406 Adaptive Spatial-BCE Loss for Weakly Supervised Semantic Segmentation 弱监督语义分割的自适应空间 BCE 损失
2418 Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement 通过逆运动学和细化进行多人 3D 姿势和形状估计
2423 Explaining Deepfake Detection by Analysing Image Matching 通过分析图像匹配来解释 Deepfake 检测
2424 L-CoDer: Language-based Colorization with Color-object Decoupling Transformer L-CoDer：使用颜色-对象解耦 Transformer 的基于语言的着色
2449 GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation GitNet：基于几何先验的鸟瞰图分割转换
2459 Unsupervised Deep Multi-Shape Matching 无监督深度多形状匹配
2463 GaitEdge: Beyond Plain End-to-end Gait Recognition for Better Practicality GaitEdge：超越普通的端到端步态识别，提高实用性
2483 EAutoDet: Efficient Architecture Search for Object Detection EAutoDet：用于对象检测的高效架构搜索
2485 A Max-Flow based Approach for Neural Architecture Search 一种基于 Max-Flow 的神经架构搜索方法
2488 Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding 打乱视频是否有助于缓解时间偏差问题：一种新的时序定位训练框架
2494 tSF: Transformer-based Semantic Filter for Few-Shot Learning tSF：基于 Transformer 的语义过滤器，用于 Few-Shot 学习
2501 Dense Gaussian Processes for Few-Shot Segmentation 用于小样本分割的密集高斯过程
2507 Adversarial Feature Augmentation for Cross-domain Few-shot Classification 跨域小样本分类的对抗性特征增强
2511 Real-Time Neural Character Rendering with Pose-Guided Multiplane Images 使用姿态引导的多平面图像进行实时神经角色渲染
2512 Constructing Balance from Imbalance for Long-tailed Image Recognition 从不平衡中构建平衡用于长尾图像识别
2516 SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views SparseNeuS：基于稀疏视图的快速泛化神经表面重建
2526 Contrastive Monotonic Pixel-Level Modulation 对比单调像素级调制
2538 Dual Perspective Network for Audio Visual Event Localization 用于视听事件定位的双视角网络
2542 SiamDoGe: Domain Generalizable Semantic Segmentation using Siamese Network SiamDoGe：使用暹罗网络的域可概括语义分割
2545 Is Appearance Free Action Recognition Possible? 无外观的动作识别可能吗？
2557 Detecting Twenty-thousand Classes using Image-level Supervision 使用图像级监督检测两万个类别
2558 DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation DCL-Net：用于 6D 姿势估计的深度对应学习网络
2565 Learning Cross-Video Neural Representations for High-Quality Frame Interpolation 学习用于高质量帧插值的跨视频神经表示
2568 Learning Visibility for Robust Dense Human Body Estimation 鲁棒密集人体估计的学习可见性
2573 Texturify: Generating Textures on 3D Shape Surfaces 纹理化：在 3D 形状表面上生成纹理
2575 Unsupervised Selective Labeling for More Effective Semi-Supervised Learning 用于更有效的半监督学习的无监督选择性标记
2576 Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly 可靠的视觉问答：弃权而不是错误地回答
2581 Studying Bias in GANs through the Lens of Race 从种族的角度研究 GAN 中的偏见
2583 On Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization and Beyond 关于多域长尾识别、不平衡域泛化等
2584 Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth 解开无监督多帧单目深度的物体运动和遮挡
2586 Autoregressive 3D Shape Generation via Canonical Mapping 通过规范映射生成自回归 3D 形状
2589 Learning Continuous Implicit Representation for Near-Periodic Patterns 学习近周期模式的连续隐式表示
2596 Robust Landmark-based Stent Tracking in X-ray Fluoroscopy X 射线透视中基于地标的强大支架跟踪
2598 Depth Field Networks for Generalizable Multi-view Scene Representation 用于可泛化多视图场景表示的深度场网络
2601 Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation 使用 Vision Transformers 的 Max Pooling 在弱监督语义分割中协调类和形状
2605 GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features GRIT：使用双重视觉特征的更快更好的图像字幕转换器
2609 Learning Semantic Correspondence with Sparse Annotations 使用稀疏注释学习语义对应
2610 A Real World Dataset for Multi-view 3D Reconstruction 用于多视图 3D 重建的真实世界数据集
2620 Social ODE: Multi-Agent Trajectory Forecasting with Neural Ordinary Differential Equations 社会 ODE：使用神经常微分方程的多智能体轨迹预测
2621 3D Instances as 1D Kernels 3D 实例作为 1D 内核
2623 Social-SSL: Self-Supervised Cross-Sequence Representation Learning Based on Transformers for Multi-Agent Trajectory Prediction Social-SSL：基于 Transformers 的自监督交叉序列表示学习，用于多智能体轨迹预测
2624 Context-Aware Streaming Perception in Dynamic Environments 动态环境中的上下文感知流感知
2625 PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees PointTree：具有松弛 KD 树的变换鲁棒点云编码器
2631 Dense Siamese Network for Dense Unsupervised Learning 用于密集无监督学习的密集连体网络
2633 Uncertainty-aware Multi-modal Learning via Cross-modal Random Network Prediction 通过跨模态随机网络预测的不确定性感知多模态学习
2638 Enhanced Accuracy and Robustness via Multi-Teacher Adversarial Distillation 通过多教师对抗蒸馏提高准确性和鲁棒性
2645 End-to-end graph-constrained vectorized floorplan generation with panoptic refinement 具有全景细化的端到端图约束矢量化平面图生成
2649 Context Enhanced Stereo Transformer 上下文增强的立体 Transformer
2652 NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition NSNet：用于高效视频识别的非显着性抑制采样器
2657 SpOT: Spatiotemporal Modeling for 3D Object Tracking SpOT：用于 3D 对象跟踪的时空建模
2663 Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning 用于人体骨骼表示学习的分层自监督 Transformer
2666 Few-Shot Video Object Detection 小样本视频目标检测
2667 Improving the Reliability for Confidence Estimation 提高置信度估计的可靠性
2686 Selective Query-guided Debiasing for Video Corpus Moment Retrieval 用于视频语料库时刻检索的选择性查询引导去偏
2688 Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition 了解艺术字：用于场景文本识别的角引导转换器
2691 Monocular 3D Object Detection with Depth from Motion 具有运动深度的单目 3D 对象检测
2701 Posterior Refinement on Metric Matrix Improves Generalization in Metric Learning 度量矩阵的后验改进提高了度量学习的泛化能力
2707 DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation DISP6D：用于可扩展 6D 姿势估计的分离隐式形状和姿势学习
2709 Few-shot Image Generation with Mixup-based Distance Learning 基于 Mixup 距离学习的小样本图像生成
2715 Data-Free Neural Architecture Search via Recursive Label Calibration 通过递归标签校准进行无数据神经架构搜索
2717 Distilling Object Detectors With Global Knowledge 利用全局知识蒸馏目标检测器
2723 Fine-Grained Scene Graph Generation with Data Transfer 具有数据传输的细粒度场景图生成
2730 NEST: Neural Event Stack for Event-based Image Enhancement NEST：用于基于事件的图像增强的神经事件堆栈
2732 Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation 面向轻量级半监督语义分割的多粒度蒸馏方案
2740 A Style-Based GAN Encoder for High Fidelity Reconstruction of Images and Videos 一种基于样式的 GAN 编码器，用于图像和视频的高保真重建
2746 Unifying Visual Perception by Dispersible Points Learning 通过分散点学习统一视觉感知
2747 Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes 迈向室内场景的高保真单视图整体重建
2753 Balancing Stability and Plasticity through Advanced Null Space in Continual Learning 通过持续学习中的高级零空间平衡稳定性和可塑性
2756 Multimodal Transformer for Automatic 3D Annotation and Object Detection 用于自动 3D 注释和对象检测的多模态转换器
2761 SP-Net: Slowly Progressing Dynamic Inference Networks SP-Net：缓慢发展的动态推理网络
2764 No Token Left Behind: Explainability-Aided Image Classification and Generation 不遗漏任何 Token：可解释性辅助的图像分类和生成
2766 Dynamically Transformed Instance Normalization Network for Generalizable Person Re-Identification 用于可泛化人员重新识别的动态转换实例归一化网络
2772 Editable Indoor Lighting Estimation 可编辑的室内照明估计
2783 PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection PseCo：半监督目标检测的伪标签和一致性训练
2786 CompNVS: Novel View Synthesis with Scene Completion CompNVS：具有场景完成功能的新型视图合成
2787 Dynamic 3D Scene Analysis by Point Cloud Accumulation 通过点云累积进行动态 3D 场景分析
2798 FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs FakeCLR：探索对比学习以解决数据高效 GAN 中的潜在不连续性
2802 Resolving Copycat Problems in Visual Imitation Learning via Residual Action Prediction 通过残差动作预测解决视觉模仿学习中的模仿问题
2804 REALY: Rethinking the Evaluation of 3D Face Reconstruction REALY：重新思考 3D 人脸重建的评估
2806 TransMatting: Enhancing Transparent Objects Matting with Transformers TransMatting：使用 Transformer 增强透明对象的抠图
2808 OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses OccamNets：通过支持更简单的假设来减轻数据集偏差
2814 Diverse Image Inpainting with Normalizing Flow 基于归一化流的多样化图像修复
2818 Video Activity Localisation with Uncertainties in Temporal Boundary 具有时间边界不确定性的视频活动定位
2822 SketchSampler: Sketch-based 3D Reconstruction via View-dependent Depth Sampling SketchSampler：通过依赖于视图的深度采样进行基于草图的 3D 重建
2827 DisCo: Remedying Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning DisCo：用蒸馏对比学习纠正轻量级模型的自我监督学习
2829 Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection 探索分辨率和退化线索作为低质量目标检测的自监督信号
2840 CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation CP2：语义分割的复制粘贴对比预训练
2852 Learning from Multiple Annotator Noisy Labels via Sample-wise Label Fusion 通过 Sample-wise Label Fusion 从多个注释器噪声标签中学习
2856 Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features 具有神经特征粗到细渲染的鲁棒类别级 6D 姿势估计
2861 A Unified Framework for Domain Adaptive Pose Estimation 域自适应姿态估计的统一框架
2862 A Broad Study of Pre-training for Domain Generalization and Adaptation 领域泛化和适应预训练的广泛研究
2863 BlobGAN: Spatially Disentangled Scene Representations BlobGAN：空间分离的场景表示
2864 LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity LGV：从大型几何附近提升对抗性示例的可迁移性
2871 LocalBins: Improving Depth Estimation by Learning Local Distributions LocalBins：通过学习局部分布改进深度估计
2872 Prior Knowledge Guided Unsupervised Domain Adaptation 先验知识引导的无监督域适应
2874 Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors 多级时空锚引导的多样化人体运动预测
2877 Fast Two-step Blind Optical Aberration Correction 快速两步盲光学像差校正
2887 Controllable and Guided Face Synthesis for Unconstrained Face Recognition 用于无约束人脸识别的可控引导人脸合成
2888 2D GANs Meet Unsupervised Single-view 3D Reconstruction 2D GAN 遇到无监督单视图 3D 重建
2891 Seeing Far in the Dark with Patterned Flash 用带图案的闪光灯在黑暗中看远
2900 Unified Implicit Neural Stylization 统一隐式神经程式化
2901 Improved Masked Image Generation with Token-Critic 使用 Token-Critic 改进蒙版图像生成
2902 UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation UNIF：用于衣服人体重建和动画的联合神经隐式函数
2903 PseudoClick: Interactive Image Segmentation with Click Imitation PseudoClick：具有点击模仿的交互式图像分割
2904 CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One CoSCL：小型持续学习者的合作强于大型持续学习者
2909 Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models 可扩展的优化学习：学习的优化器可以训练大型模型
2911 InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images InfiniteNature-Zero：从单个图像中学习自然场景的永久视图生成
2921 PRIF: Primary Ray-based Implicit Function PRIF：基于主要光线的隐式函数
2925 From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution 从人脸到自然图像：学习盲图像超分辨率的真实退化
2936 QISTA-ImageNet: A Deep Compressive Image Sensing Framework Solving Lq-Norm Optimization Problem QISTA-ImageNet：解决 Lq-Norm 优化问题的深度压缩图像传感框架
2943 Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness 信任，但要验证：使用自我监督探测来提高可信度
2948 Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding 通过视图旋转和关系推理进行空间和视觉透视以实现具身参考理解
2953 Med-DANet: Dynamic Architecture Network for Efficient Medical Volumetric Segmentation Med-DANet：用于高效医学体积分割的动态架构网络
3005 Worst Case Matters for Few-Shot Recognition 最坏情况对小样本识别很重要
3007 CT^2: Colorization Transformer via Color Tokens CT^2：通过颜色标记的着色转换器
3017 Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization 自过滤：具有置信度惩罚的标签噪声的噪声感知样本选择
3035 Point Cloud Domain Adaptation via Masked Local 3D Structure Prediction 通过掩蔽局部 3D 结构预测进行点云域自适应
3041 Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection 平移、缩放和旋转：跨模式对齐遇到 RGB 红外车辆检测
3043 Simple Baselines for Image Restoration 图像恢复的简单基线
3058 RDA: Reciprocal Distribution Alignment for Robust Semi-supervised Learning RDA：鲁棒半监督学习的互惠分布对齐
3060 Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification 探索用于大规模零样本图像分类的分层图表示
3080 Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation 用于小样本分割的协方差矩阵的双重可变形聚合
3086 PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching PCW-Net：立体匹配的金字塔组合和扭曲成本量
3093 MemSAC: Memory Augmented Sample Consistency for Large Scale Domain Adaptation MemSAC：用于大规模域适应的内存增强样本一致性
3094 GCISG: Guided Causal Invariant Learning for Improved Syn-to-real Generalization GCISG：用于改进 Syn-to-real 泛化的引导因果不变学习
3101 Temporal Saliency Query Network for Efficient Video Recognition 用于高效视频识别的时间显着性查询网络
3116 Towards Interpretable Video Super-Resolution via Alternating Optimization 通过交替优化实现可解释的视频超分辨率
3118 R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning R-DFCIL：用于无数据类增量学习的关系引导表示学习
3125 Spike Transformer: Monocular Depth Estimation for Spiking Camera Spike Transformer：脉冲相机的单目深度估计
3127 Towards Robust Face Recognition with Comprehensive Search 通过全面搜索实现稳健的人脸识别
3129 Improving Image Restoration by Revisiting Global Information Aggregation 通过重新审视全局信息聚合改进图像恢复
3132 Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction 学习多模态轨迹预测的行人群体表示
3138 RFLA: Gaussian Receptive Field based Label Assignment for Tiny Object Detection RFLA：用于微小物体检测的基于高斯感受野的标签分配
3139 Semi-supervised Single-view 3D Reconstruction via Prototype Shape Priors 通过原型形状先验的半监督单视图 3D 重建
3145 Sequential Multi-View Fusion Network for Fast LiDAR Point Motion Estimation 用于快速 LiDAR 点运动估计的序列多视图融合网络
3147 A Large-scale Multiple-objective Method for Black-box Attack against Object Detection 一种针对目标检测的黑盒攻击的大规模多目标方法
3150 GradAuto: Energy-oriented Attack on Dynamic Neural Networks GradAuto：对动态神经网络的能量导向攻击
3151 Semantic-guided Multi-Mask Image Harmonization 语义引导的多掩模图像协调
3155 Manifold Adversarial Learning for Cross-domain 3D Shape Representation 跨域 3D 形状表示的流形对抗学习
3167 GAN with Multivariate Disentangling for Controllable Hair Editing 具有多变量解缠结的 GAN 用于可控的头发编辑
3169 Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches Fast-MoCo：使用组合补丁促进基于动量的对比学习
3179 Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation 用于 Few-Shot 分割的密集交叉查询和支持注意力加权掩码聚合
3180 Acknowledging the Unknown for Multi-label Learning with Single Positive Labels 用单个正标签承认未知的多标签学习
3181 Discovering Transferable Forensic Features for CNN-generated Images Detection 为 CNN 生成的图像检测发现可转移的取证特征
3187 Domain Adaptive Person Search 域自适应人物搜索
3200 LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling LoRD：用于高保真动态人体建模的局部 4D 隐式表示
3202 Bilateral Normal Integration 双边法线积分
3203 Harmonizer: Learning to Perform White-Box Image and Video Harmonization Harmonizer：学习执行白盒图像和视频协调
3213 On the Versatile Uses of Partial Distance Correlation in Deep Learning 关于部分距离相关在深度学习中的多种用途
3214 Object-Centric Unsupervised Image Captioning 以对象为中心的无监督图像字幕
3217 Pose2Room: Understanding 3D Scenes from Human Activities Pose2Room：从人类活动中理解 3D 场景
3218 Capturing, Reconstructing, and Simulating: the UrbanScene3D Dataset 捕获、重建和模拟：UrbanScene3D 数据集
3225 A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness 常见损坏下随机平滑的频谱视图：基准测试和提高认证的稳健性
3228 Text2LIVE: Text-Driven Layered Image and Video Editing Text2LIVE：文本驱动的分层图像和视频编辑
3229 CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes CLIP-Actor：用于动画人体网格的文本驱动推荐和样式化
3239 Event-Based Fusion for Motion Deblurring with Cross-modal Attention 具有跨模态注意的基于事件的运动去模糊融合
3240 Interpretable Image Classification with Differentiable Prototypes Assignment 具有可微分原型分配的可解释图像分类
3247 Efficient One-stage Video Object Detection by Exploiting Temporal Consistency 利用时间一致性进行高效的一阶段视频对象检测
3250 ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images ConCL：病理学图像中密集预测预训练的概念对比学习
3254 Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation 利用动作相似性和连续性进行半监督时间动作分割
3257 Fast and High Quality Image Denoising via Malleable Convolution 通过可塑性卷积进行快速、高质量的图像去噪
3265 Data Association between Event Streams and Intensity Frames under Diverse Baselines 不同基线下事件流与强度帧的数据关联
3287 Self-Regulated Feature Learning via Teacher-free Feature Distillation 通过无教师特征蒸馏的自我调节特征学习
3289 TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval TS2-Net：用于文本视频检索的令牌移位和选择转换器
3292 TAPE: Task-Agnostic Prior Embedding for Image Restoration TAPE：用于图像恢复的与任务无关的先验嵌入
3293 MVSalNet: Multi-View Augmentation for RGB-D Salient Object Detection MVSalNet：用于 RGB-D 显着目标检测的多视图增强
3295 Rethinking IoU-based Optimization for Single-stage 3D Object Detection 重新思考基于 IoU 的单阶段 3D 对象检测优化
3298 Uncertainty Inspired Underwater Image Enhancement 不确定性启发的水下图像增强
3300 k-means Mask Transformer k-means 掩码转换器
3302 Contrastive Vision-Language Pre-training with Limited Resources 资源有限的对比视觉语言预训练
3305 Learning Linguistic Association Towards Efficient Text-Video Retrieval 学习语言关联以实现高效的文本视频检索
3308 United Defocus Blur Detection and Deblurring via Adversarial Promoting Learning 通过对抗促进学习联合进行散焦模糊检测与去模糊
3311 AutoMix: Unveiling the Power of Mixup AutoMix：揭示 Mixup 的力量
3314 Unstructured Feature Decoupling for Vehicle Re-Identification 用于车辆重新识别的非结构化特征解耦
3322 Improving Adversarial Robustness of 3D Point Cloud Classification Models 提高 3D 点云分类模型的对抗鲁棒性
3324 ASSISTER: Assistive Navigation via Conditional Instruction Generation ASSISTER：通过条件指令生成辅助导航
3332 Synergistic Self-Supervised and Quantization Learning 协同自监督和量化学习
3342 Deep Hash Distillation for Image Retrieval 用于图像检索的深度哈希蒸馏
3345 Learning Spatial-Preserved Skeleton Representations for Few-Shot Action Recognition 学习用于小样本动作识别的空间保持骨架表示
3346 Digging into Radiance Grid for Real-Time View Synthesis with Detail Preservation 挖掘辐射网格以进行实时视图合成并保留细节
3351 S^2Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning S^2Contact：基于图的网络，用于半监督学习的 3D 手对象接触估计
3359 TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction TD-Road：使用整体图构建的自上而下的道路网络提取
3366 StyleGAN-Human: A Data-Centric Odyssey of Human Generation StyleGAN-Human：以数据为中心的人体生成探索之旅
3369 Hourglass Attention Network for Image Inpainting 用于图像修复的沙漏注意力网络
3370 MaxViT: Multi-Axis Vision Transformer MaxViT：多轴视觉转换器
3378 Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images Gen6D：基于 RGB 图像的可概括的无模型 6 自由度对象姿态估计
3385 ColorFormer: Image Colorization via Color Memory assisted Hybrid-attention Transformer ColorFormer：通过颜色记忆辅助的混合注意力 Transformer 进行图像着色
3387 Spotting Temporally Precise, Fine-Grained Events in Video 在视频中发现时间上精确、细粒度的事件
3390 SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness SegPGD：一种有效且高效的对抗性攻击，用于评估和提升分段鲁棒性
3391 Adversarial Erasing Framework via Triplet with Gated Pyramid Pooling Layer for Weakly Supervised Semantic Segmentation 对抗性擦除框架通过三元组与门控金字塔池化层进行弱监督语义分割
3393 Semi-Supervised Vision Transformers 半监督视觉 Transformer
3394 Learning an Isometric Surface Parameterization for Texture Unwrapping 学习用于纹理展开的等距表面参数化
3409 Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification 通过自适应聚合模拟嵌入：学习可泛化的行人重识别
3418 CryoAI: Amortized Inference of Poses for Ab Initio Reconstruction of 3D Molecular Volumes from Real Cryo-EM Images CryoAI：从真实 Cryo-EM 图像中从头算重建 3D 分子体积的位姿摊销推断
3419 EAGAN: Efficient Two-stage Evolutionary Architecture Search for GANs EAGAN：GAN 的高效两阶段进化架构搜索
3428 ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer ScalableViT：重新思考 Vision Transformer 面向上下文的泛化
3429 PlaneFormers: From Sparse View Planes to 3D Reconstruction PlaneFormers：从稀疏视图平面到 3D 重建
3438 Domain Adaptive Video Segmentation via Temporal Pseudo Supervision 基于时间伪监督的域自适应视频分割
3442 Diverse Learner: Exploring Diverse Supervision for Semi-supervised Object Detection Diverse Learner：探索用于半监督目标检测的多样化监督
3452 Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction 被忽视的姿势实际上是有道理的：为人体运动预测提炼特权知识
3455 Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection 面向基于 DETR 的人-物交互检测的困难正样本查询挖掘
3458 Learning Extremely Lightweight and Robust Model with Differentiable Constraints on Sparsity and Condition Number 学习对稀疏性和条件数具有可微约束的极轻量级和鲁棒性模型
3470 Structural Triangulation: A Closed-Form Solution to Constrained 3D Human Pose Estimation 结构三角剖分：约束 3D 人体姿势估计的封闭式解决方案
3474 Latency-Aware Collaborative Perception 延迟感知协作感知
3475 Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection 用于 3D 对象检测的同质多模态特征融合和交互
3484 Unfolded Deep Kernel Estimation for Blind Image Super-resolution 盲图像超分辨率的展开深度核估计
3487 Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning 重新思考基于聚类的伪标签无监督元学习
3489 Continual Semantic Segmentation via Structure Preserving and Projected Feature Alignment 通过结构保持和投影特征对齐进行连续语义分割
3498 SC-wLS: Towards Interpretable Feed-forward Camera Re-localization SC-wLS：迈向可解释的前馈相机重新定位
3500 Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation 用于真实世界全景图像生成的弱监督拼接网络
3503 FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras FloatingFusion：来自 ToF 和图像稳定立体相机的深度
3504 Dual-Evidential Learning for Weakly-supervised Temporal Action Localization 弱监督时间动作定位的双证据学习
3511 DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation DynaST：用于示例引导图像生成的动态稀疏变换器
3512 D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration D2HNet：使用分层网络联合去噪和去模糊，用于鲁棒的夜间图像恢复
3514 DELTAR: Depth Estimation from a Light-weight ToF Sensor and RGB Image DELTAR：来自轻量级 ToF 传感器和 RGB 图像的深度估计
3515 ERA: Enhanced Rational Activations ERA：增强的有理激活函数
3518 FrequencyLowCut pooling – Plug & Play against Catastrophic Overfitting FrequencyLowCut 池化——即插即用，防止灾难性过拟合
3520 Interclass Prototype Relation for Few-Shot Segmentation Few-Shot 分割的类间原型关系
3523 Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection 用于小样本目标检测的基类-新类共性的多方面蒸馏
3525 X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks X-DETR：用于实例视觉语言任务的多功能架构
3535 Equivariance and Invariance Inductive Bias for Learning from Insufficient Data 从不充分数据中学习的等变性和不变性归纳偏置
3539 Multimodal Conditional Image Synthesis with Product-of-Experts GANs 使用专家乘积 GAN 的多模态条件图像合成
3551 Balancing between Forgetting and Acquisition in Incremental Subpopulation Learning 增量亚群学习中遗忘与习得的平衡
3555 TensoRF: Tensorial Radiance Fields TensoRF：张量辐射场
3580 PointCLM: A Contrastive Learning-based Framework for Multi-instance Point Cloud Registration PointCLM：基于对比学习的多实例点云配准框架
3581 Slim Scissors: Segmenting Thin Object from Synthetic Background Slim Scissors：从合成背景中分割出薄物体
3586 Auto-regressive Image Synthesis with Integrated Quantization 具有集成量化的自回归图像合成
3591 CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition CLASTER：用于零样本动作识别的强化学习聚类
3593 Discovering Human-Object Interaction Concepts via Self-Compositional Learning 通过自组合学习发现人-物交互概念
3598 Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance 通过学习的逐层重要性的混合精度神经网络量化
3601 Event-guided Deblurring of Unknown Exposure Time Videos 未知曝光时间视频的事件引导去模糊
3604 TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation TREND：用于 GAN 评估的 Inception 嵌入的截断广义正态密度估计
3606 3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform 通过 Deep Manhattan Hough 变换从全景图像的 Cubemap 估计 3D 房间布局
3622 Learning Disentanglement with Decoupled Labels for Vision-Language Navigation 使用解耦标签学习视觉语言导航的解耦表示
3623 JoJoGAN: One Shot Face Stylization JoJoGAN：单样本人脸风格化
3627 Convolutional Embedding Makes Hierarchical Vision Transformer Stronger 卷积嵌入使分层视觉 Transformer 更强大
3631 3D CoMPaT: Composition of Materials on Parts of 3D Things 3D CoMPaT：3D 事物部件上的材料组合
3632 Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration 基于隐式空间校准的 Transformer 的弱监督目标定位
3641 Few-shot Class-incremental Learning for 3D Point Cloud Objects 3D 点云对象的小样本类增量学习
3643 Learning Graph Neural Networks for Image Style Transfer 学习用于图像风格迁移的图神经网络
3644 JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes JPerceiver：用于驾驶场景中深度、姿势和布局估计的联合感知网络
3645 Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions 对大规模非平稳任务分布的遗忘较少的元学习
3655 Semi-supervised 3D Object Detection with Proficient Teachers 熟练教师的半监督 3D 对象检测
3658 NeFSAC: Neurally Filtered Minimal Samples NeFSAC：神经过滤的最小样本
3660 Domain Generalization by Mutual-Information Regularization with Pre-trained Models 使用预训练模型通过互信息正则化进行域泛化
3661 AcroFOD: An Adaptive Method for Cross-domain Few-shot Object Detection AcroFOD：一种自适应的跨域小样本目标检测方法
3665 Primitive-based Shape Abstraction via Nonparametric Bayesian Inference 通过非参数贝叶斯推理的基于基元的形状抽象
3670 Active label correction using robust parameter update and entropy propagation 使用鲁棒参数更新和熵传播的主动标签校正
3671 E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs E-Graph：带可扩展性图的刚性旋转的最小解决方案
3672 Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation 通过序列到序列转换的统一的完全和时间戳监督的时间动作分割
3673 Exploring Gradient-based Multi-directional Controls in GANs 探索 GAN 中基于梯度的多向控制
3677 Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification 用于可见光-红外行人重识别的反事实干预特征迁移
3681 A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision 仔细研究 3D 视觉的自监督预训练中的不变性
3685 VecGAN: Image-to-Image Translation with Interpretable Latent Directions VecGAN：具有可解释潜在方向的图像到图像转换
3686 SNeS: Learning Probably Symmetric Neural Surfaces from Incomplete Data SNeS：从不完整的数据中学习可能是对称的神经表面
3689 Three things everyone should know about Vision Transformers 关于 Vision Transformer，每个人都应该知道的三件事
3690 DeiT III: Revenge of the ViT DeiT III：ViT 的复仇
3693 Any-resolution Training for High-resolution Image Synthesis 高分辨率图像合成的任意分辨率训练
3703 HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields HDR-Plenoxels：自校准高动态范围辐射场
3719 PartImageNet: A Large, High-Quality Dataset of Parts PartImageNet：大型、高质量的部件数据集
3721 Abstracting Sketches through Simple Primitives 通过简单的基元抽象草图
3723 MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer MTTrans：使用 Mean Teacher Transformer 进行跨域对象检测
3727 OPD: Single-view 3D Openable Part Detection OPD：单视图 3D 可打开部件检测
3731 TAFIM: Targeted Adversarial Attacks against Facial Image Manipulations TAFIM：针对面部图像操作的有针对性的对抗性攻击
3737 NeuMan: Neural Human Radiance Field from a Single Video NeuMan：来自单个视频的神经人体辐射场
3747 Learning Implicit Templates for Point-Based Clothed Human Modeling 学习基于点的穿衣人体建模的隐式模板
3751 Event Neural Networks 事件神经网络
3755 Learning to Censor by Noisy Sampling 通过噪声采样学习审查
3757 Unpaired Image Translation via Vector Symbolic Architectures 基于矢量符号架构的非配对图像翻译
3758 ConMatch: Semi-Supervised Learning with Confidence-Guided Consistency Regularization ConMatch：具有信心引导一致性正则化的半监督学习
3760 Granularity-aware Adaptation for Image Retrieval over Multiple Tasks 多任务图像检索的粒度感知自适应
3769 EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers EdgeViTs：使用视觉转换器在移动设备上竞争轻量级 CNN
3780 Multi-Domain Multi-Definition Landmark Localization for Small Datasets 小数据集的多域多定义地标定位
3781 TAVA: Template-free Animatable Volumetric Actors TAVA：无模板动画体积 Actor
3792 Stereo Depth Estimation with Echoes 带有回声的立体深度估计
3794 EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching EASNet：为立体匹配搜索弹性和准确的网络架构
3798 DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection DEVIANT：用于单目 3D 对象检测的深度 EquiVarIAnt 网络
3809 RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation RBP-Pose：用于类别级姿态估计的残差边界框投影
3820 Levenshtein OCR Levenshtein OCR
3821 Multi-Granularity Prediction for Scene Text Recognition 场景文本识别的多粒度预测
3827 MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition MixSKD：用于图像识别的 Mixup 的自我知识蒸馏
3834 Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input Switch-BERT：通过切换注意力和输入来学习对多模式交互进行建模
3837 Efficient Video Transformers with Spatial-temporal Token Selection 具有时空令牌选择的高效视频转换器
3844 DAS: Densely-Anchored Sampling for Deep Metric Learning DAS：用于深度度量学习的密集锚定采样
3864 ReCoNet: Recurrent Correction Network for Fast and Efficient Multi-modality Image Fusion ReCoNet：用于快速高效的多模态图像融合的循环校正网络
3867 RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN RIBAC：针对紧凑型 DNN 的稳健且不易察觉的后门攻击
3870 Point Cloud Compression with Sibling Context and Surface Priors 具有兄弟上下文和表面先验的点云压缩
3874 Self-Feature Distillation with Uncertainty Modeling for Degraded Image Recognition 用于退化图像识别的具有不确定性建模的自特征蒸馏
3885 Point Cloud Compression using Range Image-based Entropy Model for Autonomous Driving 使用基于距离图像的熵模型的自动驾驶点云压缩
3887 CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer CCPL：用于通用风格迁移的对比相干性保留损失
3904 CANF-VC: Conditional Augmented Normalizing Flows for Video Compression CANF-VC：用于视频压缩的条件增强归一化流
3912 Bi-level Feature Alignment for Versatile Image Translation and Manipulation 用于多功能图像翻译和操作的双层特征对齐
3918 Lane Detection Transformer based on Multi-frame Horizontal and Vertical Attention and Visual Transformer Module 基于多帧水平和垂直注意力以及视觉 Transformer 模块的车道检测 Transformer
3921 Label-Guided Auxiliary Training Improves 3D Object Detector 标签引导辅助训练改进了 3D 对象检测器
3932 FedX: Unsupervised Federated Learning with Cross Knowledge Distillation FedX：具有交叉知识蒸馏的无监督联邦学习
3936 ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection ProposalContrast：基于 LiDAR 的 3D 对象检测的无监督预训练
3948 Audio-Driven Stylized Gesture Generation with Flow-Based Model 使用基于流的模型的音频驱动风格化手势生成
3958 Unsupervised Domain Adaptation for One-Stage Object Detector using Offsets to Bounding Box 使用偏移到边界框的单阶段目标检测器的无监督域自适应
3964 Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework 用于跟踪的联合特征学习和关系建模：单流框架
3965 PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map PreTraM：通过连接轨迹和地图的自我监督预训练
3966 DeepPS2: Revisiting Photometric Stereo using Two Differently Illuminated Images DeepPS2：使用两个不同照明的图像重新审视光度立体
3977 Learn From All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition 从所有样本中学习：用于噪声标签面部表情识别的擦除注意力一致性
3984 Novel Class Discovery without Forgetting 不遗忘的新类别发现
3985 Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation 用于人体姿态估计的结构组的自约束推理优化
3989 Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning 预测不是理解：识别和解决机器学习中的欠规范
3991 A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning 基于代理的深度度量学习的非各向同性概率视角
3998 Relative Pose from SIFT Features SIFT 特征的相对位姿
3999 Monocular 3D Object Reconstruction with GAN Inversion 使用 GAN 反演的单目 3D 对象重建
4001 PromptDet: Towards Open-vocabulary Detection using Uncurated Images PromptDet：使用未经处理的图像进行开放词汇检测
4005 Densely Constrained Depth Estimator for Monocular 3D Object Detection 用于单目 3D 目标检测的密集约束深度估计器
4016 Content Adaptive Latents and Decoder for Neural Image Compression 用于神经图像压缩的内容自适应潜在和解码器
4018 High-Fidelity Image Inpainting with GAN Inversion 使用 GAN 反转的高保真图像修复
4019 Spatially Invariant Unsupervised 3D Object-Centric Learning and Scene Decomposition 空间不变无监督 3D 以对象为中心的学习和场景分解
4020 W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection W2N：从弱监督切换到嘈杂监督以进行目标检测
4021 UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture UnrealEgo：用于鲁棒的第一人称视角 3D 人体运动捕捉的新数据集
4022 MotionCLIP: Exposing Human Motion Generation to CLIP Space MotionCLIP：将人体运动生成暴露于 CLIP 空间
4023 Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution 用于真实世界图像超分辨率的高效和退化自适应网络
4024 Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-ahead Forward Ones 通过用前瞻前向模块模仿后向循环模块来进行单向视频去噪
4028 Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness 用于自监督对抗鲁棒性的解耦对抗对比学习
4029 Map-free Visual Relocalization: Metric Pose Relative to a Single Image 无地图视觉重定位：相对于单个图像的度量姿势
4032 DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta DeltaGAN：使用特定样本的 Delta 实现多样化的小样本图像生成
4035 Sample-Adaptive Augmentation for Long-Tailed Image Classification 长尾图像分类的样本自适应增强
4037 TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers TokenMix：重新思考视觉转换器中用于数据增强的图像混合
4041 UFO: Unified Feature Optimization UFO：统一特征优化
4043 Master of All: Simultaneous Generalization of Urban-Scene Segmentation to All Adverse Weather Conditions 万事通：将城市场景分割同时推广到所有不利天气条件
4047 PalQuant: Accelerating High-precision Networks on Low-precision Accelerators PalQuant：在低精度加速器上加速高精度网络
4057 Self-Supervised Learning for Real-World Super-Resolution from Dual Zoomed Observations 基于双缩放观察的真实世界超分辨率的自我监督学习
4059 UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier UniMiSS：通过打破维度障碍的通用医学自我监督学习
4067 Secrets of Event-Based Optical Flow 基于事件的光流的秘密
4073 Self-distilled Feature Aggregation for Self-supervised Monocular Depth Estimation 用于自监督单目深度估计的自蒸馏特征聚合
4074 Negative Samples are at Large: Leveraging Hard-distance Elastic Loss for Re-identification 负样本无处不在：利用硬距离弹性损失进行重新识别
4076 Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning 用于无监督骨架动作学习的全局局部运动转换器
4080 Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoiréing 迈向高效且尺度鲁棒的超高清图像去摩尔纹
4084 Instance Contour Adjustment via Structure-driven CNN 通过结构驱动的 CNN 进行实例轮廓调整
4085 ERDN: Equivalent Receptive Field Deformable Network for Video Deblurring ERDN：用于视频去模糊的等效感受野可变形网络
4090 Localizing Visual Sounds the Easy Way 定位视觉声源的简单方法
4105 Polarimetric Pose Prediction 极化姿态预测
4115 DFNet: Enhance Absolute Pose Regression with Direct Feature Matching DFNet：通过直接特征匹配增强绝对姿势回归
4117 A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge A-OKVQA：使用世界知识进行视觉问答的基准
4119 Sound Localization by Self-Supervised Time Delay Estimation 通过自监督时间延迟估计进行声音定位
4120 AdaFocus V3: On Unified Spatial-temporal Dynamic Video Recognition AdaFocus V3：统一时空动态视频识别
4122 Synthesizing Light Field Video from Monocular Video 从单目视频合成光场视频
4123 Discrete-Constrained Regression for Local Counting Models 局部计数模型的离散约束回归
4124 Towards Regression-Free Neural Networks for Diverse Compute Platforms 面向不同计算平台的无回归神经网络
4130 Selection and Cross Similarity for Event-Image Deep Stereo 事件图像深度立体的选择和交叉相似性
4136 Long Movie Clip Classification with State-Space Video Models 使用状态空间视频模型的长影片剪辑分类
4145 Relationship Spatialization for Depth Estimation 深度估计的关系空间化
4150 Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition 面包屑：用于长尾识别的对抗类平衡采样
4152 Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models Image2Point：使用 2D 图像预训练模型理解 3D 点云
4175 Visual Prompt Tuning 视觉提示调整
4181 Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation 语义分割的多尺度和跨尺度对比学习
4185 Rethinking Generic Camera Models for Deep Single Image Camera Calibration to Recover Rotation and Fisheye Distortion 重新思考用于深度单图像相机校准以恢复旋转和鱼眼失真的通用相机模型
4188 Neural-Sim: Learning to Generate Training Data with NeRF Neural-Sim：学习使用 NeRF 生成训练数据
4195 Word-Level Fine-Grained Story Visualization 词级细粒度故事可视化
4206 Chairs Can be Stood on: Overcoming Object Bias in Human-Object Interaction Detection 椅子也可以被站上去：克服人-物交互检测中的物体偏差
4208 GOCA: Guided Online Cluster Assignment for Self Supervised Video Representation Learning GOCA：用于自我监督视频表示学习的引导式在线集群分配
4217 Learning Audio-Video Modalities from Image Captions 从图像说明中学习音视频模态
4220 Inverted Pyramid Multi-task Transformer for Dense Scene Understanding 用于密集场景理解的倒金字塔多任务转换器
4222 Image Inpainting with Cascaded Modulation GAN and Object-Aware Training 使用级联调制 GAN 和对象感知训练进行图像修复
4231 Planes vs. Chairs: Category-guided 3D shape learning without any 3D cues 飞机与椅子：类别引导的 3D 形状学习，没有任何 3D 提示
4237 ART-SS: An Adaptive Rejection Technique for Semi-Supervised restoration for adverse weather-affected images ART-SS：一种自适应拒绝技术，用于受恶劣天气影响的图像的半监督恢复
4239 Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction 用于 3D 人体运动预测的骨架分割图散射网络
4241 MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views MHR-Net：从 2D 视图重建非刚性形状的多假设
4243 Unifying Event Detection and Captioning as Sequence Generation via Pre-Training 通过预训练将事件检测和字幕统一为序列生成
4247 Depth Map Decomposition for Monocular Depth Estimation 单目深度估计的深度图分解
4249 Human-centric Image Cropping with Partition-aware and Content-preserving Features 具有分区感知和内容保留功能的以人为本的图像裁剪
4252 Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking 骨干就是你所需要的：视觉对象跟踪的简化架构
4255 StyleFace: Towards Identity-Disentangled Face Generation on Megapixels StyleFace：迈向百万像素级的身份解耦人脸生成
4260 Fusion from Decomposition: A Self-Supervised Decomposition Approach for Image Fusion 分解融合：图像融合的自监督分解方法
4261 Learning Degradation Representations for Image Deblurring 学习图像去模糊的退化表示
4269 Aware of the History: Trajectory Forecasting with the Local Behavior Data 了解历史：使用本地行为数据进行轨迹预测
4270 FAR: Fourier Aerial Video Recognition FAR：傅里叶航空视频识别
4271 X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation X-Learner：学习通用视觉表示的跨源和任务
4273 Disentangled Differentiable Network Pruning 解耦的可微网络剪枝
4275 Video Extrapolation in Space and Time 时空视频外推
4277 IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors IDa-Det：1 位检测器的信息差异感知蒸馏
4278 Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation 用于视觉和语言导航的具有可变长度记忆的多模态 Transformer
4282 DnA: Improving Few-shot Transfer Learning with Low-Rank Decomposition and Alignment DnA：通过低秩分解和对齐改进 Few-shot 迁移学习
4284 Translating a Visual LEGO Manual to a Machine-Executable Plan 将视觉乐高手册翻译成机器可执行的计划
4286 Cornerformer: Purifying Instances for Corner-based Detectors Cornerformer：净化基于 Corner 的检测器的实例
4287 Contributions of Shape, Texture, and Color in Visual Recognition 形状、纹理和颜色在视觉识别中的贡献
4288 Monitored Distillation for Positive Congruent Depth Completion 用于正一致深度补全的监控蒸馏
4292 Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian 使用各向异性球面高斯函数进行面部姿态估计的无偏标签分布学习
4293 AirDet: Few-Shot Detection without Fine-tuning for Autonomous Exploration AirDet：无需微调即可进行自主探索的小样本检测
4295 Learning to Weight Samples for Dynamic Early-exiting Networks 学习为动态早期退出网络加权样本
4300 Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning 使用遥远但相关的邻居进行表示学习的约束均值偏移
4303 SLIP: Self-supervision meets Language-Image Pre-training SLIP：自我监督遇到语言-图像预训练
4304 Learning Visual Styles from Audio-Visual Associations 从视听关联中学习视觉风格
4305 Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting 用于经济高效的端到端文本定位的动态低分辨率蒸馏
4310 Prompting Visual-Language Models for Efficient Video Understanding 提示视觉语言模型以实现高效视频理解
4318 One-Trimap Video Matting One-Trimap 视频抠图
4323 Contrastive Learning for Diverse Disentangled Foreground Generation 用于不同解耦前景生成的对比学习
4326 Resolution-free Point Cloud Sampling Network with Data Distillation 具有数据蒸馏的无分辨率点云采样网络
4327 BIPS: Bi-modal Indoor Panorama Synthesis via Residual Depth-aided Adversarial Learning BIPS：通过残差深度辅助对抗学习的双模式室内全景合成
4330 Augmentation of rPPG Benchmark Datasets: Learning to Remove and Embed rPPG Signals via Double Cycle Consistent Learning from Unpaired Facial Videos rPPG 基准数据集的增强：通过未配对面部视频的双周期一致学习学习删除和嵌入 rPPG 信号
4331 Fabric Material Recovery from Video Using Multi-Scale Geometric Auto-Encoder 使用多尺度几何自动编码器从视频中恢复织物材料
4333 An Invisible Black-box Backdoor Attack through Frequency Domain 一种通过频域的隐形黑盒后门攻击
4336 Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution 学习自监督跨模态超分辨率的相互调制
4338 TransGrasp: Grasp Pose Estimation of a Category of Objects by Transferring Grasps from Only One Labeled Instance TransGrasp：通过仅从一个标记实例转移抓取来估计一类对象的抓取姿势
4343 Learning Instance and Task-Aware Dynamic Kernels for Few-shot Learning 用于小样本学习的学习实例和任务感知动态内核
4346 PillarNet: Real-Time and High-Performance Pillar-based 3D Object Detection PillarNet：实时和高性能基于 Pillar 的 3D 对象检测
4348 Robust Object Detection With Inaccurate Bounding Boxes 具有不准确边界框的鲁棒对象检测
4349 Revisiting the Critical Factors of Augmentation-Invariant Representation Learning 重新审视增强不变表示学习的关键因素
4350 LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds LESS：激光雷达点云的标签高效语义分割
4359 A Fast Knowledge Distillation Framework for Visual Recognition 一种用于视觉识别的快速知识提炼框架
4366 MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment MegBA：用于大规模束调整的基于 GPU 的分布式库
4367 Spectrum-aware and Transferable Architecture Search for Hyperspectral Image Restoration 用于高光谱图像恢复的光谱感知和可转移架构搜索
4374 Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks 通过分层生成网络提高目标对抗样本的可转移性
4378 Exploring the Devil in Graph Spectral Domain for 3D Point Cloud Attacks 探索 3D 点云攻击的图谱域中的魔鬼
4385 Geometry-aware Single-image Full-body Human Relighting 几何感知单图像全身人体重新照明
4388 Optical Flow Training under Limited Label Budget via Active Learning 通过主动学习在有限标签预算下进行光流训练
4395 RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-supervised Learning RVSL：基于半监督学习的真实朦胧场景中的鲁棒车辆相似性学习
4399 3D-Aware Indoor Scene Synthesis with Depth Priors 具有深度先验的 3D 感知室内场景合成
4400 Hierarchical Feature Embedding for Visual Tracking 用于视觉跟踪的分层特征嵌入
4401 Neural Color Operators for Sequential Image Retouching 用于连续图像修饰的神经颜色算子
4402 Optimizing Image Compression via Joint Learning with Denoising 通过联合学习与去噪优化图像压缩
4405 DICE: Leveraging Sparsification for Out-of-Distribution Detection DICE：利用稀疏化进行分布外检测
4406 DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting DeMFI：深度联合去模糊和多帧插值与流引导的注意力相关和递归提升
4408 Invariant Feature Learning for Generalized Long-Tailed Classification 用于广义长尾分类的不变特征学习
4411 Fine-Grained Visual Entailment 细粒度视觉蕴含
4412 Sliced Recursive Transformer 切片递归 Transformer
4413 Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval 轻量级注意力特征融合：文本到视频检索的新基线
4416 Asymmetric Relation Consistency Reasoning for Video Relation Grounding 视频关系定位的非对称关系一致性推理
4417 Restore Globally, Refine Locally: A Mask-Guided Scheme to Accelerate Super-Resolution Networks 全局恢复，局部优化：加速超分辨率网络的掩模引导方案
4420 PETR: Position Embedding Transformation for Multi-View 3D Object Detection PETR：用于多视图 3D 对象检测的位置嵌入转换
4422 Contextual Text Block Detection towards Scene Text Understanding 面向场景文本理解的上下文文本块检测
4426 Structure-aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation 用于 3D 面部细节动画和操作的结构感知可编辑变形模型
4429 UniNet: Unified Architecture Search with Convolution, Transformer, and MLP UniNet：使用卷积、Transformer 和 MLP 进行统一架构搜索
4433 Efficient Decoder-free Object Detection with Transformers 使用 Transformer 进行高效的无解码器目标检测
4439 Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation 重新思考关键点表示：将关键点和姿势建模为多人人体姿势估计的对象
4440 CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation CA-SSL：用于检测和分割的与类别无关的半监督学习
4447 StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning StARformer：用于视觉强化学习的具有状态-动作-奖励表示的 Transformer
4451 S2Net: Stochastic Sequential Pointcloud Forecasting S2Net：随机顺序点云预测
4452 D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding D3Net：用于 3D 密集字幕和视觉定位的统一说话者-听者架构
4464 AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers AMixer：无自注意力视觉 Transformer 的自适应权重混合
4471 Neural Image Representations for Multi-Image Fusion and Layer Separation 用于多图像融合和层分离的神经图像表示
4477 Panoramic Human Activity Recognition 全景人体活动识别
4478 Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution 用于移动实时超分辨率的编译器感知神经架构搜索
4481 Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation 弱监督点云分割的双重自适应变换
4495 Modality Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-Identification 用于可见光-红外行人重识别的模态协同互补学习与级联聚合
4496 RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation RA-Depth：分辨率自适应自监督单目深度估计
4505 MoFaNeRF: Morphable Facial Neural Radiance Field MoFaNeRF：可变形面部神经辐射场
4507 Modeling Mask Uncertainty in Hyperspectral Image Reconstruction 高光谱图像重建中的掩模不确定性建模
4508 Perceiving and Modeling Density for Image Dehazing 图像去雾的感知和建模密度
4513 Visual Cross-View Metric Localization with Dense Uncertainty Estimates 具有密集不确定性估计的视觉跨视图度量定位
4514 ROBIN: A Benchmark for Robustness to Individual Nuisances in Real-World Out-of-Distribution Shifts ROBIN：面向现实世界分布外偏移中单一干扰因素鲁棒性的基准
4525 The One Where They Reconstructed 3D Humans and Environments in TV Shows 他们在电视剧中重建 3D 人体与环境的那一集
4530 PointInst3D: Segmenting 3D Instances by Points PointInst3D：按点分割 3D 实例
4533 PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation PolyphonicFormer：深度感知视频全景分割的统一查询学习
4534 Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap 用于闭合域间隙的对象点云噪声感知合成的准平衡自训练
4537 TinyViT: Fast Pretraining Distillation for Small Vision Transformers TinyViT：小型视觉 Transformer 的快速预训练蒸馏
4539 Delving into Details: Synopsis-to-Detail Networks for Video Recognition 深入细节：视频识别的概要到细节网络
4547 Bringing Rolling Shutter Images Alive with Dual Reversed Distortion 通过双重反转失真使滚动快门图像栩栩如生
4551 VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data VirtualPose：从虚拟数据中学习可概括的 3D 人体姿势模型
4552 Poseur: Direct Human Pose Regression with Transformers Poseur：使用 Transformer 进行直接人体姿态回归
4557 Adaptive Image Transformations for Transfer-based Adversarial Attack 基于迁移的对抗攻击的自适应图像变换
4566 D2ADA: Dynamic Density-aware Active Domain Adaptation for Semantic Segmentation D2ADA：用于语义分割的动态密度感知主动域自适应
4568 SQN: Weakly-Supervised Semantic Segmentation of Large-Scale 3D Point Clouds SQN：大规模 3D 点云的弱监督语义分割
4581 Deep Portrait Delighting 深度人像去光照
4584 Vector Quantized Image-to-Image Translation 矢量量化图像到图像转换
4588 PointMixer: MLP-Mixer for Point Cloud Understanding PointMixer：用于点云理解的 MLP-Mixer
4589 V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer V2X-ViT：Vehicle-to-Everything 与 Vision Transformer 的协同感知
4591 SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation SimCC：人体姿态估计的简单坐标分类视角
4593 Cross-Domain Ensemble Distillation for Domain Generalization 用于域泛化的跨域集成蒸馏
4596 Cross-Modal 3D Shape Generation and Manipulation 跨模态 3D 形状生成和操作
4607 Latent Partition Implicit with Surface Codes for 3D Representation 用于 3D 表示的表面代码隐含的潜在分区
4610 Generative Multiplane Images: Making a 2D GAN 3D-Aware 生成式多平面图像：让 2D GAN 具备 3D 感知
4614 FILM: Frame Interpolation for Large Motion FILM：大动作的帧插值
4619 Facial Depth and Normal Estimation using Single Dual-Pixel Camera 使用单个双像素相机的面部深度和法线估计
4622 Initialization and Alignment for Adversarial Texture Optimization 对抗性纹理优化的初始化和对齐
4631 Regularizing Vector Embedding in Bottom-Up Human Pose Estimation 在自下而上的人体姿态估计中正则化向量嵌入
4633 Equivariant Hypergraph Neural Networks 等变超图神经网络
4636 Learning Quality-aware Dynamic Memory for Video Object Segmentation 视频对象分割的学习质量感知动态内存
4640 Self-supervised Social Relation Representation for Human Group Detection 用于人群检测的自监督社会关系表示
4651 Stripformer: Strip Transformer for Fast Image Deblurring Stripformer：用于快速图像去模糊的 Strip Transformer
4652 Neural Scene Decoration from a Single Photograph 单张照片的神经场景装饰
4656 Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds 用于图像和点云中语言定位的自下而上、自上而下检测 Transformer
4658 CIRCLE: Convolutional Implicit Reconstruction and Completion for Large-scale Indoor Scene CIRCLE：大规模室内场景的卷积隐式重建与补全
4659 Discovering Deformable Keypoint Pyramids 发现可变形的关键点金字塔
4668 TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors TIDEE：使用视觉语义常识先验整理新房间
4669 MOTR: End-to-End Multiple-Object Tracking with TRansformer MOTR：使用 TRansformer 进行端到端多对象跟踪
4672 K-centered Patch Sampling for Efficient Video Recognition 用于高效视频识别的以 K 为中心的补丁采样
4675 Learning Implicit Feature Alignment Function for Semantic Segmentation 学习用于语义分割的隐式特征对齐函数
4677 A Visual Navigation Perspective for Category-Level Object Pose Estimation 类别级对象姿态估计的视觉导航视角
4678 Deep Fourier-based Exposure Correction Network with Spatial-Frequency Interaction 具有空间频率交互的基于深度傅里叶的曝光校正网络
4681 ScaleNet: Searching for the Model to Scale ScaleNet：搜索要扩展的模型
4684 Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels 中心性和一致性：使用实例相关的噪声标签进行学习的两阶段清洁样本识别
4685 GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing GALA：迈向几何和照明感知对象搜索以进行合成
4688 FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification FairGRAPE：人脸属性分类的公平感知梯度剪枝方法
4697 Tackling Background Distraction in Video Object Segmentation 解决视频对象分割中的背景干扰问题
4700 Hyperspherical Learning in Multi-Label Classification 多标签分类中的超球面学习
4705 The Surprisingly Straightforward Scene Text Removal Method With Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis 具有门控注意和感兴趣区域生成的令人惊讶的简单场景文本删除方法：综合突出模型分析
4708 FingerprintNet: Synthesized Fingerprints for Generated Image Detection FingerprintNet：用于生成图像检测的合成指纹
4715 ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild ParticleSfM：利用密集点轨迹在野外定位移动相机
4720 Organic Priors in Non-Rigid Structure from Motion 非刚性运动恢复结构中的有机先验
4721 Free-Viewpoint RGB-D Human Performance Capture and Rendering Free-Viewpoint RGB-D 人体性能捕捉和渲染
4727 When Active Learning Meets Implicit Semantic Data Augmentation 当主动学习遇到隐式语义数据增强
4733 Multiview Regenerative Morphing with Dual Flows 具有双流的多视图再生变形
4734 Frequency and Spatial Dual Guidance for Image Dehazing 图像去雾的频率和空间双重引导
4736 The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing 视频编辑剖析：人工智能辅助视频编辑的数据集和基准套件
4739 Hallucinating Pose-Compatible Scenes 幻觉姿势兼容场景
4748 Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection Faster VoxelPose：正交投影的实时 3D 人体姿态估计
4754 Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow 事件驱动光流各向异性调整的视频插值
4761 Motion and Appearance Adaptation for Cross-Domain Motion Transfer 跨域运动传输的运动和外观自适应
4762 AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets AdaBin：使用自适应二元集改进二元神经网络
4781 Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation 社会隐式：重新思考轨迹预测评估和隐式最大似然估计的有效性
4788 A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation 时间动作分割中时间戳监督的通用和鲁棒框架
4790 A Deep Moving-camera Background Model 一种深度运动相机背景模型
4800 DLME: Deep Local-flatness Manifold Embedding DLME：深度局部平坦流形嵌入
4802 Neural Video Compression using GANs for Detail Synthesis and Propagation 使用 GAN 进行神经视频压缩以进行细节合成和传播
4804 Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning 基于分层匹配和对比学习的小样本动作识别
4806 TEMOS: Generating diverse human motions from textual descriptions TEMOS：从文本描述中生成不同的人体动作
4807 Perspective Flow Aggregation for Data-Limited 6D Object Pose Estimation 数据受限 6D 对象姿态估计的透视流聚合
4820 TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices using Submodular Mutual Information TALISMAN：使用子模互信息的稀有类和切片的目标主动学习目标检测
4824 Semantic-Aware Fine-Grained Correspondence 语义感知细粒度对应
4826 New Datasets and Models for Contextual Reasoning in Visual Dialog 视觉对话中用于上下文推理的新数据集和模型
4828 Remote Respiration Monitoring of Moving Person Using Radio Signals 使用无线电信号对移动人员进行远程呼吸监测
4832 AdvDO: Realistic Adversarial Attacks for Trajectory Prediction AdvDO：用于轨迹预测的现实对抗攻击
4836 Cross-Modality Transformer for Visible-Infrared Person Re-Identification 用于可见光-红外行人重识别的跨模态 Transformer
4847 Layered Controllable Video Generation 分层可控视频生成
4849 VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition VL-LTR：学习用于长尾视觉识别的分类视觉语言表示
4857 Self-Supervised Classification Network 自监督分类网络
4861 GraphVid: It Only Takes a Few Nodes to Understand a Video GraphVid：只需几个节点即可理解视频
4865 DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction DevNet：通过密度体积构建的自监督单目深度学习
4872 Bayesian Optimization with Clustering and Rollback for CNN Auto Pruning 用于 CNN 自动修剪的聚类和回滚贝叶斯优化
4873 Towards Real-World HDRTV Reconstruction: A Data Synthesis-based Approach 迈向真实世界的 HDRTV 重建：基于数据合成的方法
4874 Quantum Motion Segmentation 量子运动分割
4878 Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection 用于单目 3D 目标检测的跨模态知识蒸馏网络
4879 Open-world Semantic Segmentation via Contrasting and Clustering Vision-language Embedding 通过对比和聚类视觉语言嵌入的开放世界语义分割
4880 Custom Structure Preservation in Face Aging 人脸老化中的自定义结构保留
4883 DANBO: Disentangled Articulated Neural Body Representations via Graph Neural Networks DANBO：通过图神经网络解开的关节神经体表示
4888 Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization 类对上下文是不变的，反之亦然：论分布外泛化的学习不变性
4891 Spatio-Temporal Deformable Attention Network for Video Deblurring 用于视频去模糊的时空可变形注意力网络
4894 CHORE: Contact, Human and Object REconstruction from a single RGB image CHORE：从单个 RGB 图像重建接触、人体和物体
4899 Complementing Brightness Constancy with Deep Networks for Optical Flow Prediction 用深度网络补充亮度恒定性以进行光流预测
4901 Adaptive Token Sampling For Efficient Vision Transformers 高效视觉转换器的自适应令牌采样
4902 Learning Discriminative Shrinkage Deep Networks for Image Deconvolution 学习用于图像反卷积的判别收缩深度网络
4904 Camera Pose Estimation and Localization with Active Audio Sensing 带有主动音频传感的相机姿态估计和定位
4906 Learning Efficient Multi-Agent Cooperative Visual Exploration 学习高效的多智能体合作视觉探索
4908 4DContrast: Contrastive Learning with Dynamic Correspondences for 3D Scene Understanding 4DContrast：用于 3D 场景理解的动态对应对比学习
4910 Implicit Field Supervision For Robust Non-Rigid Shape Matching 鲁棒非刚性形状匹配的隐式场监督
4916 NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing NeuMesh：学习基于解缠结神经网格的隐式场，用于几何和纹理编辑
4918 Learned Vertex Descent: A New Direction for 3D Human Model Fitting 学习顶点下降：3D人体模型拟合的新方向
4919 KXNet: A Model-Driven Deep Neural Network for Blind Super-Resolution KXNet：用于盲超分辨率的模型驱动深度神经网络
4921 Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection 用于抗污染异常检测的分层半监督对比学习
4927 Learning to Fit Morphable Models 学习拟合可变形模型
4929 Few-Shot Classification with Contrastive Learning 使用对比学习的 Few-Shot 分类
4931 ARM: Any-Time Super-Resolution Method ARM：随时超分辨率方法
4933 Tracking Every Thing in the Wild 追踪野外的每一件事
4934 Learning Self-prior for Mesh Denoising using Dual Graph Convolutional Networks 使用对偶图卷积网络学习网格去噪的自我先验
4940 Few Zero Level Set-Shot Learning of Shape Signed Distance Functions in Feature Space 特征空间中形状符号距离函数的少量零水平集样本学习
4948 Attention-aware Learning for Hyperparameters Prediction in Image Processing Pipelines 图像处理管道中超参数预测的注意力感知学习
4950 Attaining Class-level Forgetting in Pretrained Model using Few Samples 使用少量样本在预训练模型中实现类级遗忘
4951 Data Invariants to Understand Unsupervised Out-of-Distribution Detection 用于理解无监督分布外检测的数据不变量
4953 STEEX: Steering Counterfactual Explanations with Semantics STEEX：用语义引导反事实解释
4958 Outpainting by Queries 查询外画
4961 HULC: 3D HUman Motion Capture with Pose Manifold SampLing and Dense Contact Guidance HULC：具有姿势流形采样和密集接触引导的 3D 人体运动捕捉
4962 Interpretable Open-Set Domain Adaptation via Angular Margin Separation 通过角边距分离的可解释的开放集域自适应
4963 EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices EgoBody：来自头戴式设备的交互人的人体形状和运动
4966 ViTAS: Vision Transformer Architecture Search ViTAS：视觉转换器架构搜索
4970 LaLaLoc++: Global Floor Plan Comprehension for Layout Localisation in Unvisited Environments LaLaLoc++：未访问环境中布局定位的全局平面图理解
4972 diffConv: Analyzing Irregular Point Clouds with an Irregular View diffConv：使用不规则视图分析不规则点云
4975 ReAct: Temporal Action Detection with Relational Action Queries ReAct：使用关系动作查询的时间动作检测
4976 StyleBabel: Artistic Style Tagging and Captioning StyleBabel：艺术风格标签和字幕
4977 TACS: Taxonomy Adaptive Cross-Domain Semantic Segmentation TACS：分类法自适应跨域语义分割
4983 Domain Invariant Autoencoders for Self-supervised Learning from Multi-domains 用于多域自监督学习的域不变自动编码器
4987 Learned Variational Video Color Propagation 学习变分视频颜色传播
4988 PD-Flow: A Point Cloud Denoising Framework with Normalizing Flows PD-Flow：具有标准化流的点云去噪框架
4989 RealFlow: EM-based Realistic Optical Flow Datasets Generation from Videos RealFlow：从视频中生成基于 EM 的真实光流数据集
4992 Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation 域自适应语义分割的原型对比度自适应
4996 Adversarial Contrastive Learning via Asymmetric InfoNCE 基于非对称 InfoNCE 的对抗性对比学习
4998 NeRF for Outdoor Scene Relighting 用于户外场景重新照明的 NeRF
5001 FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion FusionVAE：用于 RGB 图像融合的深度分层变分自动编码器
5007 Self-calibrating Photometric Stereo by Neural Inverse Rendering 通过神经逆向渲染自校准光度立体
5009 Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot Object Detection 逆时扩散张量变换器：Few-Shot 目标检测的新原则
5010 Semi-supervised Object Detection via Virtual Category Learning 基于虚拟类别学习的半监督目标检测
5017 Detecting Generated Images by Real Images 用真实图像检测生成的图像
5018 VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection VisageSynTalk：通过 Speech-Visage 特征选择进行未见说话人的视频到语音合成
5020 Delta Distillation for Efficient Video Processing 用于高效视频处理的 Delta 蒸馏
5026 PANDORA: A Panoramic Detection Dataset for Object with Orientation PANDORA：带有方向的对象的全景检测数据集
5032 Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation 实例作为身份：视频实例分割的通用在线范式
5034 Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment 通过关联和调整的视听不匹配感知视频检索
5036 3D Clothed Human Reconstruction in the Wild 野外场景中的 3D 着衣人体重建
5040 Classification-Regression for Chart Comprehension 用于图表理解的分类回归
5042 Zero-Shot Category-Level Object Pose Estimation 零样本类别级物体姿态估计
5044 AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant AssistQ：面向第一人称视角助手的以可供性为中心的问题驱动任务完成
5047 Laplace Mesh Transformer: Dual Attention and Topology Aware Network for 3D mesh Classification and Segmentation Laplace Mesh Transformer：用于 3D 网格分类和分割的双注意和拓扑感知网络
5048 CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition CoMER：基于 Transformer 的手写数学表达式识别的建模覆盖率
5049 RBC: Rectifying the Biased Context in Continual Semantic Segmentation RBC：纠正连续语义分割中的偏见上下文
5051 Don’t Forget Me: Accurate Background Recovery for Text Removal via Modeling Local-Global Context 不要忘记我：通过建模局部-全局上下文进行文本删除的准确背景恢复
5066 Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching 用于视网膜图像匹配的半监督关键点检测器和描述符
5069 Memory-Augmented Model-Driven Network for Pansharpening 用于全色锐化的记忆增强模型驱动网络
5076 Factorizing Knowledge in Neural Networks 在神经网络中分解知识
5080 PrivHAR: Recognizing Human Actions From Privacy-preserving Lens PrivHAR：从保护隐私的角度识别人类行为
5081 Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes 释放 Transformer：具有离散吸收扩散的并行令牌预测，用于从矢量量化代码中快速生成高分辨率图像
5082 Contrastive Vicinal Space for Unsupervised Domain Adaptation 用于无监督域适应的对比邻近空间
5083 Weight Fixing Networks 权重固定网络
5088 Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking 通过迭代自训练实现机器人料箱拣选的 Sim-to-Real 6D 物体姿态估计
5092 ChunkyGAN: Real Image Inversion via Segments ChunkyGAN：通过分段进行真实图像反转
5096 Solution Space Analysis of Essential Matrix based on Algebraic Error Minimization 基于代数误差最小化的本质矩阵解空间分析
5099 Towards Sequence-Level Training for Visual Tracking 迈向视觉跟踪的序列级训练
5100 EvAC3D: From Event-based Apparent Contours to 3D Models via Continuous Visual Hulls EvAC3D：从基于事件的表观轮廓到通过连续视觉外壳的 3D 模型
5111 Scale-aware Spatio-temporal Relation Learning for Video Anomaly Detection 用于视频异常检测的尺度感知时空关系学习
5114 Tracking by Associating Clips 通过关联剪辑进行跟踪
5117 An Information Theoretic Approach for Attention-Driven Face Forgery Detection 一种用于注意力驱动的人脸伪造检测的信息论方法
5118 Compound Prototype Matching for Few-shot Action Recognition 用于小样本动作识别的复合原型匹配
5119 Self-Promoted Supervision for Few-Shot Transformer 小样本 Transformer 的自提升监督
5122 Completely Self-Supervised Crowd Counting via Distribution Matching 基于分布匹配的完全自监督人群计数
5123 Geodesic-Former: a Geodesic-Guided Few-shot 3D Point Cloud Instance Segmenter Geodesic-Former：测地线引导的小样本 3D 点云实例分割器
5127 SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer SeedFormer：使用上采样转换器完成基于补丁种子的点云补全
5129 3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling 3D-PL：具有 3D 感知伪标签的域自适应深度估计
5136 Towards Accurate Active Camera Localization 迈向准确的主动相机定位
5138 Few-shot Object Counting and Detection 小样本目标计数和检测
5140 RealPatch: A Statistical Matching Framework for Model Patching with Real Samples RealPatch：使用真实样本进行模型修补的统计匹配框架
5142 DCCF: Deep Comprehensible Color Filter Learning Framework for High-Resolution Image Harmonization DCCF：用于高分辨率图像协调的深度可理解的滤色器学习框架
5144 GAN Cocktail: mixing GANs without dataset access GAN Cocktail：在没有数据集访问的情况下混合 GAN
5156 Coarse-To-Fine Incremental Few-Shot Learning 粗到精增量小样本学习
5157 Learning Unbiased Transferability for Domain Adaptation by Uncertainty Modeling 通过不确定性建模学习领域适应的无偏可迁移性
5158 Camera Pose Auto-Encoders for Improving Pose Regression 用于改善姿势回归的相机姿势自动编码器
5160 CoGS: Controllable Generation and Search from Sketch and Style CoGS：草图和样式的可控生成和搜索
5172 Active Audio-Visual Separation of Dynamic Sound Sources 动态声源的主动视听分离
5175 AU-aware 3D Face Reconstruction through Personalized AU-specific Blendshape Learning 通过个性化的 AU 特定 Blendshape 学习进行 AU 感知 3D 面部重建
5180 Directed Ray Distance Functions for 3D Scene Reconstruction 用于 3D 场景重建的定向射线距离函数
5189 Background-Insensitive Scene Text Recognition with Text Semantic Segmentation 具有文本语义分割的背景不敏感场景文本识别
5198 Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering 几何引导的渐进式 NeRF 用于可泛化和高效的神经人体渲染
5207 MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning MorphMLP：一种用于时空表示学习的高效 MLP 类主干
5211 Continual Variational Autoencoder Learning via Online Cooperative Memorization 通过在线合作记忆的持续变分自编码器学习
5215 Semantic Novelty Detection via Relational Reasoning 通过关系推理进行语义新奇检测
5217 FindIt: Generalized Localization with Natural Language Queries FindIt：使用自然语言查询的广义定位
5224 SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data SelectionConv：用于非直线图像数据的卷积神经网络
5226 UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling UniTAB：统一文本和框输出，用于基础视觉语言建模
5227 HairNet: Hairstyle Transfer with Pose Changes HairNet：具有姿势变化的发型转移
5234 Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition Learn2Augment：学习合成视频以增强动作识别中的数据
5235 Action-based Contrastive Learning for Trajectory Prediction 轨迹预测的基于动作的对比学习
5240 Scaling Open-vocabulary Image Segmentation with Image-level Labels 使用图像级标签缩放开放词汇图像分割
5242 Grasp’D: Differentiable Contact-rich Grasp Synthesis for Multi-fingered Hands Grasp’D：多指手的可微分接触丰富的抓握合成
5247 Improving Closed and Open-Vocabulary Attribute Prediction using Transformers 使用 Transformer 改进封闭式和开放式词汇属性预测
5251 FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context FS-COCO：走向理解上下文中常见对象的手绘草图
5252 A Contrastive Objective for Learning Disentangled Representations 学习分离表示的对比目标
5256 Unbiased Multi-Modality Guidance for Image Inpainting 图像修复的无偏多模态指导
5257 Learned Monocular Depth Priors in Visual-Inertial Initialization 在视觉惯性初始化中学习单目深度先验
5261 DexMV: Imitation Learning for Dexterous Manipulation from Human Videos DexMV：从人类视频中进行灵巧操作的模仿学习
5263 The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning 夏洛克·福尔摩斯的溯因：视觉溯因推理数据集
5265 Exploring Fine-grained Audiovisual Categorization with the SSW60 Dataset 使用 SSW60 数据集探索细粒度视听分类
5266 Radatron: Accurate Detection Using Multi-Resolution Cascaded MIMO Radar Radatron：使用多分辨率级联 MIMO 雷达进行准确检测
5270 COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality COMPOSER：仅关键点模态的视频中群体活动的组合推理
5271 Cross-Modal Knowledge Transfer Without Task-Relevant Source Data 没有任务相关源数据的跨模式知识转移
5272 The Fish Counting Dataset: A Benchmark for Multiple Object Tracking and Counting 鱼类计数数据集：多对象跟踪和计数的基准
5285 Approximate Differentiable Rendering with Algebraic Surfaces 代数曲面的近似可微渲染
5287 Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation From Monocular RGB Image 基于单目 RGB 图像的类别级别 6D 对象姿态估计的对象级别深度重建
5293 DeepMend: Learning Occupancy Functions to Represent Shape for Repair DeepMend：学习占用函数来表示修复形状
5297 Graph Neural Network for Cell Tracking in Microscopy Videos 用于显微镜视频中细胞跟踪的图神经网络
5299 Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks 反神经元水印：保护个人数据免受未经授权的神经网络的侵害
5303 Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments 用于连续环境中视觉和语言导航的 Sim-2-Sim 传输
5310 PACS: A Dataset for Physical Audiovisual Commonsense Reasoning PACS：物理视听常识推理的数据集
5315 Intelli-Paint: Towards Developing More Human-Intelligible Painting Agents Intelli-Paint：开发更多人类可理解的绘画代理
5317 Rethinking Few-Shot Object Detection on A Multi-Domain Benchmark 在多域基准上重新思考小样本目标检测
5318 LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds LidarNAS：统一和搜索 3D 点云的神经架构
5325 Improving the Intra-class Long-tail in 3D Detection via Rare Example Mining 通过罕见示例挖掘改进 3D 检测中的类内长尾
5326 Learning to Learn with Smooth Regularization 通过平滑正则化学会学习
5327 A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility 具有未知命令可行性的交互式视觉语言导航数据集
5330 CoVisPose: Co-Visibility Pose Transformer for Wide-Baseline Relative Pose Estimation in 360 Indoor Panoramas CoVisPose：用于 360 度室内全景中宽基线相对姿态估计的共可见姿态变换器
5340 PT4AL: Using Self-Supervised Pretext Tasks for Active Learning PT4AL：使用自我监督的借口任务进行主动学习
5350 Uncertainty-DTW for Time Series and Sequences 时间序列和序列的不确定性-DTW
5351 Uncertainty Quantification in Depth Estimation via Constrained Ordinal Regression 通过约束序数回归进行深度估计的不确定性量化
5358 Affine Correspondences between Multi-Camera Systems for 6DOF Relative Pose Estimation 用于 6DOF 相对位姿估计的多相机系统之间的仿射对应关系
5361 All You Need is RAW: Defending Against Adversarial Attacks with Camera Image Pipelines 您只需要 RAW：使用相机图像管道防御对抗性攻击
5362 ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer ParC-Net：具有来自 ConvNets 和 Transformer 优点的位置感知循环卷积
5369 BézierPalm: A Free lunch for Palmprint Recognition BézierPalm：掌纹识别的免费午餐
5372 A Repulsive Force Unit for Garment Collision Handling in Neural Networks 神经网络中服装碰撞处理的排斥力单元
5373 CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation CYBORGS：通过分割中的接地来对比引导对象表示
5377 Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search 使用 Transformer 连接压缩空间以进行近似最近邻搜索
5381 Training Vision Transformers with Only 2040 Images 仅使用 2040 张图像训练视觉 Transformer
5384 Black-box Few-shot Knowledge Distillation 黑盒小样本知识蒸馏
5388 AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling AutoAvatar：用于动态头像建模的自回归神经场
5392 Ghost-free High Dynamic Range Imaging with Context-aware Transformer 具有上下文感知 Transformer 的无重影高动态范围成像
5393 Cross-Domain Cross-Set Few-Shot Learning via Learning Compact and Aligned Representations 通过学习紧凑和对齐表示的跨域跨集 Few-Shot 学习
5396 Motion Transformer for Unsupervised Image Animation 用于无监督图像动画的运动转换器
5404 LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection LiDAR 蒸馏：弥合 3D 物体检测的光束诱导域间隙
5405 PSS: Progressive Sample Selection for Open-World Visual Representation Learning PSS：开放世界视觉表示学习的渐进式样本选择
5408 Self-slimmed Vision Transformer 自瘦身视觉 Transformer
5410 Switchable Online Knowledge Distillation 可切换的在线知识蒸馏
5415 Improving Self-supervised Lightweight Model Learning via Hard-aware Metric Distillation 通过硬感知度量蒸馏改进自监督轻量级模型学习
5418 Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing 用于鲁棒少样本跨域人脸反欺骗的自适应变换器
5419 GraphFit: Learning Multi-scale Graph-Convolutional Representation for Point Cloud Normal Estimation GraphFit：学习点云法线估计的多尺度图卷积表示
5422 NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion NÜWA：用于神经视觉世界创造的视觉合成预训练
5424 Are Vision Transformers Robust to Patch-wise Perturbations? 视觉 Transformer 对补丁式扰动具有鲁棒性吗？
5428 DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning DualPrompt：免排练持续学习的补充提示
5430 EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer EleGANt：用于妆容迁移的精致且可局部编辑的 GAN
5436 Union-set Multi-source Model Adaptation for Semantic Segmentation 语义分割的联合集多源模型自适应
5441 Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection 桥接图像和视频：大型词汇视频对象检测的简单学习框架
5443 TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs TDAM：用于 CNN 中上下文引导特征选择的自顶向下注意模块
5451 Exploring Disentangled Content Information for Face Forgery Detection 探索用于人脸伪造检测的分离内容信息
5458 Object Discovery via Contrastive Learning for Weakly Supervised Object Detection 通过对比学习进行弱监督目标检测的目标发现
5460 Unifying Vision Unsupervised Contrastive Learning from a Graph Perspective 从图的角度统一视觉无监督对比学习
5463 E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context E-NeRV：使用分离的时空上下文加速神经视频表示
5478 $\ell_\infty$-Robustness and Beyond: Unleashing Efficient Adversarial Training $\ell_\infty$-鲁棒性和超越：释放有效的对抗训练
5481 Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization 用于高效和高分辨率图像协调的空间分离曲线渲染网络
5484 Point MixSwap: Attentional Point Cloud Mixing via Swapping Matched Structural Divisions Point MixSwap：通过交换匹配的结构分区进行注意力点云混合
5491 One Size Does NOT Fit All: Data-Adaptive Adversarial Training 一种尺寸并不适合所有人：数据自适应对抗训练
5494 IS-MVSNet: Importance Sampling-based MVSNet IS-MVSNet：基于重要性采样的 MVSNet
5496 Multi-Granularity Pruning for Model Acceleration on Mobile Devices 用于移动设备上的模型加速的多粒度修剪
5500 Style-Agnostic Reinforcement Learning 与风格无关的强化学习
5504 Editing Out-of-domain GAN Inversion via Differential Activations 通过差分激活编辑域外 GAN 反转
5508 Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization 用于弱监督对象定位的 Bagging 区域分类激活图
5512 BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation BATMAN：用于视频对象分割的运动外观相邻空间中的双边注意变换器
5518 Mutually Reinforcing Structure with Proposal Contrastive Consistency for Few-Shot Object Detection 具有建议对比一致性的相互增强结构用于小样本目标检测
5523 Panoptic-PartFormer: Learning a Unified model for Panoptic Part Segmentation Panoptic-PartFormer：学习全景零件分割的统一模型
5536 TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers TextAdaIN：关注文本识别器中的捷径学习
5537 Speaker-adaptive Lip Reading with User-dependent Padding 具有用户相关填充的说话人自适应唇读
5541 Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions 在不断变化的条件下进行语义分割的在线域自适应
5542 Point Scene Understanding via Disentangled Instance Mesh Reconstruction 通过解开的实例网格重建理解点场景
5543 Dual Contrastive Learning with Anatomical Auxiliary Supervision for Few-shot Medical Image Segmentation 具有解剖辅助监督的双对比学习用于小样本医学图像分割
5544 An Efficient Person Clustering Algorithm for Open Checkout-free Groceries 一种用于开放式免结账杂货的高效人员聚类算法
5548 Face2Face^ρ: Real-Time High-Resolution One-Shot Face Reenactment Face2Face^ρ：实时高分辨率一次性人脸重现
5549 Decoupled Contrastive Learning 解耦对比学习
5555 Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning 抽象推理中系统泛化的学习代数表示
5556 On the Robustness of Quality Measures for GANs 关于 GAN 质量度量的稳健性
5557 Automatic Check-Out via Prototype-based Classifier Learning from Single-Product Exemplars 通过基于原型的分类器从单产品示例中学习自动签出
5559 TDViT: Temporal Dilated Transformer for Dense Video Tasks TDViT：用于密集视频任务的时间膨胀 Transformer
5561 POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion POP：通过webly跨模态查询扩展挖掘新时尚产品的潜在性能
5564 BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis BRACE：舞蹈动作合成的霹雳舞比赛数据集
5578 Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation 通过场景消歧实现种族无偏肤色估计
5580 Style-Guided Shadow Removal 风格引导的阴影去除
5584 Sound-guided Semantic Video Generation 声音引导的语义视频生成
5585 Robust Visual Tracking by Segmentation 强大的分割视觉跟踪
5591 Semi-Supervised Learning of Optical Flow by Flow Supervisor Flow Supervisor 对光流的半监督学习
5595 Joint Learning of Localized Representations from Medical Images and Reports 从医学图像和报告中联合学习局部化表示
5599 D2C-SR: A Divergence to Convergence Approach for Real-World Image Super-Resolution D2C-SR：现实世界图像超分辨率的发散收敛方法
5612 Continual 3D Convolutional Neural Networks for Real-time Processing of Videos 用于视频实时处理的连续 3D 卷积神经网络
5613 Salient Object Detection for Point Clouds 点云的显著目标检测
5616 Deep ensemble learning by diverse knowledge distillation for fine-grained object classification 用于细粒度对象分类的多样化知识蒸馏深度集成学习
5619 Source-free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition 通过学习时间一致性进行动作识别的无源视频域自适应
5622 DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras DiffuStereo：使用稀疏相机通过基于扩散的立体进行高质量人体重建
5643 GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training GRIT-VLP：用于高效视觉和语言预训练的分组小批量采样
5644 Pose Forecasting in Industrial Human-Robot Collaboration 工业人机协作中的姿态预测
5648 MeshLoc: Mesh-Based Visual Localization MeshLoc：基于网格的视觉定位
5660 Dress Code: High-Resolution Multi-Category Virtual Try-On 着装要求：高分辨率多类别虚拟试穿
5661 UC-OWOD: Unknown-Classified Open World Object Detection UC-OWOD：未知分类开放世界对象检测
5666 Helpful or Harmful: Inter-Task Association in Continual Learning 有益或有害：持续学习中的任务间关联
5667 The Challenges of Continuous Self-Supervised Learning 持续自我监督学习的挑战
5669 RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers RayTran：使用光线追踪变换器从视频中对多个对象进行 3D 姿态估计和形状重建
5670 Deep Radial Embedding for Visual Sequence Learning 用于视觉序列学习的深度径向嵌入
5673 Efficient Point Cloud Segmentation with Geometry-aware Sparse Networks 使用几何感知稀疏网络进行高效点云分割
5677 Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition 用于细粒度动作识别的动态时空专业化学习
5685 TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation TISE：文本到图像合成评估的指标包
5688 CostDCNet: Cost Volume based Depth Completion for a Single RGB-D Image CostDCNet：单个 RGB-D 图像的基于成本体积的深度完成
5697 Efficient Video Deblurring Guided by Motion Magnitude 由运动幅度引导的高效视频去模糊
5702 Space-Partitioning RANSAC 空间分区 RANSAC
5704 Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies 通过对上下文依赖建模实现准确的二元神经网络
5712 Overcoming Shortcut Learning in a Target Domain by Generalizing Basic Visual Factors from a Source Domain 通过从源域概括基本视觉因素来克服目标域中的捷径学习
5713 Shape-Pose Disentanglement using SE(3)-equivariant Vector Neurons 使用 SE(3) 等变向量神经元的形状-姿势解缠结
5721 SimpleRecon: 3D Reconstruction Without 3D Convolutions SimpleRecon：没有 3D 卷积的 3D 重建
5739 SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding SemAug：通过语言基础进行目标检测的语义上有意义的图像增强
5740 A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering 一种以数据为中心的方法，用于结合半监督分类和聚类来改进模糊标签
5750 SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks SPIN：各向同性网络共享参数的实证评估
5754 SAGA: Stochastic Whole-Body Grasping With Contact SAGA：随机的全身抓取与接触
5761 GTCaR: Graph Transformer for Camera Re-localization GTCaR：用于相机重新定位的图形转换器
5763 3D Object Detection with a Self-supervised Lidar Scene Flow Backbone 使用自监督激光雷达场景流主干进行 3D 对象检测
5764 Actor-centered Representations for Action Localization in Streaming Videos 流视频中动作定位的以演员为中心的表示
5769 Photo-realistic Neural Domain Randomization 逼真的神经域随机化
5770 ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization ShAPO：多对象形状、外观和姿势优化的隐式表示
5771 Structure and Motion for Casual Videos 休闲视频的结构和运动
5775 Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model 单帧大气湍流缓解：基准研究和受物理启发的新 Transformer 模型
5778 Incremental Task Learning with Incremental Rank Updates 具有增量等级更新的增量任务学习
5787 Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT 用于物联网中 DNN 推理卸载的带宽感知自适应编解码器
5789 Inpainting at Modern Camera Resolution by Guided PatchMatch with Auto-Curation 通过具有自动管理的 Guided PatchMatch 以现代相机分辨率进行修复
5794 Controllable Video Generation through Global and Local Motion Dynamics 通过全局和局部运动动力学生成可控视频
5812 UniCR: Universally Approximated Certified Robustness via Randomized Smoothing UniCR：通过随机平滑获得普遍近似的认证稳健性
5829 3D Siamese Transformer Network for Single Object Tracking on Point Clouds 用于点云上单个对象跟踪的 3D Siamese Transformer 网络
5837 Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips 使用 Bit Flips 对神经网络进行难以察觉的特洛伊木马攻击
5856 StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN StyleHEAT：通过预训练 StyleGAN 生成一次性高分辨率可编辑说话人脸
5859 Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance 使用条件式无分类器引导的自然图像指代对象操控
5880 Self-Supervised Interactive Object Segmentation Through a Singulation-and-Grasping Approach 通过分离与抓取方法的自监督交互式对象分割
5898 BigColor: Colorization using a Generative Color Prior for Natural Images BigColor：使用生成颜色先验对自然图像进行着色
5901 Object Wake-up: 3D Object Rigging from a Single Image 对象唤醒：来自单个图像的 3D 对象绑定
5905 ClearPose: Large-scale Transparent Object Dataset and Benchmark ClearPose：大规模透明对象数据集和基准
5907 Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment 用于锻炼形式评估的领域知识知情自我监督表示
5908 Neural Capture of Animatable 3D Human from Monocular Video 从单目视频中对动画 3D 人体进行神经捕获
5913 Open Vocabulary Object Detection with Pseudo Bounding-Box Labels 带有伪边界框标签的开放词汇对象检测
5914 BoundaryFace: A mining framework with noise label self-correction for Face Recognition BoundaryFace：用于人脸识别的具有噪声标签自校正的挖掘框架
5915 IntegratedPIFu: Integrated Pixel Aligned Implicit Function for Single-view Human Reconstruction IntegratedPIFu：用于单视图人体重建的集成像素对齐隐式函数
5922 BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy for Source-free Domain Adaptation BMD：用于无源域适应的通用类平衡多中心动态原型策略
5923 What Matters for 3D Scene Flow Network 3D 场景流网络的重要性
5932 Controllable Shadow Generation Using Pixel Height Maps 使用像素高度图的可控阴影生成
5937 CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution CADyQ：图像超分辨率的内容感知动态量化
5940 SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection SPSN：用于 RGB-D 显著目标检测的超像素原型采样网络
5950 Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer 使用与时间无关的 VQGAN 和时间敏感的 Transformer 生成长视频
5951 Combining Internal and External Constraints for Unrolling Shutter in Videos 结合内部和外部约束在视频中展开快门
5961 Global Spectral Filter Memory Network for Video Object Segmentation 用于视频对象分割的全局光谱滤波器记忆网络
5964 SEMICON: A Learning-to-hash Solution for Large-scale Fine-grained Image Retrieval SEMICON：用于大规模细粒度图像检索的学习哈希解决方案
5966 Batch-efficient EigenDecomposition for Small and Medium Matrices 中小型矩阵的批量有效特征分解
5972 General Object Pose Transformation Network from Unpaired Data 来自未配对数据的通用对象姿态变换网络
5974 Robust Network Architecture Search via Feature Distortion Restraining 通过特征失真约束进行鲁棒网络架构搜索
5988 Correspondence Reweighted Translation Averaging 对应关系重加权的平移平均
5991 FH-Net: A Fast Hierarchical Network for Scene Flow Estimation on Real-world Point Clouds FH-Net：一种用于真实世界点云场景流估计的快速分层网络
5993 RepMix: Representation Mixing for Robust Attribution of Synthesized Images RepMix：用于合成图像鲁棒属性的表示混合
6000 When Deep Classifiers Agree: Analyzing Correlations between Learning Order and Image Statistics 当深度分类器同意时：分析学习顺序和图像统计之间的相关性
6002 S2F2: Single-Stage Flow Forecasting for Future Multiple Trajectories Prediction S2F2：未来多轨迹预测的单阶段流量预测
6004 Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations 使用 Bag-of-Visual-Words 表示的知识提炼的 Few-Shot 目标检测
6009 Stochastic Consensus: Enhancing Semi-Supervised Learning with Consistency of Stochastic Classifiers 随机共识：通过随机分类器的一致性增强半监督学习
6011 Learning Where To Look – Generative NAS is Surprisingly Efficient 学习去哪里寻找——生成式 NAS 的效率惊人地高
6023 Realistic One-shot Mesh-based Head Avatars 逼真的一次性基于网格的头部头像
6024 Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning 集成知识引导的子网络搜索和过滤器修剪微调
6037 SALISA: Saliency-based Input Sampling for Efficient Video Object Detection SALISA：用于高效视频对象检测的基于显著性的输入采样
6039 Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer 通过多尺度时空分割注意变换器进行视频实例分割
6044 RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation RankSeg：自适应像素分类与图像类别排名进行分割
6046 Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression Contextformer：具有空间通道注意的 Transformer，用于学习图像压缩中的上下文建模
6048 Image Super-Resolution with Deep Dictionary 具有深度字典的图像超分辨率
6054 ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement ECO-TR：通过从粗到细的细化找到有效的对应关系
6056 Responsive Listening Head Generation: A Benchmark Dataset and Baseline 响应式聆听头部生成：基准数据集和基线
6063 WISE: Whitebox Image Stylization by Example-based Learning WISE：基于示例学习的白盒图像风格化
6067 3D Equivariant Graph Implicit Functions 3D 等变图隐函数
6068 AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment AnimeCeleb：用于头部重演的大型动画 CelebHeads 数据集
6076 Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics 通过集成 IMU 运动动力学实现尺度感知、鲁棒和可概括的无监督单目深度估计
6078 Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection 用于异常检测的具有自适应聚类器的动态局部聚合网络
6080 Learning Semantic Segmentation from Multiple Datasets with Label Shifts 从具有标签移位的多个数据集中学习语义分割
6086 SecretGen: Privacy Recovery on Pre-trained Models via Distribution Discrimination SecretGen：通过分布判别在预训练模型上恢复隐私
6090 A Kendall Shape Space Approach to 3D Shape Estimation from 2D Landmarks 从 2D 地标估计 3D 形状的 Kendall 形状空间方法
6092 Temporally Consistent Transformer for Video Denoising 用于视频去噪的时间一致 Transformer
6093 Action Quality Assessment with Temporal Parsing Transformer 使用时间解析转换器进行动作质量评估
6097 A study of Pre-training strategies and datasets for facial representation learning 面部表征学习的预训练策略和数据集研究
6108 Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting 来自中心的投票：通过径向关键点投票在 RGB-D 图像中进行 6 DoF 姿态估计
6112 Neural Strands: Learning Hair Geometry and Appearance from Multi-View Images 神经链：从多视图图像中学习头发的几何形状和外观
6114 Conditional Stroke Recovery for Fine-Grained Sketch-Based Image Retrieval 基于细粒度草图的图像检索的条件笔画恢复
6123 Generalized Brain Image Synthesis with Transferable Convolutional Sparse Coding Networks 具有可转移卷积稀疏编码网络的广义脑图像合成
6127 Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning Wave-ViT：统一小波和变换器用于视觉表示学习
6129 GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs GraphCSPN：通过动态 GCN 完成几何感知深度
6132 Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization 用于弱监督多步定位的视频接地流程图
6138 Revisiting Batch Norm Initialization 重新审视批量规范初始化
6141 NewsStories: Illustrating articles with visual summaries NewsStories：用视觉摘要说明文章
6143 Neural Radiance Transfer Fields for Relightable Novel-view Synthesis with Global Illumination 具有全局照明的可重新照明的新视图合成的神经辐射转移场
6144 Improving Few-Shot Learning through Multi-task Representation Learning Theory 通过多任务表示学习理论改进 Few-Shot 学习
6145 Deep Semantic Statistics Matching (D2SM) Denoising Network 深度语义统计匹配 (D2SM) 去噪网络
6148 Long-tailed Instance Segmentation using Gumbel Optimized Loss 使用 Gumbel 优化损失的长尾实例分割
6162 DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection DetMatch：两位老师在联合 2D 和 3D 半监督目标检测方面胜过一位
6177 3D Scene Inference from Transient Histograms 从瞬态直方图推断 3D 场景
6178 SSBNet: Improving Visual Recognition Efficiency by Adaptive Sampling SSBNet：通过自适应采样提高视觉识别效率
6180 Learning Topological Interactions for Multi-Class Medical Image Segmentation 学习多类医学图像分割的拓扑交互
6182 Deep 360° Optical Flow Estimation by Multi-Projection Fusion 基于多投影融合的深度 360° 光流估计
6185 Look Both Ways: Self-Supervising Driver Gaze Estimation and Road Scene Saliency 双向观察：自我监督的驾驶员注视估计和道路场景显着性
6187 Neural Space-filling Curves 神经空间填充曲线
6191 ObjectBox: From Centers to Boxes for Anchor-Free Object Detection ObjectBox：从中心到无锚对象检测的框
6192 MFIM: Megapixel Facial Identity Manipulation MFIM：百万像素面部识别操作
6193 Unsupervised Segmentation in Real-World Images via Spelke Object Inference 通过 Spelke 对象推理在真实世界图像中进行无监督分割
6194 Objects Can Move: 3D Change Detection by Geometric Transformation Consistency 物体可以移动：通过几何变换一致性检测 3D 变化
6199 MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration MUGEN：视频-音频-文本多模态理解和生成的游乐场
6203 PatchRD: Detail-Preserving Shape Completion by Learning Patch Retrieval and Deformation PatchRD：通过学习补丁检索和变形来完成保留细节的形状
6207 Network Binarization via Contrastive Learning 通过对比学习进行网络二值化
6210 Lipschitz Continuity Retained Binary Neural Network Lipschitz 连续性保留二元神经网络
6212 Is Geometry Enough for Matching in Visual Localization? 几何图形是否足以匹配视觉定位？
6214 Webly Supervised Concept Expansion for General Purpose Vision Models 通用视觉模型的网络监督概念扩展
6216 Compositional Human-Scene Interaction Synthesis with Semantic Control 具有语义控制的组合人景交互合成
6218 MaCLR: Motion-aware Contrastive Learning of Representations for Videos MaCLR：视频表示的运动感知对比学习
6220 Transformers as Meta-Learners for Implicit Neural Representations Transformers 作为隐式神经表示的元学习器
6222 RAWtoBit: A Fully End-to-end Camera ISP Network RAWtoBit：一个完全端到端的相机 ISP 网络
6227 SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention SpatialDETR：基于具有全局交叉传感器注意的多视图相机图像的鲁棒可扩展基于变换器的 3D 对象检测
6228 3D Face Reconstruction with Dense Landmarks 具有密集地标的 3D 人脸重建
6236 SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds SWFormer：用于点云中 3D 对象检测的稀疏窗口转换器
6243 A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing 用于室内和室外场景解析的密集材料分割数据集
6247 Incomplete Multi-view Domain Adaptation via Channel Enhancement and Knowledge Transfer 通过通道增强和知识转移的不完全多视图域适应
6250 Exposure-Aware Dynamic Weighted Learning for Single-Shot HDR Imaging 单次 HDR 成像的曝光感知动态加权学习
6259 Seeing through a Black Box: Toward High-Quality Terahertz Imaging via Subspace-and-Attention Guided Restoration 看穿黑匣子：通过子空间和注意力引导恢复实现高质量太赫兹成像
6265 SPViT: Enabling Faster Vision Transformers via Soft Token Pruning SPViT：通过软令牌修剪实现更快的视觉转换器
6269 Soft Masking for Cost-Constrained Channel Pruning 成本受限通道修剪的软掩蔽
6271 Ensemble Learning Priors Driven Deep Unfolding for Scalable Snapshot Compressive Imaging 集成学习先验驱动可扩展快照压缩成像的深度展开
6275 A Simple Baseline for Open Vocabulary Semantic Segmentation with Pre-trained Vision-language Model 具有预训练视觉语言模型的开放词汇语义分割的简单基线
6276 Triangle Attack: A Query-efficient Decision-based Adversarial Attack 三角攻击：一种查询效率高的基于决策的对抗性攻击
6282 Tailoring Self-Supervision for Supervised Learning 为监督学习定制自我监督
6283 Difficulty-Aware Simulator for Open Set Recognition 开放集识别的难度感知模拟器
6287 Non-Uniform Step Size Quantization for Accurate Post-Training Quantization 用于精确训练后量化的非均匀步长量化
6295 Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentation on Complex Urban Driving Scenes 复杂城市驾驶场景异常分割的像素级能量偏置弃权学习
6298 FedVLN: Privacy-preserving Federated Vision-and-Language Navigation FedVLN：保护隐私的联合视觉和语言导航
6305 Data-free Backdoor Removal Based on Channel Lipschitzness 基于通道 Lipschitzness 的无数据后门去除
6312 SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning SuperTickets：通过联合架构搜索和参数修剪从超网中绘制与任务无关的彩票
6316 PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry PCR-CG：通过深度显式颜色和几何进行点云配准
6323 DistPro: Searching A Fast Knowledge Distillation Process via Meta Optimization DistPro：通过元优化搜索快速知识蒸馏过程
6324 Tomography of Turbulence Strength Based on Scintillation Imaging 基于闪烁成像的湍流强度断层扫描
6325 Realistic Blur Synthesis for Learning Image Deblurring 用于学习图像去模糊的逼真模糊合成
6326 Identifying Hard Noise in Long-Tailed Sample Distribution 识别长尾样本分布中的硬噪声
6328 GLAMD: Global and Local Attention Mask Distillation for Object Detectors GLAMD：对象检测器的全局和局部注意掩码蒸馏
6337 Meta-GF: Training Dynamic-Depth Neural Networks Harmoniously Meta-GF：和谐地训练动态深度神经网络
6338 CXR Segmentation by AdaIN-based Domain Adaptation and Knowledge Distillation 基于 AdaIN 的领域适应和知识蒸馏的 CXR 分割
6342 Emotion-aware Multi-view Contrastive Learning for Facial Emotion Recognition 用于面部情绪识别的情绪感知多视图对比学习
6356 FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection FCAF3D：完全卷积无锚的 3D 对象检测
6365 Video Dialog as Conversation about Objects Living in Space-Time 视频对话作为关于生活在时空中的物体的对话
6366 Few-Shot Class-Incremental Learning from an Open-Set Perspective 开放集视角下的 Few-Shot Class-Incremental Learning
6380 ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation ML-BPM：具有双向光度混合的多教师学习，用于语义分割中的开放复合域自适应
6389 DRCNet: Dynamic Image Restoration Contrastive Network DRCNet：动态图像恢复对比网络
6394 Order Learning Using Partially Ordered Data via Chainization 通过链化使用部分有序数据进行有序学习
6395 Style Your Hair: Latent Optimization for Pose-Invariant Hairstyle Transfer via Local-Style-Aware Hair Alignment 设计你的头发：通过局部风格感知头发对齐对姿势不变的发型转移进行潜在优化
6403 High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions 具有未对准和遮挡处理条件的高分辨率虚拟试穿
6418 Zero-Shot Learning for Reflection Removal of Single 360-Degree Image 单一 360 度图像反射去除的零样本学习
6420 A Codec Information Assisted Framework for Efficient Compressed Video Super-Resolution 用于高效压缩视频超分辨率的编解码器信息辅助框架
6421 Towards Ultra Low Latency Spiking Neural Networks for Vision and Sequential Tasks Using Temporal Pruning 使用时间剪枝实现用于视觉和顺序任务的超低延迟尖峰神经网络
6439 MimicME: A Large Scale Diverse 4D Database for Facial Expression Analysis MimicME：用于面部表情分析的大规模多样化 4D 数据库
6441 Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack Black-Box Dissector：迈向基于擦除的硬标签模型窃取攻击
6451 Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles 通过解耦时空拼图进行视频异常检测
6454 Towards Accurate Network Quantization with Equivalent Smooth Regularizers 使用等效平滑正则化器实现准确的网络量化
6455 DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model DiffuseMorph：使用扩散模型的无监督可变形图像配准
6459 An Impartial Take to the CNN vs Transformer Robustness Contest 公正地参加 CNN 与 Transformer 鲁棒性竞赛
6460 CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval CODER：用于图像-文本检索的耦合多样性敏感动量对比学习
6463 Weakly Supervised 3D Scene Segmentation with Region-Level Boundary Awareness and Instance Discrimination 具有区域级边界感知和实例识别的弱监督 3D 场景分割
6471 FOSTER: Feature Boosting and Compression for Class-Incremental Learning FOSTER：用于类增量学习的特征提升和压缩
6472 Delving into Universal Lesion Segmentation: Method, Dataset, and Benchmark 深入研究通用病变分割：方法、数据集和基准
6475 Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization 通过混合精度量化的平滑正则化显式模型大小控制和松弛
6479 Large scale Real-world Multi Person Tracking 大规模真实世界多人跟踪
6491 Class-agnostic Object Detection with Multi-modal Transformer 使用多模态 Transformer 的与类别无关的目标检测
6493 Language-Grounded Indoor 3D Semantic Segmentation in the Wild 野外基于语言的室内 3D 语义分割
6505 Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis 将可控 NeRF-GAN 的 3D 感知注入 StyleGAN 以进行可编辑的人像图像合成
6512 BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks BASQ：亚 4 位神经网络的分支激活裁剪搜索量化
6513 AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields AdaNeRF：用于实时渲染神经辐射场的自适应采样
6515 PressureVision: Estimating Hand Pressure from a Single RGB Image PressureVision：从单个 RGB 图像估计手压
6516 Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion 具有可微分虚拟对象插入的街道场景的神经光场估计
6519 Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation 基于层次聚合的树结构感知少样本图像分类
6526 PoseScript: 3D Human Poses from Natural Language PoseScript：来自自然语言的 3D 人体姿势
6532 Learning Energy-Based Models With Adversarial Training 通过对抗训练学习基于能量的模型
6538 You Already Have It: A Generator-Free Low-Precision DNN Training Framework using Stochastic Rounding 您已经拥有它：使用随机舍入的无生成器低精度 DNN 训练框架
6540 TIPS: Text-Induced Pose Synthesis TIPS：文本诱导的姿势合成
6541 Unsupervised High-Fidelity Facial Texture Generation and Reconstruction 无监督高保真面部纹理生成和重建
6551 Addressing Heterogeneity in Federated Learning via Distributional Transformation 通过分布转换解决联邦学习中的异质性
6555 Adversarial Label Poisoning Attack on Graph Neural Networks via Label Propagation 通过标签传播对图神经网络进行对抗性标签中毒攻击
6559 Approximate Discrete Optimal Transport Plan with Auxiliary Measure Method 辅助测度法的近似离散最优运输方案
6560 Visual Knowledge Tracing 视觉知识追踪
6562 Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning 半泄漏：针对半监督学习的成员推理攻击
6565 DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation DProST：用于 6D 姿态估计的动态投影空间变换器网络
6567 Accurate Detection of Proteins in Cryo-Electron Tomograms from Sparse Labels 从稀疏标签准确检测冷冻电子断层图像中的蛋白质
6568 PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks PACTran：用于估计预训练模型对分类任务的可迁移性的 PAC 贝叶斯度量
6571 Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs 超越周期性：迈向坐标-MLP 激活的统一框架
6576 Subspace Diffusion Generative Models 子空间扩散生成模型
6583 Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features 多模态文本识别网络：视觉和语义特征之间的交互增强
6592 Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments 通过外观和时间对齐的归纳式和直推式少样本视频分类
6599 Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection 学习用于主动说话人检测的长期时空图
6602 Relative Contrastive Loss for Unsupervised Representation Learning 无监督表示学习的相对对比损失
6615 Personalized Education: Blind Knowledge Distillation 个性化教育：盲目的知识蒸馏
6619 Fast Two-View Motion Segmentation Using Christoffel Polynomials 使用 Christoffel 多项式的快速两视图运动分割
6623 Real Spike: Learning Real-valued Spikes for Spiking Neural Networks 真正的尖峰：学习尖峰神经网络的实值尖峰
6627 Language-Driven Artistic Style Transfer 语言驱动的艺术风格迁移
6634 FedLTN: Federated Learning for Sparse and Personalized Lottery Ticket Networks FedLTN：稀疏和个性化彩票网络的联合学习
6639 Transformer with Implicit Edges for Particle-based Physics Simulation 用于基于粒子的物理模拟的隐式边 Transformer
6651 Improving the Perceptual Quality of 2D Animation Interpolation 提高 2D 动画插值的感知质量
6652 Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning 使用基于提示的微调实现开放词汇场景图生成
6655 S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning S3C：小样本增量学习的自监督随机分类器
6660 Entry-Flipped Transformer for Inference and Prediction of Participant Behavior 用于推断和预测参与者行为的 Entry-Flipped Transformer
6665 OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning OpenLDN：学习发现开放世界半监督学习的新类
6666 Fine-grained Fashion Representation Learning by Online Deep Clustering 通过在线深度聚类进行细粒度时尚表示学习
6667 Perspective Phase Angle Model for Polarimetric 3D Reconstruction 用于极化 3D 重建的透视相角模型
6670 Selective TransHDR: Transformer-based selective HDR Imaging using Ghost Region Mask Selective TransHDR：使用 Ghost Region Mask 的基于 Transformer 的选择性 HDR 成像
6671 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal 通过手部去遮挡和移除的 3D 交互手部姿势估计
6672 Pose for Everything: Towards Category-Agnostic Pose Estimation Pose for Everything：迈向与类别无关的姿势估计
6678 Recover Fair Deep Classification Models via Altering Pre-trained Structure 通过改变预训练结构恢复公平的深度分类模型
6680 Improving Fine-Grained Visual Recognition in Low Data Regimes via Self-Boosting Attention Mechanism 通过自增强注意力机制改进低数据区域中的细粒度视觉识别
6686 VSA: Learning Varied-Size Window Attention in Vision Transformers VSA：在视觉 Transformer 中学习不同大小的窗口注意力
6693 PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting PoseGPT：基于量化的 3D 人体运动生成和预测
6694 CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification CAViT：用于视频对象重新识别的上下文对齐视觉转换器
6698 Learning Series-Parallel Lookup Tables for Efficient Image Super-Resolution 学习高效图像超分辨率的串并行查找表
6715 Frozen CLIP Models are Efficient Video Learners Frozen CLIP 模型是高效的视频学习者
6719 Deforming Radiance Fields with Cages 用笼子变形辐射场
6720 GeoAug: Data Augmentation for Few-Shot NeRF with Geometry Constrains GeoAug：具有几何约束的 Few-Shot NeRF 的数据增强
6722 DoodleFormer: Creative Sketch Drawing with Transformers DoodleFormer：使用 Transformer 进行创意草图绘制
6727 Implicit Neural Representations for Variable Length Human Motion Generation 可变长度人体运动生成的隐式神经表示
6730 FLEX: Extrinsic Parameters-free Multi-view 3D Human Motion Reconstruction FLEX：无外部参数的多视图 3D 人体运动重建
6731 Pairwise Contrastive Learning Network for Action Quality Assessment 用于行动质量评估的成对对比学习网络
6739 UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection UIA-ViT：基于视觉转换器的无监督不一致性感知方法用于人脸伪造检测
6742 Large-displacement 3D Object Tracking with Hybrid Non-local Optimization 具有混合非局部优化的大位移 3D 对象跟踪
6745 Learning Object Placement via Dual-path Graph Completion 通过双路径图完成学习对象放置
6777 Unbiased Manifold Augmentation for Coarse Class Subdivision 粗分类的无偏流形增强
6798 Rethinking Video Rain Streak Removal: A New Synthesis Model and A Deraining Network with Video Rain Prior 重新思考视频雨条纹去除：一种新的综合模型和具有视频雨先验的去雨网络
6817 Expanded Adaptive Scaling Normalization for End to End Image Compression 端到端图像压缩的扩展自适应缩放归一化
6827 Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets 嵌入对比无监督特征以在损坏的图像数据集中对分布内和分布外噪声进行聚类
6835 Filter Pruning via Feature Discrimination in Deep Neural Networks 通过深度神经网络中的特征判别进行过滤器修剪
6836 VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer VoViT：基于图的低延迟视听语音分离 Transformer
6837 SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition SGBANet：用于任意方向场景文本识别的语义 GAN 和平衡注意力网络
6838 DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition DenseHybrid：用于密集开放集识别的混合异常检测
6862 D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights D2-TPred：交通灯下轨迹预测的不连续依赖
6867 Where in the World is this Image? Transformer-based Geo-localization in the Wild 这张图像拍摄于世界何处？基于 Transformer 的野外地理定位
6884 MODE: Multi-view Omnidirectional Depth Estimation with 360-degree Cameras MODE：使用 360 度相机进行多视图全向深度估计
6895 NashAE: Disentangling Representations through Adversarial Covariance Minimization NashAE：通过对抗协方差最小化解开表示
6900 Rethinking Confidence Calibration for Failure Prediction 重新思考故障预测的置信度校准
6905 Colorization for in situ marine plankton images 原位海洋浮游生物图像的着色
6912 PIP: Physical Interaction Prediction via Mental Simulation with Span Selection PIP：通过心理模拟和跨度选择进行物理交互预测
6917 Generator Knows What Discriminator Should Learn in Unconditional GANs 生成器知道判别器应该在无条件 GAN 中学习什么
6921 A Gyrovector Space Approach for Symmetric Positive Semi-definite Matrix Learning 一种用于对称正半定矩阵学习的陀螺向量空间方法
6940 Compositional Visual Generation with Composable Diffusion Models 具有可组合扩散模型的组合视觉生成
6942 Temporal and cross-modal attention for audio-visual zero-shot learning 视听零样本学习的时间和跨模态注意
6946 Telepresence Video Quality Assessment 网真视频质量评估
6955 Enhancing Multi-modal Features Using Local Self-attention for 3D Object Detection 使用局部自注意增强多模态特征以进行 3D 对象检测
6956 Totems: Physical Objects for Verifying Visual Integrity 图腾：用于验证视觉完整性的物理对象
6959 ManiFest: manifold deformation for few-shot image translation ManiFest：用于少样本图像转换的流形变形
6963 3D Shape Sequence of Human Comparison and Classification using Current and Varifolds 使用 Current 和 Varifolds 进行人体比较和分类的 3D 形状序列
6971 Decouple-and-Sample: Protecting sensitive information in task agnostic data release 解耦和采样：在与任务无关的数据发布中保护敏感信息
6972 Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space 并非所有模型都是平等的：在自我挑战的 Fisher 空间中预测模型可迁移性
6973 Object Detection as Probabilistic Set Prediction 对象检测作为概率集预测
6974 k-SALSA: k-anonymous synthetic averaging of retinal images via local style alignment k-SALSA：通过局部样式对齐对视网膜图像进行 k-匿名合成平均
6976 Uncertainty-guided Source-free Domain Adaptation 不确定性引导的无源域适应
6978 LA3: Efficient Label-Aware AutoAugment LA3：高效的标签感知 AutoAugment
6982 Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions 具有分层原子动作的细粒度视频的弱监督时间动作检测
6986 Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos 几何特征告知视频中的多人人-物交互识别
6990 FEAR: Fast, Efficient, Accurate and Robust Visual Tracker FEAR：快速、高效、准确和强大的视觉跟踪器
6997 Variance-Aware Weight Initialization for Point Convolutional Neural Networks 点卷积神经网络的方差感知权重初始化
7004 Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training 从模态共享对比语言图像预训练中学习视觉表示
7016 Single-Stream Multi-Level Alignment for Vision-Language Pretraining 视觉语言预训练的单流多级对齐
7022 Revisiting Outer Optimization in Adversarial Training 重新审视对抗训练中的外部优化
7027 Supervised Attribute Information Removal and Reconstruction for Image Manipulation 用于图像处理的有监督属性信息去除与重建
7028 Conditional-Flow NeRF: Accurate 3D Modelling with Reliable Uncertainty Quantification 条件流 NeRF：具有可靠不确定性量化的准确 3D 建模
7035 BLT: Bidirectional Layout Transformer for Controllable Layout Generation BLT：用于可控版图生成的双向版图转换器
7039 Neural Correspondence Field for Object Pose Estimation 物体姿态估计的神经对应场
7043 The Missing Link: Finding label relations across datasets 缺失的环节：跨数据集查找标签关系
7044 On Label Granularity and Object Localization 关于标签粒度和对象定位
7045 RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention-guided Disease Classification RadioTransformer：用于视觉注意力引导疾病分类的级联全局-焦点 Transformer
7048 OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search OIMNet++：用于人员搜索的原型归一化和本地化感知学习
7050 Most and Least Retrievable Images in Visual-Language Query Systems 视觉语言查询系统中最多和最少可检索的图像
7051 Contrasting quadratic assignments for set-based representation learning 对比基于集合的表示学习的二次分配
7061 How stable are Transferability Metrics evaluations? Transferability Metrics 评估的稳定性如何？
7070 A Comparative Study of Graph Matching Algorithms in Computer Vision 计算机视觉中图匹配算法的比较研究
7077 HM: Hybrid Masking for Few-Shot Segmentation HM：用于少样本分割的混合掩蔽
7082 UCTNet: Uncertainty-aware Cross-modal Transformer Network for Indoor RGB-D Semantic Segmentation UCTNet：用于室内 RGB-D 语义分割的不确定性感知跨模态变换器网络
7090 Learning Omnidirectional Flow in 360° Video via Siamese Representation 通过连体表示学习 360° 视频中的全向流
7092 PREF: Predictability Regularized Neural Motion Fields PREF：可预测性正则化神经运动场
7093 Improving Generalization in Federated Learning by Seeking Flat Minima 通过寻求平坦最小值来提高联邦学习的泛化能力
7099 Efficient Deep Visual and Inertial Odometry with Adaptive Visual Modality Selection 具有自适应视觉模态选择的高效深度视觉和惯性里程计
7102 MultiMAE: Multi-modal Multi-task Masked Autoencoders MultiMAE：多模式多任务掩码自动编码器
7110 GigaDepth: Learning Depth from Structured Light with Branching Neural Networks GigaDepth：使用分支神经网络从结构光学习深度
7122 Diverse Generation from a Single Video Made Possible 从单个视频进行多样化生成成为可能
7127 Privacy-Preserving Action Recognition via Motion Difference Quantization 基于运动差分量化的隐私保护动作识别
7139 Learning Phase Mask for Privacy-Preserving Passive Depth Estimation 用于隐私保护被动深度估计的相位掩码学习
7143 DuelGAN: A Duel Between Two Discriminators Stabilizes the GAN Training DuelGAN：两个判别器之间的决斗稳定了 GAN 训练
7151 Should All Proposals be Treated Equally in Object Detection? 在目标检测中是否应该平等对待所有提案？
7153 Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps 解释通过摊销推断显着图引导网络修剪
7158 Out-of-Distribution Identification: Let Detector Tell Which I Am Not Sure 分布外识别：让检测器说出我不确定的情况
7167 Unsupervised Few-Shot Image Classification by Learning Features into Clustering Space 通过将特征学习到聚类空间的无监督少样本图像分类
7173 ViP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers ViP：使用 Vision Transformers 进行补丁攻击的统一认证检测和恢复
7174 Panoramic Vision Transformer for Saliency Detection in 360 Videos 用于 360 度视频中的显着性检测的全景视觉转换器
7175 ActiveNeRF: Learning where to See with Uncertainty Estimation ActiveNeRF：通过不确定性估计学习在哪里看
7176 incDFM: Incremental Deep Feature Modeling for Continual Novelty Detection incDFM：用于持续新颖性检测的增量深度特征建模
7186 BA-Net: Bridge Attention for Deep Convolutional Neural Networks BA-Net：深度卷积神经网络的桥梁注意力
7199 Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images 通过预测偏移量实现超分辨率：光栅化图像的超高效超分辨率网络
7210 Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance 来自模糊的动画：具有运动引导的多模态模糊分解
7211 Zero-Shot Attribute Attacks on Fine-Grained Recognition Models 细粒度识别模型上的零样本属性攻击
7214 Break and Make: Interactive Structural Understanding Using LEGO Bricks Break and Make：使用乐高积木进行交互式结构理解
7215 Bi-PointFlowNet: Bidirectional Learning for Point Cloud Based Scene Flow Estimation Bi-PointFlowNet：基于点云的场景流估计的双向学习
7218 PoserNet: Refining Relative Camera Poses Exploiting Object Detections PoserNet：利用对象检测优化相对相机姿势
7224 Towards Effective and Robust Neural Trojan Defenses via Input Filtering 通过输入过滤实现有效和强大的神经木马防御
7230 View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums 垂直视图：通过傅里叶谱进行轨迹预测的分层网络
7238 Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation 领域自适应语义分割的双向对比学习
7248 Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration 使用联合卡尔曼平滑和配准对视频图进行贝叶斯跟踪
7277 Rayleigh EigenDirections (REDs): Nonlinear GAN latent space traversals for multidimensional features Rayleigh EigenDirections (REDs)：用于多维特征的非线性 GAN 潜在空间遍历
7278 ActionFormer: Localizing Moments of Actions with Transformers ActionFormer：使用 Transformer 本地化动作时刻
7281 Theoretical Understanding of the Information Flow on Continual Learning Performance 信息流对持续学习绩效的理论理解
7283 3DG-STFM: 3D Geometric Guided Student-Teacher Feature Matching 3DG-STFM：3D 几何引导师生特征匹配
7288 Pure Transformer with Integrated Experts for Scene Text Recognition 具有集成专家的纯 Transformer 用于场景文本识别
7301 AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation AudioScopeV2：用于校准的开放域屏幕声音分离的视听注意架构
7302 Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search: Tight or Not 稳健旋转搜索中截断最小二乘的半定松弛：紧与否
7304 Bridging the Domain Gap towards Generalization in Automatic Colorization 在自动着色中弥合领域差距以实现泛化
7311 Learning with Free Object Segments for Long-Tailed Instance Segmentation 使用自由对象段学习长尾实例分割
7315 Rethinking Closed-loop Training for Autonomous Driving 重新思考自动驾驶的闭环训练
7331 Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction 用于 3D 边界框预测的自回归不确定性建模
7337 Learning Regional Purity for Instance Segmentation on 3D Point Clouds 学习 3D 点云实例分割的区域纯度
7345 Lottery Ticket Hypothesis for Spiking Neural Networks 尖峰神经网络的彩票假设
7346 Learning from Unlabeled 3D Environments for Vision-and-Language Navigation 从未标记的 3D 环境中学习视觉和语言导航
7350 A Dataset Generation Framework for Evaluating Megapixel Image Classifiers & their Explanations 用于评估百万像素图像分类器及其解释的数据集生成框架
7351 Sports Video Analysis on Large-Scale Data 大数据体育视频分析
7360 Multi-domain Learning for Updating Face Anti-spoofing Models 用于更新人脸反欺骗模型的多领域学习
7368 Audio-Visual Segmentation 视听分割
7374 SLiDE: Self-supervised LiDAR De-snowing through Reconstruction Difficulty SLiDE：通过重建难度进行自我监督的 LiDAR 去雪
7375 On the Angular Update and Hyperparameter Tuning of a Scale-Invariant Network 关于尺度不变网络的角度更新和超参数调整
7384 IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition IGFormer：用于基于骨架的人类交互识别的交互图转换器
7385 LANA: Latency Aware Network Acceleration LANA：延迟感知网络加速
7388 A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch 素描胜千言：文字与素描的图像检索
7396 HVC-Net: Unifying Homography, Visibility, and Confidence Learning for Planar Object Tracking HVC-Net：统一平面对象跟踪的单应性、可见性和置信度学习
7402 Towards Realistic Semi-Supervised Learning 迈向现实的半监督学习
7414 Unsupervised Pose-aware Part Decomposition for Man-made Articulated Objects 人造关节物体的无监督姿态感知部分分解
7417 3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization 用于深度多摄像机行人定位的 3D 随机遮挡和多层投影
7427 Masked Siamese Networks for Label-Efficient Learning 用于标签高效学习的屏蔽连体网络
7441 A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation 用于对象检测和实例分割的简单单尺度视觉转换器
7443 A Cloud 3D Dataset and Application-Specific Learned Image Compression in Cloud 3D 云 3D 数据集和云 3D 中特定于应用程序的学习图像压缩
7449 Cross-Domain Few-Shot Semantic Segmentation 跨域few-shot语义分割
7450 VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments VizWiz-FewShot：在视觉障碍人士拍摄的图像中定位对象
7464 Cartoon Explanations of Image Classifiers 图像分类器的卡通解释
7474 Towards Metrical Reconstruction of Human Faces 迈向人脸的度量重建
7476 DeepShadow: Neural Shape from Shadow DeepShadow：来自阴影的神经形状
7500 Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer 具有跨空间聚类和受控迁移的类增量学习
7509 Object discovery and representation networks 对象发现和表示网络
7511 MeshUDF: Fast and Differentiable Meshing of Unsigned Distance Field Networks MeshUDF：无符号距离场网络的快速可微网格化
7519 Natural Synthetic Anomalies for Self-Supervised Anomaly Detection and Localization 用于自监督异常检测和定位的自然合成异常
7522 Shap-CAM: Visual Explanations for Convolutional Neural Networks based on Shapley Value Shap-CAM：基于 Shapley 值的卷积神经网络的视觉解释
7529 Simple Open-Vocabulary Object Detection with Vision Transformers 使用视觉转换器进行简单的开放词汇对象检测
7533 Video Restoration Framework and its Meta-adaptations to Data-poor Conditions 视频恢复框架及其对数据贫乏条件的元适应
7539 PRIME: A Few Primitives Can Boost Robustness to Common Corruptions PRIME：少量基元即可提高对常见损坏的鲁棒性
7541 AlphaVC: High-Performance and Efficient Learned Video Compression AlphaVC：高性能和高效的学习视频压缩
7542 Content-Oriented Learned Image Compression 面向内容的学习图像压缩
7543 Generating Natural Images with Direct Patch Distributions Matching 使用直接补丁分布匹配生成自然图像
7545 Latent Space Smoothing for Individually Fair Representations 单独公平表示的潜在空间平滑
7555 SAU: Smooth activation function using convolution with approximate identities SAU：使用与恒等逼近做卷积的平滑激活函数
7561 TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments TRoVE：将道路场景数据集转换为逼真的虚拟环境
7562 Motion Sensitive Contrastive Learning for Self-supervised Video Representation 用于自监督视频表示的运动敏感对比学习
7573 Scaling Adversarial Training to Large Perturbation Bounds 将对抗训练扩展到大扰动范围
7592 RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization RDO-Q：通过速率失真优化实现极细粒度的信道量化
7605 Camera Auto-calibration from the Steiner Conic of the Fundamental Matrix 来自基本矩阵的施泰纳圆锥曲线的相机自动校准
7626 Understanding Collapse in Non-Contrastive Siamese Representation Learning 了解非对比连体表示学习中的崩溃
7634 AutoTransition: Learning to Recommend Video Transition Effects AutoTransition：学习推荐视频过渡效果
7651 SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement SPE-Net：通过增强旋转鲁棒性来提升点云分析
7667 Text-based Temporal Localization of Novel Events 新事件的基于文本的时间定位
7687 Effective Presentation Attack Detection Driven by Face Related Task 人脸相关任务驱动的有效演示攻击检测
7691 LWGNet – Learned Wirtinger Gradients for Fourier Ptychographic Phase Retrieval LWGNet – 用于傅里叶拼图相位检索的学习 Wirtinger 梯度
7693 Federated Self-supervised Learning for Video Understanding 用于视频理解的联合自监督学习
7694 Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval 通过不确定性学习进行人像检索的可靠性感知预测
7704 The Shape Part Slot Machine: Contact-based Reasoning for Generating 3D Shapes from Parts 形状零件老虎机：从零件生成 3D 形状的基于接触的推理
7710 Attention Diversification for Domain Generalization 域泛化的注意力多样化
7718 Exploiting the local parabolic landscapes of adversarial losses to accelerate black-box adversarial attack 利用对抗性损失的局部抛物线图加速黑盒对抗性攻击
7719 Towards Efficient and Effective Self-Supervised Learning of Visual Representations 迈向高效和有效的视觉表征自我监督学习
7722 TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning TransVLAD：关注局部聚合描述符以进行少量学习
7735 Rotation Regularization Without Rotation 没有旋转的旋转正则化
7741 Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration 用于提高事后不确定性校准中表达能力的参数化温度标度
7746 FairStyle: Debiasing StyleGAN2 with Style Channel Manipulations FairStyle：使用样式通道操作对 StyleGAN2 进行去偏
7756 Dynamic Temporal Filtering in Video Models 视频模型中的动态时间过滤
7764 DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation DH-AUG：用于 3D 人体姿势估计的 DH 前向运动学模型驱动增强
7765 Super-resolution 3D Human Shape from a Single Low-Resolution Image 来自单个低分辨率图像的超分辨率 3D 人体形状
7771 Trading Positional Complexity vs Deepness in Coordinate Networks 权衡坐标网络中的位置复杂性与深度
7785 ESS: Learning Event-based Semantic Segmentation from Still Images ESS：从静止图像中学习基于事件的语义分割
7802 U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search U-Boost NAS：利用率提升的可微神经架构搜索
7803 MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud MonteBoxFinder：检测和过滤基元以拟合嘈杂的点云
7808 RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection RRSR：具有渐进式特征对齐与选择的互易参考图像超分辨率
7815 Trapped in texture bias? A large scale comparison of deep instance segmentation 陷入纹理偏差？深度实例分割的大规模比较
7838 Gaussian Activated Neural Radiance Fields for High Fidelity Reconstruction & Pose Estimation 用于高保真重建和姿态估计的高斯激活神经辐射场
7845 MVDG: A Unified Multi-view Framework for Domain Generalization MVDG：用于域泛化的统一多视图框架
7847 MINER: Multiscale Implicit Neural Representation MINER：多尺度隐式神经表示
7856 PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization PTQ4ViT：采用双均匀量化的视觉 Transformer 训练后量化
7865 Context-Consistent Semantic Image Editing with Style-Preserved Modulation 使用保留样式调制的上下文一致语义图像编辑
7874 Distilling the Undistillable: Learning from a Nasty Teacher 蒸馏不可蒸馏的东西：向一个讨厌的老师学习
7879 Grounding Visual Representations with Texts for Domain Generalization 以文本为基础的视觉表示以进行域泛化
7883 Towards Accurate Open-Set Recognition via Background-Class Regularization 通过背景类正则化实现准确的开放集识别
7886 Unbiased Gradient Estimation for Differentiable Surface Splatting via Poisson Sampling 通过泊松采样进行可微分表面溅射的无偏梯度估计
7899 In Defense of Image Pre-Training for Spatiotemporal Recognition 为时空识别中的图像预训练辩护
7925 SocialVAE: Human Trajectory Prediction using Timewise Latents SocialVAE：使用 Timewise Latents 的人类轨迹预测
7926 BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking BodySLAM：联合相机定位、映射和人体运动跟踪
7935 Eliminating Gradient Conflict in Reference-based Line-Art Colorization 消除基于参考的艺术线条着色中的梯度冲突
7950 Transfer without Forgetting 转移而不忘记
7955 DSR — A dual subspace re-projection network for surface anomaly detection DSR - 用于表面异常检测的双子空间重投影网络
7964 Multi-Exit Semantic Segmentation Networks 多出口语义分割网络
7968 Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks 高效通用 Lipschitz 网络的几乎正交层
8001 Bridging the visual semantic gap in VLN via semantically richer instructions 通过语义更丰富的指令弥合 VLN 中的视觉语义鸿沟
8003 Kernel Relative-prototype Spectral Filtering for Few-shot Learning 用于少样本学习的内核相对原型谱滤波
8009 StoryDALL-E: Adapting Pretrained Text-to-image Transformers for Story Continuation StoryDALL-E：调整预训练的文本到图像转换器以实现故事继续
8026 Unsupervised Learning of Efficient Geometry-Aware Neural Articulated Representations 高效几何感知神经关节表示的无监督学习
8029 PANDORA: Polarization-Aided Neural Decomposition Of Radiance PANDORA：辐射的偏振辅助神经分解
8042 OCR-free Document Understanding Transformer 无 OCR 文档理解转换器
8048 VQGAN-CLIP: Open Domain Image Generation and Manipulation Using Natural Language VQGAN-CLIP：使用自然语言生成和操作开放域图像
8063 Learning to use unlabeled data in data augmentation for 3D detection 学习在数据增强中使用未标记数据进行 3D 检测
8070 Differentiable Zooming for Multiple Instance Learning on Whole-Slide Images 全幻灯片图像多实例学习的可微缩放
8081 Towards Learning Neural Representations from Shadows 从阴影中学习神经表示
8086 Augmenting Deep Classifiers with Polynomial Neural Networks 用多项式神经网络增强深度分类器
8092 AdaBest: Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation AdaBest：通过自适应偏差估计最小化联邦学习中的客户端漂移
8094 A Simple Approach and Benchmark for 21,000-Category Object Detection 用于 21,000 类目标检测的简单方法和基准
8098 “This is my unicorn, Fluffy”: Personalizing frozen vision-language representations “这是我的独角兽 Fluffy”：个性化冻结的视觉语言表示
8106 Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach 位宽自适应量化感知神经网络训练：一种元学习方法
8140 Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection 通过有效的转移矩阵估计学习噪声标签以对抗标签错误校正
8170 Online Task-free Continual Learning with Dynamic Sparse Distributed Memory 具有动态稀疏分布式内存的在线无任务持续学习