COCO数据集介绍-Toy模板网

这篇具有很好参考价值的文章主要介绍了COCO数据集介绍。希望对大家有所帮助。如果存在错误或未考虑完全的地方，请大家不吝赐教，您也可以点击"举报违法"按钮提交疑问。

前言

以下内容均来自COCO官方
以及Microsoft COCO: Common Objects in Context

一、什么是COCO数据集？

COCO数据集是一个可用于图像检测（image detection），语义分割（semantic segmentation）和图像标题生成（image captioning）的大规模数据集。它有超过330K张图像（其中220K张是有标注的图像），包含150万个目标，80个目标类别（object categories：行人、汽车、大象等），91种材料类别（stuff categoris：草、墙、天空等），每张图像包含五句图像的语句描述，且有250,000个带关键点标注的行人。

COCO数据集可以应用到的Task：

目标检测（object detection），使用 bounding box 或者 object segmentation (也称为instance segmentation)将不同的目标进行标定。

COCO数据集用于image segmentation的教程：Master the COCO Dataset for Semantic Image Segmentation
Densepose（密集姿势估计），DensePose任务涉及同时检测人、分割他们的身体并将属于人体的所有图像像素映射到身体的3D表面。用于不可控条件下的密集人体姿态估计。
Key-points detection（关键点检测），在任意姿态下对人物的关键点进行定位，该任务包含检测行人及定位到行人的关键点。
Stuff Segmentation，语义分割中针对stuff class类的分割。（草，墙壁，天空等）
Panoptic Segmentation（全景分割）。其目的是生成丰富且完整的连贯场景分割，这是实现自主驾驶或增强现实等真实世界视觉系统的重要一步。
image captioning（图像标题生成），根据图像生成一段文字。

一个简单的数据集实例展示：

一张包含汽车，行人，画板的图片：
COCO数据集介绍

附录

80个类别

COCO数据集介绍

person(人)

交通工具：bicycle(自行车) car(汽车) motorbike(摩托车) aeroplane(飞机) bus(公共汽车) train(火车) truck(卡车) boat(船)

公共设施：traffic light(信号灯) fire hydrant(消防栓) stop sign(停车标志) parking meter(停车计费器) bench(长凳)

动物：bird(鸟) cat(猫) dog(狗) horse(马) sheep(羊) cow(牛) elephant(大象) bear(熊) zebra(斑马) giraffe(长颈鹿)

生活用品：backpack(背包) umbrella(雨伞) handbag(手提包) tie(领带) suitcase(手提箱)

运动装备：frisbee(飞盘) skis(滑雪板双脚) snowboard(滑雪板) sports ball(运动球) kite(风筝) baseball bat(棒球棒) baseball glove(棒球手套) skateboard(滑板) surfboard(冲浪板) tennis racket(网球拍)

餐具：bottle(瓶子) wine glass(高脚杯) cup(茶杯) fork(叉子) knife(刀)
spoon(勺子) bowl(碗)

水果：banana(香蕉) apple(苹果) sandwich(三明治) orange(橘子) broccoli(西兰花) carrot(胡萝卜) hot dog(热狗) pizza(披萨) donut(甜甜圈) cake(蛋糕)

家居：chair(椅子) sofa(沙发) pottedplant(盆栽植物) bed(床) diningtable(餐桌) toilet(厕所) tvmonitor(电视机)

电子产品：laptop(笔记本) mouse(鼠标) remote(遥控器) keyboard(键盘) cell phone(电话)
家用电器：microwave(微波炉) oven(烤箱) toaster(烤面包器) sink(水槽) refrigerator(冰箱)
家用产品：book(书) clock(闹钟) vase(花瓶) scissors(剪刀) teddy bear(泰迪熊) hair drier(吹风机) toothbrush(牙刷)

二、COCO数据集的格式介绍

针对上述提到不同的TASK（object detection, keypoint detection, stuff segmentation, panoptic segmentation, densepose, and image captioning），COCO数据集有不同的标注类型。这些标注类型都用“JSON”格式进行存储。接下来将针对性的对其JSON内的标注格式进行详细介绍。

基础的数据格式介绍

针对所有的TASK，所有的annotation都有如下的相同的数据结构/格式：

{
"info": info, "images": [image], "annotations": [annotation], "licenses": [license],
}

info{
"year": int, "version": str, "description": str, "contributor": str, "url": str, "date_created": datetime,
}

image{
"id": int, "width": int, "height": int, "file_name": str, "license": int, "flickr_url": str, "coco_url": str, "date_captured": datetime,
}

license{
"id": int, "name": str, "url": str,
}

提示：info内的信息不需要怎么管，是数据来源和贡献者等信息。license也不需要管。annotation是指不同的TASK的annotation，（那六种task）。image内是图片的size，id，file_name等等。

不同Task下的annotation

针对不同task的annotation，我们分别进行描述：

Object Detection annotation

annotation{
"id": int, "image_id": int, "category_id": int, "segmentation": RLE or [polygon], "area": float, "bbox": [x,y,width,height], "iscrowd": 0 or 1,
}

categories[{
"id": int, "name": str, "supercategory": str,
}]