COMP9517: Computer Vision
Objectives: This lab revisits important concepts covered in the Week 1 and Week 2 lectures
and aims to make you familiar with implementing specific algorithms.
Preliminaries: As mentioned in the first lecture, we assume you are familiar with programming
in Python or are willing to learn it independently. You do not need to be an expert, as you will
further develop your skills during the course, but you should at least know the basics. If you
do not yet know Python, we assume you are familiar with at least one other programming
language such as C, in which case it should be relatively easy to learn Python.
To learn or brush up on your Python skills, see the free online resources listed at the end of
this document. Especially if you already know C or similar languages, there is no need to go
through all the linked resources in detail. Just quickly learn the syntax and the main features
of the language. The rest will follow as you go.
For implementing and testing computer vision algorithms, we use OpenCV in this course.
OpenCV is a library of programming functions mainly for computer vision. The library is
cross-platform and licensed as free and open-source software under the Apache License 2.0. It also
supports training and execution of machine/deep learning models. Originally written in C, with
new algorithms developed in C++, it has wrappers for languages such as Python and Java. As
stated above, in this course we will focus on programming in Python. See the links below for
OpenCV tutorials and documentation.
Software: You are required to use OpenCV 3+ with Python 3+ and submit your code as a
Jupyter notebook (see coding and submission requirements below). In the first tutor
consultation session this week, your tutors will give a demo of the software to be used, and
you can ask any questions you may have about this.
Materials: The sample images to be used in this lab are available via WebCMS3.
Submission: All code and requested results are assessable after the lab. Submit your source
code as a Jupyter notebook (.ipynb) which includes all output and answers to all questions
(see coding requirements at the end of this document) by the above deadline. The submission
link will be announced in due time.
1. Contrast Stretching
Contrast is a measure of the range of intensity values in an image and is defined as the
difference between the maximum pixel value and minimum pixel value. The maximum
possible contrast of an 8-bit image is 255 (max) – 0 (min) = 255. Any value less than that means
the image has lower contrast than possible. Contrast stretching attempts to improve the
contrast of the image by stretching the range of intensity values using linear scaling.
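Equation (1) is not reproduced in this copy of the handout. The sketch below assumes the common linear form O = (I − c)(b − a)/(d − c) + a, where c and d are the minimum and maximum pixel values of the input and a and b are the limits of the output range (0 and 255 for 8-bit images). The function name `stretch_contrast` and the parameters `a` and `b` are illustrative choices, not names from the lab.

```python
import numpy as np

def stretch_contrast(channel, a=0, b=255):
    """Linearly map the input range [c, d] of `channel` onto [a, b]."""
    channel = np.asarray(channel, dtype=np.float64)  # float avoids uint8 overflow
    c, d = channel.min(), channel.max()
    if d == c:
        # Flat image: there is no range to stretch, so return it unchanged.
        return channel.astype(np.uint8)
    out = (channel - c) * (b - a) / (d - c) + a
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Tiny low-contrast example: values span 50..200 instead of 0..255.
low_contrast = np.array([[50, 100], [150, 200]], dtype=np.uint8)
stretched = stretch_contrast(low_contrast)
print(stretched.tolist())  # [[0, 85], [170, 255]]
```

Doing the arithmetic in floating point and converting back to `uint8` only at the end avoids wrap-around in the intermediate products.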
Task (0.75 mark): Write an algorithm that performs contrast stretching as per Equation (1)
above. Read the given image Oakland.png and execute your algorithm to see whether it
indeed improves the contrast. Notice that this is a colour image, which has three channels (R,
G, B), so you need to somehow apply your algorithm to the three channels.
There are different possibilities here and we consider three of them:
The most straightforward is to apply the algorithm to each of the image channels (R, G, B)
individually. That is, the mapping function (1) is calculated for each channel separately and
may be different for each channel depending on its $c$ and $d$ values.
Alternatively, you could convert the colour image to a gray-level image, for example using
the formula Y = 0.299R + 0.587G + 0.114B, then calculate the mapping function (1) on the
gray-level image (Y), and apply it to the channels (R, G, B) of the original colour image. In
this case, the same mapping function is applied to all channels.
Finally, you could convert the colour image to a different colour space, such as HSV, then
calculate the mapping function (1) on the value (V) channel, and apply it to the channels
(R, G, B) of the original colour image. Here again, a single mapping function is calculated
and applied to all image channels.
In your notebook, execute your algorithm and show the input image and output image next
to each other, for each of the three approaches. Also briefly discuss in a comment in your
notebook which approach yields the best contrast-stretched colour image and provide
reasons for why that approach works better than the other two.
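The three approaches differ only in where the limits c and d come from. The sketch below is a minimal NumPy-only illustration under two assumptions: the mapping has the linear form (I − c)(b − a)/(d − c) + a, and the image is stored in R, G, B channel order. All function names are hypothetical. The value channel is computed directly as V = max(R, G, B), which is the HSV definition of V, so no colour-space conversion routine is needed.

```python
import numpy as np

def apply_mapping(img, c, d, a=0.0, b=255.0):
    """Apply the linear map [c, d] -> [a, b] and clip to the 8-bit range."""
    c, d = float(c), float(d)
    if d == c:
        return img.copy()  # degenerate range: nothing to stretch
    out = (img.astype(np.float64) - c) * (b - a) / (d - c) + a
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def stretch_per_channel(rgb):
    # Approach 1: each channel gets its own c and d.
    channels = [apply_mapping(rgb[..., k], rgb[..., k].min(), rgb[..., k].max())
                for k in range(3)]
    return np.stack(channels, axis=-1)

def stretch_via_gray(rgb):
    # Approach 2: c and d come from Y = 0.299R + 0.587G + 0.114B.
    y = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    return apply_mapping(rgb, y.min(), y.max())

def stretch_via_value(rgb):
    # Approach 3: c and d come from the HSV value channel, V = max(R, G, B).
    v = rgb.max(axis=-1)
    return apply_mapping(rgb, v.min(), v.max())

# Two-pixel RGB sample, shape (1, 2, 3).
sample = np.array([[[60, 80, 100], [120, 140, 200]]], dtype=np.uint8)
print(stretch_per_channel(sample).tolist())  # [[[0, 0, 0], [255, 255, 255]]]
print(stretch_via_value(sample).tolist())    # [[[0, 0, 0], [51, 102, 255]]]
```

In the second and third approaches, channel values that fall outside [c, d] are clipped, which is one of the effects worth looking for when comparing the outputs. Note also that `cv2.imread` returns channels in B, G, R order, so the gray-level weights would need to be reversed (or the image converted) before using this sketch on an OpenCV array.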
2. Histogram Calculation
The histogram of an image shows the counts of the intensity values. It gives only statistical
information about the pixels and removes the location information. For a digital image with $L$
gray levels, from $0$ to $L - 1$, the histogram is a discrete function $h(i) = n_i$, where
$i \in [0, L - 1]$ is the $i$-th gray level and $n_i$ is the number of pixels having that gray level.
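As a minimal sketch of this definition, the histogram of an integer-valued image can be counted with `np.bincount`. The function name `histogram` and its `levels` parameter are illustrative, and the example uses only four gray levels to keep the output readable.

```python
import numpy as np

def histogram(gray, levels=256):
    """Return h with h[i] = number of pixels whose gray level is i."""
    gray = np.asarray(gray)
    # minlength guarantees one bin per gray level, even for unused levels.
    return np.bincount(gray.ravel(), minlength=levels)

img = np.array([[0, 1, 1], [3, 3, 3]], dtype=np.uint8)
h = histogram(img, levels=4)
print(h.tolist())            # [1, 2, 0, 3]
print(img.min(), img.max())  # 0 3
```

The counts always sum to the number of pixels, which makes a quick sanity check. For plotting, a bar chart of `h` against the gray levels (for example with Matplotlib's `plt.bar`) is one option.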
Task (0.5 mark): Write an algorithm that computes and plots the histogram of an image and
also reports the minimum pixel value and the maximum pixel value in the image. Then execute
your algorithm to compare the histograms and extreme values before and after contrast
stretching of image Oakland.png for each of the three approaches in the previous task.
More specifically, for the first contrast-stretching approach, show the histogram and extreme
values for each of the three channels (R, G, B) of both the input image and the output image.
For the second approach, show the histogram and extreme values of only the gray value
representation (Y) of both the input image and the output image after conversion. For the
third approach, show the histogram and extreme values of only the value channel (V) of both
the input image and the output image after conversion.
To facilitate visual comparison, present the histograms of the input image and corresponding
output image side by side in each case.
3. Image Thresholding
A crucial first step for quantitative analysis of objects (or regions) of interest in images is to
identify which pixels belong to the objects (the relevant pixels) and which belong to the
background (the irrelevant pixels). This task is called image segmentation.
The simplest technique to perform this task is thresholding. Here, a pixel is considered to
belong to an object if its value is above the threshold, and to the background if its value is
lower than or equal to the threshold.
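The rule above can be sketched in a few lines. The function name `threshold` and the choice of 255 for object pixels are assumptions made for display purposes.

```python
import numpy as np

def threshold(gray, t):
    """Return a binary mask: 255 where gray > t (object), 0 otherwise."""
    return np.where(np.asarray(gray) > t, 255, 0).astype(np.uint8)

img = np.array([[10, 120], [121, 250]], dtype=np.uint8)
mask = threshold(img, 120)
print(mask.tolist())  # [[0, 0], [255, 255]]
```

A pixel exactly equal to the threshold (120 here) is assigned to the background, as stated above. When the objects are darker than the background, the comparison would be inverted.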
While an optimal threshold for each image could be selected manually by the user, this is
undesirable in applications that require full automation. Fortunately, several automatic
thresholding techniques exist, as discussed in the lecture.
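As one illustration of automatic threshold selection, here is a minimal NumPy sketch of Otsu's method in its usual formulation: choose the threshold that maximises the between-class variance of the histogram. The lecture's exact formulation may differ, and the function name is hypothetical.

```python
import numpy as np

def otsu_threshold(gray, levels=256):
    """Return t maximising between-class variance; pixels > t form one class."""
    hist = np.bincount(np.asarray(gray).ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()                    # probability of each gray level
    omega0 = np.cumsum(p)                    # weight of class {0..t}
    mu = np.cumsum(p * np.arange(levels))    # cumulative first moment up to t
    mu_total = mu[-1]                        # mean of the whole image
    omega1 = 1.0 - omega0                    # weight of class {t+1..L-1}
    valid = (omega0 > 0) & (omega1 > 0)      # skip splits with an empty class
    sigma_b = np.zeros(levels)
    sigma_b[valid] = ((mu_total * omega0[valid] - mu[valid]) ** 2
                      / (omega0[valid] * omega1[valid]))
    return int(np.argmax(sigma_b))

# Perfectly bimodal toy image: half the pixels are 10, half are 200.
toy = np.array([[10, 10, 200, 200]] * 2, dtype=np.uint8)
t = otsu_threshold(toy)
print(10 <= t < 200)  # True
```

In this toy case every threshold between the two modes gives the same class split, so `np.argmax` simply returns the first such value. OpenCV offers the same method through `cv2.threshold` with the `cv2.THRESH_OTSU` flag, which can serve as a cross-check.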
Task (0.75 mark): Write an algorithm that can threshold an image using the three different
thresholding methods discussed in the lecture: Otsu, IsoData, Triangle. Apply your algorithm
to the images Hardware.png (the objects are the dark nuts and bolts), Nuclei.png (the objects
are the bright cell nuclei), and Orca.png (the object of interest is the Orca).
In your notebook, show the results in table form to facilitate visual comparison of all images.
For example, one table row per input image, successively showing the input image, its
histogram, and the thresholding results using the three methods.
Also briefly discuss the differences in the results in your notebook and provide explanations
(based on the histograms or otherwise) why for some images one thresholding method may
work better than others, while for other images it may be the other way around, or perhaps
in some cases none of the methods work well. Present some general rules of thumb for which
thresholding methods are best for what kind of images.
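Of the three methods named in the task, IsoData is the simplest to write from scratch. The sketch below assumes the iterative scheme commonly given under that name: start from the global mean, then repeatedly set the threshold to the midpoint of the mean intensities below and above it until it stops changing. The lecture's version may differ in its starting value or stopping rule; `tol` and `max_iter` are illustrative parameters.

```python
import numpy as np

def isodata_threshold(gray, tol=0.5, max_iter=100):
    """Iterate t = (mean below t + mean above t) / 2 until t stabilises."""
    gray = np.asarray(gray, dtype=np.float64)
    t = gray.mean()                          # initial guess: global mean
    for _ in range(max_iter):
        low, high = gray[gray <= t], gray[gray > t]
        if low.size == 0 or high.size == 0:
            break                            # one class is empty: stop
        t_new = 0.5 * (low.mean() + high.mean())
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t

toy = np.array([[10, 20], [200, 210]], dtype=np.uint8)
t = isodata_threshold(toy)
print(t)  # 110.0
```

Because the update depends only on the two class means, the method tends to suit histograms with two reasonably balanced modes, which is one of the points the discussion above asks you to examine.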
4. Edge Detection
Edges are an important source of semantic information in images. A gray-scale image can be
thought of as a 2D landscape with areas of different intensities corresponding to different
heights. The edges are the transitions from one such area to the next.
The Laplacian is a second-order derivative operator that can be used to find edges. It
emphasizes pixels in areas of strong intensity changes and de-emphasizes pixels in areas with
slowly varying intensities. The edges are the zero-crossings in the Laplacian image.
Task (0.5 mark): Write an algorithm that computes the Laplacian image of an input image
using the above kernel. Apply the algorithm to the image Laplace.png.
Notice that the calculations may produce negative output pixel values. Thus, make sure you
use the right data types for the calculations and for the output image, and the right intensity
mapping to display the output image.
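The kernel itself is not reproduced in this copy of the handout. The sketch below assumes the common 4-neighbour Laplacian kernel; the lab's kernel may differ in sign or may include the diagonal neighbours. It illustrates the two points in the note above: the filtering is done in a signed floating-point type, and the signed response is mapped to 0..255 only for display. The function names and the replicated-border handling are illustrative choices.

```python
import numpy as np

# Assumed 4-neighbour Laplacian kernel (not taken from the handout).
KERNEL = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]], dtype=np.float64)

def filter3x3(gray, kernel):
    """Correlate a 2D image with a 3x3 kernel using replicated borders."""
    img = np.asarray(gray, dtype=np.float64)  # signed float keeps negative values
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    rows, cols = img.shape
    for dy in range(3):
        for dx in range(3):
            # Add the neighbour at offset (dy - 1, dx - 1), weighted by the kernel.
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

def to_display(response):
    """Min-max map a signed response to 0..255 for viewing."""
    lo, hi = response.min(), response.max()
    if hi == lo:
        return np.zeros(response.shape, dtype=np.uint8)
    return np.rint((response - lo) * 255.0 / (hi - lo)).astype(np.uint8)

# Vertical step edge between columns 1 and 2.
step = np.array([[10, 10, 200, 200]] * 3, dtype=np.uint8)
lap = filter3x3(step, KERNEL)
print(lap[1].tolist())  # [0.0, 190.0, -190.0, 0.0]
```

The response changes sign between columns 1 and 2, which is the zero-crossing that marks the edge. Computing this directly in `uint8` would lose the negative value. With OpenCV, the same filtering can be done with `cv2.filter2D` using a floating-point output depth such as `cv2.CV_64F`.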
Coding Requirements
Make sure that in your Jupyter notebook, the input images are readable from the location
specified as an argument, and all output images and other requested results are displayed in
the notebook environment. All cells in your notebook should have been executed so that the
tutor/marker does not have to execute the notebook again to see the results.
