Convolutional Neural Networks (CNNs) and other deep networks have enabled unprecedented breakthroughs in a variety of computer vision tasks, from image classification to object detection, semantic segmentation, image captioning, and more recently visual question answering.
The computer vision tasks meant here range from classification to object localization, semantic segmentation, captioning, and, more recently, visual question answering.
A number of previous works have asserted that deeper representations in a CNN capture higher-level visual constructs.
[Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization]
In the last few years, the most advanced deep neural networks (DNNs) have managed to reach or even surpass human-level performance on a wide range of challenging machine learning tasks, including face recognition.
[LOTS about Attacking Deep Features]
In the last few years, convolutional neural networks (CNNs) have emerged as powerful image representations for various category-level recognition tasks such as object classification, scene recognition, or object detection. The basic principles of CNNs have been known since the 1980s, and the recent successes are a combination of advances in GPU-based computation power together with large labelled image datasets.
[NetVLAD: CNN Architecture for Weakly Supervised Place Recognition]
Since their ground-breaking results on image classification in recent ImageNet challenges, deep learning-based methods have shined in many other computer vision tasks, including object detection and semantic segmentation. Recently, they also rekindled highly semantic tasks such as image captioning and visual question answering.
[Deep Image Retrieval: Learning global representations for image search]
Meanwhile, deep transfer learning techniques have gained considerable attention in the computer vision community. First, a deep convolutional neural network (CNN) is trained on a large labeled dataset. The convolutional layers are then used as mid-level feature extractors on a variety of computer vision tasks. Although transferring convolutional network features is not a new idea, the simultaneous availability of large datasets and cheap GPU co-processors has contributed to the achievement of considerable performance gains on a variety of computer vision benchmarks: "SIFT and HOG descriptors produced big performance gains a decade ago, and now deep convolutional features are providing a similar breakthrough."
A flurry of recent results indicates that image descriptors extracted from deep convolutional neural networks (CNNs) are very powerful and consistently outperform highly tuned state-of-the-art systems on a variety of visual recognition tasks. Embeddings from state-of-the-art CNNs have been applied successfully to a number of problems in computer vision.
[Learning Image Embedding using Convolutional Neural Networks for Improved Multi-Modal Semantics]
Deep learning has proven itself as a successful set of models for learning useful semantic representations of data.
[Deep Metric Learning Using Triplet Network]