Introduction to Deep Learning (Xiaogang Wang)


Introduction to Deep Learning
Xiaogang Wang
Department of Electronic Engineering, The Chinese University of Hong Kong

Deep learning = ?
Deep feature representations are?

Outline
- Historical review of deep learning
- Introduction to classical deep models
- Why does deep learning work?
- Properties of deep feature representations

Machine Learning
$x \to F(x) \to y$
- y is a class label (classification) or a real-valued vector (estimation)
- Object recognition: labels such as dog, cat, horse, flower, ...
- Super resolution: low-resolution image -> high-resolution image

1986: Neural network, back propagation (Nature)
- Solves general learning problems
- Tied to biological systems
[Figure: a single neuron with inputs x1, x2, x3, weights w1, w2, w3, and labels g(x) and f(net)]
But it was given up:
- Hard to train
- Insufficient computational resources
- Small training sets
- Did not work well

Between 1986 and 2006: SVM, boosting, decision tree, KNN
- Flat structures
- Loose tie with biological systems
- Specific methods for specific tasks
- Hand-crafted features (GMM-HMM, SIFT, LBP, HOG)
(Kruger et al., TPAMI 2013)

2006: Deep belief net (Science)
- Unsupervised, layer-wise pre-training
- Better designs for modeling and training (normalization, nonlinearity, dropout)
- New developments in computer architectures: GPUs, multi-core computer systems
- Large-scale databases: big data!

Machine Learning with Big Data
- Machine learning with small data: overfitting, reducing model complexity (capacity)
- Machine learning with big data: underfitting, increasing model complexity, optimization, computation resources
- How to increase model capacity?

Curse of dimensionality
D. Chen, X. Cao, F. Wen, and J. Sun. Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification. In Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2013.
Blessing of dimensionality
Learning hierarchical feature transforms (learning features with deep structures)

Timeline: 1986 neural network, back propagation (Nature); 2006 deep belief net (Science); 2011 deep learning results in speech; 2012 object recognition on ImageNet

2012: Object recognition over 1,000,000 images and 1,000 categories (2 GPUs)

Rank  Name         Error rate  Description
1     U. Toronto   0.15315     Deep learning
2     U. Tokyo     0.26172     Hand-crafted features and learning models. Bottleneck.
3     U. Oxford    0.26979
4     Xerox/INRIA  0.27058

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS, 2012.
[Figure: examples from ImageNet]

ImageNet 2013 image classification challenge

Rank  Name    Error rate  Description
1     NYU     0.11197     Deep learning
2     NUS     0.12535     Deep learning
3     Oxford  0.13555     Deep learning

MSRA, IBM, Adobe, NEC, Clarifai, Berkeley, U. Tokyo, UCLA, UIUC, Toronto, ...: the top 20 groups all used deep learning.

ImageNet 2013 object detection challenge

Rank  Name          Mean average precision  Description
1     UvA-Euvision  0.22581                 Hand-crafted features
2     NEC-MU        0.20895                 Hand-crafted features
3     NYU           0.19400                 Deep learning

ImageNet 2014 image classification challenge

Rank  Name    Error rate  Description
1     Google  0.06656     Deep learning
2     Oxford  0.07325     Deep learning
3     MSRA    0.08062     Deep learning

ImageNet 2014 object detection challenge

Rank  Name             Mean average precision  Description
1     Google           0.43933                 Deep learning
2     CUHK             0.40656                 Deep learning
3     DeepInsight      0.40452                 Deep learning
4     UvA-Euvision     0.35421                 Deep learning
5     Berkeley Vision  0.34521                 Deep learning

ImageNet 2014 object detection challenge: mean average precision (%)
W. Ouyang and X. Wang et al. “DeepID-Net: deformable deep convolutional neural networks for object detection”, CVPR, 2015

Method               Model average  Single model
RCNN (Berkeley)      n/a            31.4
Berkeley Vision      n/a            34.5
UvA-Euvision         n/a            35.4
DeepInsight          40.5           40.2
GoogLeNet (Google)   43.9           38.0
DeepID-Net (CUHK)    50.3           47.9

[Photo: Wanli Ouyang]

2013: Google and Baidu announced their deep learning based visual search engines
Google: “on our test set we saw double the average precision when compared to other approaches we had tried. We acquired the rights to the technology and went full speed ahead adapting it to run at large scale on Google's computers. We took cutting edge research straight out of an academic research lab and launched it, in just a little over six months.”
Baidu

2014: Face recognition
Deep learning achieves 99.53% face verification accuracy on Labeled Faces in the Wild (LFW), higher than human performance.
Y. Sun, X. Wang, and X. Tang. Deep Learning Face Representation by Joint Identification-Verification. NIPS, 2014.
Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. CVPR, 2015.
Labeled Faces in the Wild (2007)
[Figure: best results without deep learning]

Design Cycle
start -> Collect data -> Preprocessing -> Feature design -> Choose and design model -> Train classifier -> Evaluation -> end
Annotations on the cycle:
- Domain knowledge
- Interest of people working on computer vision, speech recognition, medical image processing, ...
- Interest of people working on machine learning
- Interest of people working on machine learning and computer vision, speech recognition, medical image processing, ...
- Preprocessing and feature design may lose useful information and not be optimized, since they are not parts of an end-to-end learning system
- Preprocessing could be the result of another pattern recognition system

Person re-identification pipeline
Pedestrian detection -> Pose estimation -> Body parts segmentation -> Photometric & geometric transform -> Feature extraction -> Classification

Face recognition pipeline
Face alignment -> Geometric rectification -> Photometric rectification -> Feature extraction -> Classification

Design Cycle with Deep Learning
start -> Collect data -> Preprocessing (optional) -> Design network (feature learning, classifier) -> Train network -> Evaluation -> end
- Learning plays a bigger role in the design cycle
- Feature learning becomes part of the end-to-end learning system
- Preprocessing becoming optional means that several pattern recognition steps can be merged into one end-to-end learning system
- Feature learning makes the key difference
- We underestimated the importance of data collection and evaluation

What makes deep learning successful in computer vision?
- Data collection: one million images with labels
- Evaluation task: predict 1,000 image categories
- Deep learning: CNN is not new; design of the network structure; new training strategies
[Photos: Li Fei-Fei, Geoffrey Hinton]
Features learned from ImageNet generalize well to other tasks and datasets!

Learning features and classifiers separately
Not all datasets and prediction tasks are suitable for learning features with deep models.
- Training stage A (deep learning): Dataset A -> feature transform -> Classifier 1, Classifier 2, ... -> Prediction on task 1, Prediction on task 2, ...
- Training stage B: Dataset B -> feature transform -> Classifier B -> Prediction on task B (our target task)

Deep learning can be treated as a language to describe the world with great flexibility
- Collect data -> Preprocessing 1 -> Preprocessing 2 -> Feature design -> Classifier -> Evaluation
- Collect data -> Feature transform -> Feature transform -> Classifier -> Evaluation (deep neural network)
[Figure label: Connection]

Introduction to Classical Deep Models
- Convolutional Neural Networks (CNN)
  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based Learning Applied to Document Recognition,” Proceedings of the IEEE, Vol. 86, pp. 2278-2324, 1998.
- Deep Belief Net (DBN)
  G. E. Hinton, S. Osindero, and Y. Teh, “A Fast Learning Algorithm for Deep Belief Nets,” Neural Computation, Vol. 18, pp. 1527-1544, 2006.
- Auto-encoder
  G. E. Hinton and R. R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,” Science, Vol. 313, pp. 504-507, July 2006.

Classical Deep Models: Convolutional Neural Networks (CNN)
- First proposed by Fukushima in 1980
- Improved by LeCun, Bottou, Bengio and Haffner in 1998
[Figure labels: convolution, pooling, learned filters]

Backpropagation
W is the parameter of the network; J is the objective function.
[Figure labels: input layer, hidden layers, output layer, target values, feedforward operation, back error propagation]
D. E. Rumelhart, G. E. Hinton, R. J. Williams, “Learning Representations by Back-propagating Errors,” Nature, Vol. 323, pp. 533-536, 1986.
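The feedforward and back-error-propagation steps on the Backpropagation slide can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the network from the slides: the one-hidden-layer architecture, sigmoid units, the squared-error form of J, the absence of bias terms, the layer sizes, and all function names are choices made here for brevity. W1 and W2 together play the role of the parameter W.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, W2, x):
    """Feedforward operation: input layer -> hidden layer -> output layer."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in W2]
    return h, y


def objective(W1, W2, x, t):
    """Assumed objective: J = 1/2 * sum_k (y_k - t_k)^2 for one example."""
    _, y = forward(W1, W2, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def backprop(W1, W2, x, t):
    """Back error propagation: return (dJ/dW1, dJ/dW2)."""
    h, y = forward(W1, W2, x)
    # Error signal at the output layer: (y_k - t_k) * sigmoid'(net_k).
    delta_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    # Error signal at the hidden layer: output errors sent back through W2.
    delta_hid = [
        hj * (1.0 - hj) * sum(delta_out[k] * W2[k][j] for k in range(len(W2)))
        for j, hj in enumerate(h)
    ]
    grad_W2 = [[dk * hj for hj in h] for dk in delta_out]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hid]
    return grad_W1, grad_W2


def gradient_step(W1, W2, x, t, lr=0.1):
    """One gradient-descent update of the parameters on a single example."""
    g1, g2 = backprop(W1, W2, x, t)
    new_W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, g1)]
    new_W2 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W2, g2)]
    return new_W1, new_W2


# Hypothetical toy setup: 3 inputs, 4 hidden units, 2 outputs.
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
x, t = [0.5, -0.2, 0.1], [1.0, 0.0]
W1_new, W2_new = gradient_step(W1, W2, x, t)
```

Comparing `backprop` against a finite-difference estimate of dJ/dW is a quick way to check that the backward pass matches the forward pass.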
Classical Deep Models: Deep belief net (Hinton et al., 2006)
$P(x, h_1, h_2) = p(x \mid h_1)\, p(h_1, h_2)$
$P(x, h_1) = \dfrac{e^{-E(x, h_1)}}{\sum_{x, h_1} e^{-E(x, h_1)}}$
$E(x, h_1) = b^\top x + c^\top h_1 + h_1^\top W x$
[Figure label: initial point]
Pre-training:
- Good initialization point
- Makes use of unlabeled data

Classical Deep Models: Auto-encoder (Hinton and Salakhutdinov, 2006)
[Figure: layers $x \to h_1 \to h_2 \to \tilde{h}_1 \to \tilde{x}$ with parameters $(W_1, b_1)$, $(W_2, b_2)$, $(W_2^\top, b_3)$, $(W_1^\top, b_4)$]
Encoding: $h_1 = \sigma(W_1 x + b_1)$, $h_2 = \sigma(W_2 h_1 + b_2)$
Decoding: $\tilde{h}_1 = \sigma(W_2^\top h_2 + b_3)$, $\tilde{x} = \sigma(W_1^\top \tilde{h}_1 + b_4)$

Feature Learning vs Feature Engineering

Feature Engineering
- The performance of a pattern recognition system heavily depends on feature representations
- Manually designed features dominated applications of image and video understanding in the past
- Relies on human domain knowledge much more than on data
- Feature design is separate from training the classifier
- If handcrafted features have multiple parameters, it is hard to tune them manually
- Developing effective features for new applications is slow

Handcrafted Features for Face Recognition
- 1980s: geometric features
- 1992: pixel vector
- 1997: Gabor filters (2 parameters)
- 2006: local binary patterns (3 parameters)

Feature Learning
- Learning transformations of the data that make it easier to extract useful information when building classifiers or predictors
- Jointly learning feature transformations and classifiers makes their integration optimal
- Learn the values of a huge number of parameters in feature representations
- Faster to get feature representations for new applications
- Make better use of big data

Deep Learning Means Feature Learning
- Deep learning is about learning hierarchical feature representations
- Good feature representations should be able to disentangle multiple factors coupled in the data
[Figure: Data -> four stacked trainable feature transforms -> Classifier]
[Figure: Pixel 1, Pixel 2, ..., Pixel n -> ideal feature transform -> view, expression]
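Returning to the auto-encoder slide above, its encoding and decoding passes can be sketched in plain Python. This is a sketch under assumptions rather than the authors' implementation: the extracted slide text has lost the nonlinearity symbol, so a logistic sigmoid is assumed here, and the decoder is assumed to reuse the transposed encoder weights. The layer sizes, random initialization, and helper names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b, with the matrix W given as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]


def transpose(W):
    return [list(col) for col in zip(*W)]


def encode(x, W1, b1, W2, b2):
    """Encoding: h1 = s(W1 x + b1), h2 = s(W2 h1 + b2)."""
    h1 = [sigmoid(z) for z in affine(W1, x, b1)]
    h2 = [sigmoid(z) for z in affine(W2, h1, b2)]
    return h1, h2


def decode(h2, W1, W2, b3, b4):
    """Decoding with transposed (tied) weights, an assumption of this sketch."""
    h1_rec = [sigmoid(z) for z in affine(transpose(W2), h2, b3)]
    x_rec = [sigmoid(z) for z in affine(transpose(W1), h1_rec, b4)]
    return h1_rec, x_rec


def reconstruction_error(x, x_rec):
    """Squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))


# Hypothetical sizes: 6 inputs, 4 units in h1, 2 units in h2.
rng = random.Random(0)
n_x, n_h1, n_h2 = 6, 4, 2
W1 = [[rng.gauss(0, 0.1) for _ in range(n_x)] for _ in range(n_h1)]
W2 = [[rng.gauss(0, 0.1) for _ in range(n_h1)] for _ in range(n_h2)]
b1, b2 = [0.0] * n_h1, [0.0] * n_h2
b3, b4 = [0.0] * n_h1, [0.0] * n_x

x = [rng.random() for _ in range(n_x)]
h1, h2 = encode(x, W1, b1, W2, b2)
h1_rec, x_rec = decode(h2, W1, W2, b3, b4)
err = reconstruction_error(x, x_rec)
```

Training would adjust the weights and biases to reduce `err` over a dataset; only the forward computation is shown here.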
Deep Learning Means Feature Learning
How to effectively learn features with deep models:
- With challenging tasks
- Predict high-dimensional vectors

Detect 200 object classes on ImageNet
Pre-train on classifying 1,000 categories -> Fine-tune on classifying 201 categories -> Feature representation -> SVM binary classifier for each category
W. Ouyang and X. Wang et al. “DeepID-Net: deformable deep convolutional neural networks for object detection”, CVPR, 2015
- Training stage A: Dataset A -> feature transform -> Classifier A (distinguish 1,000 categories)
- Training stage B: Dataset B -> feature transform -> Classifier B (distinguish 201 categories)
- Training stage C: Dataset C -> feature transform (fixed) -> SVM (distinguish one object class from all the negatives)

Example 1: deep learning generic image features
- Hinton group's groundbreaking work on ImageNet
- They did not have much experience with general image classification on ImageNet
- It took one week to train the network with 60 million parameters
- The learned feature representations are effective on other datasets (e.g. Pascal VOC) and other tasks (object detection, segmentation, tracking, and image retrieval)
[Figures: 96 learned low-level filters; image classification result; top hidden layer can be used as a feature for retrieval]

Example 2: deep learning face identity features by recovering canonical-view face images
[Figure: reconstruction examples from LFW]
Z. Zhu, P. Luo, X. Wang, and X. Tang, “Deep Learning Identity Preserving Face Space,” ICCV 2013.
- A deep model can disentangle hidden factors through feature extraction over multiple layers
- No 3D model; no prior information on pose and lighting condition
- Models multiple complex transforms
- Reconstructing the whole face is a much stronger supervision than predicting a 0/1 class label and helps to avoid overfitting
[Figure: arbitrary view -> canonical view]

Comparison on Multi-PIE

Method           -45°   -30°   -15°   +15°   +30°   +45°   Avg    Pose
LGBP [26]        37.7   62.5   77     83     59.2   36.1   59.3
VAAM [17]        74.1   91     95.7   95.7   89.5   74.8   86.9
FA-EGFC [3]      84.7   95     99.3   99     92.9   85.2   92.7   x
SA-EGFC [3]      93     98.7   99.7   99.7   98.3   93.6   97.2
LE [4] + LDA     86.9   95.5   99.9   99.7   95.5   81.8   93.2   x
CRBM [9] + LDA   80.3   90.5   94.9   96.4   88.3   89.8   87.6   x
Ours             95.6   98.5   100.0  99.3   98.5   97.8   98.3   x

Deep learning a 3D model from 2D images, mimicking human brain activities
Z. Zhu, P. Luo, X. Wang, and X. Tang, “Deep Learning and Disentangling Face Representation by Multi-View Perception,” NIPS 2014.
- Training stage A (face reconstruction, deep learning): Face images in arbitrary views -> feature transform -> Face identity features -> Regressor 1, Regressor 2, ... -> Reconstruct view 1, Reconstruct view 2, ...
- Training stage B (face verification): Two face images in arbitrary views -> feature transform (fixed) -> Linear discriminant analysis -> Do the two images belong to the same person or not?

Example 3: deep learning face identity features from predicting 10,000 classes
- At the training stage, each input image is classified into 10,000 identities with 160 hidden identity features in the top layer
- The hidden identity features generalize well to other tasks (e.g. verification) and to identities outside the training set
- As the number of classes to be predicted increases, the generalization power of the learned features also improves
Y. Sun, X. Wang, and X. Tang. Deep Learning Face Representation by Joint Identification-Verification. NIPS, 2014.
- Training stage A (face identification): Dataset A -> feature transform -> Classifier A (distinguish 10,000 people)
- Training stage B (face verification): Dataset B -> feature transform (fixed) -> Linear classifier B (do the two images belong to the same person or not?)

Deep Structures vs Shallow Structures (Why deep?)

Shallow Structures
- A three-layer neural network (with one hidden layer) can approximate any classification function
- Most machine learning tools (such as SVM, boosting, and KNN) can be approximated as neural networks with one or two hidden layers
- Shallow models divide the feature space into regions and match templates in local regions. O(N) parameters are needed to represent N regions
[Figure: SVM]

Deep Machines are More Efficient for Representing Certain Classes of Functions
- Theoretical results show that an architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than architectures whose depth is matched to the task (Hastad 1986, Hastad and Goldmann 1991)
- It also means many more parameters to learn
- Take the d-bit parity function as an example: $(X_1, \ldots, X_d) \mapsto \left[\sum_i X_i \text{ is even}\right]$
- d-bit logical parity circuits of depth 2 have exponential size (Andrew Yao, 1985)
- There are functions computable with a polynomial-size logic gate circuit of depth k that require exponential size when restricted to depth k-1 (Hastad, 1986)
- Architectures with multiple levels naturally provide sharing and re-use of components (Honglak Lee, NIPS 2010)

Humans Understand the World through Multiple Levels of Abstractions
- We do not interpret a scene image with pixels
  - Objects (sky, cars, roads, buildings, pedestrians) -> parts (wheels, doors, heads) -> texture -> edges -> pixels
  - Attributes: blue sky, red car
- It is natural for humans to decompose a complex problem into sub-problems through multiple levels of representations
- Humans learn abstract concepts on top of less abstract ones
- Humans can imagine new pictures by re-configuring these abstractions at multiple levels. Thus our brain generalizes well and can recognize things never seen before
- Our brain can estimate shape, lighting and pose from a face image and generate new images under various lightings and poses. That's why we have good face recognition capability

Local and Global Representations
- The way these regions carve the input space still depends on few parameters: this huge number of regions are not placed independently of each other
- We can thus represent a function that looks complicated but actually has (global) structures

Human Brains Process Visual Signals through Multiple Layers
- A visual cortical area consists of six layers (Kruger et al. 2013)

Joint Learning vs Separate Learning
- Separate learning: Data collection -> Preprocessing step 1 -> Preprocessing step 2 -> Feature extraction -> Classification (stages labeled: training or manual design; manual design; training or manual design)
- End-to-end learning: Data collection -> Feature transform -> Feature transform -> Feature transform -> Classification (stages labeled: ? ? ?)

- Deep learning is a framework/language but not a black-box model
- Its power comes from joint optimization and increasing the capacity of the learner
- Domain knowledge could be helpful for designing new deep models and training strategies
- How to formulate a vision problem with deep learning?
  - Make use of experience and insights obtained in CV research
  - Sequential design/learning vs joint learning
  - Effectively train a deep model (layer-wise pre-training + fine-tuning)

Conventional object recognition scheme: Feature extraction -> Quantization (visual words) -> Spatial pyramid (histograms in local regions) -> Classification
Krizhevsky NIPS12: Filtering & max pooling -> Filtering & max pooling -> Filtering & max pooling
Correspondence:
- Feature extraction: filtering
- Quantization: filtering
- Spatial pyramid: multi-level pooling

What if we treat an existing deep model as a black box in pedestrian detection?
ConvNet-U-MS: P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning,” CVPR 2013.
[Figures: results on Caltech Test; results on ETHZ]

N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005. (6000 citations)
P. Felzenszwalb, D. McAllester, and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008. (2000 citations)
W. Ouyang and X. Wang. A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling. CVPR, 2012.

Our Joint Deep Learning Model
W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” Proc. ICCV, 2013.
Modeling Part Detectors
- Design the filters in the second convolutional layer with variable sizes
[Figures: part models; learned filters at the second convolutional layer; part models learned from HOG]

Deformation Layer

Visibility Reasoning with Deep Belief Net
- Correlates with part detection score

Experimental Results
Caltech Test dataset (largest, most widely used)
[Plot: average miss rate (%), axis from 30 to 100, against year, 2000 to 2014. Annotated miss rates: 95%, 68%, 63% (state-of-the-art), 53%, 39% (best performing). Improve by 20%.]
W. Ouyang, X. Zeng and X. Wang, Modeling Mutual Visibility Relationship in Pedestrian Detection, CVPR 2013.
W. Ouyang, Xiaogang Wang, Single-Pedestrian Detection aided by Multi-pedestrian Detection, CVPR 20
