Introduction to Deep Learning
Xiaogang Wang
Department of Electronic Engineering, The Chinese University of Hong Kong

Deep learning = ?
Deep feature representations are?

Outline
- Historical review of deep learning
- Introduction to classical deep models
- Why does deep learning work?
- Properties of deep feature representations

Machine Learning
- x -> F(x) -> y
- y is a class label (classification) or a real-valued vector (estimation)
- Object recognition: image -> {dog, cat, horse, flower, ...}
- Super resolution: low-resolution image -> high-resolution image

Neural network: back propagation (Nature, 1986)
- Solve general learning problems
- Tied with biological system
- A single neuron: inputs x1, x2, x3 with weights w1, w2, w3; output g(x) = f(net)
- But it was given up
  - Hard to train
  - Insufficient computational resources
  - Small training sets
  - Does not work well
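The single-neuron diagram in the back-propagation slides (inputs x1, x2, x3, weights w1, w2, w3, output g(x) = f(net)) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that net is the weighted sum of the inputs, that f is a sigmoid, and that the objective is a squared error. The slides specify none of these, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Assumed activation f; the slides only write f(net)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w):
    """Compute g(x) = f(net), assuming net = sum_i w_i * x_i."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(net)


def loss(x, w, target):
    """Assumed objective: J = 0.5 * (g(x) - target)^2."""
    return 0.5 * (forward(x, w) - target) ** 2


def grad_step(x, w, target, lr=0.5):
    """One gradient-descent update of the weights.

    By the chain rule, dJ/dw_i = (y - target) * y * (1 - y) * x_i,
    where y = g(x) and y * (1 - y) is the sigmoid derivative.
    """
    y = forward(x, w)
    delta = (y - target) * y * (1.0 - y)
    return [wi - lr * delta * xi for wi, xi in zip(w, x)]


if __name__ == "__main__":
    x = [1.0, 0.5, -1.0]   # illustrative inputs x1, x2, x3
    w = [0.1, -0.2, 0.3]   # illustrative weights w1, w2, w3
    before = loss(x, w, 1.0)
    w = grad_step(x, w, 1.0)
    after = loss(x, w, 1.0)
    print("loss before:", before, "loss after:", after)
```

In a multi-layer network the same chain-rule step is applied layer by layer, from the output back to the input; that is the back error propagation the slides refer to.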

1986 - 2006
- SVM, Boosting, Decision tree, KNN
- Flat structures
- Loose tie with biological systems
- Specific methods for specific tasks
- Hand-crafted features (GMM-HMM, SIFT, LBP, HOG)
- Kruger et al., TPAMI 2013

2006: deep belief net (Science)
- Unsupervised and layer-wise pre-training
- Better designs for modeling and training (normalization, nonlinearity, dropout)
- New development of computer architectures
  - GPU
  - Multi-core computer systems
- Large-scale databases
- Big Data!

Machine Learning with Big Data
- Machine learning with small data: overfitting, reducing model complexity (capacity)
- Machine learning with big data: underfitting, increasing model complexity, optimization, computation resource

How to increase model capacity?
- Curse of dimensionality
- Blessing of dimensionality
  - D. Chen, X. Cao, F. Wen, and J. Sun. Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification. In Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2013.
- Learning hierarchical feature transforms (learning features with deep structures)

Historical timeline
- 1986: neural network, back propagation (Nature). Solve general learning problems; tied with biological system; but it was given up
- 2006: deep belief net (Science)
- 2011: deep learning results in speech
- 2012: object recognition on ImageNet

Object recognition over 1,000,000 images and 1,000 categories (2 GPUs)
Rank  Name         Error rate  Description
1     U. Toronto   0.15315     Deep learning
2     U. Tokyo     0.26172     Hand-crafted features and learning models. Bottleneck.
3     U. Oxford    0.26979     (as above)
4     Xerox/INRIA  0.27058     (as above)
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012.
- Examples from ImageNet

ImageNet 2013 image classification challenge
Rank  Name    Error rate  Description
1     NYU     0.11197     Deep learning
2     NUS     0.12535     Deep learning
3     Oxford  0.13555     Deep learning
MSRA, IBM, Adobe, NEC, Clarifai,
Berkeley, U. Tokyo, UCLA, UIUC, Toronto, ...: the top 20 groups all used deep learning

ImageNet 2013 object detection challenge
Rank  Name          Mean Average Precision  Description
1     UvA-Euvision  0.22581                 Hand-crafted features
2     NEC-MU        0.20895                 Hand-crafted features
3     NYU           0.19400                 Deep learning

ImageNet 2014 image classification challenge
Rank  Name    Error rate  Description
1     Google  0.06656     Deep learning
2     Oxford  0.07325     Deep learning
3     MSRA    0.08062     Deep learning

ImageNet 2014 object detection challenge
Rank  Name             Mean Average Precision  Description
1     Google           0.43933                 Deep learning
2     CUHK             0.40656                 Deep learning
3     DeepInsight      0.40452                 Deep learning
4     UvA-Euvision     0.35421                 Deep learning
5     Berkeley Vision  0.34521                 Deep learning

ImageNet 2014 object detection challenge
- W. Ouyang and X. Wang et al., "DeepID-Net: deformable deep convolutional neural networks for object detection," CVPR, 2015 (Wanli Ouyang)
Method              Model average  Single model
RCNN (Berkeley)     n/a            31.4
Berkeley Vision     n/a            34.5
UvA-Euvision        n/a            35.4
DeepInsight         40.5           40.2
GoogLeNet (Google)  43.9           38.0
DeepID-Net (CUHK)   50.3           47.9

Google and Baidu announced their deep learning based visual search engines (2013)
- Google: "on our test set we saw double the average precision when compared to other approaches we had tried. We acquired the rights to the technology and went full speed ahead adapting it to run at large scale on Google's computers. We took cutting edge research straight out of an academic research lab and launched it, in just a little over six months."
- Baidu

Face recognition (2014)
- Deep learning achieves 99.53% face verification accuracy on Labeled Faces in the Wild (LFW), higher than human performance
- Y. Sun, X. Wang, and X. Tang. Deep Learning Face Representation by Joint Identification-Verification. NIPS, 2014.
- Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. CVPR, 2015.
- Labeled Faces in the Wild (2007): best results without deep learning

Design Cycle
start -> Collect data -> Preprocessing -> Feature design -> Choose and design model -> Train classifier -> Evaluation -> end
- Domain knowledge
- Interest of people working on computer vision, speech recognition, medical image processing, ...
- Interest of people working on machine learning
- Interest of people working on machine learning and computer vision, speech recognition, medical image processing, ...
- Preprocessing and feature design may lose useful information and may not be optimized, since they are not parts of an end-to-end learning system
- Preprocessing could be the result of another pattern recognition system

Person re-identification pipeline
Pedestrian detection -> Pose estimation -> Body parts segmentation -> Photometric & geometric transform -> Feature extraction -> Classification

Face recognition pipeline
Face alignment -> Geometric rectification -> Photometric rectification -> Feature extraction -> Classification

Design Cycle with Deep Learning
start -> Collect data -> Preprocessing (optional) -> Design network (feature learning, classifier) -> Train network -> Evaluation
-> end
- Learning plays a bigger role in the design cycle
- Feature learning becomes part of the end-to-end learning system
- Preprocessing becoming optional means that several pattern recognition steps can be merged into one end-to-end learning system
- Feature learning makes the key difference
- We underestimated the importance of data collection and evaluation

What makes deep learning successful in computer vision?
- Data collection (Li Fei-Fei): one million images with labels
- Evaluation task: predict 1,000 image categories
- Deep learning (Geoffrey Hinton)
  - CNN is not new
  - Design network structure
  - New training strategies
- Features learned from ImageNet can be well generalized to other tasks and datasets!

Learning features and classifiers separately
- Not all the datasets and prediction tasks are suitable for learning features with deep models
- Training stage A (deep learning): Dataset A -> feature transform -> Classifier 1, Classifier 2, ... -> Prediction on task 1, Prediction on task 2, ...
- Training stage B: Dataset B -> feature transform -> Classifier B -> Prediction on task B (our target task)

Deep learning can be treated as a language to describe the world with great flexibility
- Collect data -> Preprocessing 1 -> Preprocessing 2 -> Feature design -> Classifier -> Evaluation
- Deep neural network: Collect data -> Feature transform -> Feature transform -> Classifier -> Evaluation
- Connection

Introduction to Deep Learning
- Historical review of deep learning
- Introduction to classical deep models
- Why does deep learning work?
- Properties of deep feature representations

Introduction to Classical Deep Models
- Convolutional Neural Networks (CNN)
  - Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based Learning Applied to Document Recognition," Proceedings of the IEEE, Vol. 86, pp. 2278-2324, 1998.
- Deep Belief Net (DBN)
  - G. E. Hinton, S. Osindero, and Y. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, Vol. 18, pp. 1527-1544, 2006.
- Auto-encoder
  - G. E. Hinton and R. R. Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks," Science, Vol. 313, pp. 504-507, July 2006.

Classical Deep Models: Convolutional Neural Networks (CNN)
- First proposed by Fukushima in 1980
- Improved by LeCun, Bottou, Bengio and Haffner in 1998
- Convolution, pooling, learned filters

Backpropagation
- W is the parameter of the network; J is the objective function
- Input layer, hidden layers, output layer, target values
- Feedforward operation
- Back error propagation
- D. E. Rumelhart, G. E. Hinton, R. J. Williams, "Learning Representations by Back-propagating Errors," Nature, Vol. 323, pp. 533-536, 1986.

Classical Deep Models: Deep belief net (Hinton et al., 2006)
- $P(x, h_1, h_2) = p(x \mid h_1)\, p(h_1, h_2)$
- $P(x, h_1) = \dfrac{e^{-E(x, h_1)}}{\sum_{x, h_1} e^{-E(x, h_1)}}$
- $E(x, h_1) = b^\top x + c^\top h_1 + h_1^\top W x$
- Pre-training
  - Good initialization point (initial point)
  - Make use of unlabeled data

Classical Deep Models: Auto-encoder (Hinton and Salakhutdinov, 2006)
- Encoding: $h_1 = \sigma(W_1 x + b_1)$, $h_2 = \sigma(W_2 h_1 + b_2)$
- Decoding: $\tilde{h}_1 = \sigma(W_2' h_2 + b_3)$, $\tilde{x} = \sigma(W_1' \tilde{h}_1 + b_4)$

Introduction to Deep Learning
- Historical review of deep learning
- Introduction to classical deep models
- Why does deep learning work?
- Properties of deep feature representations

Feature Learning vs Feature Engineering

Feature Engineering
- The performance of a pattern recognition system heavily depends on feature representations
- Manually designed features dominated the applications of image and video understanding in the past
- Relies on human domain knowledge much more than data
- Feature design is separate from training the classifier
- If handcrafted features have multiple parameters, it is hard to manually tune them
- Developing effective features for new applications is slow

Handcrafted Features for Face Recognition
- 1980s: geometric features
- 1992: pixel vector
- 1997: Gabor filters (2 parameters)
- 2006: local binary patterns (3 parameters)

Feature Learning
- Learning transformations of the data that make it easier to extract useful information when building classifiers or predictors
- Jointly learning feature transformations and classifiers makes their integration optimal
- Learn the values of a huge number of parameters in feature representations
- Faster to get feature representations for new applications
- Make better use of big data

Deep Learning Means Feature Learning
- Deep learning is about learning hierarchical feature representations
- Good feature representations should be able to disentangle multiple factors coupled in the data
- Data -> Trainable Feature Transform -> Trainable Feature Transform -> Trainable Feature Transform -> Trainable Feature Transform -> Classifier
- Pixel 1, Pixel 2, ..., Pixel n -> ideal feature transform -> view, expression

Deep Learning Means Feature Learning
- How to effectively learn features with deep models
  - With challenging tasks
  - Predict high-dimensional vectors

Detect 200 object classes on ImageNet
- Pre-train on classifying 1,000 categories
- Fine-tune on classifying 201 categories
- Feature representation -> SVM binary classifier for each category
- W. Ouyang and X. Wang et al., "DeepID-Net: deformable deep convolutional neural networks for object detection," CVPR, 2015
- Training stage A: Dataset A -> feature transform -> Classifier A (distinguish 1,000 categories)
- Training stage B: Dataset B -> feature transform -> Classifier B (distinguish 201 categories)
- Training stage C: Dataset C -> feature transform (fixed) -> SVM (distinguish one object class from all the negatives)

Example 1: deep learning generic image features
- Hinton group's groundbreaking work on ImageNet
- They did not have much experience on general image classification on ImageNet
- It took one week to train the network with 60 million parameters
- The learned feature representations are effective on other datasets (e.g. Pascal VOC) and other tasks (object detection, segmentation, tracking, and image retrieval)
- 96 learned low-level filters
- Image classification result
- Top hidden layer can be used as feature for retrieval

Example 2: deep learning face identity features by recovering canonical-view face images
- Reconstruction examples from LFW
- Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep Learning Identity Preserving Face Space," ICCV 2013.
- Deep model can disentangle hidden factors through feature extraction over multiple layers
- No 3D model; no prior information on pose and lighting condition
- Model multiple complex transforms
- Reconstructing the whole face is a much stronger supervision than predicting 0/1 class label and helps to avoid overfitting
- Arbitrary view -> canonical view

Comparison on Multi-PIE
Method          -45°   -30°   -15°   +15°   +30°   +45°   Avg    Pose
LGBP [26]       37.7   62.5   77     83     59.2   36.1   59.3
VAAM [17]       74.1   91     95.7   95.7   89.5   74.8   86.9
FA-EGFC [3]     84.7   95     99.3   99     92.9   85.2   92.7   x
SA-EGFC [3]     93     98.7   99.7   99.7   98.3   93.6   97.2
LE [4] + LDA    86.9   95.5   99.9   99.7   95.5   81.8   93.2   x
CRBM [9] + LDA  80.3   90.5   94.9   96.4   88.3   89.8   87.6   x
Ours            95.6   98.5   100.0  99.3   98.5   97.8   98.3   x

Deep learning 3D model from 2D images, mimicking human brain activities
- Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep Learning and Disentangling Face Representation by Multi-View Perception," NIPS 2014.
- Training stage A (deep learning, face reconstruction): face images in arbitrary views -> feature transform -> face identity features -> Regressor 1, Regressor 2, ... -> Reconstruct view 1, Reconstruct view 2, ...
- Training stage B (face verification): two face images in arbitrary views -> feature transform (fixed) -> linear discriminant analysis -> whether the two images belong to the same person or not

Example 3: deep learning face identity features from predicting 10,000 classes
- At training stage, each input image is classified into 10,000 identities with 160 hidden identity features in the top layer
- The hidden identity features
  can be well generalized to other tasks (e.g. verification) and identities outside the training set
- As the number of classes to be predicted increases, the generalization power of the learned features also improves
- Y. Sun, X. Wang, and X. Tang. Deep Learning Face Representation by Joint Identification-Verification. NIPS, 2014.
- Training stage A (face identification): Dataset A -> feature transform -> Classifier A (distinguish 10,000 people)
- Training stage B (face verification): Dataset B -> feature transform (fixed) -> Linear classifier B (whether the two images belong to the same person or not)

Deep Structures vs Shallow Structures (Why deep?)

Shallow Structures
- A three-layer neural network (with one hidden layer) can approximate any classification function
- Most machine learning tools (such as SVM, boosting, and KNN) can be approximated as neural networks with one or two hidden layers
- Shallow models divide the feature space into regions and match templates in local regions. O(N) parameters are needed to represent N regions
- SVM

Deep Machines are More Efficient for Representing Certain Classes of Functions
- Theoretical results show that an architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than architectures whose depth is matched to the task (Hastad 1986, Hastad and Goldmann 1991)
- It also means many more parameters to learn
- Take the d-bit parity function as an example: $(X_1, \ldots, X_d) \mapsto$ whether $\sum_i X_i$ is even
- d-bit logical parity circuits of depth 2 have exponential size (Andrew Yao, 1985)
- There are functions computable with a polynomial-size logic gate circuit of depth k that require exponential size when restricted to depth k - 1 (Hastad, 1986)
- Architectures with multiple levels naturally provide sharing and re-use of components
- Honglak Lee, NIPS 2010

Humans Understand the World through Multiple Levels of Abstractions
- We do not interpret a scene image with pixels
  - Objects (sky, cars, roads, buildings, pedestrians) -> parts (wheels, doors, heads) -> texture -> edges -> pixels
  - Attributes: blue sky, red car
- It is natural for humans to decompose a complex problem into sub-problems through multiple levels of representations

Humans Understand the World through Multiple Levels of Abstractions
- Humans learn abstract concepts on top of less abstract ones
- Humans can imagine new pictures by re-configuring these abstractions at multiple levels. Thus our brain generalizes well and can recognize things never seen before.
- Our brain can estimate shape, lighting and pose from a face image and generate new images under various lightings and poses. That's why we have good face recognition capability.

Local and Global Representations
- The way these regions carve the input space still depends on few parameters: this huge number of regions are not placed independently of each other
- We can thus represent a function that looks complicated but actually has (global) structures

Human Brains Process Visual Signals through Multiple Layers
- A visual cortical area consists of six layers (Kruger et al. 2013)

Joint Learning vs Separate Learning
- Separate learning: Data collection -> Preprocessing step 1 -> Preprocessing step 2 -> Feature extraction -> Classification (steps labeled "training or manual design", "manual design", "training or manual design")
- End-to-end learning: Data collection -> Feature transform -> Feature transform -> Feature transform -> Classification
- Deep learning is a framework/language but not a black-box model
- Its power comes from joint optimization and increasing the capacity of the learner
- Domain knowledge could be helpful for designing new deep models and training strategies
- How to formulate a vision problem with deep learning?
  - Make use of experience and insights obtained in CV research
  - Sequential design/learning vs joint learning
  - Effectively train a deep model (layerwise pre-training + fine tuning)

Conventional object recognition scheme vs Krizhevsky et al., NIPS 2012
- Conventional object recognition scheme: Feature extraction -> Quantization (visual words) -> Spatial pyramid (histograms in local regions) -> Classification
- Krizhevsky et al., NIPS 2012: Filtering & max pooling -> Filtering & max pooling -> Filtering & max pooling
- Feature extraction: filtering
- Quantization: filtering
- Spatial pyramid: multi-level pooling

What if we treat an existing deep model as a black box in pedestrian detection?
- ConvNet-U-MS: P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, "Pedestrian Detection with Unsupervised Multi-Stage Feature Learning," CVPR 2013.
- Results on Caltech Test
- Results on ETHZ
- N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005. (6000 citations)
- P. Felzenszwalb, D. McAllester, and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008. (2000 citations)
- W. Ouyang and X. Wang. A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling. CVPR, 2012.

Our Joint Deep Learning Model
- W. Ouyang and X. Wang, "Joint Deep Learning for Pedestrian Detection," Proc. ICCV, 2013.

Modeling Part Detectors
- Design the filters in the second convolutional layer with variable sizes
- Part models
- Learned filters at the second convolutional layer
- Part models learned from HOG

Deformation Layer

Visibility Reasoning with Deep Belief Net
- Correlates with part detection score

Experimental Results
- Caltech Test dataset (largest, most widely used)
- [Figure: average miss rate (%) by year, 2000 to 2014; vertical axis from 30 to 100]
- Average miss rates marked on the plot: 95%, 68%, 63% (state-of-the-art), 53%, 39% (best performing)
- Improve by 20%
- W. Ouyang, X. Zeng and X. Wang, Modeling Mutual Visibility Relationship in Pedestrian Detection, CVPR 2013.
- W. Ouyang and X. Wang, Single-Pedestrian Detection aided by Multi-pedestrian Detection, CVPR 20