Data Warehousing and Data Mining, Chapter 9
Chapter 7: Classification and Prediction

Agenda:
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Classification by neural networks
- Classification by support vector machines (SVM)
- Classification based on concepts from association rule mining
- Other classification methods
- Prediction
- Classification accuracy
- Summary

Classification vs. Prediction
- Classification:
  - predicts categorical class labels (discrete or nominal)
  - constructs a model from the training set and the values (class labels) of a classifying attribute, then uses it to classify new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications: credit approval, target marketing, medical diagnosis, treatment effectiveness analysis

Classification: A Two-Step Process (a code sketch follows below)
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model: the known label of each test sample is compared with the model's prediction; the accuracy rate is the percentage of test-set samples correctly classified by the model
  - The test set must be independent of the training set; otherwise over-fitting will occur
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Classification Process (1): Model Construction
- Training data is fed to a classification algorithm, which produces the classifier (model), e.g.:
  IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
- The classifier is evaluated on testing data, then applied to unseen data, e.g., (Jeff, Professor, 4): tenured?

Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
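A minimal sketch of the two-step process above, assuming scikit-learn is available; the toy tenure data, encoded as (is_professor, years), is hypothetical and mirrors the slide's IF rank = 'professor' OR years > 6 THEN tenured = 'yes' rule.

from sklearn.tree import DecisionTreeClassifier

# Step 1: model construction from a training set with known class labels.
X_train = [[0, 3], [1, 7], [1, 2], [0, 7], [1, 6], [0, 2]]
y_train = [0, 1, 1, 1, 1, 0]          # tenured = 1 iff professor or years > 6
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2a: estimate accuracy on a test set independent of the training set.
X_test, y_test = [[1, 8], [0, 1]], [1, 0]
print("accuracy:", model.score(X_test, y_test))

# Step 2b: if the accuracy is acceptable, classify unseen data,
# e.g. (Jeff, Professor, 4 years) -> tenured?
print("tenured?", model.predict([[1, 4]])[0])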
Chapter 7: Classification and Prediction (agenda repeated); next topic: Issues Regarding Classification and Prediction

Issues Regarding Classification and Prediction (1): Data Preparation
- Data cleaning: preprocess the data in order to reduce noise and handle missing values
- Relevance analysis (feature selection): remove irrelevant or redundant attributes
- Data transformation: generalize and/or normalize the data

Issues Regarding Classification and Prediction (2): Evaluating Classification Methods
- Predictive accuracy
- Speed and scalability: time to construct the model; time to use the model
- Robustness: handling noise and missing values
- Scalability: efficiency for disk-resident databases
- Interpretability: understanding and insight provided by the model
- Goodness of rules: decision tree size; compactness of classification rules

Chapter 7: Classification and Prediction (agenda repeated); next topic: Classification by Decision Tree Induction

Training Dataset
- This follows an example from Quinlan's ID3.
- [Table of 14 training tuples over the attributes age, income, student, and credit_rating, with class label buys_computer: 9 "yes" and 5 "no" tuples.]

Output: A Decision Tree for "buys_computer"
- Root: age?
  - age <= 30: student? (no -> buys_computer = no; yes -> buys_computer = yes)
  - age 31..40: buys_computer = yes
  - age > 40: credit_rating? (excellent -> buys_computer = no; fair -> buys_computer = yes)

Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm)
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner
  - At the start, all the training examples are at the root
  - Attributes are categorical (continuous-valued attributes are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning (majority voting is employed to classify the leaf)
  - There are no samples left

Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- Let S contain s_i tuples of class C_i, for i = 1, ..., m
- The expected information (entropy) required to classify an arbitrary tuple:
  I(s_1, \dots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}
- The entropy of attribute A with values {a_1, a_2, ..., a_v}:
  E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \dots + s_{mj}}{s} \, I(s_{1j}, \dots, s_{mj})
- The information gained by branching on attribute A:
  Gain(A) = I(s_1, \dots, s_m) - E(A)

Attribute Selection by Information Gain Computation (see the sketch below)
- Class P: buys_computer = "yes"; class N: buys_computer = "no"
- I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age: the branch "age <= 30" has 5 of the 14 samples, with 2 yes's and 3 no's. Hence
  E(age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694
  and Gain(age) = I(9,5) - E(age) = 0.940 - 0.694 = 0.246.
- Similarly for income, student, and credit_rating; age yields the highest gain, so it becomes the root split.
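A plain-Python sketch of the information-gain computation above; the per-branch class counts for age are the ones quoted on the slide (the full table is not reproduced in these notes).

from math import log2

def info(*counts):
    # I(s1, ..., sm): expected information needed to classify a tuple
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c)

def gain(total, branches):
    # Gain(A) = I(s1, ..., sm) - E(A)
    s = sum(total)
    e = sum(sum(b) / s * info(*b) for b in branches)
    return info(*total) - e

print(round(info(9, 5), 3))                              # 0.940
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # 0.247
# (the slide's 0.246 comes from rounding I and E to three digits first)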
Other Attribute Selection Measures
- Gini index (CART, IBM IntelligentMiner)
  - All attributes are assumed continuous-valued
  - Assume there exist several possible split values for each attribute
  - May need other tools, such as clustering, to get the possible split values
  - Can be modified for categorical attributes

Gini Index (IBM IntelligentMiner) (see the sketch below)
- If a data set T contains examples from n classes, the gini index gini(T) is defined as
  gini(T) = 1 - \sum_{j=1}^{n} p_j^2
  where p_j is the relative frequency of class j in T.
- If T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively, the gini index of the split data is defined as
  gini_{split}(T) = \frac{N_1}{N} \, gini(T_1) + \frac{N_2}{N} \, gini(T_2)
- The attribute providing the smallest gini_split(T) is chosen to split the node (this requires enumerating all possible splitting points for each attribute).
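A plain-Python sketch of the two definitions above; the class counts (9/5 overall, with a hypothetical binary split into 5 and 9 tuples) are illustrative only.

def gini(counts):
    # gini(T) = 1 - sum_j p_j^2 over the class frequencies in T
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(partitions):
    # weighted gini of a candidate split: sum_i (Ni/N) * gini(Ti)
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

print(round(gini([9, 5]), 3))                  # 0.459
# candidate split: T1 = (2 yes, 3 no), T2 = (7 yes, 2 no)
print(round(gini_split([[2, 3], [7, 2]]), 3))  # 0.394; pick the split minimizing this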
Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction; the leaf node holds the class prediction
- Rules are easier for humans to understand
- Example (from the buys_computer tree):
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  IF age = "31...40" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
  IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

Avoid Overfitting in Classification
- Overfitting: an induced tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - Poor accuracy on unseen samples
- Two approaches to avoid overfitting
  - Prepruning: halt tree construction early; do not split a node if this would cause the goodness measure to fall below a threshold (it is difficult to choose an appropriate threshold)
  - Postpruning: remove branches from a "fully grown" tree, yielding a sequence of progressively pruned trees; use a data set different from the training data to decide which is the "best pruned tree"

Approaches to Determine the Final Tree Size
- Separate training (2/3) and testing (1/3) sets
- Use cross-validation, e.g., 10-fold cross-validation
- Use all the data for training, but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node improves the entire distribution
- Use the minimum description length (MDL) principle: halt growth of the tree when the encoding is minimized

Enhancements to Basic Decision Tree Induction
- Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition a continuous attribute's values into a discrete set of intervals
- Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values
- Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation, repetition, and replication

Classification in Large Databases
- Classification is a classical problem extensively studied by statisticians and machine-learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes at reasonable speed
- Why decision tree induction in data mining?
  - relatively fast learning speed (compared with other classification methods)
  - convertible to simple, easy-to-understand classification rules
  - can use SQL queries to access databases
  - classification accuracy comparable with other methods

Scalable Decision Tree Induction Methods in Data Mining Studies
- SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute-list data structure
- PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning: stop growing the tree earlier
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label)

Data Cube-Based Decision-Tree Induction
- Integration of generalization with decision-tree induction (Kamber et al., 1997)
- Classification at primitive concept levels (e.g., precise temperature, humidity, outlook) yields low-level concepts, scattered classes, and bushy classification trees, with semantic interpretation problems
- Cube-based multi-level classification: relevance analysis at multiple levels; information-gain analysis with dimension + level

Presentation of Classification Results
- [screenshot]

Visualization of a Decision Tree in SGI/MineSet 3.0
- [screenshot]

Interactive Visual Mining by Perception-Based Classification (PBC)
- [screenshot]

Chapter 7: Classification and Prediction (agenda repeated); next topic: Bayesian Classification

Bayesian Classification: Why?
- Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
- Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured

Bayesian Theorem: Basics
- Let X be a data sample whose class label is unknown
- Let H be the hypothesis that X belongs to class C
- For classification problems, determine P(H|X): the probability that the hypothesis holds given the observed data sample X
- P(H): prior probability of hypothesis H (the initial probability before we observe any data; it reflects background knowledge)
- P(X): probability that the sample data is observed
- P(X|H): probability of observing sample X, given that the hypothesis holds

Bayesian Theorem (a numeric illustration follows below)
- Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
  P(H|X) = \frac{P(X|H) \, P(H)}{P(X)}
- Informally: posterior = likelihood x prior / evidence
- MAP (maximum a posteriori) hypothesis:
  h_{MAP} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} P(D|h) \, P(h)
- Practical difficulty: requires initial knowledge of many probabilities; significant computational cost
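A tiny numeric illustration of posterior = likelihood x prior / evidence; the numbers are hypothetical, chosen to match the naive Bayes example later in this section.

p_h = 9 / 14                          # prior P(H), e.g. P(buys_computer = yes)
p_x_given_h = 0.044                   # likelihood P(X|H)
p_x = 0.044 * 9/14 + 0.019 * 5/14     # evidence P(X), by total probability
posterior = p_x_given_h * p_h / p_x
print(round(posterior, 3))            # P(H|X) = 0.807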
Naive Bayes Classifier
- A simplifying assumption: attributes are conditionally independent given the class:
  P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i)
- The probability of observing, say, two elements y_1 and y_2 given class C is the product of the probabilities of each element taken separately, given the same class: P([y_1, y_2]|C) = P(y_1|C) * P(y_2|C)
- No dependence relation between attributes is modeled
- Greatly reduces the computation cost: only the class distributions need to be counted
- Once P(X|C_i) is known, assign X to the class with maximum P(X|C_i) * P(C_i)

Training Dataset
- Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
- Data sample: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Naive Bayesian Classifier: Example (a code sketch follows the Comments slide below)
- Compute P(X|C_i) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
- X = (age <= 30, income = medium, student = yes, credit_rating = fair)
- P(X|C_i):
  P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
- P(X|C_i) * P(C_i):
  P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") * P(buys_computer = "no") = 0.007
- X belongs to class "buys_computer = yes"

Naive Bayesian Classifier: Comments
- Advantages:
  - Easy to implement
  - Good results obtained in most cases
- Disadvantages:
  - The class-conditional independence assumption causes a loss of accuracy
  - In practice, dependencies exist among variables; e.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
  - Such dependencies cannot be modeled by a naive Bayesian classifier
- How to deal with these dependencies? Bayesian belief networks
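A plain-Python sketch reproducing the slide's naive Bayes computation; the conditional probabilities and priors are the ones listed above.

cond = {
    "yes": [2/9, 4/9, 6/9, 6/9],  # P(age<=30|yes), P(income=medium|yes), ...
    "no":  [3/5, 2/5, 1/5, 2/5],
}
prior = {"yes": 9/14, "no": 5/14}

def score(cls):
    # P(X|Ci) * P(Ci) under the conditional-independence assumption
    p = prior[cls]
    for q in cond[cls]:
        p *= q
    return p

scores = {c: score(c) for c in cond}
print({c: round(s, 3) for c, s in scores.items()})  # {'yes': 0.028, 'no': 0.007}
print("predict:", max(scores, key=scores.get))      # buys_computer = yes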
Bayesian Networks
- A Bayesian belief network allows a subset of the variables to be conditionally independent
- A graphical model of causal relationships: it represents dependency among the variables and gives a specification of the joint probability distribution
- Nodes: random variables; links: dependency
- Example graph: X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P; the graph has no loops or cycles

Bayesian Belief Network: An Example
- Variables: FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, Dyspnea
- The conditional probability table (CPT) for the variable LungCancer:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
  LC      0.8       0.5        0.7        0.1
  ~LC     0.2       0.5        0.3        0.9

- The CPT shows the conditional probability for each possible combination of the variable's parents

Learning Bayesian Networks
- Several cases:
  - Given both the network structure and all variables observable: learn only the CPTs
  - Network structure known, some hidden variables: use gradient descent, analogous to neural-network learning
  - Network structure unknown, all variables observable: search through the model space to reconstruct the graph topology
  - Unknown structure, all hidden variables: no good algorithms are known for this purpose
- Reference: D. Heckerman, Bayesian networks for data mining

Chapter 7: Classification and Prediction (agenda repeated); next topic: Classification by Neural Networks

Classification
- Classification predicts categorical class labels
- Typical applications: {credit history, salary} -> credit approval (yes/no); {temperature, humidity} -> rain (yes/no)
- Mathematically: learn a function f: X -> Y from a feature space X (e.g., R^n) to a discrete set of class labels Y

Linear Classification
- Binary classification problem
- The data above the red line belongs to class 'x'; the data below the red line belongs to class 'o'
- Examples: SVM, perceptron, probabilistic classifiers
- [Scatter plot: two point clouds of x's and o's separated by a line]

Discriminative Classifiers
- Advantages:
  - prediction accuracy is generally high (compared with Bayesian methods, in general)
  - robust: works when training examples contain errors
  - fast evaluation of the learned target function (Bayesian networks are normally slow)
- Criticism:
  - long training time
  - difficult to understand the learned function (weights) (Bayesian networks can be used easily for pattern discovery)
  - not easy to incorporate domain knowledge (easy for Bayesian methods, in the form of priors on the data or distributions)

Neural Networks
- Analogy to biological systems (indeed, a great example of a good learning system)
- Massive parallelism, allowing for computational efficiency
- The first learning algorithm came in 1959 (Rosenblatt): if a target output value is provided for a single neuron with fixed inputs, one can incrementally change the weights to learn to produce these outputs, using the perceptron learning rule

A Neuron
- The n-dimensional input vector x is mapped into the variable y by means of the scalar product and a nonlinear function mapping:
  y = f\left(\sum_{i=0}^{n} w_i x_i - \mu_k\right)
  where w is the weight vector, \mu_k a threshold (bias), and f the activation function
- [Diagram: inputs x_0..x_n with weights w_0..w_n feed a weighted sum, then the activation function f, producing output y]

Multi-Layer Perceptron
- [Diagram: the input vector x_i feeds the input nodes; weights w_ij connect input nodes to hidden nodes, which feed the output nodes producing the output vector]

Network Training (see the sketch below)
- The ultimate objective of training: obtain a set of weights that makes almost all the tuples in the training data classified correctly
- Steps:
  - Initialize the weights with random values
  - Feed the input tuples into the network one by one
  - For each unit:
    - Compute the net input to the unit as a linear combination of all the inputs to the unit
    - Compute the output value using the activation function
    - Compute the error
    - Update the weights and the bias
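A minimal sketch of this training loop for a single sigmoid unit (the additive bias below plays the role of -mu_k in the neuron formula); the toy OR task, learning rate, and epoch count are hypothetical.

import math, random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Toy task: learn the logical OR of two binary inputs.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(2)]   # initialize weights randomly
bias, lr = random.uniform(-0.5, 0.5), 0.5

for _ in range(2000):                                # feed the tuples in, one by one
    for x, target in data:
        net = sum(wi * xi for wi, xi in zip(w, x)) + bias   # net input
        y = sigmoid(net)                                    # activation
        err = target - y                                    # error
        delta = err * y * (1 - y)                           # sigmoid gradient
        w = [wi + lr * delta * xi for wi, xi in zip(w, x)]  # update weights
        bias += lr * delta                                  # update bias

print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias), 2)
       for x, _ in data])    # approaches [0, 1, 1, 1]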
Network Pruning and Rule Extraction
- Network pruning
  - A fully connected network is hard to articulate
  - N input nodes, h hidden nodes, and m output nodes lead to h(m + N) weights
  - Pruning: remove some of the links without affecting the classification accuracy of the network (a pruning sketch follows this slide)
- Extracting rules from a trained network
  - Discretize the activation values; replace each individual activation value by the cluster average, maintaining the network's accuracy
  - Enumerate the outputs from the discretized activation values to find rules between activation value and output
  - Find the relationship between the inputs and the activation values
  - Combine the above two to obtain rules relating the output to the input
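The slides do not specify how links are chosen for removal, so the sketch below assumes one common heuristic: try links in order of increasing |weight| and keep a removal only if classification accuracy does not drop. The linear threshold unit and the data are hypothetical.

def predict(weights, x):
    # linear threshold unit: class 1 iff the weighted sum is positive
    return 1 if sum(weights.get(i, 0.0) * v for i, v in enumerate(x)) > 0 else 0

def accuracy(weights, data):
    return sum(predict(weights, x) == y for x, y in data) / len(data)

# Hypothetical trained weights; link 2 is near zero and should be prunable.
weights = {0: 1.2, 1: -1.5, 2: 0.01}
data = [([1, 0, 0], 1), ([0, 1, 0], 0), ([1, 1, 0], 0), ([0, 0, 1], 0)]

base = accuracy(weights, data)
for link in sorted(weights, key=lambda k: abs(weights[k])):  # smallest |w| first
    saved = weights.pop(link)        # tentatively remove the link
    acc = accuracy(weights, data)
    if acc < base:
        weights[link] = saved        # removal hurt accuracy: restore the link
    else:
        base = acc                   # removal is safe: keep it out
print(weights)   # link 2 (near-zero weight) is pruned; links 0 and 1 survive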