
Lecture on Data Warehouse and Data Mining Technology

Uploaded by: 嘀****l  Document ID: 253012464  Uploaded: 2024-11-27  Format: PPT  Pages: 83  Size: 10.59 MB

Data Warehouse and Data Mining: An Overview
Concepts, Architecture, Trends, Applications
Presenter: Zhu Jianqiu, June 7, 2001

Outline
- Data warehouse concepts
- Data warehouse architecture and components
- Data warehouse design
- Data warehouse technology (how it differs from database technology)
- Data warehouse performance
- Data warehouse applications
- Overview of data mining applications
- Data mining techniques and trends
- Data mining application platform (project proposal to the Science and Technology Commission)

Data Warehouse Concepts
- Basic concepts
- Some misconceptions about data warehouses

Basic Concepts: Data Warehouse
- A data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management's decisions. [Inmon, 1996]
- A data warehouse is a set of methods, techniques, and tools that may be leveraged together to produce a vehicle that delivers data to end-users on an integrated platform. [Ladley, 1997]
- A data warehouse is a process of creating, maintaining, and using a decision-support infrastructure. [Appleton, 1995] [Haley, 1997] [Gardner, 1998]

Basic Concepts: Data Warehouse Characteristics [Inmon, 1996]
- Subject-oriented
  - The tables of one subject area draw on several operational applications (e.g. the customer subject draws on order processing, accounts receivable, accounts payable, ...)
  - Typical subject areas: customer, product, transaction, account
  - A subject area is implemented as a set of related tables
  - Related tables are linked by a common key (e.g. Customer ID)
  - Every key has a time element (from-date to to-date; monthly accumulation; a single date; ...)
  - Data within a subject can be stored on different media (summary level, detail level, multiple granularities)
- Integrated
  - Data extraction, cleansing, transformation, loading
- Non-volatile
  - Data is added in batches; data already in the warehouse does not change
- Time-variant (time dimension)
- Supports management decisions

Basic Concepts: Data Mart, ODS
- Data mart: a small data warehouse serving a department or workgroup.
- Operational Data Store (ODS): a collection of data that supports the enterprise's day-to-day, enterprise-wide applications. It is a new data environment distinct from the DB, a hybrid form obtained by extending the DW. Its four basic characteristics: subject-oriented, integrated, volatile, and current or near-current.

Basic Concepts: ETL, Metadata, Granularity, Partitioning
- ETL (Extract/Transformation/Load): tools that extract, transform and load data, e.g. Microsoft DTS, IBM Visual Warehouse.
- Metadata: data about data, used to build, maintain, manage and use the data warehouse. It is especially important in a data warehouse.
- Granularity: the level of detail or summarization of the units of data held in the warehouse. The more detailed the data, the smaller the granularity.
- Partitioning: data is spread across separate physical units that can be processed independently.

Some Misconceptions about Data Warehouses
- Data warehouses and OLAP
- Star data model
- Multidimensional analysis
- A data warehouse is not a virtual concept
- Data warehouses and normalization theory
- Denormalization is required

Data Warehouse Architecture and Components
- Architecture
- ETL tools
- Metadata repository and metadata management
- Data access and analysis tools

Architecture [Pieter, 1998]
[Diagram. Labels: Source Databases (Relational, Appl. Package, Legacy, External); Data Extraction, Transformation, Load (Extract, Transform and Load; Data Cleansing Tool; Data Modeling Tool; Warehouse Admin. Tools; Central Metadata); Central Data Warehouse; Architected Data Marts (Mid-Tier; Data Mart on RDBMS; Data Mart on MDB; Local Metadata; Metadata Exchange); Data Access and Analysis (End-User DW Tools)]

Architecture with an ODS
[Diagram. Labels: Source Databases (Relational, Appl. Package, Legacy, External); Hub - Data Extraction, Transformation, Load (Extract, Transform and Load; Data Cleansing Tool; Data Modeling Tool; Warehouse Admin. Tools; Central Metadata); Central Data Warehouse and ODS (Central Data Warehouse; ODS; OLTP Tools); Architected Data Marts (Mid-Tier; Data Mart on RDBMS; Data Mart on MDB; Local Metadata; Metadata Exchange); Data Access and Analysis (End-User DW Tools)]

Real-World Environments: Heterogeneity [Douglas Hackney, 2001]
[Diagram. Labels: Custom Marketing Data Warehouse; Packaged Oracle Financial Data Warehouse; Packaged i2 Supply Chain; Non-Architected Data Mart; Subset Data Marts. Sources: Oracle Financials, i2 Supply Chain, Siebel CRM, 3rd Party, e-Commerce]

Federated Data Warehouse / Data Mart Architecture
[Diagram. Labels: Real Time ODS; Common Staging Area; Federated Financial Data Warehouse; Federated Marketing Data Warehouse; Federated Packaged i2 Supply Chain Data Marts; Subset Data Marts; Analytical Applications; Real Time Data Mining and Analytics; Real Time Segmentation, Classification, Qualification, Offerings, etc. Sources: Oracle Financials, i2 Supply Chain, Siebel CRM, 3rd Party, e-Commerce]

Closed-Loop Federated BI Architecture
[Diagram. Labels: ETL tools & DW templates; Data profiling & reengineering tools; Demand-driven data acquisition & analysis; Metadata Interchange; Federated data warehouse and data mart systems; Decision engine models, rules and metrics; OLAP & data mining tools; Analysis templates; Analytic application development tools & components; Analytic applications; Front- and back-office OLTP; e-Business systems; External information providers; CRM Analytics & Reporting; Supply Chain Analytics & Reporting; Financial Analytics & Reporting; HR Analytics & Reporting; EPM Analytics & Reporting; EKP - Enterprise Knowledge Management Portal; Business information & recommendations; Informed decisions & actions]
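
Each architecture above routes source data through an extract, transform and load step before it reaches the warehouse. The following is a minimal sketch of that step, not the design of any tool named in the slides. The two source layouts (`orders`, `receivables`) and all field names are assumptions invented for illustration. It unifies field names across sources, drops a field the warehouse does not need, and appends each batch with a load date so that rows already in the warehouse are never changed.

```python
from datetime import date

# Hypothetical operational extracts; field names are assumptions for illustration.
orders = [{"cust_no": "C1", "amt": 120.0, "terminal": "T9"}]
receivables = [{"CustomerID": "C1", "amount_due": 35.5}]


def transform(record, mapping, source):
    """Rename source fields to warehouse names; unmapped fields are dropped."""
    row = {target: record[src] for src, target in mapping.items()}
    row["source"] = source
    return row


def load(warehouse, rows, load_date):
    """Append a batch stamped with a time element; existing rows are untouched."""
    for row in rows:
        warehouse.append({**row, "load_date": load_date})
    return warehouse


def run_etl(warehouse, load_date):
    """Extract from both sources, unify their names, and load one batch."""
    batch = [
        transform(r, {"cust_no": "customer_id", "amt": "amount"}, "orders")
        for r in orders
    ]
    batch += [
        transform(r, {"CustomerID": "customer_id", "amount_due": "amount"}, "receivables")
        for r in receivables
    ]
    return load(warehouse, batch, load_date)


warehouse = run_etl([], date(2001, 6, 7))
```

Keeping `load` append-only mirrors the non-volatile property, and the `load_date` column plays the role of the time element attached to each key.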

The Central Issue for a Data Warehouse: Acquiring, Storing and Using Data
[Diagram. Labels: Relational, Package, Legacy, External source; Data Clean Tool; Data Staging; Enterprise Data Warehouse; Datamart (RDBMS, ROLAP); Datamart (MDB); End-User Tools]
- The loading capacity of the warehouse and the marts is critical
- The query output capacity of the warehouse and the marts is critical

ETL Tools
- Remove unneeded data from the operational databases
- Unify the names and definitions of data during conversion
- Compute summary data and derived data
- Estimate default values for missing data
- Reconcile changes in the definitions of source data

ETL Tool Architecture

Metadata Repository and Metadata Management
- Categories of metadata: technical metadata; business metadata; data warehouse operational information. [Alex Berson etc, 1999]
- Technical metadata: information about warehouse data for warehouse designers and administrators, used to carry out warehouse development and management tasks. It includes:
  - Data source information
  - Transformation descriptions (the mapping from operational databases to the warehouse, and the algorithms used to transform data)
  - Warehouse object and data structure definitions for the target data
  - Rules for data cleansing and data enhancement
  - Data mapping operations
  - Access authorization, backup history, archive history, information delivery history, data acquisition history, data access, etc.
- Business metadata: information that is easy for users to understand. It includes:
  - Subject areas and information object types, including queries, reports, images, audio, video, etc.
  - Internet home pages
  - Other information that supports the warehouse; for an information delivery system, for example, subscription information, scheduling information, details of delivery destinations, business query objects, etc.
- Data warehouse operational information, for example:
  - Data history (snapshots, versions), ownership, extract audit trail, data usage
- Metadata repository and tools [Martin Stardt, 2000]

Data Access and Analysis Tools
- Reporting
- OLAP
- Data mining

Data Warehouse Design
- Top-down
- Bottom-up
- Hybrid approach
- Data warehouse modeling

Top-down Approach
- Build the enterprise data warehouse
  - Common central data model
  - Data re-engineering performed once
  - Minimize redundancy and inconsistency
  - Detailed and history data; global data discovery
- Build datamarts from the Enterprise Data Warehouse (EDW)
  - Subset of EDW relevant to department
  - Mostly summarized data
  - Direct dependency on EDW data availability
[Diagram. Labels: Operational Data; External Data; Enterprise Warehouse; Local Data Mart; Local Data Mart]

Bottom-up Design Approach
- Build departmental data marts
  - Scope limited to one subject area
  - Fast ROI: local business needs are met
  - Departmental autonomy: flexibility in design
  - A good guide for other departments' data marts
  - Easy to replicate to other departments
  - Data re-engineering is needed for each department
  - Some level of redundancy and inconsistency
  - A practical approach
- Expand to an enterprise data warehouse
  - Build the EDB as a long-term goal
[Diagram. Labels: Operational data (all); Operational data (local); Operational data (local); External data; Local data mart; Local data mart; Enterprise data warehouse EDB]

Data Warehouse Modeling: Star Schema
Example of Star Schema
- Sales Fact Table: Date, Product, Store, Customer; measurements: unit_sales, dollar_sales, Yen_sales
- Date: Date, Month, Year
- Cust: CustId, CustName, CustCity, CustCountry
- Product: ProductNo, ProdName, ProdDesc, Category, QOH
- Store: StoreID, City, State, Country, Region

Data Warehouse Modeling: Snowflake Schema
Example of Snowflake Schema
- Sales Fact Table: Date, Product, Store, Customer; measurements: unit_sales, dollar_sales, Yen_sales
- Date: Date, Month
  - Month: Month, Year
  - Year: Year
- Cust: CustId, CustName, CustCity, CustCountry
- Product: ProductNo, ProdName, ProdDesc, Category, QOH
- Store: StoreID, City
  - City: City, State
  - State: State, Country
  - Country: Country, Region

Operational (OLTP) Data Source: Sales Database

Star Schema
[Figure. Labels: time dimension; fact table]

Multidimensional Model
[Figure. Labels: facts; measures (metrics); time dimension; attributes of the time dimension]

Data Warehouse Technology [Inmon, 1996]
- Managing large amounts of data
  - The ability to manage large amounts of data
  - The ability to manage them well
- Managing multiple media (a hierarchy)
  - Main memory, expanded memory, cache, DASD, optical disk, microfiche
- Monitoring data
  - Decide whether data should be reorganized
  - Decide whether indexes are poorly structured
  - Decide whether too much data is in overflow
  - Determine the remaining available space
- Receiving and passing data through a variety of technologies
  - Batch mode; online mode is not very useful
- Programmer/designer control over where data is placed (block/page)
- Parallel storage and management of data
- Metadata management

Data Warehouse Technology [Inmon, 1996]
- Language interface
  - Able to access a set of data at a time
  - Able to access one record at a time
  - Supports one or more indexes
  - Has an SQL interface
- Efficient loading of data
- Efficient use of indexes
  - Bitmap methods, multilevel indexes, etc.
- Data compression
  - I/O resources are far scarcer than CPU resources, so decompression is not a major concern
- Compound keys (because data varies over time)
- Variable-length data
- Lock management (the programmer can explicitly control the lock manager)
- Index-only processing (some requests can be served by looking at the index alone)
- Fast restore

Data Warehouse Technology [Inmon, 1996]
- Other technological features
  - Traditional technology plays only a small role
  - Transaction integrity, high-speed buffering, row/page-level locking, referential integrity, data views
- Differences between a traditional DBMS and a data warehouse DBMS
  - Designed and optimized for data warehousing and decision support
  - Manages more data: 10 GB / 100 GB / TB
  - A traditional DBMS suits record-level update and provides Lock, Commit, CheckPoint, Log processing, DeadLock handling and Rollback
  - Basic data management, e.g. block management: a traditional DBMS has to reserve free space
  - Indexes: a traditional DBMS limits the number of indexes, a data warehouse DBMS does not
  - A general-purpose DBMS is physically optimized for transaction access and processing, a data warehouse for DSS access and analysis
- Changing DBMS technology
- Multidimensional DBMS and the data warehouse
  - The idea that a multidimensional DBMS should serve as the database technology for the warehouse is incorrect
  - A multidimensional DBMS (OLAP) is a technology; the data warehouse is an architectural foundation
- Dual levels of granularity (DASD/tape)

Data Warehouse Technology [Inmon, 1996]
- Metadata in the data warehouse environment
  - DSS analysts differ from IT professionals and need the help of metadata
  - Mapping between the operational environment and the warehouse environment requires metadata
  - The warehouse holds data spanning a long time, so metadata must record data structures and definitions
- Context and content (context dimension)
  - Simple contextual information (data structure / encoding / naming conventions / metrics)
  - Complex contextual information (product definitions / market territories / pricing / packaging / organizational structure)
  - External contextual information (economic forecasts: inflation, finance, taxation / political information / competitive information / technological advances)
- Refreshing the data warehouse
  - Data replication (triggers)
  - Change data capture (CDC) (logs)

Data Warehouse Performance [Inmon, 1999]
- Usage
- Data
- Platform
- Service management
Source: Data Warehouse Management (《數據倉庫管理》), translated by Wang Tianyou et al., Publishing House of Electronics Industry, May 2000.

Data Warehouse Applications: Survey of DW User Counts
- "DW systems with 100-500 users or more will make up the main share for some time to come."
- Survey of DW users: Meta Group Survey, most recent year; respondents: 3000+ users or prospective users

Survey of DW Data Volumes
- Survey of DW size: Meta Group Survey, most recent year; respondents: 3000+ users or prospective users

How Much?
- $3-6m for mid-size company, less if smaller, more if larger
- $10m+ for large organizations, large data sets
- 10-50+% annual maintenance costs
- 33% Hardware / 33% Software / 33% Services

How Long?
- 2-4 years for 80/20 of full system for mid-size company
- 6-12 months for initial iteration
- 3-6 months for subsequent iterations

How Risky?
- For EDW projects, 20% (Meta) to 70% (OTR, DWN) fail
- High failure rate for non-business-driven initiatives
- Very few systems meet the expectations of the business
- Failure not due to technology, due to "soft" issues
- Massive upside to successful projects (100% - 2000+% ROI)
- 99% politics - 1% technology
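
Returning to the star schema from the modeling slides, the sketch below shows how a fact table keyed by dimensions is typically queried. It uses Python's built-in `sqlite3` module. The measure and attribute names follow the slide's example (unit_sales, dollar_sales, Store with Region, Date with Year). The table names `sales_fact`, `date_dim` and `store_dim`, the `date_key` column, and all sample rows are assumptions made for illustration, and only two of the four dimensions are modeled to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two dimension tables and one fact table (hypothetical names, see lead-in).
conn.executescript("""
CREATE TABLE date_dim  (date_key TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT,
                        country TEXT, region TEXT);
CREATE TABLE sales_fact (
    date_key     TEXT    REFERENCES date_dim(date_key),
    store_id     INTEGER REFERENCES store_dim(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
""")

# Invented sample rows.
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", [
    ("2001-05-31", "2001-05", 2001),
    ("2001-06-07", "2001-06", 2001),
])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Shanghai", "Shanghai", "China", "Asia"),
    (2, "Seattle", "WA", "USA", "North America"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    ("2001-05-31", 1, 10, 100.0),
    ("2001-06-07", 1, 5, 50.0),
    ("2001-06-07", 2, 3, 45.0),
])


def sales_by_region_and_year(connection):
    """Join the fact table to its dimensions and roll the measures up."""
    return connection.execute("""
        SELECT s.region, d.year, SUM(f.unit_sales), SUM(f.dollar_sales)
        FROM sales_fact AS f
        JOIN date_dim  AS d ON f.date_key = d.date_key
        JOIN store_dim AS s ON f.store_id = s.store_id
        GROUP BY s.region, d.year
        ORDER BY s.region, d.year
    """).fetchall()


result = sales_by_region_and_year(conn)
```

In the snowflake variant the `store_dim` attributes would be split into separate City, State and Country tables, so the same roll-up to region would need additional joins.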

Overview of Data Mining Applications
- Data mining application overview
- Data mining techniques and trends
- Data mining application platform

Data Mining Application Overview
- Share of applications
- Data Mining Upsides
- Data Mining Downsides
- Data Mining Uses
- Data Mining Industry and Applications
- Data Mining Costs

Share of Applications

Data Mining Upsides
- Discovery of previously unknown relationships, trends, anomalies, etc.
- Powerful competitive weapon
- Automation of repetitive analysis
- Predictive capabilities

Data Mining Downsides
- Knowledge discovery technology immature
- Long learning and tuning cycles for some technologies
- "Black box" technology minimizes confidence
- VLDB (Very Large Data Base) requirements

Data Mining Uses
- Discover anomalies, outliers and exceptions in process data
- Discover behavior and predict outcomes of customer relationships
- Churn management
- Target marketing (market of one)
- Promotion management
- Fraud detection
- Pattern ID & matching (dark programs, science)

Data Mining Industry and Applications
- From research prototypes to data mining products, languages, and standards
  - IBM Intelligent Miner, SAS Enterprise Miner, SGI MineSet, Clementine, MS/SQLServer 2000, DBMiner, BlueMartini, MineIt, DigiMine, etc.
  - A few data mining languages and standards (esp. MS OLEDB for Data Mining)
- Application achievements in many domains
  - Market analysis, trend analysis, fraud detection, outlier analysis, Web mining, etc.

Data Mining Costs
- Desktop tools: $500 and up (MSFT coming at low price point)
- Server / MF based: $20,000 to $700,000+
- Must also add cost of extensive consulting for high end tools
- Don't forget long training and learning curve time
- Ongoing process, not task automation software

Data Mining Trends
- Historical review
- Confluence of multiple disciplines
- Classifying data mining from multiple perspectives
- Research progress in the last decade
- Trends in data mining
- Data mining and standardization

Historical Review
- 1989 IJCAI Workshop on Knowledge Discovery in Databases
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- 1998 ACM SIGKDD, SIGKDD'1999-2001 conferences, and SIGKDD Explorations
- More conferences on data mining
  - PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, DaWaK, SPIE-DM, etc.

Data Mining: Confluence of Multiple Disciplines
[Diagram. Labels: Data Mining; Database Technology; Statistics; Machine Learning (AI); Visualization; Information Science; Other Disciplines]

A Multi-Dimensional View of Data Mining

Research Progress in the Last Decade
- Multi-dimensional data analysis: Data warehouse and OLAP (on-line analytical processing)
- Association, correlation, and causality analysis
- Classification: scalability and new approaches
- Clustering and outlier analysis
- Sequential patterns and time-series analysis
- Similarity analysis: curves, trends, images, texts, etc.
- Text mining, Web mining and Weblog analysis
- Spatial, multimedia, scientific data analysis
- Data preprocessing and database compression
- Data visualization and visual data mining
- Many others, e.g., collaborative filtering

Research Directions [Han J. W., 2001]
- Web mining
- Towards integrated data mining environments and tools
- "Vertical" (or application-specific) data mining
- Invisible data mining
- Towards intelligent, efficient, and scalable data mining methods

Towards Integrated Data Mining Environments and Tools
- OLAP Mining: Integration of Data Warehousing and Data Mining
- Querying and Mining: An Integrated Information Analysis Environment
- Basic Mining Operations and Mining Query Optimization
- "Vertical" (or application-specific) data mining
- Invisible data mining

Querying and Mining: An Integrated Information Analysis Environment
- Data mining as a component of DBMS, data warehouse, or Web information system
  - Integrated information processing environment
  - MS/SQLServer-2000 (Analysis service)
  - IBM IntelligentMiner on DB2
  - SAS EnterpriseMiner: data warehousing + mining
- Query-based mining
  - Querying database/DW/Web knowledge
  - Efficiency and flexibility: preprocessing, on-line processing, optimization, integration, etc.

"Vertical" Data Mining
- Generic data mining tools? Too simple to match domain-specific, sophisticated applications
  - Expert knowledge and business logic represent many years of work in their own fields!
  - Data mining + business logic + domain experts
- A multi-dimensional view of data miners
  - Complexity of data: Web, sequence, spatial, multimedia, ...
  - Complexity of domains: DNA, astronomy, market, telecom, ...
- Domain-specific data mining tools
  - Provide concrete, killer solution to specific problems
  - Feedback to build more powerful tools

Invisible Data Mining
- Build mining functions into daily information services
  - Web search engine (link analysis, authoritative pages, user profiles), adaptive web sites, etc.
  - Improvement of query processing: history + data
  - Making service smart and efficient
- Benefits from/to data mining research
  - Data mining research has produced many scalable, efficient, novel mining solutions
  - Applications feed new challenge problems to research

Towards Intelligent Tools for Data Mining
- Integration paves the way to intelligent mining
- Smart interface brings intelligence
  - Easy to use, understand and manipulate
  - One picture may be worth 1,000 words
  - Visual and audio data mining
- Human-Centered Data Mining
- Towards self-tuning, self-managing, self-triggering data mining

Integrated Mining: A Booster for Intelligent Mining
- Integration paves the way to intelligent mining
  - Data mining integrates with DBMS, DW, WebDB, etc.
  - Integration inherits the power of up-to-date information technology: querying, MD analysis, similarity search, etc.
  - Mining can be viewed as querying database knowledge
  - Integration leads to standard interface/language, function/process standardization, utility, and reachability
- Efficiency and scalability bring intelligent mining to reality

Data Mining and Standardization
- CRISP-DM (CRoss-Industry Standard Process for Data Mining): process standardization
- XML: combined with data preprocessing
- SOAP (Simple Object Access Protocol): a standard for interoperation between databases and systems
- PMML: a standard for exchanging predictive models
- OLE DB for Data Mining: an API-based interface for data mining systems

Data Mining Application Platform
- Final project goals
- Research content (including system structure, layers, etc.)
- Technical approach and implementation methods
- Analysis of key technologies
- Deliverables and evaluation criteria

Final Project Goals (1)
Within one year: study data mining techniques, implement the main data mining algorithms, develop a data mining application platform that has independent intellectual property rights and is easy to extend and apply, and establish a well-defined, practical methodology for applying data mining.

Final Project Goals (2)
- The data mining techniques studied reach an internationally advanced level
- The main data mining algorithms are implemented, such as association rules, clustering and classification
- The data mining application platform developed has independent intellectual property rights and is easy to extend and apply
- The methodology established for data mining applications is well-defined and practical

Research Content
- Layered structure
  - Data mining + business logic + industry applications
- Extensible architecture
- Software structure
- Applications of the data mining platform
  - Industry-specific solutions
  - Secondary development by software vendors

Layered Structure: Data Mining + Business Logic + Industry Applications
- Data mining algorithm layer: association rules, sequential patterns, classification, clustering, neural networks, deviation analysis, ...
- Business logic layer: product recommendation, customer segmentation, customer churn, fraud detection, feature analysis, ...
- Industry application layer: gene (DNA) analysis, banking, insurance, telecom, securities, retail, ...
- Together these form the data mining application platform

Extensible Architecture
[Diagram. Labels: information systems (MIS, ERP, CRM, E_Business); data mining application platform; exploration data warehouse; data mining algorithm library; model library; component library (product recommendation, customer segmentation, customer churn, fraud detection, feature analysis, sequence analysis, ...); industry applications; knowledge; data mining application server; industry clients]

Software Structure
- A data warehouse for use by data mining
- ETL tools
- Data mining application server
- Management platform for the data mining application server
- Industry-specific analysis platform

Applications of the Data Mining Platform: Industry-Specific Solutions
[Diagram. Labels: information system; data source; industry-specific data mining application; model use]

Applications of the Data Mining Platform: Secondary Development by Software Vendors
[Diagram. Labels: information system; data source; software products (MIS, ERP, CRM, ...); model use; original software product; added data mining decision-support module]

Technical Approach and Implementation Methods
- Phase 1: data mining application server
  1. Understand and track research developments
  2. Business model research
  3. Data warehouse modeling
  4. Implementation of data mining algorithms
  5. Construction of the server framework
- Phase 2: application server management platform
  1. Visual model creation
  2. Server scheduling and listening
  3. Development of data extraction tools
  4. User-friendly interface
- Phase 3: industry applications
  1. Visual model display
  2. Application of model components
  3. Applications in specific industries
  4. Secondary development with components
  5. Friendly human-machine interface

Analysis of Key Technologies
- Implementing business models in the data warehouse
- Research on visualizing business models
- Embedding models smoothly into other applications (ERP, CRM)
- Development of ETL (extract, transform, load) tools
- The mapping between mining algorithms and business models
- Optimization of data mining algorithms

Any Questions?
Zhujianqiu@

Presentation complete. Thank you for watching!
