Data Mining: Chapter 2, Getting to Know Your Data

1. Data Objects

· A data set is made up of data objects.
· A data object represents an entity.
· Examples:
    · Sales database: customers, store items, sales
    · Medical database: patients, treatments
    · University database: students, professors, courses
· Data objects are also called samples, examples, instances, data points, objects, or tuples.
· Data objects are described by attributes.
· Database rows -> data objects; columns -> attributes.

Attributes

· Attribute (or dimension, feature, variable): a data field that represents one characteristic of a data object.
    · E.g., customer-ID, name, address
· Types:
    · Nominal
    · Binary
    · Ordinal
    · Numeric (quantitative)
        · Interval-scaled
        · Ratio-scaled

Attribute Types

· Nominal: categories, states, or "names of things"
    · Hair_color = {auburn, black, blond, brown, grey, red, white}
    · Marital status, occupation, ID numbers, zip codes
· Binary: a nominal attribute with only 2 states (0 and 1)
    · Symmetric binary: both outcomes are equally important
        · E.g., gender
    · Asymmetric binary: the outcomes are not equally important
        · E.g., a medical test (positive vs. negative)
        · Convention: assign 1 to the most important outcome (e.g., HIV positive)
· Ordinal: values have a meaningful order (ranking), but the magnitude of the difference between successive values is not known
    · Size = {small, medium, large}, grades, army rankings

Types of Numeric Attributes

· Quantity (integer or real-valued)
· Interval
    · Measured on a scale of equal-sized units
    · Values have order
    · E.g., temperature in C or F, calendar dates
    · No true zero-point
· Ratio
    · Has an inherent (true) zero-point
    · A value can be expressed as a multiple of another value in the same unit (e.g., 10 K is twice as high as 5 K)
    · E.g., temperature in Kelvin, length, counts, monetary quantities
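The attribute types above determine which summary operations make sense on a column. The following is a minimal Python sketch of that idea, not part of the original slides: the names `AttrType`, `central_tendency`, and `celsius_to_kelvin` are hypothetical, and the choice of mode for nominal and binary attributes, median for ordinal attributes, and mean for numeric attributes is an assumption based on common practice.

```python
from enum import Enum
from statistics import mean, mode


class AttrType(Enum):
    """Hypothetical labels for the attribute types listed in this chapter."""
    NOMINAL = "nominal"
    BINARY = "binary"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


def central_tendency(values, attr_type, order=None):
    """Return a summary value that is meaningful for the given attribute type.

    `order` lists the ordinal categories from lowest to highest and is
    only needed when `attr_type` is ORDINAL.
    """
    if attr_type in (AttrType.NOMINAL, AttrType.BINARY):
        # Categories have no order, so only the most frequent value is meaningful.
        return mode(values)
    if attr_type is AttrType.ORDINAL:
        # Values can be ranked, so a (lower) median is meaningful,
        # but differences between categories are not, so no mean.
        ranks = sorted(order.index(v) for v in values)
        return order[ranks[(len(ranks) - 1) // 2]]
    # Interval and ratio attributes are numeric, so the mean is meaningful.
    return mean(values)


def celsius_to_kelvin(c):
    """Convert an interval-scaled Celsius value to ratio-scaled Kelvin."""
    return c + 273.15


if __name__ == "__main__":
    print(central_tendency(["black", "brown", "black"], AttrType.NOMINAL))
    print(central_tendency(["small", "large", "medium"], AttrType.ORDINAL,
                           order=["small", "medium", "large"]))
    print(central_tendency([10.0, 20.0], AttrType.RATIO))
    # 10 C is not "twice as hot" as 5 C: Celsius has no true zero-point.
    # The ratio only becomes meaningful after converting to Kelvin.
    print(celsius_to_kelvin(10.0) / celsius_to_kelvin(5.0))
```

The last line illustrates the interval versus ratio distinction: dividing two Celsius readings gives a number with no physical meaning, while dividing the same readings in Kelvin gives a valid ratio because Kelvin has a true zero-point.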
