Database Management System (Word version).doc
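Institution code: 01
Student ID: 040101086
Classification number:
Security classification:
Literature Translation
Overview of a Database Management System
School (department): School of Information Engineering
Major: Computer Science and Technology
Student name:
Supervisor:
April 15, 2008

Translation
Overview of a Database Management System
Hector Garcia-Molina, Jeff Ullman, Jennifer Widom

1.2 Overview of a Database Management System
Figure 1.1 gives an outline of a complete DBMS. Single boxes represent system components, and double boxes represent in-memory data structures. Solid lines show control and data flow, while dashed lines show data flow only. Because the diagram is complicated, we consider its details in several stages. First, at the top, there are two distinct sources of commands to the DBMS:
(1) Conventional users and application programs that request or modify data.
(2) The database administrator: the person or persons responsible for the structure, or schema, of the database.

1.2.1 Data-Definition Language Commands
The second kind of command is the simpler to process; its path begins at the upper right of Fig. 1.1. For example, the database administrator (DBA) of a university registrar's database might decide that there should be a table, or relation, with columns for a student, a course the student has taken, and the student's grade in that course. The DBA might also decide that the only allowable grades are A, B, C, D, and F. This structure and constraint information is all part of the database schema. In Fig. 1.1 it is entered by the DBA, who needs special authority to execute schema-altering commands, because these can have profound effects on the database. These schema-altering DDL commands (“DDL” stands for “data-definition language”) are parsed by the DDL processor and passed to the execution engine, which goes through the index/file/record manager to alter the metadata, that is, the schema information of the database.

1.2.2 Overview of Query Processing
The great majority of interactions with the DBMS follow the path on the left side of Fig. 1.1. A user or application program initiates some action that does not affect the schema of the database but may affect its content (if the action is a modification command) or extract data from it (if the action is a query). As noted in Section 1.1, the language in which these commands are expressed is called a data-manipulation language (DML), or, somewhat colloquially, a query language. Many data-manipulation languages are available, but SQL, mentioned in Example 1.1, is by far the most commonly used. DML statements are handled by two separate subsystems, as follows.

Answering the query
The query is parsed and optimized by the query compiler. The resulting query plan, that is, the sequence of actions the DBMS will perform to answer the query, is passed to the execution engine. The execution engine issues a sequence of requests for small pieces of data, typically records or tuples of a relation, to a resource manager that knows about the data files (which hold relations), the format and size of records in those files, and the index files, which help find elements of data files quickly.

The requests for data are translated into pages, and these requests are passed to the buffer manager. We discuss the role of the buffer manager in Section 1.2.3; briefly, its task is to bring appropriate portions of the data from secondary storage (normally disk), where the data is kept permanently, into main-memory buffers. Normally the page, or “disk block”, is the unit of transfer between buffers and disk.

The buffer manager communicates with the storage manager to get data from disk. The storage manager might use operating-system commands, but more typically the DBMS issues commands directly to the disk controller.

Transaction processing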
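Queries and other DML actions are grouped into transactions, which are units that must be executed atomically and in isolation from one another. Often each query or modification action is a transaction by itself. In addition, the execution of transactions must be durable: the effect of any completed transaction must be preserved even if the system fails right after the transaction completes. We divide the transaction processor into two major parts:
(1) A concurrency-control manager, or scheduler, responsible for assuring the atomicity and isolation of transactions.
(2) A logging and recovery manager, responsible for the durability of transactions.
We discuss these components further in Section 1.2.4.

1.2.3 Storage and Buffer Management
The data of a database normally resides in secondary storage; in today's computer systems, “secondary storage” generally means magnetic disk. However, to perform any useful operation on data, the data must be in main memory. The storage manager's job is to control the placement of data on disk and its movement between disk and main memory.

In a simple database system, the storage manager might be nothing more than the file system of the underlying operating system. For efficiency, however, a DBMS normally controls storage on the disk directly, at least under some circumstances. The storage manager keeps track of the location of files on the disk and, on request from the buffer manager, obtains the block or blocks containing a file. Recall that disks are generally divided into disk blocks, which are regions of contiguous storage containing a large number of bytes, perhaps 2^12 or 2^14 (about 4,000 to 16,000 bytes).

The buffer manager partitions the available main memory into buffers, which are page-sized regions into which disk blocks can be transferred. Thus every DBMS component that needs information from the disk interacts with the buffers and the buffer manager, either directly or through the execution engine. The kinds of information that various components may need include:
(1) Data: the contents of the database itself.
(2) Metadata: the database schema, which describes the structure of, and constraints on, the database.
(3) Statistics: information gathered and stored by the DBMS about data properties, such as the sizes of, and values in, various relations or other components of the database.
(4) Indexes: data structures that support efficient access to the data.
A more complete discussion of the buffer manager and its role appears in Section 15.7.

1.2.4 Transaction Processing
It is normal to group one or more database operations into a transaction, which is a unit of work that must be executed atomically and in apparent isolation from other transactions. In addition, a DBMS offers a guarantee of durability: the work of a completed transaction will never be lost. The transaction manager therefore accepts transaction commands from an application; these tell the transaction manager when transactions begin and end and what the application expects (some applications may not wish to require atomicity, for example). The transaction processor performs the following tasks:
(1) Logging: To assure durability, every change in the database is logged separately on disk. The log manager follows one of several policies designed to ensure that, no matter when a system failure or “crash” occurs, the recovery manager will be able to examine the log of changes and restore the database to some consistent state. The log manager initially writes the log in buffers and negotiates with the buffer manager to make sure that buffers are written to disk (where data can survive a crash) at appropriate times.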
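(2) Concurrency control: Transactions must appear to execute in isolation. In most systems, however, many transactions are in fact executing at once. The scheduler (concurrency-control manager) must therefore ensure that the individual actions of multiple transactions are executed in an order whose net effect is the same as if the transactions had executed in their entirety, one at a time. A typical scheduler does its work by maintaining locks on certain pieces of the database. These locks prevent two transactions from accessing the same piece of data in ways that interact badly. Locks are generally stored in a main-memory lock table, as suggested by Fig. 1.1. The scheduler affects the execution of queries and other database operations by forbidding the execution engine from accessing locked parts of the database.
(3) Deadlock resolution: As transactions compete for resources through the locks that the scheduler grants, they can get into a situation where none can proceed, because each needs something another transaction holds. The transaction manager is responsible for intervening and canceling (“rolling back” or “aborting”) one or more transactions so that the others can proceed.

1.2.5 The Query Processor
The part of the DBMS that most affects the performance the user sees is the query processor. In Fig. 1.1 the query processor is represented by two components:
1. The query compiler, which translates the query into an internal form called a query plan. The latter is a sequence of operations to be performed on the data. Often the operations in a query plan are implementations of “relational algebra” operations, which are discussed in Section 5.2. The query compiler consists of three major units:
(a) A query parser, which builds a tree structure from the textual form of the query.
(b) A query preprocessor, which performs semantic checks on the query (for example, making sure that all relations mentioned by the query actually exist) and transforms the parse tree into a tree of algebraic operators representing the initial query plan.
(c) A query optimizer, which transforms the initial query plan into the best available sequence of operations on the actual data. The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be fastest. For example, the existence of an index, a specialized data structure that facilitates access to data given values for one or more components of that data, can make one plan much faster than another.
2. The execution engine, which is responsible for executing each step of the chosen query plan. The execution engine interacts with most of the other DBMS components, either directly or through the buffers. It must bring data from the database into buffers in order to manipulate it. It must interact with the scheduler to avoid accessing locked data, and with the log manager to make sure that all database changes are properly logged.

1.3 Outline of Database-System Studies
Ideas related to database systems can be divided into three broad categories:
(1) Database design. How does one develop a useful database? What kinds of information go into the database? How is the information structured? What assumptions are made about the types or values of data items? How do data items connect?
(2) Database programming. How does one express queries and other operations on the database? How does one use other DBMS capabilities, such as transactions or constraints, in an application? How is database programming combined with conventional programming?
(3) Database system implementation. How does one build a DBMS, including such matters as query processing, transaction processing, and organizing storage for efficient access?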
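The deadlock-resolution task in item (3) of Section 1.2.4 presupposes some way of noticing that transactions are stuck. One common technique, which this overview does not spell out, is a wait-for graph: an edge from T to U means T is waiting for a lock that U holds, and a cycle means every transaction on it is blocked forever. The Python sketch below assumes the scheduler can report these waits as a dictionary; the functions `find_cycle` and `choose_victim` and the youngest-transaction victim policy are hypothetical illustrations, not the book's algorithm.

```python
from typing import Dict, List, Optional, Set


def find_cycle(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of transaction ids, or None.

    An edge T -> U means transaction T is waiting for a lock that U holds.
    A cycle means none of the transactions on it can ever proceed: a deadlock.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    for t, targets in waits_for.items():
        color[t] = WHITE
        for u in targets:
            color[u] = WHITE
    path: List[str] = []

    def dfs(t: str) -> Optional[List[str]]:
        color[t] = GREY
        path.append(t)
        for u in sorted(waits_for.get(t, set())):
            if color[u] == GREY:  # back edge: u is already on the current path
                return path[path.index(u):]
            if color[u] == WHITE:
                cycle = dfs(u)
                if cycle is not None:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in sorted(color):
        if color[t] == WHITE:
            cycle = dfs(t)
            if cycle is not None:
                return cycle
    return None


def choose_victim(cycle: List[str], start_time: Dict[str, int]) -> str:
    """Hypothetical policy: roll back the youngest transaction on the cycle."""
    return max(cycle, key=lambda t: start_time[t])


# Usage: T1 waits for T2, T2 for T3, T3 for T1; T4 merely waits for T1.
waits = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
cycle = find_cycle(waits)
victim = choose_victim(cycle, {"T1": 1, "T2": 2, "T3": 3, "T4": 4})
print(cycle, victim)  # ['T1', 'T2', 'T3'] T3

# After the victim is aborted, its edges disappear and the others can proceed.
remaining = {t: us - {victim} for t, us in waits.items() if t != victim}
print(find_cycle(remaining))  # None
```

Aborting the youngest transaction is only one possible policy, chosen here on the assumption that it has done the least work; a real transaction manager may weigh other factors.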
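1.3.1 Database Design
Chapter 2 begins with a high-level notation for expressing database designs, called the entity-relationship model. In Chapter 3 we introduce the relational model, which is the model used by the most widely adopted DBMSs and which we touched on briefly in Section 1.1.2. We show how to translate entity-relationship designs into relational designs, or “relational database schemas”. Later, in Section 6.6, we show how to render relational database schemas formally in the data-definition portion of the SQL language.

Chapter 3 also introduces the notion of “dependencies”, which are formally stated assumptions about relationships among tuples in a relation. Dependencies allow us to improve relational database designs through a process known as “normalization” of relations.

In Chapter 4 we look at object-oriented approaches to database design. There we cover the language ODL, which allows one to describe databases in a high-level, object-oriented fashion. We also look at ways in which object-oriented design has been combined with relational modeling to yield the so-called “object-relational” model. Finally, Chapter 4 introduces “semistructured data” as an especially flexible database model, whose modern embodiment we see in the document language XML.

1.3.2 Database Programming
Chapters 5 through 10 cover database programming. Chapter 5 starts with an abstract treatment of queries in the relational model, introducing the family of operators on relations that form “relational algebra”.

Chapter 6 introduces basic ideas regarding queries in SQL and the expression of database schemas in SQL. Chapter 7 covers aspects of SQL concerning constraints and triggers on the data.

Chapter 8 covers certain advanced aspects of SQL programming. First, while the simplest model of SQL programming is a stand-alone, generic query interface, in practice most SQL programming is embedded in a larger program written in a conventional language such as C. In Chapter 8 we learn how to connect SQL statements with a surrounding program and how to pass data from the database to the program's variables and vice versa. The chapter also covers how to use SQL features that specify transactions, connect clients to servers, and authorize access to the database by users who are not its owners.

In Chapter 9 we turn our attention to standards for object-oriented database programming. Here we consider two directions. First, OQL (Object Query Language) can be seen as an attempt to make C++, or other object-oriented programming languages, compatible with the demands of high-level database programming. Second, the object-oriented features recently adopted in the SQL standard can be seen as an attempt to make relational databases and SQL compatible with object-oriented programming.

Finally, in Chapter 10 we return to the study of abstract query languages that we began in Chapter 5. Here we study logic-based languages and see how they have been used to extend the capabilities of modern SQL.

1.3.3 Database System Implementation
The third part of the book concentrates on how to implement a DBMS. The subject of database system implementation can be divided roughly into three parts:
(1) Storage management: how secondary storage is used effectively to hold data and allow it to be accessed quickly.
(2) Query processing: how queries expressed in a very high-level language such as SQL can be executed efficiently.
(3) Transaction management: how to support transactions with the ACID properties mentioned in Section 1.2.4.
Each of these topics is covered by several chapters of the book.

Storage-management overview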
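Chapter 11 introduces storage. Because secondary storage, especially disk, is central to the way a DBMS manages data, we examine in detail how data is stored and accessed on disk. We introduce the “block model” for disk-based data, which influences almost everything done in a database system.

Chapter 12 covers the storage of data elements (relations, tuples, attribute values, and their equivalents in other data models) in terms of the block model. We then look at the important data structures used to build indexes. An index is a data structure that supports efficient access to data.

Chapter 13 covers the important one-dimensional index structures: indexed-sequential files, B-trees, and hash tables. These indexes are commonly used in a DBMS to support queries in which a value for an attribute is given and the tuples with that value are wanted. B-trees are also used to access a relation sorted by a given attribute.

Chapter 14 discusses multidimensional indexes, which are data structures for specialized applications such as geographic databases, where queries typically ask for the contents of some region. These index structures can also support complex SQL queries that limit the values of two or more attributes, and some of these structures are beginning to appear in commercial DBMSs.

Query-processing overview
Chapter 15 covers the basics of query execution. We learn a number of algorithms for efficient implementation of the operations of relational algebra. These algorithms are designed to be efficient when data is stored on disk, and in some cases they are quite different from their main-memory counterparts.

In Chapter 16 we consider the architecture of the query compiler and optimizer. We begin with the parsing of queries and their semantic checking. Next, we consider the conversion of queries from SQL to relational algebra and the selection of a logical query plan, that is, an algebraic expression that represents the particular operations to be performed on the data and the necessary constraints on the order of those operations. Finally, we explore the selection of a physical query plan, in which the particular order of operations and the algorithm used to implement each operation have been specified.

Transaction-processing overview
In Chapter 17 we see how a DBMS supports the durability of transactions. The central idea is that a log of all changes to the database is kept. Anything that is in main memory but not on disk can be lost in a crash (for example, if the power supply is interrupted). Therefore we must be careful to move data from buffers to disk in the proper order, both for changes to the database itself and for changes to the log. Several logging strategies are available, but each limits our freedom of action in some way.

Then, in Chapter 18, we take up concurrency control, which assures atomicity and isolation. We view transactions as sequences of operations that read or write database elements. The major topic of the chapter is how to manage locks on database elements: the different types of locks that may be used and the ways that transactions may acquire and release them. The chapter also studies a number of ways to assure atomicity and isolation without using locks.

Chapter 19 concludes our study of transaction processing. We consider the interaction between the requirements of logging, discussed in Chapter 17, and the requirements of concurrency, discussed in Chapter 18. Handling deadlocks, another important function of the transaction manager, is covered here as well. The extension of concurrency control to a distributed environment is also considered in Chapter 19.

Finally, we consider the possibility that transactions are “long”, taking hours or days rather than milliseconds. A long transaction cannot lock data without causing chaos, since other users may need that data, so long transactions force us to rethink concurrency control for the applications that involve them.

1.3.4 Information-integration overview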
Much of the recent evolution of database systems has been toward capabilities that allow different data sources, which may be databases or information resources not managed by a DBMS, to work together in a larger whole. We introduced these issues briefly in Section 1.1.7. We discuss the principal modes of integration, including translated and integrated copies of sources, called a “data warehouse”, and virtual “views” of a collection of sources, provided through what is called a “mediator”.

Excerpted from: Hector Garcia-Molina, Jeff Ullman, Jennifer Widom. The Worlds of Database Systems.

Appendix: English original

Overview of a Database Management System
Hector Garcia-Molina, Jeff Ullman, Jennifer Widom

1.2 Overview of a Database Management System
In Fig. 1.1 we see an outline of a complete DBMS. Single boxes represent system components, while double boxes represent in-memory data structures. The solid lines indicate control and data flow, while dashed lines indicate data flow only. Since the diagram is complicated, we shall consider the details in several stages. First, at the top, we suggest that there are two distinct sources of commands to the DBMS:
1. Conventional users and application programs that ask for data or modify data.
2. A database administrator: a person or persons responsible for the structure or schema of the database.

1.2.1 Data-Definition Language Commands
The second kind of command is the simpler to process, and we show its trail beginning at the upper right side of Fig. 1.1. For example, the database administrator, or DBA, for a university registrar's database might decide that there should be a table or relation with columns for a student, a course the student has taken, and a grade for that student in that course. The DBA might also decide that the only allowable grades are A, B, C, D, and F. This structure and constraint information is all part of the schema of the database. It is shown in Fig. 1.1 as entered by the DBA, who needs special authority to execute schema-altering commands, since these can have profound effects on the database.
These schema-altering DDL commands (“DDL” stands for “data-definition language”) are parsed by a DDL processor and passed to the execution engine, which then goes through the index/file/record manager to alter the metadata, that is, the schema information for the database.

1.2.2 Overview of Query Processing
The great majority of interactions with the DBMS follow the path on the left side of Fig. 1.1. A user or an application program initiates some action that does not affect the schema of the database, but may affect the content of the database (if the action is a modification command) or will extract data from the database. Remember from Section 1.1 that the language in which these commands are expressed is called a data-manipulation language (DML) or somewhat colloquially a query language. There are many data-manipulation languages available, but SQL, which was mentioned in Example 1.1, is by far the most commonly used. DML statements are handled by two separate subsystems, as follows.

Answering the query
The query is parsed and optimized by a query compiler. The resulting query plan, or sequence of actions the DBMS will perform to answer the query, is passed to the execution engine. The execution engine issues a sequence of requests for small pieces of data, typically records or tuples of a relation, to a resource manager that knows about data files (holding relations), the format and size of records in those files, and index files, which help find elements of data files quickly.

The requests for data are translated into pages and these requests are passed to the buffer manager. We shall discuss the role of the buffer manager in Section 1.2.3, but briefly, its task is to bring appropriate portions of the data from secondary storage (disk, normally) where it is kept permanently, to main memory buffers. Normally, the page or “disk block” is the unit of transfer between buffers and disk.

The buffer manager communicates with a storage manager to get data from disk.
The storage manager might involve operating-system commands, but more typically, the DBMS issues commands directly to the disk controller.

Transaction processing
Queries and other DML actions are grouped into transactions, which are units that must be executed atomically and in isolation from one another. Often each query or modification action is a transaction by itself. In addition, the execution of transactions must be durable, meaning that the effect of any completed transaction must be preserved even if the system fails in some way right after completion of the transaction. We divide the transaction processor into two major parts:
1. A concurrency-control manager, or scheduler, responsible for assuring atomicity and isolation of transactions, and
2. A logging and recovery manager, responsible for the durability of transactions.
We shall consider these components further in Section 1.2.4.

1.2.3 Storage and Buffer Management
The data of a database normally resides in secondary storage; in today's computer systems “secondary storage” generally means magnetic disk. However, to perform any useful operation on data, that data must be in main memory. It is the job of the storage manager to control the placement of data on disk and its movement between disk and main memory.

In a simple database system, the storage manager might be nothing more than the file system of the underlying operating system. However, for efficiency purposes, DBMSs normally control storage on the disk directly, at least under some circumstances. The storage manager keeps track of the location of files on the disk and obtains the block or blocks containing a file on request from the buffer manager. Recall that disks are generally divided into disk blocks, which are regions of contiguous storage containing a large number of bytes, perhaps 2^12 or 2^14 (about 4000 to 16,000 bytes).
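The path described in Sections 1.2.2 and 1.2.3, where requests for records are translated into page requests that the buffer manager satisfies either from a main-memory buffer or by asking the storage manager to read a block, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: fixed-length records packed into 2^12 = 4,096-byte blocks with no block header, a dictionary standing in for the disk and storage manager, and least-recently-used (LRU) replacement, a policy the text does not prescribe. The names `block_of` and `BufferManager` are hypothetical. A real buffer manager must also pin pages in use and write modified (“dirty”) buffers back to disk before evicting them, which this read-only toy omits.

```python
from collections import OrderedDict
from typing import Callable

BLOCK_SIZE = 2 ** 12  # 4,096 bytes, one of the typical block sizes the text gives


def block_of(record_no: int, record_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Translate a request for record number record_no into the number of the block holding it.

    Assumes fixed-length records packed into blocks, with no block header and
    no record spanning two blocks.
    """
    records_per_block = block_size // record_size
    return record_no // records_per_block


class BufferManager:
    """Toy buffer pool: a fixed number of block-sized buffers with LRU replacement."""

    def __init__(self, num_buffers: int, read_block: Callable[[int], bytes]) -> None:
        self.num_buffers = num_buffers
        self.read_block = read_block  # stands in for the storage manager
        self.buffers = OrderedDict()  # block number -> contents, least recently used first
        self.disk_reads = 0

    def get_block(self, block_no: int) -> bytes:
        if block_no in self.buffers:  # hit: no disk I/O needed
            self.buffers.move_to_end(block_no)
            return self.buffers[block_no]
        if len(self.buffers) >= self.num_buffers:
            self.buffers.popitem(last=False)  # evict the least recently used block
        data = self.read_block(block_no)  # miss: ask the storage manager for the block
        self.disk_reads += 1
        self.buffers[block_no] = data
        return data


# Usage: 100-byte records, so 40 records fit in each 4,096-byte block.
fake_disk = {n: bytes(BLOCK_SIZE) for n in range(10)}
bm = BufferManager(num_buffers=2, read_block=lambda n: fake_disk[n])
for record_no in [0, 1, 40, 2, 100]:  # blocks 0, 0, 1, 0, 2
    bm.get_block(block_of(record_no, record_size=100))
print(bm.disk_reads, list(bm.buffers))  # 3 [0, 2]
```

Five record requests cost only three block reads here, because records 0, 1 and 2 share block 0, which stays in a buffer; that saving is the reason the page, not the record, is the unit of transfer.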
The buffer manager is responsible for partitioning the available main memory into buffers, which are page-sized regions into which disk blocks can be transferred. Thus, all DBMS components that need information from the disk will interact with the buffers and the buffer manager, either directly or through the execution engine. The kinds of information that various components may need include:
1. Data: the contents of the database itself.
2. Metadata: the database schema that describes the structure of the database.
3. Statistics: information gathered and stored by the DBMS about data properties such as the sizes of, and values in, various relations or other components of the database.
4. Indexes: data structures that support efficient access to the data.
A more complete discussion of the buffer manager and its role appears in Section 15.7.

1.2.4 Transaction Processing
It is normal to group one or more database operations into a transaction, which is a unit of work that must be executed atomically and in apparent isolation from other transactions. In addition, a DBMS offers the guarantee of durability: that the work of a completed transaction will never be lost. The transaction manager therefore accepts transaction commands from an application, which tell the transaction manager when transactions begin and end, as well as information about the expectations of the application (some may not wish to require atomicity, for example). The transaction processor performs the following tasks:
1. Logging: In order to assure durability, every change in the database is logged separately on disk. The log manager follows one of several policies designed to assure that no matter when a system failure or “crash” occurs, a recovery manager will be able to examine the log of changes and restore the database to some consistent state.
The log manager initially writes the log in buffers and negotiates with the buffer manager to make sure that buffers are written to disk (where data can survive a crash) at appropriate times.
2. Concurrency control: Transactions must appear to execute in isolation. But in most systems, there will in truth be many transactions executing at once. Thus, the scheduler (concurrency-control manager) must assure that the individual actions of multiple transactions are executed in such an order that the net effect is the same as if the transactions had in fact executed in their entirety, one-at-a-time. A typical scheduler does its work by maintaining locks on certain pieces of the database. These locks prevent two transactions from accessing the same piece of data in ways that interact badly. Locks are generally stored in a main-memory lock table, as suggested by Fig. 1.1. The scheduler affects the execution of queries and other database operations by forbidding the execution engine from accessing locked parts of the database.
3. Deadlock resolution: As transactions compete for resources through the locks that the scheduler grants, they can get into a situation where none can proceed because each needs something another transaction has. The transaction manager has the responsibility to intervene and cancel (“roll-back” or “abort”) one or more transactions to let the others proceed.

1.2.5 The Query Processor
The portion of the DBMS that most affects the performance that the user sees is the query processor. In Fig. 1.1 the query processor is represented by two components:
1. The query compiler, which translates the query into an internal form called a query plan. The latter is a sequence of operations to be performed on the data. Often the operations in a query plan are implementations of “relational algebra” operations, which are discussed in Section 5.2.
The query compiler consists of three major units:
(a) A query parser, which builds a tree structure from the textual form of the query.
(b) A query preprocessor, which performs semantic checks on the query (e.g., making sure all relations mentioned by the query actually exist), and performs some tree transformations to turn the parse tree into a tree of algebraic operators representing the initial query plan.
(c) A query optimizer, which transforms the initial query plan into the best available sequence of operations on the actual data.
The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be the fastest. For example, the existence of an index, which is a specialized data structure that facilitates access to data, given values for one or more components of that data, can make one plan much faster than another.
2. The execution engine, which has the responsibility for executing each of the steps in the chosen query plan. The execution engine interacts with most of the other components of the DBMS, either directly or through the buffers. It must get the data from the database into buffers in order to manipulate that data. It needs to interact with the scheduler to avoid accessing data that is locked, and with the log manager to make sure that all database changes are properly logged.

1.3 Outline of Database-System Studies
Ideas related to database systems can be divided into three broad categories:
1. Design of databases. How does one develop a useful database? What kinds of information go into the database? How is the information structured? What assumptions are made about types or values of data items? How do data items connect?
2. Database programming. How does one express queries and other operations on the database? How does one use other capabilities of a DBMS, such as transactions or constraints, in an application? How is database programming combined with conventional programming?
3. Database system implementation. How does one build a DBMS, including such matters as query processing, transaction processing and organizing storage for efficient access?

1.3.1 Database Design
Chapter 2 begins with a high-level notation for expressing database designs, called the entity-relationship model. We introduce in Chapter 3 the relational model, which is the model used by the most widely adopted DBMSs, and which we touched upon briefly in Section 1.1.2. We show how to translate entity-relationship designs into relational designs, or “relational database schemas”. Later, in Section 6.6, we show how to render relational database schemas formally in the data-definition portion of the SQL language.

Chapter 3 also introduces the reader to the notion of “dependencies”, which are formally stated assumptions about relationships among tuples in a relation. Dependencies allow us to improve relational database designs, through a process known as “normalization” of relations.

In Chapter 4 we look at object-oriented approaches to database design. There, we cover the language ODL, which allows one to describe databases in a high-level, object-oriented fashion. We also look at ways in which object-oriented design has been combined with relational modeling, to yield the so-called “object-relational” model. Finally, Chapter 4 also introduces “semistructured data” as an especially flexible database model, and we see its modern embodiment in the document language XML.

1.3.2 Database Programming
Chapters 5 through 10 cover database programming. We start in Chapter 5 with an abstract treatment of queries in the relational model, introducing the family of operators on relations that form “relational algebra”.

Chapters 6 through 8 are devoted to SQL programming. As we mentioned, SQL is the dominant query language of the day.
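To make concrete the idea that a query plan is a tree of relational-algebra operators (Section 1.2.5, and the Chapter 5 material just mentioned), here is a Python sketch of three operators: selection, projection, and natural join. Relations are represented as lists of dictionaries, an assumption made purely for illustration; real engines implement these operators over disk blocks, which is the subject of Chapter 15. The relations `students` and `enrolled` are hypothetical, modeled on the registrar example of Section 1.2.1, and the plan evaluated at the end is π_name(σ_grade='A'(Students ⋈ Enrolled)).

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]  # one tuple: attribute name -> value
Relation = List[Row]  # a relation: a list of tuples with the same attributes


def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Selection (sigma): keep only the tuples that satisfy the condition."""
    return [t for t in rel if pred(t)]


def project(rel: Relation, attrs: Iterable[str]) -> Relation:
    """Projection (pi): keep the named attributes and drop duplicate tuples (set semantics)."""
    attrs = list(attrs)
    seen = set()
    out: Relation = []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r: Relation, s: Relation) -> Relation:
    """Natural join: combine tuples that agree on every attribute the relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # assumes all tuples of a relation share one schema
    return [{**t, **u} for t in r for u in s if all(t[a] == u[a] for a in common)]


# Hypothetical relations modeled on the registrar example of Section 1.2.1.
students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enrolled = [
    {"sid": 1, "course": "CS101", "grade": "A"},
    {"sid": 2, "course": "CS101", "grade": "B"},
    {"sid": 1, "course": "CS245", "grade": "A"},
]

# Plan tree for pi_name(sigma_{grade='A'}(Students JOIN Enrolled)), evaluated bottom-up.
plan_result = project(select(natural_join(students, enrolled), lambda t: t["grade"] == "A"), ["name"])
print(plan_result)  # [{'name': 'Ann'}]
```

An optimizer would be free to rewrite this tree, for instance by applying the selection to `enrolled` before the join, since both orders yield the same answer and the rewritten plan joins fewer tuples.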
Chapter 6 introduces basic ideas regarding queries in SQL and the expression of database schemas in SQL. Chapter 7 covers aspects of SQL concerning constraints and triggers on the data.

Chapter 8 covers certain advanced aspects of SQL programming. First, while the simplest model of SQL programming is a stand-alone, generic query interface, in practice most SQL programming is embedded in a larger program that is written in a conventional language, such as C. In Chapter 8 we learn how to connect SQL statements with a surrounding program and to pass data from the database to the program's variables and vice versa. This chapter also covers how one uses SQL features th-
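Chapter 8's theme of connecting SQL statements with a surrounding program, and passing data between the database and program variables, can be illustrated with Python's standard-library `sqlite3` module rather than the C embedding the text mentions. This is an illustrative stand-in under that assumption, not the book's embedded-SQL interface. The table reuses the registrar schema and the A-F grade constraint from Section 1.2.1, so the same example also shows a DDL command whose constraint the DBMS then enforces; the names `Enrolled`, `record_grade` and `grades_of` are hypothetical.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL for the registrar example of Section 1.2.1; the CHECK clause is the
# "only grades A, B, C, D, and F" constraint, stored as part of the schema.
conn.execute(
    """
    CREATE TABLE Enrolled (
        student TEXT NOT NULL,
        course  TEXT NOT NULL,
        grade   TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F')),
        PRIMARY KEY (student, course)
    )
    """
)


def record_grade(student: str, course: str, grade: str) -> bool:
    """Pass program variables into SQL through ? placeholders; False if the DBMS rejects the row."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", (student, course, grade))
        return True
    except sqlite3.IntegrityError:  # e.g. CHECK or PRIMARY KEY violation
        return False


def grades_of(student: str) -> List[Tuple[str, str]]:
    """Copy query results back into program variables, one Python tuple per row."""
    cur = conn.execute(
        "SELECT course, grade FROM Enrolled WHERE student = ? ORDER BY course", (student,)
    )
    return cur.fetchall()


print(record_grade("Ann", "CS101", "A"))  # True
print(record_grade("Ann", "CS245", "E"))  # False: 'E' violates the CHECK constraint
print(grades_of("Ann"))  # [('CS101', 'A')]
```

Binding values through `?` placeholders, instead of pasting them into the SQL string, keeps program data and SQL text separate, which plays roughly the role that host variables play in embedded SQL.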