1.課程簡(jiǎn)介
(1)教師是Rachid Hamadi,名字發(fā)音是Ra’shid。
(2)課程主要是理論,少量編程,使用開源軟件PostgreSQL 9.3.3,這個(gè)軟件是engineer-based,市面常見的是web-based MySQL。
(3)共3個(gè)assignment:
– Assignment 1: ER Diagram / Relational Mapping (10%) (due Week 5)
– Assignment 2: SQL Queries and Functions (15%) (due Week 8)
– Assignment 3: Normalisation / Relational Algebra / Transaction (15%) (due Week 11)
database的一個(gè)核心是消除redundancy,一條數(shù)據(jù)不要重復(fù)儲(chǔ)存。database最常見的問題是如何同時(shí)處理數(shù)據(jù),比如,在一個(gè)賬戶下多人同時(shí)save和withdraw。
(4)9311的后續(xù)課程
? COMP9315: how to build relational DBMSs (write your own Oracle)
? COMP9318: techniques for data mining (discovering patterns in DB)
? COMP6714: information retrieval, web search (dealing with text data)
? COMP9319: web search and data compression (dealing searching compressed web data)
? COMP932(1|2|3): service-oriented computing, which relies on DB background
課件中在大標(biāo)題上經(jīng)常能看到“cont’d”,含義是continue
2.Data Modeling
(1)什么是database
最簡(jiǎn)單的定義是Database is a set of tables that data are related to each other. 換言之,無關(guān)聯(lián)的數(shù)據(jù)不應(yīng)該用數(shù)據(jù)庫處理。
Textbook的定義:a collection of related data
(2)database中的重要概念
1.Data:能描述所代表的對(duì)象特征
2.Relationships:定義數(shù)據(jù)間的關(guān)聯(lián)
3.Constraints:關(guān)于數(shù)據(jù)和關(guān)聯(lián)的限制,比如bank teller不能修改客戶的賬戶金額
4.Redundancy: 每條數(shù)據(jù)只在一處存儲(chǔ)
5.Data manipulation:常見declarative和procedural兩類,data manipulation是數(shù)據(jù)操作,比如篩選數(shù)據(jù)、插入、刪除。declarative programming描述對(duì)象的性質(zhì)而不是流程,主要在邏輯層面構(gòu)建管理數(shù)據(jù)庫,procedural programming起源于structured programming,基于程序調(diào)用(procedure call)或函數(shù)調(diào)用(function call)進(jìn)行流程控制。
6.Transactions:多條數(shù)據(jù)的同時(shí)處理,一條數(shù)據(jù)的多用戶處理,以及引發(fā)的數(shù)據(jù)集聚效應(yīng)
7.Concurrency:多用戶同時(shí)調(diào)用一個(gè)數(shù)據(jù)
8.Scale:big data被應(yīng)用的越來越多,如何構(gòu)建成規(guī)模的數(shù)據(jù)庫變成越來越重要的問題
(3)什么是data
Textbook的定義:Known facts that can be recorded and have explicit meaning. 這意味著數(shù)據(jù)應(yīng)該有數(shù)據(jù)類型和存儲(chǔ)方式等信息。
孤立存在的數(shù)據(jù)沒有什么價(jià)值,有關(guān)聯(lián)的數(shù)據(jù)形成確定的數(shù)據(jù)結(jié)構(gòu),再通過database management system(DBMS)管理就實(shí)現(xiàn)了數(shù)據(jù)的價(jià)值。
(4)什么是DBMS
Textbook的定義:a collection of programs that enables users to create and maintain a database.
(5)什么是database system
Textbook的定義:the database and DBMS together.
按照范圍來記憶,上文提到最小的單元是data,data關(guān)聯(lián)起來后形成database,構(gòu)建對(duì)database的管理形成DBMS,database和DBMS加和構(gòu)成了database system。即
data < database < DBMS < database system

(6)Database Users
1.Database Administrator(DBA):管理數(shù)據(jù)庫權(quán)限,協(xié)調(diào)服務(wù)用戶,維護(hù)數(shù)據(jù)庫的軟硬件,提升數(shù)據(jù)庫的效率
2.Databse Designer:定義數(shù)據(jù)結(jié)構(gòu),數(shù)據(jù)限制,數(shù)據(jù)傳輸,要直接對(duì)終端客戶的需求負(fù)責(zé)
3.End Users:數(shù)據(jù)查詢,數(shù)據(jù)更新等。其下又分為4類用戶--a. Casual,偶爾使用數(shù)據(jù)庫;b. Naive,占據(jù)絕大多數(shù)的end users,他們不需要了解數(shù)據(jù)庫的底層構(gòu)建,而是在頂層使用,例如之前提到的bank-teller;c. Sophisticated,這些用戶非常熟悉database,他們的工作和database密切相關(guān),例如商業(yè)分析師、科學(xué)家和工程師;d. Stand-alone,維護(hù)使用自己的數(shù)據(jù)庫,并不分享,例如個(gè)人通訊錄
(7)DBMS
課程核心是學(xué)會(huì)SQL-Structured Query Language,具體使用什么樣的DBMS不重要,一般Relational database system is more efficient and flexible than object-oriented database system.
注:PL-Procedural Language
1.Oracle:resource hungry, non-flexible
2.MySQL:開源,高效,web-based,但oracle控制了MySQL,未來可能解除開源的屬性
3.PostgreSQL:課程使用
4.MongoDB:開源,NoSQL database program

(8)數(shù)據(jù)庫開發(fā)流程
database application development:
1.analyse application requirements
2.develop a data model to meet these requirements
3.define operations (transactions) on this model
4.implement the data model as relational schema
5.implement transactions via SQL and PLs
6.construct a web interface to these transactions
分析需求-建立模型-定義操作-實(shí)現(xiàn)關(guān)系模型-實(shí)現(xiàn)操作-構(gòu)建交互界面
(9)數(shù)據(jù)庫語言
1.Request to DBMS(e.g. SQL):
data manipulation language, DML (data level);
data definition language, DDL (data structure constraints);
create and drop database, indexes, functions
2.Results/effects from DBMS requests
tuples or sets of tuples
changes to underlying data store
(10)Data Modeling
1.Data modeling的目標(biāo):describe information, relationships, constraints.
2.Data modelling is a design process, it converts requirements into a data model.
3.Data modeling類型:logical(抽象的概念設(shè)計(jì),比如Entity Relationship, Object Definition Language),physical(實(shí)用型,例如relational)。一般先design using abstract mode,再 map to physical model。

4.design ideas:從簡(jiǎn)單的信息入手,先找到對(duì)象然后分析關(guān)聯(lián),一般自然語言的名詞是數(shù)據(jù),而動(dòng)詞是關(guān)聯(lián),考慮所有可能的數(shù)據(jù)。沒有完美的design,design時(shí)要多考慮正確性correctness,完整性completeness,一致性consistency。
Example Gmail
最重要的是分析model的核心要素:information/data, relationships, constraints
Data Objects:
1.Users--username, password, security questions, backup email, phone, etc.
2.Emails--subject, body, from, to, cc, bcc, date-sent, status, etc.
3.Attachments--content-type, content, etc.
4.Sessions--user, expiry-time, location, etc.
5.Labels--name, color, etc.
Relationships:
Email-has-Label
Email-has-Attachment
Email-is a reply to-Email
3.ER Notation(以Textbook的Entity Relationship notation為準(zhǔn))
(1)ER Model
世界是由互相關(guān)聯(lián)的entities構(gòu)建起來的。
ER的三個(gè)核心要素是:
attributes屬性--data item describing a property of interest
entity實(shí)體--collection of attributes describing object of interest
relationship關(guān)聯(lián)--association between entities (objects)
(2)ER Diagrams
ER diagrams are a graphical tool for data modelling.
It is consist of Data, Relationships, Attributes, Connections among them.

圖中有3種一維幾何圖形(線段,加粗線段,箭頭)和3種二維幾何圖形(矩形,橢圓,菱形)
1.橢圓,代表attributes
2.矩形,代表data/entities
3.菱形,代表relationships
4.線段,代表participation中的”部分關(guān)系“
5.加粗線段,代表participation中的”完全關(guān)系“
6.箭頭,代表cardinality中的”1“;非箭頭,代表cardinality中的”多“
7.加橫線的橢圓,代表key attributes,一般是unique的
另外
1.entity/data通常用單數(shù)名詞表示一類概念
2.如果一個(gè)橢圓是可以依據(jù)其他橢圓計(jì)算得到的,則用虛線橢圓表示,比如年齡可以用出生日期計(jì)算。叫做derived attributes
3.如果一個(gè)橢圓包含多個(gè)橢圓,則用雙線橢圓表示,比如最喜歡的食物,其中包含多個(gè)attributes。叫做multivalued attributes
(3)entity sets
一種是extensional view,通過同類事物擴(kuò)展得到一個(gè)entity set;
另一種是intensional view,通過抽象事物得到一個(gè)class的entity set。
(4)attributs的keys
1.key/superkey指的是attributes中比較獨(dú)特可以代表object的一個(gè)/一組,從superkey中如果刪除一個(gè)attribute,往往還可以代表object;
2.candidate key值得也是attributes中比較獨(dú)特可以代表object的一個(gè)/一組,但是如果從candidate key中刪除一個(gè)attribute,不能再代表object;
3.primary key是由databse designer選擇的
注:ER圖中key用下劃線標(biāo)注。
(5)relational sets
Relationship被定義為an association among several entities;
Relationship set被定義為collection of relationships of the same type;
Degree = # entities involved in relationship (in ER model, ≥ 2);
Cardinality = # associated entities on each side of relationship;
Participation = must every entity be in the relationship





(6)Subclasses and Inheritance
1.A subclass of an entity set A is a set of entities:
? with all attributes of A, plus (usually) its own attributes
? that is involved in all of A's relationships, plus its own
2.Properties of subclasses:
? overlapping or disjoint
? total or partial
