Chapter 10 Functional Dependencies and Normalization for Relational Databases

BCNF (Boyce-Codd Normal Form) Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF There exist relations that are in 3NF but not in BCNF The goal is to have each relation in BCNF (or 3NF)

pdf62 trang | Chia sẻ: vutrong32 | Lượt xem: 1195 | Lượt tải: 0download
Bạn đang xem trước 20 trang tài liệu Chapter 10 Functional Dependencies and Normalization for Relational Databases, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
Functional Dependencies and Normalization for Relational Databases CHAPTER 10 Informal Design Guidelines for Relational Databases Semantics of the Relation Attributes Reducing the redundant values in tuples Reducing the null values in tuples Disallowing the possibility of generating spurious tuples Semantics of the Relation Attributes GUIDELINE 1:  Each tuple in a relation should represent one entity or relationship instance. Attributes of different entities should not be mixed in the same relation. Only foreign keys should be used to refer to other entities  Entity and relationship attributes should be kept apart as much as possible. Semantics of the Relation Attributes Example: Redundant Information in Tuples and Update Anomalies Mixing attributes of multiple entities may cause problems Information is stored redundantly wasting storage Problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies Redundant Information in Tuples and Update Anomalies Redundant Information in Tuples and Update Anomalies Consider the relation: EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours) Update Anomaly: Changing the name of project number P1 from “Billing” to “Customer- Accounting” may cause this update to be made for all 100 employees working on project P1. Insert Anomaly: Cannot insert a project unless an employee is assigned to . Inversely - Cannot insert an employee unless an he/she is assigned to a project Redundant Information in Tuples and Update Anomalies Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project Redundant Information in Tuples and Update Anomalies GUIDELINE 2: Guideline to Redundant Information in Tuples and Update Anomalies Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly. Null Values in Tuples GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible Attributes that are NULL frequently could be placed in separate relations  Reasons for nulls: Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist, but unavailable Spurious Tuples Bad designs for a relational database may result in erroneous results for certain JOIN operations The "lossless join" property is used to guarantee meaningful results for join operations GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations. Functional dependencies Consider the following relation schema {Plane}→ StartTime {Pilot, StartDate, StartTime}→ Plane {Plane, StartDate}→ Pilot Functional dependencies A functional dependency is a constraint between two sets of attributes from the database. Suppose that relational database schema has n attributes R{A1, A2, An}, X and Y are subset of R. Definition. A functional dependency between two sets of attributes X and Y: XY, specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X]=t2[X] t1[Y] = t2[Y] . Functional dependencies Example: Functional dependencies We say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side. X →Y ⇔ (t1.X = t2.X ⇒ t1.Y = t2.Y) Functional dependencies Example: Social security number determines employee name: SSN -> ENAME Project number determines project name and location PNUMBER -> {PNAME, PLOCATION} Employee SSN and project number determines the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS Functional dependencies  If a constraint on R states that there cannot be more than one tuple with a given X value in any relation instance r(R), that is, X is a candidate key of R, this implies that XY for any subset of attributes Y of R If XY in R, this does not say whether or not YX in R If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K]) Functional dependencies(FDs) A functional dependency is a property of the semantics or meaning of the attributes. Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states of R. A functional dependency is a property of the relation schema R, not of a particular legal relation state r of R. Hence, an FD cannot be inferred automatically from a given relation extension r but must be defined explicitly by someone who knows the semantics of the attributes of R Inference Rules for FDs Given a set of FDs F on relation schema R. Any other functional dependencies hold in all legal relation instances that satisfy the dependencies in F can be said to be inferred or deduced from the FDs in F. The following six inference rules for functional dependencies (Armstrong's inference rules) IR1 (reflexive : phản xạ): If X Y, then XY. IR2. (Augmentation) If X Y, then XZ YZ. IR3. (Transitive) If X Y and Y Z, then X Z Inference Rules for FDs IR4(Decomposition): If X  YZ, then X  Y and X  Z IR5(Union): If X  Y and X  Z, then X  YZ IR6(pseudo-transitive) If X  Y and WY  Z, then WX  Z  The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property) Inference Rules for FDs Closure : the set of all dependencies that can be inferred from F is called the closure of F; it is denoted by F+. Closure of a set of attributes X with respect to F is the set X + of all attributes that are functionally determined by X. X + can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F Inference Rules for FDs Algorithm: Determining X+, the Closure of : the set of attribute X under F X+= X; repeat oldX+= X+; for each functional dependency Y Z in F do If X+  Y then X += X+ ∪Z; until (X+ = oldX+); Inference Rules for FDs Example 1: Let Q be a relation with attributes (A,B,C,D,E,G,H) and let the following functional dependencies hold F={f1: BA; f2: DACE; f3: DH; f4: GH C; f5: ACD} Find the closure X+ of X = {AC}. Answer: X+=ACDEH Inference Rules for FDs Example 2: Let Q be a relation with attributes (A,B,C,D,E,G) and let the following functional dependencies hold F = { f1: A → C; f2: A → EG; f3: B → D; f4: G → E} Find the closure X+ and Y+ of X = {A,B}; Y = {C,G,D} Answer: X+ = {ABCDEG} , Y+= {CGDE} Inference Rules for FDs Example 3: F = {SSNENAME, PNUMBER {PNAME, PLOCATION}, {SSN, PNUMBER}HOURS} Calculate the following closure sets with respect to F; {SSN }+ = {SSN, ENAME} {PNUMBER }+ = {PNUMBER, PNAME, PLOCATION} {SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS} Inference Rules for FDs Algorithm: Determining F+ Find all subsets of relation Q+ Find all possible functional dependencies of Q Find all subset closure of Q Based on the closure of all subsets sought to determine which the functional dependencies belong F+ Inference Rules for FDs Example: Q(A,B,C) F = {AB C,C B} F+ ? All subset of attributes Closure of all subsets A+ = A B+ = B C+ = BC AC+ = ABC AB+ = ABC BC+ = BC  A B C  {A} {B} {C} {A,B} {A,C} {B,C} {A,B,C} Inference Rules for FDs All functional dependencies: AB ABC BC ABCF CA CBCF+ ACBCF+ BCAC AAB AABC BAC ABACF+ CBF CABC ACABCF+ BCABC AC BA BBC ABBCF+ CAB ACBF+ BCA AAC BAB BABC ABABCF+ CAC ACABF+ BCAB Inference Rules for FDs Result: F+ = { ABC, ABAC, ABBC, ABABC, CB, CBC, ACB, ACAB, ACBC, ACABC} Equivalence of Sets of FDs Two sets of FDs F and G are equivalent if: Every FD in F can be inferred from G, and Every FD in G can be inferred from F Hence, F and G are equivalent if F + =G + Definition: F covers G if every FD in G can be inferred from F (i.e., if G + subset-of F +) F and G are equivalent if F covers G and G covers F There is an algorithm for checking equivalence of sets of FDs Minimal Sets of FDs  A set of FDs is minimal if it satisfies the following conditions:  Every dependency in F has a single attribute for its right-hand side. Example: F = {A → B, A → C , B → C, AB → D}  Cannot replace any dependency X A in F with a dependency YA, where Y is a subset of X, and still have a set of dependencies that is equivalent to F. Example: F= {AB→C; B→C} Minimal Sets of FDs Cannot remove any dependency from F and still have a set of dependencies that is equivalent to F. Example: F = {A→BC, B→D, AB→D} redundant because: F ≡ F’= {A→BC, B→D} Minimal Sets of FDs Algorithm 10.2: Finding a Minimal Cover (phủ tối thiểu)for a set of Functional Dependencies F Remove all FDs have left hand side redundant Converse all FDs with the right hand side of more than one attribute to FDs having the right hand side of one attribute Remove all redundant FDs Minimal Sets of FDs Example: Let Q be a relation with attributes Q(A,B,C,D) and FD F={AB → CD, B → C, C → D}  Finding a Minimal Cover of F AB→CD is FD with the left hand side redundant A because B+=BCD⇒ B → CD ∈ F+ =>F≡{B → CD;B → C;C → D} F≡{B → D; B → C;C → D}=F1mc , In F1tt , B → D is redundant. F≡{B → C;C → D}=Fmc Minimal Sets of FDs Example: Q(MSCD,MSSV,CD,HG) F = {MSCD → CD; CD → MSCD; CD,MSSV → HG; MSCD,HG → MSSV; CD,HG → MSSV; MSCD,MSSV → HG}  Find minimal cover for FD F Result: Fmc = { MSCD → CD; CD → MSCD; CD,HG → MSSV; MSCD,MSSV → HG} Normalization of Relations Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form Normalization of Relations 2NF, 3NF, BCNF based on keys and FDs of a relation schema 4NF based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11) Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11) Practical Use of Normal Forms Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect The database designers need not normalize to the highest possible normal form. (usually up to 3NF, BCNF or 4NF) Denormalization: the process of storing the join of higher normal form relations as a base relation—which is in a lower normal form Definitions of Keys and Attributes Participating in Keys A super key of a relation schema R = {A1, A2, ...., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S] A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more. Definitions of Keys and Attributes Participating in Keys Algorithm 10.3: Finding all key for a relation Q and set of Functional Dependencies F Determine all subset of Q+: X1, X2, , Xn Find closure of the Xi. Super key is Xi that they are closure =Q+, suppose that set supper key are S= {S1, S2, Sn} Create the set contain all key of Q from S by consider subset Si, Sj of S S (i ≠j), if Si  Sj then remove Sj (i,j=1..n). Definitions of Keys and Attributes Participating in Keys Example: Definitions of Keys and Attributes Participating in Keys Example: Definitions of Keys and Attributes Participating in Keys Algorithm 10.4: Finding one key K for a relation Q and set of Functional Dependencies F: Step 1: Assign K = Q+ Step 2: A is an attribute of K, K’=K-A. If K’+=Q+ then K=K’. Definitions of Keys and Attributes Participating in Keys Example: U={A,B,C,D,E} , F={AB->C, AC->B, BC->DE} Find K Step1: K=U> K=ABCDE Step2:(K\A)+=>(BCDE)+=BCDE ≠ U+ K=ABCDE Step3:(K\B)+=>(ACDE)+= ABCDE = U+K=ACDE Step4: (K\C)+ =>(ADE)+ = ADE ≠ U+ K=ACDE Step5: (K\D)+ => (ACE)+ = ACEBD=U+K=ACE Step6: (K\E)+ =>(AC)+ = ACBDE =U+ K=AC Definitions of Keys and Attributes Participating in Keys If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. A Prime attribute (thuộc tính khóa) must be a member of some candidate key A Nonprime attribute is not a prime attribute— that is, it is not a member of any candidate key. First Normal Form First normal form (INF): Disallow multivalued attributes, composite attributes, and their combinations. It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. First Normal Form Second Normal Form Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency XY is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. Examples: {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds Second Normal Form Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R. A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key R can be decomposed into 2NF relations via the process of 2NF normalization Second Normal Form Algorithm 10.4 :The test for 2NF Give a relational Q, a set FD F Determine Q belong 2FN ? Step1: Find all key of Q Step2: With each key K, Find closure of all the subsets S of K Step3: If there are S+ contain Nonprime attribute then Q is not 2NF else Q is 2NF Second Normal Form Example: Q(A,B,C,D) F={AB→C; B→D; BC→A}. Is Q 2NF? D is nonprime attribute Q is not 2NF Third Normal Form Third normal form (3NF) is based on the concept of transitive dependency. Transitive functional dependency A functional dependency XZ that can be derived from two FDs XY and Y Z Examples: SSN DMGRSSN is a transitive FD since SSNDNUMBER and DNUMBERDMGRSSN hold Third Normal Form  Definition: A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key. Example: Third Normal Form NOTE: In X  Y and Y  Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency . EMP (SSN, Emp#, Salary ). SSNEmp#Salary and Emp# is a candidate key. General Normal Form Definitions (For Multiple Keys) The above definitions consider the primary key only The following more general definitions take into account relations with multiple candidate keys Partial and full functional dependencies and transitive dependencies will now be considered with respect to all candidate keys of a relation. General Definition of 2NF Definition: A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is not partially dependent on any key of R. Example: General Definition of 2NF  In LOT relation (above example), Suppose that there are two candidate keys: PROPERTY_ID# and {COUNTY_NAME, LOT#}; that is, lot numbers are unique only within each county, but PROPERTY_ID numbers are unique across counties for the entire state. Based on the two candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}, we know that functional dependencies FD1 and FD2 hold. We choose PROPERTY_ID# as the primary key, General Definition of 3NF Definition: A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency XA holds in R, either X is a super key of R, or A is a prime attribute of R. BCNF (Boyce-Codd Normal Form) Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. Definition. A relation schema R is in BCNF if whenever a nontrivial functional dependency XA holds in R, then X is a super key of R BCNF (Boyce-Codd Normal Form) Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF There exist relations that are in 3NF but not in BCNF The goal is to have each relation in BCNF (or 3NF) BCNF (Boyce-Codd Normal Form) Example: Q(A,B,C,D,E,I) F={ACD→EBI;CE→AD}. Q is a BCNF? F ≡ Ftt={ACD→E,ACD→B,ACD→I,CE→A,CE→D} The feft hand side of Ftt are super key Q is BCNF BCNF (Boyce-Codd Normal Form) Example:

Các file đính kèm theo tài liệu này:

  • pdfchapter101_2411.pdf