Bài giảng Database systems - Functional dependencies & normalization

 Further Reading: [1] Chapter 15 & 16  4NF: based on multivalued dependency (MVD).  5NF: based on join dependency.  Such a dependency is very difficult to detect in practice.  Normalization into 5NF is considered very rarely in practice.  Other normal forms & algorithms.  ER modeling: top-down database design.  Bottom-up database design???

pdf71 trang | Chia sẻ: vutrong32 | Lượt xem: 2753 | Lượt tải: 1download
Bạn đang xem trước 20 trang tài liệu Bài giảng Database systems - Functional dependencies & normalization, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
DATABASE SYSTEMS Nguyen Ngoc Thien An FUNCTIONAL DEPENDENCIES & NORMALIZATION Spring 2014 Contents 2  Introduction  Functional Dependencies (FDs)  Normalization  Notes & Suggestions Reading Suggestion: [1] Chapter 15, 16 Top-Down Database Design 3 Mini-world Requirements Conceptual schema E1 E2 R Relation schemas Which is a formal way of analyzing why one grouping of attributes into a relation schema may be better than another? Goals & Evaluation of Design 4  Evaluating design quality of relational schemas  A relational database schema consists of a number of relation schemas.  Each relation schema consists of a number of attributes.  Measure formally why one set of groupings of attributes into relation schemas is better than another.  The implicit goals of the design activity:  Information preservation  Maintain all concepts captured in the conceptual design.  Minimum redundancy  Minimize redundant storage of the same information.  Reduce the need for multiple updates to maintain consistency across multiple copies of the same information. Goodness - Levels 5  Two levels of the goodness of relation schemas:  The logical (or conceptual) level: How users interpret the relation schemas and the meaning of their attributes.  The implementation (or physical storage) level: How the tuples in a base relation are stored and updated. Goodness - Measures 6  The semantics of the attributes is clear.  Redundant information in tuples and update anomalies (modification, deletion, insertion).  Cause difficulties to maintain consistency of data.  Require unnecessary updates that can be avoided.  Reducing the NULL values in tuples.  Waste space at the storage level.  Problems with understanding the attribute meaning.  Other difficulties: aggregation operations and joins.  Disallowing the possibility of generating spurious tuples. Examples of Anomalies 7 Insert a new employee of department 5  Must enter all information of department 5 which must be consistent with existing tuples. Delete the last employee of Department 1  Lose information of Department 1 Change Dname of department 5  Must change Dname of 4 rows How to insert a new department that has no employees? Examples of Spurious Tuples (1) 8 EMP_PROJ (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION) Not PK and FK Examples of Spurious Tuples (2) 9 EMP_PROJ1 * EMP_LOCS Informal Guidelines to Design 10  Guideline 1  Design a relation schema so that it is easy to explain its meaning.  Do not combine attributes from multiple entity types and relationship types into a single relation.  Guideline 2  Design anomaly-free base relations.  Specify views for placing together the attributes frequently referenced in important queries.  If any anomalies are present, make sure that the programs that update the database will operate correctly.  Guideline 3  Determine whether to include the columns that may have NULLs in a relation or to have a separate relation for those columns.  If NULLs are unavoidable, make sure that they do not apply to a majority of tuples in the relation.  Guideline 4  Design relation schemas so that they can be joined with equality conditions on attributes that are appropriately related (primary key, foreign key) pairs. Introduction to Normalization 11  “Goodness” measures  Normalization  Help DB designers determine the best relation schemas.  A formal framework for analyzing relation schemas.  Based on keys and functional dependencies.  A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree.  A process ensuring that the data is structured in such a way that attributes are grouped with the PK. Attributes that do not directly depend on PK may be extracted to form a new relation.  Decomposition Contents 12  Introduction  Functional Dependencies (FDs)  Normalization  Notes & Suggestions Contents 13  Functional Dependencies (FDs)  Definitions  Inference Rules  Closure of Sets of FDs  Equivalence of Sets of FDs  Minimal Cover  Full, Partial, Indirect Dependencies Definitions (1) 14  Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs.  FDs and keys are used to define normal forms for relations.  FDs are constraints that are derived from the meaning and interrelationships of the data attributes.  A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y: X  Y Definitions (2) 15  X  Y holds if for any two tuples t1 and t2 in any relation instance r(R): If t1[X] = t2[X], then t1[Y] = t2[Y]  The values of the Y component functionally depend on, or are functionally determined by the values of the X component.  The values of the X component uniquely (or functionally) determine the values of the Y component.  There is a functional dependency from X to Y.  X  Y in R specifies a constraint on all relation instances r(R).  If X  Y in R, this does not say whether or not Y  X in R. Definitions (3) 16  E.g.:  SSN functionally determines employee name:  SSN  ENAME  Project number determines project name and location:  PNUMBER  {PNAME, PLOCATION}  Ssn and project number determines the hours per week that the employee works on the project:  {SSN, PNUMBER}  HOURS  X is a candidate key of R - this implies that X  Y for any subset of attributes Y of R.  If X is a candidate key of R, then X  R. Inference Rules (1) 17  Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.  Armstrong's inference rules:  IR1 (Reflexive): If Y ⊆ X, then X  Y.  IR2 (Augmentation): If X  Y, then XZ  YZ (Notation: XZ stands for X ∪ Z)  IR3 (Transitive): If X  Y and Y  Z, then X  Z  Armstrong’s inference rules are sound and complete. Inference Rules (2) 18  Additional inference rules:  IR4 (Decomposition): If X  YZ, then X  Y and X  Z.  Note: XY  A does not necessarily imply either X  A or Y  A.  IR5 (Union): If X  Y and X  Z, then X  YZ.  Note: X  A and Y  B does imply that XY  AB.  IR6 (Psuedotransitivity): If X  Y and WY  Z, then WX  Z.  IR4, IR5 and IR6 can be deduced from IR1, IR2, and IR3 (completeness property). Closure of Sets of FDs (1) 19  Closure of F is the set F+ including all FDs in F as well as all FDs that can be inferred from F.  Closure of a set of attributes X under F is the set X+ of all attributes that are functionally determined by X based on F. Closure of Sets of FDs (2) 20  Exercise: Consider a relation R(A, B, C, D, E) with the following dependencies F: 1. AB  C 2. CD  E 3. DE  B  Find F+.  Find {A, B}+. Equivalence of Sets of FDs 21  Two sets of FDs F and G are equivalent if F+ = G+.  F covers G if G+ ⊆ F+.  F and G are equivalent if F covers G and G covers F.  Algorithm for checking equivalence of sets of FDs:  Determine whether F covers E  For each FD X  Y in E  Calculate X+ with respect to F.  Check whether this X+ includes the attributes in Y.  If this is the case for every FD in E, then F covers E.  Determine whether E and F are equivalent by checking that E covers F and F covers E.  Exercise: Show that the following F and G are equivalent:  F = {A  C, AC  D, E  AD, E  H}  G = {A  CD, E  AH}. Minimal Cover (1) 22  A set of FDs is minimal if it satisfies the following conditions:  Every dependency in F has a single attribute for its right-hand side.  Cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.  Cannot replace any dependency X  A in F with a dependency Y  A where Y is a proper subset of X and still have a set of dependencies that is equivalent to .  A minimal cover of a set of functional dependencies E is a minimal set of dependencies that is equivalent to E. Minimal Cover (2) 23  Always find at least one minimal cover F for any set of dependencies E.  If several sets of FDs qualify as minimal covers of E, it is customary to use additional criteria for minimality.  Choose the minimal set with the smallest number of dependencies or with the smallest total length.  There is no simple algorithm for computing a minimal set of FDs that is equivalent to a set F of FDs  To synthesize a set of relations, we assume that we start with a set of dependencies that is a minimal set. Minimal Cover (3) Minimal Cover (4) 25  Exercise:  Let the given set of FDs be E: {B  A, D  A, AB  D}. We have to find the minimal cover of E.  Answer:  The minimal cover of E is {B  D, D  A}. Full & Partial Dependencies (1) 26  X  Y is a full functional dependency if for any attribute A ∈ X, (X – {A}) does not functionally determine Y.  X  Y is a partial dependency if for some attribute A ∈ X, (X – {A})  Y.  More than one value is required to determine the value of another attribute, the combination of values is called a composite determinant.  E.g.:  {Ssn, Pnumber}  Hours is a full dependency.  {Ssn, Pnumber}  Ename is partial because Ssn  Ename holds. Full & Partial Dependencies (2) 27 Agent-id Booking-date Performer-location Agent-name Agent-location Performer-id Performer-name Performer-type Depend functionally partially on {Performer-id, Agent-id} Depend functionally fully on {Performer-id, Agent-id} Indirect Dependencies 28  A functional dependency X  Y in a relation schema R is a transitive dependency (or indirect dependency) if there exists a set of attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X  Z and Z  Y hold. Performer-location Performer-id Performer-name Performer-type Fee Contents 29  Introduction  Functional Dependencies (FDs)  Normalization  Notes & Suggestions Contents 30  Normalization  Introduction to Normalization  Prime & Nonprime Attributes  Normal Forms  1NF and Dependency Problems  2NF  3NF  Summary of Normal Forms based on Primary Keys General Normal Form Definitions  BCNF Introduction to Normalization (1) 31  Normalization of data  A process of analyzing the given relation schemas based on their FDs and keys to achieve the desirable properties: Minimizing redundancy. Minimizing the insertion, deletion, and update anomalies.  Unsatisfactory relation schemas are decomposed into smaller relation schemas that meet the normal form tests and hence possess the desirable properties. Introduction to Normalization (2) 32  The process of normalization through decomposition must also confirm the existence of additional properties: a. Nonadditive (Lossless) Join Property  Ensure that no spurious tuples are generated when a NATURAL JOIN operation is applied to the relations resulting from the decomposition. b. Dependency Preservation Property  Preservation of the functional dependencies.  The property (a) is extremely critical and cannot be sacrificed.  The property (b) is less stringent and may be sacrificed. Introduction to Normalization (3) 33  Normal forms  1NF, 2NF, 3NF, BCNF: based on the functional dependencies among the attributes of a relation.  4NF: based on the concepts of multivalued dependencies.  5NF: based on the concepts of join dependencies.  The normal form of a relation refers to the highest normal form that it meets.  The database designers not need to normalize to the highest possible normal form.  Usually only up to 3NF, BCNF or 4NF.  Relations may be left in a lower normalization status, such as 2NF, for performance reasons.  Incur the corresponding penalties of dealing with the anomalies. Contents 34  Normalization  Introduction to Normalization  Prime & Nonprime Attributes  Normal Forms  1NF and Dependency Problems  2NF  3NF  Summary of Normal Forms based on Primary Keys General Normal Form Definitions  BCNF Prime & Nonprime Attributes 35  A prime attribute must be a member of some candidate key.  A nonprime attribute is not a prime attribute. It is not a member of any candidate key. Normal Forms 36  1NF  2NF - solves partial dependencies.  3NF - solves indirect dependencies.  BCNF - well-normalized relations. Contents 37  Normalization  Introduction to Normalization  Prime & Nonprime Attributes  Normal Forms  1NF and Dependency Problems  2NF  3NF  Summary of Normal Forms based on Primary Keys General Normal Form Definitions  BCNF 1NF (1) 38  First normal form (1NF): there is only one value at the intersection of each row and column of a relation.  Disallows composite attributes, multivalued attributes, their combinations and nested relations.  The only attribute values permitted by 1NF are single atomic (or indivisible) values.  To be part of the formal definition of a relation in the basic (flat) relational model. 1NF (2) 39 1NF (3) 40 1NF (4) 41 1NF (5) 42 1NF determinancy diagram Performer-name Agent-id Venue-id Event-id Booking-date Performer-location Agent-name Agent-location Venue-name Venue-location Event-name Event-type Performer-id Performer-type Fee Contents 43  Normalization  Introduction to Normalization  Prime & Nonprime Attributes  Normal Forms  1NF and Dependency Problems  2NF  3NF  Summary of Normal Forms based on Primary Keys General Normal Form Definitions  BCNF 2NF (1) 44  2NF solves partial dependency problem in 1NF.  Second normal form (2NF): No nonprime attribute A in R is partially functionally dependent on the primary key of R.  Test: check functional dependencies whose left- hand side attributes are part of the primary key and then check its right-hand side.  Decompose: identify new primary keys and group attributes that are fully functionally dependent on each key together to form separate new relations. 2NF (2) 45 Performer-name Performer-location Performer-id Performer-type Fee Relation in 2NF P-id P-name P-type Fee P-location 101 Baron Singer 75 York 105 Steed Dancer 60 Berlin 108 Jones Actor 85 Bombay 112 Eagles Actor 85 Leeds 126 Stokes Comedian 90 Athens 2NF (3) 46 Relations in 2NF Agent-id Agent-name Agent-location Venue-id Venue-name Venue-location A-id A-name A-location 1295 Burton London 1435 Nunn Boston 1460 Stritch Rome 1504 Lee Taipei 1509 Patel York 1522 Ellis Madrid 1682 Tsang Beijing V-id V-name V-location 17 Silbury Tunis 35 Polish Athens 46 Royale Cairo 54 Nation Lisbon 59 Atlas Tokyo 75 Vostok Kiev 79 Festive Rome 2NF (4) 47 Event-id Event-name Event-type Relation in 2NF E-id E-name E-type 901 The Dark Drama 921 Silver Shoe Ballet 926 Next Year Drama 942 White Lace Ballet 945 Trick-Treat Variety Show 952 Gold Days Drama 959 Show Time Musical 2NF (5) 48 Relation in 2NF Agent-id Venue-id Event-id Booking-date Performer-id P-id A-id V-id E-id Booking-date 101 1295 59 959 25-Nov-99 105 1435 35 921 07-Jan-02 105 1504 54 942 10-Feb-02 108 1682 79 901 29-Jul-03 112 1460 17 926 13-Aug-00 112 1522 46 952 05-May-99 112 1504 75 952 16-Mar-99 126 1509 59 945 02-Sep-01 2NF (6) 49 Performer-name Performer-location Performer-id Performer-type Fee Relation in 2NF P-id P-name P-type Fee P-location 101 Baron Singer 75 York 105 Steed Dancer 60 Berlin 108 Jones Actor 85 Bombay 112 Eagles Actor 85 Leeds 126 Stokes Comedian 90 Athens Problems with 2NF: - Insertion - Modification - Deletion Contents 50  Normalization  Introduction to Normalization  Prime & Nonprime Attributes  Normal Forms  1NF and Dependency Problems  2NF  3NF  Summary of Normal Forms based on Primary Keys General Normal Form Definitions  BCNF 3NF (1) 51  3NF solves transitive dependency problem in 1NF and 2NF.  Third normal form (3NF)  R satisfies 2NF.  And no nonprime attribute A in R is transitively functionally dependent on the primary key of R.  Note:  In X  Y and Y  Z, with X as the primary key, we consider it a problem of the transitive dependency only if Y is not a candidate key.  E.g.: SSN  EmpNo  Salary and EmpNo is a candidate key. 3NF (2) 52  Decompose: identify all transitive dependencies and each transitive dependency will form a new relation, with nonprime attributes participating in the transitive dependency and the attributes which determines others as the attributes for the new relation. 3NF (3) 53 Performer-name Performer-location Performer-id Performer-type Relation in 3NF P-id P-name P-type P-location 101 Baron Singer York 105 Steed Dancer Berlin 108 Jones Actor Bombay 112 Eagles Actor Leeds 126 Stokes Comedian Athens P-type Fee Singer 75 Dancer 60 Actor 85 Comedian 90 Performer-type Fee Relation in 3NF 3NF (4) 54 Performer-name Performer-location Performer-id Performer-type Performer-type Fee Agent-id Agent-name Agent-location Venue-id Venue-name Venue-location Event-id Event-name Event-type Agent-id Venue-id Event-id Booking-date Performer-id Contents 55  Normalization  Introduction to Normalization  Prime & Nonprime Attributes  Normal Forms  1NF and Dependency Problems  2NF  3NF  Summary of Normal Forms based on Primary Keys General Normal Form Definitions  BCNF Summary of Normal Forms based on Primary Keys 56 Contents 57  Normalization  Introduction to Normalization  Prime & Nonprime Attributes  Normal Forms  1NF and Dependency Problems  2NF  3NF  Summary of Normal Forms based on Primary Keys General Normal Form Definitions  BCNF General Normal Form Definitions 58  The above definitions consider the primary key only.  The following more general definitions take into account relations with multiple candidate keys.  2NF: every nonprime attribute A in R is fully functionally dependent on every key of R.  3NF: whenever a FD X  A holds in R, then either: (a) X is a superkey of R, or (b) A is a prime attribute of R 59 2NF 60 3NF 61 Contents 62  Normalization  Introduction to Normalization  Prime & Nonprime Attributes  Normal Forms  1NF and Dependency Problems  2NF  3NF  Summary of Normal Forms based on Primary Keys General Normal Form Definitions  BCNF BCNF (1) 63  A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X  A holds in R, then X is a superkey of R. BCNF (2) 64 BCNF normalization of LOTS1A with the functional dependency FD2 being lost in the decomposition. BCNF (3) 65 TEACH (Student, Course, Instructor) BCNF (4) 66  Three possible pairs: 1. {Student, Instructor} and {Student, Course}. 2. {Course, Instructor} and {Course, Student}. 3. {Instructor, Course} and {Instructor, Student}.  All three decompositions lose the functional dependency FD1. The desirable decomposition of those just shown is (3) because it will not generate spurious tuples after a join. Contents 67  Introduction  Functional Dependencies (FDs)  Normalization  Notes & Suggestions Notes & Suggestions 68  Further Reading: [1] Chapter 15 & 16  4NF: based on multivalued dependency (MVD).  5NF: based on join dependency.  Such a dependency is very difficult to detect in practice.  Normalization into 5NF is considered very rarely in practice.  Other normal forms & algorithms.  ER modeling: top-down database design.  Bottom-up database design??? Q & A 69 Exercises (1) 70  Exercise 1: Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies: 1. A, B  C 2. A  D, E 3. B  F 4. F  G, H 5. D  I, J  Find keys of R.  Decompose R into 2NF, then 3NF relations. Exercises (2) 71  Exercise 2: Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies: 1. A, B  C 2. B, D  E, F 3. A, D  G, H 4. A  I 5. H  J  Find keys of R.  Decompose R into 2NF, then 3NF relations.

Các file đính kèm theo tài liệu này:

  • pdf6_functional_dependencies_normalization_0984.pdf