Bài giảng Database systems - Functional dependencies & normalization
Further Reading: [1] Chapter 15 & 16
4NF: based on multivalued dependency (MVD).
5NF: based on join dependency.
Such a dependency is very difficult to detect in practice.
Normalization into 5NF is considered very rarely in practice.
Other normal forms & algorithms.
ER modeling: top-down database design.
Bottom-up database design???
71 trang |
Chia sẻ: vutrong32 | Lượt xem: 2717 | Lượt tải: 1
Bạn đang xem trước 20 trang tài liệu Bài giảng Database systems - Functional dependencies & normalization, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
DATABASE SYSTEMS
Nguyen Ngoc Thien An
FUNCTIONAL DEPENDENCIES
& NORMALIZATION
Spring 2014
Contents
2
Introduction
Functional Dependencies (FDs)
Normalization
Notes & Suggestions
Reading Suggestion: [1] Chapter 15, 16
Top-Down Database Design
3
Mini-world
Requirements
Conceptual schema
E1
E2
R
Relation schemas
Which is a formal way of analyzing why one grouping of attributes
into a relation schema may be better than another?
Goals & Evaluation of Design
4
Evaluating design quality of relational schemas
A relational database schema consists of a number of
relation schemas.
Each relation schema consists of a number of attributes.
Measure formally why one set of groupings of
attributes into relation schemas is better than another.
The implicit goals of the design activity:
Information preservation
Maintain all concepts captured in the conceptual design.
Minimum redundancy
Minimize redundant storage of the same information.
Reduce the need for multiple updates to maintain consistency
across multiple copies of the same information.
Goodness - Levels
5
Two levels of the goodness of relation
schemas:
The logical (or conceptual) level: How users
interpret the relation schemas and the meaning of
their attributes.
The implementation (or physical storage)
level: How the tuples in a base relation are stored
and updated.
Goodness - Measures
6
The semantics of the attributes is clear.
Redundant information in tuples and update
anomalies (modification, deletion, insertion).
Cause difficulties to maintain consistency of data.
Require unnecessary updates that can be avoided.
Reducing the NULL values in tuples.
Waste space at the storage level.
Problems with understanding the attribute meaning.
Other difficulties: aggregation operations and joins.
Disallowing the possibility of generating spurious
tuples.
Examples of Anomalies
7
Insert a new employee of department 5 Must
enter all information of department 5 which
must be consistent with existing tuples.
Delete the last employee of Department 1
Lose information of Department 1
Change Dname of department 5
Must change Dname of 4 rows
How to insert a new
department that has no
employees?
Examples of Spurious Tuples (1)
8
EMP_PROJ (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
Not PK
and FK
Examples of Spurious Tuples (2)
9
EMP_PROJ1 * EMP_LOCS
Informal Guidelines to Design
10
Guideline 1
Design a relation schema so that it is easy to explain its meaning.
Do not combine attributes from multiple entity types and relationship types into a
single relation.
Guideline 2
Design anomaly-free base relations.
Specify views for placing together the attributes frequently referenced in
important queries.
If any anomalies are present, make sure that the programs that update the
database will operate correctly.
Guideline 3
Determine whether to include the columns that may have NULLs in a relation or
to have a separate relation for those columns.
If NULLs are unavoidable, make sure that they do not apply to a majority of
tuples in the relation.
Guideline 4
Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs.
Introduction to Normalization
11
“Goodness” measures Normalization
Help DB designers determine the best relation
schemas.
A formal framework for analyzing relation schemas.
Based on keys and functional dependencies.
A series of normal form tests that can be carried out
on individual relation schemas so that the relational
database can be normalized to any desired degree.
A process ensuring that the data is structured in such
a way that attributes are grouped with the PK.
Attributes that do not directly depend on PK may be
extracted to form a new relation.
Decomposition
Contents
12
Introduction
Functional Dependencies (FDs)
Normalization
Notes & Suggestions
Contents
13
Functional Dependencies (FDs)
Definitions
Inference Rules
Closure of Sets of FDs
Equivalence of Sets of FDs
Minimal Cover
Full, Partial, Indirect Dependencies
Definitions (1)
14
Functional dependencies (FDs) are used to
specify formal measures of the "goodness" of
relational designs.
FDs and keys are used to define normal forms for
relations.
FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes.
A set of attributes X functionally determines a
set of attributes Y if the value of X determines a
unique value for Y: X Y
Definitions (2)
15
X Y holds if for any two tuples t1 and t2 in any
relation instance r(R):
If t1[X] = t2[X], then t1[Y] = t2[Y]
The values of the Y component functionally depend on, or
are functionally determined by the values of the X
component.
The values of the X component uniquely (or functionally)
determine the values of the Y component.
There is a functional dependency from X to Y.
X Y in R specifies a constraint on all relation
instances r(R).
If X Y in R, this does not say whether or not Y X
in R.
Definitions (3)
16
E.g.:
SSN functionally determines employee name:
SSN ENAME
Project number determines project name and location:
PNUMBER {PNAME, PLOCATION}
Ssn and project number determines the hours per
week that the employee works on the project:
{SSN, PNUMBER} HOURS
X is a candidate key of R - this implies that X Y
for any subset of attributes Y of R.
If X is a candidate key of R, then X R.
Inference Rules (1)
17
Given a set of FDs F, we can infer additional
FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X Y.
IR2 (Augmentation): If X Y, then XZ YZ
(Notation: XZ stands for X ∪ Z)
IR3 (Transitive): If X Y and Y Z, then X Z
Armstrong’s inference rules are sound and
complete.
Inference Rules (2)
18
Additional inference rules:
IR4 (Decomposition): If X YZ, then X Y and X
Z.
Note: XY A does not necessarily imply either X A or Y
A.
IR5 (Union): If X Y and X Z, then X YZ.
Note: X A and Y B does imply that XY AB.
IR6 (Psuedotransitivity): If X Y and WY Z, then
WX Z.
IR4, IR5 and IR6 can be deduced from IR1, IR2,
and IR3 (completeness property).
Closure of Sets of FDs (1)
19
Closure of F is the set F+ including all FDs in F as
well as all FDs that can be inferred from F.
Closure of a set of attributes X under F is the
set X+ of all attributes that are functionally
determined by X based on F.
Closure of Sets of FDs (2)
20
Exercise: Consider a relation R(A, B, C, D, E)
with the following dependencies F:
1. AB C
2. CD E
3. DE B
Find F+.
Find {A, B}+.
Equivalence of Sets of FDs
21
Two sets of FDs F and G are equivalent if F+ = G+.
F covers G if G+ ⊆ F+.
F and G are equivalent if F covers G and G covers F.
Algorithm for checking equivalence of sets of FDs:
Determine whether F covers E
For each FD X Y in E
Calculate X+ with respect to F.
Check whether this X+ includes the attributes in Y.
If this is the case for every FD in E, then F covers E.
Determine whether E and F are equivalent by checking that E
covers F and F covers E.
Exercise: Show that the following F and G are
equivalent:
F = {A C, AC D, E AD, E H}
G = {A CD, E AH}.
Minimal Cover (1)
22
A set of FDs is minimal if it satisfies the following
conditions:
Every dependency in F has a single attribute for its
right-hand side.
Cannot remove any dependency from F and still have
a set of dependencies that is equivalent to F.
Cannot replace any dependency X A in F with a
dependency Y A where Y is a proper subset of X
and still have a set of dependencies that is equivalent
to .
A minimal cover of a set of functional
dependencies E is a minimal set of dependencies
that is equivalent to E.
Minimal Cover (2)
23
Always find at least one minimal cover F for any set of
dependencies E.
If several sets of FDs qualify as minimal covers of E, it
is customary to use additional criteria for minimality.
Choose the minimal set with the smallest number of
dependencies or with the smallest total length.
There is no simple algorithm for computing a minimal
set of FDs that is equivalent to a set F of FDs
To synthesize a set of relations, we assume that we
start with a set of dependencies that is a minimal set.
Minimal Cover (3)
Minimal Cover (4)
25
Exercise:
Let the given set of FDs be E: {B A, D A, AB
D}. We have to find the minimal cover of E.
Answer:
The minimal cover of E is {B D, D A}.
Full & Partial Dependencies (1)
26
X Y is a full functional dependency if for any
attribute A ∈ X, (X – {A}) does not functionally
determine Y.
X Y is a partial dependency if for some attribute A
∈ X, (X – {A}) Y.
More than one value is required to determine the
value of another attribute, the combination of values is
called a composite determinant.
E.g.:
{Ssn, Pnumber} Hours is a full dependency.
{Ssn, Pnumber} Ename is partial because Ssn
Ename holds.
Full & Partial Dependencies (2)
27
Agent-id
Booking-date
Performer-location
Agent-name
Agent-location
Performer-id
Performer-name
Performer-type
Depend
functionally
partially on
{Performer-id,
Agent-id}
Depend functionally fully on
{Performer-id, Agent-id}
Indirect Dependencies
28
A functional dependency X Y in a relation
schema R is a transitive dependency (or
indirect dependency) if there exists a set of
attributes Z in R that is neither a candidate key
nor a subset of any key of R, and both X Z
and Z Y hold.
Performer-location Performer-id
Performer-name
Performer-type Fee
Contents
29
Introduction
Functional Dependencies (FDs)
Normalization
Notes & Suggestions
Contents
30
Normalization
Introduction to Normalization
Prime & Nonprime Attributes
Normal Forms
1NF and Dependency Problems
2NF
3NF
Summary of Normal Forms based on Primary Keys
General Normal Form Definitions
BCNF
Introduction to Normalization (1)
31
Normalization of data
A process of analyzing the given relation
schemas based on their FDs and keys to achieve
the desirable properties:
Minimizing redundancy.
Minimizing the insertion, deletion, and update
anomalies.
Unsatisfactory relation schemas are
decomposed into smaller relation schemas that
meet the normal form tests and hence possess
the desirable properties.
Introduction to Normalization (2)
32
The process of normalization through
decomposition must also confirm the existence of
additional properties:
a. Nonadditive (Lossless) Join Property
Ensure that no spurious tuples are generated when a
NATURAL JOIN operation is applied to the relations
resulting from the decomposition.
b. Dependency Preservation Property
Preservation of the functional dependencies.
The property (a) is extremely critical and cannot
be sacrificed.
The property (b) is less stringent and may be
sacrificed.
Introduction to Normalization (3)
33
Normal forms
1NF, 2NF, 3NF, BCNF: based on the functional
dependencies among the attributes of a relation.
4NF: based on the concepts of multivalued dependencies.
5NF: based on the concepts of join dependencies.
The normal form of a relation refers to the highest
normal form that it meets.
The database designers not need to normalize to the
highest possible normal form.
Usually only up to 3NF, BCNF or 4NF.
Relations may be left in a lower normalization status, such
as 2NF, for performance reasons.
Incur the corresponding penalties of dealing with the
anomalies.
Contents
34
Normalization
Introduction to Normalization
Prime & Nonprime Attributes
Normal Forms
1NF and Dependency Problems
2NF
3NF
Summary of Normal Forms based on Primary Keys
General Normal Form Definitions
BCNF
Prime & Nonprime Attributes
35
A prime attribute must be a member of some
candidate key.
A nonprime attribute is not a prime attribute.
It is not a member of any candidate key.
Normal Forms
36
1NF
2NF - solves partial dependencies.
3NF - solves indirect dependencies.
BCNF - well-normalized relations.
Contents
37
Normalization
Introduction to Normalization
Prime & Nonprime Attributes
Normal Forms
1NF and Dependency Problems
2NF
3NF
Summary of Normal Forms based on Primary Keys
General Normal Form Definitions
BCNF
1NF (1)
38
First normal form (1NF): there is only one
value at the intersection of each row and
column of a relation.
Disallows composite attributes, multivalued
attributes, their combinations and nested
relations.
The only attribute values permitted by 1NF are
single atomic (or indivisible) values.
To be part of the formal definition of a relation
in the basic (flat) relational model.
1NF (2)
39
1NF (3)
40
1NF (4)
41
1NF (5)
42
1NF determinancy diagram
Performer-name
Agent-id
Venue-id
Event-id
Booking-date
Performer-location
Agent-name
Agent-location
Venue-name
Venue-location
Event-name
Event-type
Performer-id
Performer-type Fee
Contents
43
Normalization
Introduction to Normalization
Prime & Nonprime Attributes
Normal Forms
1NF and Dependency Problems
2NF
3NF
Summary of Normal Forms based on Primary Keys
General Normal Form Definitions
BCNF
2NF (1)
44
2NF solves partial dependency problem in 1NF.
Second normal form (2NF): No nonprime
attribute A in R is partially functionally dependent
on the primary key of R.
Test: check functional dependencies whose left-
hand side attributes are part of the primary key
and then check its right-hand side.
Decompose: identify new primary keys and group
attributes that are fully functionally dependent on
each key together to form separate new relations.
2NF (2)
45
Performer-name
Performer-location Performer-id
Performer-type Fee
Relation in 2NF
P-id P-name P-type Fee P-location
101 Baron Singer 75 York
105 Steed Dancer 60 Berlin
108 Jones Actor 85 Bombay
112 Eagles Actor 85 Leeds
126 Stokes Comedian 90 Athens
2NF (3)
46
Relations in 2NF
Agent-id
Agent-name
Agent-location
Venue-id
Venue-name
Venue-location
A-id A-name A-location
1295 Burton London
1435 Nunn Boston
1460 Stritch Rome
1504 Lee Taipei
1509 Patel York
1522 Ellis Madrid
1682 Tsang Beijing
V-id V-name V-location
17 Silbury Tunis
35 Polish Athens
46 Royale Cairo
54 Nation Lisbon
59 Atlas Tokyo
75 Vostok Kiev
79 Festive Rome
2NF (4)
47
Event-id
Event-name
Event-type
Relation in 2NF
E-id E-name E-type
901 The Dark Drama
921 Silver Shoe Ballet
926 Next Year Drama
942 White Lace Ballet
945 Trick-Treat Variety Show
952 Gold Days Drama
959 Show Time Musical
2NF (5)
48
Relation in 2NF
Agent-id
Venue-id
Event-id
Booking-date Performer-id
P-id A-id V-id E-id Booking-date
101 1295 59 959 25-Nov-99
105 1435 35 921 07-Jan-02
105 1504 54 942 10-Feb-02
108 1682 79 901 29-Jul-03
112 1460 17 926 13-Aug-00
112 1522 46 952 05-May-99
112 1504 75 952 16-Mar-99
126 1509 59 945 02-Sep-01
2NF (6)
49
Performer-name
Performer-location Performer-id
Performer-type Fee
Relation in 2NF
P-id P-name P-type Fee P-location
101 Baron Singer 75 York
105 Steed Dancer 60 Berlin
108 Jones Actor 85 Bombay
112 Eagles Actor 85 Leeds
126 Stokes Comedian 90 Athens
Problems with 2NF:
- Insertion
- Modification
- Deletion
Contents
50
Normalization
Introduction to Normalization
Prime & Nonprime Attributes
Normal Forms
1NF and Dependency Problems
2NF
3NF
Summary of Normal Forms based on Primary Keys
General Normal Form Definitions
BCNF
3NF (1)
51
3NF solves transitive dependency problem in 1NF
and 2NF.
Third normal form (3NF)
R satisfies 2NF.
And no nonprime attribute A in R is transitively
functionally dependent on the primary key of R.
Note:
In X Y and Y Z, with X as the primary key, we
consider it a problem of the transitive dependency
only if Y is not a candidate key.
E.g.: SSN EmpNo Salary and EmpNo is a candidate
key.
3NF (2)
52
Decompose: identify all transitive
dependencies and each transitive dependency
will form a new relation, with nonprime
attributes participating in the transitive
dependency and the attributes which
determines others as the attributes for the new
relation.
3NF (3)
53
Performer-name
Performer-location Performer-id
Performer-type
Relation in 3NF
P-id P-name P-type P-location
101 Baron Singer York
105 Steed Dancer Berlin
108 Jones Actor Bombay
112 Eagles Actor Leeds
126 Stokes Comedian Athens
P-type Fee
Singer 75
Dancer 60
Actor 85
Comedian 90
Performer-type Fee
Relation in 3NF
3NF (4)
54
Performer-name
Performer-location Performer-id
Performer-type
Performer-type Fee
Agent-id
Agent-name
Agent-location
Venue-id
Venue-name
Venue-location
Event-id
Event-name
Event-type
Agent-id
Venue-id
Event-id
Booking-date Performer-id
Contents
55
Normalization
Introduction to Normalization
Prime & Nonprime Attributes
Normal Forms
1NF and Dependency Problems
2NF
3NF
Summary of Normal Forms based on Primary Keys
General Normal Form Definitions
BCNF
Summary of Normal Forms
based on Primary Keys
56
Contents
57
Normalization
Introduction to Normalization
Prime & Nonprime Attributes
Normal Forms
1NF and Dependency Problems
2NF
3NF
Summary of Normal Forms based on Primary Keys
General Normal Form Definitions
BCNF
General Normal Form Definitions
58
The above definitions consider the primary key
only.
The following more general definitions take
into account relations with multiple candidate
keys.
2NF: every nonprime attribute A in R is fully
functionally dependent on every key of R.
3NF: whenever a FD X A holds in R, then
either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
59
2NF
60
3NF
61
Contents
62
Normalization
Introduction to Normalization
Prime & Nonprime Attributes
Normal Forms
1NF and Dependency Problems
2NF
3NF
Summary of Normal Forms based on Primary Keys
General Normal Form Definitions
BCNF
BCNF (1)
63
A relation schema R is in Boyce-Codd
Normal Form (BCNF) if whenever an FD X
A holds in R, then X is a superkey of R.
BCNF (2)
64
BCNF normalization of LOTS1A with the functional
dependency FD2 being lost in the decomposition.
BCNF (3)
65
TEACH (Student, Course, Instructor)
BCNF (4)
66
Three possible pairs:
1. {Student, Instructor} and {Student, Course}.
2. {Course, Instructor} and {Course, Student}.
3. {Instructor, Course} and {Instructor, Student}.
All three decompositions lose the functional
dependency FD1. The desirable
decomposition of those just shown is (3)
because it will not generate spurious tuples
after a join.
Contents
67
Introduction
Functional Dependencies (FDs)
Normalization
Notes & Suggestions
Notes & Suggestions
68
Further Reading: [1] Chapter 15 & 16
4NF: based on multivalued dependency (MVD).
5NF: based on join dependency.
Such a dependency is very difficult to detect in
practice.
Normalization into 5NF is considered very rarely in
practice.
Other normal forms & algorithms.
ER modeling: top-down database design.
Bottom-up database design???
Q & A
69
Exercises (1)
70
Exercise 1: Consider the universal relation R =
{A, B, C, D, E, F, G, H, I, J} and the set of
functional dependencies:
1. A, B C
2. A D, E
3. B F
4. F G, H
5. D I, J
Find keys of R.
Decompose R into 2NF, then 3NF relations.
Exercises (2)
71
Exercise 2: Consider the universal relation R =
{A, B, C, D, E, F, G, H, I, J} and the set of
functional dependencies:
1. A, B C
2. B, D E, F
3. A, D G, H
4. A I
5. H J
Find keys of R.
Decompose R into 2NF, then 3NF relations.
Các file đính kèm theo tài liệu này:
- 6_functional_dependencies_normalization_0984.pdf