BCNF (Boyce-Codd Normal Form)
Each normal form is strictly stronger than the previous one
Every 2NF relation is in 1NF
Every 3NF relation is in 2NF
Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in
BCNF The goal is to have each relation in BCNF (or 3NF)
62 trang |
Chia sẻ: vutrong32 | Lượt xem: 1237 | Lượt tải: 0
Bạn đang xem trước 20 trang tài liệu Chapter 10 Functional Dependencies and Normalization for Relational Databases, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
Functional Dependencies and
Normalization
for Relational Databases
CHAPTER 10
Informal Design Guidelines for
Relational Databases
Semantics of the Relation Attributes
Reducing the redundant values in tuples
Reducing the null values in tuples
Disallowing the possibility of generating
spurious tuples
Semantics of the Relation Attributes
GUIDELINE 1:
Each tuple in a relation should represent one
entity or relationship instance.
Attributes of different entities should not be
mixed in the same relation.
Only foreign keys should be used to refer to
other entities
Entity and relationship attributes should be kept
apart as much as possible.
Semantics of the Relation Attributes
Example:
Redundant Information in Tuples and
Update Anomalies
Mixing attributes of multiple entities may cause
problems
Information is stored redundantly wasting storage
Problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies
Redundant Information in Tuples and
Update Anomalies
Redundant Information in Tuples and
Update Anomalies
Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)
Update Anomaly: Changing the name of project
number P1 from “Billing” to “Customer-
Accounting” may cause this update to be made for
all 100 employees working on project P1.
Insert Anomaly: Cannot insert a project unless
an employee is assigned to .
Inversely - Cannot insert an employee unless an
he/she is assigned to a project
Redundant Information in Tuples and
Update Anomalies
Delete Anomaly: When a project is deleted, it will
result in deleting all the employees who work on
that project. Alternately, if an employee is the sole
employee on a project, deleting that employee
would result in deleting the corresponding project
Redundant Information in Tuples and
Update Anomalies
GUIDELINE 2: Guideline to Redundant
Information in Tuples and Update Anomalies
Design the base relation schemas so that no
insertion, deletion, or modification anomalies
are present in the relations.
If any anomalies are present, note them clearly
and make sure that the programs that update the
database will operate correctly.
Null Values in Tuples
GUIDELINE 3: Relations should be designed such
that their tuples will have as few NULL values as
possible
Attributes that are NULL frequently could be
placed in separate relations
Reasons for nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
Value known to exist, but unavailable
Spurious Tuples
Bad designs for a relational database may result in
erroneous results for certain JOIN operations
The "lossless join" property is used to guarantee
meaningful results for join operations
GUIDELINE 4: The relations should be designed to
satisfy the lossless join condition. No spurious
tuples should be generated by doing a natural-join
of any relations.
Functional dependencies
Consider the following relation schema
{Plane}→ StartTime
{Pilot, StartDate, StartTime}→ Plane
{Plane, StartDate}→ Pilot
Functional dependencies
A functional dependency is a constraint between
two sets of attributes from the database. Suppose
that relational database schema has n attributes
R{A1, A2, An}, X and Y are subset of R.
Definition.
A functional dependency between two sets of
attributes X and Y: XY, specifies a constraint on
the possible tuples that can form a relation state
r of R.
The constraint is that, for any two tuples t1 and
t2 in r that have t1[X]=t2[X] t1[Y] = t2[Y] .
Functional dependencies
Example:
Functional dependencies
We say that there is a functional dependency
from X to Y, or that Y is functionally dependent
on X.
The abbreviation for functional dependency is
FD or f.d.
The set of attributes X is called the left-hand side
of the FD, and Y is called the right-hand side.
X →Y ⇔ (t1.X = t2.X ⇒ t1.Y = t2.Y)
Functional dependencies
Example:
Social security number determines employee
name:
SSN -> ENAME
Project number determines project name and
location
PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number determines
the hours per week that the employee works on
the project:
{SSN, PNUMBER} -> HOURS
Functional dependencies
If a constraint on R states that there cannot be
more than one tuple with a given X value in any
relation instance r(R), that is, X is a candidate
key of R, this implies that XY for any subset of
attributes Y of R
If XY in R, this does not say whether or not
YX in R
If K is a key of R, then K functionally determines
all attributes in R (since we never have two
distinct tuples with t1[K]=t2[K])
Functional dependencies(FDs)
A functional dependency is a property of the
semantics or meaning of the attributes.
Relation extensions r(R) that satisfy the functional
dependency constraints are called legal relation
states of R.
A functional dependency is a property of the
relation schema R, not of a particular legal relation
state r of R. Hence, an FD cannot be inferred
automatically from a given relation extension r but
must be defined explicitly by someone who knows
the semantics of the attributes of R
Inference Rules for FDs
Given a set of FDs F on relation schema R. Any
other functional dependencies hold in all legal
relation instances that satisfy the dependencies in F
can be said to be inferred or deduced from the FDs
in F.
The following six inference rules for functional
dependencies (Armstrong's inference rules)
IR1 (reflexive : phản xạ): If X Y, then XY.
IR2. (Augmentation) If X Y, then XZ YZ.
IR3. (Transitive) If X Y and Y Z, then X Z
Inference Rules for FDs
IR4(Decomposition):
If X YZ, then X Y and X Z
IR5(Union):
If X Y and X Z, then X YZ
IR6(pseudo-transitive)
If X Y and WY Z, then WX Z
The last three inference rules, as well as
any other inference rules, can be deduced
from IR1, IR2, and IR3 (completeness
property)
Inference Rules for FDs
Closure : the set of all dependencies that can be
inferred from F is called the closure of F; it is
denoted by F+.
Closure of a set of attributes X with respect to F
is the set X + of all attributes that are functionally
determined by X.
X + can be calculated by repeatedly applying IR1,
IR2, IR3 using the FDs in F
Inference Rules for FDs
Algorithm: Determining X+, the Closure of : the
set of attribute X under F
X+= X;
repeat
oldX+= X+;
for each functional dependency Y Z in F do
If X+ Y then X += X+ ∪Z;
until (X+ = oldX+);
Inference Rules for FDs
Example 1: Let Q be a relation with attributes
(A,B,C,D,E,G,H) and let the following functional
dependencies hold
F={f1: BA;
f2: DACE;
f3: DH;
f4: GH C;
f5: ACD}
Find the closure X+ of X = {AC}.
Answer: X+=ACDEH
Inference Rules for FDs
Example 2: Let Q be a relation with attributes
(A,B,C,D,E,G) and let the following functional
dependencies hold
F = { f1: A → C;
f2: A → EG;
f3: B → D;
f4: G → E}
Find the closure X+ and Y+ of X = {A,B}; Y = {C,G,D}
Answer: X+ = {ABCDEG} , Y+= {CGDE}
Inference Rules for FDs
Example 3:
F = {SSNENAME,
PNUMBER {PNAME, PLOCATION},
{SSN, PNUMBER}HOURS}
Calculate the following closure sets with respect to
F;
{SSN }+ = {SSN, ENAME}
{PNUMBER }+ = {PNUMBER, PNAME, PLOCATION}
{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME,
PNAME, PLOCATION, HOURS}
Inference Rules for FDs
Algorithm: Determining F+
Find all subsets of relation Q+
Find all possible functional dependencies of Q
Find all subset closure of Q
Based on the closure of all subsets sought to
determine which the functional dependencies
belong F+
Inference Rules for FDs
Example: Q(A,B,C) F = {AB C,C B} F+ ?
All subset of attributes
Closure of all subsets
A+ = A
B+ = B
C+ = BC
AC+ = ABC
AB+ = ABC
BC+ = BC
A B C
{A} {B} {C}
{A,B} {A,C}
{B,C}
{A,B,C}
Inference Rules for FDs
All functional dependencies:
AB ABC BC ABCF CA CBCF+ ACBCF+ BCAC
AAB AABC BAC ABACF+ CBF CABC ACABCF+ BCABC
AC BA BBC ABBCF+ CAB ACBF+ BCA
AAC BAB BABC ABABCF+ CAC ACABF+ BCAB
Inference Rules for FDs
Result:
F+ = { ABC, ABAC,
ABBC, ABABC,
CB, CBC,
ACB, ACAB,
ACBC, ACABC}
Equivalence of Sets of FDs
Two sets of FDs F and G are equivalent if:
Every FD in F can be inferred from G, and
Every FD in G can be inferred from F
Hence, F and G are equivalent if F + =G +
Definition: F covers G if every FD in G can be
inferred from F (i.e., if G + subset-of F +)
F and G are equivalent if F covers G and G covers F
There is an algorithm for checking equivalence of
sets of FDs
Minimal Sets of FDs
A set of FDs is minimal if it satisfies the following
conditions:
Every dependency in F has a single attribute for
its right-hand side.
Example:
F = {A → B, A → C , B → C, AB → D}
Cannot replace any dependency X A in F with a
dependency YA, where Y is a subset of X, and
still have a set of dependencies that is equivalent
to F.
Example: F= {AB→C; B→C}
Minimal Sets of FDs
Cannot remove any dependency from F and still
have a set of dependencies that is equivalent to F.
Example:
F = {A→BC, B→D, AB→D} redundant because:
F ≡ F’= {A→BC, B→D}
Minimal Sets of FDs
Algorithm 10.2: Finding a Minimal Cover (phủ
tối thiểu)for a set of Functional Dependencies F
Remove all FDs have left hand side redundant
Converse all FDs with the right hand side of
more than one attribute to FDs having the right
hand side of one attribute
Remove all redundant FDs
Minimal Sets of FDs
Example: Let Q be a relation with attributes
Q(A,B,C,D) and FD F={AB → CD, B → C, C → D}
Finding a Minimal Cover of F
AB→CD is FD with the left hand side redundant
A because B+=BCD⇒ B → CD ∈ F+
=>F≡{B → CD;B → C;C → D}
F≡{B → D; B → C;C → D}=F1mc ,
In F1tt , B → D is redundant.
F≡{B → C;C → D}=Fmc
Minimal Sets of FDs
Example: Q(MSCD,MSSV,CD,HG)
F = {MSCD → CD;
CD → MSCD;
CD,MSSV → HG;
MSCD,HG → MSSV;
CD,HG → MSSV;
MSCD,MSSV → HG}
Find minimal cover for FD F
Result: Fmc = { MSCD → CD;
CD → MSCD;
CD,HG → MSSV;
MSCD,MSSV → HG}
Normalization of Relations
Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
Normal form: Condition using keys and FDs of a
relation to certify whether a relation schema is in
a particular normal form
Normalization of Relations
2NF, 3NF, BCNF based on keys and FDs of a
relation schema
4NF based on keys, multi-valued dependencies :
MVDs; 5NF based on keys, join dependencies : JDs
(Chapter 11)
Additional properties may be needed to ensure a
good relational design (lossless join, dependency
preservation; Chapter 11)
Practical Use of Normal Forms
Normalization is carried out in practice so that
the resulting designs are of high quality and meet
the desirable properties
The practical utility of these normal forms
becomes questionable when the constraints on
which they are based are hard to understand or
to detect
The database designers need not normalize to the
highest possible normal form. (usually up to 3NF,
BCNF or 4NF)
Denormalization: the process of storing the join
of higher normal form relations as a base
relation—which is in a lower normal form
Definitions of Keys and Attributes
Participating in Keys
A super key of a relation schema R = {A1, A2, ....,
An} is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a super key with the additional property
that removal of any attribute from K will cause K
not to be a super key any more.
Definitions of Keys and Attributes
Participating in Keys
Algorithm 10.3: Finding all key for a relation Q
and set of Functional Dependencies F
Determine all subset of Q+: X1, X2, , Xn
Find closure of the Xi.
Super key is Xi that they are closure =Q+,
suppose that set supper key are S= {S1, S2, Sn}
Create the set contain all key of Q from S by
consider subset Si, Sj of S
S (i ≠j), if Si Sj then remove Sj (i,j=1..n).
Definitions of Keys and Attributes
Participating in Keys
Example:
Definitions of Keys and Attributes
Participating in Keys
Example:
Definitions of Keys and Attributes
Participating in Keys
Algorithm 10.4: Finding one key K for a relation
Q and set of Functional Dependencies F:
Step 1: Assign K = Q+
Step 2: A is an attribute of K, K’=K-A. If K’+=Q+
then K=K’.
Definitions of Keys and Attributes
Participating in Keys
Example:
U={A,B,C,D,E} , F={AB->C, AC->B, BC->DE} Find K
Step1: K=U> K=ABCDE
Step2:(K\A)+=>(BCDE)+=BCDE ≠ U+ K=ABCDE
Step3:(K\B)+=>(ACDE)+= ABCDE = U+K=ACDE
Step4: (K\C)+ =>(ADE)+ = ADE ≠ U+ K=ACDE
Step5: (K\D)+ => (ACE)+ = ACEBD=U+K=ACE
Step6: (K\E)+ =>(AC)+ = ACBDE =U+ K=AC
Definitions of Keys and Attributes
Participating in Keys
If a relation schema has more than one key, each is
called a candidate key. One of the candidate keys
is arbitrarily designated to be the primary key,
and the others are called secondary keys.
A Prime attribute (thuộc tính khóa) must be a
member of some candidate key
A Nonprime attribute is not a prime attribute—
that is, it is not a member of any candidate key.
First Normal Form
First normal form (INF):
Disallow multivalued attributes, composite
attributes, and their combinations.
It states that the domain of an attribute must
include only atomic values and that the value of
any attribute in a tuple must be a single value
from the domain of that attribute.
First Normal Form
Second Normal Form
Second normal form (2NF) is based on the
concept of full functional dependency.
A functional dependency XY is a full functional
dependency if removal of any attribute A from X
means that the dependency does not hold any
more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD since
neither SSN -> HOURS nor PNUMBER -> HOURS
hold
{SSN, PNUMBER} -> ENAME is not a full FD (it is
called a partial dependency ) since SSN -> ENAME
also holds
Second Normal Form
Definition. A relation schema R is in 2NF if every
nonprime attribute A in R is fully functionally
dependent on the primary key of R.
A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on the primary key
R can be decomposed into 2NF relations via the
process of 2NF normalization
Second Normal Form
Algorithm 10.4 :The test for 2NF
Give a relational Q, a set FD F
Determine Q belong 2FN ?
Step1: Find all key of Q
Step2: With each key K, Find closure of all the
subsets S of K
Step3: If there are S+ contain Nonprime
attribute then Q is not 2NF else Q is 2NF
Second Normal Form
Example: Q(A,B,C,D) F={AB→C; B→D; BC→A}. Is Q
2NF?
D is nonprime attribute Q is not 2NF
Third Normal Form
Third normal form (3NF) is based on the
concept of transitive dependency.
Transitive functional dependency A functional
dependency XZ that can be derived from two
FDs XY and Y Z
Examples:
SSN DMGRSSN is a transitive FD since
SSNDNUMBER and DNUMBERDMGRSSN hold
Third Normal Form
Definition: A relation schema R is in 3NF if it
satisfies 2NF and no nonprime attribute of R is
transitively dependent on the primary key.
Example:
Third Normal Form
NOTE:
In X Y and Y Z, with X as the primary key,
we consider this a problem only if Y is not a
candidate key.
When Y is a candidate key, there is no problem
with the transitive dependency .
EMP (SSN, Emp#, Salary ).
SSNEmp#Salary
and Emp# is a candidate key.
General Normal Form Definitions
(For Multiple Keys)
The above definitions consider the primary key
only
The following more general definitions take into
account relations with multiple candidate keys
Partial and full functional dependencies and
transitive dependencies will now be considered
with respect to all candidate keys of a relation.
General Definition of 2NF
Definition: A relation schema R is in second
normal form (2NF) if every nonprime attribute A
in R is not partially dependent on any key of R.
Example:
General Definition of 2NF
In LOT relation (above example), Suppose that
there are two candidate keys:
PROPERTY_ID# and {COUNTY_NAME, LOT#};
that is, lot numbers are unique only within each
county, but PROPERTY_ID numbers are unique
across counties for the entire state.
Based on the two candidate keys PROPERTY_ID#
and {COUNTY_NAME, LOT#}, we know that
functional dependencies FD1 and FD2 hold. We
choose PROPERTY_ID# as the primary key,
General Definition of 3NF
Definition: A relation schema R is in third normal
form (3NF) if, whenever a nontrivial functional
dependency XA holds in R, either X is a super
key of R, or A is a prime attribute of R.
BCNF (Boyce-Codd Normal Form)
Boyce-Codd normal form (BCNF) was proposed as
a simpler form of 3NF, but it was found to be
stricter than 3NF. That is, every relation in BCNF is
also in 3NF; however, a relation in 3NF is not
necessarily in BCNF.
Definition. A relation schema R is in BCNF if
whenever a nontrivial functional dependency
XA holds in R, then X is a super key of R
BCNF (Boyce-Codd Normal Form)
Each normal form is strictly stronger than the
previous one
Every 2NF relation is in 1NF
Every 3NF relation is in 2NF
Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in
BCNF The goal is to have each relation in BCNF (or
3NF)
BCNF (Boyce-Codd Normal Form)
Example: Q(A,B,C,D,E,I) F={ACD→EBI;CE→AD}. Q
is a BCNF?
F ≡ Ftt={ACD→E,ACD→B,ACD→I,CE→A,CE→D}
The feft hand side of Ftt are super key Q is BCNF
BCNF (Boyce-Codd Normal Form)
Example:
Các file đính kèm theo tài liệu này:
- chapter101_2411.pdf