Tree Definitions and Properties

A tree is an abstract data type that stores elements hierarchically. With the exception of the top element, each element in a tree has a parent element and zero or more children elements. A tree is usually visualized by placing elements inside ovals or rectangles, and by drawing the connections between parents and children with straight lines. (See Figure 7.2.) We typically call the top element the root of the tree, but it is drawn as the highest element, with the other elements being connected below (just the opposite of a botanical tree). Figure 7.2: A tree with 17 nodes representing the organization of a fictitious corporation. The root stores Electronics R'Us. The children of the root store R&D, Sales, Purchasing, and Manufacturing. The internal nodes store Sales, International, Overseas, Electronics R'Us, and Manufacturing

pdf92 trang | Chia sẻ: tlsuongmuoi | Lượt xem: 2827 | Lượt tải: 0download
Bạn đang xem trước 20 trang tài liệu Tree Definitions and Properties, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
methods of the array list ADT on such a representation. Is it better to store the entries in L by increasing indices or not? C-6.17 There is a simple, but inefficient, algorithm, called bubble-sort, for sorting a sequence S of n comparable elements. This algorithm scans the sequence n−1 times, where, in each scan, the algorithm compares the current element with the next one and swaps them if they are out of order. Give a pseudo-code description of bubble-sort that is as efficient as possible assuming S is implemented with a doubly linked list. What is the running time of this algorithm? C-6.18 Answer Exercise C-6.17 assuming S is implemented with an array list. C-6.19 A useful operation in databases is the natural join. If we view a database as a list of ordered pairs of objects, then the natural join of databases A and B is the list of all ordered triples (x,y,z) such that the pair (x,y) is in A and the pair (y,z) is in B. Describe and analyze an efficient algorithm for computing the natural join of a list A of n pairs and a list B of m pairs. C-6.20 When Bob wants to send Alice a message M on the Internet, he breaks M into n data packets, numbers the packets consecutively, and injects them into the network. When the packets arrive at Alice's computer, they may be out of order, so Alice must assemble the sequence of n packets in order before she can be sure she has the entire message. Describe an efficient scheme for Alice to do this. What is the running time of this algorithm? C-6.21 Given a list L of n positive integers, each represented with k = logn + 1 bits, describe an O(n)-time method for finding a k-bit integer not in L. C-6.22 Argue why any solution to the previous problem must run in Ω(n) time. C-6.23 Given a list L of n arbitrary integers, design an O(n)-time method for finding an integer that cannot be formed as the sum of two integers in L. 370 C-6.24 Isabel has an interesting way of summing up the values in an array A of n integers, where n is a power of two. She creates an array B of half the size of A and sets B[i] = A[2i] +A[2i+ 1], for i = 0,1,..., (n/2) − 1. If B has size 1, then she outputs B[0]. Otherwise, she replaces A with B, and repeats the process. What is the running time of her algorithm? Projects P-6.1 Implement the array list ADT by means of an extendable array used in a circular fashion, so that insertions and deletions at the beginning and end of the array list run in constant time. P-6.2 Implement the array list ADT using a doubly linked list. Show experimentally that this implementation is worse than the array-based approach. P-6.3 Write a simple text editor, which stores and displays a string of characters using the list ADT, together with a cursor object that highlights a position in this string. Your editor should support the following operations: • left: Move cursor left one character (do nothing if at text end). • right: Move cursor right one character (do nothing if at text end). • cut: Delete the character right of the cursor (do nothing at text end). • paste c: Insert the character c just after the cursor. P-6.4 Implement a phased favorites list. A phase consists of N accesses in the list, for a given parameter N. During a phase, the list should maintain itself so that elements are ordered by decreasing access counts during that phase. At the end 371 of a phase, it should clear all the access counts and start the next phase. Experimentally, determine what are the best values of N for various list sizes. P-6.5 Write a complete adapter class that implements the sequence ADT using a java.util.ArrayList object. P-6.6 Implement the favorites list application using an array list instead of a list. Compare it experimentally to the list-based implementation. Chapter Notes The concept of viewing data structures as collections (and other principles of object- oriented design) can be found in object-oriented design books by Booch [14], Budd [17], Golberg and Robson [40], and Liskov and Guttag [69]. Lists and iterators are pervasive concepts in the Java Collections Framework. Our node list ADT is derived from the "position" abstraction introduced by Aho, Hopcroft, and Ullman [5], and the list ADT of Wood [100]. Implementations of lists via arrays and linked lists are discussed by Knuth [62]. 372 Chapter 7 Trees Contents 7.1 General Trees ...................... 266 7.1.1 Tree Definitions and Properties............ 267 7.1.2 The Tree Abstract Data Type ............ 373 270 7.1.3 Implementing a Tree ................. 271 7.2 Tree Traversal Algorithms................ 273 7.2.1 Depth and Height................... 273 7.2.2 Preorder Traversal................... 276 7.2.3 Postorder Traversal.................. 279 7.3 Binary Trees....................... 282 7.3.1 The Binary Tree ADT................. 284 7.3.2 A Binary Tree Interface in Java............ 284 374 7.3.3 Properties of Binary Trees .............. 285 7.3.4 A Linked Structure for Binary Trees......... 287 7.3.5 An Array-List Representation of a Binary Tree.... 296 7.3.6 Traversals of Binary Trees............... 298 7.3.7 The Template Method Pattern............ 305 7.4 Exercises......................... 309 java.datastructures.net 7.1 General Trees Productivity experts say that breakthroughs come by thinking "nonlinearly." In this chapter, we discuss one of the most important nonlinear data structures in computing—trees. Tree structures are indeed a breakthrough in data organization, for they allow us to implement a host of algorithms much faster than when using linear data structures, such as list. Trees also provide a natural organization for data, and consequently have become ubiquitous structures in file systems, graphical user interfaces, databases, Web sites, and other computer systems. 375 It is not always clear what productivity experts mean by "nonlinear" thinking, but when we say that trees are "nonlinear," we are referring to an organizational relationship that is richer than the simple "before" and "after" relationships between objects in sequences. The relationships in a tree are hierarchical, with some objects being "above" and some "below" others. Actually, the main terminology for tree data structures comes from family trees, with the terms "parent," "child," "ancestor," and "descendent" being the most common words used to describe relationships. We show an example of a family tree in Figure 7.1. Figure 7.1: A family tree showing some descendents of Abraham, as recorded in Genesis, chapters 25–36. 7.1.1 Tree Definitions and Properties A tree is an abstract data type that stores elements hierarchically. With the exception of the top element, each element in a tree has a parent element and zero or more children elements. A tree is usually visualized by placing elements inside ovals or rectangles, and by drawing the connections between parents and children with straight lines. (See Figure 7.2.) We typically call the top element the root of the tree, but it is drawn as the highest element, with the other elements being connected below (just the opposite of a botanical tree). Figure 7.2: A tree with 17 nodes representing the organization of a fictitious corporation. The root stores 376 Electronics R'Us. The children of the root store R&D, Sales, Purchasing, and Manufacturing. The internal nodes store Sales, International, Overseas, Electronics R'Us, and Manufacturing. Formal Tree Definition Formally, we define a tree T as a set of nodes storing elements such that the nodes have a parent-child relationship, that satisfies the following properties: • If T is nonempty, it has a special node, called the root of T, that has no parent. • Each node v of T different from the root has a unique parent node w; every node with parent w is a child of w. Note that according to our definition, a tree can be empty, meaning that it doesn't have any nodes. This convention also allows us to define a tree recursively, such that a tree T is either empty or consists of a node r, called the root of T, and a (possibly empty) set of trees whose roots are the children of r. Other Node Relationships 377 Two nodes that are children of the same parent are siblings. A node v is external if v has no children. A node v is internal if it has one or more children. External nodes are also known as leaves. Example 7.1: In most operating systems, files are organized hierarchically into nested directories (also called folders), which are presented to the user in the form of a tree. (See Figure 7.3.) More specifically, the internal nodes of the tree are associated with directories and the external nodes are associated with regular files. In the UNIX and Linux operating systems, the root of the tree is appropriately called the "root directory," and is represented by the symbol "/." Figure 7.3: Tree representing a portion of a file system. A node u is an ancestor of a node v if u = v or u is an ancestor of the parent of v. Conversely, we say that a node v is a descendent of a node u if u is an ancestor of v. For example, in Figure 7.3, cs252/ is an ancestor of papers/, and pr3 is a descendent of cs016/. The subtree of T rooted at a node v is the tree consisting of all the descendents of v in T (including v itself). In Figure 7.3, the subtree rooted at cs016/ consists of the nodes cs016/, grades, homeworks/, programs/, hw1, hw2, hw3, pr1, pr2, and pr3. 378 Edges and Paths in Trees An edge of tree T is a pair of nodes (u, v) such that u is the parent of v, or vice versa. A path of T is a sequence of nodes such that any two consecutive nodes in the sequence form an edge. For example, the tree in Figure 7.3 contains the path (cs252/, projects/, demos/, market). Example 7.2: The inheritance relation between classes in a Java program forms a tree. The root, java. lang. Object, is an ancestor of all other classes. Each class, C, is a descendent of this root and is the root of a subtree of the classes that extend C. Thus, there is a path from C to the root, java.lang.Object, in this inheritance tree. Ordered Trees A tree is ordered if there is a linear ordering defined for the children of each node; that is, we can identify the children of a node as being the first, second, third, and so on. Such an ordering is usually visualized by arranging siblings left to right, according to their ordering. Ordered trees typically indicate the linear order among siblings by listing them in the correct order. Example 7.3: The components of a structured document, such as a book, are hierarchically organized as a tree whose internal nodes are parts, chapters, and sections, and whose external nodes are paragraphs, tables, figures, and so on. (See Figure 7.4.) The root of the tree corresponds to the book itself. We could, in fact, consider expanding the tree further to show paragraphs consisting of sentences, sentences consisting of words, and words consisting of characters. Such a tree is an example of an ordered tree, because there is a well-defined ordering among the children of each node. Figure 7.4: An ordered tree associated with a book. 379 7.1.2 The Tree Abstract Data Type The tree ADT stores elements at positions, which, as with positions in a list, are defined relative to neighboring positions. The positions in a tree are its nodes, and neighboring positions satisfy the parent-child relationships that define a valid tree. Therefore, we use the terms "position" and "node" interchangeably for trees. As with a list position, a position object for a tree supports the method: element(): return the object stored at this position. The real power of node positions in a tree, however, comes from the accessor methods of the tree ADT that return and accept positions, such as the following: root(): return the tree's root; an error occurs if the tree is empty. parent (v): return the parent of v; an error occurs if v is the root. children(v): return an iterable collection containing the children of node v. If a tree T is ordered, then the iterable collection, children(v), stores the children of v in order. If v is an external node, then children(v) is empty. In addition to the above fundamental accessor methods, we also include the following query methods: isInternal(v): Test whether node v is internal. isExternal(v): Test whether node v is external. isRoot(v): Test whether node v is the root. These methods make programming with trees easier and more readable, since we can use them in the conditionals of if statements and while loops, rather than using a nonintuitive conditional. 380 There are also a number of generic methods a tree should probably support that are not necessarily related to its tree structure, including the following: size(): return the number of nodes in the tree. isEmpty(): Test whether the tree has any nodes or not. iterator(): return an iterator of all the elements stored at nodes of the tree. positions(): return an iterable collection of all the nodes of the tree. replace(v,e): Replace with e and return the element stored at node v. Any method that takes a position as an argument should generate an error condition if that position is invalid. We do not define any specialized update methods for trees here. Instead, we prefer to describe different tree update methods in conjunction with specific applications of trees in subsequent chapters. In fact, we can imagine several kinds of tree update operations beyond those given in this book. 7.1.3 Implementing a Tree The Java interface shown in Code Fragment 7.1 represents the tree ADT. Error conditions are handled as follows: Each method that can take a position as an argument, may throw an InvalidPositionException, to indicate that the position is invalid. Method parent throws a BoundaryViolationException if it is called on the root. Method root throws an EmptyTreeException if it is called on an empty tree. Code Fragment 7.1: Java interface Tree representing the tree ADT. Additional update methods may be added, depending on the application. We do not include such methods in the interface, however. 381 A Linked Structure for General Trees A natural way to realize a tree T is to use a linked structure, where we represent each node v of T by a position object (see Figure 7.5a) with the following fields: A reference to the element stored at v, a link to the parent of v, and a some kind of collection (for example, a list or array) to store links to the children of v. If v is the root of T, then the parent field of v is null. Also, we store a reference to the root of T and the number of nodes of T in internal variables. This structure is schematically illustrated in Figure 7.5b. 382 Figure 7.5: The linked structure for a general tree: (a) the position object associated with a node; (b) the portion of the data structure associated with a node and its children. Table 7.1 summarizes the performance of the implementation of a general tree using a linked structure. The analysis is left as an exercise (C-7.25), but we note that, by using a collection to store the children of each node v, we can implement children(v) simply by returning a reference to this collection. Table 7.1: Running times of the methods of an n- node general tree implemented with a linked structure. We let cv denote the number of children of a node v. The space usage is O(n). Operation Time size, isEmpty O(1) iterator, positions O(n) 383 replace O(1) root, parent O(1) children(v) O(cv) isInternal, isExternal, isRoot O(1) 7.2 Tree Traversal Algorithms In this section, we present algorithms for performing traversal computations on a tree by accessing it through the tree ADT methods. 7.2.1 Depth and Height Let v be a node of a tree T. The depth of v is the number of ancestors of v, excluding v itself. For example, in the tree of Figure 7.2, the node storing International has depth 2. Note that this definition implies that the depth of the root of T is 0. The depth of a node v can also be recursively defined as follows: • If v is the root, then the depth of v is 0 • Otherwise, the depth of v is one plus the depth of the parent of v. Based on this definition, we present a simple, recursive algorithm, depth, in Code Fragment 7.2, for computing the depth of a node v in T. This method calls itself recursively on the parent of v, and adds 1 to the value returned. A simple Java implementation of this algorithm is shown in Code Fragment 7.3. Code Fragment 7.2: Algorithm for computing the depth of a node v in a tree T. 384 Code Fragment 7.3: Method depth written in Java. The running time of algorithm depth(T, v) is O(dv), where dv denotes the depth of the node v in the tree T, because the algorithm performs a constant-time recursive step for each ancestor of v. Thus, algorithm depth (T, v) runs in O(n) worst-case time, where n is the total number of nodes of T, since a node of T may have depth n − 1 in the worst case. Although such a running time is a function of the input size, it is more accurate to characterize the running time in terms of the parameter dv, since this parameter can be much smaller than n. Height The height of a node v in a tree T is also defined recursively: • If v is an external node, then the height of v is 0 • Otherwise, the height of v is one plus the maximum height of a child of v. The height of a nonempty tree T is the height of the root of T. For example, the tree of Figure 7.2 has height 4. In addition, height can also be viewed as follows. Proposition 7.4: The height of a nonempty tree T is equal to the maximum depth of an external node of T. We leave the justification of this fact to an exercise (R-7.6). We present here an algorithm, height1, shown in Code Fragment 7.4 and implemented in Java in Code Fragment 7.5, for computing the height of a nonempty tree T based on the proposition above and the algorithm depth from Code Fragment 7.2. 385 Code Fragment 7.4: Algorithm height1 for computing the height of a nonempty tree T. Note that this algorithm calls algorithm depth (Code Fragment 7.2). Code Fragment 7.5: Method height1 written in Java. Note the use of the max method of class java.lang. Math. Unfortunately, algorithm height1 is not very efficient. Since height1 calls algorithm depth (v) on each external node v of T, the running time of height1 is given by O(n + Σv(1 + dv)), where n is the number of nodes of T, dv is the depth of node v, and E is the set of external nodes of T. In the worst case, the sumΣv(1 d + v) is proportional to n2. (See Exercise C-7.6.) Thus, algorithm height1 runs in O(n2) time. Algorithm height2, shown in Code Fragment 7.6 and implemented in Java in Code Fragment 7.7, computes the height of tree T in a more efficient manner by using the recursive definition of height. 386 Code Fragment 7.6: Algorithm height2 for computing the height of the subtree of tree T rooted at a node v. Code Fragment 7.7: Method height2 written in Java. Algorithm height2 is more efficient than height1 (from Code Fragment 7.4). The algorithm is recursive, and, if it is initially called on the root of T, it will eventually be called on each node of T. Thus, we can determine the running time of this method by summing, over all the nodes, the amount of time spent at each node (on the nonrecursive part). Processing each node in children(v) takes O(cv) time, where cv denotes the number of children of node v. Also, the while loop has cv iterations and each iteration of the loop takes O(1) time plus the time for the recursive call on a child of v. Thus, algorithm height2 spends O(1 + cv) time at each node v, and its running time is O(Σv(1 + cv)). In order to complete the analysis, we make use of the following property. Proposition 7.5: Let T be a tree with n nodes, and let cv denote the number of children of a node v of T. Then, summing over the vertices in T, Σvcv= n − 1. 387 Justification: Each node of T, with the exception of the root, is a child of another node, and thus contributes one unit to the above sum. By Proposition 7.5, the running time of algorithm height2, when called on the root of T, is O(n), where n is the number of nodes of T. 7.2.2 Preorder Traversal A traversal of a tree T is a systematic way of accessing, or "visiting," all the nodes of T. In this section, we present a basic traversal scheme for trees, called preorder traversal. In the next section, we will study another basic traversal scheme, called postorder traversal. In a preorder traversal of a tree T, the root of T is visited first and then the subtrees rooted at its children are traversed recursively. If the tree is ordered, then the subtrees are traversed according to the order of the children. The specific action associated with the "visit" of a node v depends on the application of this traversal, and could involve anything from incrementing a counter to performing some complex computation for v. The pseudo-code for the preorder traversal of the subtree rooted at a node v is shown in Code Fragment 7.8. We initially call this algorithm with preorder(T,T.root()). Code Fragment 7.8: Algorithm preorder for performing the preorder traversal of the subtree of a tree T rooted at a node v. The preorder traversal algorithm is useful for producing a linear ordering of the nodes of a tree where parents must always come before their children in the ordering. Such orderings have several different applications. We explore a simple instance of such an application in the next example. Figure 7.6: Preorder traversal of an ordered tree, where the children of each node are ordered from left to right. 388 Example 7.6: The preorder traversal of the tree associated with a document, as in Example 7.3, examines an entire document sequentially, from beginning to end. If the external nodes are removed before the traversal, then the traversal examines the table of contents of the document. (See Figure 7.6.) The preorder traversal is also an efficient way to access all the nodes of a tree. To justify this, let us consider the running time of the preorder traversal of a tree T with n nodes under the assumption that visiting a node takes O(1) time. The analysis of the preorder traversal algorithm is actually similar to that of algorithm height2 (Code Fragment 7.7), given in Section 7.2.1. At each node v, the nonrecursive part of the preorder traversal algorithm requires time O(1 + cv), where cv is the number of children of v. Thus, by Proposition 7.5, the overall running time of the preorder traversal of T is O(n). Algorithm toStringPreorder(T, v), implemented in Java in Code Fragment 7.9, performs a preorder printing of the subtree of a node v of T, that is, it performs the preorder traversal of the subtree rooted at v and prints the element stored at a node when the node is visited. Recall that, for an ordered tree T, method T.children(v) returns an iterable collection that accesses the children of v in order. Code Fragment 7.9: Method toStringPreorder(T, v) that performs a preorder printing of the elements in the subtree of node v of T. 389 There is an interesting application of the preorder traversal algorithm that produces a string representation of an entire tree. Let us assume again that for each element e stored in tree T, calling e.toString() returns a string associated with e. The parenthetic string representation P(T) of tree T is recursively defined as follows. If T consists of a single node v, then P(T) = v.element().toString(). Otherwise, P(T) = v.element().toString() + " (" + P(T1) + "," + ··· + ", "+ P(Tk) +")", where v is the root of T and T1, T2,..., Tk are the subtrees rooted at the children of v, which are given in order if T is an ordered tree. Note that the above definition of P(T) is recursive. Also, we are using "+" here to denote string concatenation. The parenthetic representation of the tree of Figure 7.2 is shown in Figure 7.7. Figure 7.7: Parenthetic representation of the tree of Figure 7.2. Indentation, line breaks and spaces have been added for clarity. Note that, technically speaking, there are some computations that occur between and after the recursive calls at a node's children in the above algorithm. We still consider this algorithm to be a preorder traversal, however, since the primary action of printing a node's contents occurs prior to the recursive calls. The Java method parentheticRepresentation, shown in Code Fragment 7.10, is a variation of method toStringPreorder (Code Fragment 7.9). It implements the definition given above to output a parenthetic string representation of a tree T. As with the method toStringPreorder, the method parentheticRepresentation makes use of the toString method that is defined for every Java object. In fact, we can view this method as a kind of toString() method for tree objects. 390 Code Fragment 7.10: Algorithm parentheticRepresentation. Note the use of the + operator to concatenate two strings. We explore a modification to Code Fragment 7.10 in Exercise R-7.9, to display a tree in a fashion more closely matching that given in Figure 7.7. 7.2.3 Postorder Traversal Another important tree traversal algorithm is the postorder traversal. This algorithm can be viewed as the opposite of the preorder traversal, because it recursively traverses the subtrees rooted at the children of the root first, and then visits the root. It is similar to the preorder traversal, however, in that we use it to solve a particular problem by specializing an action associated with the "visit" of a node v. Still, as with the preorder traversal, if the tree is ordered, we make recursive calls for the children of a node v according to their specified order. Pseudo-code for the postorder traversal is given in Code Fragment 7.11. Code Fragment 7.11: Algorithm postorder for performing the postorder traversal of the subtree of a tree T rooted at a node v. 391 The name of the postorder traversal comes from the fact that this traversal method will visit a node v after it has visited all the other nodes in the subtree rooted at v. (See Figure 7.8.) Figure 7.8: Postorder traversal of the ordered tree of Figure 7.6. The analysis of the running time of a postorder traversal is analogous to that of a preorder traversal. (See Section 7.2.2.) The total time spent in the nonrecursive portions of the algorithm is proportional to the time spent visiting the children of each node in the tree. Thus, a postorder traversal of a tree T with n nodes takes O(n) time, assuming that visiting each node takes O(1) time. That is, the postorder traversal runs in linear time. As an example of postorder traversal, we show a Java method toStringPostorder in Code Fragment 7.12, which performs a postorder traversal of a tree T. This method prints the element stored at a node when it is visited. Code Fragment 7.12: Method toStringPostorder(T, v) that performs a postorder printing of the elements in the subtree of node v of T. The method implicitly calls toString on elements, when they are involved in a string concatenation operation. 392 The postorder traversal method is useful for solving problems where we wish to compute some property for each node v in a tree, but computing that property for v requires that we have already computed that same property for v's children. Such an application is illustrated in the following example. Example 7.7: Consider a file system tree T, where external nodes represent files and internal nodes represent directories (Example 7.1). Suppose we want to compute the disk space used by a directory, which is recursively given by the sum of: • The size of the directory itself • The sizes of the files in the directory • The space used by the children directories. (See Figure 7.9.) This computation can be done with apostorder traversal of tree T. After the subtrees of an internal node v have been traversed, we compute the space used by v by adding the sizes of the directory v itself and of the files contained in v to the space used by each internal child of v, which was computed by the recursive postorder traversals of the children of v. A Recursive Java Method for Computing Disk Space Motivated by Example 7.7, Java method diskSpace, shown in Code Fragment 7.13, performs a postorder traversal of a file-system tree T, printing the name and disk space used by the directory associated with each internal node of T. When called on the root of tree T, diskSpace runs in time O(n), where n is the number of nodes of T, provided the auxiliary methods name and size take O(1) time. Figure 7.9: The tree of Figure 7.3 representing a file system, showing the name and size of the associated file/directory inside each node, and the disk space used by the associated directory above each internal node. 393 Code Fragment 7.13: Method diskSpace prints the name and disk space used by the directory associated with each internal node of a file-system tree. This method calls the auxiliary methods name and size, which should be defined to return the name and size of the file/directory associated with a node. Other Kinds of Traversals 394 Although the preorder and postorder traversals are common ways of visiting the nodes of a tree, we can also imagine other traversals. For example, we could traverse a tree so that we visit all the nodes at depth d before we visit the nodes at depth d + 1. Consecutively numbering the nodes of a tree T as we visit them in this traversal is called the level numbering of the nodes of T (see Section 7.3.5). 7.3 Binary Trees A binary tree is an ordered tree with the following properties: 1. Every node has at most two children. 2. Each child node is labeled as being either a left child or a right child. 3. A left child precedes a right child in the ordering of children of a node. The subtree rooted at a left or right child of an internal node v is called a left subtree or right subtree, respectively, of v. A binary tree is proper if each node has either zero or two children. Some people also refer to such trees as being full binary trees. Thus, in a proper binary tree, every internal node has exactly two children. A binary tree that is not proper is improper. Example 7.8: An important class of binary trees arises in contexts where we wish to represent a number of different outcomes that can result from answering a series of yes-or-no questions. Each internal node is associated with a question. Starting at the root, we go to the left or right child of the current node, depending on whether the answer to the question is "Yes" or "No." With each decision, we follow an edge from a parent to a child, eventually tracing a path in the tree from the root to an external node. Such binary trees are known as decision trees, because each external node v in such a tree represents a decision of what to do if the questions associated with v's ancestors are answered in a way that leads to v. A decision tree is a proper binary tree. Figure 7.10 illustrates a decision tree that provides recommendations to a prospective investor. Figure 7.10: A decision tree providing investment advice. 395 Example 7.9: An arithmetic expression can be represented by a binary tree whose external nodes are associated with variables or constants, and whose internal nodes are associated with one of the operators +, −, ×, and /. (See Figure 7.11.) Each node in such a tree has a value associated with it. • If a node is external, then its value is that of its variable or constant. • If a node is internal, then its value is defined by applying its operation to the values of its children. An arithmetic expression tree is a proper binary tree, since each operator +, −, ×, and / takes exactly two operands. Of course, if we were to allow for unary operators, like negation (−), as in "−x," then we could have an improper binary tree. Figure 7.11: A binary tree representing an arithmetic expression. This tree represents the expression ((((3 + 1) × 3)/((9 −5) +2)) − ((3 × (7 −4)) + 6)). The value associated with the internal node labeled "/" is 2. 396 A Recursive Binary Tree Definition Incidentally, we can also define a binary tree in a recursive way such that a binary tree is either empty or consists of: • A node r, called the root of T and storing an element • A binary tree, called the left subtree of T • A binary tree, called the right subtree of T. We discuss some of the specialized topics for binary trees below. 7.3.1 The Binary Tree ADT As an abstract data type, a binary tree is a specialization of a tree that supports three additional accessor methods: left(v): Return the left child of v; an error condition occurs if v has no left child. right(v): Return the right child of v; an error condition occurs if v has no right child. hasLeft(v): Test whether v has a left child. 397 hasRight(v): Test whether v has a right child. Just as in Section 7.1.2 for the tree ADT, we do not define specialized update methods for binary trees here. Instead, we will consider some possible update methods when we describe specific implementations and applications of binary trees. 7.3.2 A Binary Tree Interface in Java We model a binary tree as an abstract data type that extends the tree ADT and adds the three specialized methods for a binary tree. In Code Fragment 7.14, we show the simple Java interface we can define using this approach. By the way, since binary trees are ordered trees, the iterable collection returned by method children(v) (inherited from the Tree interface) stores the left child of v before the right child of v. Code Fragment 7.14: Java interface Binary Tree for the binary tree ADT. Interface Binary Tree extends interface Tree (Code Fragment 7.1). 7.3.3 Properties of Binary Trees Binary trees have several interesting properties dealing with relationships between their heights and number of nodes. We denote the set of all nodes of a tree T at the 398 same depth d as the level dof T. In a binary tree, level 0 has at most one node (the root), level 1 has at most two nodes (the children of the root), level 2 has at most four nodes, and so on. (See Figure 7.12.) In general, level d has at most 2d nodes. Figure 7.12: Maximum number of nodes in the levels of a binary tree. We can see that the maximum number of nodes on the levels of a binary tree grows exponentially as we go down the tree. From this simple observation, we can derive the following properties relating the height of a binary T with its number of nodes. A detailed justification of these properties is left as an exercise (R-7.15). Proposition 7.10: Let T be a nonempty binary tree, and let n, nE, nI and h denote the number of nodes, number of external nodes, number of internal nodes, and height of T, respectively. Then T has the following properties: 1. h+1 ≤ n ≤ 2 h+1 −1 2. 1≤nE≤2h 3. h≤nI≤2h−1 4. log(n+1)−1 ≤h≤n−1. Also, if T is proper, then T has the following properties: 399 1. 2h+1 ≤ n≤2h+1−1 2. h+1≤nE≤2h 3. h≤nI≤2h−1 4. log(n + 1) − 1 ≤ h ≤ (n − 1)/2. Relating Internal Nodes to External Nodes in a Proper Binary Tree In addition to the binary tree properties above, we also have the following relationship between the number of internal nodes and external nodes in a proper binary tree. Proposition 7.11: In a nonempty proper binary tree T, with nE external nodes and nI internal nodes, we have ne = nI + 1. Justification: We justify this proposition by removing nodes from T and dividing them up into two "piles", an internal-node pile and an external-node pile, until T becomes empty. The piles are initially empty. At the end, the external- node pile will have one more node than the internal-node pile. We consider two cases: Case 1: If T has only one node v, we remove v and place it on the external-node pile. Thus, the external-node pile has one node and the internal-node pile is empty. Case 2: Otherwise (T has more than one node), we remove from T an (arbitrary) external node w and its parent v, which is an internal node. We place w on the external-node pile and v on the internal-node pile. If v has a parent u, then we reconnect u with the former sibling z of w, as shown in Figure 7.13. This operation, removes one internal node and one external node, and leaves the tree being a proper binary tree. Repeating this operation, we eventually are left with a final tree consisting of a single node. Note that the same number of external and internal nodes have been removed and placed on their respective piles by the sequence of operations leading to this final tree. Now, we remove the node of the final tree and we place it on the external-node pile. Thus, the the external-node pile has one more node than the internal-node pile. 400 Figure 7.13: Operation that removes an external node and its parent node, used in the justification of Proposition 7.11. Note that the above relationship does not hold, in general, for improper binary trees and nonbinary trees, although there are other interesting relationships that can hold, as we explore in an exercise (C-7.7). 7.3.4 A Linked Structure for Binary Trees As with a general tree, a natural way to realize a binary tree T is to use a linked structure, where we represent each node v of T by a position object (see Figure 7.14a) with fields providing references to the element stored at v and to the position objects associated with the children and parent of v. If v is the root of T, then the parent field of v is null. If v has no left child, then the left field of v is null. If v has no right child, then the right field of v is null. Also, we store the number of nodes of T in a variable, called size. We show the linked structure representation of a binary tree in Figure 7.14b. Figure 7.14: A node (a) and a linked structure (b) for representing a binary tree. 401 Java Implementation of a Binary Tree Node 402 We use a Java interface BTPosition (not shown) to represent a node of a binary tree. This interfaces extends Position, thus inheriting method element, and has additional methods for setting the element stored at the node (setElement) and for setting and returning the left child (setLeft and getLeft), right child (setRight and getRight), and parent (setParent and getParent) of the node. Class BTNode (Code Fragment 7.15) implements interface BTPosition by an object with fields element, left, right, and parent, which, for a node v, reference the element at v, the left child of v, the right child of v, and the parent of v, respectively. Code Fragment 7.15: Auxiliary class BTNode for implementing binary tree nodes. 403 Java Implementation of the Linked Binary Tree Structure In Code Fragments 7.16–7.18, we show portions of class Linked Binary Tree that implements the Binary Tree interface (Code Fragment 7.14) using a linked data structure. This class stores the size of the tree and a reference to the BTNode object associated with the root of the tree in internal variables. In 404 addition to the Binary Tree interface methods, LinkedBinaryTree has various other methods, including accessor method sibling(v), which returns the sibling of a node v, and the following update methods: addRoot(e): Create and return a new node r storing element e and make r the root of the tree; an error occurs if the tree is not empty. insertLeft(v,e): Create and return a new node w storing element e, add w as the the left child of v and return w; an error occurs if v already has a left child. insertRight(v,e): Create and return a new node z storing element e, add z as the the right child of v and return z; an error occurs if v already has a right child. remove(v): Remove node v, replace it with its child, if any, and return the element stored at v; an error occurs if v has two children. attach(v,T1,T2): Attach T1 and T2, respectively, as the left and right subtrees of the external node v; an error condition occurs if v is not external. Class LinkedBinaryTree has a constructor with no arguments that returns an empty binary tree. Starting from this empty tree, we can build any binary tree by creating the first node with method addRoot and repeatedly applying the insertLeft and insertRight methods and/or the attach method. Likewise, we can dismantle any binary tree T using the remove operation, ultimately reducing such a tree T to an empty binary tree. When a position v is passed as an argument to one of the methods of class LinkedBinaryTree, its validity is checked by calling an auxiliary helper method, checkPosition(v). A list of the nodes visited in a preorder traversal of the tree is constructed by recursive method preorderPositions. Error conditions are indicated by throwing exceptions Invalid Position Exception, BoundaryViolation Exception, EmptyTreeException, and NonEmptyTreeException. Code Fragment 7.16: Portions of the Linked Binary Tree class, which implements the Binary Tree interface. (Continues in Code Fragment 7.17.) 405 406 Code Fragment 7.17: Portions of the Linked Binary Tree class, which implements the Binary Tree interface. (Continues in Code Fragment 7.18.) 407 408 Code Fragment 7.18: Portions of the Linked Binary Tree class, which implements the Binary Tree interface. (Continues in Code Fragment 7.19.) 409 Code Fragment 7.19: Portions of the LinkedBinaryTree class, which implements the 410 Binary Tree interface. (Continues in Code Fragment 7.20.) Code Fragment 7.20: Portions of the Linked Binary Tree class, which implements the Binary Tree interface. (Continued from Code Fragment 7.19.) 411 Performance of the Linked Binary Tree Implementation Let us now analyze the running times of the methods of class Linked Binary Tree, which uses a linked structure representation: 412 • Methods size() and isEmpty() use an instance variable storing the number of nodes of T, and each take O(1) time. • The accessor methods root, left, right, sibling and parent take O(1) time. • Method replace(v,e) takes O(1) time. • Methods iterator() and positions() are implemented by performing a pre- order traversal of the tree (using the auxiliary method preorderPositions). The nodes visited by the traversal are stored in a position list implemented by class NodePositionList (Section 6.2.4) and the output iterator is generated with method iterator() of class NodePositionList. Methods iterator() and positions() take O(n) time and methods hasNext() and next() of the returned iterators run in O(1) time. • Method children uses a similar approach to construct the returned iterable collection, but it runs in O(1) time, since there are at most two children for any node in a binary tree. • The update methods insertLeft, insertRight, attach, and remove all run in O(1) time, as they involve constant-time manipulation of a constant number of nodes. Considering the space required by this data structure for a tree with n nodes, note that there is an object of class BTNode (Code Fragment 7.15) for every node of tree T. Thus, the overall space requirement is O(n). Table 7.2 summarizes the performance of the linked structure implementation of a binary tree. Table 7.2: Running times for the methods of an n- node binary tree implemented with a linked structure. Methods hasNext() and next() of the iterators returned by iterator(), positions().iterator(), and children(v).iterator() run in O(1) time. The space usage is O(n). Operation Time size, isEmpty O(1) iterator, positions 413 O(n) replace O(1) root, parent, children, left, right, sibling O(1) hasLeft, hasRight, isInternal, isExternal, isRoot O(1) insertLeft, insertRight, attach, remove O(1) 7.3.5 An Array-List Representation of a Binary Tree An alternative representation of a binary tree T is based on a way of numbering the nodes of T. For every node v of T, let p(v) be the integer defined as follows. • If v is the root of T, then p(v) = 1. • If v is the left child of node u, then p(v) = 2p(u). • If v is the right child of node u, then p(v) = 2p(u) + 1. The numbering function p is known as a level numbering of the nodes in a binary tree T, for it numbers the nodes on each level of T in increasing order from left to right, although it may skip some numbers. (See Figure 7.15.) Figure 7.15: Binary tree level numbering: (a) general scheme; (b) an example. 414 The level numbering function p suggests a representation of a binary tree T by means of an array list S such that node v of T is the element of S at index p(v). As mentioned in the previous chapter, we realize the array list S by means of an extendable array. (See Section 6.1.4.) Such an implementation is simple and efficient, for we can use it to easily perform the methods root, parent, left, right, hasLeft, hasRight, isInternal, isExternal, and isRoot by using simple arithmetic operations on the numbers p(v) associated with each node v involved in the operation. We leave the details of this implementation as an exercise (R-7.26). We show an example array-list representation of a binary tree in Figure 7.16. Figure 7.16: Representation of a binary tree T by means of an array list S. 415 Let n be the number of nodes of T, and let pM be the maximum value of p(v) over all the nodes of T. The array list S has size N = pM + 1 since the element of S at index 0 is not associated with any node of T. Also, S will have, in general, a number of empty elements that do not refer to existing nodes of T. In fact, in the worst case, N = 2n, the justification of which is left as an exercise (R-7.23). In Section 8.3, we will see a class of binary trees, called "heaps" for which N = n + 1. Thus, in spite of the worst-case space usage, there are applications for which the array-list representation of a binary tree is space efficient. Still, for general binary trees, the exponential worst-case space requirement of this representation is prohibitive. Table 7.3 summarizes running times of the methods of a binary tree implemented with an array list. We do not include any tree update methods here. Table 7.3: Running times for a binary tree T implemented with an array list S. We denote the number of nodes of T with n, and N denotes the size of S. The space usage is O(N), which is O(2n) in the worst case. Operation 416 Time size, isEmpty O(1) iterator, positions O(n) replace O(1) root, parent, children, left, right O(1) hasLeft, hasRight, isInternal, isExternal, isRoot O(1) 7.3.6 Traversals of Binary Trees As with general trees, binary tree computations often involve traversals. Building an Expression Tree Consider the problem of constructing an expression tree from a fully parenthesized arithmetic expression of size n. (Recall Example 7.9 and Code Fragment 7.24.) In Code Fragment 7.21, we give algorithm buildExpression for building such an expression tree, assuming all arithmetic operations are binary and variables are not parenthesized. Thus, every parenthesized subexpression contains an operator in the middle. The algorithm uses a stack S while scanning the input expression E looking for variables, operators, and right parentheses. • When we see a variable or operator x, we create a single-node binary tree T, whose root stores x and we push T on the stack. • When we see a right parenthesis, ")", we pop the top three trees from the stack S, which represent a subexpression (E1 o E2). We then attach the trees for E1 and E2 to the one for o, and push the resulting tree back on S. We repeat this until the expression E has been processed, at which time the top element on the stack is the expression tree for E. The total running time is O(n). 417 Code Fragment 7.21: Algorithm buildExpression. Preorder Traversal of a Binary Tree Since any binary tree can also be viewed as a general tree, the preorder traversal for general trees (Code Fragment 7.8) can be applied to any binary tree. We can simplify the algorithm in the case of a binary tree traversal, however, as we show in Code Fragment 7.22. Code Fragment 7.22: Algorithm binaryPreorder for performing the preorder traversal of the subtree of a binary tree T rooted at a node v. 418 As is the case for general trees, there are many applications of the preorder traversal for binary trees. Postorder Traversal of a Binary Tree Analogously, the postorder traversal for general trees (Code Fragment 7.11) can be specialized for binary trees, as shown in Code Fragment 7.23. Code Fragment 7.

Các file đính kèm theo tài liệu này:

  • pdfTree Definitions and Properties.pdf
Tài liệu liên quan