Data Structure

Background

Data Structure and Algorithm

Data Structures: efficiently organise data in computer memory.

Algorithms: manipulate data structures to achieve a given goal.

In order for a program to terminate fast (or in time), it has to use appropriate data structures and efficient algorithms.

Outline

  • various data structures
  • basic algorithms
  • understanding the strengths and weaknesses of those, in terms of their time and space complexities

Outcome: to be able to decide which algorithms and data structure to chose for a given task.

Preliminaries

Abstract data types (ADT)

An abstract data type is

  • a type
  • with associated operations
  • whose representation is hidden to the user

Example: Integers are an abstract data type with operations + , - , * , mod, div,…

  • A type is a collection of values, e.g. integers, Boolean values (true and false).
  • The operations on ADT might come with mathematically specified constraints, for example on the time complexity of the operations.
  • Advantages of ADT’s as explained by Aho, Hopcroft and Ullman (1983):
    • “At first, it may seem tedious writing procedures to govern all accesses to the underlying structures. However, if we discipline ourselves to writing programs in terms of the operations for manipulating abstract data types rather than making use of particular implementations details, then we can modify programs more readily by reimplementing the operations rather than searching all programs for places where we have made accesses to the underlying data structures. This flexibility can be particularly important in large software efforts, and the reader should not judge the concept by the necessarily tiny examples found in this book.”

Mathematical Preliminaries

Sets and Relations

The concept of a set in the mathematical sense has wide application in computer science. The notations and techniques of set theory are commonly used when de- scribing and implementing algorithms because the abstractions associated with sets often help to clarify and simplify algorithm design.

A set is a collection of distinguishable members or elements. The members are typically drawn from some larger population known as the base type. Each member of a set is either a primitive element of the base type or is a set itself. There is no concept of duplication in a set. Each value from the base type is either in the set or not in the set. For example, a set named $P$ might consist of the three integers 7, 11, and 42. In this case, $P$’s members are 7, 11, and 42, and the base type is integer.

Symbols commonly used to express sets and their relationships:

  • $\{1, 4\}$
    • A set composed of the members 1 and 4
  • $\{x \mid x \: is \: a \: positive \: integer\}$
    • A set definition using a set former.
    • Example: the set of all positive integers
  • $x \in P$
    • x is a member of set P
  • $x \notin P$
    • x is not a member of set P
  • $\emptyset$
    • The null or empty set
  • $\mid P \mid$
    • Cardinality: size of set P or number of members for set P
  • $P \subseteq Q, \: Q \supseteq P $
    • Set P is included in set Q, set P is a subset of set Q, set Q is a superset of set P
  • $P \cup Q$
    • Set Union: all elements appearing in P OR Q
  • $P \cap Q$
    • Set Intersection: all elements appearing in P AND Q
  • $P \: — Q$
    • Set difference: all elements of set P NOT in set Q

A sequence is a collection of elements with an order, and which may contain duplicate-valued elements. A sequence is also sometimes called a tuple or a vector. In a sequence, there is a 0th element, a 1st element, 2nd element, and so on. I indicate a sequence by using angle brackets <> to enclose its elements. For example, <3, 4 4, 5,> is a sequence. Note that sequence <3, 4 5, 4,> is distinct from sequence <3, 4 4, 5,>, and both are distinct from sequence <3, 5 4,>.

A relation R over set S is a set of ordered pairs from S. As an example of a relation, if S is {a, b, c}, then

is a relation, and

is a different relation.

If tuple ⟨x, y⟩ is in relation R, we may use the infix notation xRy. We often use relations such as the less than operator (<) on the natural numbers, which includes ordered pairs such as ⟨1, 3⟩ and ⟨2, 23⟩, but not ⟨3, 2⟩ or ⟨2, 2⟩. Rather than writing the relationship in terms of ordered pairs, we typically use an infix notation for such relations, writing 1 < 3.

Define the properties of relations as follows, with R a binary relation over set S.

  • R is reflexive if aRa for all $a \in S$.
  • R is symmetric if whenever aRb, then bRa, for all $a, b \in S$.
  • R is antisymmetric if whenever aRb and bRa, then a = b, for all $a, b \in S$.
  • R is transitive if whenever aRb and bRc, then aRc, for all $a, b, c \in S$.

A binary relation is called a partial order if it is antisymmetric and transitive. The set on which the partial order is defined is called a partially ordered set or a poset. Elements x and y of a set are comparable under a given relation if either xRy or yRx. If every pair of distinct elements in a partial order are comparable, then the order is called a total order or linear order.

Logarithms

A logarithm of base $b$ for value $y$ is the power to which $b$ is raised to get $y$. Normally,this is written as $\log_b y = x$. Thus, if $\log_b y = x$. then $b^x = y$,and $b^{\log_b y} = y$.

Logarithms have the following properties, for any positive values of m, n, and r, and any positive integers a and b.

The first two properties state that the logarithm of two numbers multiplied (or divided) can be found by adding (or subtracting) the logarithms of the two numbers.

Summations and Recurrences

Most programs contain loop constructs. When analyzing running time costs for programs with loops, we need to add up the costs for each time the loop is executed. This is an example of a summation. Summations are simply the sum of costs for some function applied to a range of parameter values. Summations are typically written with the following “Sigma” notation:

This notation indicates that we are summing the value of f(i) over some range of (integer) values. The parameter to the expression and its initial value are indicated below the $\sum$ symbol. Here, the notation $i = 1$ indicates that the parameter is $i$ and that it begins with the value 1. At the top of the $\sum$ symbol is the expression $n$. This indicates the maximum value for the parameter $i$. Thus, this notation means to sum the values of $f (i)$ as $i$ ranges across the integers from 1 through $n$. This can also be written $f(a)+f(2)+ … + f(n)$. Within a sentence, Sigma notation is typeset as $\sum_{i = 1}^{n}f(i)$.

Given a summation, you often wish to replace it with an algebraic equation with the same value as the summation. This is known as a closed-form solution, and the process of replacing the summation with its closed-form solution is known as solving the summation. For example, the summation $\sum_{i = 1}^{n} 1$ is simply the expression “$1$” summed $n$ times (remember that $i$ ranges from $1$ to $n$). Because the sum of $n$ 1s is $n$, the closed-form solution is $n$. The following is a list of useful summations, along with their closed-form solutions.

The sum of reciprocals from 1 to n, called the Harmonic Series.

Arrays and Linked list

List

We all have an intuitive understanding of what we mean by a “list.” Our first step is to define precisely what is meant so that this intuitive understanding can eventually be converted into a concrete data structure and its operations. The most important concept related to lists is that of position. In other words, we perceive that there is a first element in the list, a second element, and so on. We should view a list as embodying the mathematical concepts of a sequence, as defined above.

We define a list to be a finite, ordered sequence of data items known as elements. “Ordered” in this definition means that each element has a position in the list. (We will not use “ordered” in this context to mean that the list elements are sorted by value.) Each list element has a data type. In the simple list implemen- tations discussed in this chapter, all elements of the list have the same data type, although there is no conceptual objection to lists whose elements have differing data types if the application requires it. The operations defined as part of the list ADT do not depend on the elemental data type. For example, the list ADT can be used for lists of integers, lists of characters, lists of payroll records, even lists of lists.

A list is said to be empty when it contains no elements. The number of ele- ments currently stored is called the length of the list. The beginning of the list is called the head, the end of the list is called the tail. There might or might not be some relationship between the value of an element and its position in the list. For example, sorted lists have their elements positioned in ascending order of value, while unsorted lists have no particular relationship between element values and positions. This section will consider only unsorted lists.

When presenting the contents of a list, we use the same notation as was in- troduced for sequences above. To be consistent with Java array indexing, the first position on the list is denoted as $0$. Thus, if there are $n$ elements in the list, they are given positions $0$ through $n − 1$ as $⟨a_0, a_1, …, a_{n−1}⟩$. The subscript indicates an element’s position within the list. Using this notation, the empty list would appear as ⟨⟩.

Before selecting a list implementation, a program designer should first consider what basic operations the implementation must support. Our common intuition about lists tells us that a list should be able to grow and shrink in size as we insert and remove elements. We should be able to insert and remove elements from any- where in the list. We should be able to gain access to any element’s value, either to read it or to change it. We must be able to create and clear (or reinitialize) lists. It is also convenient to access the next or previous element from the “current” one.

An example of a list of numbers

Ordered collection of elements

List is an ADT; list operations are:

  • insert an entry (on a certain position)
  • delete an entry
  • access data by position
  • search
  • move around the list - concatenate two lists
  • sort

Different representations of lists

Depending on what operations are needed for our application, we choose from different data structures (some implement certain operations faster than the others):

  • Arrays.
  • Linked lists.
  • Dynamic arrays.
  • Unrolled linked lists.

A list can be iterated through as shown in the following code fragment.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
/**
* List ADT
*/
public interface List<E> {
/**
* Remove all contents from the list, so it is once again
* empty. Client is responsible for reclaiming storage
* used by the list elements.
*/
public void clear();

/**
* Insert an element at the current location. The client
* must ensure that the list's capacity is not exceeded.
*
* @param item The element to be inserted.
*/
public void insert(E item);

/**
* Append an element at the end of the list. The client
* must ensure that the list's capacity is not exceeded.
*
* @param item The element to be appended.
*/
public void append(E item);

/**
* Remove and return the current element.
*
* @return The element that was removed.
*/
public E remove();

/**
* Set the current position to the start of the list
*/
public void moveToStart();

/**
* Set the current position to the end of the list
*/
public void moveToEnd();

/**
* Move the current position one step left. No change
* if already at beginning.
*/
public void prev();

/**
* Move the current position one step right. No change
* if already at end.
*/
public void next();

/**
* @return The number of elements in the list.
*/
public int length();

/**
* @return The position of the current element.
*/
public int currPos();

/**
* Set current position.
*
* @param pos The position to make current.
*/
public void moveToPos(int pos);

/**
* @return The current element.
*/
public E getValue();
}

Array-Based List

Following code fragment shows the array-based list implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
/**
* Array-based list implementation
*/
class AList<E> implements List<E> {
private static final int defaultSize = 10; // Default size
private int maxSize; // Maximum size of list
private int listSize; // Current # of list items
private int curr; // Position of current element
private E[] listArray; // Array holding list elements

/** Constructors */
/**
* Create a list with the default capacity.
*/
AList() {
this(defaultSize);
}

/**
* Create a new list object.
*
* @param size Max # of elements list can contain.
*/
@SuppressWarnings("unchecked")
// Generic array allocation
AList(int size) {
maxSize = size;
listSize = curr = 0;
listArray = (E[]) new Object[size]; // Create listArray
}

// Reinitialize the list
public void clear() {
listSize = curr = 0;
} // Simply reinitialize values

/**
* Insert "it" at current position
*/
public void insert(E it) {
assert listSize < maxSize : "List capacity exceeded";
for (int i = listSize; i > curr; i--) // Shift elements up
listArray[i] = listArray[i - 1]; // to make room
listArray[curr] = it;
listSize++; // Increment list size
}

/**
* Append "it" to list
*/
public void append(E it) {
assert listSize < maxSize : "List capacity exceeded";
listArray[listSize++] = it;
}

/**
* Remove and return the current element
*/
public E remove() {
if ((curr < 0) || (curr >= listSize)) // No current element
return null;
E it = listArray[curr]; // Copy the element
for (int i = curr; i < listSize - 1; i++) // Shift them down
listArray[i] = listArray[i + 1];
listSize--; // Decrement size
return it;
}

public void moveToStart() {
curr = 0;
} // Set to front

public void moveToEnd() {
curr = listSize;
} // Set at end

public void prev() {
if (curr != 0) curr--;
} // Back up

public void next() {
if (curr < listSize) curr++;
}

/**
* @return List size
*/
public int length() {
return listSize;
}

/**
* @return Current position
*/
public int currPos() {
return curr;
}

/**
* Set current list position to "pos"
*/
public void moveToPos(int pos) {
assert (pos >= 0) && (pos <= listSize) : "Pos out of range";
curr = pos;
}

/**
* @return Current element
*/
public E getValue() {
assert (curr >= 0) && (curr < listSize) :
"No current element";
return listArray[curr];
}
// Extra stuff not printed in the book.

/**
* Generate a human-readable representation of this list's contents
* that looks something like this: < 1 2 3 | 4 5 6 >. The vertical
* bar represents the current location of the fence. This method
* uses toString() on the individual elements.
*
* @return The string representation of this list
*/
public String toString() {
// Save the current position of the list
int oldPos = currPos();
int length = length();
StringBuffer out = new StringBuffer((length() + 1) * 4);

moveToStart();
out.append("< ");
for (int i = 0; i < oldPos; i++) {
out.append(getValue());
out.append(" ");
next();
}
out.append("| ");
for (int i = oldPos; i < length; i++) {
out.append(getValue());
out.append(" ");
next();
}
out.append(">");
moveToPos(oldPos); // Reset the fence to its original position
return out.toString();
}
}

listArray must be allocated at some fixed size, the size of the array must be known when the list object is created. Note that an optional parameter is declared for the AList constructor. With this parameter, the user can indicate the maximum numberofelementspermittedinthelist. Ifnoparameterisgiven,thenittakesthe value defaultSize, which is assumed to be a suitably defined constant value.

Because each list can have a differently sized array, each list must remember its maximum permitted size. Data member maxSize serves this purpose. At any given time the list actually holds some number of elements that can be less than the maximum allowed by the array. This value is stored in listSize. Data member curr stores the current position. Because listArray, maxSize, listSize, and curr are all declared to be private, they may only be accessed by methods of Class AList.

Class AList stores the list elements in the first listSize contiguous array positions. Array positions correspond to list positions. In other words, the element at position i in the list is stored at array cell i. The head of the list is always at position 0. This makes random access to any element in the list quite easy. Given some position in the list, the value of the element in that position can be accessed directly. Thus, access to any element using the moveToPos method followed by the getValue method takes $\Theta(1)$ time.

Because the array-based list implementation is defined to store list elements in contiguous cells of the array, the insert, append, and remove methods must maintain this property. Inserting or removing elements at the tail of the list is easy, so the append operation takes $\Theta(1)$ time. But if we wish to insert an element at the head of the list, all elements currently in the list must shift one position toward the tail to make room, as illustrated below. This process takes $\Theta(n)$ time if there are $n$ elements already in the list. If we wish to insert at position $i$ within a list of $n$ elements, then $n − i$ elements must shift toward the tail. Removing an element from the head of the list is similar in that all remaining elements must shift toward the head by one position to fill in the gap. To remove the element at position $i$, $n − i − 1$ elements must shift toward the head. In the average case, insertion or removal requires moving half of the elements, which is $\Theta(n)$.

Most of the other member functions for Class AList simply access the current list element or move the current position. Such operations all require $\Theta(n)$ time. Aside from insert and remove, the only other operations that might require more than constant time are the constructor, the destructor, and clear. These three member functions each make use of the system free-store operation new.

Linked list

The second traditional approach to implementing lists makes use of pointers and is usually called a linked list. The linked list uses dynamic memory allocation, that is, it allocates memory for new list elements as needed.

A linked list is made up of a series of objects, called the nodes of the list. Because a list node is a distinct object (as opposed to simply a cell in an array), it is good practice to make a separate list node class. An additional benefit to creating a list node class is that it can be reused by the linked implementations for the stack and queue data structures presented later in this chapter. The following code shows the implementation for list nodes, called the Link class. Objects in the Link class contain an element field to store the element value, and a next field to store a pointer to the next node on the list. The list built from such nodes is called a singly linked list, or a one-way list, because each list node has a single pointer to the next node on the list.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
/**
* Singly linked list node
*/
class Link<E> {
private E element; // Value for this node
private Link<E> next; // Pointer to next node in list

// Constructors
Link(E it, Link<E> nextval) {
element = it;
next = nextval;
}

Link(Link<E> nextval) {
next = nextval;
}

// Return next field
Link<E> next() {
return next;
}

// Set next field
Link<E> setNext(Link<E> nextval) {
return next = nextval;
} // Return element field

// Set element field
E element() {
return element;
}

E setElement(E it) {
return element = it;
}
}

The Link class is quite simple. There are two forms for its constructor, one with an initial element value and one without. Member functions allow the link user to get or set the element and link fields.

The following figure shows a graphical depiction for a linked list storing four integers. The value stored in a pointer variable is indicated by an arrow “pointing” to some- thing. Java uses the special symbol null for a pointer value that points nowhere, such as for the last list node’s next field. A null pointer is indicated graphically by a diagonal slash through a pointer variable’s box. The vertical line between the nodes labeled 23 and 12 in figure(a) indicates the current position (immediately to the right of this line).

The list’s first node is accessed from a pointer named head. To speed access to the end of the list, and to allow the append method to be performed in constant time, a pointer named tail is also kept to the last link of the list. The position of the current element is indicated by another pointer, named curr. Finally, because there is no simple way to compute the length of the list simply from these three pointers, the list length must be stored explicitly, and updated by every operation that modifies the list size. The value cnt stores the length of the list.

The following code shows the definition for the linked list class, named LList:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
/**
* Linked list implementation
*/
class LList<E> implements List<E> {
private Link<E> head; // Pointer to list header
private Link<E> tail; // Pointer to last element
protected Link<E> curr; // Access to current element
int cnt; // Size of list

/**
* Constructors
*/
LList(int size) {
this();
} // Constructor -- Ignore size

LList() {
curr = tail = head = new Link<E>(null); // Create header
cnt = 0;
}

/**
* Remove all elements
*/
public void clear() {
head.setNext(null); // Drop access to links
curr = tail = head = new Link<E>(null); // Create header
cnt = 0;
}

/**
* Insert "it" at current position
*/
public void insert(E it) {
curr.setNext(new Link<E>(it, curr.next()));
if (tail == curr) {
tail = curr.next(); // New tail
}
cnt++;
}

/**
* Append "it" to list
*/
public void append(E it) {
tail = tail.setNext(new Link<E>(it, null));
cnt++;
}

/**
* Remove and return current element
*/
public E remove() {
if (curr.next() == null) {
return null; // Nothing to remove
}
E it = curr.next().element(); // Remember value
if (tail == curr.next()) {
tail = curr; // Removed last
}
curr.setNext(curr.next().next()); // Remove from list
cnt--; // Decrement count
return it; // Return value
}

/**
* Set curr at list start
*/
public void moveToStart() {
curr = head;
}

/**
* Set curr at list end
*/
public void moveToEnd() {
curr = tail;
}

/**
* Move curr one step left; no change if now at front
*/
public void prev() {
if (curr == head) return; // No previous element
Link<E> temp = head;
// March down list until we find the previous element
while (temp.next() != curr) temp = temp.next();
curr = temp;
}

/**
* Move curr one step right; no change if now at end
*/
public void next() {
if (curr != tail) {
curr = curr.next();
}
}

/**
* @return List length
*/
public int length() {
return cnt;
}

/**
* @return The position of the current element
*/
public int currPos() {
Link<E> temp = head;
int i;
for (i = 0; curr != temp; i++)
temp = temp.next();
return i;
}

/**
* Move down list to "pos" position
*/
public void moveToPos(int pos) {
assert (pos >= 0) && (pos <= cnt) : "Position out of range";
curr = head;
for (int i = 0; i < pos; i++) curr = curr.next();
}

/**
* @return Current element value
*/
public E getValue() {
if (curr.next() == null) return null;
return curr.next().element();
}
// Extra stuff not printed in the book.

/**
* Generate a human-readable representation of this list's contents
* that looks something like this: < 1 2 3 | 4 5 6 >. The vertical
* bar represents the current location of the fence. This method
* uses toString() on the individual elements.
*
* @return The string representation of this list
*/
public String toString() {
// Save the current position of the list
int oldPos = currPos();
int length = length();
StringBuffer out = new StringBuffer((length() + 1) * 4);

moveToStart();
out.append("< ");
for (int i = 0; i < oldPos; i++) {
out.append(getValue());
out.append(" ");
next();
}
out.append("| ");
for (int i = oldPos; i < length; i++) {
out.append(getValue());
out.append(" ");
next();
}
out.append(">");
moveToPos(oldPos); // Reset the fence to its original position
return out.toString();
}
}

Comparison of List Implementations

Operation Array-based List Linked List
access data by position constant linear
search linear linear
insert an entry at the beginning linear constant
insert an entry at the end linear linear*
insert an entry (on a certain position) linear linear
delete first entry linear constant
delete entry i linear linear
concatenate two lists linear linear*