Data Structures and C Programming: January 2010

Friday 29 January 2010

LIST ADT

LIST:
A list is a sequential data structure, ie. a collection of items accessible one after another beginning at the head and ending at the tail.

It is a widely used data structure for applications which do not need random access
Addition and removals can be made at any position in the list
lists are normally in the form of a₁,a₂,a₃.....a_n. The size of this list is n.The first element of the list is a₁,and the last element is a_n.The position of element a_i in a list is i.
List of size 0 is called as null list.

Basic Operations on a List

Creating a list
Traversing the list
Inserting an item in the list
Deleting an item from the list
Concatenating two lists into one

Implementation of List:

A list can be implemented in two ways

Array list
Linked list

1.Storing a list in a static data structure(Array List)

This implementation stores the list in an array.
The position of each element is given by an index from 0 to n-1, where n is the number of elements.
The element with the index can be accessed in constant time (ie) the time to access does not depend on the size of the list.
The time taken to add an element at the end of the list does not depend on the size of the list. But the time taken to add an element at any other point in the list depends on the size of the list because the subsequent elements must be shifted to next index value.So the additions near the start of the list take longer time than the additions near the middle or end.
Similarly when an element is removed,subsequent elements must be shifted to the previous index value. So removals near the start of the list take longer time than removals near the middle or end of the list.

Problems with Array implementation of lists:

Insertion and deletion are expensive. For example, inserting at position 0 (a new first element) requires first pushing the entire array down one spot to make room, whereas deleting the first element requires shifting all the elements in the list up one, so the worst case of these operations is O(n).
Even if the array is dynamically allocated, an estimate of the maximum size of the list is required. Usually this requires a high over-estimate, which wastes considerable space. This could be a serious limitation, if there are many lists of unknown size.
Simple arrays are generally not used to implement lists. Because the running time for insertion and deletion is so slow and the list size must be known in advance

2. Storing a list in a dynamic data structure(Linked List)

The Linked List is stored as a sequence of linked nodes which are not necessarily adjacent in memory.
Each node in a linked list contains data and a reference to the next node
The list can grow and shrink in size during execution of a program.
The list can be made just as long as required.It does not waste memory space because successive elements are connected by pointers.
The position of each element is given by an index from 0 to n-1, where n is the number of elements.
The time taken to access an element with an index depends on the index because each element of the list must be traversed until the required index is found.
The time taken to add an element at any point in the list does not depend on the size of the list,as no shifts are required
Additions and deletion near the end of the list take longer than additions near the middle or start of the list. because the list must be traversed until the required index is found

Array versus Linked Lists

1.Arrays are suitable for
- Randomly accessing any element.
- Searching the list for a particular value
- Inserting or deleting an element at the end.
2.Linked lists are suitable for
-Inserting/Deleting an element.
-Applications where sequential access is required.
-In situations where the number of elements cannot be predicted beforehand.

Thursday 28 January 2010

Abstract Data Type(ADT)

Abstract Data Type(ADT):
An Abstract Data Type is a set of operations. Abstract data types are mathematical abstractions. Objects such as lists, sets, and graphs, along with their operations, can be viewed as abstract data types, just as integers, reals, and booleans are data types. Integers, reals, and booleans have operations associated with them, and so do abstract data types. For the set ADT, we might have such operations as union, intersection, size, and complement.
The basic idea is that the implementation of these operations is written once in the program, and any other part of the program that needs to perform an operation on the ADT can do so by calling the appropriate function. If for some reason implementation details need to change, it should be easy to do so by merely changing the routines that perform the ADT operations. This change would be completely transparent to the rest of the program.
There is no rule telling us which operations must be supported for each ADT; this is a design decision. Error handling and tie breaking (where appropriate) are also generally up to the program designer.

Monday 25 January 2010

Algorithm Analysis

Algorithm:
Algorithm is defined as a step by step procedure for solving a problem in a finite amount of time.
An algorithm is said to be more efficient if it has small running time and it occupies less memory. The running time of an algorithm typically grows with the input size.The most common metric for calculating time complexity is Big O notation.

Big O Notation:

Big O notation is described as O (type) where type is the measure.
Big O notation usually describes the worst case running time of an algorithm. The Big O notation removes all constant factors. so that the running time can be estimated in relation to N.

1.O(1)-Constant time
O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set .This means that the algorithm requires the same fixed number of steps regardless of the size of the task.
In algorithm,
statement;
Is constant. The running time of the statement will not change in relation to N.
eg:
bool IsFirstElementNull(String[] strings)
{
if(strings[0] == null)
{
return true;
}
return false;
}

Examples
A. Push and Pop operations for a stack (containing n elements);
B. Insert and Remove operations for a queue.

2.O(N) - linear time

O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set.
for ( i = 0; i < N; i++ )
statement;

The running time of the loop is directly proportional to N.Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.

Examples
A. Traversal of a list (a linked list or an array) with n elements;
B. Finding the maximum or minimum element in a list, or sequential search in an unsorted list of n elements;
C. Traversal of a tree with n nodes;
D. Calculating iteratively n-factorial; finding iteratively the nth Fibonacci number.

3.O(n²) - quadratic time
O(n²) represents an algorithm whose performance is directly proportional to the square of the size of the input data set.
for ( i = 0; i < N; i++ ) {
for ( j = 0; j < N; j++ )
statement;
}
The running time of the two loops is proportional to the square of N. When N doubles, the running time increases by N * N.
Examples:
A. Some more simplistic sorting algorithms, for instance a selection sort of n elements;
B. Comparing two two-dimensional arrays of size n by n;
C. Finding duplicates in an unsorted list of n elements (implemented with two nested loops).

4.O(log n) - logarithmic time

Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half. It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set.

while ( low <= high ) {
mid = ( low + high ) / 2;

if ( target < list[mid] )
high = mid - 1;
else if ( target > list[mid] )
low = mid + 1;
else break;
}

The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
Examples:
A. Binary search in a sorted list of n elements;
B. Insert and Find operations for a binary search tree with n nodes;
C. Insert and Remove operations for a heap with n nodes.

5.O(n log n) - "n log n " time

void quicksort ( int list[], int left, int right )
{
int pivot = partition ( list, left, right );
quicksort ( list, left, pivot - 1 );
quicksort ( list, pivot + 1, right );
}

The running time consists of N loops (iterative or recursive) that are logarithmic, thus the algorithm is a combination of linear and logarithmic.

Examples:
A. More advanced sorting algorithms - quicksort, mergesort

6.O(an) (a > 1) - exponential time
if a=2,O(2n) denotes an algorithm whose growth will double with each additional element in the input data set. The execution time of an O(2n) function will quickly become very large.

Examples:
A. Recursive Fibonacci implementation
B. Towers of Hanoi
C. Generating all permutations of n symbols

Order of asymptotic behavior of the functions

O(l) < O(log n) < O(n) < O(n log n) < O(n2) < O(n3) < O(an)

Big-O when a function is the sum of several terms:

If a function (which describes the order of growth of an algorithm) is a sum of several terms, its order of growth is determined by the fastest growing term. In particular, if we have a polynomial
p(n) = aknk + ak-1nk-1 + … + a1n + a0
its growth is of the order nk:
p(n) = O(nk)

In general, doing something with every item in one dimension is linear, doing something with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic. There are other Big O measures such as cubic, exponential, and square root, but they're not nearly as common.The best time in the above list is obviously constant time, and the worst is exponential time which, as we have seen, quickly
overwhelms even the fastest computers even for relatively small n. Polynomial growth (linear, quadratic, cubic, etc.) is considered manageable as compared to exponential growth.

Pages