Thursday, May 22, 2014

Algorithm Analysis: Finding Time Complexity [Note]

You add up how many machine instructions it will execute as a function of the size of its input, and then simplify the expression to the largest (when N is very large) term and can include any simplifying constant factor.
For example lets see how we simplify "2N+2" machine instructions to describe this as just "O(N)".:.
Why do we remove the two "2"s ?
We are interested in the performance of the algorithm as N becomes large.
Consider the two terms 2N and 2
What is the relative influence of these two terms as N becomes large?
Say N is a million.
Then the first term is 2 million and the second term is only 2.
For this reason we drop all but the largest terms for large N
So now we have gone from "2N + 2" to "2N".
Traditionally we are only interested in performance "up to constant factors".
This means that we don't really care if there is some constant multiple of difference in performance when N is large. The unit of 2N is not well-defined in the first place anyway. So we can multiply or divide by a constant factor to get to the simplest expression.
So "2N" becomes just "N".

The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N as N approaches infinity. In general you can think of it like this:
```
`statement;`

```
Is constant. The running time of the statement will not change in relation to N.
```
`for ( i = 0; i < N; i++ )`
`  statement;`

```
Is linear. The running time of the loop is directly proportional to N. When N doubles, so does the running time.
```
`for ( i = 0; i < N; i++ ) {`
`  for ( j = 0; j < N; j++ )`
`    statement;`
`}`

```
Is quadratic. The running time of the two loops is proportional to the square of N. When N doubles, the running time increases by N * N.
```
`while ( low <= high ) {`
`  mid = ( low + high ) / 2;`

`  if ( target < list[mid] )`
`    high = mid - 1;`
`  else if ( target > list[mid] )`
`    low = mid + 1;`
`  else break;`
`}`

```
Is logarithmic. The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
```
`void quicksort ( int list[], int left, int right )`
`{`
`  int pivot = partition ( list, left, right );`
`  quicksort ( list, left, pivot - 1 );`
`  quicksort ( list, pivot + 1, right );`
`}`

```
Is N * log ( N ). The running time consists of N loops (iterative or recursive) that are logarithmic, thus the algorithm is a combination of linear and logarithmic.
In general, doing something with every item in one dimension is linear, doing something with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic. There are other Big O measures such as cubic, exponential, and square root, but they're not nearly as common. Big O notation is described as O ( <type> ) where <type> is the measure. The quicksort algorithm would be described as O ( N * log ( N ) ).
Note that none of this has taken into account best, average, and worst case measures. Each would have its own Big O notation. Also note that this is a VERY simplistic explanation. Big O is the most common, but it's also more complex that I've shown. There are also other notations such as big omega, little o, and big theta. You probably won't encounter them outside of an algorithm analysis course. ;)