## Algorithm Design and Analysis

## Useful Resources

### *Click on a topic below for a list of resources*

Algorithms and Data Structures

Algorithmic Complexity Tutorial

Examples of Algorithms with Various Complexities

Algorithms and Data Structures E-book

Notes on Search

Notes on Sorting

Recurrence Relations Tutorial

Dynamic Programming

Graph Traversal: BFS and DFS

Search Algorithm Taxonomy

Binary Trees

B Trees

AVL Trees

Red/Black Trees

Hash Tables

Heaps

Understanding O(log(n))

Some Interesting Algorithms

BinarySearch

BubbleSort

HeapSort

MergeSort

K-Nearest Neighbors Algorithm

K-Nearest Neighbors in Python

Long Short Term Memory

TreeMap

TreeMap Examples

PageRank

Circle Generation

RSA Encryption

## Practice Questions

### *Click on the questions below to reveal the answer*

Define *algorithm* in your own words.

An algorithm is a recipe for achieving a goal.

*More technically:*

In mathematics and computer science, an algorithm is an effective method expressed as a finite list of
well-defined instructions for calculating a function.

What is the difference between *code* and *pseudocode?*

When using **pseudocode** the emphasis is on conveying meaning. Pseudocode is not concerned with syntax
specific to any programming language.

Code refers to syntactically correct code written for a specific programming language.

*For example:*

```c
// This is C code:
printf("The number at position %d is %f", pos, num);
```

```
// This is equivalent pseudocode:
print pos and num
```

### [ Asymptotic Notation: Concepts ]

Write an **informal** definition of Big O

A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = O(g(n)) means it is less than some constant multiple of g(n). The notation is read, "f of n is big oh of g of n".

Write a **formal** definition of Big O

f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.

Write an informal definition of Big $\Omega$

A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation $f(n) = \Omega(g(n))$ means it is more than some constant multiple of g(n).

Write a formal definition of Big $\Omega$

$f(n) = \Omega(g(n))$ means there are positive constants c and k, such that $0 \le cg(n) \le f(n)$ for all $n \ge k$. The values of c and k must be fixed for the function f and must not depend on n.

Write an informal definition of Big $\Theta$

A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation $f(n) = \Theta(g(n))$ means it is within a constant multiple of g(n). The equation is read, "f of n is theta g of n".

Write a formal definition of Big $\Theta$

$f(n) = \Theta(g(n))$ means there are positive constants $c_1, c_2, k$, such that $0 \le c_1g(n) \le f(n) \le c_2g(n)$ for all $n \ge k$. The values of $c_1, c_2, k$ must be fixed for the function f and must not depend on n.

Which notation gives the best case runtime?

Which notation gives the worst case runtime?

Big $\Omega$ gives the best case runtime. An algorithm in $\Omega(g)$ is bounded below by g.

Big O gives the worst case runtime. An algorithm in O(g) is bounded above by g.

### [ Algorithm Analysis ]

What is the worst case runtime for the following code:

```c
for (i = 1; i <= n*n; i++)
    for (j = 0; j <= i; j++)
        sum++;
```

```c
for (i = 1; i <= n*n; i++)   // executed n*n times
    for (j = 0; j <= i; j++) // executed up to n*n times
        sum++;               // constant
```

Running time: $O(n \cdot n \cdot n \cdot n \cdot 1) = O(n^4)$
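The $O(n^4)$ bound can also be checked empirically. The sketch below translates the loop into Python and counts the `sum++` executions; `count_ops` is a name introduced here for illustration, not from the notes:

```python
# Count how many times the innermost statement runs for a given n.
def count_ops(n):
    count = 0
    for i in range(1, n * n + 1):      # i = 1 .. n*n
        for j in range(0, i + 1):      # j = 0 .. i
            count += 1                 # stands in for sum++
    return count

# The ratio count / n^4 settles toward a constant (about 0.5),
# which is exactly what membership in O(n^4) predicts.
for n in [5, 10, 20]:
    ops = count_ops(n)
    print(n, ops, ops / n**4)
```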

What is the worst case runtime for the following code:

```python
def power_of_2(a):
    x = 0
    while a > 1:
        a = a/2
        x = x+1
    return x
```

```python
def power_of_2(a):
    x = 0            # constant
    while a > 1:     # runs log(a) times
        a = a/2      # constant
        x = x+1      # constant
    return x         # constant
```

Given that the number passed is divided in half at each iteration, the loop only runs log(a) times.

Hence, the runtime is O(log(n)), where n is the value passed in.
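The logarithmic behaviour is easy to observe by running the function alongside `math.log2` (this sketch keeps the float division from the code above):

```python
import math

# Each pass halves a, so the iteration count grows like log2(a).
def power_of_2(a):
    x = 0
    while a > 1:
        a = a / 2
        x = x + 1
    return x

# The returned count tracks log2(a) (it equals ceil(log2(a)) for a > 1).
for a in [2, 8, 1024, 10**6]:
    print(a, power_of_2(a), math.log2(a))
```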

### [ Asymptotic Notation: Problems ]

Prove that $4n^2+6n+8 \in O(n^2)$

From the definition of $O(n^2)$, we need constants $c \gt 0$ and $k$ such that
$$0 \le 4n^2+6n+8 \le cn^2 \text{ for all } n \ge k$$
For $n \ge 1$ we have $6n \le 6n^2$ and $8 \le 8n^2$, so
$$4n^2+6n+8 \le 4n^2+6n^2+8n^2 = 18n^2$$
Choosing $c = 18$ and $k = 1$ therefore satisfies the definition. QED
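The inequality behind this proof can be sanity-checked numerically. The sketch below uses $c = 18$, $k = 1$ as one valid choice of witness constants (other choices work too):

```python
# Check 4n^2 + 6n + 8 <= c*n^2 for all n in [k, 10000]
# with the witness constants c = 18, k = 1.
def f(n):
    return 4 * n**2 + 6 * n + 8

c, k = 18, 1
violations = [n for n in range(k, 10001) if f(n) > c * n**2]
print(violations)  # an empty list means the bound held everywhere tested
```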

Prove that $n^2+5n+23 \notin O(n)$

Assume that $n^2+5n+23 \in O(n)$. Then there are constants $c, k$ such that, for all $n \ge k$:
$$0 \le n^2+5n+23 \le cn$$
$$0 \le n^2 \le cn$$
$$0 \le n \le c$$
(The second line follows because dropping the positive lower-order terms only makes the left side smaller.)

But no fixed $c$ can satisfy $n \le c$ for all $n \ge k$, a contradiction.

Hence, $n^2+5n+23 \notin O(n)$. QED

Prove that $77n^3-2n^2 \in \Theta(n^3)$

We need to show that $\exists c_1, c_2, n_0 \gt 0$ such that:
$$0 \le c_1n^3 \le 77n^3-2n^2 \le c_2n^3$$
Dividing through by $n^3$ (valid for $n \gt 0$):
$$0 \le c_1 \le 77-2n^{-1} \le c_2$$

For $n \ge 1$, the middle term satisfies $75 \le 77-2n^{-1} \lt 77$. So choosing $n_0 = 1$, $c_1 = 75$, and $c_2 = 77$
makes $0 \le c_1n^3 \le 77n^3-2n^2 \le c_2n^3$ hold for all $n \ge n_0$.
Therefore, $77n^3-2n^2 \in \Theta(n^3)$. QED
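As a quick numeric check of the sandwich, this sketch tests the pair $c_1 = 75$, $c_2 = 77$ (one workable choice of constants) over a range of $n$:

```python
# Verify c1*n^3 <= 77n^3 - 2n^2 <= c2*n^3 for n = 1 .. 10000.
def h(n):
    return 77 * n**3 - 2 * n**2

c1, c2 = 75, 77
ok = all(c1 * n**3 <= h(n) <= c2 * n**3 for n in range(1, 10001))
print(ok)  # True means both bounds held at every n tested
```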

### [ Asymptotic Notation: Proofs ]

Disprove $f \in O(g) \rightarrow f \in \Theta(g)$

Let $f(n) = n$ and $g(n) = n^2$. Then $f \in O(g)$ but $f \notin \Omega(g)$, so $f \notin \Theta(g)$. QED

For the general case (i.e., arbitrary $f$ and $g$), prove or disprove $f \notin \omega(g) \rightarrow f \in O(g)$.

The statement is false in general. $f \notin \omega(g)$ only means there is some $c \gt 0$ for which $f(n) \le cg(n)$ holds at infinitely many $n$ — not for all $n$ beyond some $n_0$, which is what $f \in O(g)$ requires.

Counterexample: let $g(n) = n$, and let $f(n) = n^2$ for even $n$ and $f(n) = 1$ for odd $n$. Then $f \notin \omega(g)$, since $f(n) \le g(n)$ at every odd $n$; but $f \notin O(g)$, since $f(n)/g(n) = n$ is unbounded over the even $n$. QED

### [ Recurrence Relations ]

Write a recursive function to calculate the
Fibonacci sequence that runs in $O(2^n)$ time.

```c
int Fibonacci(int number)
{
    if (number <= 1)
        return number;
    return Fibonacci(number - 2) + Fibonacci(number - 1);
}
```
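The exponential blow-up is visible if the function is instrumented to count calls. This is a Python translation of the C function above, with a call counter added for illustration:

```python
# Naive recursive Fibonacci, instrumented with a mutable call counter.
def fibonacci(number, counter):
    counter[0] += 1
    if number <= 1:
        return number
    return fibonacci(number - 2, counter) + fibonacci(number - 1, counter)

# The call count roughly doubles as n grows, matching the O(2^n) bound.
for n in [10, 15, 20]:
    counter = [0]
    result = fibonacci(n, counter)
    print(n, result, counter[0])
```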

What are recurrence relations?

What does T(0) represent?

What does T(n) represent?

Recurrence relations are used to determine the running time of recursive programs -
recurrence relations themselves are recursive.

T(0) = time to solve problem of size 0. It's the base case.

T(n) = time to solve problem of size n. It's the recursive case.

Solve $T(n) = 2T(n/2)+n$ using the substitution method.

We guess that $T(n) = O(n\log(n))$ since it is in the form of a divide-and-conquer recurrence.
We must show that $T(n) \le cn\log(n)$ for some $c \gt 0$. Assume inductively that $T(n/2) \le c(n/2)\log(n/2)$, then substitute and simplify (logs are base 2):
$$T(n) = 2T(n/2)+n$$
$$T(n) \le 2\left(c(n/2)\log(n/2)\right)+n$$
$$T(n) \le cn\log(n/2)+n$$
$$T(n) \le cn\log(n)-cn\log(2)+n$$
$$T(n) \le cn\log(n)-cn+n$$

$cn\log(n)$ is the dominating term; the remainder $-cn+n$ is nonpositive when $c \ge 1$.

$\therefore T(n) \le cn\log(n)$, as long as $c \ge 1$.
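The guess can also be checked by evaluating the recurrence directly. This sketch assumes a base case of $T(1) = 1$ and compares against $2n\log_2 n$ at powers of two:

```python
import math

# Evaluate T(n) = 2T(n/2) + n with the assumed base case T(1) = 1.
def T(n):
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# At n = 2^k the exact value (k+1)*2^k stays below 2*n*log2(n) for k >= 1.
for k in range(1, 11):
    n = 2**k
    print(n, T(n), 2 * n * math.log2(n))
```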

Show by substitution that $T(n) = 8T(n/5)+n \rightarrow T(n) \in O(n^2)$.

Assume inductively that $T(n/5) \le c(n/5)^2$. Then:
$$T(n) \le 8c(n/5)^2+n = \tfrac{8}{25}cn^2+n$$
$$\tfrac{8}{25}cn^2+n \le cn^2 \text{ for } c \ge \tfrac{25}{17} \text{ and } n \ge 1$$
$$\therefore T(n) \in O(n^2)$$

### [ Dynamic Programming ]

What is the main idea behind dynamic programming?

If you have solved a problem with the given input, then save the result for future reference, and avoid solving the same problem again.

How can you tell an algorithm is a good candidate for dynamic programming?

What is the difference between dynamic programming and divide-and-conquer?

If the given problem can be broken up into smaller subproblems, and these smaller subproblems are in
turn divided into still-smaller ones, and in this process you observe some **OVERLAPPING**
subproblems, then it's a good candidate for dynamic programming.

In divide-and-conquer algorithms we divide the problem into **NON-OVERLAPPING** subproblems and
solve them independently, as in mergesort and quicksort.

What is the difference between the top-down (memoization) and bottom-up (dynamic programming) approaches to solving a problem?

**Top-Down:** Start solving the given problem by breaking it down.
If you see that the problem has been solved already, then just return the saved answer.
If it has not been solved, solve it and save the answer.
This is usually easy to think of and very intuitive. This is referred to as **Memoization**.

**Bottom-Up:** Analyze the problem, determine the order in which the subproblems are solved, and
start solving from the trivial subproblem, up towards the given problem.
In this process, it is guaranteed that each subproblem is solved before any problem that depends on it.
This is referred to as **Dynamic Programming**.
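Fibonacci is a convenient place to see both styles side by side. This is an illustrative sketch, not code from the notes:

```python
from functools import lru_cache

# Top-down (memoization): recurse on the problem, caching each solved
# subproblem so it is never recomputed.
@lru_cache(maxsize=None)
def fib_top_down(n):
    if n <= 1:
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)

# Bottom-up (dynamic programming): solve subproblems in order, starting
# from the trivial cases and building up to n.
def fib_bottom_up(n):
    if n <= 1:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr

print(fib_top_down(50), fib_bottom_up(50))
```

Both run in linear time, in contrast to the exponential naive recursion; the bottom-up version additionally needs only constant extra space.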