Victor's CS Site [ COP4531 ]

Algorithm Design and Analysis


Practice Questions


Define algorithm in your own words.

An algorithm is a recipe for achieving a goal.

More technically:

In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function.

What is the difference between code and pseudocode?

When using pseudocode the emphasis is on conveying meaning. Pseudocode is not concerned with syntax specific to any programming language.

Code refers to syntactically correct code written for a specific programming language.

For example:

//This is C code:
printf("The number at position %d is %f", pos, num);

//This is equivalent pseudocode:
print pos and num


[ Asymptotic Notation: Concepts ]

Write an informal definition of Big O

A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = O(g(n)) means that f(n) is at most some constant multiple of g(n) for large enough n. The notation is read, "f of n is big oh of g of n".

Write a formal definition of Big O

f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
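For example, $3n+10 \in O(n)$: taking $c = 4$ and $k = 10$ gives $$0 \le 3n+10 \le 4n \text{ for all } n \ge 10$$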

Write an informal definition of Big $\Omega$

A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation $f(n) = \Omega(g(n))$ means that f(n) is at least some constant multiple of g(n) for large enough n.

Write a formal definition of Big $\Omega$

$f(n) = \Omega(g(n))$ means there are positive constants c and k, such that $0 \le cg(n) \le f(n)$ for all $n \ge k$. The values of c and k must be fixed for the function f and must not depend on n.

Write an informal definition of Big $\Theta$

A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation $f(n) = \Theta(g(n))$ means that f(n) is within constant multiples of g(n) for large enough n. The equation is read, "f of n is theta of g of n".

Write a formal definition of Big $\Theta$

$f(n) = \Theta(g(n))$ means there are positive constants $c_1, c_2, k$, such that $0 \le c_1g(n) \le f(n) \le c_2g(n)$ for all $n \ge k$. The values of $c_1, c_2, k$ must be fixed for the function f and must not depend on n.

Which notation gives the best case runtime?

Which notation gives the worst case runtime?

Big $\Omega$ gives the best case runtime. An algorithm in $\Omega(g)$ is bounded below by g.

Big O gives the worst case runtime. An algorithm in O(g) is bounded above by g.


[ Algorithm Analysis ]

What is the worst case runtime for the following code:

  for (i = 1; i <= n*n; i++)
    for (j = 0; j <= i; j++)
      sum++;

		
      for (i = 1; i <= n*n; i++)   // Executed n*n times
         for (j = 0; j <= i; j++)  // Executed up to n*n times
             sum++;                // Constant

      Running time: $O(n^2 \cdot n^2 \cdot 1) = O(n^4)$
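More precisely, the inner loop body runs $i+1$ times for each value of $i$, so the exact number of increments is $$\sum_{i=1}^{n^2}(i+1) = \frac{n^2(n^2+1)}{2}+n^2 \in \Theta(n^4)$$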
		
      

What is the worst case runtime for the following code:

  def power_of_2(a):
    x = 0
    while a > 1:
      a = a/2
      x = x+1
    return x

		
      def power_of_2(a):
          x = 0            # constant
          while a > 1:     # runs log(a) times
              a = a / 2    # constant
              x = x + 1    # constant
          return x         # constant
		
      

Given that the number passed in is divided in half at each iteration of the loop, the loop runs only $\log_2(a)$ times.

Hence, the runtime is $O(\log n)$, taking $n$ to be the input value $a$.
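To see why, note that after $k$ iterations the remaining value is $a/2^k$, and the loop exits once that value reaches 1: $$\frac{a}{2^k} \le 1 \iff k \ge \log_2 a$$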


[ Asymptotic Notation: Problems ]

Prove that $4n^2+6n+8 \in O(n^2)$

From the definition of $O(n^2)$, we need constants $c, k \gt 0$ such that $$0 \le 4n^2+6n+8 \le cn^2 \text{ for all } n \ge k$$ For $n \ge 1$, each lower-order term is bounded by a multiple of $n^2$: $6n \le 6n^2$ and $8 \le 8n^2$. Hence $$4n^2+6n+8 \le 4n^2+6n^2+8n^2 = 18n^2$$

Taking $c = 18$ and $k = 1$ satisfies the definition, so $4n^2+6n+8 \in O(n^2)$. QED

Prove that $n^2+5n+23 \notin O(n)$

Assume that $n^2+5n+23 \in O(n)$. Then there are constants $c, k \gt 0$ such that $$0 \le n^2+5n+23 \le cn \text{ for all } n \ge k$$ Since $n^2 \le n^2+5n+23$, this would require $$n^2 \le cn$$ $$n \le c$$ for all $n \ge k$.

But no constant $c$ satisfies $n \le c$ for arbitrarily large $n$, a contradiction.
Hence, $n^2+5n+23 \notin O(n)$. QED

Prove that $77n^3-2n^2 \in \Theta(n^3)$

We need to show that $\exists c_1, c_2, n_0 \gt 0$ such that $$0 \le c_1n^3 \le 77n^3-2n^2 \le c_2n^3$$ for all $n \ge n_0$. Dividing through by $n^3$: $$0 \le c_1 \le 77-2n^{-1} \le c_2$$
For $n \ge 1$, the quantity $77-2n^{-1}$ lies between $75$ and $77$, so choosing $n_0 = 1$, $c_1 = 75$, and $c_2 = 77$ makes the statement $0 \le c_1n^3 \le 77n^3-2n^2 \le c_2n^3$ valid for all $n \ge n_0$. Therefore, $77n^3-2n^2 \in \Theta(n^3)$. QED


[ Asymptotic Notation: Proofs ]

Disprove $f \in O(g) \rightarrow f \in \Theta(g)$

Let $f(n) = n$ and $g(n) = n^2$. Then $f \in O(g)$ but $f \notin \Omega(g)$, so $f \notin \Theta(g)$. QED

For the general case (i.e., arbitrary $f$ and $g$), prove or disprove $f \notin \omega(g) \rightarrow f \in O(g)$.

The implication is false for arbitrary $f$ and $g$; we disprove it with a counterexample. Let $g(n) = n$, and let $f(n) = n^2$ for even $n$ and $f(n) = 1$ for odd $n$. Then $f \notin \omega(g)$, since $f(n) = 1 \le g(n)$ for infinitely many (odd) $n$. But $f \notin O(g)$ either, since $f(n) = n^2 \gt cn$ for arbitrarily large even $n$, no matter the constant $c$.

Intuitively, $O$ and $\omega$ are not complements: failing the $\omega$ lower bound only says $f$ dips below $cg$ infinitely often, not that $f$ stays below $cg$ from some point on. Hence $f \notin \omega(g)$ does not imply $f \in O(g)$. QED


[ Recurrence Relations ]

Write a recursive function to calculate the Fibonacci sequence that runs in $O(2^n)$ time.

	
   int Fibonacci(int number)
   {
      if (number <= 1)
         return number;

      return Fibonacci(number - 2) + Fibonacci(number - 1);
   }
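Why is this $O(2^n)$? Each call makes two more recursive calls, so the running time satisfies $$T(n) = T(n-1)+T(n-2)+O(1) \le 2T(n-1)+O(1)$$ which solves to $T(n) \in O(2^n)$.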
	
	

What are recurrence relations?
What does T(0) represent?
What does T(n) represent?


Recurrence relations are used to determine the running time of recursive programs - recurrence relations themselves are recursive.

T(0) = time to solve problem of size 0. It's the base case.
T(n) = time to solve problem of size n. It's the recursive case.

Solve $T(n) = 2T(n/2)+n$ using the substitution method.

We guess that $T(n) = O(n\log n)$, since the recurrence has the standard divide-and-conquer form. We must show that $T(n) \le cn\log n$ for some $c \gt 0$. Assume the bound holds for $n/2$, substitute, and simplify: $$T(n) \le 2\left(c\frac{n}{2}\log\frac{n}{2}\right)+n$$ $$T(n) \le cn\log\frac{n}{2}+n$$ $$T(n) \le cn\log n-cn\log 2+n$$ $$T(n) \le cn\log n-cn+n$$

Since $-cn+n \le 0$ whenever $c \ge 1$, the last line gives $T(n) \le cn\log n$ (logs are base 2, so $\log 2 = 1$).
$\therefore T(n) \in O(n\log n)$, as long as $c \ge 1$.

Show by substitution that $T(n) = 8T(n/5)+n \rightarrow T(n) \in O(n^2)$.

Assume the bound holds for $n/5$, i.e., $T(n/5) \le c(n/5)^2$, and substitute: $$T(n) \le 8c\left(\frac{n}{5}\right)^2+n = \frac{8}{25}cn^2+n$$ For $c \ge 2$ and $n \ge 1$, we have $\frac{8}{25}cn^2+n \le cn^2$. $$\therefore T(n) \in O(n^2)$$


[ Dynamic Programming ]

What is the main idea behind dynamic programming?

If you have solved a problem with the given input, then save the result for future reference, and avoid solving the same problem again.

How can you tell an algorithm is a good candidate for dynamic programming?

What is the difference between dynamic programming and divide-and-conquer?

If the given problem can be broken up into smaller subproblems, and these smaller subproblems can in turn be divided into still-smaller ones, and in this process you observe some OVERLAPPING subproblems, then it's a good candidate for dynamic programming.

In divide-and-conquer algorithms, we divide the problem into NON-OVERLAPPING subproblems and solve them independently, as in mergesort and quicksort.

What is the difference between the top-down (memoization) and bottom-up (dynamic programming) approaches to solving a problem?

Top-Down: Start solving the given problem by breaking it down. If you see that a subproblem has already been solved, just return the saved answer; if it has not been solved, solve it and save the answer. This approach is usually easy to think of and very intuitive. It is referred to as memoization.
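For instance, here is a minimal Python sketch of the top-down approach, applied to the Fibonacci function from the recurrence-relations section (using functools.lru_cache as the cache; a hand-rolled dictionary works just as well):

   from functools import lru_cache

   # Top-down: same recursive structure as the O(2^n) version above, but
   # every subproblem's answer is cached, so each n is computed only once.
   @lru_cache(maxsize=None)
   def fib_memo(n):
       if n <= 1:
           return n
       return fib_memo(n - 2) + fib_memo(n - 1)

   print(fib_memo(50))  # 12586269025, in O(n) time instead of O(2^n)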

Bottom-Up: Analyze the problem, determine the order in which the subproblems should be solved, and start solving from the trivial subproblem up towards the given problem. This process guarantees that the subproblems are solved before they are needed. This approach is referred to as dynamic programming.
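And a matching bottom-up sketch, which solves the trivial subproblems $F(0)$ and $F(1)$ first and builds upward:

   # Bottom-up: every subproblem is already solved by the time it is needed.
   def fib_bottom_up(n):
       if n <= 1:
           return n
       prev, curr = 0, 1  # F(0), F(1)
       for _ in range(2, n + 1):
           prev, curr = curr, prev + curr
       return curr

   print(fib_bottom_up(50))  # 12586269025, O(n) time and O(1) extra space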