CE100 Algorithms and Programming II

CE100 Algorithms and Programming II

Week-3 (Matrix Multiplication/ Quick Sort)

Spring Semester, 2021-2022

Download DOC-PDF, DOC-DOCX, SLIDE, PPTX

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication / Quick Sort

Outline (1)

  • Matrix Multiplication

    • Traditional

    • Recursive

    • Strassen

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Outline (2)

  • Quicksort

    • Hoare Partitioning

    • Lomuto Partitioning

    • Recursive Sorting

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Outline (3)

  • Quicksort Analysis

    • Randomized Quicksort

    • Randomized Selection

      • Recursive

      • Medians

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication (1)

  • Input: A=[aij],B=[bij]A=[a_{ij}],B=[b_{ij}]
  • Output: C=[cij]=ABC=[c_{ij}]=A \cdot B i,j=1,2,3,,n\Longrightarrow i,j=1,2,3, \dots, n

[c11c12c1nc21c22c2ncn1cn2cnn]=[a11a12a1na21a22a2nan1an2ann][b11b12b1nb21b22b2nbn1an2bnn]\begin{bmatrix} c_{11} & c_{12} & \dots & c_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & \vdots & \vdots & \ddots \\ c_{n1} & c_{n2} & \dots & c_{nn} \\ \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots \\ a_{n1} & a_{n2} & \dots & a_{nn} \\ \end{bmatrix} \cdot \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \vdots & \ddots \\ b_{n1} & a_{n2} & \dots & b_{nn} \\ \end{bmatrix}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication (2)

alt:"alt" center

  • cij=1knaik.bkjc_{ij}=\sum \limits_{1\leq k \leq n}^{}a_{ik}.b_{kj}
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Standard Algorithm

Running Time: Θ(n3)\Theta(n^3)

for i=1 to n do
    for j=1 to n do
        C[i,j] = 0
        for k=1 to n do
            C[i,j] = C[i,j] + A[i,k] + B[k,j]
        endfor
    endfor
endfor
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Divide & Conquer (1)

IDEA: Divide the nxnnxn matrix into 2x22x2 matrix of (n/2)x(n/2)(n/2)x(n/2) submatrices.

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Divide & Conquer (2)

[c11c12c21c22]=[a11a12a21a22][b11b12b21b22]\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \cdot \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}

8 mults and 4 adds of (n/2)*(n/2) submatrices={c11=a11b11+a12b21c21=a21b11+a22b21c12=a11b12+a12b22c22=a21b12+a22b22\text{8 mults and 4 adds of (n/2)*(n/2) submatrices}= \begin{cases} c_{11}=a_{11}b_{11}+a_{12}b_{21} \\ c_{21}=a_{21}b_{11}+a_{22}b_{21} \\ c_{12}=a_{11}b_{12}+a_{12}b_{22} \\ c_{22}=a_{21}b_{12}+a_{22}b_{22} \end{cases}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Divide & Conquer (3)

MATRIX-MULTIPLY(A, B)
    // Assuming that both A and B are nxn matrices
    if n == 1 then 
        return A * B
    else  
        //partition A, B, and C as shown before
        C[1,1] = MATRIX-MULTIPLY (A[1,1], B[1,1]) + 
                 MATRIX-MULTIPLY (A[1,2], B[2,1]); 

        C[1,2] = MATRIX-MULTIPLY (A[1,1], B[1,2]) + 
                MATRIX-MULTIPLY (A[1,2], B[2,2]); 

        C[2,1] = MATRIX-MULTIPLY (A[2,1], B[1,1]) + 
        MATRIX-MULTIPLY (A[2,2], B[2,1]);

        C[2,2] = MATRIX-MULTIPLY (A[2,1], B[1,2]) + 
        MATRIX-MULTIPLY (A[2,2], B[2,2]);
    endif      

    return C
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Divide & Conquer Analysis

T(n)=8T(n/2)+Θ(n2)T(n) = 8T(n/2) + \Theta(n^2)

  • 88 recursive calls 8T()\Longrightarrow 8T(\cdots)
  • each problem has size n/2n/2 T(n/2)\Longrightarrow \cdots T(n/2)
  • Submatrix addition Θ(n2)\Longrightarrow \Theta(n^2)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Solving the Recurrence

  • T(n)=8T(n/2)+Θ(n2)T(n) = 8T(n/2) + \Theta(n^2)

    • a=8a=8, b=2b=2
    • f(n)=Θ(n2)f(n)=\Theta(n^2)
    • nlogba=n3n^{log_b^a}=n^3
  • Case 1: nlogbaf(n)=Ω(nε)T(n)=Θ(nlogba)\frac{n^{log_b^a}}{f(n)}=\Omega(n^{\varepsilon}) \Longrightarrow T(n)=\Theta(n^{log_b^a})

Similar with ordinary (iterative) algorithm.

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Strassen’s Idea (1)

Compute c11,c12,c21,c22c_{11},c_{12},c_{21},c_{22} using 77 recursive multiplications.

In normal case we need 88 as below.

[c11c12c21c22]=[a11a12a21a22][b11b12b21b22]\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \cdot \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}

8 mults and 4 adds of (n/2)*(n/2) submatrices={c11=a11b11+a12b21c21=a21b11+a22b21c12=a11b12+a12b22c22=a21b12+a22b22\text{8 mults and 4 adds of (n/2)*(n/2) submatrices}= \begin{cases} c_{11}=a_{11}b_{11}+a_{12}b_{21} \\ c_{21}=a_{21}b_{11}+a_{22}b_{21} \\ c_{12}=a_{11}b_{12}+a_{12}b_{22} \\ c_{22}=a_{21}b_{12}+a_{22}b_{22} \end{cases}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Strassen’s Idea (2)

  • Reminder:
    • Each submatrix is of size (n/2)(n/2)(n/2)*(n/2)
    • Each add/sub operation takes Θ(n2)\Theta(n^2) time
  • Compute P1P7P1 \dots P7 using 77 recursive calls to matrix-multiply

P1=a11(b12b22)P2=(a11+a12)b22P3=(a21+a22)b11P4=a22(b21b11)P5=(a11+a22)(b11+b22)P6=(a12a22)(b21+b22)P7=(a11a21)(b11+b12)\begin{align*} P_1 & = a_{11} * (b_{12} - b_{22} ) \\ P_2 & = (a_{11} + a_{12} ) * b_{22} \\ P_3 & = (a_{21} + a_{22} ) * b_{11} \\ P_4 & = a_{22} * (b_{21} - b_{11} ) \\ P_5 & = (a_{11} + a_{22} ) * (b_{11} + b_{22} ) \\ P_6 & = (a_{12} - a_{22} ) * (b_{21} + b_{22} ) \\ P_7 & = ( a_{11} - a_{21} ) * (b_{11} + b_{12} ) \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Strassen’s Idea (3)

P1=a11(b12b22)P2=(a11+a12)b22P3=(a21+a22)b11P4=a22(b21b11)P5=(a11+a22)(b11+b22)P6=(a12a22)(b21+b22)P7=(a11a21)(b11+b12)\begin{align*} P_1 &= a_{11} * (b_{12} - b_{22} ) \\ P_2 &= (a_{11} + a_{12} ) * b_{22} \\ P_3 &= (a_{21} + a_{22} ) * b_{11} \\ P_4 &= a_{22} * (b_{21} - b_{11} ) \\ P_5 &= (a_{11} + a_{22} ) * (b_{11} + b_{22} ) \\ P_6 &= (a_{12} - a_{22} ) * (b_{21} + b_{22} ) \\ P_7 &= ( a_{11} - a_{21} ) * (b_{11} + b_{12} ) \end{align*}

  • How to compute cijc_{ij} using P1P7P1 \dots P7 ?

c11=P5+P4P2+P6c12=P1+P2c21=P3+P4c22=P5+P1P3P7\begin{align*} c_{11} & = P_5 + P_4 – P_2 + P_6 \\ c_{12} & = P_1 + P_2 \\ c_{21} & = P_3 + P_4 \\ c_{22} & = P_5 + P_1 – P_3 – P_7 \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Strassen’s Idea (4)

  • 77 recursive multiply calls
  • 1818 add/sub operations
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication: Strassen’s Idea (5)

e.g. Show that c12=P1+P2c_{12} = P_1+P_2 :

c12=P1+P2=a11(b12b22)+(a11+a12)b22=a11b12a11b22+a11b22+a12b22=a11b12+a12b22\begin{align*} c_{12} & = P_1 + P_2 \\ &= a_{11}(b_{12}–b_{22})+(a_{11}+a_{12})b_{22} \\ &= a_{11}b_{12}-a_{11}b_{22}+a_{11}b_{22}+a_{12}b_{22} \\ &= a_{11}b_{12}+a_{12}b_{22} \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Strassen’s Algorithm

  • Divide: Partition AA and BB into (n/2)(n/2)(n/2)*(n/2) submatrices. Form terms to be multiplied using ++ and -.

  • Conquer: Perform 77 multiplications of (n/2)(n/2)(n/2)*(n/2) submatrices recursively.

  • Combine: Form CC using ++ and on (n/2)(n/2)(n/2)*(n/2)submatrices.

Recurrence: T(n)=7T(n/2)+Θ(n2)T(n) = 7T(n/2) + \Theta(n^2)

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Strassen’s Algorithm: Solving the Recurrence (1)

  • T(n)=7T(n/2)+Θ(n2)T(n) = 7T(n/2) + \Theta(n^2)

    • a=7a=7, b=2b=2
    • f(n)=Θ(n2)f(n)=\Theta(n^2)
    • nlogba=nlg7n^{log_b^a}=n^{lg7}
  • Case 1: nlogbaf(n)=Ω(nε)T(n)=Θ(nlogba)\frac{n^{log_b^a}}{f(n)}=\Omega(n^{\varepsilon}) \Longrightarrow T(n)=\Theta(n^{log_b^a})

T(n)=Θ(nlog27)T(n)=\Theta(n^{log_2^7})

23=8,22=42^3 = 8, 2^2=4 so log272.81\Longrightarrow log_2^7 \approx 2.81

or use https://www.omnicalculator.com/math/log

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Strassen’s Algorithm: Solving the Recurrence (2)

  • The number 2.812.81 may not seem much smaller than 33

  • But, it is significant because the difference is in the exponent.

  • Strassen’s algorithm beats the ordinary algorithm on today’s machines for n30n \geq 30 or so.

  • Best to date: Θ(n2.376)\Theta(n^{2.376 \dots}) (of theoretical interest only)

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication Solution Faster Than Strassen's Algorithm

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication Solution Faster Than Strassen's Algorithm

For example, if the traditional algorithm taught in school multiplies a 4x5 by 5x5 matrix using 100 multiplications, and this number was reduced to 80 with human ingenuity, AlphaTensor has found algorithms that do the same operation using just 76 multiplications.

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Matrix Multiplication Solution Faster Than Strassen's Algorithm

![center h:450px]

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Standard Multiplication

center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Standard and Strassen Comparison

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Improved Solution

RTEU CE100 Week-3
CE100 Algorithms and Programming II

How it's done

"Single-player game played by AlphaTensor, where the goal is to find a correct matrix multiplication algorithm. The state of the game is a cubic array of numbers (shown as grey for 0, blue for 1, and green for -1), representing the remaining work to be done."

center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Maximum Subarray Problem

Input: An array of values

Output: The contiguous subarray that has the largest sum of elements

  • Input array:
    [13][3][25][20][3][16][23][18][20][7][12]max. contiguous subarray[22][4][7][13][-3][-25][20][-3][-16][-23]\overbrace{[18][20][-7][12]}^{\textrm{max. contiguous subarray}}[-22][-4][7]
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Maximum Subarray Problem: Divide & Conquer (1)

  • Basic idea:
    • Divide the input array into 2 from the middle
    • Pick the best solution among the following:
      • The max subarray of the left half
      • The max subarray of the right half
      • The max subarray crossing the mid-point
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Maximum Subarray Problem: Divide & Conquer (2)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Maximum Subarray Problem: Divide & Conquer (3)

  • Divide: Trivial (divide the array from the middle)

  • Conquer: Recursively compute the max subarrays of the left and right halves

  • Combine: Compute the max-subarray crossing the midpointmid-point

    • (can be done in Θ(n)\Theta(n) time).
    • Return the max among the following:
      • the max subarray of the left-subarray\text{left-subarray}
      • the max subarray of the rightsubarray\text{rightsubarray}
      • the max subarray crossing the mid-point\text{mid-point}

TODO : detailed solution in textbook...

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Conclusion : Divide & Conquer

  • Divide and conquer is just one of several powerful techniques for algorithm design.
  • Divide-and-conquer algorithms can be analyzed using recurrences and the master method (so practice this math).
  • Can lead to more efficient algorithms
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Quicksort (1)

  • One of the most-used algorithms in practice
  • Proposed by C.A.R. Hoare in 1962.
  • Divide-and-conquer algorithm
  • In-place algorithm
    • The additional space needed is O(1)
    • The sorted array is returned in the input array
    • Reminder: Insertion-sort is also an in-place algorithm, but Merge-Sort is not in-place.
  • Very practical
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Quicksort (2)

  • Divide: Partition the array into 2 subarrays such that elements in the lower part \leq elements in the higher part
  • Conquer: Recursively sort 2 subarrays
  • Combine: Trivial (because in-place)

Key: Linear-time (Θ(n))(\Theta(n)) partitioning algorithm

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Divide: Partition the array around a pivot element

  • Choose a pivot element xx

  • Rearrange the array such that:

    • Left subarray: All elements x\leq x
    • Right subarray: All elements x\geq x

    alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Conquer: Recursively Sort the Subarrays

Note: Everything in the left subarray ≤ everything in the right subarray

alt:"alt" center

Note: Combine is trivial after conquer. Array already sorted.

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Two partitioning algorithms

  • Hoare’s algorithm:
    Partitions around the first element of subarray
    • (pivot=x=A[p])(pivot = x = A[p])

alt:"alt" center

  • Lomuto’s algorithm:
    Partitions around the last element of subarray
    • (pivot=x=A[r])(pivot = x = A[r])

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm (1)

  • Choose a pivot element: pivot=x=A[p]pivot = x = A[p]

alt:"alt" center

  • Grow two regions:
    • from left to right: A[pi]A[p \dots i]
    • from right to left: A[jr]A[j \dots r]
      • such that:
    • every element in A[pi]A[p \dots i] \leq pivot
    • every element in A[pi]A[p \dots i] \geq pivot
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm (2)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm (3)

  • Elements are exchanged when
    • A[i]A[i] is too large to belong to the left region
    • A[j]A[j] is too small to belong to the right region
      • assuming that the inequality is strict
  • The two regions A[pi]A[p \dots i] and A[jr]A[j \dots r] grow until A[i]pivotA[j]A[i] \geq pivot \geq A[j]
  H-PARTITION(A, p, r)
        pivot = A[p]
          i = p - 1
          j = r - 1
          while true do
            repeat j = j - 1 until A[j] <= pivot
            repeat i = i - 1 until A[i] <= pivot
          if i < j then 
            exchange A[i] with A[j]
          else 
            return j
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-1)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-2)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-3)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-4)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-5)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-6)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-7)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-8)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-9)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-10)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-11)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm Example (Step-12)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm - Notes

  • Elements are exchanged when
    • A[i]A[i] is too large to belong to the left region
    • A[j]A[j] is too small to belong to the right region
      • assuming that the inequality is strict
  • The two regions A[pi]A[p \dots i] and A[jr]A[j \dots r] grow until A[i]pivotA[j]A[i] \geq pivot \geq A[j]
  • The asymptotic runtime of Hoare’s partitioning algorithm Θ(n)\Theta(n)
  H-PARTITION(A, p, r)
        pivot = A[p]
          i = p - 1
          j = r - 1
          while true do
            repeat j = j - 1 until A[j] <= pivot
            repeat i = i - 1 until A[i] <= pivot
          if i < j then exchange A[i] with A[j]
          else return j
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Quicksort with Hoare’s Partitioning Algorithm

QUICKSORT (A, p, r)
  if p < r then
    q = H-PARTITION(A, p, r)
    QUICKSORT(A, p, q)
    QUICKSORT(A, q + 1, r)
  endif

Initial invocation: QUICKSORT(A,1,n)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Hoare’s Partitioning Algorithm: Pivot Selection

  • if we select pivot to be A[r]A[r] instead of A[p]A[p] in H-PARTITION

alt:"alt" center

alt:"alt" center

  • Consider the example where A[r]A[r] is the largest element in the array:
    • End of H-PARTITION: i=j=ri = j = r
    • In QUICKSORT: q=rq = r
      • So, recursive call to:
        • QUICKSORT(A, p, q=r)
          • infinite loop
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Correctness of Hoare’s Algorithm (1)

We need to prove 33 claims to show correctness:

  • Indices ii and jj never reference AA outside the interval A[pr]A[p \dots r]
  • Split is always non-trivial; i.e., jrj \neq r at termination
  • Every element in A[pj]A[p \dots j] \leq every element in A[j+1r]A[j+1 \dots r] at termination

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Correctness of Hoare’s Algorithm (2)

  • Notations:

    • kk: #\# of times the while-loop iterates until termination
    • imi_m: the value of index i at the end of iteration mm
    • jmj_m: the value of index j at the end of iteration mm
    • xx: the value of the pivot element
  • Note: We always have i1=pi_1= p and pj1rp \leq j_1 \leq r
    because x=A[p]x = A[p]

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Correctness of Hoare’s Algorithm (3)

Lemma 1: Either ik=jki_k = j_k or ik=jk+1i_k = j_k+1 at termination

Proof of Lemma 1:

  • The algorithm terminates when iji \geq j (the else condition).

  • So, it is sufficient to prove that ikjk1i_k – j_k \leq 1

  • There are 22 cases to consider:

    • Case 1: k=1k = 1, i.e. the algorithm terminates in a single iteration
    • Case 2: k>1k > 1, i.e. the alg. does not terminate in a single iter.

    By contradiction, assume there is a run with ikjk>1i_k – j_k > 1

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Correctness of Hoare’s Algorithm (4)

Original correctness claims:

  • Indices ii and jj never reference A outside the interval A[pr]A[p \dots r]
  • Split is always non-trivial; i.e., jrj \neq r at termination

Proof:

  • For k=1k = 1:
    • Trivial because i1=j1=pi_1 = j_1 = p (see Case 1 in proof of Lemma 2)
  • For k>1k > 1:
    • ik>pi_k > p and jk<rj_k < r (due to the repeat-until loops moving indices)
    • ikri_k \leq r and jkpj_k \geq p (due to Lemma 1 and the statement above)

The proof of claims (a) and (b) complete

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Correctness of Hoare’s Algorithm (5)

Lemma 2: At the end of iteration mm, where m<km<k (i.e. m is not the last iteration), we must have:
A[pim]xA[p \dots i_m] \leq x and A[jmr]xA[j_m \dots r] \geq x

Proof of Lemma 2:

  • Base case: m=1m=1 and k>1k>1 (i.e. the alg. does not terminate in the first iter.)

Ind. Hyp.: At the end of iteration m1m-1, where m<km<k (i.e. m is not the last iteration), we must have:
A[pim1]xA[p \dots i_m-1] \leq x and A[jm1r]xA[j_m-1 \dots r] \geq x

General case: The lemma holds for mm, where m<km < k

Proof of base case complete!

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Correctness of Hoare’s Algorithm (6)

Original correctness claim:

  • (c) Every element in A[j]A[ \dots j] \leq every element in A[j+r]A[j+ \dots r] at termination

Proof of claim (c)

  • There are 33 cases to consider:
    • Case 1: k=1k=1, i.e. the algorithm terminates in a single iteration
    • Case 2: k>1k>1 and ik=jki_k = j_k
    • Case 3: k>1k>1 and ik=jk+1i_k = j_k + 1
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm (1)

  • Choose a pivot element: pivot=x=A[r]pivot = x = A[r]

alt:"alt" center

  • Grow two regions:
    • from left to right: A[pi]A[p \dots i]
    • from left to right: A[i+1j]A[i+1 \dots j]
      • such that:
        • every element in A[pi]pivotA[p \dots i] \leq pivot
        • every element in A[i+1j]>pivotA[i+1 \dots j] > pivot
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm (2)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-1)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-2)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-3)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-4)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-5)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-6)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-7)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-8)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-9)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-10)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-11)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-12)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-13)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-14)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Lomuto’s Partitioning Algorithm Ex. (Step-15)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Quicksort with Lomuto’s Partitioning Algorithm

QUICKSORT (A, p, r)
  if p < r then
    q = L-PARTITION(A, p, r)
    QUICKSORT(A, p, q - 1)
    QUICKSORT(A, q + 1, r)
  endif

Initial invocation: QUICKSORT(A,1,n)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Comparison of Hoare’s & Lomuto’s Algorithms (1)

  • Notation: n=rp+1n=r-p+1
    • pivot=A[p]pivot=A[p] (Hoare)
    • pivot=A[r]pivot=A[r] (Lomuto)
  • #\# of element exchanges: e(n)e(n)
    • Hoare: 0e(n)n20 \geq e(n) \geq \lfloor \frac{n}{2} \rfloor
      • Best: k=1k=1 with i1=j1=pi_1=j_1=p (i.e., A[p+1r]>pivotA[p+1 \dots r]>pivot)
      • Worst: A[p+1p+n21]pivotA[p+n2r]A[p+1 \dots p+ \lfloor \frac{n}{2} \rfloor - 1] \geq pivot \geq A[p+ \lceil \frac{n}{2} \rceil \dots r]
    • Lomuto : 1e(n)n1 \leq e(n) \leq n
      • Best: A[pr1]>pivotA[p \dots r -1]>pivot
      • Worst: A[pr1]pivotA[p \dots r-1] \leq pivot
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Comparison of Hoare’s & Lomuto’s Algorithms (2)

  • #\# of element comparisons: ce(n)c_e(n)

    • Hoare: n+1ce(n)n+2n+1 \leq c_e(n) \leq n+2
      • Best: ik=jki_k=j_k
      • Worst: ik=jk+1i_k=j_k+1
    • Lomuto: ce(n)=n1c_e(n)=n-1
  • #\# of index comparisons: ci(n)c_i(n)

    • Hoare: 1ci(n)n2+1(ci(n)=e(n)+1)1 \leq c_i(n) \leq \lfloor \frac{n}{2} \rfloor + 1 | (c_i(n)=e(n)+1)
    • Lomuto: ci(n)=n1c_i(n)=n-1
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Comparison of Hoare’s & Lomuto’s Algorithms (3)

  • #\# of index increment/decrement operations: a(n)a(n)

    • Hoare: n+1a(n)n+2(a(n)=ce(n))n+1 \leq a(n) \leq n+2 | (a(n)=c_e(n))
    • Lomuto: na(n)2n1(a(n)=e(n)+(n1))n \leq a(n) \leq 2n-1 | (a(n)=e(n)+(n-1))
  • Hoare’s algorithm is in general faster

  • Hoare behaves better when pivot is repeated in A[pr]A[p \dots r]

    • Hoare: Evenly distributes them between left & right regions
    • Lomuto: Puts all of them to the left region
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Analysis of Quicksort (1)

QUICKSORT (A, p, r)
  if p < r then
    q = H-PARTITION(A, p, r)
    QUICKSORT(A, p, q)
    QUICKSORT(A, q + 1, r)
  endif

Initial invocation: QUICKSORT(A,1,n)

alt:"alt" center

Assume all elements are distinct in the following analysis

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Analysis of Quicksort (2)

  • H-PARTITION always chooses A[p]A[p] (the first element) as the pivot.
  • The runtime of QUICKSORT on an already-sorted array is Θ(n2)\Theta(n^2)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Example: An Already Sorted Array

Partitioning always leads to 22 parts of size 11 and n1n-1

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Worst Case Analysis of Quicksort

  • Worst case is when the PARTITION algorithm always returns imbalanced partitions (of size 11 and n1n-1) in every recursive call.
    • This happens when the pivot is selected to be either the min or max element.
    • This happens for H-PARTITION when the input array is already sorted or reverse sorted

T(n)=T(1)+T(n1)+Θ(n)=T(n1)+Θ(n)=Θ(n2)\begin{align*} T(n) &= T(1) + T(n-1) + Θ(n) \\ &= T(n-1) + Θ(n) \\ &= Θ(n2) \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Worst Case Recursion Tree

T(n)=T(1)+T(n1)+cnT(n) = T(1) + T(n-1) + cn

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Best Case Analysis (for intuition only)

  • If we’re extremely lucky, H-PARTITION splits the array evenly at every recursive call

T(n)=2T(n/2)+Θ(n)=Θ(nlgn)\begin{align*} T(n) &= 2T(n/2) + \Theta(n) \\ &= \Theta(nlgn) \end{align*}

(same as merge sort)

  • Instead of splitting 0.5:0.50.5:0.5, if we split 0.1:0.90.1:0.9 then we need solve following equation.

T(n)=T(n/10)+T(9n/10)+Θ(n)=Θ(nlgn) \begin{align*} T(n) &= T(n/10) + T(9n/10) + \Theta(n) \\ &= \Theta(nlgn) \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

“Almost-Best” Case Analysis

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (1)

  • We have seen that if H-PARTITION always splits the array with 0.1to0.90.1-to-0.9 ratio, the runtime will be Θ(nlgn)\Theta(nlgn).

  • Same is true with a split ratio of 0.01to0.990.01-to-0.99, etc.

  • Possible to show that if the split has always constant (Θ(1))(\Theta(1)) proportionality, then the runtime will be Θ(nlgn)\Theta(nlgn).

  • In other words, for a constant α(0<α0.5)\alpha | (0 < \alpha ≤ 0.5):

    • αto(1α)\alpha–to–(1-\alpha) proportional split yields Θ(nlgn)\Theta(nlgn) total runtime
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (2)

  • In the rest of the analysis, assume that all input permutations are equally likely.
    • This is only to gain some intuition
    • We cannot make this assumption for average case analysis
    • We will revisit this assumption later
  • Also, assume that all input elements are distinct.
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (3)

  • Question: What is the probability that H-PARTITION returns a split that is more balanced than 0.1to0.90.1-to-0.9?
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (4)

Reminder: H-PARTITION will place the pivot in the right partition unless the pivot is the smallest element in the arrays.

Question: If the pivot selected is the mth smallest value (1<mn)(1 < m ≤ n) in the input array, what is the size of the left region after partitioning?

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (5)

  • Question: What is the probability that the pivot selected is the mthm^{th} smallest value in the array of size nn?

    • 1/n1/n (since all input permutations are equally likely)
  • Question: What is the probability that the left partition returned by H-PARTITION has size mm, where 1<m<n1<m<n?

    • 1/n1/n (due to the answers to the previous 2 questions)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (6)

  • Question: What is the probability that H-PARTITION returns a split that is more balanced than 0.1to0.90.1-to-0.9?

Probability=q=0.1n+10.9n11n=1n(0.9n10.1n1+1)=0.81n0.8 for large n\begin{align*} Probability &=\sum \limits_{q=0.1n+1}^{0.9n-1}\frac{1}{n} \\ &=\frac{1}{n}(0.9n-1-0.1n-1+1) \\ &= 0.8-\frac{1}{n} \\ & \approx 0.8 \text{ for large n} \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (7)

  • The probability that H-PARTITION yields a split that is more balanced than 0.1to0.90.1-to-0.9 is 80%80\% on a random array.

  • Let Pα>P_{\alpha>} be the probability that H-PARTITION yields a split more balanced than αto(1α)\alpha-to-(1-\alpha), where 0<α0.50 < \alpha \leq 0.5

  • Repeat the analysis to generalize the previous result

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (8)

  • Question: What is the probability that H-PARTITION returns a split that is more balanced than αto(1α)\alpha-to-(1-\alpha)?

Probability=q=αn+1(1α)n11n=1n((1α)n1αn1+1)=(12α)1n(12α) for large n\begin{align*} Probability & =\sum \limits_{q=\alpha n+1}^{(1-\alpha)n-1}\frac{1}{n} \\ & =\frac{1}{n}((1-\alpha)n-1- \alpha n-1+1) \\ & = (1-2\alpha)-\frac{1}{n} \\ & \approx (1-2\alpha) \text{ for large n} \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Balanced Partitioning (9)

  • We found Pα>=12αP_{\alpha >}=1-2\alpha
    • Ex: P0.1>=0.8P_{0.1>}=0.8 and P0.01>=0.98P_{0.01>}=0.98
  • Hence, H-PARTITION produces a split
    • more balanced than a
      • 0.1to0.90.1-to-0.9 split 80%80\% of the time
      • 0.01to0.990.01-to-0.99 split 98%98\% of the time
    • less balanced than a
      • 0.1to0.90.1-to-0.9 split 20%20\% of the time
      • 0.01to0.990.01-to-0.99 split 2%2\% of the time
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Intuition for the Average Case (1)

  • Assumption: All permutations are equally likely

    • Only for intuition; we’ll revisit this assumption later
  • Unlikely: Splits always the same way at every level

  • Expectation:

    • Some splits will be reasonably balanced
    • Some splits will be fairly unbalanced
  • Average case: A mix of good and bad splits

    • Good and bad splits distributed randomly thru the tree
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Intuition for the Average Case (2)

  • Assume for intuition: Good and bad splits occur in the alternate levels of the tree
    • Good split: Best case split
    • Bad split: Worst case split
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Intuition for the Average Case (3)

Compare 2-successive levels of avg case vs. 1 level of best case

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Intuition for the Average Case (4)

  • In terms of the remaining subproblems, two levels of avg case is slightly better than the single level of the best case

  • The avg case has extra divide cost of Θ(n)\Theta(n) at alternate levels

  • The extra divide cost Θ(n)\Theta(n) of bad splits absorbed into the Θ(n)\Theta(n) of good splits.

  • Running time is still Θ(nlgn)\Theta(nlgn)

    • But, slightly larger hidden constants, because the height of the recursion tree is about twice of that of best case.
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Intuition for the Average Case (5)

  • Another way of looking at it:
    • Suppose we alternate lucky, unlucky, lucky, unlucky, \dots
    • We can write the recurrence as:
      • L(n)=2U(n/2)+Θ(n)L(n) = 2U(n/2) + \Theta(n) lucky split (best)
      • U(n)=L(n1)+Θ(n)U(n) = L(n-1) + \Theta(n) unlucky split (worst)
    • Solving:

L(n)=2(L(n/21)+Θ(n/2))+Θ(n)=2L(n/21)+Θ(n)=Θ(nlgn)\begin{align*} L(n) & = 2(L(n/2-1) + \Theta(n/2)) + \Theta(n) \\ & = 2L(n/2-1) + \Theta(n) \\ & = Θ(nlgn) \end{align*}

  • How can we make sure we are usually lucky for all inputs?
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Summary: Quicksort Runtime Analysis (1)

  • Worst case: Unbalanced split at every recursive call

T(n)=T(1)+T(n1)+Θ(n)T(n)=Θ(n2)\begin{align*} T(n) & = T(1) + T(n-1) + \Theta(n) \\ T(n) & = \Theta(n2) \end{align*}

  • Best case: Balanced split at every recursive call (extremely lucky)

T(n)=2T(n/2)+Θ(n)T(n)=Θ(nlgn)\begin{align*} T(n) & = 2T(n/2) + \Theta(n) \\ T(n) & = \Theta(nlgn) \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Summary: Quicksort Runtime Analysis (2)

  • Almost-best case: Almost-balanced split at every recursive call

T(n)=T(n/10)+T(9n/10)+Θ(n)or T(n)=T(n/100)+T(99n/100)+Θ(n)or T(n)=T(αn)+T((1αn)+Θ(n)\begin{align*} T(n) &=T(n/10)+T(9n/10)+ \Theta(n) \\ \text{or } T(n) &= T(n/100) + T(99n/100) + Θ(n) \\ \text{or } T(n) &= T(\alpha n) + T((1-\alpha n)+ \Theta(n) \end{align*}

for any constant α,0<α0.5\alpha, 0 < \alpha \leq 0.5

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Summary: Quicksort Runtime Analysis (3)

  • For a random input array, the probability of having a split

    • more balanced than 0.1to0.9:80%0.1 – to – 0.9 : 80\%
    • more balanced than 0.01to0.99:98%0.01 – to – 0.99 : 98\%
    • more balanced than αto(1α):12α\alpha – to – (1-\alpha) : 1 – 2 \alpha
  • for any constant α,0<α0.5\alpha, 0 < \alpha \leq 0.5

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Summary: Quicksort Runtime Analysis (4)

  • Avg case intuition: Different splits expected at different levels

    • some balanced (good), some unbalanced (bad)
  • Avg case intuition: Assume the good and bad splits alternate

    • i.e. good split -> bad split -> good split -> …
    • T(n)=Θ(nlgn)T(n) = \Theta(nlgn)
      • (informal analysis for intuition)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Randomized Quicksort

  • In the avg-case analysis, we assumed that all permutations of the input array are equally likely.
    • But, this assumption does not always hold
    • e.g. What if all the input arrays are reverse sorted?
      • Always worst-case behavior
  • Ideally, the avg-case runtime should be independent of the input permutation.
  • Randomness should be within the algorithm, not based on the distribution of the inputs.
    • i.e. The avg case should hold for all possible inputs
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Randomized Algorithms (1)

  • Alternative to assuming a uniform distribution:
    • Impose a uniform distribution
    • e.g. Choose a random pivot rather than the first element
  • Typically useful when:
    • there are many ways that an algorithm can proceed
    • but, it’s difficult to determine a way that is always guaranteed to be good.
    • If there are many good alternatives; simply choose one randomly.
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Randomized Algorithms (1)

  • Ideally:
    • Runtime should be independent of the specific inputs
    • No specific input should cause worst-case behavior
    • Worst-case should be determined only by output of a random number generator.
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Randomized Quicksort (1)

  • Using Hoare’s partitioning algorithm:
R-QUICKSORT(A, p, r)
  if p < r then
    q = R-PARTITION(A, p, r)
    R-QUICKSORT(A, p, q)
    R-QUICKSORT(A, q+1, r)
R-PARTITION(A, p, r)
  s = RANDOM(p, r)
  exchange A[p] with A[s]
  return H-PARTITION(A, p, r)
  • Alternatively, permuting the whole array would also work
    • but, would be more difficult to analyze
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Randomized Quicksort (2)

  • Using Lomuto’s partitioning algorithm:
R-QUICKSORT(A, p, r)
  if p < r then
    q = R-PARTITION(A, p, r)
    R-QUICKSORT(A, p, q-1)
    R-QUICKSORT(A, q+1, r)
R-PARTITION(A, p, r)
  s = RANDOM(p, r)
  exchange A[r] with A[s]
  return L-PARTITION(A, p, r)
  • Alternatively, permuting the whole array would also work
    • but, would be more difficult to analyze
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Notations for Formal Analysis

  • Assume all elements in A[pr]A[p \dots r] are distinct

    • Let n=rp+1n = r – p + 1
  • Let rank(x)=A[i]:pir and A[i]xrank(x) = |{A[i]: p \leq i \leq r \text{ and } A[i] \leq x}|

  • i.e. rank(x)rank(x) is the number of array elements with value less than or equal to xx

    • A={5,9,7,6,8,1,4}A=\{5,9,7,6,8,1,4\}
    • p=5,r=4p=5,r=4
    • rank(5)=3rank(5)=3
      • i.e. it is the 3rd3^{rd} smallest element in the array
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Formal Analysis for Average Case

  • The following analysis will be for Quicksort using Hoare’s partitioning algorithm.

  • Reminder: The pivot is selected randomly and exchanged with A[p]A[p] before calling H-PARTITION

  • Let xx be the random pivot chosen.

  • What is the probability that rank(x)=irank(x) = i for i=1,2,ni = 1, 2, \dots n ?

    • P(rank(x)=i)=1/nP(rank(x) = i) = 1/n
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Various Outcomes of H-PARTITION (1)

  • Assume that rank(x)=1rank(x)=1
    • i.e. the random pivot chosen is the smallest element
    • What will be the size of the left partition (L)(|L|)?
    • Reminder: Only the elements less than or equal to xx will be in the left partition.

A={2p=x=pivot,L=19,7,6,8,5,4r}A=\{\overbrace{2}^{p=x=pivot}\underbrace{,}_{\Longrightarrow|L|=1 } 9,7,6,8,5,\overbrace{4}^r\}

p=2,r=4p=2,r=4
pivot=x=2pivot=x=2

TODO: convert to image...S6_P9

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Various Outcomes of H-PARTITION (2)

  • Assume that rank(x)>1rank(x)>1
    • i.e. the random pivot chosen is not the smallest element
    • What will be the size of the left partition (L)(|L|)?
    • Reminder: Only the elements less than or equal to xx will be in the left partition.
    • Reminder: The pivot will stay in the right region after H-PARTITION if rank(x)>1rank(x)>1

A={2p,4,L=rank(x)17,6,8,5,pivot9r}A=\{\overbrace{2}^{p}, 4 \underbrace{,}_{\Longrightarrow|L|=rank(x)-1}7,6,8,\overbrace{5,}^{pivot}\overbrace{9}^r\}

p=2,r=4p=2,r=4
pivot=x=5pivot=x=5

TODO: convert to image...S6_P10

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Various Outcomes of H-PARTITION - Summary (1)

  • x:pivotx: pivot

  • L:size of left region|L|: \text{size of left region}

  • P(rank(x)=i)=1/n for 1inP(rank(x) = i) = 1/n \text{ for } 1 \leq i \leq n

    • if rank(x)=1 then L=1\text{if } rank(x) = 1 \text{ then } |L| = 1
    • if rank(x)>1 then L=rank(x)1\text{if } rank(x) > 1 \text{ then } |L| = rank(x) - 1
  • P(L=1)=P(rank(x)=1)+P(rank(x)=2)P(|L| = 1) = P(rank(x) = 1) + P(rank(x) = 2)

    • P(L=1)=2/nP(|L| = 1) = 2/n
  • P(L=i)=P(rank(x)=i+1) for 1<i<nP(|L| = i) = P(rank(x) = i+1) \text{ for } 1< i < n

    • P(L=i)=1/n for 1<i<nP(|L| = i) = 1/n \text{ for } 1< i < n
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Various Outcomes of H-PARTITION - Summary (2)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average - Case Analysis: Recurrence (1)

x=pivotx=pivot

T(n)=1n(T(1)+t(n1))rank:1+1n(T(1)+t(n1))rank:2+1n(T(2)+t(n2))rank:3+1n(T(i)+t(ni))rank:i+1+1n(T(n1)+t(1))rank:n+Θ(n)\begin{align*} T(n) & = \frac{1}{n}(T(1)+t(n-1) ) & rank:1 \\ & + \frac{1}{n}(T(1)+t(n-1) ) & rank:2 \\ & + \frac{1}{n}(T(2)+t(n-2) ) & rank:3 \\ & \vdots & \vdots \\ & + \frac{1}{n}(T(i)+t(n-i) ) & rank:i+1 \\ & \vdots & \vdots \\ & + \frac{1}{n}(T(n-1)+t(1) ) & rank:n \\ & + \Theta(n) \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average - Case Analysis: Recurrence (2)

T(n)=1nq=1n1(T(q)+T(nq))+1n(T(1)+T(n1))+Θ(n)Note: 1n(T(1)+T(n1))=1n(Θ(1)+O(n2))=O(n)T(n)=1nq=1n1(T(q)+T(nq))+Θ(n)\begin{align*} T(n) &= \frac{1}{n}\sum \limits_{q=1}^{n-1}(T(q)+T(n-q))+\frac{1}{n}(T(1)+T(n-1))+\Theta(n)\\ & \text{Note: } \frac{1}{n}(T(1)+T(n-1))=\frac{1}{n}(\Theta(1)+O(n^2))=O(n) \\ T(n) &= \frac{1}{n}\sum \limits_{q=1}^{n-1}(T(q)+T(n-q))+\Theta(n) \end{align*}

for k=1,2,,n1k=1,2,\dots,n-1 each term T(k)T(k) appears twice once for q=kq = k and once for q=nkq = n−k

T(n)=2nk=1n1T(k)+Θ(n)T(n) = \frac{2}{n}\sum \limits_{k=1}^{n-1} T(k)+\Theta(n)

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average - Case Analysis -Solving Recurrence: Substitution

  • Guess: T(n)=O(nlgn)T(n)=O(nlgn)
  • T(k)aklgkT(k) ≤ aklgk for k<nk<n, for some constant a>0a > 0

T(n)=2nk=1n1T(k)+Θ(n)2nk=1n1aklgk+Θ(n)2ank=1n1klgk+Θ(n)\begin{align*} T(n) &= \frac{2}{n} \sum \limits_{k=1}^{n-1} T(k)+\Theta(n) \\ & \leq \frac{2}{n} \sum \limits_{k=1}^{n-1} aklgk+\Theta(n) \\ & \leq \frac{2a}{n} \sum \limits_{k=1}^{n-1} klgk+\Theta(n) \end{align*}

  • Need a tight bound for klgk\sum klgk
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Tight bound for klgk\sum klgk (1)

  • Bounding the terms
    •  k=1n1klgkk=1n1nlgn=n(n1)lgnn2lgn\ \sum \limits_{k=1}^{n-1}klgk \leq \sum \limits_{k=1}^{n-1}nlgn = n(n-1)lgn \leq n^2lgn
    • This bound is not strong enough because
    • T(n)2ann2lgn+Θ(n)T(n) \leq \frac{2a}{n}n^2lgn+\Theta(n)
    • =2anlgn+Θ(n)=2anlgn+\Theta(n) \Longrightarrow couldn’t prove T(n)anlgnT(n) \leq anlgn
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Tight bound for klgk\sum klgk (2)

  • Splitting summations: ignore ceilings for simplicity

    k=1n1klgkk=1n/21klgk+k=n/2n1klgk\sum \limits_{k=1}^{n-1}klgk \leq \sum \limits_{k=1}^{n/2-1}klgk + \sum \limits_{k=n/2}^{n-1}klgk

    • First summation: lgk<lg(n/2)=lgn1lgk < lg(n/2)=lgn-1
    • Second summation: lgk<lgnlgk < lgn
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Splitting: k=1n1klgkk=1n/21klgk+k=n/2n1klgk\sum \limits_{k=1}^{n-1}klgk \leq \sum \limits_{k=1}^{n/2-1}klgk + \sum \limits_{k=n/2}^{n-1}klgk (3)

k=1n1klgk(lg(n1))k=1n/21k+lgnk=n/2n1k=lgnk=1n1kk=1n/21k=12n(n1)lgn12n2(n21)=12n2lgn18n212n(lgn1/2)\begin{align*} & \sum \limits_{k=1}^{n-1}klgk \leq (lg(n-1))\sum \limits_{k=1}^{n/2-1}k + lgn \sum \limits_{k=n/2}^{n-1}k \\ &= lgn \sum \limits_{k=1}^{n-1}k- \sum \limits_{k=1}^{n/2-1}k \\ &= \frac{1}{2}n(n-1)lgn - \frac{1}{2} \frac{n}{2}(\frac{n}{2}-1) \\ &= \frac{1}{2}n^2lgn - \frac{1}{8}n^2 - \frac{1}{2}n(lgn-1/2) \\ \end{align*}

k=1n1klgk12n2lgn18n2 for lgn1/2n2\begin{align*} & \sum \limits_{k=1}^{n-1} klgk \leq \frac{1}{2}n^2lgn-\frac{1}{8}n^2 \ for \ lgn \geq 1/2 \Longrightarrow n \geq \sqrt{2} \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Substituting: - k=1n1klgk12n2lgn18n2\sum \limits_{k=1}^{n-1}klgk \leq \frac{1}{2}n^2lgn-\frac{1}{8}n^2 (4)

T(n)2ank=1n1klgk+Θ(n)2an(12n2lgn18n2)+Θ(n)=anlgn(a4nΘ(n))\begin{align*} T(n) & \leq \frac{2a}{n}\sum \limits_{k=1}^{n-1}klgk+\Theta(n)\\ & \leq \frac{2a}{n}(\frac{1}{2}n^2lgn-\frac{1}{8}n^2)+\Theta(n) \\ & = anlgn - (\frac{a}{4}n-\Theta(n)) \end{align*}

  • We can choose a large enough so that a4nΘ(n)\frac{a}{4}n \geq \Theta(n)

T(n)anlgnT(n)=O(nlgn)\begin{align*} T(n) & \leq anlgn \\ T(n) & = O(nlgn) \end{align*}

Q.E.D.

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Medians and Order Statistics

  • ith order statistic: ithi^{th} smallest element of a set of nn elements

  • minimum: first order statistic

  • maximum: nthn^{th} order statistic

  • median: “halfway point” of the set

i=(n+1)2 or i=(n+1)2\begin{align*} i & = \lfloor \frac{(n+1)}{2} \rfloor \\ \text{ or } \\ i & = \lceil \frac{(n+1)}{2} \rceil \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection Problem

  • Selection problem: Select the ithi^{th} smallest of nn elements

  • Naïve algorithm: Sort the input array AA; then return A[i]A[i]

    • T(n)=θ(nlgn)T(n) = \theta(nlgn)
      • using e.g. merge sort (but not quicksort)
  • Can we do any better?

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Expected Linear Time

  • Randomized algorithm using divide and conquer

  • Similar to randomized quicksort

    • Like quicksort: Partitions input array recursively
    • Unlike quicksort: Makes a single recursive call
      • Reminder: Quicksort makes two recursive calls
  • Expected runtime: Θ(n)\Theta(n)

    • Reminder: Expected runtime of quicksort: Θ(nlgn)\Theta(nlgn)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Expected Linear Time: Example 1

  • Select the 2nd2^{nd} smallest element:

A={6,10,13,5,8,3,2,11}i=2\begin{align*} A & = \{6,10,13,5,8,3,2,11\} \\ i & = 2 \\ \end{align*}

  • Partition the input array:

A={2,3,5,left subarray13,8,10,6,11right subarray}\begin{align*} A & = \{\underbrace{2,3,5,}_{\text{left subarray} }\underbrace{13,8,10,6,11}_{\text{right subarray}}\} \end{align*}

  • make a recursive call to select the 2nd2^{nd} smallest element in left subarray
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Expected Linear Time: Example 2

  • Select the 7th7^{th} smallest element:

A={6,10,13,5,8,3,2,11}i=7\begin{align*} A & = \{6,10,13,5,8,3,2,11\} \\ i & = 7 \\ \end{align*}

  • Partition the input array:

A={2,3,5,left subarray13,8,10,6,11right subarray}\begin{align*} A & = \{\underbrace{2,3,5,}_{\text{left subarray} }\underbrace{13,8,10,6,11}_{\text{right subarray}}\} \end{align*}

  • make a recursive call to select the 4th4^{th} smallest element in right subarray
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Expected Linear Time (1)

R-SELECT(A,p,r,i)
  if p == r then 
    return A[p];
  q = R-PARTITION(A, p, r)
  k = q–p+1;
  if i <= k then 
    return R-SELECT(A, p, q, i);
  else
    return R-SELECT(A, q+1, r, i-k);

A={px(k smallest elements)qxr}x=pivot\begin{align*} A & = \{ \underbrace{ | }_{p} \dots \leq x \text{(k smallest elements)} \dots \underbrace{ | }_{q} \dots \geq x \dots \underbrace{ | }_{r} \} \\ x & = pivot \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Expected Linear Time (2)

A={pxLqxRr}x=pivot\begin{align*} A & = \{ \overbrace{ | }^{p} \underbrace{ \dots \leq x \dots }_{L} \overbrace{ | }^{q} \underbrace{ \dots \geq x \dots }_{R} \overbrace{ | }^{r} \} \\ x & = pivot \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Expected Linear Time (2)

  • All elements in LL \leq all elements in RR
  • LL contains:
    • L=qp+1|L| = q–p+1 == k smallest elements of A[p...r]A[p...r]
    • if iL=ki \leq |L| = k then
      • search LL recursively for its ithi^{th} smallest element
    • else
      • search RR recursively for its (ik)th(i-k)^{th} smallest element
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Runtime Analysis (1)

  • Worst case:
    • Imbalanced partitioning at every level and the recursive call always to the larger partition

={1,2,3,4,5,6,7,8recursive call}i=8={2,3,4,5,6,7,8recursive call}i=7\begin{align*} & = \{1,\underbrace{2,3,4,5,6,7,8}_{\text{recursive call}} \} & i & = 8 \\ & = \{2,\underbrace{3,4,5,6,7,8}_{\text{recursive call}} \} & i & = 7 \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Runtime Analysis (2)

  • Worst case: Worse than the naïve method (based on sorting)

T(n)=T(n1)+Θ(n)T(n)=Θ(n2)\begin{align*} T(n) &= T(n-1) + \Theta(n) \\ T(n) &= \Theta(n^2) \end{align*}

  • Best case: Balanced partitioning at every recursive level

T(n)=T(n/2)+Θ(n)T(n)=Θ(n)\begin{align*} T(n) &= T(n/2) + \Theta(n) \\ T(n) &= \Theta(n) \end{align*}

  • Avg case: Expected runtime – need analysis T.B.D.
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Reminder: Various Outcomes of H-PARTITION

  • x:pivotx: pivot

  • L:size of left region|L|: \text{size of left region}

  • P(rank(x)=i)=1/n for 1inP(rank(x) = i) = 1/n \text{ for } 1 \leq i \leq n

    • if rank(x)=1 then L=1\text{if } rank(x) = 1 \text{ then } |L| = 1
    • if rank(x)>1 then L=rank(x)1\text{if } rank(x) > 1 \text{ then } |L| = rank(x) - 1
  • P(L=1)=P(rank(x)=1)+P(rank(x)=2)P(|L| = 1) = P(rank(x) = 1) + P(rank(x) = 2)

    • P(L=1)=2/nP(|L| = 1) = 2/n
  • P(L=i)=P(rank(x)=i+1) for 1<i<nP(|L| = i) = P(rank(x) = i+1) \text{ for } 1< i < n

    • P(L=i)=1/n for 1<i<nP(|L| = i) = 1/n \text{ for } 1< i < n
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average Case Analysis of Randomized Select

  • To compute the upper bound for the avg case, assume that the ithi^{th} element always falls into the larger partition.

A={pxLeftPartitionqxRightPartitionr}x=pivot\begin{align*} A & = \{ \overbrace{ | }^{p} \underbrace{ \dots \leq x \dots }_{Left Partition} \overbrace{ | }^{q} \underbrace{ \dots \geq x \dots }_{Right Partition} \overbrace{ | }^{r} \} \\ x & = pivot \end{align*}

  • We will analyze the case where the recursive call is always made to the larger partition
    • This will give us an upper bound for the avg case
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Various Outcomes of H-PARTITION

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average-Case Analysis of Randomized Select (1)

Recall:P(L=i)={2/nfor i=11/nfor i=2,3,,n1\text{Recall:} P(|L|=i) = \begin{cases} 2/n & \text{for } i=1 \\ 1/n & \text{for } i=2,3,\dots,n-1 \end{cases}

Upper bound: Assume ithi^{th} element always falls into the larger part.

T(n)1nT(max(1,n1))+1nq=1n1T(max(q,nq))+O(n)Note:1nT(max(1,n1))=1nT(n1)=1nO(n2)=O(n)(3 dot mean therefore) T(n)1nq=1n1T(max(q,nq))+O(n)\begin{align*} T(n) &\leq \frac{1}{n}T(max(1,n-1))+\frac{1}{n}\sum \limits_{q=1}^{n-1}T(max(q,n-q))+O(n) \\ Note: & \frac{1}{n}T(max(1,n-1)) = \frac{1}{n}T(n-1)=\frac{1}{n}O(n^2) = O(n) \\ \therefore \text{(3 dot mean therefore) } & T(n) \leq \frac{1}{n}\sum \limits_{q=1}^{n-1}T(max(q,n-q))+O(n) \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average-Case Analysis of Randomized Select (2)

T(n)1nq=1n1T(max(q,nq))+O(n)\begin{align*} \therefore T(n) \leq \frac{1}{n}\sum \limits_{q=1}^{n-1}T(max(q,n-q))+O(n) \end{align*}

max(q,nq)={q if qn/2nq if q<n/2max(q, n–q) = \begin{cases} q & \text{ if } q \geq \lceil n/2 \rceil \\ n-q & \text{ if } q < \lceil n/2 \rceil \\ \end{cases}

  • nn is odd: T(k)T(k) appears twice for k=n/2+1,n/2+2,,n1k=\lceil n/2 \rceil+1,\lceil n/2 \rceil+2,\dots,n–1
  • nn is even:T(n/2)T(\lceil n/2 \rceil) appears once T(k)T(k) appears twice for k=n/2+1,n/2+2,,n1k = \lceil n/2 \rceil +1, \lceil n/2 \rceil+2,\dots,n–1
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average-Case Analysis of Randomized Select (3)

  • Hence, in both cases:

q=1n1T(max(q,nq))+O(n)2q=n/2n1T(q)+O(n)T(n)2nq=n/2n1T(q)+O(n)\begin{align*} \sum \limits_{q=1}^{n-1} T(max(q,n-q))+O(n) & \leq 2\sum \limits_{q=\lceil n/2 \rceil}^{n-1} T(q)+O(n) \\ \therefore T(n) & \leq \frac{2}{n} \sum \limits_{q=\lceil n/2 \rceil}^{n-1}T(q)+O(n) \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average-Case Analysis of Randomized Select (4)

T(n)2nq=n/2n1T(q)+O(n)\begin{align*} T(n) & \leq \frac{2}{n} \sum \limits_{q=\lceil n/2 \rceil}^{n-1}T(q)+O(n) \end{align*}

  • By substitution guess T(n)=O(n)T(n) = O(n)
  • Inductive hypothesis: T(k)ck,k<nT(k) \leq ck, \forall k<n

T(n)2nq=n/2n1ck+O(n)=2cn(k=1n1kk=1n/21k)+O(n)=2cn(12n(n1)12n2(n21))+O(n)\begin{align*} T(n) & \leq \frac{2}{n} \sum \limits_{q=\lceil n/2 \rceil}^{n-1}ck+O(n) \\ & = \frac{2c}{n} \Bigg(\sum \limits_{k=1}^{n-1}k-\sum \limits_{k=1}^{\lceil n/2 \rceil-1}k \Bigg)+ O(n) \\ & = \frac{2c}{n} \Bigg(\frac{1}{2}n(n-1)-\frac{1}{2} \lceil \frac{n}{2} \rceil \bigg( \frac{n}{2}-1 \bigg) \Bigg)+ O(n) \end{align*}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Average-Case Analysis of Randomized Select (5)

T(n)2cn(12n(n1)12n2(n21))+O(n)c(n1)c4n+c2+O(n)=cnc4nc2+O(n)=cn((c4n+c2)+O(n))cn\begin{align*} T(n)& \leq \frac{2c}{n} \Bigg(\frac{1}{2}n(n-1)-\frac{1}{2} \lceil \frac{n}{2} \rceil \bigg( \frac{n}{2}-1 \bigg) \Bigg)+ O(n) \\ & \leq c(n-1)-\frac{c}{4}n+\frac{c}{2}+O(n) \\ & = cn - \frac{c}{4}n - \frac{c}{2} + O(n) \\ & = cn - \Bigg( \bigg( \frac{c}{4}n+\frac{c}{2}\bigg) + O(n) \Bigg) \\ & \leq cn \end{align*}

  • since we can choose c large enough so that (cn/4+c/2)(cn/4+c/2 ) dominates O(n)O(n)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Summary of Randomized Order-Statistic Selection

  • Works fast: linear expected time

  • Excellent algorithm in practise

  • But, the worst case is very bad: Θ(n2)\Theta(n^2)

  • Blum, Floyd, Pratt, Rivest & Tarjan[1973] algorithms are runs in linear time in the worst case.

  • Generate a good pivot recursively

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time

//return i-th element in set S with n elements
SELECT(S, n, i)  

  if n <= 5 then

    SORT S and return the i-th element

  DIVIDE S into ceil(n/5) groups
  //first ceil(n/5) groups are of size 5, last group is of size n mod 5

  FIND median set M={m ,, m_ceil(n/5)}
  // m_j : median of j-th group

  x = SELECT(M,ceil(n/5),floor((ceil(n/5)+1)/2))

  PARTITION set S around the pivot x into L and R

  if i <= |L| then
    return  SELECT(L, |L|, i)
  else
    return  SELECT(R, n–|L|, i–|L|)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (1)

  • Input: Array SS and index ii
  • Output: The ithi^{th} smallest value

25916811273942156321436203322314173304121319721103413723405291824123828263543\begin{array}{ccc} 25 & 9 & 16 & 8 & 11 & 27 & 39 & 42 & 15 & 6 32 & 14 & 36 & 20 & 33 & 22 & 31 & 4 & 17 & 3 & 30 & 41 \\ 2 & 13 & 19 & 7 & 21 & 10 & 34 & 1 & 37 & 23 & 40 & 5 & 29 & 18 & 24 & 12 & 38 & 28 & 26 & 35 & 43 \end{array}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (2)

Step 1: Divide the input array into groups of size 55

25916811273942156321436203322314173304121319721103413723405291824123828263543group size=5\overbrace{ \begin{array}{ccc} 25 & 9 & 16 & 8 & 11 \\ 27 & 39 & 42 & 15 & 6 \\ 32 & 14 & 36 & 20 & 33 \\ 22 & 31 & 4 & 17 & 3 \\ 30 & 41 & 2 & 13 & 19 \\ 7 & 21 & 10 & 34 & 1 \\ 37 & 23 & 40 & 5 & 29 \\ 18 & 24 & 12 & 38 & 28 \\ 26 & 35 & 43 \end{array} }^{\text{group size}=5}

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (3)

Step 2: Compute the median of each group (Θ(n)\Theta(n))

251611Medians89394227615363332201422311734413019132213410173740292353828241218263543\begin{array}{ccc} 25 & 16 & \overbrace{11}^{Medians} & 8 & 9 \\ 39 & 42 & 27 & 6 & 15 \\ 36 & 33 & 32 & 20 & 14 \\ 22 & 31 & 17 & 3 & 4 \\ 41 & 30 & 19 & 13 & 2 \\ 21 & 34 & 10 & 1 & 7 \\ 37 & 40 & 29 & 23 & 5 \\ 38 & 28 & 24 & 12 & 18 \\ & 26 & 35 & 43 \end{array}

  • Let MM be the set of the medians computed:
    • M={11,27,32,17,19,10,29,24,35}M = \{11, 27, 32, 17, 19, 10, 29, 24, 35\}
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (4)

Step 3: Compute the median of the median group MM

xSELECT(M,M,(M+1)/2)x \leftarrow SELECT(M,|M|,\lfloor (|M|+1)/2 \rfloor) where M=n/5|M|=\lceil n/5 \rceil

  • Let MM be the set of the medians computed:

    • M={11,27,32,17,19,10,29,24Median,35}M = \{11, 27, 32, 17, 19, 10, 29, \overbrace{24}^{Median}, 35\}
  • Median=24Median = 24

  • The runtime of the recursive call: T(M)=T(n/5)T(|M|)=T(\lceil n/5 \rceil)

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (5)

Step 4: Partition the input array SS around the median-of-medians xx

25916811273942156321436203322314173304121319721103413723405291824123828263543\begin{array}{ccc} 25 & 9 & 16 & 8 & 11 & 27 & 39 & 42 & 15 & 6 32 & 14 & 36 & 20 & 33 & 22 & 31 & 4 & 17 & 3 & 30 & 41 \\ 2 & 13 & 19 & 7 & 21 & 10 & 34 & 1 & 37 & 23 & 40 & 5 & 29 & 18 & 24 & 12 & 38 & 28 & 26 & 35 & 43 \end{array}

Partition SS around x=24x = 24

Claim: Partitioning around x is guaranteed to be well-balanced.

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (6)

  • MM : Median, MM^* : Median of Medians

413019M132213410172231173425161189382824M12183633322014374029235394227615263543\begin{array}{ccc} 41 & 30 & \overbrace{19}^{M} & 13 & 2 \\ 21 & 34 & 10 & 1 & 7 \\ 22 & 31 & 17 & 3 & 4 \\ 25 & 16 & 11 & 8 & 9 \\ 38 & 28 & \overbrace{24}^{M^*} & 12 & 18 \\ 36 & 33 & 32 & 20 & 14 \\ 37 & 40 & 29 & 23 & 5 \\ 39 & 42 & 27 & 6 & 15 \\ & 26 & 35 & 43 \end{array}

  • About half of the medians greater than x=24x=24 (about n/10n/10)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (7)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (8)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time - Example (9)

S={25916811273942156321436203322314173304121319721103413723405291824123828263543}S = \begin{array}{ccc} \{ 25 & 9 & 16 & 8 & 11 & 27 & 39 & 42 & 15 & 6 32 & 14 & 36 & 20 & 33 & 22 & 31 & 4 & 17 & 3 & 30 & 41 \\ 2 & 13 & 19 & 7 & 21 & 10 & 34 & 1 & 37 & 23 & 40 & 5 & 29 & 18 & 24 & 12 & 38 & 28 & 26 & 35 & 43 \} \end{array}

  • Partitioning SS around x=24x = 24 will lead to partitions
    of sizes 3n/10\sim 3n/10 and 7n/10\sim 7n/10 in the worst case.

Step 5: Make a recursive call to one of the partitions

if i <= |L| then 
  return SELECT(L,|L|,i)
else 
  return SELECT(R,n-|L|,i-|L|)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time

//return i-th element in set S with n elements
SELECT(S, n, i)  

  if n <= 5 then

    SORT S and return the i-th element

  DIVIDE S into ceil(n/5) groups
  //first ceil(n/5) groups are of size 5, last group is of size n mod 5

  FIND median set M={m ,, m_ceil(n/5)}
  // m_j : median of j-th group

  x = SELECT(M,ceil(n/5),floor((ceil(n/5)+1)/2))

  PARTITION set S around the pivot x into L and R

  if i <= |L| then
    return  SELECT(L, |L|, i)
  else
    return  SELECT(R, n–|L|, i–|L|)
RTEU CE100 Week-3
CE100 Algorithms and Programming II

Choosing the Pivot (1)

  1. Divide S into groups of size 5

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Choosing the Pivot (2)

  • Divide S into groups of size 5
  • Find the median of each group

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Choosing the Pivot (3)

  • Divide S into groups of size 5
  • Find the median of each group
  • Recursively select the median x of the medians

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Choosing the Pivot (4)

  • At least half of the medians x\geq x
  • Thus m=n/5/2m = \lceil \lceil n/5 \rceil /2 \rceil groups contribute 3 elements to R except possibly the last group and the group that contains xx, R3(m2)3n106|R|\geq 3(m–2)\geq \frac{3n}{10}–6

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Choosing the Pivot (5)

  • Similarly L3n106|L| \geq \frac{3n}{10}– 6
  • Therefore, SELECT is recursively called on at most n(3n106)=7n10+6n-(\frac{3n}{10}-6)=\frac{7n}{10}+6 elements

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time (1)

alt:"alt" center

RTEU CE100 Week-3
CE100 Algorithms and Programming II

Selection in Worst Case Linear Time (2)

  • Thus recurrence becomes
    • T(n)T(n5)+T(7n10+6)+Θ(n)T(n) \leq T \big( \lceil \frac{n}{5} \rceil \big) + T\big( \frac{7n}{10}+6 \big) + \Theta(n)
  • Guess T(n)=O(n)T(n)=O(n) and prove by induction
  • Inductive step:

T(n)cn/5+c(7n/10+6)+Θ(n)cn/5+c+7cn/10+6c+Θ(n)=9cn/10+7c+Θ(n)=cn[c(n/107)Θ(n)]cn( for large c)\begin{align*} T(n) & \leq c \lceil n/5 \rceil + c(7n/10+6)+\Theta(n) \\ & \leq cn/5 + c + 7cn/10 + 6c + \Theta(n) \\ & = 9cn/10 + 7c + \Theta(n) \\ & = cn - [c(n/10-7)-\Theta(n)] \leq cn &\text{( for large c)} \end{align*}

  • Work at each level of recursion is a constant factor (9/10)(9/10) smaller
RTEU CE100 Week-3
CE100 Algorithms and Programming II

References

RTEU CE100 Week-3
CE100 Algorithms and Programming II

EndOfWeek3CourseModule-End-Of-Week-3-Course-Module-

RTEU CE100 Week-3