CE100 Algorithms and Programming II

CE100 Algorithms and Programming II

Week-4 (Heap/Heap Sort)

Spring Semester, 2021-2022

Download DOC, SLIDE, PPTX

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap/Heap Sort

Outline (1)

  • Heaps

    • Max / Min Heap
  • Heap Data Structure

    • Heapify

      • Iterative

      • Recursive

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Outline (2)

  • Extract-Max

  • Build Heap

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Outline (3)

  • Heap Sort

  • Priority Queues

  • Linked Lists

  • Radix Sort

  • Counting Sort

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort

  • Worst-case runtime: $O(nlgn)$
  • Sorts in-place
  • Uses a special data structure (heap) to manage information during execution of the algorithm
    • Another design paradigm
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Data Structure (1)

  • Nearly complete binary tree
    • Completely filled on all levels except possibly the lowest level

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Data Structure (2)

  • Height of node i: Length of the longest simple downward path from i to a leaf
  • Height of the tree: height of the root

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Data Structures (3)

  • Depth of node i: Length of the simple downward path from the root to node i

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Property: Min-Heap

  • The smallest element in any subtree is the root element in a min-heap

  • Min heap: For every node $i$ other than the root, $A[parent(i)] \leq A[i]$

    • Parent node is always smaller than the child nodes
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Property: Max-Heap

  • The largest element in any subtree is the root element in a max-heap
    • We will focus on max-heaps
  • Max heap: For every node $i$ other than the root, $A[parent(i)] \geq A[i]$
    • Parent node is always larger than the child nodes
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Data Structures (4)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Data Structures (5)

  • Computing left child, right child, and parent indices is very fast (see the sketch after this list):

    • $left(i) = 2i$ $\Longrightarrow$ binary left shift
    • $right(i) = 2i+1$ $\Longrightarrow$ binary left shift, then set the lowest bit to 1
    • $parent(i) = \lfloor i/2 \rfloor$ $\Longrightarrow$ binary right shift
  • $A[1]$ is always the root element

  • Array $A$ has two attributes:

    • $length(A)$: the number of elements in $A$
    • $n = \text{heap-size}(A)$: the number of elements in the heap
      • $n \leq length(A)$
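The bullet points above map directly onto bit operations. A minimal C sketch (my own illustration, assuming a 1-based heap array with slot 0 unused):

```c
/* Heap index arithmetic for a 1-based array (slot 0 unused). */
static inline int heap_left(int i)   { return i << 1;      }  /* 2i           */
static inline int heap_right(int i)  { return (i << 1) | 1; }  /* 2i + 1       */
static inline int heap_parent(int i) { return i >> 1;       }  /* floor(i / 2) */
```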
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations : EXTRACT-MAX (1)

EXTRACT-MAX(A, n)
  max = A[1]
  A[1] = A[n]
  n = n - 1
  HEAPIFY(A, 1, n)
  return max
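A C sketch of the pseudocode above (illustrative only; `max_heapify` is the HEAPIFY routine developed on the following slides, and the heap is stored 1-based in `A[1..*n]`):

```c
void max_heapify(int A[], int i, int n);   /* HEAPIFY, defined later */

/* Remove and return the maximum of a max-heap of size *n. */
int extract_max(int A[], int *n)
{
    int max = A[1];          /* the root holds the maximum        */
    A[1] = A[*n];            /* move the last element to the root */
    *n = *n - 1;             /* shrink the heap                   */
    max_heapify(A, 1, *n);   /* restore the max-heap property     */
    return max;
}
```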
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations : EXTRACT-MAX (2)

  • Return the max element, and reorganize the heap to maintain the heap property

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (1)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (2)

  • Maintaining heap property:
    • Subtrees rooted at $left[i]$ and $right[i]$ are already heaps.
    • But, $A[i]$ may violate the heap property (i.e., may be smaller than its children)
  • Idea: Float down the value at $A[i]$ in the heap so that the subtree rooted at $i$ becomes a heap.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (2)

HEAPIFY(A, i, n)
  largest = i 

  if 2i <= n and A[2i] > A[i] then 
    largest = 2i;
  endif

  if 2i+1 <= n and A[2i+1] > A[largest] then 
    largest = 2i+1;
  endif

  if largest != i  then
    exchange A[i] with A[largest];
    HEAPIFY(A, largest, n);
  endif
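The same routine as a hedged C sketch (1-based max-heap stored in `A[1..n]`; the name `max_heapify` is mine):

```c
/* Float the value at index i down until the subtree rooted at i is a max-heap. */
void max_heapify(int A[], int i, int n)
{
    int l = 2 * i, r = 2 * i + 1, largest = i;

    if (l <= n && A[l] > A[i])       largest = l;
    if (r <= n && A[r] > A[largest]) largest = r;

    if (largest != i) {
        int tmp = A[i]; A[i] = A[largest]; A[largest] = tmp;
        max_heapify(A, largest, n);   /* recurse one level down */
    }
}
```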
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (3)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (4)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (5)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (6)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (7)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (8)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Intuitive Analysis of HEAPIFY

  • Consider $HEAPIFY(A, i, n)$
    • let $h(i)$ be the height of node $i$
    • at most $h(i)$ recursion levels
      • Constant work at each level: $\Theta(1)$
    • Therefore $T(i)=O(h(i))$
  • Heap is an almost-complete binary tree
    • $h(n)=O(lgn)$
  • Thus $T(n)=O(lgn)$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Formal Analysis of HEAPIFY

  • What is the recurrence?
    • Depends on the size of the subtree on which the recursive call is made
      • Next, we compute an upper bound on the size of this subtree.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Reminder: Binary trees

  • For a complete binary tree:
    • # of nodes at depth $d$: $2^d$
    • # of nodes with depth less than $d$: $2^d-1$

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Formal Analysis of HEAPIFY (1)

  • Worst case occurs when the last row of the subtree $S_i$ rooted at node $i$ is half full

  • $T(n) \leq T(|S_{L(i)}|) + \Theta(1)$

  • $S_{L(i)}$ and $S_{R(i)}$ are complete binary trees of heights $h(i)-1$ and $h(i)-2$, respectively

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Formal Analysis of HEAPIFY (2)

  • Let $m$ be the number of leaf nodes in $S_{L(i)}$
    • $|S_{L(i)}|=\overbrace{m}^{ext.}+\overbrace{(m-1)}^{int.}=2m-1$
    • $|S_{R(i)}|=\overbrace{\frac{m}{2}}^{ext.}+\overbrace{(\frac{m}{2}-1)}^{int.}=m-1$
    • $|S_{L(i)}|+|S_{R(i)}|+1=n$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Formal Analysis of HEAPIFY (3)

$$\begin{align*} (2m-1)+(m-1)+1 &=n \\ m &= (n+1)/3 \\ |S_{L(i)}| &= 2m-1 \\ &=2(n+1)/3-1 \\ &=(2n/3+2/3)-1 \\ &=\frac{2n}{3}-\frac{1}{3} \leq \frac{2n}{3} \\ T(n) & \leq T(2n/3) + \Theta(1) \\ T(n) &= O(lgn) \end{align*}$$

  • By CASE-2 of the Master Theorem $\Longrightarrow T(n)=\Theta(n^{\log_b a}lgn)$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Formal Analysis of HEAPIFY (4)

  • Recurrence: $T(n) = aT(n/b) + f(n)$

  • Case 2: $\frac{f(n)}{n^{\log_b a}}=\Theta(1)$

  • i.e., $f(n)$ and $n^{\log_b a}$ grow at similar rates

  • Solution: $T(n)=\Theta(n^{\log_b a}lgn)$

    • $T(n) \leq T(2n/3) + \Theta(1)$, so $a=1$, $b=3/2$, $f(n)=\Theta(1)$
    • $T(n) \leq \Theta(n^{\log_{3/2} 1}lgn)$
    • $T(n) \leq \Theta(n^0 lgn)$
    • $T(n) = O(lgn)$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAPIFY: Efficiency Issues

  • Recursion vs Iteration:
    • In the absence of tail-call optimization, the iterative version is in general more efficient, because the recursive version pushes/pops a stack frame at each level of recursion.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (1)

Recursive

  HEAPIFY(A, i, n)
  largest = i 

  if 2i <= n and A[2i] > A[i] then 
    largest = 2i

  if 2i+1 <= n and A[2i+1] > A[largest] then 
    largest = 2i+1

  if largest != i  then
    exchange A[i] with A[largest]
    HEAPIFY(A, largest, n)
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (2)

Iterative

HEAPIFY(A, i, n)
  j = i
  while true do
    largest = j

    if 2j <= n and A[2j] > A[j] then
      largest = 2j
    endif

    if 2j+1 <= n and A[2j+1] > A[largest] then
      largest = 2j+1
    endif

    if largest != j then
      exchange A[j] with A[largest]
      j = largest
    else
      return
    endif
  endwhile
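A C sketch of the iterative variant (same assumptions as before; the tail call is replaced by a loop):

```c
/* Iterative HEAPIFY: identical effect, no recursion overhead. */
void max_heapify_iter(int A[], int i, int n)
{
    int j = i;
    for (;;) {
        int l = 2 * j, r = 2 * j + 1, largest = j;
        if (l <= n && A[l] > A[j])       largest = l;
        if (r <= n && A[r] > A[largest]) largest = r;
        if (largest == j) return;             /* heap property restored  */
        int tmp = A[j]; A[j] = A[largest]; A[largest] = tmp;
        j = largest;                          /* continue one level down */
    }
}
```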
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: HEAPIFY (3)

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: Building Heap

  • Given an arbitrary array, how to build a heap from scratch?

  • Basic idea: Call $HEAPIFY$ on each node, bottom up

    • Start from the leaves (which trivially satisfy the heap property)
    • Process nodes in bottom-up order.
    • When $HEAPIFY$ is called on node $i$, the subtrees rooted at the $left$ and $right$ children of $i$ already satisfy the heap property.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Storage of the leaves (Lemma)

  • Lemma: The last $\lceil \frac{n}{2} \rceil$ nodes of a heap are all leaves.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Storage of the leaves (Proof of Lemma) (1)

  • Lemma: last $\lceil n/2 \rceil$ nodes of a heap are all leaves
  • Proof:
    • $m=2^{d-1}$: # of nodes at level $d-1$
    • $f$: # of nodes at level $d$ (last level)
  • # of nodes with depth $d-1$: $m$
  • # of nodes with depth $<d-1$: $m-1$
  • # of nodes with depth $d$: $f$
  • Total # of nodes: $n=f+2m-1$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Storage of the leaves (Proof of Lemma) (2)

  • Solving for the number of nodes at the last level: $f=n-2m+1$

$$\begin{align*} \text{\# of leaves} &=f+m-\lceil f/2 \rceil \\ &= m+\lfloor f/2 \rfloor \\ &= m+\lfloor (n-2m+1)/2 \rfloor \\ &= \lfloor (n+1)/2 \rfloor \\ &= \lceil n/2 \rceil \end{align*}$$

This completes the proof.

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Operations: Building Heap

BUILD-HEAP (A, n)
  for i = ceil(n/2) downto 1 do
    HEAPIFY(A, i, n)
  • Reminder: The last $\lceil n/2 \rceil$ nodes of a heap are all leaves, which trivially satisfy the heap property (see the C sketch below)
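A C sketch of BUILD-HEAP (illustrative; `max_heapify` as sketched earlier). Since nodes $\lfloor n/2 \rfloor + 1 \dots n$ are leaves, it is enough to start from $\lfloor n/2 \rfloor$; starting from $\lceil n/2 \rceil$ as in the pseudocode is equally correct, because heapifying a leaf is a no-op.

```c
void max_heapify(int A[], int i, int n);

/* Turn an arbitrary array A[1..n] into a max-heap, bottom-up. */
void build_max_heap(int A[], int n)
{
    for (int i = n / 2; i >= 1; i--)   /* n/2 == floor(n/2): last internal node */
        max_heapify(A, i, n);
}
```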
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-1)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-2)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-3)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-4)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-5)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-6)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-7)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-8)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap Example (Step-9)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap: Runtime Analysis

  • Simple analysis:

    • $O(n)$ calls to $HEAPIFY$, each of which takes $O(lgn)$ time
    • $O(nlgn)$ $\Longrightarrow$ loose bound
  • In general, a good approach:

    • Start by proving an easy bound
    • Then, try to tighten it
  • Is there a tighter bound?

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap: Tighter Running Time Analysis

  • If the heap is a complete binary tree, then $h_{\ell} = d - \ell$
  • Otherwise, nodes at a given level do not all have the same height, but we have $d - \ell - 1 \leq h_{\ell} \leq d - \ell$

alt:"alt" center

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap: Tighter Running Time Analysis

  • Assume that all nodes at level $\ell = d - 1$ are processed

$$\begin{align*} T(n) &=\sum \limits_{\ell=0}^{d-1}n_{\ell}O(h_{\ell})=O\bigg(\sum \limits_{\ell=0}^{d-1}n_{\ell}h_{\ell}\bigg) \quad \begin{cases} n_{\ell}=2^{\ell} = \# \text{ of nodes at level }\ell \\ h_{\ell}=\text{height of nodes at level } \ell \end{cases} \\ \therefore T(n) &= O \bigg( \sum \limits_{\ell=0}^{d-1}2^{\ell}(d-\ell) \bigg) \\ \text{Let } & h=d-\ell \Longrightarrow \ell = d-h \text{ (change of variables)} \\ T(n) &= O\bigg(\sum \limits_{h=1}^{d}h2^{d-h} \bigg)=O\bigg(\sum \limits_{h=1}^{d}h \frac{2^d}{2^h} \bigg) = O\bigg(2^d\sum \limits_{h=1}^{d}h (1/2)^h\bigg) \\ \text{but } & 2^d = \Theta(n) \Longrightarrow T(n) = O\bigg(n\sum \limits_{h=1}^{d}h (1/2)^h \bigg) \end{align*}$$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap: Tighter Running Time Analysis

$$\sum \limits_{h=1}^{d}h(1/2)^h \leq \sum \limits_{h=0}^{d}h(1/2)^h \leq \sum \limits_{h=0}^{\infty}h(1/2)^h$$

  • recall infinite decreasing geometric series

$$\sum \limits_{k=0}^{\infty} x^k = \frac{1}{1-x} \text{ where } |x|<1$$

  • differentiate both sides

$$\sum \limits_{k=0}^{\infty}kx^{k-1} = \frac{1}{(1-x)^2}$$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Build-Heap: Tighter Running Time Analysis

$$\sum \limits_{k=0}^{\infty}kx^{k-1} = \frac{1}{(1-x)^2}$$

  • then, multiply both sides by $x$

$$\sum \limits_{k=0}^{\infty}kx^k = \frac{x}{(1-x)^2}$$

  • in our case: $x = 1/2$ and $k = h$

$$\therefore \sum \limits_{h=0}^{\infty}h(1/2)^h = \frac{1/2}{(1-(1/2))^2}=2=O(1) \\ \therefore T(n)=O\bigg(n\sum \limits_{h=1}^{d}h(1/2)^h\bigg)=O(n)$$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Steps

  • (1) Build a heap on array $A[1 \dots n]$ by calling $\text{BUILD-HEAP}(A, n)$
  • (2) The largest element is stored at the root $A[1]$
    • Put it into its correct final position $A[n]$ by $A[1] \longleftrightarrow A[n]$
  • (3) Discard node $n$ from the heap
  • (4) Subtrees ($S_2$ and $S_3$) rooted at the children of the root remain heaps, but the new root element may violate the heap property.
    • Make $A[1 \dots n-1]$ a heap by calling $HEAPIFY(A, 1, n-1)$
  • (5) $n \leftarrow n-1$
  • (6) Repeat steps (2)-(5) until $n=2$ (see the C sketch below)
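The steps above as a hedged C sketch (`build_max_heap` and `max_heapify` are the routines sketched earlier; 1-based array):

```c
void build_max_heap(int A[], int n);
void max_heapify(int A[], int i, int n);

/* Sort A[1..n] in place in O(n lg n) time. */
void heap_sort(int A[], int n)
{
    build_max_heap(A, n);                         /* step (1)                     */
    for (int i = n; i >= 2; i--) {
        int tmp = A[1]; A[1] = A[i]; A[i] = tmp;  /* steps (2)-(3): A[1] <-> A[i] */
        max_heapify(A, 1, i - 1);                 /* step (4): heap of size i-1   */
    }
}
```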
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-1)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-2)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-3)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-4)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-5)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-6)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-7)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-8)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-9)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-10)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-11)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-12)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-13)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-14)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-15)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-16)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-17)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-18)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm Example (Step-19)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort Algorithm: Runtime Analysis


$$\begin{align*} T(n) &= \Theta(n)+\sum \limits_{i=2}^{n}O(lgi) \\ &= \Theta(n)+O\bigg( \sum \limits_{i=2}^{n}lgn \bigg) \\ &= O(nlgn) \end{align*}$$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heapsort - Notes

  • Heapsort is a very good algorithm, but a good implementation of quicksort usually beats heapsort in practice
  • However, the heap data structure has many popular applications, and it can be used to implement priority queues efficiently
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Data structures for Dynamic Sets

  • Consider sets of records having key and satellite data


RTEU CE100 Week-4
CE100 Algorithms and Programming II

Operations on Dynamic Sets

  • Queries: Simply return info
    • $MAX(S) / MIN(S)$: (Query) return $x \in S$ with the largest/smallest $key$
    • $SEARCH(S, k)$: (Query) return $x \in S$ with $key[x]= k$
    • $SUCCESSOR(S, x) / PREDECESSOR(S, x)$: (Query) return $y \in S$ which is the next larger/smaller element after $x$
  • Modifying operations: Change the set
    • $INSERT(S, x)$: (Modifying) $S \leftarrow S \cup \{x\}$
    • $DELETE(S, x)$: (Modifying) $S \leftarrow S - \{x\}$
    • $\text{EXTRACT-MAX}(S) / \text{EXTRACT-MIN}(S)$: (Modifying) return and delete $x \in S$ with the largest/smallest $key$
  • Different data structures support/optimize different operations
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Priority Queues (PQ)

  • Supports
    • $INSERT$
    • $MAX / MIN$
    • $\text{EXTRACT-MAX} / \text{EXTRACT-MIN}$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Priority Queues (PQ)

  • One application: Schedule jobs on a shared resource
    • PQ keeps track of jobs and their relative priorities
    • When a job is finished or interrupted, the highest-priority job is selected from those pending using $\text{EXTRACT-MAX}$
    • A new job can be added at any time using $INSERT$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Priority Queues (PQ)

  • Another application: Event-driven simulation
    • Events to be simulated are the items in the PQ
    • Each event is associated with a time of occurrence which serves as a $key$
    • Simulation of an event can cause other events to be simulated in the future
    • Use $\text{EXTRACT-MIN}$ at each step to choose the next event to simulate
    • As new events are produced, insert them into the PQ using $INSERT$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Implementation of Priority Queue

  • Sorted linked list: Simplest implementation
    • $INSERT$
      • $O(n)$ time
      • Scan the list to find the place and splice in the new item
    • $\text{EXTRACT-MAX}$
      • $O(1)$ time
      • Take the first element
    • Fast extraction but slow insertion.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Implementation of Priority Queue

  • Unsorted linked list: Simplest implementation

    • $INSERT$
      • $O(1)$ time
      • Put the new item at the front
    • $\text{EXTRACT-MAX}$
      • $O(n)$ time
      • Scan the whole list
    • Fast insertion but slow extraction.
  • Sorted linked list is better on the average

    • Sorted list: on the average, scans $n/2$ elements per insertion
    • Unsorted list: always scans $n$ elements at each extraction
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Implementation of PQ

  • $INSERT$ and $\text{EXTRACT-MAX}$ are both $O(lgn)$
    • good compromise between fast insertion but slow extraction and vice versa
  • $\text{EXTRACT-MAX}$: already discussed $\text{HEAP-EXTRACT-MAX}$
  • $INSERT$: Insertion is like that of Insertion-Sort.
HEAP-INSERT(A, key, n)
  n = n + 1
  i = n
  while i > 1 and A[floor(i/2)] < key do
    A[i] = A[floor(i/2)]
    i = floor(i/2)
  A[i] = key
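A C sketch of HEAP-INSERT (illustrative; the caller must guarantee the array has room for the new element):

```c
/* Insert key into a 1-based max-heap of size *n (capacity must allow *n + 1). */
void heap_insert(int A[], int *n, int key)
{
    *n = *n + 1;
    int i = *n;
    while (i > 1 && A[i / 2] < key) {  /* pull smaller ancestors down    */
        A[i] = A[i / 2];
        i = i / 2;                     /* i/2 == floor(i/2) in C         */
    }
    A[i] = key;                        /* place the new key              */
}
```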
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Implementation of PQ

  • Traverses $O(lgn)$ nodes, as $HEAPIFY$ does, but makes fewer comparisons and assignments
    • $HEAPIFY$: compares the parent with both children
    • $\text{HEAP-INSERT}$: compares with only one
RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INSERT Example (Step-1)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INSERT Example (Step-2)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INSERT Example (Step-3)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INSERT Example (Step-4)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INSERT Example (Step-5)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Increase Key

  • The key value of the $i^{th}$ element of the heap is increased from $A[i]$ to $key$
HEAP-INCREASE-KEY(A, i, key)

  if key < A[i] then
    return error

  while i > 1 and A[floor(i/2)] < key do
    A[i] = A[floor(i/2)] 
    i = floor(i/2)

  A[i] = key
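A C sketch (illustrative; it returns -1 instead of the pseudocode's error when the new key would decrease the value):

```c
/* Raise A[i] to key and float it up along the root path. */
int heap_increase_key(int A[], int i, int key)
{
    if (key < A[i]) return -1;         /* error: new key must not be smaller */
    while (i > 1 && A[i / 2] < key) {
        A[i] = A[i / 2];               /* pull the smaller parent down */
        i = i / 2;
    }
    A[i] = key;
    return 0;
}
```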
RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INCREASE-KEY Example (Step-1)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INCREASE-KEY Example (Step-2)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INCREASE-KEY Example (Step-3)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INCREASE-KEY Example (Step-4)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

HEAP-INCREASE-KEY Example (Step-5)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Heap Implementation of Priority Queue (PQ)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Summary: Max Heap

  • Heapify(A, i)
    • Works when both child subtrees of node $i$ are heaps
    • "Floats down" node $i$ to satisfy the heap property
    • Runtime: $O(lgn)$
  • Max(A, n)
    • Returns the max element of the heap (no modification)
    • Runtime: $O(1)$
  • Extract-Max(A, n)
    • Returns and removes the max element of the heap
    • Fills the gap in $A[1]$ with $A[n]$, then calls Heapify(A, 1)
    • Runtime: $O(lgn)$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Summary: Max Heap

  • Build-Heap(A, n)
    • Given an arbitrary array, builds a heap from scratch
    • Runtime: $O(n)$
  • Min(A, n)
    • How to return the min element in a max-heap?
    • Worst-case runtime: $O(n)$
      • because ~half of the heap elements are leaf nodes
    • Instead, use a min-heap for efficient min operations
  • Search(A, x)
    • For an arbitrary $x$ value, the worst-case runtime: $O(n)$
    • Use a sorted array instead for efficient search operations
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Summary: Max Heap

  • Increase-Key(A, i, x)
    • Increase the key of node $i$ (from $A[i]$ to $x$)
    • Float up $x$ until the heap property is satisfied
    • Runtime: $O(lgn)$
  • Decrease-Key(A, i, x)
    • Decrease the key of node $i$ (from $A[i]$ to $x$)
    • Call Heapify(A, i)
    • Runtime: $O(lgn)$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Phone Operator Problem

  • A phone operator is answering $n$ phones

  • Each phone $i$ has $x_i$ people waiting in line for their calls to be answered.

  • The phone operator needs to answer the phone with the largest number of people waiting in line.

  • New calls come in continuously, and some people hang up after waiting.

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Phone Operator Solution

  • Step 1: Define the following array:

  • $A[i]$: the $i^{th}$ element in the heap

  • $A[i].id$: the index of the corresponding phone

  • $A[i].key$: # of people waiting in line for the phone with index $A[i].id$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Phone Operator Solution

  • Step 2: $\text{Build-Max-Heap}(A, n)$ (a C sketch of the bookkeeping follows below)
    • Execution:
      • When the operator wants to answer a phone:
        • $id = A[1].id$
        • $\text{Decrease-Key}(A, 1, A[1].key-1)$
        • answer the phone with index $id$
      • When a new call comes in to phone $i$:
        • $\text{Increase-Key}(A, i, A[i].key+1)$
      • When a call drops from phone $i$:
        • $\text{Decrease-Key}(A, i, A[i].key-1)$
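A compact C sketch of this bookkeeping. The slide's `Increase-Key(A, i, ...)` needs the heap index of phone $i$, so the sketch also keeps a `pos[]` table mapping phone id to heap position; all names (`change_key`, `sift_up`, `sift_down`, etc.) are my own illustration, not part of the slides.

```c
#include <stdio.h>

#define MAXP 100
typedef struct { int id; int key; } Item;   /* key = # of people waiting        */

static Item A[MAXP + 1];    /* 1-based max-heap keyed on waiting count          */
static int  pos[MAXP + 1];  /* pos[id] = index of phone id inside A             */
static int  n;              /* current heap size                                */

static void swap_nodes(int i, int j)
{
    Item t = A[i]; A[i] = A[j]; A[j] = t;
    pos[A[i].id] = i; pos[A[j].id] = j;
}

static void sift_up(int i)
{
    while (i > 1 && A[i / 2].key < A[i].key) { swap_nodes(i, i / 2); i /= 2; }
}

static void sift_down(int i)
{
    for (;;) {
        int l = 2 * i, r = 2 * i + 1, m = i;
        if (l <= n && A[l].key > A[m].key) m = l;
        if (r <= n && A[r].key > A[m].key) m = r;
        if (m == i) return;
        swap_nodes(i, m); i = m;
    }
}

/* Change phone id's waiting count by delta (+1: new call, -1: answered or hang-up). */
static void change_key(int id, int delta)
{
    int i = pos[id];
    A[i].key += delta;
    if (delta > 0) sift_up(i); else sift_down(i);
}

int main(void)
{
    int waiting[] = { 0, 3, 1, 4 };  /* phones 1..3 */
    n = 3;
    for (int i = 1; i <= n; i++) { A[i].id = i; A[i].key = waiting[i]; pos[i] = i; }
    for (int i = n / 2; i >= 1; i--) sift_down(i);   /* Build-Max-Heap           */

    change_key(2, +1);               /* a new call arrives on phone 2            */
    int busiest = A[1].id;           /* phone the operator answers next          */
    change_key(busiest, -1);         /* its queue shrinks by one                 */
    printf("answer phone %d\n", busiest);
    return 0;
}
```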
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Linked Lists

  • Like an array, a linked list is a linear data structure.
  • Unlike arrays, linked list elements are not stored at contiguous locations; the elements are linked using pointers.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Linked Lists - C Definition

  • C

    // A linked list node
    struct Node {
      int data;
      struct Node* next;
    };
    
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Linked Lists - Cpp Definition

  • Cpp

    class Node {
    public:
      int data;
      Node* next;
    };
    
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Linked Lists - Java Definition

  • Java

    class LinkedList {
      Node head; // head of the list
    
      /* Linked list Node*/
      class Node {
          int data;
          Node next;
    
          // Constructor to create a new node
          // Next is by default initialized
          // as null
          Node(int d) { data = d; }
      }
    }
    
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Linked Lists - Csharp Definition

  • Csharp

    class LinkedList {
      // The first node(head) of the linked list
      // Will be an object of type Node (null by default)
      Node head;
    
      class Node {
          int data;
          Node next;
    
          // Constructor to create a new node
          Node(int d) { data = d; }
      }
    }
    
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Priority Queue using Linked List Methods

  • Implement Priority Queue using Linked Lists.
    • push(): This function inserts a new element into the queue.
    • pop(): This function removes the element with the highest priority from the queue.
    • peek()/top(): This function returns the highest-priority element in the queue without removing it.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Priority Queue using Linked List Algorithm

PUSH(HEAD, DATA, PRIORITY)
  Create NEW with NEW.DATA = DATA and NEW.PRIORITY = PRIORITY
  IF HEAD.PRIORITY < PRIORITY THEN
    NEW -> NEXT = HEAD
    HEAD = NEW
    RETURN
  ENDIF

  Set TEMP to the head of the list
  WHILE TEMP -> NEXT != NULL and TEMP -> NEXT -> PRIORITY > PRIORITY DO
    TEMP = TEMP -> NEXT
  ENDWHILE

  NEW -> NEXT = TEMP -> NEXT
  TEMP -> NEXT = NEW
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Priority Queue using Linked List Algorithm

POP(HEAD)
  // Remember the old head, advance HEAD, then free the removed node
  TEMP = HEAD
  HEAD = HEAD -> NEXT
  Free TEMP

PEEK(HEAD)
  Return HEAD -> DATA
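A runnable C sketch of this sorted-list priority queue (names are illustrative; the list keeps the highest priority at the head, matching the pseudocode above):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct PQNode {
    int data;
    int priority;
    struct PQNode *next;
} PQNode;

/* PUSH, O(n): walk until the next node has lower priority, then splice in. */
static void pq_push(PQNode **head, int data, int priority)
{
    PQNode *n = malloc(sizeof *n);
    n->data = data; n->priority = priority; n->next = NULL;

    if (*head == NULL || (*head)->priority < priority) {
        n->next = *head;              /* new node becomes the head */
        *head = n;
        return;
    }
    PQNode *t = *head;
    while (t->next != NULL && t->next->priority > priority)
        t = t->next;
    n->next = t->next;
    t->next = n;
}

/* POP, O(1): remove and return the highest-priority element (the head). */
static int pq_pop(PQNode **head)
{
    PQNode *t = *head;
    int data = t->data;
    *head = t->next;
    free(t);
    return data;
}

/* PEEK, O(1): highest-priority element without removing it. */
static int pq_peek(const PQNode *head) { return head->data; }

int main(void)
{
    PQNode *pq = NULL;
    pq_push(&pq, 10, 2); pq_push(&pq, 20, 5); pq_push(&pq, 30, 1);
    printf("peek: %d\n", pq_peek(pq));  /* 20 (priority 5) */
    printf("pop:  %d\n", pq_pop(&pq));  /* 20              */
    printf("pop:  %d\n", pq_pop(&pq));  /* 10 (priority 2) */
    return 0;
}
```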
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Priority Queue using Linked List Notes

  • The linked list is kept sorted by priority.
  • Time complexities, compared with a binary heap:

| Implementation | peek() | push() | pop() |
| --- | --- | --- | --- |
| Linked List | $O(1)$ | $O(n)$ | $O(1)$ |
| Binary Heap | $O(1)$ | $O(lgn)$ | $O(lgn)$ |
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Sorting in Linear Time

RTEU CE100 Week-4
CE100 Algorithms and Programming II

How Fast Can We Sort?

  • The algorithms we have seen so far:

    • Based on comparison of elements
    • We only care about the relative ordering between the elements (not the actual values)
    • The smallest worst-case runtime we have seen so far: $O(nlgn)$
    • Is $O(nlgn)$ the best we can do?
  • Comparison sorts: Only use comparisons to determine the relative order of elements.

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Decision Trees for Comparison Sorts

  • Represent a sorting algorithm abstractly in terms of a decision tree

    • A binary tree that represents the comparisons between elements in the sorting algorithm
    • Control, data movement, and other aspects are ignored
  • One decision tree corresponds to one sorting algorithm and one value of $n$ (input size)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Reminder: Insertion Sort Step-By-Step Description (1)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Reminder: Insertion Sort Step-By-Step Description (2)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Reminder: Insertion Sort Step-By-Step Description (3)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Different Outcomes for Insertion Sort and n=3

  • Input:
    $<a_1,a_2,a_3>$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Decision Tree for Insertion Sort and n=3

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Decision Tree Model for Comparison Sorts

  • Internal node $(i:j)$: Comparison between elements $a_i$ and $a_j$

  • Leaf node: An output of the sorting algorithm

  • Path from root to a leaf: The execution of the sorting algorithm for a given input

  • All possible executions are captured by the decision tree

  • All possible outcomes (permutations) are in the leaf nodes

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Decision Tree for Insertion Sort and n=3

  • Input:
    $<9, 4, 6>$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Decision Tree Model

  • A decision tree can model the execution of any comparison sort:

    • One tree for each input size $n$
    • View the algorithm as splitting whenever it compares two elements
    • The tree contains the comparisons along all possible instruction traces
  • The running time of the algorithm $=$ the length of the path taken

  • Worst-case running time $=$ the height of the tree

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Lower Bound for Comparison Sorts

  • Let $n$ be the number of elements in the input array.

  • What is the min number of leaves in the decision tree?

    • $n!$ (because there are $n!$ permutations of the input array, and all possible outputs must be captured in the leaves)
  • What is the max number of leaves in a binary tree of height $h$? $\Longrightarrow 2^h$

  • So, we must have:

    $2^h \geq n!$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Lower Bound for Decision Tree Sorting

  • Theorem: Any comparison sort algorithm requires $\Omega(nlgn)$ comparisons in the worst case.
  • Proof: We'll prove that any decision tree corresponding to a comparison sort algorithm must have height $\Omega(nlgn)$

$$\begin{align*} 2^h & \geq n! \\ h & \geq lg(n!) \\ & \geq lg((n/e)^n) \quad \text{(Stirling approximation)} \\ & = nlgn - nlge \\ & = \Omega(nlgn) \end{align*}$$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Lower Bound for Decision Tree Sorting

Corollary: Heapsort and merge sort are asymptotically optimal comparison sorts.

Proof: The $O(nlgn)$ upper bounds on the runtimes for heapsort and merge sort match the $\Omega(nlgn)$ worst-case lower bound from the previous theorem.

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Sorting in Linear Time

  • Counting sort: No comparisons between elements (see the C sketch after this list)

    • Input: $A[1 \dots n]$, where $A[j] \in \{1, 2, \dots, k\}$
    • Output: $B[1 \dots n]$, sorted
    • Auxiliary storage: $C[1 \dots k]$
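A C sketch of counting sort (illustrative; it uses 0-based arrays, values assumed to lie in $1 \dots k$, and a C99 variable-length array for the counts):

```c
#include <string.h>

/* Stable counting sort: A[0..n-1] with values in 1..k, sorted into B[0..n-1]. */
void counting_sort(const int A[], int B[], int n, int k)
{
    int C[k + 1];                     /* C99 VLA: counts for values 0..k       */
    memset(C, 0, sizeof C);           /* step 1: initialize all counts to 0    */

    for (int j = 0; j < n; j++)       /* step 2: count occurrences             */
        C[A[j]]++;
    for (int v = 1; v <= k; v++)      /* step 3: C[v] = # of elements <= v     */
        C[v] += C[v - 1];
    for (int j = n - 1; j >= 0; j--)  /* step 4: place from the back => stable */
        B[--C[A[j]]] = A[j];
}
```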
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-1

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-2

  • Step 1: Initialize all counts to 0
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-3

  • Step 2: Count the number of occurrences
    of each value in the input array
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-4

  • Step 3: Compute the number of elements
    less than or equal to each value
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-5

  • Step 4: Populate the output array
    • There are $C[3] = 3$ elements that are $\leq 3$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-6

  • Step 4: Populate the output array
    • There are $C[4]=5$ elements that are $\leq 4$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-7

  • Step 4: Populate the output array
    • There are $C[3]=2$ elements that are $\leq 3$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-8

  • Step 4: Populate the output array
    • There are $C[1]=1$ elements that are $\leq 1$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort-9

  • Step 4: Populate the output array
    • There are $C[4]=4$ elements that are $\leq 4$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort: Runtime Analysis

  • Total Runtime: $\Theta(n+k)$
    • $n$: size of the input array
    • $k$: the range of input values
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Counting Sort: Runtime

  • Runtime is $\Theta(n+k)$
    • If $k=O(n)$, then counting sort takes $\Theta(n)$
  • Question: We proved a lower bound of $\Omega(nlgn)$ before! Where is the fallacy?
  • Answer:
    • The $\Omega(nlgn)$ lower bound is for comparison-based sorting
    • Counting sort is not a comparison sort
    • In fact, not a single comparison between elements occurs!
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Stable Sorting

  • Counting sort is a stable sort: It preserves the input order among equal elements.
    • i.e. The numbers with the same value appear in the output array in the same order as they do in the input array.
  • Note: Which other sorting algorithms have this property?


RTEU CE100 Week-4
CE100 Algorithms and Programming II

Radix Sort

  • Origin: Herman Hollerith’s card-sorting machine for the 1890 US Census.

  • Basic idea: Digit-by-digit sorting

  • Two variations:

    • Sort from MSD to LSD (bad idea)
    • Sort from LSD to MSD (good idea)

(LSD/MSD: Least/most significant digit)

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Herman Hollerith (1860-1929)

  • The 1880 U.S. Census took almost 10 years to process.
  • While a lecturer at MIT, Hollerith prototyped punched-card technology.
  • His machines, including a card sorter, allowed the 1890 census total to be reported in 6 weeks.
  • He founded the Tabulating Machine Company, which merged with other companies in 1911 to form the Computing-Tabulating-Recording Company, renamed International Business Machines (IBM) in 1924.


RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith Punched Card

  • Punched card: A piece of stiff paper that contains digital information represented by the presence or absence of holes.
    • 12 rows and 24 columns
    • coded for age, state of residency, gender, etc.


RTEU CE100 Week-4
CE100 Algorithms and Programming II

Modern IBM card

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith Tabulating Machine and Sorter

  • Mechanically sorts the cards based on the hole locations.
  • Sorting performed for one column at a time
  • Human operator needed to load/retrieve/move cards at each stage


RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith’s MSD-First Radix Sort

  • Sort starting from the most significant digit (MSD)
  • Then, sort each of the resulting bins recursively
  • At the end, combine the decks in order


RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith’s MSD-First Radix Sort

  • To sort a subset of cards recursively:

    • All the other cards need to be removed from the machine, because the machine can handle only one sorting problem at a time.
    • The human operator needs to keep track of the intermediate card piles


RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith’s MSD-First Radix Sort

  • MSD-first sorting may require:
    • a very large number of sorting passes
    • a very large number of intermediate card piles to maintain
  • $S(d)$:
    • # of passes needed to sort $d$-digit numbers (worst case)
  • Recurrence:
    • $S(d)=10S(d-1)+1$ with $S(1)=1$
      • Reminder: a recursive call is made for each subset with the same most significant digit (MSD)
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith’s MSD-First Radix Sort

  • Recurrence: $S(d)=10S(d-1)+1$

$$\begin{align*} S(d) &= 10 S(d-1) + 1 \\ & = 10 \bigg(10 S(d-2) + 1 \bigg) + 1 \\ & = 10 \Big(10 \bigg(10 S(d-3) + 1\bigg) + 1 \Big) + 1 \\ & = 10^i S(d-i) + 10^{i-1} + 10^{i-2} + \dots + 10^1 + 10^0 \\ &=\sum \limits_{i=0}^{d-1}10^i \end{align*}$$

  • Iteration terminates when $i = d-1$ with $S(d-(d-1)) = S(1) = 1$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith’s MSD-First Radix Sort

  • Recurrence: $S(d)=10S(d-1)+1$

$$\begin{align*} S(d) &=\sum \limits_{i=0}^{d-1}10^i \\ & = \frac{10^d-1}{10-1} \\ & = \frac{1}{9}(10^d-1) \\ & \Downarrow \\ S(d)&=\frac{1}{9}(10^d-1) \end{align*}$$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith’s MSD-First Radix Sort

  • $P(d)$: # of intermediate card piles maintained (worst case)
  • Reminder: Each routing pass generates 9 intermediate piles, except the sorting passes on the least significant digits (LSDs)
    • There are $10^{d-1}$ sorting calls on LSDs

$$\begin{align*} P(d) &= 9\big(S(d) - 10^{d-1}\big) \\ &= 9S(d) - 9 \cdot 10^{d-1} \\ &= (10^d - 1) - 9 \cdot 10^{d-1} \\ &= 10^{d-1} - 1 \end{align*}$$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith’s MSD-First Radix Sort

$$P(d) = 10^{d-1} - 1$$

Alternative solution: Solve the recurrence

$$\begin{align*} P(d) &= 10P(d-1)+9 \\ P(1) &= 0 \end{align*}$$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Hollerith’s MSD-First Radix Sort

  • Example: To sort 3-digit numbers, in the worst case:

    • $S(3) = \frac{1}{9}(10^3-1) = 111$ sorting passes are needed
    • $P(3) = 10^{3-1}-1 = 99$ intermediate card piles are generated
  • The MSD-first approach has more recursive calls and a larger intermediate storage requirement

    • Expensive for a tabulating machine to sort punched cards
    • Overhead of recursive calls in a modern computer
RTEU CE100 Week-4
CE100 Algorithms and Programming II

LSD-First Radix Sort

  • Least significant digit (LSD)-first radix sort seems to be a folk invention originated by machine operators.

  • It is counter-intuitive, but it is the better algorithm.

  • Basic Algorithm:

    Sort numbers on their LSD first   (Stable Sorting Needed)
    Combine the cards into a single deck in order 
    Continue this sorting process for the other digits
      from the LSD to MSD
    
  • Requires only $d$ sorting passes (see the C sketch below)

  • No intermediate card piles are generated
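A C sketch of LSD-first radix sort for non-negative integers (illustrative; one stable counting-sort pass per decimal digit, `d` passes in total):

```c
#include <string.h>

/* One stable counting-sort pass on the decimal digit selected by exp (1, 10, 100, ...). */
static void counting_pass(int A[], int B[], int n, int exp)
{
    int C[10] = { 0 };
    for (int j = 0; j < n; j++) C[(A[j] / exp) % 10]++;
    for (int v = 1; v < 10; v++) C[v] += C[v - 1];
    for (int j = n - 1; j >= 0; j--)          /* back to front keeps the pass stable */
        B[--C[(A[j] / exp) % 10]] = A[j];
    memcpy(A, B, n * sizeof A[0]);
}

/* Sort n non-negative d-digit integers, least significant digit first. */
void radix_sort(int A[], int n, int d)
{
    int B[n];                                  /* scratch array (C99 VLA) */
    for (int pass = 0, exp = 1; pass < d; pass++, exp *= 10)
        counting_pass(A, B, n, exp);
}
```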

RTEU CE100 Week-4
CE100 Algorithms and Programming II

LSD-first Radix Sort Example


RTEU CE100 Week-4
CE100 Algorithms and Programming II

Correctness of Radix Sort (LSD-first)

  • Proof by induction:
    • Base case: $d=1$ is correct (trivial)
    • Inductive hypothesis: Assume the first $d-1$ digits are sorted correctly
  • Prove that all $d$ digits are sorted correctly after sorting on digit $d$
  • Two numbers that differ in digit $d$ are correctly ordered by that pass (e.g. 355 and 657)
  • Two numbers equal in digit $d$ are kept in the same order as in the input
    • (correct order, by stability and the inductive hypothesis)
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Radix Sort Runtime

  • Use counting sort to sort each digit
  • Reminder: Counting sort complexity: $\Theta(n+k)$
    • $n$: size of the input array
    • $k$: the range of the values
  • Radix sort runtime: $\Theta(d(n+k))$
    • $d$: # of digits

How to choose $d$ and $k$?

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Radix Sort: Runtime – Example 1

  • We have flexibility in choosing $d$ and $k$

  • Assume we are trying to sort 32-bit words

    • We can define each digit to be 4 bits
    • Then, the range for each digit is $k=2^4=16$
      • So, counting sort will take $\Theta(n+16)$
    • The number of digits is $d = 32/4 = 8$
    • Radix sort runtime: $\Theta(8(n+16)) = \Theta(n)$
  • $\overbrace{[4\,bits|4\,bits|4\,bits|4\,bits|4\,bits|4\,bits|4\,bits|4\,bits]}^{\text{32-bits}}$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Radix Sort: Runtime – Example 2

  • We have flexibility in choosing $d$ and $k$

  • Assume we are trying to sort 32-bit words

    • Or, we can define each digit to be 8 bits
    • Then, the range for each digit is $k = 2^8 = 256$
      • So, counting sort will take $\Theta(n+256)$
    • The number of digits is $d = 32/8 = 4$
    • Radix sort runtime: $\Theta(4(n+256)) = \Theta(n)$
  • $\overbrace{[8\,bits|8\,bits|8\,bits|8\,bits]}^{\text{32-bits}}$

RTEU CE100 Week-4
CE100 Algorithms and Programming II

Radix Sort: Runtime

  • Assume we are trying to sort $b$-bit words

    • Define each digit to be $r$ bits

    • Then, the range for each digit is $k = 2^r$

      • So, counting sort will take $\Theta(n+2^r)$
    • The number of digits is $d = b/r$

      • Radix sort runtime:

$$\begin{align*} T(n,b)&=\Theta \bigg( \frac{b}{r}(n+2^r) \bigg) \end{align*}$$

  • $\overbrace{[r\,bits|r\,bits|\dots|r\,bits]}^{b \text{ bits}}$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Radix Sort: Runtime Analysis

$$\begin{align*} T(n,b)&=\Theta \bigg( \frac{b}{r}(n+2^r) \bigg) \end{align*}$$

  • Minimize $T(n,b)$ by differentiating and setting to $0$
  • Or, intuitively:
    • We want to balance the terms $(b/r)$ and $(n + 2^r)$
    • Choose $r \approx lgn$
      • If we choose $r \ll lgn$ $\Longrightarrow$ the $(n + 2^r)$ term doesn't improve
      • If we choose $r \gg lgn$ $\Longrightarrow$ $(n + 2^r)$ increases exponentially
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Radix Sort: Runtime Analysis

$$\begin{align*} T(n,b)&=\Theta \bigg( \frac{b}{r}(n+2^r) \bigg) \end{align*}$$

$$\text{Choose } r=lgn \Longrightarrow T(n,b)=\Theta(bn/lgn)$$

  • For numbers in the range from $0$ to $n^d - 1$, we have:
    • The number of bits $b = lg(n^d) = d\,lgn$
      • Radix sort runs in $\Theta(dn)$
RTEU CE100 Week-4
CE100 Algorithms and Programming II

Radix Sort: Conclusions

$$\text{Choose } r=lgn \Longrightarrow T(n,b)=\Theta(bn/lgn)$$

  • Example: Compare radix sort with merge sort/heapsort
    • 1 million ($2^{20}$) 32-bit numbers ($n = 2^{20}$, $b = 32$)
      • Radix sort: $\lceil 32/20 \rceil = 2$ passes
      • Merge sort/heapsort: $lgn = 20$ passes
  • Downsides:
    • Radix sort has little locality of reference (more cache misses)
    • The version that uses counting sort is not in-place
  • On modern processors, a well-tuned quicksort implementation typically runs faster.
RTEU CE100 Week-4
CE100 Algorithms and Programming II

References

RTEU CE100 Week-4
CE100 Algorithms and Programming II

End-Of-Week-4-Course-Module

RTEU CE100 Week-4

![alt:"alt" height:450px center](assets/ce100-week-4-heap-heap_heap_leaves.drawio.svg)