Sunday, November 28, 2021

Test Driven Development

I've been learning rust programming language. The addition of the ability to write tests in-built is quite a welcome change to most programming languages. I found that I spent a lot of time debugging corner cases (bounds) in my algorithms, so I decided to do a test driven approach (not a chaos/random approach), but more targetted towards my weakness with boundary conditions. My favourite small/short/complex algorithm is quicksort.

Here is the code I wrote

use std::cmp::PartialOrd;

/// Quicksort
/// Simple sort algorithm that is type aware and is unique in that it's test driven
///
fn partition<T: PartialOrd>(elements: &mut [T], low: usize, high: usize) -> usize {
// Explictly set the pivot to the lowest element
let mut i : usize;
let mut j : usize;
if high <= low {
return low;
}

i = low; j = high -1; // Bounds 1, check
while i < j {
while elements[i] <= elements[low] && i < (high-1) { i = i + 1; } // Bounds 2
while elements[j] >= elements[low] && j > low { j = j - 1; } // Bounds 3
if i >= j {break; } // Bounds 4
elements.swap(i, j); // Check 5
}
elements.swap(low, j); // Check 6
j
}

fn __quicksort<T: PartialOrd + std::fmt::Debug>(elements: &mut [T], low: usize, high: usize)
{
let p = partition(elements, low, high);
if p > low {
__quicksort(elements, low, p);
}
if p < high {
__quicksort(elements, p+1, high);
}
}

fn quicksort<T: PartialOrd + std::fmt::Debug>(elements: &mut [T])
{
let len = elements.len();

__quicksort(elements, 0, len);
}

#[test]
fn partition_bounds_test_single()
{
let mut elements = vec![1];
assert_eq!(0, partition(&mut elements, 0, 1));
}

#[test]
fn partition_bounds_test_mostly_sorted()
{
let mut elements = vec![10, 2, 4, 5, 6];
let expected = vec![6, 2, 4, 5, 10];
let len = elements.len();

assert_eq!(4, partition(&mut elements, 0, len));
assert_eq!(expected, elements);

let mut elements = vec![2, 4, 5, 6, 10];
let expected = vec![2, 4, 5, 6, 10];
let len = elements.len();

assert_eq!(0, partition(&mut elements, 0, len));
assert_eq!(expected, elements);
}

#[test]
fn partition_bounds_test_2_elements_sorted()
{
let mut elements = vec![15, 10];
let expected = vec![10, 15];
let len = elements.len();

assert_eq!(1, partition(&mut elements, 0, len));
assert_eq!(expected, elements);

let mut elements = vec![10, 15];
let expected = vec![10, 15];
let len = elements.len();

assert_eq!(0, partition(&mut elements, 0, len));
assert_eq!(expected, elements);

let mut elements = vec![15, 5, 10];
let expected = vec![10, 5, 15];
let len = elements.len();

assert_eq!(2, partition(&mut elements, 0, len));
assert_eq!(expected, elements);

}

#[test]
fn quicksort_test()
{
let mut vec = vec![1, 5, 10, 2, 15];
let mut vec2 = vec.clone();

quicksort(&mut vec);
vec2.sort();
assert_eq!(vec![1, 2, 5, 10, 15], vec);
assert_eq!(vec2, vec);
let mut vec = vec![1.1, 1.15, 5.5, 1.123, 2.0];
quicksort(&mut vec);
assert_eq!(vec, vec![1.1, 1.123, 1.15, 2.0, 5.5]);

let mut vec = vec![1, 5, 10, 15, 2];
let mut vec2 = vec.clone();

quicksort(&mut vec);
vec2.sort();
assert_eq!(vec2, vec);

let mut vec = vec![15, 5, 10];
let mut vec2 = vec.clone();

quicksort(&mut vec);
vec2.sort();
assert_eq!(vec2, vec);


let mut vec = vec![1, 2, 3, 4, 5];
let mut vec2 = vec.clone();

quicksort(&mut vec);
vec2.sort();
assert_eq!(vec2, vec);

let mut vec = vec![5, 4, 3, 2, 1];
let mut vec2 = vec.clone();

quicksort(&mut vec);
vec2.sort();
assert_eq!(vec2, vec);

}

Notice, the identification of the various conditions as bounds with a number in the partition algorithm. Looking at the algorithm, the following caught my attention
  1. The use of the parameter high in the algorithm is unique, high is always one greater than the last element, so should the contract abstract it and expect a pointer to the last element? Notice the use of high-1 more than once in the code.
  2. Rust itself has some quirks associated with the Borrow Checker, after taking a mutable reference to the vector, one cannot use elements.len(), since that internally grabs a shared reference (the compiler tells you).
  3. The while loops use elements[i] <= elements[low], even though elements[low] is constant, using it before the loop causes a move of the element to the temporary variable used to cache it and that causes the Borrow Checker to complain, it might be possible to do some interesting things by having the template type T have a constraint that requires it to implement the Copy trait.
  4. Typically quicksort partitions the elements at p and do a quicksort with 0, p-1 and p+1 to high, but since the bounds is [low, high), we need to ensure we pass p as oposed to p-1
To deal with the complexity of the algorithm (points 1 through 4 above), I decided to use a test driven
approach and tried to cover as many cases as I could think off and extracted coverage.
(you need the nightly compiler for it).

Here is what the output looks like

    1|      1|use std::cmp::PartialOrd;use std::cmp::PartialOrd;
    2|       |
    3|       |/// Quicksort
    4|       |/// Simple sort algorithm that is type aware and is unique in that it's test driven
    5|       |/// 
    6|     49|fn partition<T: PartialOrd>(elements: &mut [T], low: usize, high: usize) -> usize {
    7|     49|    // Explictly set the pivot to the lowest element
    8|     49|    let mut i : usize;
    9|     49|    let mut j : usize;
   10|     49|    
   11|     49|    if high <= low {
   12|     15|        return low;
   13|     34|    }
   14|     34|
   15|     34|    i = low; j = high -1; // Bounds 1, check
   16|       |    
   17|     37|    while i < j {
   18|     63|        while elements[i] <= elements[low] && i < (high-1) { i = i + 1; } // Bounds 2
                                                            ^45          ^36
   19|     65|        while elements[j] >= elements[low] && j > low { j = j - 1; } // Bounds 3
                                                            ^50     ^38
   20|       |        
   21|     27|        if i >= j {break; } // Bounds 4
                                 ^24
   22|      3|        
   23|      3|        elements.swap(i, j); // Check 5
   24|       |    }
   25|     34|    elements.swap(low, j); // Check 6
   26|     34|    j
   27|     49|}
   28|       |
   29|     43|fn __quicksort<T: PartialOrd + std::fmt::Debug>(elements: &mut [T], low: usize, high: usize)
   30|     43|{
   31|     43|    let p = partition(elements, low, high);
   32|     43|    // p - 1 should not be negative
   33|     43|    if p > low {
   34|      9|        __quicksort(elements, low, p);
   35|     34|    }
   36|     43|    if p < high {
   37|     28|        __quicksort(elements, p+1, high);
   38|     28|    }
                   ^15
   39|     43|}

   41|      6|fn quicksort<T: PartialOrd + std::fmt::Debug>(elements: &mut [T])
   42|      6|{
   43|      6|    let len = elements.len();
   44|      6|
   45|      6|    __quicksort(elements, 0, len);
   46|      6|}

Not bad, but the branch coverage needs improvement. This variant of quicksort is prone to several errors, although it might seem faster at first look. Interestingly,  the algorithms book by Jeff Erickson very correctly states

Hoare proposed a more complicated “two-way” partitioning algorithm that has some practical advantages over Lomuto’s algorithm. On the other hand, Hoare’s partitioning algorithm is one of the places off-by-one errors go to die

I do find the Lomuto's algorithm easier to parse and implement. What do you think? Is test-driven iterative development the right way forward for even things like data structures? What's the best way to compute the total number of branches and cover them? Should we search for easier alternatives that perform well and are easier to test? 


Update: I made some slice based improvements to the code, with the Lomuto algorithm as well

/// Quicksort
/// Simple sort algorithm that is type aware and is unique in that it's test driven
///
fn partition<T: PartialOrd>(elements: &mut [T]) -> usize {
// Explictly set the pivot to the lowest element
let mut i : usize;
let mut j : usize;

if elements.is_empty() { // Bounds 1
return 0;
}

let high = elements.len();

i = 0;
j = high - 1;

while i < j { // Bounds 2
while elements[i] <= elements[0] && i < (high-1) { i += 1; } // Bounds 3
while elements[j] >= elements[0] && j > 0 { j -= 1; } // Bounds 4

if i >= j {break; } // Bounds 5

elements.swap(i, j); // Check 6
}
elements.swap(0, j); // Check 7
j
}

fn quicksort<T: PartialOrd + std::fmt::Debug>(elements: &mut [T])
{
let low = 0;
let high = elements.len();

if elements.is_empty() {
return;
}

let p = partition(&mut elements[low..high]);
// p - 1 should not be negative
quicksort(&mut elements[low..p]);
quicksort(&mut elements[p+1..high]);
}

///
/// Use the algorithm that has fewer boundary conditions to worry about
fn lomuto_partition<T: std::cmp::PartialOrd>(elements: &mut [T]) -> usize
{
let mut i: usize = 0;
let size: usize = elements.len();
let mut j : usize = 0;

if elements.is_empty() { // bounds 1
return 0;
}

while i < size-1 { // bounds 2
i += 1;
if elements[i] <= elements[0] { // bounds 3
j += 1;
elements.swap(i, j); // bounds 4
}
}

elements.swap(0, j); // bounds 5
j
}

fn quicksort<T: PartialOrd + std::fmt::Debug>(elements: &mut [T])
{
let low = 0;
let high = elements.len();

if elements.is_empty() {
return;
}

let p = lomuto_partition(&mut elements[low..high]);
// p - 1 should not be negative
quicksort(&mut elements[low..p]);
quicksort(&mut elements[p+1..high]);
}


Ranking and Unranking permutations

I've been a big fan of Skiena's Algorithm Design Manual , I recently found my first edition of the book (although I own the third ed...