Sortacle

    1 What is This Assignment’s Purpose?

    2 Theme Song

    3 A Simple Example

    4 Late Days

    5 Testing Plan

    6 The Oracle

    7 Input-Output Specification

    8 Testing the Tester

    9 Determinism

    10 Testing

    11 Builtins

    12 Starter

1 What is This Assignment’s Purpose?

Being confident that your software is correct involves more than just writing code you think is right. That’s why we write tests. As computer scientists, we should therefore think about taking this a step further: to automate testing. How can we do that? We’ll see in this assignment.

There’s another motivation for what we’re going to do, which illustrates a weakness in the nature of testing as you’ve done it until now. When a test fails, the error is either in the implementation of the function (most likely) or in the test itself (usually less likely, but certainly possible). However, there is a third, more subtle reason: both the implementation and test are correct, but the implementation returned a different result than the test expected. One likely cause is that the problem statement allows a given input to produce multiple different answers.

Both these motivations point in the same direction, which is to write programs that can check programs. Specifically, we will study a form of it called property-based testing. This technique is seeing increasing use in industry, especially to test systems that are complex or may have non-deterministic behavior (e.g., distributed systems, cryptographic systems, etc.).

2 Theme Song

I Knew You Were Trouble by Taylor Swift

3 A Simple Example

Consider the square-root function. Given an input of 4, you might write that you expect it to produce 2. Mathematically, however, it’s equally reasonable for the implementation (if it produces only one value) to produce -2. Neither your expectation nor the implementation is wrong; you’re just mis-matched. So what is a better way to test the range of possible square-root functions? It would be to square the output and check that it does, in fact, produce the same value as what it was given. Keep this example in mind as you read on.

4 Late Days

This is the last full assignment before the end of shopping period. We are committed to giving those who are choosing between courses all the possible information so they may make the most informed decision. However, this means that we need enough time to grade.

As such, you are not permitted to use any late days on this assignment.

5 Testing Plan

The task in this assignment is likely novel to you: you’re being asked to write a program not to solve a problem but to test solutions to a problem. It may also seem a bit complex. For that reason, it’s wise to step away from low-level coding and think about what you’re doing at a conceptual level so that you don’t get lost in details. If you start directly with code, you can easily overlook some of the things you should have done or forget some of the things you wanted to do.

We are therefore asking you to first write a testing plan. To force you to do things that are good for you (!), we’re asking you to submit your testing plan first. To avoid micro-managing your schedule we aren’t imposing a formal deadline, but we’d very much like to see you write and upload your plan during the first two days the assignment is out.

Let’s now discuss testing plans in more detail.

A testing plan is a conceptual list of situations you will want to test, as opposed to concrete test instances. Because you eventually need to convert your testing plan into tests, it cannot be too abstract: “Check whether it solves the problem” is indeed a testing plan, but it’s not a useful, actionable one. Instead, think of it this way: suppose you were asked to grade a solution while giving partial credit. One useful way to proceed is to decompose the problem into a set of sub-problems, so that the combination of these solves the problem as a whole. Then, when grading, you could give partial credit for each sub-problem that was correctly solved.

Consider, for instance, the game Tic-tac-toe. Suppose you are asked to test a system that simulates games. Here are some “partial credit” conditions you might check for:
  • A cell does not have both X’s and O’s.

  • Once a cell goes from blank to X or O, it does not change.

  • The total number of X’s and O’s is at most off by one.

  • Players alternate moves.

  • The game stops when all cells have been filled.

  • A player wins when they have three marks in a line (and the other player has not already won).

and so on.

Unlike your previous program planning, we aren’t even going to ask you to use any blocks. Just use plain prose. Writing a testing plan before you start to write code has two benefits (just like with program planning):
  • Once you start writing code, you are more likely to commit to whatever you have written, even if it’s a poor strategy. Whereas your testing plan is going to have to rewritten as code anyway, giving you a chance to rethink your decisions.

  • Code is full of low-level details that distract from the overarching task, but you can’t lose track of those details, either. This focuses your cognitive load in the wrong place. The testing plan lets you think about how to test without those low-level details, making you less likely to miss out on important or subtle properties.

6 The Oracle

This assignment is not sponsored by any particular large technology company.

Suppose we have a list of people giving their names and ages, where a Person is as follows:

data Person:

  | person(name :: String, age :: Number)

end

Given a list of these people, suppose we want to sort them by increasing age. This ordering expressly does not say what should happen to people whose ages are the same; therefore, every permutation of their names is valid. As a result, concrete tests (i.e., ones where you write a specific output corresponding to an input) may fail for no good reason.

The problem lies in the fact that we wrote concrete tests at all. Instead, we should have written a function that checks whether the output has the right relationship with the input; then, all the valid outputs for a given input would pass, but no others. This checking function is sometimes called an oracle.

That is what we will do in this assignment: write a testing oracle for a function that sorts lists of Persons. Your oracle will be given purported implementations of such a sorting function, and it must (try to) determine whether the given function is indeed a valid sorter or not. In addition, you will write a generator of inputs. Combining the two, you arrive at a system that automates the testing of solutions to this problem.

Your testing plan is designed to help you plan out your oracle. That is, your plan determines what it means for a Person-sorter to be correct (just like the conditions above approximated what it means for a tic-tac-toe board to be valid). You can then translate your plan into the implementation. You are not writing a plan to test your oracle (i.e., it is not a tester-tester).

7 Input-Output Specification

Your assignment has four tasks:

  1. Write and upload your testing plan, early.

  2. Write a function named generate-input that generates a list of random Persons. It should take a non-negative integer length, and return a list of that many randomly constructed people. You will probably find it helpful to use num-random for ages. Names should also be randomly generated. Choosing from a pool of names is not sufficiently random; instead, you might consider using string-from-code-points.

    generate-input :: Number -> List<Person>

    Keep in mind that you would not be able to directly test generate-input, as it returns random lists. However, you could still write tests to check whether the value it produces has reasonable properties.

  3. Write a function that consumes two lists of Persons and determines whether the second input is a sorted version of the first.

    is-valid :: List<Person>, List<Person> -> Boolean

  4. Using is-valid and generate-input, write a function that consumes a function and tests whether the given function is a valid sorter. Note that you must use generate-input (possibly many times), but should also explicitly include any edge cases that you think are important.

    oracle :: (List<Person> -> List<Person>) -> Boolean

    Note that the argument to oracle is a function that consumes a list of Person and produces a list of Person (that’s what the notation (List<Person> -> List<Person>) means).

To do well on this assignment, you will want to spend time considering different types of incorrect output an algorithm might produce, and make sure your oracle covers all those cases. Of course, we may have thought up cases you didn’t, so your oracle needs to be general, not specific to those cases.However, we will not use especially perverse non-sorters—e.g., one that produces the wrong answer only when the input has 19 elements. In short, be thorough! That’s the name of the game when testing.

8 Testing the Tester

Observe that the test cases for your oracle are programs. That is, to test your oracle, you need good and bad implementations of sorting. If you’d like a correct sorter, here’s one:

fun correct-sorter(people :: List<Person>) -> List<Person>:

  doc: ```Consumes a list of people and produces a list of people

       that are sorted by age in ascending order.```

 

  lists.sort-by(people,

    lam(p1, p2): p1.age < p2.age end,

    lam(p1, p2): p1.age == p2.age end)

where:

  cjordan3 = person("Connor", 18)

  cli135   = person("Danny", 65)

  kreyes7  = person("Kyle", 32)

 

  correct-sorter(empty) is empty

  correct-sorter([list: cli135]) is [list: cli135]

  correct-sorter([list: cli135, cjordan3]) is [list: cjordan3, cli135]

  correct-sorter([list: cjordan3, cli135]) is [list: cjordan3, cli135]

  correct-sorter([list: cjordan3, cli135, kreyes7])

    is [list: cjordan3, kreyes7, cli135]

end

You can only use this function in test cases. Feel free to perturb it or its output to create incorrect implementations, though you should also think of other interesting ways of producing incorrect “sorting” algorithms.

One natural question you might have is how many tests you have to write of your oracle. It’s fine to have only one “positive” test, namely using sort or sort-by. You should have several “negative” tests (namely, functions that do not sort correctly).

You do not need to follow the recipe (write tests, etc.) of these functions—namely, the ones you’re using to test your oracle.

9 Determinism

In previous assignments, we wrote programs that are deterministic. That is, on a given input, they always produce the same output.

The testing oracle you will be implementing is non-deterministic: that is, because of randomness, your oracle may not always return the same output on a fixed input. Thus, the results of your tests could change from run to run. For example, when running your oracle on a wrong implementation, your oracle could catch the bug one time it is run, and miss it the next time.

Almost always, we prefer deterministic testing. It provides replicable results that are easier to debug. However, there are cases where deterministic testing is undesirable or impossible, such as in this assignment.

When dealing with non-deterministic functions, you will always have some amount of unpredictability. This unpredictability cannot be eliminated, but there are ways you can try to minimize it. Think of how you could decrease the chance your oracle reports incorrect implementations as correct.

10 Testing

There is no Examplar for this assignment. (If you’re curious why, think about what a wheat must be able to do.) You will still be required to turn in a final test suite.

11 Builtins

For this assignment, you should not use any libraries that are not explicitly permitted. You are permitted to use the number and string libraries, and from the list library, only map, filter, the two folds, sort, and sort-by. Note that you should use sort and sort-by only for testing (the parts of) your oracle, not in its implementation! You should be able to implement everything else you need on your own. (If you think we’re mistaken, please ask on EdStem.) However, you are allowed to use other functions in your check blocks.

Please also remember to follow our coding guidelines.

12 Starter

Template

There are no wheats and chaffs for this assignment.