MST
1 What is This Assignment’s Purpose?
This assignment has two purposes.
First, you will implement two solutions to the Minimum Spanning Tree problem. This will give you practice with implementing graph algorithms.
Second, you will learn another advanced program testing technique. In property-based testing, we wrote astract properties. Here, we will use a second implementation to test against. This is especially useful when, for instance, a problem has one solution that is simple (and hence easy to check for correctness) but slow, and another that is sophisticated but fast. Then, the simple solution can be used as the expected behavior against which the sophisticated one is tested.
2 Theme Song
Spanning Tree Song by Radia Perlman, Dawn Perlner, and Ray Perlner
NB: Radia Perlman is one of the “parents of the Internet” and is in the Internet Hall of Fame. You’re using her ideas and work to read this page and to listen to that song.
3 The Testing Problem
In general, given a graph, an MST solution must produce a tree, that spans, and is minimal. There can be many solutions that are all minimal. Therefore, we should recognize this situation: it makes it difficult to write concrete test cases!
We can still check the above properties for a small number of concrete cases. In addition, checking that a purported solution is a tree and spans is not too hard. Therefore, we can in principle write an oracle. However, the property of minimality makes the combinatorics daunting: we must in principle enumerate every single tree, calculate its weight, determine the minimum of these, and confirm that the algorithm produces a solution with this weight. The tricky problem, therefore, is determining the weight of a minimal solution.
4 Another Testing Strategy
In this assignment we will pursue another strategy to validate the output. We have seen that there are multiple ways of computing minimal spanning trees. Therefore, we will implement two different algorithms for the same problem. We will then measure the weights produced by the two solutions and check that they’re the same.
Of course, the fact that they produce solutions of the same weight doesn’t mean they are both correct: they could both be wrong! However, to produce the same wrong weight over many inputs, they must have a correlated error. (This error could, for instance, be in code they share, such as an implementation of union-find.) To minimize for likelihood of a correlated error, you should also test on at least a few inputs for which you know the definitive minimum.
5 Problem Statement
data Edge:
| edge(a :: String, b :: String, weight :: Number)
end
type Graph = List<Edge>
mst-prim :: Graph -> Graph, which implements Prim’s algorithm.
mst-kruskal :: Graph -> Graph, which implements Kruskal’s algorithm.
You must consider edges to be undirected (edge equality has been defined so that order of Edge.a and Edge.b doesn’t matter). However, the input will include edges in only one “direction”. For instance, a sample Graph might be [list: edge("A", "B", 3), edge("C", "B", 4), edge("C", "A", 5)].
It is possible for there to be multiple edges to exist between the same two vertices, but notice that they must be in the same direction. So it is valid to have: [list: edge("A","B",3), edge("A","B",5)] but not [list: edge("A","B",3), edge("B","A",5)].
Despite the fact that edges from "B" to "A", "B" to "C", and "A" to "C" are not explicitly listed, because the edges are undirected, you should treat them like they are there. (This implies that either version of a particular edge can be in the MST, but not both.) How you choose to handle this implementation-wise and testing-wise is left up to you.Hint: You could add all the edges in the opposite direction.
The weight of an edge can be 0 or negative. Furthermore, a path that starts and ends at the same vertex will form a cycle, thus making the graph not a tree.
generate-input :: Number -> Graph, which consumes a number and randomly produces a connected graph with that many vertices. How you actually do this is up to you; there’s no one “right” answer, and we’ll accept many different solutions so long as they incorporate randomness. (Note that a graph containing 0 or 1 vertices contains no edges, so make sure your output reflects this.)
mst-cmp :: Graph, Graph, Graph -> Boolean, which consumes a well-formed graph and two purported MSTs of the graph, and returns true exactly when the two solutions are both trees, spanning, have the same weight, and only contain edges in the original graph. This function should not check that the weights are minimum—
it should only check that they are the same.
We expect you to write a reasonable set of interesting manual tests for mst-prim, mst-kruskal, and mst-cmp. You must especially be manually checking (for mst-prim and mst-kruskal) that these compute minimal weight solutions, as discussed in Another Testing Strategy.
Finally, also implement:
sort-o-cle :: (Graph -> Graph), (Graph -> Graph) -> Boolean, which creates an oracle that consumes two purported MST implementations and generates several random graphs, checking for each that both MST algorithms generate solutions that satisfy mst-cmp. sort-o-cle should not only test on randomly generated graphs: you should also include manually-crafted graphs that cover interesting edge-cases and ensure that both MST algorithms’ solutions satisfy mst-cmp. Since passing mst-cmp is not a guarantee that the graphs have minimum weight, we consider this a sorta’ oracle, hence sort-o-cle (not to be confused with “sortacle”).
Of course, it’s possible that both algorithms are producing the wrong answer, in that the solution from neither is minimal. You should therefore create some interesting graphs manually and check that the weights of the result are minimal. These concrete tests should belong to mst-kruskal and mst-prim, which are the functions responsible for generating a minimum solution. It is this combination of testing strategies that gives us some confidence that both implementations are, in fact, correct.
6 Efficiency
Prim’s algorithm takes \(O([n \rightarrow n^2])\) time when you use a list to keep track of the nodes you have visited. We can bring this down to \(O([n \rightarrow n \log n])\) time if we use a heap, which you have seen both in class and in lab. Therefore, we expect you to use a heap in your implementation of Prim’s.
That apart, we are not considering efficiency of implementation in this assignment, but we will check that you indeed implemented Prim’s and Kruskal’s algorithms correctly.
7 Built-Ins
In addition to the allowed functions in the general guidelines, you can also use immutable string dictionaries.
You can use any of the union-find implementations shown in class or in the textbook. You may use ref fields for that specific purpose. However, you are not allowed to use ref or any form of mutation elsewhere in your program.
8 Starter
There are no wheats and chaffs for this assignment.