JoinLists

contents ← prev up next →

JoinLists

1 What is This Assignment’s Purpose?

2 Theme Song

3 Rules and Support Code

1 What is This Assignment’s Purpose?

Lists, as you’ve seen them until now, permit only sequential access. You can’t get to the third element until you’ve been through the first two. This means that when you recur, you do so on a list that is only one element smaller.

For some computations, this is particularly problematic if you’re working on a multi-processor, such as a machine with many cores. Even with just two processors, the amount of work the two would do would be very imbalanced because one is processing a single element while the other is processing the entire rest of the list. It would be much better if we could split the list roughly in half, and thus (in some cases) reduce the overall processing time by half. That requires a different kind of list data structure, which is what we’re going to explore.

Put differently, the data structures built into programming languages are not ideal for all purposes. Sometimes, it makes sense to design alternate underlying representations for the same operations (as well as adding operations that exploit the new representation). We have seen this before with, e.g., different representations of sets. Choosing between representations requires wrestling with tradeoffs, both strengths and weaknesses.

2 Theme Song

You Belong With Me by Taylor Swift

3 Rules and Support Code

Our new data structure, the join list, is defined as follows:

data JoinList<T>:
  | empty-join-list
  | one(elt :: T)
  | many(mjl :: ManyJoinList<T>)
end

A ManyJoinList is unstructured data. A value of type ManyJoinList has methods that you can call, specifically:

rebalance-and-split-parallel
rebalance-and-split
length

(See below for more.)

A ManyJoinList is stored as a tree. Any time rebalance-and-split(-parallel) is called, we rebalance this tree, although possibly in different ways each time.

rebalance-and-split specifically rebalances it so that the two parts are roughly even, so that we can take advantage of loggy runtime complexities (though there is no guarantees of this at all; it depends on the system load). It then gives you the two branches, and lets you do whatever you want with them.

In principle (though Pyret doesn’t actually work on multiple cores), rebalance-and-split-parallel rebalances it so that the size of the two parts corresponds to the amount of load on the two cores it will use. If one core is more busy than another, it will give that core a smaller portion of the tree. The user of this function gives a procedure to perform on the left chunk and a procedure to perform on the right chunk, and these procedures are carried out in parallel on the two cores. There is then a combining procedure which is performed sequentially on a single core after both cores have finished their procedures.

A join list is thus really a tree. Often, the two sub-lists (which are really sub-trees) will be of the same size; however, the definition does not require that they be. Indeed, because it is not, constructing a join list is easy; all balancing can be done internally by the system, based on various characteristics (such as the load on the system, number of available processors, etc.).

That said, though a join list is represented as a tree, it’s still a list in nature. That means it’s an ordered data structure, and the order must be preserved by operations. Similarly, it can contain duplicates, and these must not be discarded.

You can assume that all join lists of size 0 will be represented by empty-join-list, and all join lists of size 1 will be represented by one(elt).

We provide the following operations:

JoinList.join(other :: JoinList) -> JoinList
(What this notation means: join is a method of JoinList objects, so you would call it like my-join-list.join(other-join-list).)
Given another join list, it joins them to produce a new join list. If one or both of the join lists are empty-join-list, join will simply return the other list. Otherwise it will return a join list containing the second list appended to the first.
==
JoinList supports ==, which checks whether the two JoinLists have the same elements in the same order.
[join-list: …]
This is a constructor for new values.
JoinList.length() -> Number
This method computes the length of a join list in constant time.
is-non-empty-jl :: JoinList -> Boolean
This predicate is mainly useful for type refinements.
ManyJoinList<T>.rebalance-and-split-parallel<U, V, W>(
lproc :: (JoinList<T> -> U),
rproc :: (JoinList<T> -> V),
ucomb :: (U, V -> W)) -> W
This splits the list, processes the two parts with the given function, and then combines them with the given combiner.
ManyJoinList<T>.rebalance-and-split<U>(
comb :: (JoinList<T>, JoinList<T> -> U)) -> U:
This is a simpler version of rebalance-and-split-parallel.
For purposes of testing you may use the functions
join-list-to-list(jlist :: JoinList<A>) -> List<A> and
list-to-join-list(lst :: List<A>) -> JoinList<A>
because otherwise your tests will be inconvenient to write. However, you may not use these—or any functions other than those listed—to implement the required functionality.

To reiterate that last point: due to the nature of the assignment, we want you to limit yourself to programming as if you only had join lists, not conventional lists. Therefore, you should not use list-to-join-list and join-list-to-list except in tests. You also should not use the .length method to implement j-length.

Here is something else you should not do: give in to the temptation to write j-first and j-rest, then implement the other functions with j-first and j-rest as you would for normal Pyret lists. The functions you write should take advantage of the unique structure of join lists. It may be necessary to use j-first and j-rest in some situations, but you should use splitting instead wherever possible. The body of your functions should generally take the form:

cases (JoinList) ...:
  | empty-join-list => ...
  | one(elt) => ...
  | many(mjl) => mjl.rebalance-and-split(-parallel)(...)
end

If you are unsure what you may or may not do or use, please ask.

While we are not grading on the efficiency of your code per-se, if your implementation is extremely inefficient, that is probably a sign that you are not using the structure of join lists properly, and you might want to rethink your implementation.

4 Instructions

Implement the following functions. Some of them require non-empty join lists. This is specified in the type signature as JoinList%(is-non-empty-jl)). Unless otherwise indicated, all references to “lists” are to join lists.

j-first<A>(jl :: JoinList<A>%(is-non-empty-jl)) -> A
Returns the first element of a non-empty list.
j-rest<A>(jl :: JoinList<A>%(is-non-empty-jl)) -> JoinList<A>
Returns a join list containing all elements but the first of a non-empty list.
j-length<A>(jl :: JoinList<A>) -> Number
Returns the length of a join list. Do not implement this by using the length method of the JoinList!
j-nth<A>(jl :: JoinList<A>%(is-non-empty-jl), n :: Number) -> A
Returns the nth element (using a 0-based index) of a list containing at least n + 1 elements. For example, a second argument of 0 returns the first element of the join list.
j-max<A>(jl :: JoinList<A>%(is-non-empty-jl), cmp :: (A, A -> Boolean)) -> A
Returns the “maximum” value in a non-empty list. The second argument is a comparator that defines “maximum” for the type in the list. The comparator should behave like a “greater than”: that is, if the comparator returns true, the first argument to the comparator is “greater” than its second argument.
j-map<A,B>(map-fun :: (A -> B), jl :: JoinList<A>) -> JoinList<B>
Implements map for join lists.
j-filter<A>(filter-fun :: (A -> Boolean), jl :: JoinList<A>) -> JoinList<A>
Implements filter for join lists.
j-reduce<A>(reduce-func :: (A, A -> A), jl :: JoinList<A>%(is-non-empty-jl)) -> A
Distributes an operator across a non-empty list. That is, given the elements e1, e2, ..., en in order, and operator op, computes the equivalent of e1 op e2 op ... op en.
j-sort<A>(cmp-fun :: (A, A -> Boolean), jl :: JoinList<A>) -> JoinList<A>
Sorts a list in the order given by the first argument, the comparator. If the comparator returns true, then the first argument to the comparator should come before the second argument in the list produced. The comparator should behave like a “less than”.

5 Analysis

What should the value of

j-reduce(lam(x,y): x - y end, [join-list : 1, 2, 3])

be? Hint: There is more than one possible answer. List all.

Unfortunately, having more than one answer violates the expectation that j-reduce is a function. The problem is not with j-reduce but with the use of - (subtraction) as an argument. What property should the first argument to j-reduce demonstrate? Can you argue this from the description of j-reduce above?

6 Examplar Note

At least one of the chaffs in Examplar is non-deterministic. That means, on the same input it may return several different outputs.

Copying and pasting that test multiple times will increase your chances of catching it. This may not be the best way aesthetically or stylistically to approach it. You can also use higher order functions to run some test multiple times (as in your testing oracles). Notice that this won’t guarantee that you’ll catch the chaff, but it can increase your chances. You should test enough times to make yourself feel confident that you can catch this chaff but also remember that too many tests will time out the autograder.

7 Starter

Template

contents ← prev up next →