TweeSearch 3
1 Format
You should treat this assignment like an exam. Therefore, you should do this work without consulting course staff except for critical issues (broken links, possible assignment typo, etc.).
2 Theme Song
Rockin’ Robin by Bobby Day
3 Problem
You remember TweeSearch 2. Now we’re going to build on it, too.
In the previous assignment, we had a somewhat awkward way of dealing with the true structure of tweets. This is because we defined tweets as ascending chains, which meant we could end up inadvertently traversing duplicates, which we had to then filter. A better structure would be to have a descending tree.
Therefore, we will change the definition of a Tweet, eliminating the parent field and instead having the field children, which is of type List<Tweet>.
The given list of old tweets is guaranteed to contain only and all the roots.
Below, we redefine what relevance means.
A tweet’s relevance is now conditionally defined depending on whether or not it has a parent.
If the tweet does not have a parent, its relevance is \[(0.75 \times o_t) + (0.25 \times s)\] where \(o_t\) is the overlap of the tweet itself with the new tweet, and \(s\) is a size ratio (defined below).
If the tweet does have a parent, its relevance is \[(0.20 \times r_p) + (0.60 \times o_t) + (0.20 \times s)\] where \(r_p\) is the relevance of its parent, \(o_t\) is the overlap of the tweet itself with the new tweet, and \(s\) is its size ratio.
What is the size ratio? It’s the ratio of the size of the current tweet’s subtree to the total size of all the trees provided. The size of a tweet’s tree is defined as 1 (for the tweet itself) plus the sum of the sizes of each of the child tweets. (The parent and other tweets do not count.)