TweeSearch 1

    1 Format

    2 Theme Song

    3 Problem

    4 Files

    5 Handing In

1 Format

Treat this assignment like an exam: do the work without consulting course staff, except for critical issues (broken links, a possible typo in the assignment, etc.).

2 Theme Song

Rockin’ Robin by Bobby Day

3 Problem

Imagine you have a collection of past tweets, and are given a new tweet. You want to find all the old tweets that are similar to the current one.

How do we test for “similarity”? Easy: we use DocDiff! That is, we compute the overlap between two tweets and compare it against a specified threshold, which will be in the range \([0, 1]\).

The assignment depends on the following data definition:

data Tweet: tweet(author :: String, content :: String) end

(which is provided for you in the support code), where both fields are expected to be non-empty strings.

In this assignment, we will:
  • use only the alphanumeric characters (retaining both capital and lower-case letters) and spaces of each tweet’s content, and

  • keep each tweet intact, so that every tweet returned by the search contains its entire original text.

Comparison is done with the content field; the author field is ignored.
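
As one possible reading of the character rule above, the following sketch drops every character that is not a letter, a digit, or a space, and leaves case alone. It is written in Python purely for illustration (the assignment itself is in Pyret), the name `clean_content` is hypothetical, and treating “alphanumeric” as ASCII letters and digits only is an assumption.

```python
import string

# Assumption: "alphanumeric" means ASCII letters and digits only.
KEPT = set(string.ascii_letters + string.digits + " ")


def clean_content(content: str) -> str:
    """Return content with every character outside KEPT removed; case is preserved."""
    return "".join(ch for ch in content if ch in KEPT)


original = "Tweet, tweet! Rockin' Robin #1"
cleaned = clean_content(original)
print(cleaned)
```

Because the function returns a new string, the original content is still available unchanged for the tweets that are eventually returned.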

Define the function

search :: Tweet, List<Tweet>, Number -> List<Tweet>

The first argument is the new tweet, the second argument is the collection of past tweets, and the third is a threshold for overlap.

Your overlap function from DocDiff takes two lists of strings. To create the lists, split the content of a tweet into words. In the tweet, a space separates one word from the next.
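
The splitting step might look like the sketch below, again in Python for illustration only. The name `to_words` is hypothetical. Dropping the empty strings that appear when two spaces are adjacent (or when the content starts or ends with a space) is an assumption; the assignment text does not say how such cases should be handled.

```python
from typing import List


def to_words(content: str) -> List[str]:
    """Split content on single spaces.

    Assumption: empty strings produced by repeated, leading, or trailing
    spaces are not words and are dropped.
    """
    return [w for w in content.split(" ") if w != ""]


words = to_words("Tweet tweet  Rockin Robin ")
print(words)
```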

The result is every past tweet whose overlap with the new tweet is at least the threshold (inclusive). Sort the result from the tweet with the highest overlap down to the one with the lowest, which is still at or above the threshold.
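
To make the filtering and ordering concrete, here is a minimal Python sketch (illustrative only; the assignment is in Pyret). Everything except the shape of `search` is an assumption: `words` is a compact stand-in for the cleaning and splitting steps, and `toy_overlap` is a placeholder (Jaccard similarity on distinct words) that is not DocDiff's overlap. Substitute your own overlap function for it.

```python
import string
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Tweet:
    """Python stand-in for the Pyret Tweet data definition."""
    author: str
    content: str


# Assumption: "alphanumeric" means ASCII letters and digits only.
KEPT = set(string.ascii_letters + string.digits + " ")


def words(t: Tweet) -> List[str]:
    """Hypothetical helper: clean the content, then split it on spaces."""
    cleaned = "".join(ch for ch in t.content if ch in KEPT)
    return [w for w in cleaned.split(" ") if w]


def toy_overlap(a: List[str], b: List[str]) -> float:
    """Placeholder similarity in [0, 1]. NOT DocDiff's overlap."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def search(new: Tweet, past: List[Tweet], threshold: float,
           overlap: Callable[[List[str], List[str]], float] = toy_overlap
           ) -> List[Tweet]:
    new_words = words(new)
    # Score each past tweet, but keep the original Tweet value intact.
    scored = [(overlap(new_words, words(t)), t) for t in past]
    # The threshold is inclusive.
    kept = [pair for pair in scored if pair[0] >= threshold]
    # Highest overlap first; ties keep their original relative order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in kept]


new = Tweet("a", "rockin robin tweet")
past = [
    Tweet("b", "rockin robin"),
    Tweet("c", "hello world"),
    Tweet("d", "rockin robin tweet!"),
]
result = search(new, past, 0.5)
```

Note that the returned tweets carry their original content, punctuation included, and that the author field plays no part in the comparison. How ties are ordered is not specified by the assignment; the stable sort here is just one choice.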

4 Files

Template

5 Handing In

Submission Form