Data Scripting
1 What is This Assignment’s Purpose?
We live in a world awash with data. It’s important that we learn to write small programs that can find useful information for us. This assignment gives you practice with that.
In addition, a lot of the world’s data are structured as tables. This assignment therefore also gets you familiar with tables and writing programs over them.
2 Theme Song
Collection 1 by Piano Studio Ghibli
3 Programming with Tables
Tables are built into Pyret, so we can program with them directly. The Pyret documentation lists Pyret’s features for working with tables. You will find it helpful to read through this to familiarize yourself with the available support. You can also read chapters 7, 8, and 9 of DCIC for a more pedagogic introduction.
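As a quick orientation, here is a minimal sketch of a few built-in table operations. The pets table and all of its contents are invented purely for illustration and have nothing to do with the assignment data; the Pyret documentation remains the authoritative reference.

```pyret
# Hypothetical table, invented for illustration only.
pets = table: name, species, age
  row: "Rex", "dog", 3
  row: "Tom", "cat", 5
  row: "Ace", "dog", 1
end

# Keep only the rows whose species column is "dog".
dogs = sieve pets using species:
  species == "dog"
end

# Sort the rows from oldest to youngest.
oldest-first = order pets:
  age descending
end

check "basic table operations":
  pets.length() is 3
  dogs.length() is 2
  pets.row-n(1)["name"] is "Tom"
  oldest-first.row-n(0)["name"] is "Tom"
  pets.get-column("age") is [list: 3, 5, 1]
end
```

Rows are indexed from 0, and a row's cell is read with square brackets and the column name.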
4 Program Plans
We again want you to construct program plans for your programs before you implement them. We provide a block-based interface to table operations that you can use for this purpose. Please see our instructions on this.
We want plans for the functions get-art-in-1, get-art-in-2, and get-art-in-3 in Virtual Art Store, as well as for the code you will write in Titanic. Please first construct a plan and save it, and then begin implementation. You will be asked to upload the plan separately.
5 Testing Plan
We want you to again create a Testing Plan for one of the problems in Virtual Art Store. This one, however, is more subtle.
As you write the sub-problems that you want to test—
Therefore, you should also write down your assumptions about table structure. These are often called well-formedness conditions. Essentially, what you are saying is, the testing plan only applies to programs that are run over tables that have this structure. If the given tables don’t, then all bets are off. Put differently, these are just contracts over the input table’s shape.
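One way to make a well-formedness condition concrete is to write it as a predicate over the table. The sketch below is only an illustration: the function name ids-are-unique and both sample tables are hypothetical, and the condition shown (no id appears twice) is just one example of an assumption you might record.

```pyret
fun ids-are-unique(t :: Table) -> Boolean:
  doc: "Hypothetical condition: no id appears in more than one row"
  ids = t.get-column("id")
  # Every id must match exactly one entry in the id column (itself).
  ids.all(lam(i): ids.filter(lam(j): j == i end).length() == 1 end)
end

# Invented sample data.
well-formed = table: id, cost
  row: 1, 100
  row: 2, 250
end

ill-formed = table: id, cost
  row: 1, 100
  row: 1, 300
end

check "a well-formedness condition as a predicate":
  ids-are-unique(well-formed) is true
  ids-are-unique(ill-formed) is false
end
```

A testing plan that assumes this condition says nothing about how a program behaves on a table like ill-formed.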
6 Starter
7 Virtual Art Store
An online store sells rights to digital content created by artists. Artists come from all over the world, as do clients. This requires currency conversions between the two.
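The artwork table (ArtworkTable) has the form
table: id :: Number, cost :: Number, currency :: String
and the currency conversion table (ConversionTable) has the form
table: from-c :: String, to-c :: String, conv-rate :: Number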
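To make the two table shapes concrete, here is a small sketch with invented data. The ids, costs, and rates are hypothetical, and the last check assumes that multiplying a cost by conv-rate converts an amount in from-c into to-c; confirm that reading before relying on it.

```pyret
# Hypothetical sample data; ids, costs, and rates are invented.
artwork = table: id :: Number, cost :: Number, currency :: String
  row: 1, 100, "USD"
  row: 2, 250, "EUR"
end

conversions = table: from-c :: String, to-c :: String, conv-rate :: Number
  row: "USD", "EUR", 0.9
  row: "EUR", "CHF", 1.1
end

check "reading the sample tables by hand":
  artwork.row-n(0)["currency"] is "USD"
  conversions.row-n(0)["conv-rate"] is 0.9
  # Assumption: cost * conv-rate converts from-c into to-c.
  artwork.row-n(0)["cost"] * conversions.row-n(0)["conv-rate"] is 90
end
```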
- Write the function
get-art-in-1 :: ArtworkTable, ConversionTable, Number, String -> Number
that takes the two tables, an artwork’s id, and a desired currency. It gets the price in the listed currency and, if that is not the desired one, uses the conversion table to find the conversion rate and converts the price. You can assume that every artwork queried is listed exactly once in the artwork table, and that every pair of currencies you need is listed exactly once in the currency conversion table.
- Unfortunately, in practice, errors creep in. Either table may be faulty: entries may be deleted by accident, or there may be duplicate entries. If there are missing or duplicate entries for either the input artwork id or the necessary conversion factor, you should raise an exception. You should only compute and return a numeric answer if none of these happens. Call this function get-art-in-2 (with the same signature). Are there any abstractions you can write that can help you clean up your code?
- Sometimes, the currency conversion table may not list the conversion from A to B, but it may list that from B to A. If that happens, use the inverse of the conversion table’s ratio. Call this function get-art-in-3 (with the same signature).
- Of course, sometimes you can get a conversion ratio by composing several: e.g., the table may have neither A-to-C nor C-to-A, but it may have a chain of conversions (e.g., A to B to D to C, potentially including inverses). Don’t write a program plan or implementation for this; instead, write only a Testing Plan.
- In this problem, we’ve represented currency as strings. Another option is to represent it as a datatype: e.g.,
data Currency: USD | EUR | CHF | INR | … end
What are the strengths and weaknesses of each representation? In a separate document, explain the trade-offs between these two representations. (There is no program plan or code needed for this entry.)
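Two mechanics from the list above may be unfamiliar: raising an exception and testing for it, and inverting a conversion rate. The sketch below illustrates both in isolation. The helper only-element and the rate 0.9 are hypothetical, chosen for illustration; nothing here is a prescribed part of the solution.

```pyret
fun only-element(l :: List<Number>) -> Number:
  doc: "Hypothetical helper: return the single element of l, else raise"
  if l.length() == 1:
    l.first
  else:
    raise("expected exactly one entry, found " + num-to-string(l.length()))
  end
end

check "raising and testing exceptions":
  only-element([list: 5]) is 5
  # raises passes when the exception message contains the given text.
  only-element(empty) raises "exactly one"
  only-element([list: 5, 5]) raises "exactly one"
end

check "inverting a rate":
  # Invented rate: if 1 USD is 0.9 EUR, then 1 EUR is 1 / 0.9 USD.
  usd-to-eur = 0.9
  eur-to-usd = 1 / usd-to-eur
  90 * eur-to-usd is 100
end
```

Pyret treats 0.9 as an exact rational, so the inverted rate gives exactly 100 here rather than an approximation.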
8 Titanic
There are a few different versions of databases of passengers who sailed on the Titanic. (The term “passenger” is somewhat ambiguous, because the Titanic made a few stops even on its maiden voyage.) Here is one such database.
import gdrive-sheets as GS

titanic-raw-loader =
  GS.load-spreadsheet("1ZqZWMY_p8rvv44_z7MaKJxLUI82oaOSkClwW057lr3Q")

titanic-raw = load-table:
    survived :: Number,
    pclass :: Number,
    raw-name :: String,
    sex :: String,
    age :: Number,
    sib-sp :: Number,
    par-chil :: Number,
    fare :: Number
  source: titanic-raw-loader.sheet-by-name("titanic", true)
end
A male passenger has "male" in the sex field; a female passenger has "female". (We are reflecting the standards of that time, not claiming any form of normativity.)
A title is the part of the raw-name field up to but not including the first period.
A first name is the part of raw-name between the first and second spaces, skipping over any leading parenthesis.
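The definitions of title and first name can be sketched with Pyret’s built-in string functions. This is a minimal illustration, not a required approach: the helper names title-of and first-name-of are hypothetical, the names in the examples are made up, and the sketch assumes every raw-name contains at least one period and at least one space, with no empty word after the first space.

```pyret
fun title-of(raw-name :: String) -> String:
  doc: "Hypothetical helper: everything before the first period"
  # string-split cuts only at the first occurrence of the separator.
  string-split(raw-name, ".").first
end

fun first-name-of(raw-name :: String) -> String:
  doc: "Hypothetical helper: the word after the first space, minus a leading ("
  words = string-split-all(raw-name, " ")
  word = words.get(1)
  if string-char-at(word, 0) == "(":
    string-substring(word, 1, string-length(word))
  else:
    word
  end
end

check "made-up names":
  title-of("Dr. Ada Example") is "Dr"
  first-name-of("Dr. Ada Example") is "Ada"
  first-name-of("Mrs. (Grace Brewster) Sample") is "Grace"
end
```

Before trusting helpers like these, it is worth checking whether the real data contains names that break the stated assumptions.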
Compute the following:
- The six most popular male first names.
- The six most popular female first names.
- The frequencies of the titles.
Write a program plan (using blocks) for how you would approach these tasks.
Write a program that computes these values. Use sensible variable names and/or comments to make clear which parts of your program compute what.
Answer the following questions on Gradescope:
- List the 6 most common male first names in descending order of frequency.
- Do the same for the 6 most common female first names.
- Describe any observations you have about the above two answers.
- Describe any observations you have about the titles.
9 Socially Responsible Computing
Read/View
Read this article.
Write
Why do you think we assigned this reading? Can you think of any falsehoods that the author missed?
In some sense, all user profiles are reductionist in that they condense a human being into an object with several predefined traits. What human characteristics (beyond names) do you think software developers tend to reduce in ways that are harmful to people who carry specific traits?
Optional Readings
This article contains illustrative examples for the previous one, in case you need them. It’s always good to have examples!
This thread introduces you to naming conventions for Vietnamese and other Southeast Asian names.