Assignment 1: Exploratory Data Analysis

In this assignment, you will identify a dataset of interest and perform an exploratory analysis to better understand the shape & structure of the data, investigate initial questions, and develop preliminary insights & hypotheses.

Your final submission will take the form of a report consisting of captioned visualizations that convey key insights gained during your analysis. The team report will consist of four parts.

Part 1: Data Description (1–2 pages)

First, the team picks a topic area of interest and finds a dataset that can provide insights into that topic.

In this part, the team describes the topic of interest and explains why it is interesting or important.

Next, the team describes the dataset, including:

  • Name and a short description of the dataset (e.g., What is the data about? How has this data been collected?)
  • What are the attributes in the dataset? What is the type of each variable? Are there missing values or outliers? Do you need to transform the data before the analysis?
  • How can the dataset be accessed? Provide the URL or a reference to access the dataset.

Part 2: Research Questions

After selecting a topic and dataset, but before any analysis, the team writes down an initial set of at least three different questions to investigate with the dataset.

This part is critical because each team member will work on a different question in the exploratory analysis.

Part 3: Exploratory Analysis (individual work)

This part is an individual assignment. Each team member picks a different research question from Part 2 and creates his/her own visualization for it.

Each team member designs a static (i.e., single image) visualization that effectively communicates an idea about the research question he/she picked. You are free to use any graphics or charting tool you are familiar with. You can also transform the data, remove outliers, and incorporate additional external data sources to improve the quality of your visualization.

Each member writes up their own work that includes the following elements:

  • Name of the member who works on this visualization
  • The research question he/she picked. Each team member must pick a different question from Part 2.
  • A figure of his/her visualization. The visualization should be interpretable without recourse to your write-up. Do not forget to include titles, axis labels, and legends as needed. Ensure the figure is of high quality, i.e., all visual elements and text are easy to read.
  • A short description of the figure (1 paragraph): What aspects of the data are you attempting to communicate? (i.e., what is the story you want to tell?)
  • A design rationale for your visualization (1 paragraph): What visual encodings did you use? Why are they appropriate for the data and your specific question? How do these decisions facilitate effective communication?

The team combines all individual write-ups into the team report. Each member's work should take at most one page. You will be graded individually for this part.

Part 4: Findings (max. 1/2 page)

Finally, the team looks at the visualizations that everybody produced in Part 3. The team should synthesize all the visualizations and write up the findings gained from the exploratory analysis. Don't forget to relate your findings to the topic you picked at the beginning.

What to submit

Parts 1, 2, and 4 are written by the entire team, while Part 3 combines all members' individual work in the same report.

The file must be named "A1_[Your team name].pdf". The deadline to submit this assignment is February 15th, 23:59.

Upload the team report as a single PDF file on Edunao's course page in the assignment titled "Assignment 1: Exploratory Data Analysis."

Points for this assignment

  • Parts 1, 2, and 4 count for a total of 15 points for the team. These parts are the work of the entire team, so everyone will get the same points.
  • Part 3 is an individual assignment and counts for a total of 15 points. We will determine scores by judging both the soundness of your design and the quality of the write-up.

Acknowledgment: This assignment is based on assignments from the CSE 442: Data Visualization course taught by Jeff Heer at the University of Washington and the Interactive Data Visualization course taught by Arvind Satyanarayan at MIT.

