Assignment 1: Exploratory Data Analysis

In this assignment, your team will identify a dataset of interest and perform an exploratory analysis of it: understand the shape & structure of the data, investigate initial questions, and develop preliminary insights & hypotheses.

Your final submission will be a report with captioned visualizations that convey key insights gained during your analysis. The team report will consist of three parts.

Part 1: Data Description (2–3 pages)

First of all, the team picks a topic area of interest and finds a dataset that can provide insights into that topic.

Minimum Requirement: The data table you propose for the project must have at least 10,000 rows and at least 10 columns.

This part should include the following contents:

  • Describe the chosen topic of interest and explain why this topic is interesting/important.
  • Provide the name and a description of the dataset, highlighting its relevance to the chosen topic. Also, briefly describe how the data was collected.
  • What are the attributes in the dataset? What is the type of each variable? Are there any missing values or outliers? Do you need to transform the data before the analysis?
  • Include the URL of, or a reference to, the source where the dataset can be accessed.
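
The checks in the list above (table size, variable types, missing values, outliers) can be sketched in a few lines. This is only an illustration, assuming you work in Python with pandas; the function name `profile` and the 1.5 × IQR outlier rule are assumptions of this sketch and are not required by the assignment.

```python
import pandas as pd


def profile(df: pd.DataFrame, min_rows: int = 10_000, min_cols: int = 10) -> dict:
    """Summarize the points Part 1 asks about for a loaded data table."""
    n_rows, n_cols = df.shape
    numeric = df.select_dtypes(include="number")

    # Flag values outside 1.5 * IQR of each numeric column. This is one
    # common rule of thumb; another outlier definition may suit your data.
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        # Minimum requirement: at least 10,000 rows and 10 columns.
        "meets_minimum": n_rows >= min_rows and n_cols >= min_cols,
        # Type of each variable, as reported by pandas.
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Number of missing values per column.
        "missing": df.isna().sum().to_dict(),
        # Number of flagged outliers per numeric column.
        "outliers": outside.sum().to_dict(),
    }
```

For a CSV file you could call `profile(pd.read_csv("your_data.csv"))`, where the file name is a placeholder. The result is a starting point for the written description, not a substitute for it.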

Part 2: Exploratory Analysis (individual work, 1 page each)

This part is an individual assignment.

In this part, each team member will produce one exploratory visualization that effectively answers a research question about the dataset (i.e., one question and one figure per team member).

Each visualization must include the following elements:

  • Name of the member who works on this visualization.
  • Specify a research question you’d like the visualization to answer.
    • Each team member must pick a different question.
  • Design a static visualization (i.e., a single image) that effectively answers that question. Use the question as the title of your graphic.
    • The visualization should be interpretable without recourse to your write-up.
    • Do not forget to include titles, axis labels, or legends as needed.
    • Ensure the figure in the report is of high quality, i.e., all visual elements and text can be read easily.
  • Provide a description of the figure (1 paragraph)
    • What aspects of the data are you attempting to communicate? (i.e., what is the story you want to tell?)
  • Provide a design rationale for your visualization design (1 paragraph)
    • What visual encodings did you use? Why are they appropriate for the data and your specific question? How do these decisions facilitate effective communication?
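
As one possible starting point for the figure requirements above, here is a minimal sketch, assuming Python with matplotlib (any charting tool is acceptable). The function name `question_chart` and its parameters are hypothetical. A bar chart is used only as an example; choose the chart type that fits your own question.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt


def question_chart(categories, values, question, x_label, y_label, path):
    """Draw a bar chart titled with the research question and save it."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.bar(categories, values)
    ax.set_title(question)   # the research question doubles as the title
    ax.set_xlabel(x_label)   # label both axes so the figure stands alone
    ax.set_ylabel(y_label)
    fig.tight_layout()
    fig.savefig(path, dpi=300)  # high resolution keeps text legible in the PDF
    plt.close(fig)
    return ax
```

Saving at a high resolution (300 dpi here, an arbitrary choice) helps keep all text readable once the image is placed in the report.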

You are free to use any graphics or charting tool you are familiar with. You may need to transform the data, remove outliers, and incorporate additional external data sources to improve the quality of your visualization.

Each member's visualization, together with its description and design rationale, should take at most one page. The team concatenates the visualizations from all team members into the team report.

Part 3: Findings (around 1/2 page)

Finally, the team reviews the visualizations that all members produced in the previous part. The team should synthesize these visualizations and write up the findings gained from the exploratory analysis. Don’t forget to relate your findings to the topic you picked at the beginning.

What to submit

Parts 1 and 3 are written by the entire team, while Part 2 concatenates the individual work of all team members into the same report.

Upload the team report as a single PDF file on Edunao’s course page in the assignment titled "Assignment 1: Exploratory Data Analysis." The file must be named "A1_[Your team name].pdf".

The deadline to submit this assignment is March 3rd, 23:00.

Points for this assignment

  • Parts 1 and 3 count for a total of 15 points for the team. These parts are the work of the entire team. Points will be awarded uniformly to all team members.
  • Part 2 is an individual assignment and counts for a total of 15 points.

Acknowledgment: This assignment is based on assignments from the CSE 442: Data Visualization course taught by Jeff Heer at the University of Washington and the Interactive Data Visualization course taught by Arvind Satyanarayan at MIT.

