Do You Believe Your (Social Media) Data? A Personal Story on Location Data Biases, Errors, and Plausibility as well as their Visualization

Description:

We present a case study on a journey about a personal data collection of carnivorous plant species habitats, and the resulting scientific exploration of location data biases, data errors, location hiding, and data plausibility. While initially driven by personal interest, our work led to the analysis and development of various means for visualizing threats to insight from geo-tagged social media data. In the course of this endeavor we analyzed local and global geographic distributions and their inaccuracies. We also contribute Motion Plausibility Profiles—a new means for visualizing how believable a specific contributor’s location data is or if it was likely manipulated. We then compared our own repurposed social media dataset with data from a dedicated citizen science project. Compared to biases and errors in the literature on traditional citizen science data, with our visualizations we could also identify some new types or show new aspects for known ones. Moreover, we demonstrate several types of errors and biases for repurposed social media data.

Paper download:

(regular version, 43.6 MB)
(version for readers with color impairments, 43.6 MB)

Demo Source Code and Data:

As described at the end of the paper, we purposefully do not share any data and use only coarse or anonymized maps in the published materials to prevent poaching of the endangered plants. We do provide, however, the script that generates the Motion Plausibility Profiles for iNaturalist data for any chosen species or group of species. The same script is also available through GitHub. To use it, you need to download your own data from iNaturalist and adjust the script accordingly. The script reproduces all Motion Plausibility Profiles for the iNaturalist data shown in the paper (along with some other visualizations we used), but based on the data as exported at the time of your download.
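
For readers who want a rough feel for this kind of computation before downloading the script, the following is a minimal Python sketch; it is not the script from the repository. It assumes that plausibility is judged from the travel speed implied by consecutive observations of a single contributor. The function names, the (timestamp, latitude, longitude) tuple layout, and the 900 km/h threshold (roughly airliner cruising speed) are all hypothetical choices made for illustration.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speeds(observations):
    """Return (hours, km, km_per_hour) for each pair of consecutive observations.

    `observations` is a list of (datetime, lat, lon) tuples for ONE contributor
    (hypothetical layout; real exports need to be mapped to it first).
    """
    ordered = sorted(observations, key=lambda obs: obs[0])
    steps = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(ordered, ordered[1:]):
        hours = (t2 - t1).total_seconds() / 3600.0
        km = haversine_km(lat1, lon1, lat2, lon2)
        if hours > 0:
            speed = km / hours
        elif km > 0:
            speed = math.inf  # same timestamp, different place
        else:
            speed = 0.0       # same timestamp, same place
        steps.append((hours, km, speed))
    return steps


def flag_implausible(observations, max_kmh=900.0):
    """Keep only the steps whose implied speed exceeds `max_kmh` (assumed threshold)."""
    return [step for step in implied_speeds(observations) if step[2] > max_kmh]
```

Plotting the time difference against the distance of each step for one contributor, for example on logarithmic axes with reference lines for walking, driving, and flying speeds, would then give one possible visual summary. Steps above the fastest reference line are candidates for manipulated or mistaken locations.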

Video:

live presentation at IEEE :

pre-recorded presentation from IEEE :

30 second paper fast-forward at IEEE :

30 second poster preview at IEEE :

Get the videos:

Pictures:

All self-created pictures that we use and discuss in the paper are available under a Creative Commons Attribution 4.0 International (CC-BY 4.0) license in a dedicated OSF repository.

Additional related work that we missed in our paper's discussion:

Describing place through user-generated content by Purves et al.: discussion of some forms of bias in social media image data, in particular the power-law behavior of the contribution counts. The light-green field for individual poster contribution skew bias for “repurposed” social media data in our Table 1 should thus, in fact, have a white background.
Understanding movement data quality by Andrienko et al.: discussion of errors in movement data and their visualization

Additional Material:

We have included all additional material (many additional visual representations) in the main PDF files linked from these pages. This way the hyperlinks between the main paper and the visual representations in the additional material are maintained.

Misc:

The presentation received the unofficial »Michael Correll's favorite IEEE VIS 2022 presentation« award; thanks, Michael! 😄

Poster presented at IEEE on the initial tool:

Main Reference:

Tobias Isenberg, Zujany Salazar, Rafael Blanco, and Catherine Plaisant (2022) Do You Believe Your (Social Media) Data? A Personal Story on Location Data Biases, Errors, and Plausibility as well as their Visualization. IEEE Transactions on Visualization and Computer Graphics, 28(9):3277–3291, September 2022.

BibTeX entry:



@ARTICLE{Isenberg:2022:DYB,
  author      = {Tobias Isenberg and Zujany Salazar and Rafael Blanco and Catherine Plaisant},
  title       = {Do You Believe Your (Social Media) Data? {A} Personal Story on Location Data Biases, Errors, and Plausibility as well as their Visualization},
  journal     = {IEEE Transactions on Visualization and Computer Graphics},
  year        = {2022},
  volume      = {28},
  number      = {9},
  month       = sep,
  pages       = {3277--3291},
  doi         = {10.1109/TVCG.2022.3141605},
  doi_url     = {https://doi.org/10.1109/TVCG.2022.3141605},
  oa_hal_url  = {https://hal.science/hal-03516682},
  osf_url     = {https://osf.io/u8ejr/},
  github_url  = {https://github.com/tobiasisenberg/Motion-Plausibility-Profiles},
  url         = {https://tobias.isenberg.cc/p/Isenberg2022DYB},
  pdf         = {https://tobias.isenberg.cc/personal/papers/Isenberg_2022_DYB.pdf},
}

Other References:

Rafael Blanco, Zujany Salazar, and Tobias Isenberg (2019) Exploring Carnivorous Plant Habitats based on Images from Social Media. In Steffen Koch, Tatiana von Landesberger, Christopher Collins, Nathalie Henry Riche, Jian Chen, and Daniel Jönsson, eds., Posters at the IEEE Conference on Visualization (IEEE VIS, October 20–25, Vancouver, Canada). 2019. Extended abstract and poster.

BibTeX entry:



@INPROCEEDINGS{Blanco:2019:ECP,
  author      = {Rafael Blanco and Zujany Salazar and Tobias Isenberg},
  title       = {Exploring Carnivorous Plant Habitats based on Images from Social Media},
  booktitle   = {Posters at the IEEE Conference on Visualization (IEEE VIS, October 20--25, Vancouver, Canada)},
  OPTeditor   = {Steffen Koch and von Landesberger, Tatiana and Christopher Collins and Henry Riche, Nathalie and Jian Chen and Daniel J{\"o}nsson},
  year        = {2019},
  oa_hal_url  = {https://hal.science/hal-02196764},
  url         = {https://tobias.isenberg.cc/p/Isenberg2022DYB},
  pdf         = {https://tobias.isenberg.cc/personal/papers/Blanco_2019_ECP.pdf},
}

Tobias Isenberg (2024) Visualization of Species Distributions based on Social Media Data and a Couple of Surprises and Insights. Keynote talk at VCBM, September 2024.

BibTeX entry:



@MISC{Isenberg:2024:VCB,
  author      = {Tobias Isenberg},
  title       = {Visualization of Species Distributions based on Social Media Data and a Couple of Surprises and Insights},
  howpublished= {Keynote talk at VCBM},
  year        = {2024},
  month       = sep,
  url         = {https://diglib.eg.org/handle/10.2312/3607039},
  url2        = {https://conferences.eg.org/vcbm2024/},
}

This work was done at the AVIZ project group of Inria, France.