Learning goals

  • Students can scrape data from static webpages
  • Students understand how simple APIs work and can retrieve data from them using GET requests
  • Students can inspect websites to find potential hidden APIs
  • Students can collect data from online platforms that require OAuth authentication
  • Students can list other forms of digital data (beyond websites and social media content) and describe how they would be collected
  • Students can explain the limitations of digital data
  • Students can justify their data collection from research-ethics and legal perspectives

Assignments

  • Weekly assignments (see code)

Lectures

4.9.2024: Connecting to the Internet

  • JSON
  • HTML
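As a small illustration of the JSON topic, the sketch below uses Python's standard-library `json` module. The payload is a made-up example shaped like a typical API response, not output from any real service.

```python
import json

# A made-up JSON payload, shaped like a typical API response
raw = '{"count": 2, "results": [{"id": 1, "title": "First post"}, {"id": 2, "title": "Second post"}]}'

# json.loads turns JSON text into Python dicts and lists
data = json.loads(raw)

# The nested structure is navigated like any dict/list
titles = [item["title"] for item in data["results"]]
print(data["count"], titles)

# json.dumps goes the other way, e.g. when saving collected data to a file
print(json.dumps(data["results"][0], ensure_ascii=False))
```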

11.9.2024: Working with APIs

  • REST API
  • GraphQL
  • OAuth
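A minimal sketch of a REST-style GET request with an OAuth 2.0 access token, using only Python's standard library. The endpoint `https://api.example.com/v1/posts`, the parameters `q` and `limit`, the token placeholder, and the helper `fetch_json` are all hypothetical; real APIs document their own URLs, parameters, and authentication flow. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and query parameters, for illustration only
BASE_URL = "https://api.example.com/v1/posts"
params = {"q": "election", "limit": 10}
ACCESS_TOKEN = "YOUR-ACCESS-TOKEN"  # placeholder; in practice obtained through an OAuth flow

# A GET request carries its parameters in the URL's query string
url = BASE_URL + "?" + urllib.parse.urlencode(params)

# OAuth 2.0 access tokens are commonly sent in a "Bearer" Authorization header
req = urllib.request.Request(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}", "Accept": "application/json"},
    method="GET",
)
print(req.full_url)


def fetch_json(request):
    """Hypothetical helper: send the request and decode the JSON body (needs network access and a real API)."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

GraphQL APIs usually work differently: the query is typically sent as a POST request whose JSON body contains a `query` field.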

Readings

Post-API era

  • Bruns, A. (2019). After the ‘APIcalypse’: social media platforms and their fight against critical scholarly research. Information, Communication & Society, 22(11), 1544–1566. https://doi.org/10.1080/1369118X.2019.1637447
  • Freelon, D. (2018). Computational Research in the Post-API Age. Political Communication, 35(4), 665–668. https://doi.org/10.1080/10584609.2018.1477506

Validity and reliability issues with APIs

  • Joseph, K., Landwehr, P. M., & Carley, K. M. (2014). Two 1%s Don’t Make a Whole: Comparing Simultaneous Samples from Twitter’s Streaming API. In Social Computing, Behavioral-Cultural Modeling and Prediction (SBP 2014), Lecture Notes in Computer Science (pp. 75–83). Springer. https://doi.org/10.1007/978-3-319-05579-4_10
  • Ruths, D., & Pfeffer, J. (2014). Social media for large studies of behavior. Science, 346(6213), 1063–1064. https://doi.org/10.1126/science.346.6213.1063

18.9.2024: Parsing websites

  • Understanding HTML
  • Working with parsers
  • Working with browser extensions
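To show what a parser does with HTML, here is a sketch using the standard-library `html.parser.HTMLParser`, chosen only because it needs no installation; the course may well use other parsers. The `LinkCollector` class and the HTML fragment are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# A made-up fragment of a static page
html = """
<ul class="articles">
  <li><a href="/news/1">First headline</a></li>
  <li><a href="/news/2">Second <b>headline</b></a></li>
</ul>
"""

parser = LinkCollector()
parser.feed(html)
print(parser.links)
```

Note that text inside nested tags (the `<b>` above) is still collected, because the parser reports data events regardless of which inner tag they occur in.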

25.9.2024: Parsing websites and gray APIs

  • Selenium
  • Finding gray APIs
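One common way to work with a gray API is to copy a request URL from the browser's developer tools (Network tab) and then vary its parameters, for example to page through results. The sketch below only rewrites such a URL offline; the URL and its `query`, `page`, and `size` parameters are hypothetical, and real endpoints, as well as the site's terms of use, will differ.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical request URL copied from the browser's developer tools (Network tab)
copied = "https://www.example.com/api/search?query=climate&page=1&size=20"

parts = urlsplit(copied)
params = parse_qs(parts.query)  # {'query': ['climate'], 'page': ['1'], 'size': ['20']}
print(params)


def page_url(page):
    """Rebuild the copied URL with a different page number."""
    new_params = {key: values[0] for key, values in params.items()}
    new_params["page"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(new_params)))


# URLs for the first three pages; each could then be requested like any API call
urls = [page_url(p) for p in range(1, 4)]
print(urls[1])
```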

Readings

  • Mancosu, M., & Vegetti, F. (2020). What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data. Social Media + Society, 6(3). https://doi.org/10.1177/2056305120940703
  • Arora, S. K., Li, Y., Youtie, J., & Shapira, P. (2016). Using the wayback machine to mine websites in the social sciences: A methodological resource. Journal of the Association for Information Science and Technology, 67(8), 1904–1915. https://doi.org/10.1002/asi.23503

2.10.2024: Applications and digital traces

  • Data donations
  • Trackers
  • etc.
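Data donation studies often extract only the fields a study needs from a participant's platform data export before anything is shared. The sketch below illustrates that idea of data minimization; the export structure, the field names, and the `extract` helper are all hypothetical, and real exports differ by platform.

```python
import json
from collections import Counter

# Hypothetical structure of a platform data export file
export_text = json.dumps([
    {"title": "Watched: Video A", "channel": "News X", "time": "2024-09-01T10:00:00Z", "ip": "203.0.113.5"},
    {"title": "Watched: Video B", "channel": "News X", "time": "2024-09-02T11:30:00Z", "ip": "203.0.113.5"},
    {"title": "Watched: Video C", "channel": "Music Y", "time": "2024-09-02T12:00:00Z", "ip": "203.0.113.9"},
])

KEEP = ("channel", "time")  # the only fields this hypothetical study needs


def extract(text):
    """Keep only the fields in KEEP; everything else (titles, IP addresses) is dropped."""
    return [{key: record[key] for key in KEEP} for record in json.loads(text)]


donated = extract(export_text)
print(Counter(record["channel"] for record in donated))
```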

Readings

  • Hase, V., Schmidbauer, E., Welbers, K., & Haim, M. (2024). Fulfilling data access obligations: How could (and should) platforms facilitate data donation studies? Internet Policy Review, 13(3). https://doi.org/10.14763/2024.3.1793
  • Araujo, T., Ausloos, J., van Atteveldt, W., Loecherbach, F., Moeller, J., Ohme, J., Trilling, D., van de Velde, B., de Vreese, C., & Welbers, K. (2022). OSD2F: An Open-Source Data Donation Framework. Computational Communication Research, 4(2), 372–387. https://doi.org/10.5117/CCR2022.2.001.ARAU

9.10.2024: Data management

Readings

Additional readings

  • Software Carpentry: Databases and SQL

  • University of Helsinki: Data management planning

  • Williams, M. L., Burnap, P., & Sloan, L. (2017). Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation. Sociology, 51(6), 1149–1168. https://doi.org/10.1177/0038038517708140

  • Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18

  • Ethics

  • Legal

  • SQL
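For the SQL topic, a minimal sketch of storing collected records in SQLite through Python's built-in `sqlite3` module. The `posts` table, its columns, and the rows are hypothetical examples, not a prescribed schema for the course.

```python
import sqlite3

# In-memory database for illustration; pass a filename such as "posts.db" to keep the data on disk
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE posts (
           id INTEGER PRIMARY KEY,
           source TEXT NOT NULL,
           title TEXT,
           collected_at TEXT
       )"""
)

# Hypothetical collected records
rows = [
    (1, "news-site", "First headline", "2024-10-09"),
    (2, "news-site", "Second headline", "2024-10-09"),
    (3, "api", "A post", "2024-10-10"),
]

# "?" placeholders let sqlite3 insert values safely instead of building SQL strings by hand
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Aggregate with SQL: how many records per source?
counts = conn.execute(
    "SELECT source, COUNT(*) FROM posts GROUP BY source ORDER BY source"
).fetchall()
print(counts)
conn.close()
```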

16.10.2024: Concluding notes