Aim and approach
This course provides a strong methodological foundation for applying data science methods in the social sciences.
By the end of the course, students know how to (a) apply data science approaches conceptually in their work and (b) implement machine learning methods in R or Python.
This course focuses on
- conceptual understanding of applying data science approaches, gained by discussing papers that have used a particular approach
- hands-on skills for conducting these analyses with datasets
By the end of the course, students can
- pre-process both quantitative and qualitative datasets
- apply unsupervised machine learning methods to quantitative and qualitative data and draw conclusions relevant to social science from the results
- apply supervised machine learning methods to quantitative and qualitative data and draw conclusions relevant to social science from the results
- discuss the benefits and challenges of using data science methods in social science research, and apply these methods in their own research domain.
If you have questions about the course, please contact me via grp-dcm-teaching@helsinki.fi or come to my office hours.
Prerequisites
Before the first class, students should master the basics of Python or R and know how to
- work with variables, for-loops, and if-structures
- open and write files
- call functions or methods
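As a rough illustration of the expected level, the short Python script below exercises each skill listed above: variables, a for-loop with an if-structure, writing and reading a file, and calling functions and methods. It is a hypothetical self-check, not part of the course materials; the word list and the file name prereq_check.txt are arbitrary assumptions.

```python
import os
import tempfile

# Variables: store values under descriptive names
words = ["data", "science", "for", "social", "sciences"]
min_length = 4

# A for-loop with an if-structure: keep only the longer words
long_words = []
for word in words:
    if len(word) >= min_length:
        long_words.append(word)  # calling a list method

# Writing a file (hypothetical name, in the system temp directory)
path = os.path.join(tempfile.gettempdir(), "prereq_check.txt")
with open(path, "w", encoding="utf-8") as out_file:
    for word in long_words:
        out_file.write(word + "\n")

# Opening the same file again and reading it line by line
with open(path, encoding="utf-8") as in_file:
    lines = [line.strip() for line in in_file]

# Calling a built-in function (len) and string methods (join, upper)
print(len(lines), ", ".join(lines).upper())
```

The script prints `4 DATA, SCIENCE, SOCIAL, SCIENCES`. If you can read it and predict that output without running it, you are probably close to the level assumed; an equivalent R script would use the same constructs.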
Course materials
- Computational Thinking and Social Science (shared via email)
- In-class coding activities: install the needed packages using 00 Setup Python or 00 Setup R
Suggested additional reading
- Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning. New York, NY: Springer.
Course evaluation
The course is graded pass/fail.
To pass, you need to write a manuscript that uses some of the methods and techniques presented during the course. The manuscript must be written during the spring semester. Please use the office hours to get feedback and comments on your manuscript. The manuscript must be complete: it needs a meaningful introduction, a theory or previous-work section, a description of the data and methods, and a discussion.
Syllabus
16.1. Day one
9:15 - 10:00 Introduction
Read before class
- Chapter "Data Science" from Computational Thinking and Social Science
Supplementary readings
- Hindman, M. (2015). Building Better Models: Prediction, Replication, and Machine Learning in the Social Sciences. The ANNALS of the American Academy of Political and Social Science, 659(1), 48–62.
- Bartlett, A., Lewis, J., Reyes-Galindo, L., & Stephens, N. (2018). The locus of legitimate interpretation in Big Data sciences: Lessons for computational social science from -omic biology and high-energy physics. Big Data & Society, 5(1).
- Grimmer, J., & Stewart, B. M. (2013). Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3), 267–297.
10:00 - 13:00 Decision trees and random forests
Code exercise
Supplementary readings
11:00 - 12:00 Lunch break
13:00 - 14:30 Working with Textual Data
Code exercise
- Dictionary methods and working with textual data
Supplementary readings
- Hoffman, E. R., McDonald, D. W., & Zachry, M. (2017). Evaluating a Computational Approach to Labeling Politeness. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW), 1–14.
- King, G., Lam, P., & Roberts, M. E. (2017). Computer-Assisted Keyword and Document Set Discovery from Unstructured Text. American Journal of Political Science, 61(4), 971–988.
- Munger, K. (2017). Tweetment Effects on the Tweeted: Experimentally Reducing Racist Harassment. Political Behavior, 39(3), 629–649.
14:30 - 15:45 Support vector machines and Naive Bayes
Read before class
Code exercise
Supplementary readings
- Weber, I., Ukkonen, A., & Gionis, A. (2012). Answers, not links. In Proceedings of the fifth ACM international conference on Web search and data mining - WSDM ’12 (pp. 613–622). New York, New York, USA: ACM Press.
- Yu, B., Kaufmann, S., & Diermeier, D. (2008). Classifying Party Affiliation from Political Speech. Journal of Information Technology & Politics, 5(1), 33–48.
17.1. Day two
9:15 - 10:45 Cluster analysis and k-means
Code exercise
Supplementary readings
- Nelimarkka, M., & Hellas, A. (2018). Social Help-seeking Strategies in a Programming MOOC. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education - SIGCSE ’18 (pp. 116–121). New York, New York, USA: ACM Press.
- Burscher, B., Vliegenthart, R., & Vreese, C. H. de. (2016). Frames Beyond Words. Social Science Computer Review, 34(5), 530–545.
10:45 - 13:30 Topic models
Read before class
- Pöyhtäri, R., Nelimarkka, M., Nikunen, K., Ojala, M., Pantti, M., & Pääkkönen, J. (2019). Refugee debate and networked framing in the hybrid media environment. International Communication Gazette.
- Pantti, M., Nelimarkka, M., Nikunen, K., & Titley, G. (2019). The meanings of racism: Public discourses about racism in Finnish news media and online discussion forums. European Journal of Communication, 34(5), 503–519.
Code exercise
Supplementary readings
- Baumer, E. P. S., Mimno, D., Guha, S., Quan, E., & Gay, G. K. (2017). Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence? Journal of the Association for Information Science and Technology, 68, 1397–1410.
11:00 - 12:00 Lunch break
13:30 - 14:30 Association rules
Code example
Supplementary readings
14:30 - 15:45 Conversations on ethics and reliability
Read before class
- Chapter "Research Ethics in Computational Social Science" from Computational Thinking and Social Science
- Chapter "Mistakes and Quality of Results in Computational Social Sciences" from Computational Thinking and Social Science
20.1. Project work kickoff
Read before class
- Chapter "Integrating Computational Methods into Research" from Computational Thinking and Social Science
Additional readings
- Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199–215.
- Hofman, J. M., Sharma, A., & Watts, D. J. (2017). Prediction and explanation in social systems. Science, 355(6324), 486–488.
- Radford, J., & Joseph, K. (2020). Theory In, Theory Out: The uses of social theory in machine learning for social science, 1–19. Note: this paper has not yet been published.
9:15 - 12:00 Project planning activities
12:00 - 13:00 Break
13:00 - 15:45 Drop-in hours for individual support