Aim and approach
This course provides a strong methodological foundation for applying data science methods in the social sciences.
By the end of the course, students know how to (a) apply data science approaches conceptually in their work and (b) code machine learning analyses in R or Python.
This course focuses on
- conceptual understanding of how data science approaches are applied, built by discussing papers that have used a particular approach
- hands-on skills for conducting these analyses with data sets
By the end of the course, students can
- conduct data pre-processing with both quantitative and qualitative datasets
- apply unsupervised machine learning methods to quantitative and qualitative data and draw social science relevant conclusions from the results
- apply supervised machine learning methods to quantitative and qualitative data and draw social science relevant conclusions from the results
- discuss the benefits and challenges of using data science methods in social science research, and apply these methods in their own research domain
If you have questions about the course, please contact us via email@example.com.
Before the first class, students should master the basics of Python or R and know how to
- work with variables, for-loops and if-structures
- open and write files
- call functions or methods
Students should also prepare for the first class as follows:
- Read the chapter "Algorithmic Data Analysis" from Coding Social Science. Understanding and Doing Computational Social Science (shared via email) and go through its exercises.
- For the in-class coding activities, install the needed packages by following 00 Setup Python or 00 Setup R.
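As a rough self-check of the programming prerequisites listed above, the following Python sketch uses variables, a for-loop, an if-structure, file writing and reading, and a function call. It is only an illustration: the word list, the function name count_long_words, and the file name words.txt are made up for this example and are not part of the course material.

```python
import tempfile
from pathlib import Path


def count_long_words(words, min_length=5):
    """Count the words that have at least min_length characters."""
    count = 0
    for word in words:                 # for-loop over a list
        if len(word) >= min_length:    # if-structure
            count += 1
    return count


# Variables holding a small, made-up word list
words = ["data", "science", "for", "social", "research"]
n_long = count_long_words(words)       # calling a function

# Write the words to a file and read them back (a temporary folder is used
# so that the example leaves no files behind).
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "words.txt"
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words))
    with open(path, encoding="utf-8") as f:
        read_back = f.read().splitlines()

print(n_long, read_back)
```

If each step of this script is easy to follow, the Python prerequisites are probably in place; the same ideas apply to R.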
Suggested additional reading
- Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning. New York, NY: Springer New York.
The course is evaluated as pass/fail. To pass, you need 125 points. Each student chooses which learning activities to engage in to support their learning, and may combine different approaches and modules.
- Attending class discussion: 5 points per class
- Writing a response to a data science article of your choice, discussing its methodological choices: 5 points per response
- Doing the class activity and writing a reflection diary based on it: 10 points per activity
- Taking a paper of your choice that does not use data science methods and describing how you would redo it with data science approaches: 10 points per paper
- Taking a paper of your choice that does not use data science methods and writing a replication study that uses data science methods: 25 points per paper
- Writing an empirical article (with introduction, theory, methods etc.) which utilises two methods discussed in the class: 80 points per article
- Writing an empirical article (with introduction, theory, methods etc.) which utilises one method discussed in the class: 60 points per article
- Writing a brief analysis of a research problem of your choice with these methods: 15 points per article
- Propose your own activity here
(Please note that the point scales might still change before the first class.)
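To see how different activities add up to the 125 points needed to pass, a personal plan can be tallied with a short script like the one below. This is only a sketch: the point values are copied from the list above and may still change, while the activity keys and the example plan are hypothetical.

```python
# Points per activity, as listed above (subject to change before the first class).
# The dictionary keys are hypothetical short names for the listed activities.
POINTS = {
    "class_discussion": 5,
    "article_response": 5,
    "class_activity_with_reflection": 10,
    "redo_proposal": 10,
    "replication_study": 25,
    "empirical_article_two_methods": 80,
    "empirical_article_one_method": 60,
    "brief_analysis": 15,
}
PASS_THRESHOLD = 125


def total_points(plan):
    """Sum the points for a plan given as {activity: number of times done}."""
    return sum(POINTS[activity] * times for activity, times in plan.items())


# A hypothetical plan: five class discussions, two class activities with
# reflection diaries, and one empirical article using two methods.
example_plan = {
    "class_discussion": 5,
    "class_activity_with_reflection": 2,
    "empirical_article_two_methods": 1,
}

total = total_points(example_plan)
passed = total >= PASS_THRESHOLD
print(total, passed)
```

The example plan gives 5 × 5 + 2 × 10 + 80 = 125 points, which is exactly the pass threshold.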
30.8. Introduction and Social science research questions and data science
- Bartlett, A., Lewis, J., Reyes-Galindo, L., & Stephens, N. (2018). The locus of legitimate interpretation in Big Data sciences: Lessons for computational social science from -omic biology and high-energy physics. Big Data & Society, 5(1)
- Grimmer, J., & Stewart, B. M. (2013). Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3), 267–297.
30.8. Dictionary based methods and Working with Textual Data
- Munger, K. (2017). Tweetment Effects on the Tweeted: Experimentally Reducing Racist Harassment. Political Behavior, 39(3), 629–649.
- Hoffman, E. R., McDonald, D. W., & Zachry, M. (2017). Evaluating a Computational Approach to Labeling Politeness. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW), 1–14.
- Code exercise: Dictionary methods
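For orientation before this session, the basic idea of a dictionary-based method can be sketched in a few lines of Python: count how many words of a text belong to each category of a predefined word list. This is not the course's code exercise. The two-category dictionary, the tokenizer, and the example sentence are assumptions made up for illustration, and real dictionaries are much larger and carefully validated.

```python
import re

# Hypothetical toy dictionary: category -> set of words in that category
TOY_DICTIONARY = {
    "positive": {"good", "great", "happy"},
    "negative": {"bad", "sad", "angry"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_counts(text, dictionary=TOY_DICTIONARY):
    """Count, per category, how many tokens of the text are in the dictionary."""
    tokens = tokenize(text)
    return {
        category: sum(1 for token in tokens if token in words)
        for category, words in dictionary.items()
    }


example = "Great day, but the ending was sad and bad."
counts = dictionary_counts(example)
print(counts)
```

For the example sentence, one token matches the positive list and two match the negative list. Simple counts like these ignore negation and context, which is one reason the readings above discuss validation.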
31.8. K-means and cluster analysis
- Nelimarkka, M., & Hellas, A. (2018). Social Help-seeking Strategies in a Programming MOOC. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education - SIGCSE ’18 (pp. 116–121). New York, New York, USA: ACM Press.
- Burscher, B., Vliegenthart, R., & Vreese, C. H. de. (2016). Frames Beyond Words. Social Science Computer Review, 34(5), 530–545.
- Code exercise: K-means and clustering
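As a preview of this session, the k-means procedure can be sketched in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. This is not the course's code exercise. The data points and starting centroids are made up, and in practice a library implementation with several random starts would normally be used.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, centroids, max_iterations=100):
    """Plain k-means on tuples of numbers, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:     # stop once the centroids are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up two-dimensional data with two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(final_centroids, clusters)
```

With these starting centroids the three points near (1, 1.5) end up in one cluster and the three points near (8.5, 9) in the other. Choosing the number of clusters and interpreting them in social science terms is the harder part, and is what the readings above illustrate.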
31.8. Topic models
- Pöyhtäri, R., Nelimarkka, M., Nikunen, K., Ojala, M., Pantti, M., & Pääkkönen, J. (2019). Refugee debate and networked framing in the hybrid media environment. International Communication Gazette.
- Pantti, M., Nelimarkka, M., Nikunen, K., & Titley, G. (2019). The meanings of racism: Public discourses about racism in Finnish news media and online discussion forums. European Journal of Communication, 34(5), 503–519.
- Code exercise: Topic models
1.9. Support vector machines and Naive Bayes
- Weber, I., Ukkonen, A., & Gionis, A. (2012). Answers, not links. In Proceedings of the fifth ACM international conference on Web search and data mining - WSDM ’12 (pp. 613–622). New York, New York, USA: ACM Press.
- Yu, B., Kaufmann, S., & Diermeier, D. (2008). Classifying Party Affiliation from Political Speech. Journal of Information Technology & Politics, 5(1), 33–48.
- Toivanen, P., Nelimarkka, M., & Valaskivi, K. (2021). Remediation in the hybrid media environment: Understanding countermedia in context. New Media & Society.
- Support Vector Machines
- Naive Bayes
- Code exercise: Support vector machines
1.9. Decision trees and random forests
2.9. Association rules
2.9. Technical activities
3.9. Personal course plan development
3.9. Future Outlook
- Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199–215.
- Hofman, J. M., Sharma, A., & Watts, D. J. (2017). Prediction and explanation in social systems. Science, 355(6324), 486–488.
- Radford, J., & Joseph, K. (2020). Theory In, Theory Out: The uses of social theory in machine learning for social science, 1–19. Note: this paper has not been published yet.
- Baumer, E.P.S., Mimno, D., Guha, S., Quan, E. and Gay, G.K. (2017), Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence?. Journal of the Association for Information Science and Technology, 68: 1397-1410.