Aim and approach

This course provides a strong methodological foundation for applying data science methods in the social sciences. By the end of the course, students know how to (a) apply data science approaches conceptually in their work and (b) code machine learning analyses in R or Python.

This course focuses on

conceptual understanding of applying data science approaches, built by discussing papers that have used a particular approach
hands-on skills for conducting these analyses with data sets

By the end of the course, students can

conduct data pre-processing with both quantitative and qualitative datasets
apply unsupervised machine learning methods to quantitative and qualitative data and draw conclusions relevant to social science from the results
apply supervised machine learning methods to quantitative and qualitative data and draw conclusions relevant to social science from the results
discuss the benefits and challenges of using data science methods in social science research, and apply these methods in their own research domain

If you have questions about the course, please contact me via grp-dcm-teaching@helsinki.fi or come to my office hours.

Prerequisite

Before the first class, students should master the basics of Python or R and know how to

work with variables, for-loops and if-structures
open and write files
call functions or methods
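
As a rough self-check of the level assumed, the short Python sketch below exercises each of the skills listed above. The file name notes.txt, the function count_long_words and the sample text are made up for illustration; they are not part of the course materials.

```python
import os
import tempfile


def count_long_words(words, min_length=5):
    """Count the words that have at least min_length characters."""
    count = 0
    for word in words:                # for-loop
        if len(word) >= min_length:   # if-structure
            count += 1
    return count


# Working with variables (illustrative sample text)
text = "data science methods for social sciences"

# Writing a file (into a temporary directory, so nothing is overwritten)
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Opening the file and reading it back
with open(path, encoding="utf-8") as handle:
    words = handle.read().split()     # calling methods on objects

# Calling a function
long_words = count_long_words(words)
print(long_words)
```

If every line here reads as familiar, the prerequisite level is probably met. The same constructs exist in R, so an equivalent exercise there serves the same purpose.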

Course materials

Computational Thinking and Social Science (shared via email)
In-class coding activities: install the needed packages from 00 Setup Python or 00 Setup R

Course evaluation

The course is evaluated as pass/fail.

To pass, you need to write a manuscript using some of the methods and techniques presented during the course. The manuscript must be written during the Spring semester. Please use office hours to get feedback and comments on your manuscript. The manuscript must have all the parts in place: a meaningful introduction, a theory or previous work section, a description of data and methods, and a discussion.

Syllabus

16.1. Day one

9.15-10.00: Introduction

Read before the class

Chapter "Data Science" from Computational Thinking and Social Science

Supplementary readings

10:00 - 13:00 Decision trees and random forests

Code exercise

Decision trees
Decision trees and random forests

Supplementary readings

11:00-12:00 Lunch break

13:00 - 14:30 Working with Textual Data

Code exercise

Dictionary methods and working with Textual data

Supplementary readings

14:30-15:45 Support vector machines and Naive Bayes

Read before class

Toivanen, P., Nelimarkka, M., & Valaskivi, K. (2021). Remediation in the hybrid media environment: Understanding countermedia in context. New Media & Society,

Code exercise

Support Vector Machines
Naive Bayes
Support vector machines

Supplementary readings

17.1. Day two

9.15 - 10:45 Cluster analysis and K-means

Code exercise

k-means
K-means and clustering

Supplementary readings

10:45 - 13:30 Topic models

Read before class

Code exercise

Topic models
LDA

Supplementary readings

Baumer, E.P.S., Mimno, D., Guha, S., Quan, E. and Gay, G.K. (2017), Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence?. Journal of the Association for Information Science and Technology, 68: 1397-1410.

11-12 Lunch break

13:30 - 14:30 Association rules

Code example

Association rules
Association rules

Supplementary readings

Jurek, S. J., & Scime, A. (2014). Achieving Democratic Leadership: A Data-Mined Prescription. Social Science Quarterly, 95(1), 97–110.

14.30 - 15:45 Conversations on ethics and reliability

Read before class

Chapter "Research Ethics in Computational Social Science" from Computational Thinking and Social Science
Chapter "Mistakes and Quality of Results in Computational Social Sciences" from Computational Thinking and Social Science

20.1. Project work kickoff

Read before class

Chapter "Integrating Computational Methods into Research" from Computational Thinking and Social Science

Additional readings

Data science Spring 2023


20.1. Project work kickoff

9.15 - 12:00 Project planning activities

12:00 - 13:00 Break

13:00 - 15:45 Drop-in hours for individual support