Aim and approach

This course provides a strong methodological foundation for applying data science methods in the social sciences. By the end of the course, students know how to (a) apply data science approaches conceptually in their work and (b) code machine learning analyses in R or Python.

This course focuses on

  • conceptual understanding of data science approaches, built by discussing papers that have used a particular approach
  • hands-on skills in conducting these analyses with data sets

By the end of the course, students can

  • conduct data pre-processing with both quantitative and qualitative datasets
  • apply unsupervised machine learning methods to quantitative and qualitative data and draw conclusions relevant to social science from the results
  • apply supervised machine learning methods to quantitative and qualitative data and draw conclusions relevant to social science from the results
  • discuss the benefits and challenges of using data science methods in social science research, and apply these methods in their own research domain

If you have questions about the course, please contact me via grp-dcm-teaching@helsinki.fi or come to my office hours.

Prerequisite

Before the first class, students should master basic Python or R and know how to

  • work with variables, for-loops and if-structures
  • open and write files
  • call functions or methods
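
As a rough self-check of this level, the short Python sketch below touches each of the skills listed above: variables, a for-loop, an if-structure, writing and opening a file, and calling a function and a method. The function name count_long_words, the word list and the file name are made up for illustration and are not part of the course materials.

```python
import os
import tempfile


def count_long_words(words, min_length=5):
    """Count the words that have at least min_length characters."""
    count = 0
    for word in words:                  # for-loop over a list
        if len(word) >= min_length:     # if-structure
            count += 1
    return count


# Working with variables (hypothetical example data)
words = ["data", "science", "for", "social", "research"]
n_long = count_long_words(words)        # calling a function

# Writing a file (hypothetical file name in the system temp directory)
path = os.path.join(tempfile.gettempdir(), "prerequisite_check.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(f"{n_long}\n")

# Opening the same file again and reading the stored value
with open(path, encoding="utf-8") as handle:
    stored = int(handle.read().strip())  # calling a method on a string

print(stored)
```

If every line of this sketch reads naturally to you, you most likely meet the prerequisite; the R equivalent uses the same building blocks.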

Course materials

  • Computational Thinking and Social Science (shared via email)
  • In-class coding activities; install the needed packages using 00 Setup Python or 00 Setup R

Suggested additional reading

  • Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning. New York, NY: Springer.

Course evaluation

The course is evaluated as pass/fail.

To pass, you need to write a manuscript using some of the methods and techniques presented during the course. The manuscript must be written during the Spring semester. Please use office hours to get feedback and comments on your manuscript. The manuscript must be complete: it needs a meaningful introduction, a theory or previous-work section, a description of data and methods, and a discussion.

Syllabus

16.1. Day one

9:15 - 10:00 Introduction

Read before the class

  • Chapter "Data Science" from Computational Thinking and Social Science

Supplementary readings

10:00 - 13:00 Decision trees and random forests

Code exercise

Supplementary readings

11:00-12:00 Lunch break

13:00 - 14:30 Working with Textual Data

Code exercise

  • Dictionary methods and working with Textual data

Supplementary readings

14:30-15:45 Support vector machines and Naive Bayes

Read before class

Code exercise

Supplementary readings

17.1. Day two

9:15 - 10:45 Cluster analysis and K-means

Code exercise

Supplementary readings

10:45 - 13:30 Topic models

Read before class

Code exercise

  • Topic models
  • LDA

Supplementary readings

11:00-12:00 Lunch break

13:30 - 14:30 Association rules

Code example

Supplementary readings

14:30 - 15:45 Conversations on ethics and reliability

Read before class

  • Chapter "Research Ethics in Computational Social Science" from Computational Thinking and Social Science
  • Chapter "Mistakes and Quality of Results in Computational Social Sciences" from Computational Thinking and Social Science

20.1. Project work kickoff

Read before class

  • Chapter "Integrating Computational Methods into Research" from Computational Thinking and Social Science

Additional readings

9:15 - 12:00 Project planning activities

12:00 - 13:00 Break

13:00 - 15:45 Drop-in hours for individual support