A Beginner's Guide to Statistical Data Analysis for Social Sciences

Olga Echevskaya
Novosibirsk State University
This course aims to develop the skills of understanding, transforming, analyzing and presenting data from social surveys and other kinds of quantitative research in the social sciences. Starting from the very basics (the data matrix, variables and observations, variable types and basic visualization instruments), the course covers the basic analytical procedures and methods: descriptive statistics, relationships between variables, comparing means and distributions, transforming data and constructing typologies. The course is designed for beginners (students and researchers in the social sciences).

Lesson: Introductory Lesson: The Data Basics

Course outline:

  1. Data basics and descriptive statistics: structure of the data matrix, variables and observations, the main types of scales for measuring social phenomena: nominal, ordinal, interval. Variable distributions. Characteristics of variable distributions: measures of center and spread, visualization of variable distributions (pie chart, bar chart, histogram, boxplot, scatterplot). Distributions of interval variables: normal distribution; skewness and modality of distributions.
    Examples: distribution of individual monthly income in Russia, analyzed and visualized.

  2. Data transformation and sampling: splitting and sorting the dataset, recoding variables, computing complex variables.
    Working with missing values and outliers.
    Example: correcting the distribution of income.
    The basics of sampling: what is a sample, and what is a representative sample? Estimating population parameters based on the characteristics of a sample. Selecting random and conditional subsamples from a dataset.
    Example: selecting subsamples of urban and rural population. Analyzing the differences in characteristics of the subsamples.

  3. Relationships between two variables: the logic and the coefficients used to analyze relationships of different forms. Linear and non-linear forms of relationship. Correlation coefficients for ordinal and interval variables. Simple linear regression: the basic idea. Limits and traps: correlation is not causation.
    Examples: “storks and babies”, “chocolate consumption and Nobel Prize winners”.
    Coefficients for nominal variables. Analyzing and interpreting tables and coefficients.

  4. Statistical comparison: parametric and nonparametric tests. Hypothesis testing, the null hypothesis and the alternative hypothesis, the idea of statistical significance of the differences between groups.
    Working with normally distributed interval variables: parametric tests, their logic, application and limitations. Normality testing: statistical and visual instruments. The t-test for comparing means in two groups (independent and related samples). ANOVA (Analysis of Variance) for testing the differences between more than two groups. The F-test and multiple comparisons.
    Example: the analysis of regional differences in income levels in Russia.
    Working with other variable types and small samples.
    Nonparametric tests for comparing two or more groups.
    Example: analyzing socio-demographic differences in consumption patterns.

  5. Basics of typological analysis. Analytical (designed by researcher) and automatic (generated by automated algorithms) typologies. Typologizing observations (the idea of clustering) and variables (the basic idea of dimension reduction and factor analysis). Some examples of analytical typologies (modeling social inequalities) and automatic typologies (modeling consumption patterns).
    An in-depth understanding of how typologies are constructed requires a serious statistical background, so only the basic ideas behind them are discussed here. Students will learn the logic behind typologies, but not how to construct them with automated algorithms (that belongs in an advanced course on data analysis).
    The video lectures are illustrated with examples designed by the instructor specifically for this course, based on data from the Russia Longitudinal Monitoring Survey (RLMS). We thank the Russia Longitudinal Monitoring Survey, RLMS-HSE, conducted by the National Research University Higher School of Economics and ZAO “Demoscope” together with Carolina Population Center, University of North Carolina at Chapel Hill and the Institute of Sociology RAS for making these data available (http://www.hse.ru/rlms, http://www.cpc.unc.edu/projects/rlms).
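
To give a flavour of the unit 1 and unit 2 material (measures of center and spread, outliers), here is a minimal sketch in Python. The course description does not say which software the lectures use, so Python is an assumption made only for illustration. The income figures are made up and are not RLMS data. The 1.5 x IQR outlier rule is one common convention, not necessarily the one taught in the course.

```python
import statistics

# Hypothetical monthly incomes (made-up numbers, NOT taken from RLMS).
incomes = [12000, 15000, 18000, 20000, 22000, 25000, 30000, 45000, 120000]

# Measures of center.
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Measures of spread.
stdev_income = statistics.stdev(incomes)         # sample standard deviation
q1, q2, q3 = statistics.quantiles(incomes, n=4)  # quartiles
iqr = q3 - q1                                    # interquartile range

# Flag high outliers with the 1.5 x IQR rule (an assumed convention).
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in incomes if x > upper_fence]

print(f"mean={mean_income:.0f}, median={median_income}, sd={stdev_income:.0f}")
print(f"IQR={iqr}, outliers={outliers}")
```

In this toy sample the mean lies well above the median, which is the typical signature of a right-skewed distribution such as income.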

    The lectures include built-in test questions for self-evaluation, to help students check that they have understood the material. The answers to these in-lecture questions do not affect the grade.
    Students' progress is measured by quizzes (a 10-question quiz at the end of each unit) and a short peer-evaluated assignment at the end of the course. The course grade is composed as follows:
    Quizzes: 50% (10% for each weekly quiz)
    Peer-evaluated assignment: 35%
    Evaluating and commenting on the work of peers: 15%
    A score above 70% indicates successful completion of the course; a score above 90% indicates completion of the course with distinction.
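
The grade composition above can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, it assumes five quizzes (one per unit) with every component scored from 0 to 100, and it reads "above" as strictly greater than the threshold.

```python
def course_grade(quiz_scores, assignment, peer_review):
    """Weighted course grade in percent.

    Assumes five quiz scores plus one assignment score and one
    peer-review score, each on a 0-100 scale.
    """
    if len(quiz_scores) != 5:
        raise ValueError("expected five quiz scores, one per unit")
    # Weights: 10% per quiz, 35% assignment, 15% peer review.
    return (10 * sum(quiz_scores) + 35 * assignment + 15 * peer_review) / 100


def outcome(total):
    """Map a total grade to a result, treating 'above' as strictly greater."""
    if total > 90:
        return "completed with distinction"
    if total > 70:
        return "completed"
    return "not completed"


total = course_grade([80, 90, 70, 100, 60], assignment=80, peer_review=100)
print(total, outcome(total))
```

For example, quiz scores of 80, 90, 70, 100 and 60 contribute 40 points, an assignment score of 80 contributes 28, and a full peer-review score contributes 15, for a total of 83%.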