
ESRA 2023 Glance Program


All time references are in CEST

Really Random? Exploring Bias and Quality in Random Sampling

Session Organisers: Dr Lydia Repke (GESIS - Leibniz Institute for the Social Sciences)
Dr Barbara Felderer (GESIS - Leibniz Institute for the Social Sciences)
Time: Tuesday 18 July, 09:00 - 10:30
Room:

Random sampling is considered the gold standard in survey research for drawing population inferences. Extensive research has gone into developing sampling strategies that ensure a fully random sample, in which the selection probability of each individual in the population is known. Even so, random sampling methods can have unintended side effects that may bias survey estimates.

For example, the frequently used random route procedure has been shown not to consistently produce equal selection probabilities across households. Similarly, the last/first birthday method for sampling individuals within a household often yields a non-uniform distribution of birth months at the sample level, which can skew survey estimates if birth month is correlated with the variables of interest.
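
As a rough illustration of how birthday-based selection can skew birth months, the following minimal Python sketch (all parameters invented) simulates next-birthday selection within households of varying size:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy simulation, all parameters invented: households of 1-4 adults with
# uniformly distributed birth months; the member whose birthday comes soonest
# after a fixed fieldwork date (1 July here) is selected.
n_households = 50_000
sizes = rng.integers(1, 5, n_households)

selected = []
for size in sizes:
    months = rng.integers(1, 13, size)   # members' birth months
    wait = (months - 7) % 12             # months until next birthday from July
    selected.append(months[np.argmin(wait)])

shares = np.bincount(selected, minlength=13)[1:] / n_households
print({m: round(float(s), 3) for m, s in zip(range(1, 13), shares)})
# Months just after the fieldwork date are overrepresented in multi-person
# households, so selected respondents' birth months are no longer uniform.
```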

Random sampling is often carried out by field agencies, leaving survey organizations with little control over the sampling process. Even with strictly random sampling designs, implementation errors may occur, resulting in samples that are not fully random. Currently, there are no well-established methods to assess the quality of (random) samples. However, if the target population is known, indicators can be derived by comparing the net sample to the general population using information such as the distribution of surnames and gender.
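
One way such an indicator could look in practice is sketched below: a simple chi-square comparison of the net sample's gender distribution against known population shares (all figures invented):

```python
import numpy as np
from scipy.stats import chisquare

# Illustrative quality indicator, figures are made up: compare the net
# sample's gender distribution with known population shares.
population_shares = np.array([0.49, 0.51])   # e.g., male / female in the frame
sample_counts = np.array([620, 580])         # observed counts in the net sample

expected = population_shares * sample_counts.sum()
stat, p = chisquare(sample_counts, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
# A very small p-value flags a sample composition that a truly random draw
# would rarely produce; surname distributions can be checked the same way,
# just with more categories.
```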

We welcome papers on topics including, but not limited to:
1. The relationship between randomization mechanisms and variables of interest,
2. The detection of failed randomization, and
3. The development of quality indicators for successful random sampling.

Keywords: random sampling, sampling bias, randomization errors, quality indicators

Papers

Exploring the Impact of Response Rate on Study Findings: Evidence from Georgia

Mrs Ani Lordkipanidze (GORBI)
Mr Erekle Antadze (GORBI) - Presenting Author

Response rates are widely recognized in the literature as a critical factor influencing the accuracy and validity of study findings. High response rates are generally associated with more reliable results, while low response rates can introduce bias, especially if non-respondents differ significantly from respondents. This paper explores the impact of response rate on study outcomes using data from multiple nationwide Computer-Assisted Personal Interviewing (CAPI) surveys conducted in Georgia. These studies were carried out over the course of a year and faced varying response rates, particularly in the context of national unrest and protests, which led to a significant drop in response rates. Despite these fluctuations, our analysis revealed no significant differences in demographic determinants (such as income, education, and employment status) either at the national level or in the capital city. To further investigate, we conducted multiple simulations using existing metadata, manipulating response rates to assess their effect on study outcomes. The results consistently showed no major differences across simulations with varying response rates. Based on the Georgian example, these findings suggest that response rate may not be a major determinant of study results as long as the sample remains representative and probability-based. Our study provides new insights into the resilience of survey findings to changes in response rate, particularly in volatile contexts, and suggests that methodological rigor in sample design can mitigate potential biases introduced by lower response rates.
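
A minimal sketch of this kind of simulation is shown below; the variable and the figures are invented stand-ins, not the GORBI data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative sketch, assumptions throughout: treat a completed probability
# sample as the frame and thin it at random to mimic lower response rates.
frame = pd.DataFrame({"employed": rng.binomial(1, 0.45, 5_000)})
full_estimate = frame["employed"].mean()

for rate in (0.9, 0.7, 0.5, 0.3):
    sims = [frame.sample(frac=rate, random_state=s)["employed"].mean()
            for s in range(200)]
    print(f"response rate {rate:.0%}: {np.mean(sims):.3f} "
          f"vs full sample {full_estimate:.3f}")
# When dropout is unrelated to the outcome, the estimates barely move across
# response rates; bias appears only if nonresponse correlates with the outcome.
```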


Horoscoping and Sampling: Preregistered Exploration of the Impact of Birth Month on Research Outcomes via the 'Whose Birthday Is Next' Sampling Strategy

Dr Lydia Repke (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Professor Joris Mulder (Tilburg University)
Professor Daniel Oberski (Utrecht University)

A large corpus of substantive research across various domains has documented a birth month effect, wherein individuals born in specific months exhibit distinct outcomes or experiences compared to those born in other months. This phenomenon permeates areas such as health, sports, socioeconomic status, and behavioral traits, to name a few. This contrasts with the common practice in some large-scale survey projects of selecting respondents “whose birthday is next” or “whose birthday was last,” which assumes birth month is uncorrelated with outcome variables. However, if a birth month effect exists, this method could introduce bias, especially when comparing groups that systematically differ in household size (e.g., non-Western immigrants vs. majority populations in Western Europe).

In this study, we first explore and formalize the theoretical relationship between birth month effects and potential biases in the next-birthday sampling design. Subsequently, we conduct a preregistered empirical analysis utilizing the LISS panel (Longitudinal Internet studies for the Social Sciences), a probability-based online panel of Dutch households in which all individuals aged 16 and above within a household participate. Through simulations across 12 different fieldwork periods (i.e., months), we assess the extent of bias that could arise if the LISS panel had employed the next-birthday sampling method instead. We examine 35 target variables, including personality traits, health outcomes, and socioeconomic status, to evaluate the potential impact on research outcomes.

Our findings show no strong evidence of a birth month effect in this study for the variables considered. This suggests that, at least for the Netherlands, the typical social science survey questions and their respective levels of measurement are unlikely to be biased by a birthday-based sampling procedure or birth month effect.
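
The following toy sketch (invented data, not the LISS panel) illustrates the logic of such a simulation: a small birth-month effect is injected deliberately so the bias of the next-birthday design becomes visible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy bias check: within simulated households, compare a target variable's
# mean under random within-household selection and next-birthday selection
# across 12 fieldwork months.
sizes = rng.integers(1, 4, 10_000)   # household sizes 1-3

for fieldwork_month in range(1, 13):
    random_pick, next_bday = [], []
    for size in sizes:
        months = rng.integers(1, 13, size)
        y = 0.1 * months + rng.normal(0, 1, size)   # toy birth-month effect
        random_pick.append(y[rng.integers(size)])
        next_bday.append(y[np.argmin((months - fieldwork_month) % 12)])
    bias = np.mean(next_bday) - np.mean(random_pick)
    print(f"fieldwork month {fieldwork_month:2d}: bias {bias:+.3f}")
# Setting the birth-month coefficient to 0.0 makes the bias terms vanish,
# mirroring the paper's null result for the Netherlands.
```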


Assessing Mode Effects in Survey Design: A Case Study in Niger

Ms Ceren Huryol (Food and Agriculture Organization of the United Nations (FAO)) - Presenting Author
Ms Ines Lecland (Food and Agriculture Organization of the United Nations (FAO))

This study compares the performance of Computer-Assisted Telephone Interviews (CATI) and Computer-Assisted Personal Interviews (CAPI) in household surveys, focusing on their application within the Food and Agriculture Organization’s (FAO) Data in Emergencies (DIEM) Monitoring system. Leveraging DIEM’s large dataset combined with a case study in Niger, the research evaluates the effectiveness of sample designs and the consistency of data collected across both modalities. With CAPI considered the “gold standard,” CATI data is benchmarked against CAPI results to identify the strengths, weaknesses, and contextual appropriateness of each modality.

The study aims to (1) assess the parameters guiding sample design to enhance comparability, and (2) test the non-coverage bias inherent in the CATI methodology and potential mode effects. A combination of literature review, desk analysis of DIEM datasets, and a Niger case study informs the research. For the case study, two simultaneous surveys were conducted using DIEM-recommended parameters across four strata in Niger, with CAPI (n=700) and CATI (n=600) samples designed for direct comparison.

By comparing CATI and CAPI in terms of representativeness, coverage bias, and data quality, the research provides insights to optimize survey design and implementation. Although centered on DIEM data and guiding principles, which are themselves based on standard approaches, the findings offer broader implications for researchers and practitioners seeking to balance resource constraints, logistical challenges, and data quality across survey modalities in conflict-driven contexts.

This research seeks to contribute to the growing body of literature on survey methodology, providing evidence-based recommendations to improve data reliability and validity. It highlights practical considerations for questionnaire development, sample selection, and data collection strategies, both within the DIEM framework and in other diverse survey contexts.
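
A minimal sketch of one such benchmark comparison is given below; the proportions are invented, and only the sample sizes follow the case study:

```python
import numpy as np
from scipy.stats import norm

# Illustrative mode-effect check: a two-sample z-test comparing a proportion
# measured in both modes, with CAPI treated as the benchmark.
p_capi, n_capi = 0.42, 700
p_cati, n_cati = 0.47, 600

pooled = (p_capi * n_capi + p_cati * n_cati) / (n_capi + n_cati)
se = np.sqrt(pooled * (1 - pooled) * (1 / n_capi + 1 / n_cati))
z = (p_cati - p_capi) / se
print(f"z = {z:.2f}, two-sided p = {2 * norm.sf(abs(z)):.3f}")
# A significant gap on an item that should not depend on interview mode points
# to a mode effect or to coverage differences between the CATI and CAPI frames.
```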


Could some of the discrepancies in fertility estimates be due to differences in sampling implementation?

Professor Ismet Koc (Hacettepe University Institute of Population Studies) - Presenting Author
Dr Melike Sarac (Hacettepe University Institute of Population Studies)

Some of the discrepancies in fertility estimates may be due to differences in sample composition and implementation. If women with high fertility are underrepresented or overrepresented in a demographic survey by chance, we expect the fertility estimated in that survey to fall below or above the fertility in another survey for all time periods. The estimate of recent fertility will thus be underestimated or overestimated. This issue can be checked by retrospectively estimating the composition of the sample for some socio-demographic characteristics in a given age group. In this study, the proportions of women aged 15-34 with at least six years of education in the 15 years preceding the survey were computed using the Demographic and Health Surveys conducted in Turkey between 1993 and 2018 at 5-year intervals. As with total fertility rates, the retrospective estimates of educational levels should match across surveys for the same periods. Results show that in the second survey (TDHS-1998), the percentage of women with six years of education is much higher than in the third (TDHS-2003) and fourth (TDHS-2008) surveys; this difference helps to explain the much lower fertility in the TDHS-1998 compared with the two subsequent surveys for the same periods. Furthermore, the fifth survey (TDHS-2013) also included a larger proportion of educated women than the following survey (TDHS-2018); this confirms the role of discrepancies in sample composition in the lower fertility reported for the same period in the TDHS-2013. Overall, these results indicate that sample composition is a potential determinant of the fertility rates estimated in four of the six surveys in Turkey, suggesting that sampling implementation introduces a small to moderate bias in fertility estimates for the same periods over time.
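
A simplified sketch of this retrospective check might look as follows; the file and column names are assumptions, and age at interview stands in for the full 15-year retrospective reconstruction:

```python
import pandas as pd

# Illustrative sketch, assumptions throughout: pooled women's files from the
# six Turkish DHS rounds, one row per woman.
df = pd.read_csv("tdhs_women_pooled.csv")

df["age"] = df["survey_year"] - df["birth_year"]
women = df[df["age"].between(15, 34)]

# Share of women aged 15-34 with at least six years of education, by survey.
shares = (women["years_of_education"] >= 6).groupby(women["survey_year"]).mean()
print(shares.round(3))
# Comparable sampling should yield similar shares for overlapping reference
# periods; a survey with a visibly higher share of educated women (as reported
# for TDHS-1998 and TDHS-2013) will tend to show lower fertility for the same
# periods.
```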