All time references are in CEST
The Implementation of PIAAC Cycle 2: Challenges and Lessons Learnt

Session Organisers: Ms Anouk Zabal (GESIS – Leibniz Institute for the Social Sciences), Ms Silke Martin (GESIS – Leibniz Institute for the Social Sciences)
Time: Tuesday 18 July, 09:00 - 10:30
Room:
PIAAC, the Programme for the International Assessment of Adult Competencies, is a multi-cycle international survey that aims to measure key skills in the adult populations of the participating countries. PIAAC Cycle 2 is designed to obtain internationally comparable data of the highest possible quality and to be comparable to the first cycle of PIAAC. Thirty-one countries participated in the second cycle of PIAAC and collected data within an elaborate quality assurance and control framework, adhering to gold standards in survey methodology.
The second cycle of PIAAC introduced innovative elements in various areas, such as the instrumentation (e.g., the inclusion of social-emotional skills in the background questionnaire and the introduction of the new assessment domain adaptive problem solving in the cognitive assessment), the mode of the cognitive assessment (tablet-based), and survey operations (e.g., the introduction of a dashboard as a tool to monitor fieldwork, adaptive survey design). The PIAAC survey posed many implementation challenges stemming from its unique design and specifications, with the additional challenge of balancing innovation against trend measurement. Implementing PIAAC within national settings often required new national solutions to meet international standards. In addition, the COVID-19 pandemic affected the data collection of both the field study, carried out in 2021, and the main survey, carried out in 2022/2023. Many countries faced difficulties during fieldwork, such as declining response rates, and had to rise to the challenge of adapting to changing situations and constraints. This session will focus on methodological issues and aims to contribute to extending the survey-methodological body of knowledge and to offer insights pertinent to other international surveys and large-scale assessments.
Keywords: PIAAC, international comparability, survey operations, data quality, nonresponse
Dr Silke Schneider (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
The production and certification of skills is a primary task of educational systems. One aim of the OECD’s Programme for the International Assessment of Adult Competencies (PIAAC) is to evaluate how effectively and equitably educational systems produce basic cognitive skills. As the most important predictor of adult skills, educational attainment is a crucial background variable in PIAAC. However, because the organization of education varies across countries, it is also the most challenging variable to measure internationally.
This paper presents how PIAAC Cycle 2 measured and harmonized educational attainment across countries, and the related opportunities and challenges. While national adaptation and international harmonization were already in place in PIAAC Cycle 1, the international PIAAC consortium refined these procedures for Cycle 2. The refinements included a formal education consultation process between countries and the international consortium, following the practice established for the European Social Survey. The paper also explains the target international coding scheme, which builds upon but extends the International Standard Classification of Education (ISCED) 2011, and the various derived variables made available for analysis, including variables for comparisons across PIAAC cycles.
Although the refined procedures for Cycle 2 likely improved the information content and comparability of education measures in PIAAC, the process was highly challenging and had limitations. Firstly, national adaptations were very complex in some countries, requiring disproportionate time and effort for consultation, data processing, and quality control. Secondly, countries' implementations of ISCED do not always lead to comparable measures. Thirdly, and most importantly, data protection requirements and a lack of research data management procedures at the OECD led to the suppression of different variables in different countries, limiting the potential of the data for scientific research. At the same time, only some countries offer scientific use files.
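The harmonization step described above can be pictured as a lookup from a country-specific coding scheme into ISCED 2011 levels. The sketch below is purely illustrative: the national categories are invented, and real mappings are country-specific and agreed in the consultation process.

```python
# Illustrative sketch of national-to-ISCED harmonization. The national
# attainment codes below are hypothetical; actual mappings differ by country.
from typing import Optional

# Hypothetical national coding scheme -> ISCED 2011 level (0-8)
NATIONAL_TO_ISCED = {
    "no_formal_education": 0,
    "primary_school": 1,
    "lower_secondary_certificate": 2,
    "vocational_upper_secondary": 3,
    "academic_upper_secondary": 3,
    "post_secondary_vocational": 4,
    "bachelor": 6,
    "master": 7,
    "doctorate": 8,
}

def to_isced(national_code: str) -> Optional[int]:
    """Map a national attainment code to its ISCED 2011 level.

    Returns None for codes that cannot be harmonized; in practice such
    codes would be flagged for review in the consultation process.
    """
    return NATIONAL_TO_ISCED.get(national_code)

print(to_isced("vocational_upper_secondary"))  # 3
```

Note that two distinct national categories can map to the same ISCED level (both upper-secondary tracks map to level 3), which is one reason derived variables beyond the bare level are needed for analysis.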
Dr Britta Gauly (GESIS) - Presenting Author
Mrs Silke Martin (GESIS)
Mrs Anouk Zabal (GESIS)
Dr Sanja Kapidzic (GESIS)
Dr Natascha Massing (GESIS)
Declining response rates affect the composition of population survey samples. Weighting is one approach to address potential nonresponse bias. A limitation is that nonresponse adjustment requires auxiliary variables that are available for both respondents and nonrespondents and are associated with participation and the research outcome. The second cycle of PIAAC, the Programme for the International Assessment of Adult Competencies, included an interviewer observation module intended to yield (comparable) information on respondents and nonrespondents as an additional data source. The German PIAAC survey implemented a registry sample with limited but high-quality registry-based information available for both respondents and nonrespondents. Exploring the possibility of additional appropriate auxiliary variables was therefore particularly important.

To evaluate the quality of the interviewer observations for weighting, a reliability study was conducted: for a subsample of the PIAAC sample, another interviewer was appointed to carry out a second set of observations. We will investigate the predictive power of the interviewer observation indicators for participation and skills – the latter being the central research outcome of PIAAC – and evaluate whether these variables qualify as candidate variables for (nonresponse) weighting. Furthermore, we present results from the reliability study to assess the accuracy of the interviewer observation indicators. Interviewer observations were collected before the first contact with the target person and are somewhat subjective in nature. However, some of the observations focus on more objective features, such as house type or house condition. PIAAC Germany included an additional, very coarse indicator of the target person’s education, because cognitive skills are closely related to the respondent’s educational attainment. This observation is, however, difficult to assess objectively before having contact with the target person and is hence more prone to variation. In our analyses, we will contrast the different types of observation data and evaluate their quality and reliability.
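One standard way to quantify agreement between the first and second interviewer in such a reliability study is Cohen's kappa, which corrects raw agreement on a categorical observation for chance agreement. The sketch below is a minimal illustration; the ratings are invented, and the actual analysis in the study may use different reliability measures.

```python
# Minimal sketch: inter-interviewer agreement for a categorical interviewer
# observation (e.g., house condition) via Cohen's kappa. Data are invented.
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters observing the same cases."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: share of cases where both raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of p_a(cat) * p_b(cat)
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical first and second interviewer observations of house condition
first  = ["good", "good", "poor", "good", "poor", "good"]
second = ["good", "poor", "poor", "good", "poor", "good"]
print(round(cohen_kappa(first, second), 3))  # 0.667
```

A kappa near 1 would indicate that the observation is assessed consistently across interviewers; subjective items such as the coarse education guess would be expected to show lower agreement than objective features like house type.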
Dr Michal Sitek (Educational Research Institute) - Presenting Author
Professor Artur Pokropek (Educational Research Institute)
Poland's large decline in PIAAC results between Cycle 1 (2012/13) and Cycle 2 (2022/23) is a puzzle. It was among the largest of all participating countries and has raised concerns about data quality in large-scale assessments. While the OECD flagged problematic interviewer data in the international analyses, the full extent of quality issues—ranging from rapid guessing and item omissions to inconsistent interviewer performance—remains underexplored. This study investigates the role of respondent engagement, measured through detailed test-taking behavior, and interviewer effects in shaping Poland's PIAAC outcomes. Using paradata (timing and action data), alongside test characteristics (item response theory parameters and linking items), we analyze variations in rapid guessing, item omissions, and response times. Building on established frameworks (e.g., Goldhammer et al., 2016; Kroehne & Goldhammer, 2018; Pokropek, 2016; Wise & Kong, 2005; Ulitzsch et al., 2019), we demonstrate that motivational factors, such as low engagement and rapid guessing, explain a portion of the decline that was not accommodated by the scaling methods used. We also address interviewer effects as a potential contributing factor. By applying advanced classification and modeling techniques, we explore how test-taking behavior may have distorted the measurement of true skill levels and how this behavior varies among PIAAC respondents.
This study contributes to the broader discourse on survey methodology by highlighting the importance of paradata in understanding respondent behavior and data quality. Our findings underscore the critical need to account for motivational factors in the development, scaling, and analysis of large-scale assessment data, offering valuable insights for improving the design and interpretation of future large-scale assessments.
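A common building block for the rapid-guessing analyses described above is a response-time effort index in the spirit of Wise and Kong (2005): a response counts as a rapid guess when its response time falls below an item-specific threshold, and a respondent's effort is the share of items answered with solution behavior. The sketch below is illustrative only; the thresholds and timing data are invented, and the study's actual classification techniques are more elaborate.

```python
# Hedged sketch of a response-time effort (RTE) index: the proportion of
# items a respondent answered with solution behavior (RT >= threshold).
# Thresholds and response times below are invented for illustration.

def response_time_effort(response_times, thresholds):
    """Share of items answered with solution behavior rather than rapid guessing."""
    assert len(response_times) == len(thresholds)
    solution = sum(rt >= th for rt, th in zip(response_times, thresholds))
    return solution / len(response_times)

# Hypothetical response times (seconds) and per-item rapid-guessing thresholds
times      = [42.0, 3.1, 55.8, 2.4, 30.5]
thresholds = [5.0, 5.0, 8.0, 5.0, 8.0]
print(response_time_effort(times, thresholds))  # 0.6
```

Aggregating such an index over respondents and linking items is one way to quantify how much of an observed score decline could reflect disengagement rather than a true change in skills.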
Mrs Silke Martin (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Mrs Anouk Zabal (GESIS - Leibniz Institute for the Social Sciences)
The Programme for the International Assessment of Adult Competencies, PIAAC, is a multi-cycle international comparative assessment of literacy, numeracy, and problem solving. This large-scale survey comprises an interviewer-administered background questionnaire and a self-administered assessment conducted in the official country language(s). PIAAC collects data from the 16- to 65-year-old population residing in the participating country, regardless of citizenship, status, or knowledge of the official country language(s). The aim is to obtain participation from as many selected target persons as possible in order to draw reliable conclusions. Target persons with language barriers are often unable to participate because their language skills are insufficient, and they may also be reluctant to interact with interviewers. The participation of these persons is nevertheless crucial for the valid measurement of skills in the population. In PIAAC, the best option is to have an interpreter assist respondents with language barriers when completing the background questionnaire. If this is not possible, rather than accepting a refusal, interviewers can attempt to conduct a doorstep interview. This interview consists of six questions that respondents can answer in their own language at the doorstep of their homes. It provides minimal key information on characteristics of these doorstep respondents (e.g., age, gender, employment status) and, importantly, sufficient information for the estimation of proficiencies. The doorstep interview was developed for PIAAC Cycle 2 and is an innovative approach to reducing literacy-related nonresponse. Despite the benefits of the doorstep interview, we encountered some challenges in implementing this non-standard instrument. The present contribution will illustrate and summarize experiences from implementing the doorstep interview in PIAAC Germany and reflect on issues of international comparability.
Dr Jungmin Lee (Korea Research Institute for Vocational Education and Training (KRIVET)) - Presenting Author
Test engagement is a critical factor influencing the reliability and validity of large-scale assessments. Utilizing process data from the Programme for the International Assessment of Adult Competencies (PIAAC), this study investigates the relationship between response time and response accuracy to evaluate test engagement and its impact on assessment outcomes. By applying a dual-process model, we categorize responses into two patterns: "fast but inaccurate" and "deliberate and accurate," enabling a nuanced understanding of engagement behaviors during testing.
The study further explores how deviations in response time—both excessively short and prolonged—correlate with response accuracy across varying item difficulties. Using descriptive and inferential statistics, we identify patterns of engagement based on response time distributions and associate these with demographic and contextual factors, such as age, educational background, and country of origin.
To enhance the interpretability of engagement metrics, we apply hierarchical linear modeling (HLM) to compare engagement patterns across countries, considering national educational systems and assessment cultures. Furthermore, we simulate low-engagement scenarios by manipulating response time thresholds to measure their influence on test outcomes, offering insights into data-cleaning strategies to improve assessment reliability.
Previous findings suggest that moderate response times are associated with higher accuracy, while extreme response times—both short and long—indicate lower engagement. Significant variations in engagement patterns across demographic groups and countries underscore the importance of accounting for contextual factors in interpreting assessment data.
This research advances survey methodologies by providing evidence-based strategies to identify and mitigate low engagement. The results have implications for improving the design, administration, and interpretation of large-scale assessments like PIAAC, ensuring that findings more accurately reflect the actual competencies of respondents.
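The dual-pattern classification described above can be sketched as a cross of response speed and accuracy. The example below is a minimal illustration under assumed inputs: the rapid-response threshold and the response records are invented, and the study's actual categorization uses item-specific criteria.

```python
# Minimal sketch of a speed-by-accuracy response classification:
# "fast but inaccurate" vs. "deliberate and accurate". The threshold and
# the (response time, correct) records are invented for illustration.

def classify(rt_seconds, correct, threshold=5.0):
    """Label a single response by speed (vs. a rapid-response threshold) and accuracy."""
    fast = rt_seconds < threshold
    if fast and not correct:
        return "fast-inaccurate"
    if not fast and correct:
        return "deliberate-accurate"
    return "other"

responses = [(2.3, False), (41.0, True), (3.9, False), (60.2, True), (4.5, True)]
labels = [classify(rt, ok) for rt, ok in responses]
print(labels.count("fast-inaccurate"), labels.count("deliberate-accurate"))  # 2 2
```

Filtering or down-weighting the "fast-inaccurate" pattern under different thresholds is one way to run the low-engagement simulations mentioned above and to gauge their effect on aggregate scores.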
Mrs Elisabeth Falnes-Dalheim (Statistics Norway) - Presenting Author
The Gift Card Experiment
Statistics Norway maintains registry information on the highest completed education of the citizens of Norway. The database of higher education in Norway contains data from the universities and all other higher education institutions in the country. The Statistics Act grants Statistics Norway the right to use this database, and therefore the educational level of almost all respondents in the Norwegian PIAAC sample is known to Statistics Norway.
Like many other countries, Norway faced difficulties during the fieldwork of PIAAC Cycle 2, such as declining response rates among lower-educated respondents. In addition, data collection was too slow. To meet these challenges, an experiment was designed.

The experiment aimed to speed up data collection and raise the response rate among lower-educated respondents, ensuring high quality in the collected data and better representation of all educational groups in Norway.
During fieldwork, the response rates among respondents with high (completed master’s degree) and very high education (PhD) were higher than among respondents with education below master’s level (bachelor’s degree and below). The goal, in particular, was to achieve a higher response rate for this latter group and to reduce bias in the PIAAC data. The experiment shows how raising the amount of the gift card affected willingness to take part in the survey. Prior to the experiment, every participating respondent was offered a gift card of 500 Norwegian kroner (NOK). In the experiment, some of the refusals and non-contacts were offered higher amounts (800 NOK or 1100 NOK), while the control group was still offered the original 500 NOK.
The experiment produced several findings that offer useful knowledge to other data collection teams.
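An incentive experiment of this kind is typically evaluated by comparing response rates across arms, for instance with a two-proportion z statistic. The sketch below uses invented placeholder counts, not the experiment's actual results, and the arm sizes are assumptions for illustration.

```python
# Hedged sketch of comparing incentive arms: response rates per gift-card
# amount plus a pooled two-proportion z statistic. All counts are invented
# placeholders, not results from the Norwegian experiment.
from math import sqrt

def response_rate(completed, issued):
    """Completed interviews as a share of issued cases."""
    return completed / issued

def two_proportion_z(c1, n1, c2, n2):
    """z statistic for the difference between two response rates (pooled SE)."""
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: control arm (500 NOK) vs. boosted arm (1100 NOK)
control_completed, control_issued = 90, 300
boosted_completed, boosted_issued = 120, 300
diff = (response_rate(boosted_completed, boosted_issued)
        - response_rate(control_completed, control_issued))
print(round(diff, 2))  # 0.1
```

With a registry sample, the same comparison can be broken down by registered educational level, which is what makes it possible to check whether the higher amounts specifically lift participation among lower-educated target persons.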
Mr Benjamin Schneider (Westat) - Presenting Author
Mr Tom Krenzke (Westat)
Dr Oksana Balabay (Westat)
Ms Wendy Van de Kerckhove (Westat)
Mr John Lopdell (Westat)
The Programme for the International Assessment of Adult Competencies (PIAAC) facilitates comparisons of published statistics among dozens of OECD countries that vary in the patterns of nonresponse they face as well as in the resources available to mitigate nonresponse bias through data collection strategies and weighting adjustments. The OECD helps data users assess the quality of published estimates by requiring that each country’s data undergo a standardized set of nonresponse bias analyses (NRBAs) that identify the potential for nonresponse bias and assess the effectiveness of the country’s efforts to mitigate it. An important but challenging objective of the NRBA process is to produce brief, accurate quality summaries that synthesize the findings from multiple analyses across many countries, while providing straightforward guidance that helps data users determine when countries’ PIAAC estimates are fit for the users’ purposes.
We provide an overview of the key methods used in assessing nonresponse bias for PIAAC Cycle 2, including standard approaches such as level-of-effort analyses and comparisons of estimates before and after calibration to alternative sources of control totals. We also detail a new summary statistic introduced in PIAAC Cycle 2 to concisely summarize the combined effects of nonresponse and nonresponse weighting adjustments, termed the “explained variation in outcomes” (EVO) statistic. The EVO statistic is compared to related summary measures such as the R-indicator and the Fraction of Missing Information statistic. We discuss the role of these various NRBA methods in providing overall assessments of nonresponse bias and of the fitness for purpose of estimates produced in PIAAC.
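For context, the R-indicator mentioned above summarizes the variability of estimated response propensities: R = 1 - 2·S(p̂), where S(p̂) is the standard deviation of the propensities, so R equals 1 when everyone is equally likely to respond. The sketch below is a minimal illustration; the propensity values are invented, and in practice the propensities come from a fitted response model. (The EVO statistic is new in Cycle 2 and its formula is not reproduced here.)

```python
# Minimal sketch of the R-indicator (Schouten et al.): R = 1 - 2 * S(p_hat),
# where S(p_hat) is the standard deviation of estimated response propensities.
# Propensity values below are invented for illustration; in practice they
# would be predictions from a response propensity model.
from statistics import pstdev

def r_indicator(propensities):
    """Representativeness indicator: 1 when all response propensities are equal."""
    return 1 - 2 * pstdev(propensities)

print(r_indicator([0.5, 0.5, 0.5, 0.5]))       # 1.0  (perfectly balanced response)
print(round(r_indicator([0.2, 0.8]), 2))       # 0.4  (highly uneven response)
```

Lower values signal that response is concentrated in particular subgroups, which is exactly the situation in which nonresponse weighting adjustments have the most work to do.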