All time references are in CEST
Measurement and coding of job-related information: Occupation, industry, and skill

Session Organisers: Dr Malte Schierholz (LMU Munich), Ms Olga Kononykhina (LMU Munich), Dr Calvin Ge (TNO)
Time: Tuesday 18 July, 09:00 - 10:30
Room:
Occupation coding refers to coding a respondent’s text answer (or the interviewer’s transcription of the text answer) about the respondent’s job into one of many hundreds of occupation codes. Relatedly, many surveys gather data about the person’s industry or their skills in similar ways. We welcome any papers on how to best measure jobs and job-related information, including, but not limited to:
- measurement of occupations, industries, and skill (e.g., mode, question design, …)
- handling of different occupational and industry classifications (e.g., ISIC, NACE, NAICS, ISCO, ESCO and national classifications)
- problems of coding (e.g., costs, data quality, …)
- techniques for coding (e.g., automatic coding, computer-assisted coding, manual coding, interview coding)
- computer algorithms for coding (e.g., machine learning, LLMs, rule-based, …)
- cross-national and longitudinal issues
- measurement of derived variables (e.g., ISEI, ESeC, SIOPS, job-exposure matrices, …)
- other methodological aspects related to the measurement and coding of job-related information
Keywords: measurement, coding, occupation, industry, skill, long-list questions
Dr Dimitar Minovski (Ericsson AB) - Presenting Author
Miss Zoe Ansaldi (Ericsson AB)
Mr Alexander Hall Lanerfeldt (Ericsson AB)
Mr Gokul Panneerselvam (Ericsson AB)
Traditional user research often begins with building user personas to guide recruitment and research design. However, in organizational settings, particularly when evaluating policies such as in-office attendance, employee personas are often not well defined at the outset. This study proposes a method for constructing employee user personas by designing complementary background and behavioral questions tailored to the "office first" policy, which, in our case, allows up to two days of remote work per week.
The background questions were designed to capture and categorize employees' diverse demographics and perspectives regarding in-office attendance, while the behavioral questions complement the demographics by focusing on experiences, preferences, barriers, and challenges. A total of 7 background and 7 behavioral questions were included in the survey. This approach enabled the categorization of employees into distinct personas without pre-imposed biases or pre-selection criteria, as recruitment was randomized across a global workforce of over 14,000 employees at Ericsson.
The survey results revealed an average satisfaction score of 3.08 out of 5 regarding the "office first" policy. Besides a thorough evaluation of the "office first" policy, the primary contribution of the study is the post-hoc creation of 7 employee user personas, describing in-depth differentiation factors across workforce groups and cultures in the context of the policy. Additionally, the personas highlight the drivers and barriers influencing in-office attendance and provide actionable insights for tailoring workplace improvements, communication strategies, and support systems to better meet employee needs.
By leveraging this combined methodology, we demonstrate a framework for building post-hoc user personas using survey results as a foundation for data-driven organizational decision-making. The proposed framework describes the employee user personas and offers a replicable approach for organizations aiming to enhance employee satisfaction and engagement while adapting to an evolving workplace.
Ms Britta Maskow (Chemnitz University of Technology) - Presenting Author
Professor Jochen Mayerl (Chemnitz University of Technology)
A quantitative and qualitative change in the relationship between companies and their employees in new forms of work has been discussed for many years (Pongratz and Voß 2003). The resulting shift of the transformation problem to employees has consequences for employment and may also lead to a change in the general constitution of labour capacity in our society. Subjectivisation of Labour is a social process in which the 'whole person' is or should be included in companies' rationalisation strategies in order to gain extended access to individual competences (working definition, cf. Minssen, 2012).
Most measurement instruments for this construct have been developed qualitatively. Fritz et al. developed the first quantitative Workforce Entrepreneur Scale (16 items) based on Pongratz & Voß (2003), Nievergelt (2004), and Schmitz & Schwarzer (1999). Our research design includes a mixed mode sample (online and postal survey, N = 788) and an online access panel survey (N = 1500) in Germany. The presentation discusses the results of a confirmatory factor analysis with randomly split data, which cross-validates a new Subjectivisation of Labour Scale. To test the scale in different sub-populations, we conducted several multi-group analyses based on industry classifications (NACE), occupational position, survey mode, and demographics using the online access panel survey.
Professor Marcin Kocór (Jagiellonian University) - Presenting Author
Professor Barbara Worek (Jagiellonian University)
In the face of implementing the Green Deal policy, labor market research increasingly focuses on green skills. However, their definition and measurement pose significant challenges. In the latest edition of the Human Capital Study survey, the need arose not only to operationalize the measurement of these skills but also to highlight mismatches in this area.
During the presentation, difficulties related to defining green skills and designing questionnaire items for employers and employees will be discussed. The research results will also be presented, allowing for an assessment of the validity of the proposed approach. The conclusions from the presentation may contribute to a better understanding and more effective measurement of green competencies in the context of the labor market and skills measurement.
Mrs Olga Kononykhina (LMU Munich) - Presenting Author
Dr Malte Schierholz (LMU Munich)
Occupational coding is a critical funnel between open-ended job descriptions and the statistical frameworks that shape employment research and policy. Automatic coding tools, whether rule-based or machine learning (ML) based, have streamlined the process and demonstrate promising results. Yet ML approaches typically require extensive, high-quality training data that exceed what a typical national survey can provide and that are subject to data protection constraints.
This study asks whether mainstream large language models (LLMs) can serve as a viable alternative, largely bypassing the need for exhaustive training data and requiring only some coding skills and API access. We created embeddings for standardized German (KldB) job descriptions, then used respondents’ own words (e.g., “doctor”) from a representative German survey to generate job embeddings. Cosine similarity was applied to find the five most likely occupational codes for each response.
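As a rough illustration of this embed-and-rank pipeline (not the authors' actual implementation), the sketch below scores catalogue entries by cosine similarity to a free-text answer. A bag-of-character-trigrams vector stands in for the LLM embedding endpoint, and the three KldB-style codes are made-up examples.

```python
import numpy as np

def embed(text, dim=1024):
    """Toy stand-in for an LLM embedding API: normalized
    bag-of-character-trigram counts hashed into a fixed-size vector."""
    v = np.zeros(dim)
    t = f"  {text.lower()}  "
    for i in range(len(t) - 2):
        v[hash(t[i:i + 3]) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# Hypothetical mini-catalogue of code -> standardized job description
# (illustrative entries, not the real KldB catalogue).
CATALOGUE = {
    "81404": "Arzt / Aerztin (doctor, physician)",
    "29302": "Koch / Koechin (cook)",
    "43104": "Softwareentwickler/in (software developer)",
}
CODE_VECS = {code: embed(desc) for code, desc in CATALOGUE.items()}

def top_k(response, k=5):
    """Return the k catalogue codes most similar to the respondent's answer.
    Vectors are unit length, so the dot product equals cosine similarity."""
    q = embed(response)
    ranked = sorted(CODE_VECS, key=lambda c: float(q @ CODE_VECS[c]), reverse=True)
    return ranked[:k]
```

For example, `top_k("doctor")` ranks the physician code first because the answer shares character trigrams with that catalogue entry; a real system would substitute API-provided embeddings for `embed`.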
To assess performance, we compared LLM-based suggestions with those from a German ML occupational coding tool (OccuCoDe), using professional manual coding as our benchmark. Results show that in 55% of the cases, both LLM and OccuCoDe included the correct code among their top five suggestions. However, there was limited overlap: in 60% of the cases, the two tools shared at most two out of their five recommended codes. While OccuCoDe more frequently placed the correct code as the first suggestion, LLM-embeddings suggested the correct occupation in 45% of cases where OccuCoDe did not provide any result. Additionally, LLM performance was sensitive to minor changes in job descriptions (e.g., capitalisation or gendered job titles) and sometimes showed “embedding drift,” raising reproducibility concerns.
Our findings highlight LLMs’ promise as a complement or substitute to other tools for occupational coding in limited training data contexts, while underscoring critical limitations that must be addressed before fully entrusting them with classifying the work we do.
Dr Sebastian Kocar (Institute for Social Science Research, University of Queensland) - Presenting Author
Dr Daniela Peycheva (Institute of Epidemiology and Health Care, University College London)
Dr Matt Brown (Centre for Longitudinal Studies, University College London)
Professor Joseph W. Sakshaug (Institute for Employment Research (IAB), and Professor of Statistics, Ludwig Maximilian University of Munich)
Dr Claire Bhaumik (Ipsos UK)
Professor Lisa Calderwood (Centre for Longitudinal Studies, University College London)
Occupation coding is a critical component of social research, providing essential insights into socio-economic status. Traditionally, occupation data have been collected through interviewer administration using open-ended questions and manual office coding, a method regarded as the "gold standard." However, this approach is resource-intensive and may be less feasible for self-completion surveys. As web surveys become increasingly prevalent, it is vital to explore new approaches to occupation coding.
One new approach is the look-up self-coding approach in which participants (or interviewers) enter keywords which describe their job and select an appropriate code from a presented list. In this paper we evaluate the potential of this approach and compare it with traditional office coding. We use data from an experiment conducted in the 9th wave of the Next Steps longitudinal study in the UK, a web-first mixed mode survey in which participants were asked to both self-code their occupation and provide an open-ended job description, which was then manually coded by two independent office coders.
The study uses two indicators of feasibility and data quality: the look-up coding rate and the agreement between look-up and office coding. We will explore the impact of respondent characteristics and look-up input metrics and will use these findings to propose potential improvements to the look-up approach.
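The two indicators can be computed straightforwardly; the sketch below is an illustrative reconstruction (not the study's code) that treats a missing self-code as a look-up failure and measures agreement only over the self-coded cases.

```python
def lookup_and_agreement(lookup_codes, office_codes):
    """Return (look-up coding rate, agreement rate with office coding).
    `None` marks a respondent who could not self-code their occupation."""
    assert len(lookup_codes) == len(office_codes)
    coded = [(a, b) for a, b in zip(lookup_codes, office_codes) if a is not None]
    lookup_rate = len(coded) / len(lookup_codes)
    agreement = sum(a == b for a, b in coded) / len(coded) if coded else 0.0
    return lookup_rate, agreement
```

With hypothetical codes, `lookup_and_agreement(["2211", None, "5120", "2211"], ["2211", "5120", "9999", "2211"])` gives a look-up rate of 0.75 and agreement of 2/3.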
The findings will be of significant value to survey practitioners wishing to collect information about occupation.
Mr Russell Castañeda (Verian Belgium) - Presenting Author
Mr Nicolas Becuwe (Verian Belgium)
Collecting detailed and accurate information on respondents' occupations and industries is a significant challenge in social research, particularly in cross-national surveys. This data is typically gathered through open-ended questions and later recoded into standard classifications like ISCO-08 and NACE. Coding these responses is costly and time-consuming, requiring well-trained coders, standardised training, detailed guidelines, and verification processes to ensure accuracy. The variability in respondent information and coder interpretation further complicates this task, especially in online surveys without direct interviewer oversight.
To address these challenges, the Synergies for Europe's Research Infrastructures in the Social Sciences (SERISS) developed occupation and industry databases to harmonise data collection. At Verian, we initially adopted these databases and then customised them to suit the unique requirements of our high-quality cross-national online surveys targeting the EU working population. Our presentation will share our experience with this practice, detailing its implementation across all survey stages: questionnaire design, scripting, translation, piloting, fieldwork, and data processing. Like any tool, it has potential improvements such as optimising search functionality, adding broader categories, and ranking results by relevance. This approach sets a new standard for cross-national surveys by improving data accuracy and consistency, while saving time and resources.
Dr Kea Tijdens (WageIndicator Foundation) - Presenting Author
'Industry' is a major concept in many statistics. It identifies the activities of enterprises, the coverage of collective bargaining, the industries in which workers are employed, and so on. Statistical offices have agreed on industry classifications, NACE and ISIC. In open questions in surveys, respondents report answers ranging from highly aggregated (“I work in an office”) to very detailed disaggregated concepts (“I work in a mixed goat and sheep farm”). Such answers challenge the level of detail at which the industry can be defined. To tackle this problem, Belloni and Tijdens (2019) developed occupation-to-industry predictions for measuring ‘industry’ in web surveys, based on the respondents’ occupations. These were selected from a predefined list of ISCO-coded occupations. The prediction uses the 4-digit ISCO code as input and returns a 2-digit NACE code.
The resulting prediction table has been applied in the worldwide WageIndicator Salary Check, used by more than a million web users from many countries. Based on the selection of their occupation, users can explore the salary range applicable to the selected occupation at the 4-digit level, or at aggregated levels if data are not available in such detail. Users are also asked for their industry, whereby the list of predicted 2-digit NACE industries is shown, including an option 'other'. This paper explores to what extent users click 'other', including variation across occupations and across countries.
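The prediction table amounts to a simple lookup from a 4-digit ISCO-08 code to the 2-digit NACE divisions most plausible for that occupation. The mapping below is a hand-picked illustration, not the actual Belloni & Tijdens table.

```python
# Illustrative occupation-to-industry prediction table (ISCO-08 -> NACE Rev. 2).
# Example entries only; the real table covers the full occupation list.
ISCO_TO_NACE = {
    "2211": ["86"],        # Generalist medical practitioners -> human health
    "5120": ["56", "55"],  # Cooks -> food service, accommodation
    "2512": ["62", "58"],  # Software developers -> programming, publishing
}

def industry_options(isco_code):
    """Predicted NACE divisions for an occupation, always ending with 'other'."""
    return ISCO_TO_NACE.get(isco_code, []) + ["other"]
```

Respondents whose occupation has no prediction in the table would see only the 'other' option.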
REF: Belloni, M. & Tijdens, K. (2019). Occupation to industry predictions for measuring industry in surveys. AIAS-HSI Working Paper 5, Deliverable 8.11 SERISS, funded under the European Union’s Horizon 2020 research and innovation programme, GA No. 654221. https://wageindicator.org/documents/publicationslist/wageindicator-org-publications-2019/1904-wp-5-def-occupation-to-industry-prediction.pdf