Surveying non-native speakers of the survey language(s): Representation, coverage, data quality
Chair | Dr Michael Ochsner (FORS Lausanne)
Coordinator 1 | Dr Oliver Lipps (FORS Lausanne) |
Little is known about the representation effects of offering additional survey administration languages; until now, such decisions have mainly been driven by logistical and financial considerations.
In this potential analysis, we investigate how adding several languages (English, Serbo-Croatian, Portuguese, Albanian) on top of the three Swiss national languages could affect the representation of different groups, defined by different survey topics, in Swiss general population surveys. The topics investigated are religious affiliation, nationality, education level, occupational activity, migration status, and main mode of transport to school/work.
The analysis uses a subset of the pooled yearly Swiss census survey from 2010 to 2014. Results show that the level and heterogeneity of mastery of one of the three Swiss national languages depend on the topic (person groups) considered and on the language mastery needed to complete a questionnaire. The topic with the highest heterogeneity with respect to “good” language competence is nationality, with the categories Swiss, foreigners from a neighboring country, foreigners from an English-speaking country, and other foreigners. Groups distinguished by religious affiliation also exhibit high variation in language competence.
Which language reduces this heterogeneity most efficiently also depends on the topic considered and the language mastery needed. An important result is that in the “basic” language scenario the candidates for the ‘best’ language to add reduce to two (English and Portuguese), whereas in the “good” language scenario any of the four additional languages could be the best one to add to reduce heterogeneity across the topic categories considered. Interestingly, additionally offering English would even increase heterogeneity in both language scenarios if education were the topic of interest: adding English would bring in native English speakers, who have higher than average education and are already well represented without offering English, in the “good” language scenario, and in addition those who learnt English at school in the “basic” language scenario, for the same reason.
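As a purely illustrative sketch of the kind of comparison underlying these results, the Python snippet below contrasts coverage heterogeneity across the nationality categories before and after adding a language. The coverage shares and the spread measure (maximum minus minimum share across categories) are assumptions made for illustration only, not figures or metrics from the census analysis.

```python
# Hypothetical sketch: compare coverage heterogeneity across topic categories
# before and after adding a survey language. The shares below are invented for
# illustration; they are NOT figures from the census analysis.

def heterogeneity(shares):
    """Spread of coverage across categories (maximum minus minimum share)."""
    return max(shares) - min(shares)

# Share of each nationality category able to complete the questionnaire
# in the three Swiss national languages only (invented numbers).
baseline = {"Swiss": 0.99, "neighboring country": 0.90,
            "English-speaking country": 0.55, "other foreigners": 0.60}

# Hypothetical coverage after additionally offering English.
with_english = {"Swiss": 0.99, "neighboring country": 0.91,
                "English-speaking country": 0.97, "other foreigners": 0.70}

print("baseline heterogeneity:", round(heterogeneity(baseline.values()), 2))
print("with English:          ", round(heterogeneity(with_english.values()), 2))
```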
The main message of this paper is that deciding whether a language should be added to the survey language(s) already in use, and if so which language to add, requires a careful investigation of the (main) survey topic and of the degree of language mastery necessary to complete the survey. Some topics may be less sensitive to the potential change in heterogeneity from offering an additional language, such as, in our research, the main mode of travel to work or school. Other topics may be much more sensitive, such as nationality (especially if only a basic language competence is needed) or educational level (in both language competence scenarios).
General population surveys face increasing linguistic and cultural heterogeneity because of globalization and migration. However, little is known about the effects of this heterogeneity on representation bias and survey response rates. Linguistic and cultural heterogeneity raises many complex issues, such as translation processes and, depending on the mode, multilingual interviewers or a complicated process of assigning interviewers to respondents. In addition, survey administrators are under increasing financial pressure, making it difficult to survey a more complex population with less funding.
In this presentation, I will use two examples to study the effects on representation bias and response rates of adding languages to, and removing languages from, a survey.
I will start with an example examining the effect of adding languages. Two surveys were administered in three humanities fields at Swiss universities: English literature studies, German literature studies, and art history. The first survey was administered in English and German, covering the languages of the first two fields, including the language of the majority in Switzerland. The second added a third and fourth language, French and Italian, two Swiss national languages that are at the same time very important scholarly languages in the third field. I will examine representation with respect to language region (the German- and French-speaking parts of Switzerland) and subject field, considering respondents’ mother tongue as well as the language chosen to fill in the questionnaire. The results show that adding selected languages can reduce representation bias.
The second example examines reducing the number of languages. A general population survey of a Swiss city had to date been administered as a telephone survey in multiple languages. Due to budget constraints and, above all, severe drops in response rates over time caused by decreasing phone coverage, a single-language web/paper mixed-mode experiment was conducted. The findings suggest that a more inclusive mixed-mode design can compensate for some of the representation bias introduced when languages are dropped.
The two examples shed light on the advantages and disadvantages of using multiple survey languages. They also reveal practical implications for deciding how many and which languages to choose when administering a survey: strategic considerations are needed to balance ethical and political issues (inclusion of minorities), methodological effects (changing the mode to compensate), representation bias (whether minorities are large enough to make a difference), language competence in the surveyed population, and financial constraints.
In health examination surveys, data are collected through questionnaires, physical measurements, and the analysis of biological samples. Participants need a good understanding of the survey materials to understand the invitation, fill in the questionnaires, and provide the written informed consent required for the physical measurements and the collection of biological samples.
Finland has two official languages, Finnish and Swedish. Each person’s mother tongue is registered in the National Population Information System. The majority of people living in Finland (95%) speak at least one of these two languages, leaving 5% of the population speaking other languages. There is a legal obligation to provide survey material in at least the two official languages.
Health examination surveys, the FINRISK Study, have been conducted in Finland every five years since 1972. Information about the registered mother tongue has been available from the sampling frame, the National Population Information System, since the 1997 survey.
In the FINRISK Study, a random sample of persons aged 25-64 years has been drawn separately for each survey year. The sample sizes were 9,900 in 1997, 9,952 in 2002, 7,962 in 2007, and 7,921 in 2012. Invitees receive an invitation letter with a questionnaire and a pre-defined appointment time for the health examination. They are asked to fill in the questionnaire at home and return it during the health examination visit.
The proportion of people in the sample with a mother tongue other than Finnish or Swedish has increased over the years, from 1.8% in 1997 to 5.5% in 2012. About 2% of the population had Swedish as their mother tongue in all survey years. Those whose mother tongue is neither Finnish nor Swedish also tend to have lower education than those with Finnish or Swedish as their mother tongue.
Comparing participation rates between these three language groups reveals a clear difference. In all years, the participation rate was lowest among those with a mother tongue other than Finnish or Swedish and highest among those with Finnish as their mother tongue. The participation rate has been declining among those with Finnish as their mother tongue but has remained relatively stable in the other two language groups. In 1997, the participation rate was 72% in the Finnish group, 68% in the Swedish group, and 50% among the others. By 2012, the rate in the Finnish group had declined to 63%, while it was 69% in the Swedish group and 49% among the others.
In Finland, the growing number of people whose mother tongue is neither of the official languages is affecting how surveys are organized. Traditionally, it was enough to provide survey material in Finnish and Swedish, as required by law, but there is now pressure to offer material in other languages as well, such as English and Russian. This obviously increases survey costs but may at the same time help to increase the participation rate.
In recent years, survey methodologists have sought to increase response from Spanish-speaking respondents. About 16 million people in the United States are Spanish speakers with no or very limited English proficiency. Studies have shown that mail surveys are likely to underrepresent Spanish speakers (Caporaso et al., 2013)—particularly when materials are presented only in English. In one study, response rates among Spanish speakers were half that of English speakers (McGovern, 2004). While sending survey materials in both English and Spanish to all respondents has been shown to increase Spanish response (Brick et al., 2012), this approach can be prohibitively expensive.
We will examine efforts made to increase Spanish-language participation in a large annual household survey. The IRS Individual Taxpayer Burden (ITB) survey is an annual multi-mode survey sent to 20,000 individuals in the United States. It measures the time and money taxpayers spend complying with tax law regulations. The IRS ITB Survey is currently being fielded for the sixth consecutive year. Each year, most respondents choose to complete the paper survey. The survey is offered in both English and Spanish, but few respondents complete the survey in Spanish. Although Spanish speakers may choose to complete the survey in English (perhaps with assistance from family or friends), it seems likely that Spanish speakers are underrepresented.
With each fielding of the survey, our researchers have sought to improve Spanish-language response through a variety of methods. Although it was not feasible to send all respondents all materials in both languages, over the years we have incorporated a number of techniques to target Spanish speakers. These include: sending a Spanish version of the IRS prenote in addition to the English version; increasing the number of modes offered in Spanish (from phone-only to phone, web, and mail); allowing web survey respondents to easily toggle between languages; offering a dedicated Spanish-language customer service phone line; incorporating a Spanish-language callout on English-language materials; and providing web instructions in Spanish.
In this paper, we will discuss the impact of these techniques on the number, mode, and timing of Spanish-language completes, as well as the number and type of calls received on our Spanish-language customer service phone line.
When surveying immigrant populations and members of ethnic minorities, survey researchers have to consider that respondents vary in their level of language proficiency. Large-scale national surveys often provide translated questionnaires to respondents who do not master the survey language well enough and might otherwise not be able to participate. However, not all respondents with low levels of language skills might choose to use translated questionnaires, and survey translations might only be available for a limited number of languages. Respondents completing the survey in a language that they do not master well might have problems in understanding survey questions or in reporting their answer, which might affect the quality of responses they provide.
This paper provides insight into the impact of native language proficiency on survey data quality. At the first wave of Understanding Society: The United Kingdom Household Longitudinal Study (UKHLS), a large sample of respondents were asked about their English-language abilities. The questions included whether English is their first language and whether they have any difficulties with speaking, reading, or understanding English. The responses to these questions are used to compare data quality among those answering the survey in English, drawing on a large number of responses for each respondent. In addition, all survey measures have been coded on 13 question characteristics, including measures of task difficulty and of the risk of socially desirable reporting.
Using these additional measures of language capability and question characteristics, we first explore data quality outcomes in a similar way to the recent study by Kleiner et al. (2015). Data quality measures include missing data (through “don’t know” or “refused” answers), the presence of primacy or recency effects, and possible straight-lining of responses in grids. We explore both aggregated data quality measures and models of responses within respondents, estimating the impact of question characteristics.
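For concreteness, the following sketch (Python with pandas; column names, response codes, and data layout are all hypothetical, not the actual UKHLS coding) shows how two such respondent-level indicators, a DK/REF rate and a straight-lining flag for a grid, might be computed.

```python
# Hypothetical sketch of respondent-level data quality indicators.
# Column names and response codes are assumed for illustration
# (DK = -1, REF = -2); they are not the actual UKHLS coding.
import pandas as pd

DK, REF = -1, -2
grid_items = ["grid_q1", "grid_q2", "grid_q3", "grid_q4"]  # one attitude grid

df = pd.DataFrame({
    "grid_q1": [3, 5, DK, 2],
    "grid_q2": [3, 5, 4, REF],
    "grid_q3": [3, 5, 4, 2],
    "grid_q4": [3, 5, REF, 1],
})

# Share of DK/REF answers per respondent across all items.
dk_ref_rate = df.isin([DK, REF]).mean(axis=1)

# Straight-lining: identical substantive answers to every item in the grid,
# computed only for respondents with no DK/REF answers.
substantive = df[~df.isin([DK, REF]).any(axis=1)]
straightlined = substantive[grid_items].nunique(axis=1).eq(1)

print(dk_ref_rate)
print(straightlined)
```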
We also add two important extensions relating to language proficiency and the measurement of data quality, beyond the measures used to identify ability and the question coding. First, UKHLS has a self-completion section, so we are able to explore the differential impact of language ability in aural and visual administration of the survey. Second, we leverage the longitudinal design of UKHLS. We explore the amount of change, an important indicator of data quality in longitudinal studies, and how change differs by respondents’ language ability and question characteristics. We are also able to see whether there are any changes in the data quality measures (i.e. DK/REF responses, primacy/recency, straight-lining) across waves by language proficiency.
Initial results suggest that non-native speakers of English, particularly those with difficulty speaking and reading, provide more DK/REF responses. Additionally, non-native speakers are more likely than native English speakers to decline to complete the self-completion part of the survey.