ESRA logo

Thursday 18th July 2013, 11:00 - 12:30, Room: No. 18

Data Management and Data Analysis in Quantitative Historical Social Research

Convenor Dr Ronald Gebauer (Friedrich-Schiller-Universität Jena)
Coordinator 1 Dr Axel Salheiser (Friedrich-Schiller-Universität Jena)

Session Details

Today, historical social research is no longer restricted to the study and analysis of the usual historiographic sources, such as written records, manuscripts, or collections of chronological treatises. A kind of revolution is under way: many of these collections have already been transferred or copied to electronic media (CD or DVD), or will be sooner or later. In addition, some historical data of more recent origin were already collected with electronic analysis in mind. All these types of data have in common that they are ready, or almost ready, to be analyzed with advanced empirical research techniques, and that their number is increasing day by day.
This panel is dedicated to the analysis of historical data and its major methodological and technical problems. First, it covers issues concerning data mining, the transfer of these data to electronic media, and their use for empirical research in the contemporary social sciences and humanities, such as handling shifting validity and bias, detecting and repairing miscoded and missing data, and optimally preparing data for computer-assisted analysis. Second, it emphasizes strategies of data analysis focused on historical social structure and historical social change. Third, for more recent data, it discusses the possibilities of linking to contemporary data in order to complete longitudinal data such as biographies and other event histories. Please submit your paper abstracts to Ronald Gebauer and Axel Salheiser, Institute of Sociology, University of Jena, Germany, ronald.gebauer@uni-jena.de or axel.salheiser@uni-jena.de.


Paper Details

1. The Belgian HISSTAT project: reconstructing and documenting the 1961 census sample

Dr Wouter Ronsijn (VUB)

In the recent past, new datasets have been created or old ones reconstructed so that historians can address new research questions. Several national and international projects are running to preserve and reconstruct our statistical heritage. In Belgium, HISSTAT started in 2009 to gather statistical data available for the whole territory of Belgium since 1800. Belgium has a long tradition of local statistical data. With people such as Adolphe Quetelet in its statistical administration, Belgium was among the leading countries for governmental statistics in the nineteenth century. HISSTAT was set up to preserve the country's rich statistical heritage. To that end, both individual microdata and municipal-level aggregated data, originating from censuses as well as from other enquiries, are gathered. The project is currently bringing together cross-sectional aggregated data since 1800, aggregated longitudinal data since 1880 and cross-sectional microdata since 1961, all covering the entire country.
This paper will first present the main results of HISSTAT, and then focus on the 1961 census sample. The latter is today the oldest individual census data available in Belgium in digital form, covering a broad range of topics such as family situation, education, employment and commuting. Now, the 1961 census file has been prepared for analysis with modern statistical software and its codebook reconstructed. In particular, the paper will report on the background of the 1961 census and the sample file, errors in the file and how these were corrected, and finally on the sample size and accuracy.



2. GDR Petitions – A sunken treasure for Social and Historical Sciences?

Dr Marian Krawietz (University of Potsdam)

To analyze social transformation before and after a certain event, at least two data points are required: one before and one after the event. This sine qua non was surprisingly rarely considered in the mainstream transformation studies that accompanied the changes in the post-communist countries. The benchmark was the living conditions in Western societies, not the preconditions in the changing societies themselves.
Recent research notes this weakness and calls for a more comprehensive perspective. For example, under the keyword "lange Wende" ("prolonged transformation"), the stepwise societal and economic transformation in the late GDR and early Eastern Germany is currently under examination. The manifold challenges connected to this young perspective raise the question of how to enrich this research with better data from GDR times. Given that the data from GDR social sciences are in part very problematic, one possibility seems to be recovering the sunken treasure of petitions ("Eingaben").
As Mühlberg (2001) pointed out, GDR petitions are a source of high potential and quality. To date, however, petitions have not been used systematically in the social sciences, perhaps because quantitatively oriented researchers do not feel comfortable working in archives. The project will break with this tradition by identifying the "basic population" of the remaining GDR petitions, drawing a sample, building up a longitudinal data set, and developing a procedure for internal and external data evaluation.
The contribution is the result of a DFG project proposal that Ulrich Kohler and Marian Krawietz are developing at the University of Potsdam.



3. Don't Knows in Online and Telephone Surveys

Mrs Steve Schwarzer (TNS)
Mrs Eva Thalhammer (MeSoS University of Vienna)
Mr Dylan Connor (Department of Geography, UCLA)

Data are now collected through surveys that are deployed in multiple modes across numerous countries. This has made variation in don't know (DK) responses increasingly problematic.
The level of DK responses recorded in surveys is affected by both social desirability (SD) and satisficing (SC). Both SD and SC are known to be sensitive to survey mode, and can inflate the rate of non-committal responses. The rate of DK responses varies with the mode of data collection, and this inconsistency can introduce noise into the data.
By analyzing different response rates between online and telephone survey modes, we investigate how survey design can exacerbate or mitigate variation. First, we investigate how the presentation of DK response options in web surveys can influence the likelihood of their selection by respondents.
Second, we show which question formats limit differences between online and telephone survey modes.
Finally, we test how the effects of survey and response formats differ between countries.
The data for this experiment were collected in Nov/Dec 2012. Online surveys were administered in five European countries (n=1000 each), and telephone benchmark surveys (n=1000 each) were conducted concurrently. This paper concludes by informing researchers how to successfully bridge modes to limit the questionnaire design and mode effects that influence the answering behavior of respondents.


4. A Split Questionnaire Design based on NEPS Data with Block Structure Correlation Matrix

Mrs Sara Bahrami (NEPS, Otto-Friedrich-Universität Bamberg, Germany)
Dr Christian Aßmann (NEPS, Otto-Friedrich-Universität Bamberg, Germany)
Dr Florian Meinfelder (Lehrstuhl für Statistik und Ökonometrie, Otto-Friedrich-Universität Bamberg, Germany)
Professor Susanne Rässler (Lehrstuhl für Statistik und Ökonometrie, Otto-Friedrich-Universität Bamberg, Germany)

Increasing the length of a questionnaire increases the respondent burden. This can have a negative influence on the response rate as well as the quality of the responses.

A solution to this problem is to decrease the length of the questionnaire using a multiple matrix sampling design or a split questionnaire survey design. Here we develop and evaluate a method for creating a design that combines the matrix sampling and split questionnaire designs, in which only a subset of items is administered to randomly selected respondents. The design is created in such a way that it includes items that are predictive of the excluded items. Subsequent analysis based on multiple imputation can recover information about the excluded items. To avoid loss of information due to item reduction, the correlation of the variables is used as a criterion to select the best combination of items to assign to each individual.

This design is based on pilot data from the student cohort in NEPS, which have a particular correlation matrix structure. The items of the pilot data can be categorized into several blocks. The correlation coefficients of the variables within the blocks are in general higher than those between the blocks. Generating a dataset that mimics the correlation matrix structure of the pilot data was essential for further evaluation of this design.
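
The block structure described above can be illustrated with a minimal sketch. It is not the authors' procedure: the block sizes, the correlation values (0.6 within blocks, 0.2 between blocks), the assumption of normally distributed items, and all function names are hypothetical choices made only for illustration. The sketch builds a correlation matrix whose within-block correlations exceed the between-block ones and draws synthetic data with that structure via a Cholesky factor.

```python
import math
import random


def block_correlation(block_sizes, within=0.6, between=0.2):
    """Correlation matrix with higher correlation within blocks than between.

    block_sizes, within and between are illustrative assumptions.
    """
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    return [
        [
            1.0 if i == j else (within if labels[i] == labels[j] else between)
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate(matrix, n_rows, seed=0):
    """Draw n_rows of standard-normal items with the given correlation matrix."""
    rng = random.Random(seed)
    lower = cholesky(matrix)
    n = len(matrix)
    rows = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rows.append(
            [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        )
    return rows


def sample_correlation(rows, a, b):
    """Pearson correlation between columns a and b of the simulated rows."""
    n = len(rows)
    mean_a = sum(r[a] for r in rows) / n
    mean_b = sum(r[b] for r in rows) / n
    cov = sum((r[a] - mean_a) * (r[b] - mean_b) for r in rows)
    var_a = sum((r[a] - mean_a) ** 2 for r in rows)
    var_b = sum((r[b] - mean_b) ** 2 for r in rows)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    corr = block_correlation([3, 2])  # two hypothetical blocks of items
    data = simulate(corr, n_rows=5000, seed=1)
    print("within-block:", round(sample_correlation(data, 0, 1), 2))
    print("between-block:", round(sample_correlation(data, 0, 3), 2))
```

In a real application the target matrix would be estimated from the pilot data instead of being set by hand, and the items would generally not be normally distributed.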

Results of a simulation study comparing the parameter estimates of several analysis models for a complete dataset with those for a dataset reduced by the design show the effectiveness of the design in keeping parameter estimates unbiased.