KonsortSWD Coding
 

Aim

As part of the "Coding" subproject within Task Area 3 (Data Generation), an infrastructure is being established for the efficient coding of textual or open-ended survey responses, particularly those concerning occupations, industries, (further) education, courses, and fields of study. A competence center will bring together know-how from related research, from different application contexts, and from relevant stakeholders. The goal is to develop and provide database-driven software that supports (quasi-)automated coding and derivation processes, so that suitable standard variables can be generated in a cost- and time-efficient manner. For data producers and providers, such a service opens up new possibilities for enriching their data sets. For data users, the added value lies in the expanded research potential of the (additional) standard variables and in their high comparability and interoperability. This objective is also served by the project's close cooperation with the subproject on standardization and harmonization of variables (TA.3-M.1), for which GESIS is responsible.

 

Background

Open and semi-open response formats are part of almost all surveys in the social, behavioral, educational, and economic sciences. Generally, they are used to operationalize theoretical constructs for which the instrument cannot adequately list all relevant response options. Whether such data can be (re)used for quantitative analyses largely depends on the coding of the textual information, which typically takes place retrospectively, and on the subsequent derivation of standard variables. With hundreds or even thousands of categories to choose from, manual coding is time-consuming, error-prone, and costly. One example is occupational information (e.g., job and occupation titles): providing it in the form of relevant classifications (e.g., ISCO, KldB) and derived status, class, or prestige indicators (e.g., ISEI, SIOPS, EGP class scheme, CAMSIS) significantly increases the analytical potential of research data. Panel studies with extensive educational and employment biographies in particular face the challenge of coding large numbers of textual entries consistently and to a high standard within a short period of time.
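
To illustrate what (quasi-)automated coding of occupational entries can look like in its simplest form, the following Python sketch assigns codes to responses that match a lookup table after normalization and sets aside everything else for manual coding. It is not the project's software: the function names, the lookup table, and the example responses are hypothetical, and the four-digit codes are placeholders in the style of a classification such as ISCO, not an authoritative mapping.

```python
import re

# Hypothetical lookup table: normalized job title -> classification code.
# The codes are illustrative placeholders, not an authoritative mapping.
CODE_TABLE = {
    "nurse": "2221",
    "primary school teacher": "2341",
    "electrician": "7411",
}


def normalize(text):
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def code_responses(responses, table=CODE_TABLE):
    """Return (coded, manual): exact matches get a code, the rest need review."""
    coded, manual = {}, []
    for raw in responses:
        code = table.get(normalize(raw))
        if code is None:
            manual.append(raw)  # no match: leave for a human coder
        else:
            coded[raw] = code
    return coded, manual


coded, manual = code_responses(
    ["Nurse", " Primary-school  teacher ", "freelance consultant"]
)
```

In this sketch only exact matches on the normalized text are coded automatically. Misspellings, ambiguous titles, and entries missing from the table all end up in the manual list, which is where most of the effort described above arises.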

 

Procedure

The first phase of the project focuses on conceptual preparations, in particular determining the needs of potential users, defining appropriate functions and a distribution model, and specifying a suitable software architecture. The second phase is essentially devoted to the technical implementation of the concept and to basic testing of the tool in a broader context. In parallel, quality standards and documentation materials are to be developed and coordinated. In addition, accompanying research will examine the reliability and efficiency of the tool in various experiments. After successful testing and further development of the CODI tool, the final project phase is reserved for publishing the tool in tiered open-access versions, introducing the service to the scientific community, and supporting its users. Strategies for the long-term operation and continuous further development of CODI also need to be designed.

 

Project profile