Data and methods

Data are the foundation for evidence-based research. Therefore, the value of different types of data collection is made transparent. Important statistical and econometric methods are explained that provide instruments to condense information and to identify and quantify correlation or causality. Data sources used in our articles are cited according to the IZA World of Labor data citation convention.

The list of data sources can be found here.

The list of methods can be found here.

Subject Editor

Arnaud Chevalier Royal Holloway, University of London, UK, and IZA, Germany

Associate Editor(s)

John M. Abowd Cornell University, USA, and IZA, Germany

Eric Bartelsman VU University Amsterdam, The Netherlands, and IZA, Germany

John Haltiwanger University of Maryland, USA, and IZA, Germany

Arie Kapteyn University of Southern California, Los Angeles, USA, and IZA, Germany

Recruiting intensity

Recruiting intensity is critical for understanding fluctuations in the labor market

10.15185/izawol.21 21 Faberman, R

by R. Jason Faberman

When hiring new workers, employers use a wide variety of recruiting methods in addition to posting a vacancy announcement, such as adjusting education, experience, or technical requirements, or offering higher wages. The intensity with which employers use these alternative methods can vary widely with a firm’s performance and with the business cycle. In fact, persistently low recruiting intensity helps to explain the sluggish pace of job growth in the US economy following the Great Recession of 2007–2009.

The importance of measuring dispersion in firm-level outcomes

Ignoring the large variation in firm-level outcomes can create misunderstandings about the consequences of many policies

10.15185/izawol.53 53 Syverson, C

by Chad Syverson

Recent research has revealed enormous variation in performance and growth among firms, which both drives and is driven by large reallocations of inputs and outputs across firms (churning) within industries and markets. These differences in firm-level outcomes and the associated turnover of firms affect many economic policies (both labor- and non-labor-oriented), on both a microeconomic and a macroeconomic scale, and are affected by them. Properly evaluating these policies requires familiarity with the sources and consequences of firm-level variation and within-industry reallocation.

Correspondence testing studies

What can we learn about discrimination in hiring?

10.15185/izawol.58 58 Rooth, D

by Dan-Olof Rooth

Anti-discrimination policies play an important role in public discussions. However, identifying discriminatory practices in the labor market is not an easy task. Correspondence testing provides a credible way to reveal discrimination in hiring and to provide hard facts for policies. The method involves sending matched pairs of otherwise identical job applications to employers posting jobs, the only difference being a characteristic that signals membership in a group.

Poverty persistence and poverty dynamics

Snapshots of who is poor in one period provide an incomplete picture of poverty

10.15185/izawol.103 103 Biewen, M

by Martin Biewen

A considerable part of the poverty that is measured in a single period is transitory rather than persistent. In most countries, only a portion of people who are currently poor are persistently poor. People who are persistently poor or who cycle into and out of poverty should be the main focus of anti-poverty policies. Understanding the characteristics of the persistently poor, and the circumstances and mechanisms associated with entry into and exit from poverty, can help to inform governments about options to reduce persistent poverty. Differences in poverty persistence across countries can shed additional light on possible sources of poverty persistence.

The importance and challenges of measuring work hours

Measuring hours worked is important, but different surveys can tell different stories

10.15185/izawol.95 95 Stewart, J

by Jay Stewart

Work hours are key components in estimating productivity growth and hourly wages as well as being a useful cyclical indicator in their own right, so measuring them correctly is important. The US Bureau of Labor Statistics (BLS) collects data on work hours in several surveys and publishes three widely-used series that measure average weekly hours. The series tell different stories about average weekly hours and trends in those hours but qualitatively similar stories about the cyclical behavior of work hours. The research summarized here explains the differences in levels, but only some of the differences in trends.

Randomized control trials in an imperfect world

How can we assess the policy effectiveness of randomized control trials when people don’t comply?

10.15185/izawol.110 110 Siddique, Z

by Zahra Siddique

Randomized control trials (RCTs) have become increasingly important as an evidence-based method to evaluate interventions such as government programs and policy initiatives. Frequently, however, RCTs are characterized by “imperfect compliance,” in that not all the subjects who are randomly assigned to take a treatment choose to do so. This could result in a failure to identify the treatment effect, or the impact of the treatment on the population. However, useful information on treatment effectiveness can still be recovered by estimating “bounds,” or a range of values in which treatment effectiveness can lie.
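
The idea of bounding a treatment effect under imperfect compliance can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the article's own procedure: the outcome is assumed to lie between 0 and 1, assignment is assumed to be random, and the records are invented. For each assignment arm, unobserved potential outcomes are set to their worst and best possible values, and the resulting intervals are intersected across arms.

```python
# Worst-case bounds on the average treatment effect (ATE) under imperfect
# compliance. Assumptions (not from the article): the outcome y lies in [0, 1],
# assignment z is random, and the toy records below are invented.

# Each record: (z = assigned to treatment, d = actually treated, y = outcome)
records = (
    [(1, 1, 1)] * 6 + [(1, 1, 0)] * 2 + [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1
    + [(0, 0, 1)] * 4 + [(0, 0, 0)] * 6
)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def potential_outcome_bounds(records, treated):
    """Bounds on the mean potential outcome E[Y(treated)].

    Within each assignment arm, outcomes of subjects whose treatment status
    differs from `treated` are unobserved, so they are replaced by 0 (lower
    bound) or 1 (upper bound). Random assignment lets us intersect the arms.
    """
    lows, highs = [], []
    for arm in (0, 1):
        rows = [(d, y) for z, d, y in records if z == arm]
        observed = mean(y if d == treated else 0 for d, y in rows)
        share_unobserved = mean(1 if d != treated else 0 for d, y in rows)
        lows.append(observed)
        highs.append(observed + share_unobserved)
    return max(lows), min(highs)


y1_low, y1_high = potential_outcome_bounds(records, treated=1)
y0_low, y0_high = potential_outcome_bounds(records, treated=0)
ate_low, ate_high = y1_low - y0_high, y1_high - y0_low
print(f"ATE lies in [{ate_low:.2f}, {ate_high:.2f}]")
```

In this invented sample, 20% of those assigned to treatment do not take it, and the bounds place the average treatment effect between 0.2 and 0.4. The interval narrows as compliance improves.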

Measuring the cost of children

Knowing the real cost of children is important for crafting better economic policy

10.15185/izawol.132 132 Donni, O

by Olivier Donni

The cost of children is a critical parameter used in determining many economic policies. For instance, correctly setting the tax deduction for families with children requires assessing the true household cost of children. Evaluating child poverty at the individual level requires making a clear distinction between the share of family resources received by children and that received by parents. The standard ad hoc measures (equivalence scales) used in official publications to measure the cost of children are arbitrary and are not informed by any economic theory. However, economists have developed methods that are grounded in economic theory and can replace ad hoc measures.
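
To make the notion of an ad hoc equivalence scale concrete, the sketch below uses the OECD-modified scale. That scale is chosen here only as a familiar example; the abstract does not say which scales the article discusses, and the income figure is invented.

```python
# OECD-modified equivalence scale, used purely as an illustrative example of
# a conventional scale (the abstract does not name a specific one).
def oecd_modified_scale(adults, children):
    """1.0 for the first adult, 0.5 per additional adult, 0.3 per child under 14."""
    if adults < 1:
        raise ValueError("a household needs at least one adult")
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def equivalised_income(household_income, adults, children):
    """Household income divided by the household's equivalence scale."""
    return household_income / oecd_modified_scale(adults, children)


couple = oecd_modified_scale(adults=2, children=0)
couple_two_children = oecd_modified_scale(adults=2, children=2)

# Extra needs attributed to the two children, relative to the childless couple.
implied_cost_share = (couple_two_children - couple) / couple

# Invented income figure: 4,200 per month for a couple with two children.
income_per_equivalent_adult = equivalised_income(4200.0, adults=2, children=2)
```

Under this scale, two children are assumed to raise a couple's needs by 40%, whatever the household's income or spending patterns. A fixed assumption of this kind is what the abstract calls arbitrary.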

Counting on count data models

Quantitative policy evaluation can benefit from a rich set of econometric methods for analyzing count data

10.15185/izawol.148 148 Winkelmann, R

by Rainer Winkelmann

Often, economic policies are directed toward outcomes that are measured as counts. Examples of economic variables that use a basic counting scale are number of children as an indicator of fertility, number of doctor visits as an indicator of health care demand, and number of days absent from work as an indicator of employee shirking. Several econometric methods are available for analyzing such data, including the Poisson and negative binomial models. They can provide useful insights that cannot be obtained from standard linear regression models. Estimation and interpretation are illustrated in two empirical examples.
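
As a rough illustration of how a Poisson model is fitted, the sketch below estimates a Poisson regression with one binary regressor by Newton's method, using only the standard library. The data (doctor visits by insurance status) and all variable names are invented for the example; it is not one of the article's two empirical examples.

```python
import math

# Invented data: x = 1 if the person has supplementary insurance,
# y = number of doctor visits in a year.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 0, 2, 1, 3, 4, 2, 3]


def fit_poisson(x, y, iterations=25):
    """Fit E[y | x] = exp(b0 + b1 * x) by Newton's method on the log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Negative Hessian: X' diag(mu) X, a 2 x 2 matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system in closed form
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_poisson(x, y)
rate_ratio = math.exp(b1)  # multiplicative effect of x on the expected count
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, rate ratio = {rate_ratio:.2f}")
```

The exponentiated coefficient is read as a ratio of expected counts. In this toy sample the insured group averages three visits against one for the uninsured, so the fitted rate ratio is 3. A linear regression would instead report the difference of two visits and could predict negative counts in richer specifications.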

Intergenerational income persistence

Measures of intergenerational persistence can be indicative of equality of opportunity, but the relationship is not clear cut

10.15185/izawol.176 176 Blanden, J

by Jo Blanden

A strong association between incomes across generations—with children from poor families likely to be poor as adults—is frequently considered an indicator of insufficient equality of opportunity. Studies of such “intergenerational persistence,” or lack of intergenerational mobility, are concerned with measuring the strength of the relationship between parents’ socio-economic status and that of their children as adults. However, reliable measurement requires overcoming important data and methodological difficulties. Moreover, the association between equality of opportunity and common measures of intergenerational persistence is not as clear-cut as is often assumed.

Matching as a regression estimator

Matching avoids making assumptions about the functional form of the regression equation, making analysis more reliable

10.15185/izawol.186 186 Black, D

by Dan A. Black

“Matching” is a statistical technique used to evaluate the effect of a treatment by comparing the treated and non-treated units in an observational study. Matching provides an alternative to older estimation methods, such as ordinary least squares (OLS), which involves strong assumptions that are usually without much justification from economic theory. While the use of simple OLS models may have been appropriate in the early days of computing during the 1970s and 1980s, the remarkable increase in computing power since then has made other methods, in particular matching, very easy to implement.
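
A minimal sketch of the idea, assuming one-to-one nearest-neighbor matching with replacement on a single covariate: each treated unit is compared with the control unit whose covariate value is closest. The data are invented, and nearest-neighbor matching is only one of several matching estimators; the abstract does not single out a particular variant.

```python
# One-to-one nearest-neighbor matching (with replacement) on a single covariate.
# All numbers are invented for illustration.
treated = [(1.0, 5.0), (2.0, 7.0)]                 # (covariate x, outcome y)
controls = [(1.1, 4.0), (1.9, 5.0), (5.0, 10.0)]   # (covariate x, outcome y)


def nearest_control(x, controls):
    """Return the control unit whose covariate is closest to x."""
    return min(controls, key=lambda unit: abs(unit[0] - x))


def att_nearest_neighbor(treated, controls):
    """Average effect on the treated: mean gap between each treated outcome
    and the outcome of its nearest control."""
    gaps = [y - nearest_control(x, controls)[1] for x, y in treated]
    return sum(gaps) / len(gaps)


def naive_difference(treated, controls):
    """Raw difference in mean outcomes, ignoring the covariate."""
    mean_treated = sum(y for _, y in treated) / len(treated)
    mean_control = sum(y for _, y in controls) / len(controls)
    return mean_treated - mean_control


att = att_nearest_neighbor(treated, controls)
naive = naive_difference(treated, controls)
print(f"matched ATT = {att:.2f}, naive difference = {naive:.2f}")
```

The matched estimate is 1.5, while the raw difference in means is negative because one control unit has a covariate value far from any treated unit. No functional form for the relationship between the covariate and the outcome is assumed at any point.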

The use of natural experiments in migration research

Data on rapid, unexpected refugee flows can credibly identify the impact of migration on native workers’ labor market outcomes

10.15185/izawol.191 191 Tumen, S

by Semih Tumen

Estimating the causal effect of immigration on the labor market outcomes of native workers has been a major concern in the literature. Because immigrants decide whether and where to migrate, immigrant populations generally consist of individuals with characteristics that differ from those of a randomly selected sample. One solution is to focus on events such as civil wars and natural catastrophes that generate rapid and unexpected flows of refugees into a country unrelated to their personal characteristics, location, and employment preferences. These “natural experiments” yield estimates that find small negative effects on native workers’ employment but not on wages.

Evaluating the efficiency of public services

Differences in efficiency in public services can offer clues about good practice

10.15185/izawol.196 196 Johnes, G

by Geraint Johnes

Efficiency is an important consideration for those who manage public services. Costs vary with output and with a variety of other factors. In the case of higher education, for example, factors include quality, student demographics, the scale and scope of the higher education provider, and the size and character of the real estate. But even when taking all these factors into account, costs vary across providers because of differences in efficiency. Such differences offer clues about good practice that can lead to improvements in the system as a whole. The role of efficiency is illustrated by reference to higher education institutions in England.

Google search activity data and breaking trends

Google search activity data are an unconventional survey full of unbiased, revealed answers in need of the right question

10.15185/izawol.206 206 Askitas, N

by Nikolaos Askitas

Using Google search activity data can help detect, in real time and at high frequency, a wide spectrum of breaking socio-economic trends around the world. This wealth of data is the result of an ongoing and ever more pervasive digitization of information. Search activity data stand in contrast to more traditional economic measurement approaches, which are still tailored to an earlier era of scarce computing power. Search activity data can be used for more timely, informed, and effective policy making for the benefit of society, particularly in times of crisis. Indeed, having such data shifts the relation between theory and the data to support it.

Measuring disincentives to formal work

Does formal work pay? Synthetic measurements of taxes and benefits can help identify incentives and disincentives to formal work

10.15185/izawol.213 213 Weber, M

by Michael Weber

Evidence from transition economies shows that formal work may not pay, particularly for low-wage earners. Synthetic measurements of work disincentives, such as the formalization tax rate or the marginal effective tax rate, confirm a significant positive correlation between these measurements and the probability of informal work. These measures are especially informative for impacts at lower wage levels, where informality is highest. Policymakers who want to increase formal work can use these measurements to determine optimal labor taxation rates for low-wage earners and reform benefit design.
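
The marginal effective tax rate can be illustrated with a stylized calculation: it is the share of an extra unit of gross earnings that is lost to taxes and withdrawn benefits. The tax and benefit schedule below is entirely hypothetical and is not taken from the article or from any country's rules.

```python
# Hypothetical tax-benefit schedule, invented for illustration only.
def net_income(gross):
    tax = 0.20 * gross                         # flat 20% tax on formal earnings
    benefit = max(0.0, 100.0 - 0.50 * gross)   # benefit withdrawn at 50% of gross
    return gross - tax + benefit


def marginal_effective_tax_rate(gross, step=10.0):
    """Share of an extra `step` of gross earnings lost to taxes and benefit withdrawal."""
    gain = net_income(gross + step) - net_income(gross)
    return 1.0 - gain / step


metr_low_wage = marginal_effective_tax_rate(100.0)
metr_high_wage = marginal_effective_tax_rate(300.0)
print(f"METR at 100: {metr_low_wage:.2f}, METR at 300: {metr_high_wage:.2f}")
```

With this made-up schedule a low-wage earner keeps only 30% of additional formal earnings, while a higher earner who no longer receives the benefit keeps 80%. This is the kind of disincentive at the bottom of the wage distribution that the measure is meant to reveal.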

The challenges of linking survey and administrative data

Combining survey and administrative data is growing in popularity, even though data access is still highly restricted

10.15185/izawol.214 214 Künn, S

by Steffen Künn

Using administrative records data and survey data to enhance each other offers huge potential for scientific and policy-related research. Two recent changes have expanded the potential for creating such linked data: the improved availability of data sources and progress in data-matching technology. These developments are reflected, among other ways, in the growing number of academic papers in labor economics that use linked survey and administrative data. While the number of studies using linked data is still small, the trend is clearly upward. Slowing the growth, however, are concerns about data security and privacy, which impede data access.

What makes a good job? Job quality and job satisfaction

Job satisfaction is important to well-being, but intervention may be needed only if markets are impeded from improving job quality

10.15185/izawol.215 215 Clark, A

by Andrew E. Clark

Many measures of job satisfaction have been trending downward. Because jobs are a key part of most people’s lives, knowing what makes a good job (job quality) is vital to knowing how well society is doing. Integral to worker well-being, job quality also affects the labor market through related decisions on whether to work, whether to quit, and how much effort to put into a job. Empirical work on what constitutes a good job finds that workers value more than wages; they also value job security and interest in their work. Policy to affect job quality requires information on the cost of the different aspects of job quality and how much workers value them.

Skill mismatch and overeducation in transition economies

Substantial skill shortages coexist with overeducation, affecting both young and old workers

10.15185/izawol.224 224 Kupets, O

by Olga Kupets

Large imbalances between the supply and demand for skills in transition economies are driven by rapid economic restructuring, misalignment of the education system with labor market needs, and underdeveloped adult education and training systems. The costs of mismatches can be large and long-lasting for workers, firms, and economies, with long periods of overeducation implying a loss of human capital for individuals and ineffective use of resources for the economy. To make informed decisions, policymakers need to understand how different types of workers and firms are affected by overeducation and skill shortages.

Can “happiness data” help evaluate economic policies?

“Happiness data” may help assess the welfare effects of a new labor market policy, like a change in benefit generosity

10.15185/izawol.226 226 MacCulloch, R

by Robert MacCulloch

Imagine a government confronted with a controversial policy question, such as whether it should cut the level of unemployment benefits. Will social welfare rise as a result? Will some groups be winners and other groups be losers? Will the welfare gap between the employed and unemployed increase? “Happiness data” offer a new way to make these kinds of evaluations. These data allow us to track the well-being of the whole population, as well as of sub-groups such as employed and unemployed people, and to correlate the results with relevant policy changes.

Gravity models: A tool for migration analysis

Availability of bilateral data on migratory flows has renewed interest in using gravity models to identify migration determinants

10.15185/izawol.239 239 Ramos, R

by Raul Ramos

Gravity models have long been popular for analyzing economic phenomena related to the movement of goods and services, capital, or even people; however, data limitations regarding migration flows have hindered their use in this context. With access to improved bilateral (country to country) data, researchers can now use gravity models to better assess the impacts of migration policy, for instance, the effects of visa restriction policies on migration flows. The specification, estimation, and interpretation of gravity models are illustrated in different contexts and limitations of current practices are described to enable policymakers to make better informed decisions.

Using instrumental variables to establish causality

Even with observational data, causality can be recovered with the help of instrumental variables estimation

10.15185/izawol.250 250 Becker, S

by Sascha O. Becker

Randomized control trials are often considered the gold standard to establish causality. However, in many policy-relevant situations, these trials are not possible. Instrumental variables affect the outcome only via a specific treatment; as such, they allow for the estimation of a causal effect. However, finding valid instruments is difficult. Moreover, instrumental variables estimates recover a causal effect only for a specific part of the population. While those limitations are important, the objective of establishing causality remains; and instrumental variables are an important econometric tool to achieve this objective.
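
With a binary instrument, the simplest instrumental variables estimator is the Wald ratio: the effect of the instrument on the outcome divided by its effect on treatment take-up. The sketch below uses invented data in which the outcome equals 1 plus 2 times the treatment, so the true effect is 2 by construction. It assumes the instrument is valid, which cannot be verified from the data alone.

```python
# Wald (instrumental variables) estimator with a binary instrument.
# Each row: (z = instrument, d = treatment received, y = outcome); data invented,
# with y = 1 + 2 * d so that the true treatment effect is 2.
data = [
    (1, 1, 3.0), (1, 1, 3.0), (1, 1, 3.0), (1, 0, 1.0),
    (0, 1, 3.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
]


def arm_mean(data, arm, column):
    """Mean of the given column among rows whose instrument equals `arm`."""
    values = [row[column] for row in data if row[0] == arm]
    return sum(values) / len(values)


# Reduced form: effect of the instrument on the outcome.
reduced_form = arm_mean(data, 1, 2) - arm_mean(data, 0, 2)
# First stage: effect of the instrument on treatment take-up.
first_stage = arm_mean(data, 1, 1) - arm_mean(data, 0, 1)

wald_estimate = reduced_form / first_stage
print(f"reduced form = {reduced_form}, first stage = {first_stage}, "
      f"IV estimate = {wald_estimate}")
```

The instrument shifts take-up by 50 percentage points and the mean outcome by 1, giving an estimate of 2. The estimate applies to those whose treatment status responds to the instrument, which is the "specific part of the population" mentioned above.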

Disentangling policy effects into causal channels

Splitting a policy intervention’s effect into its causal channels can improve the quality of policy analysis

10.15185/izawol.259 259 Huber, M

by Martin Huber

Policy evaluation aims at assessing the causal effect of an intervention (for example job-seeker counseling) on a specific outcome (for example employment). Frequently, the causal channels through which an effect materializes can be important when forming policy advice. For instance, it is essential to know whether counseling affects employment through training programs, sanctions, job search assistance, or other dimensions, in order to design an optimal counseling process. So-called “mediation analysis” is concerned with disentangling causal effects into various causal channels to assess their respective importance.

Performance measures and worker productivity

Choosing the right performance measures can inform and improve decision-making in policy and management

10.15185/izawol.260 260 Sauermann, J

by Jan Sauermann

Measuring workers’ productivity is important for public policy and private-sector decision-making. Due to a lack of reliable methods to determine workers’ productivity, firms often use specific performance measures, such as how different incentives affect employees’ behavior. The public sector also uses these measures to monitor and evaluate personnel, such as teachers. To select the right performance measures, and as a result design better employment contracts and improve productivity, policymakers and managers need to understand the advantages and disadvantages of the available metrics.

Estimating the return to schooling using the Mincer equation

The Mincer equation gives comparable estimates of the average monetary returns of one additional year of education

10.15185/izawol.278 278 Patrinos, H

by Harry Anthony Patrinos

The Mincer equation—arguably the most widely used in empirical work—can be used to explain a host of economic, and even non-economic, phenomena. One such application involves explaining (and estimating) employment earnings as a function of schooling and labor market experience. The Mincer equation provides estimates of the average monetary returns of one additional year of education. This information is important for policymakers who must decide on education spending, prioritization of schooling levels, and education financing programs such as student loans.
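
In its usual form, the Mincer equation regresses log earnings on years of schooling, experience, and experience squared. The sketch below is deliberately simplified: it omits the experience terms and uses invented wages generated with a return of 0.10 per year of schooling, so that the regression slope can be checked against a known value.

```python
import math

# Simplified Mincer regression: ln(wage) = a + b * schooling.
# Assumption: the experience and experience-squared terms of the usual
# specification are omitted, and the wages below are invented.
schooling = [8, 10, 12, 14, 16]
wages = [math.exp(1.5 + 0.10 * s) for s in schooling]


def ols_line(x, y):
    """Intercept and slope of a least-squares line through (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


log_wages = [math.log(w) for w in wages]
intercept, return_per_year = ols_line(schooling, log_wages)

# The slope is a log-point return; convert it to a percentage wage gain.
percent_gain = (math.exp(return_per_year) - 1.0) * 100.0
print(f"return per year of schooling: {return_per_year:.3f} log points "
      f"(about {percent_gain:.1f}% higher wages)")
```

Because the dependent variable is in logs, the coefficient on schooling is read as the approximate proportional increase in earnings from one more year of education.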

Why do we need longitudinal survey data?

Knowing people’s history helps in understanding their present state and where they are heading

10.15185/izawol.308 308 Joshi, H

by Heather Joshi

Information from longitudinal surveys transforms snapshots of a given moment into something with a time dimension. It illuminates patterns of events within an individual’s life and records mobility and immobility between older and younger generations. It can track the different pathways of men and women and people of diverse socio-economic background through the life course. It can join up data on aspects of a person’s life, health, education, family, and employment and show how these domains affect one another. It is ideal for bridging the different silos of policies that affect people’s lives.

Can lab experiments help design personnel policies?

Employers can use laboratory experiments to structure payment policies and incentive schemes

10.15185/izawol.318 318 Villeval, M

by Marie Claire Villeval

Can a company attract a different type of employee by changing its compensation scheme? Is it sufficient to pay more to increase employees’ motivation? Should a firm provide evaluation feedback to employees based on their absolute or their relative performance? Laboratory experiments can help address these questions by identifying the causal impact of variations in personnel policy on employees’ productivity and mobility. Although they are collected in an artificial environment, the qualitative external validity of findings from the lab is now well recognized.

Meta-regression analysis: Producing credible estimates from diverse evidence

Meta-regression methods can be used to develop evidence-based policies when the evidence base lacks credibility

10.15185/izawol.320 320 Doucouliagos, C

by Chris Doucouliagos

Good policy requires reliable scientific knowledge, but there are many obstacles. Most econometric estimates lack adequate statistical power; some estimates cannot be replicated; publication selection bias (the selective reporting of results) is common; and there is wide variation in the evidence base on most policy issues. Meta-regression analysis offers a way to increase statistical power, correct the evidence base for a range of biases, and make sense of the unceasing flow of contradictory econometric estimates. It enables policymakers to develop evidence-based policies even when the initial evidence base lacks credibility.
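
One common meta-regression specification regresses reported estimates on their standard errors, weighting each study by its precision; the intercept is then read as the effect a perfectly precise study would report. The sketch below implements that weighted regression on invented estimates. Whether the article uses this exact specification is an assumption, and the numbers are illustrative only.

```python
# Precision-weighted meta-regression of reported estimates on their standard
# errors. Assumptions: this specification is one common choice, not necessarily
# the article's, and the studies below are invented so that
# estimate = 0.1 + 2 * standard_error holds exactly.
standard_errors = [0.05, 0.10, 0.20, 0.40]
estimates = [0.20, 0.30, 0.50, 0.90]


def weighted_line(x, y, weights):
    """Intercept and slope of a weighted least-squares line."""
    sw = sum(weights)
    swx = sum(w * xi for w, xi in zip(weights, x))
    swy = sum(w * yi for w, yi in zip(weights, y))
    swxx = sum(w * xi * xi for w, xi in zip(weights, x))
    swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Inverse-variance weights: more precise studies count for more.
weights = [1.0 / se ** 2 for se in standard_errors]
corrected_effect, selection_slope = weighted_line(standard_errors, estimates, weights)
simple_average = sum(estimates) / len(estimates)
print(f"simple average = {simple_average:.3f}, "
      f"corrected effect = {corrected_effect:.3f}, "
      f"slope on standard error = {selection_slope:.3f}")
```

In this constructed example, imprecise studies report systematically larger effects. The simple average is 0.475, while the intercept of the weighted regression is 0.1.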

Maximum likelihood and economic modeling

Maximum likelihood is a general and flexible method to estimate the parameters of models in labor economics

10.15185/izawol.326 326 Lanot, G

by Gauthier Lanot

Most of the data available to economists is observational rather than the outcome of natural or quasi experiments. This complicates analysis because it is common for observationally distinct individuals to exhibit similar responses to a given environment and for observationally identical individuals to respond differently to similar incentives. In such situations, using maximum likelihood methods to fit an economic model can provide a general approach to describing the observed data, whatever its nature. The predictions obtained from a fitted model provide crucial information about the distributional outcomes of economic policies.
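
The mechanics of maximum likelihood can be shown with a one-parameter example. The sketch below assumes that unemployment durations follow an exponential distribution (an assumption made here for simplicity, with invented data) and finds the exit rate that maximizes the log-likelihood by a simple numerical search.

```python
import math

# Invented unemployment spells, in months.
durations = [1.0, 2.0, 3.0, 2.0]


def log_likelihood(rate, durations):
    """Log-likelihood of an exponential model with density rate * exp(-rate * t)."""
    return sum(math.log(rate) - rate * t for t in durations)


def maximize(function, low, high, tolerance=1e-10):
    """Ternary search for the maximizer of a single-peaked function on [low, high]."""
    while high - low > tolerance:
        left = low + (high - low) / 3.0
        right = high - (high - low) / 3.0
        if function(left) < function(right):
            low = left
        else:
            high = right
    return (low + high) / 2.0


rate_hat = maximize(lambda r: log_likelihood(r, durations), 1e-6, 10.0)
closed_form = len(durations) / sum(durations)  # known solution for this model
expected_duration = 1.0 / rate_hat
print(f"numerical MLE = {rate_hat:.4f}, closed form = {closed_form:.4f}, "
      f"expected duration = {expected_duration:.2f} months")
```

The numerical search recovers the known closed-form answer of 0.5 exits per month. The same approach, writing down the likelihood and then maximizing it, carries over to models with no closed-form solution, which is where the method's generality matters.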

Measuring entrepreneurship: Type, motivation, and growth

Effective measurement can help policymakers harness a wide variety of gains from entrepreneurship

10.15185/izawol.327 327 Desai, S

by Sameeksha Desai

Policymakers rely on entrepreneurs to create jobs, provide incomes, innovate, pay taxes to support public revenues, create competition in industries, and much more. Because entrepreneurship is highly heterogeneous, the choice of entrepreneurship measures is critically important: it affects the diagnosis, analysis, projection, and understanding of potential and existing policy. Some key aspects to measure include the how (self-employment, new firm formation), the why (necessity, opportunity), and the what (growth). Gaining better insight into the challenges of measuring entrepreneurship is therefore a necessary and productive investment for policymakers.

Using linear regression to establish empirical relationships

Linear regression is a powerful tool for estimating the relationship between one variable and a set of other variables

10.15185/izawol.336 336 Verbeek, M

by Marno Verbeek

Linear regression is a powerful tool for investigating the relationships between multiple variables by relating one variable to a set of variables. It can identify the effect of one variable while adjusting for other observable differences. For example, it can analyze how wages relate to gender, after controlling for differences in background characteristics such as education and experience. A linear regression model is typically estimated by ordinary least squares, which minimizes the differences between the observed sample values and the fitted values from the model. Multiple tools are available to evaluate the model.
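
The wage example can be sketched with ordinary least squares computed from the normal equations, using only the standard library. The data are invented so that wages equal 10 minus 2 for women plus 1.5 per year of education, which lets the output be checked against known coefficients. The variable names are illustrative.

```python
# Ordinary least squares via the normal equations (X'X) b = X'y.
# Invented data: wage = 10 - 2 * female + 1.5 * education, with no noise.
rows = [(0, 10), (1, 10), (0, 12), (1, 14), (0, 16), (1, 16)]  # (female, education)
X = [[1.0, float(female), float(education)] for female, education in rows]
y = [10.0 - 2.0 * female + 1.5 * education for female, education in rows]


def ols(X, y):
    """Return the coefficient vector minimizing the sum of squared residuals."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    M = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= factor * M[col][j]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


intercept, female_gap, education_return = ols(X, y)
print(f"intercept = {intercept:.2f}, gender gap = {female_gap:.2f}, "
      f"return to education = {education_return:.2f}")
```

The coefficient on the gender indicator is the wage gap between women and men with the same education. This is the sense in which regression adjusts for other observable differences. In applied work a statistics package would normally be used, since it also reports standard errors and diagnostics.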