Performance measures and worker productivity

Sauermann, Jan

doi:10.15185/izawol.260

Elevator pitch

Measuring workers’ productivity is important for public policy and private-sector decision-making. Due to the lack of a general measure that captures workers’ productivity, firms often use one- or multi-dimensional performance measures, which can be used, for example, to analyze how different incentive systems affect workers’ behavior. The public sector itself also uses measures to monitor and evaluate personnel, such as teachers. Policymakers and managers need to understand the advantages and disadvantages of the available metrics to select the right performance measures for their purpose.

[Illustration: Emphasis on direct measures of worker productivity]

Key findings

Pros

Performance measures provide detailed information about worker productivity.

Performance measures can inform a wide range of questions, such as how incentives work, how peer effects operate, or how workers accumulate human capital.

Reliable performance measures are needed to design appropriate contracts and improve productivity.

Automation creates an increasing number of performance metrics across low- and high-skilled jobs, as well as for jobs in the private and public sectors.

Cons

There is no universal definition of worker productivity; measures of worker productivity typically depend on the setting in which they are collected.

Worker productivity is usually multidimensional, and measures of worker performance do not always capture all of its dimensions.

If policies are based on the wrong performance measures, distortions can create negative effects on worker productivity.

For settings in which performance is only observable at the team level, it is not always possible to estimate individual contributions to team productivity.

Author's main message

Measures of worker productivity can give important insights into how workers perform and how workplaces should be organized. Direct measures of productivity are used to study a range of questions, such as the effects of incentives on workers’ productivity, the influence of peers on behavior, or the accumulation of human capital on the job. For these and related questions, it is important to select appropriate performance measures. This choice is critical, as relying on inappropriate measures can lead to the design of inefficient incentives, poor employment contracts, or wrong policy conclusions.

Motivation

Politicians and managers often make decisions that involve the behavior of individuals at work. For example, managers decide how to establish optimal incentives or how much training to provide for employees, while policymakers make decisions about the regulation of working hours. To make informed choices about these kinds of issues, it is important to know how these decisions will affect workers’ behavior, in terms of their productivity.

The conceptualization of worker productivity has gained increasing attention in the last decade. Direct measures are now commonly used in research within economics and related fields; they frequently serve as approximations of workers’ productivity. This article describes how worker productivity can be defined and provides an overview of the most up-to-date performance measurements available in order to help decision makers choose the right ones for their specific purposes.

Discussion of pros and cons

Defining worker productivity

In a general sense, productivity can be defined as the ratio between a measure of output and a measure of input. The productivity of workers could thus be measured as an output, for example, sales or units produced, relative to an input, for example, the number of hours worked or the cost of labor. Traditionally, labor productivity is derived from aggregate measures at the firm level, for example, value-added per worker. To account for differences between labor inputs, this measure has often been disaggregated according to various labor types, for example, low-, medium-, and high-skilled labor. However, even at this disaggregated level, measures of labor productivity can mask considerable variation with respect to workers’ underlying productivity, either between workers or over time. Under the assumption of competitive labor markets, workers’ wages are equal to (marginal) productivity and could therefore be interpreted as an indirect measure of productivity. Although wages are correlated with the underlying productivity of the individual worker, there are several reasons why they do not directly reflect the worker's actual productivity: wages often depend on age or tenure; wage differentials can arise from discrimination; monopsonistic labor markets can result in lower wages; and job (dis)amenities can be related to higher (lower) wages. This is further complicated by the fact that data often do not contain information on hourly wages, but rather on monthly earnings. Variations in monthly earnings might thus not only reflect differences in productivity, but also differences in the number of working hours. Lastly, higher worker productivity is typically linked to higher wages, but higher wages can also raise worker productivity, thus creating a reverse causality problem.
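The ratio definition, and the caveat about monthly earnings versus hourly wages, can be illustrated with a minimal Python sketch. The worker records, field names, and figures are hypothetical and serve only as an illustration.

```python
# Hypothetical records: field names and figures are illustrative assumptions.
workers = [
    {"name": "A", "units": 480, "hours": 160, "monthly_earnings": 3200},
    {"name": "B", "units": 480, "hours": 120, "monthly_earnings": 3200},
]


def output_per_hour(units, hours):
    """Productivity as the ratio of an output measure to an input measure."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return units / hours


def hourly_wage(monthly_earnings, hours):
    """Convert monthly earnings to an hourly wage so that hours worked are netted out."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return monthly_earnings / hours


for w in workers:
    w["productivity"] = output_per_hour(w["units"], w["hours"])
    w["hourly_wage"] = hourly_wage(w["monthly_earnings"], w["hours"])
    print(w["name"], w["productivity"], round(w["hourly_wage"], 2))
```

Both hypothetical workers have identical monthly earnings and identical output, yet worker B needs fewer hours. Only the per-hour measures reveal that difference, which is why monthly earnings alone can be a misleading proxy.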

Both labor productivity and wages have their shortcomings when it comes to assessing workers’ productivity. Ideally, each individual worker's productivity would be observed at each point in time. In reality, however, output is rarely observable at the individual level for a reasonable cost, thus making it practically impossible to calculate each individual's productivity. Instead, firms use individual measures of workers’ performance as an approximation of their productivity. Most occupations have one or more metrics that can be used to evaluate how well workers perform. These measures, also known as “key performance indicators” (KPIs), are regularly used for internal evaluation and monitoring in firms. In some cases, wages are based on these KPIs, allowing firms to create performance-related incentives. Figure 1 shows evidence from the World Management Survey on the extent to which firms collect performance measures and use them for monitoring purposes.

[Figure 1: Use of direct performance measurement across countries]

There is a key set of properties that should be met when assessing measures of worker productivity:

(i) Objectivity: measures should be objective, as opposed to subjective measures such as supervisor or peer ratings.

(ii) Availability: measures need to be available at the individual (worker) level (i.e. not on aggregated levels such as team or firm level).

(iii) Comparability: tasks and measurements should be the same across workers and time.

(iv) Quality and controllability: workers should have sufficient influence on the outcome, that is, by choosing their own effort levels.

The precise measurement of performance across workers and over time allows policymakers to address important economic questions, such as how incentives affect workers’ behavior, how the presence of peers affects workers’ productivity, or how workers accumulate human capital in firms.

Figure 2 provides examples of workers’ productivity measures which were used in studies published in leading economics journals, and whose findings can be used to inform policymakers and practitioners [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. The figure shows that performance is not only measurable for low-skilled jobs with routine tasks, but that it can also be measured for rather knowledge-intensive, non-routine professions, such as lawyers, physicians, or scientists.

[Figure 2: Examples of measures of workers’ productivity]

Advantages and disadvantages of measuring workers’ productivity

Firms regularly use measures of workers’ performance to approximate productivity. While it is usually possible to measure inputs, such as working hours, measuring workers’ performance is not always straightforward, or even possible, at a reasonable cost. Even though most occupations include some measures of performance at the worker level, Figure 2 shows that there is no universal measure. Instead, the degree to which performance can be monitored depends on the setting.

What do performance measures entail?

Worker performance can be a function of many factors: some relate to the worker (e.g. effort, education, age, or tenure), some are under the firm's control (e.g. work environment, wages, or incentives), and others are external, such as the natural environment (e.g. air pollution, heat, or humidity). These determinants of worker performance can interact, for example when workers change their effort in response to incentives provided by the firm, or in response to the natural environment, which firms can, to some degree, influence. An observed change in a worker's performance might be due to changes in the worker's skills, for example, from training programs or learning on the job, or due to changes in the effort provided by the worker, for example, in response to different incentives set by management.

A performance measure's usefulness for assessing workers’ behavior crucially depends on the degree to which the worker has influence on the measure. Measures will be unreliable predictors of workers’ productivity if they are largely driven by factors that are outside of a worker's control, such as variation in customer demand, or weather conditions in agriculture. Likewise, the nature of measuring performance, such as automated high-frequency measurement of performance, should not affect the well-being or mental health of workers. Appropriate measures of workers’ productivity should therefore be balanced with respect to the determinants that are within the worker's control versus those that are not, and to the degree that the measurement itself impacts a worker's well-being.

Multidimensional performance

In reality, it is difficult to assess workers’ productivity using just one measure. Workers’ jobs can include one or several tasks. University professors conduct research, are involved in teaching, and perform administrative tasks. Each of these, in turn, can be evaluated along different dimensions, for example, by the quality and quantity of a task (workers could work quickly, but provide low quality, or slowly, but with high quality). Workers could be evaluated with separate performance measures for each relevant dimension. The task of conducting research, for example, could be measured by the number of publications, but also by the quality of the publications, for example, measured by a journal's impact factor. Although quality and quantity are dimensions that apply to almost every task, there are other dimensions, for example, the policy relevance of the research.

Even for single-task occupations, the multidimensionality of workers’ productivity has important implications with respect to the use of performance measures for policy determination. Whereas the firm is interested in productivity, workers are usually evaluated (and incentivized) based on specific performance measures. Since workers’ productivity is not perfectly observable, and given that human resource decisions are frequently based on observable performance measures, distortions can arise. This would be the case, for example, if a firm only observes how much total output a worker produces, but ignores the quality of the output. Incentives based on observable measures could result in distortion, that is, the incentives of the firm and the worker are not perfectly aligned. Whereas the firm is interested in both dimensions, the worker might focus solely on performing well on the measurable performance indicators, which do not necessarily coincide with the unmeasurable ones. While it may be possible to mitigate this problem through intrinsic motivation, distortions are likely to create negative effects if incentive schemes are not well designed.

One way to get around this incentive problem is to use alternative measures, such as firm-level value-added metrics. While these types of measures appear to solve the problem of distortion, additional risks arise for the worker, since these measures are less controllable from the individual's point of view. This issue demonstrates an important trade-off between distortion on the one hand, and how precisely a performance measure can actually assess workers’ effort, on the other.

Another alternative to one-dimensional measures is subjective performance measures, such as supervisor ratings. While these measures can capture a broader picture of performance than single measures allow, they are typically less frequently available, and might be prone to subjectivity, such as gender bias [13].

Quantity and quality

It is difficult to identify appropriate measures for multiple performance dimensions. Two very common dimensions are the quality with which a job is performed and how fast it is done (quantity). If, for example, workers’ productivity in manufacturing is analyzed by means of a quantity-related measure of output, such as the number of products manufactured per hour, then a suitable candidate to measure the quality of output could be the share of defective products produced, that is, the defect rate.
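The manufacturing example can be sketched in a few lines of Python. The two workers and their figures are hypothetical, and the quality-adjusted measure (counting only non-defective units) is one possible assumption about how to combine the two dimensions, not a measure taken from the studies cited.

```python
def units_per_hour(units_produced, hours):
    """Quantity dimension: products manufactured per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return units_produced / hours


def defect_rate(units_produced, defective_units):
    """Quality dimension: share of produced units that are defective."""
    if units_produced <= 0:
        raise ValueError("units_produced must be positive")
    if not 0 <= defective_units <= units_produced:
        raise ValueError("defective_units must lie between 0 and units_produced")
    return defective_units / units_produced


def good_units_per_hour(units_produced, defective_units, hours):
    """Quality-adjusted quantity: only non-defective units count (an assumption)."""
    return units_per_hour(units_produced - defective_units, hours)


# Hypothetical shift data for a fast worker and a careful worker.
fast = {"units": 200, "defective": 20, "hours": 8}
careful = {"units": 160, "defective": 0, "hours": 8}

for label, w in (("fast", fast), ("careful", careful)):
    print(
        label,
        units_per_hour(w["units"], w["hours"]),
        defect_rate(w["units"], w["defective"]),
        good_units_per_hour(w["units"], w["defective"], w["hours"]),
    )
```

On quantity alone the fast worker leads by 25 to 20 units per hour. The quality-adjusted measure narrows the gap to 22.5 versus 20, which shows how the ranking of workers can depend on which dimension is measured.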

Figure 2 shows a range of studies that primarily use performance measures showing how quickly a worker performs (i.e. quantity-based studies) [1], [2], [3], [4], [6]. Another group of studies utilizes performance measures that are either based on quality, or quality-adjusted measures [7], [8]. Only a few studies use several measures, based on different dimensions [5], [9].

Although trade-offs between dimensions (e.g. between quality and quantity) are not unlikely, there are often incentives in place to prevent workers from focusing their effort on only one of the dimensions. Both explicit incentives, for example when a worker's measured output falls because a defective product or service has to be redone [1], and implicit incentives, for instance when supervisors can identify low-performing workers and are able to sanction them, help to mitigate problems related to multiple dimensions.

Data sources and aggregation

Most studies that use measures of worker productivity employ data from firms, for example, from firms’ internal databases, or public data, for example, publication databases. In some cases, measures can be directly used, whereas in others, they first need to be estimated. For teachers, for example, teacher value-added measures are based on an estimation of students’ test scores, controlling for student and school characteristics, and capture the additional contribution a teacher makes to the students’ grade outcomes [8]. Studies have shown that this measure of teachers’ productivity is indeed predictive of students’ later outcomes, such as college attendance and salaries [8]. Both types of measures, so-called direct measures and estimated measures, allow analyses of the effects of firms’ policies on workers’ productivity at the individual level.
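The idea behind an estimated measure such as teacher value-added can be sketched as follows. This is a deliberately simplified two-step illustration, not the estimator used in [8]: it assumes that a single prior test score is the only control variable, fits a one-variable least-squares line, and averages the residuals by teacher. The function name and the data are hypothetical.

```python
from collections import defaultdict


def value_added(records):
    """Average test-score residual per teacher.

    records: list of (teacher, prior_score, current_score) tuples.
    Assumption: the prior score is the only control; real applications
    also control for further student and school characteristics.
    """
    n = len(records)
    mean_prior = sum(r[1] for r in records) / n
    mean_score = sum(r[2] for r in records) / n
    var_prior = sum((r[1] - mean_prior) ** 2 for r in records)
    if var_prior == 0:
        raise ValueError("prior scores must vary across students")
    cov = sum((r[1] - mean_prior) * (r[2] - mean_score) for r in records)
    slope = cov / var_prior
    intercept = mean_score - slope * mean_prior

    # Residual = actual score minus the score predicted from the prior score.
    residuals = defaultdict(list)
    for teacher, prior, score in records:
        residuals[teacher].append(score - (intercept + slope * prior))
    return {t: sum(v) / len(v) for t, v in residuals.items()}


# Hypothetical data: both teachers have students with the same prior scores.
records = [
    ("T1", 50, 62), ("T1", 60, 72), ("T1", 70, 82),
    ("T2", 50, 58), ("T2", 60, 68), ("T2", 70, 78),
]
print(value_added(records))
```

In this constructed example, students of teacher T1 score two points above the prediction and students of T2 two points below it, so the estimated value-added is roughly +2 and -2.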

For some jobs, though, workers’ performance cannot be measured at the individual level, but only at a more aggregated level. Examples of this include firms that rely on team-based production, that is, where output is jointly produced by a team of workers, or joint publications in academic research. If there is sufficient variation in team composition, it is possible to separately identify the individual contribution of team members.
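Why variation in team composition matters can be shown with a small Python sketch. It assumes, purely for illustration, that team output is the sum of individual contributions, and it solves the resulting least-squares problem with a hand-written elimination routine. The function name, workers, and output figures are hypothetical.

```python
def estimate_contributions(teams, outputs, workers):
    """Least-squares estimate of individual contributions to team output.

    Assumption: team output is additive in its members' contributions.
    teams: list of tuples of worker names; outputs: one number per team.
    """
    k = len(workers)
    index = {w: i for i, w in enumerate(workers)}
    # Membership matrix: one row per team, 1.0 if the worker is on the team.
    X = [[0.0] * k for _ in teams]
    for r, team in enumerate(teams):
        for w in team:
            X[r][index[w]] = 1.0
    rows = range(len(teams))
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [
        [sum(X[r][i] * X[r][j] for r in rows) for j in range(k)]
        + [sum(X[r][i] * outputs[r] for r in rows)]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-9:
            raise ValueError("not identified: too little variation in team composition")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return {w: A[i][k] / A[i][i] for w, i in index.items()}


# Three workers rotate through all possible pairs, so each one is identified.
teams = [("A", "B"), ("A", "C"), ("B", "C")]
outputs = [10, 12, 14]
print(estimate_contributions(teams, outputs, ["A", "B", "C"]))
```

With rotating pairs, the sketch recovers contributions of about 4, 6, and 8 for workers A, B, and C. If A and B only ever work together, the system has no unique solution and the function raises an error, which mirrors the requirement of sufficient variation in team composition.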

Time dimension

An advantage of direct, that is, individual, measures of performance is that these are often available at a higher frequency compared to measures of aggregate productivity or wages. Figure 2 shows that some of the selected measures are available by the week, day, and in some cases, even by the minute.

These high-frequency measures can inform a range of research questions and thereby assist in the decision-making process. They allow variation over time to be measured in a way that is not possible in studies using lower-frequency data. One example of variation across time comes from changes that are unrelated to the workers’ effort, such as changes in customer demand in service-sector jobs, or weather conditions in agriculture. Depending on the setting, it might be important to take these effects into account, for example, by controlling for the specific day of the week.
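One simple way to control for the day of the week is to subtract the average performance observed on each weekday, so that what remains is a worker's deviation from a typical day of that kind. The Python sketch below assumes this demeaning approach; the function name and the observations are hypothetical.

```python
from collections import defaultdict


def adjust_for_weekday(observations):
    """Subtract the weekday average from each performance observation.

    observations: list of (worker, weekday, performance) tuples.
    Returns tuples with performance expressed relative to the weekday mean.
    """
    by_day = defaultdict(list)
    for _, weekday, performance in observations:
        by_day[weekday].append(performance)
    day_mean = {day: sum(v) / len(v) for day, v in by_day.items()}
    return [
        (worker, weekday, performance - day_mean[weekday])
        for worker, weekday, performance in observations
    ]


# Hypothetical daily output: demand is higher on Fridays for everyone.
observations = [
    ("A", "Mon", 10), ("B", "Mon", 14),
    ("A", "Fri", 20), ("B", "Fri", 24),
]
for row in adjust_for_weekday(observations):
    print(row)
```

In the raw data, worker A on a Friday (20) appears more productive than worker B on a Monday (14). After the adjustment, B is two units above the daily average on both days and A two units below, so the Friday effect no longer masks the difference between the two workers.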

The contrary can be the case if the production process is lumpy, that is, if output occurs infrequently, or if the aim of the measure is to capture long-term performance. Examples are the number of research publications for the former, and CEO performance for the latter. In both cases, it is important to set measurement periods that are sufficiently long.

How can measures of worker productivity be used to inform policy?

Studies using measures of worker productivity originated from performance analyses in private-sector firms, usually focusing on questions and decisions related to human resource (HR) management. These studies, which often make use of a single firm's personnel data, are referred to as “insider-econometrics studies.” They exploit external shocks to identify the causal effects of treatments on an outcome, using the performance measure as a metric [1], [4], [6]. Increased digitalization facilitates the gathering of performance data for both researchers and managers. In fact, firms collect performance measures not only to evaluate workers, but also to experiment and improve aspects related to HR management. A reliance on direct measures of productivity is by no means limited to the private sector. Measures of workers’ productivity have been applied to a wide range of fields, such as research and education, health care, and politics (see Figure 2).

Using workers’ productivity for setting incentives

A large number of studies use direct measures of workers’ productivity in conjunction with either a change in the organization's incentive system or a randomized experiment to analyze how incentives affect individuals’ behavior. The common finding is that monetary incentives affect workers’ effort, resulting in higher performance. Besides the effect on workers’ intrinsic motivation, incentives can also affect the sorting of workers, which can explain up to 50% of the gains in productivity [1].

These types of studies have also been applied to organizations whose primary goal is not that of profit maximization, such as schools, universities, and even politics. In US charter schools (schools that receive government funding but operate independently of the established public school system in which they are located), teacher value-added measures are increasingly used to evaluate and incentivize teachers.

More direct worker productivity measures are available in academic research as well as in firm-based research and development. Scientists’ productivity can be measured according to various metrics, such as patent statistics, (quality-adjusted) counts of publications, or citations [7]. As is often the case in the private sector, the introduction of monetary incentives for publications is shown to increase research productivity.

Although less obvious, the same logic used to measure workers’ productivity can be applied to almost any field that has some degree of quantifiable outcomes. One group of workers that is rarely associated with productive behavior is politicians. But even for this group, it is possible to construct performance measures to demonstrate how, for example, remuneration of politicians can be optimally designed. One method that could be used to measure politicians’ performance is the number and type of bills submitted [11].

There are two important aspects to keep in mind when defining incentive systems. First, the time frame over which performance is evaluated can matter: if the evaluation date is close at hand, workers might increase their effort in order to achieve a bonus, or they might decrease effort if the bonus is out of reach. Higher-frequency performance measures, for example, at the daily or weekly level, can help to understand these dynamic patterns in how individuals respond to incentives [12]. Second, if only some of several performance dimensions are targeted, workers are likely to focus on the incentivized dimensions and neglect the unincentivized ones, which might ultimately make the targeted measure a poor measure of performance (Goodhart's law). In the worst cases, workers might even manipulate the data underlying their performance measures, as in the case of teachers changing pupils’ grades in reaction to changes in teacher incentives [14].

Formal and informal human capital acquisition in firms

Measures of worker productivity can also be used to estimate how workers accumulate human capital. Informal learning on the job as well as formal learning in training programs can increase workers’ human capital, and thus individual productivity as captured by performance measures.

This can be shown, for instance, by analyzing how job performance increases with job tenure. The common finding is that performance increases strongly in the early employment period, whereas the marginal return to tenure decreases over time [10]. Likewise, workers’ productivity measures can be used to estimate the returns to training programs [4]. In both cases, direct measures of workers’ productivity allow for an estimation of how learning affects an individual's job performance. In contrast, wages, which are often fixed in the short term, might not exhibit these effects. Further, wages would only capture returns to training if the benefits from increased productivity were shared between employer and employee.

Peer effects

Peers affect people's behavior, both at work and in other environments. In the workplace, peers can affect workers’ behavior either through social pressure, or because workers learn from them. Although human capital externalities have long been part of growth models in macroeconomics, peer effects in the workplace have been analyzed only in recent years. This may be due, in part, to the availability of detailed information on workers’ networks and detailed measures of their productivity. From a methodological point of view, studies require detailed information on the workplace and/or exogenous variation in peer composition, for example, through random team composition [2], [4], [6], or exogenous exits [7].

Working hours and performance

Another example of how direct measures of workers’ productivity can inform policy involves working hours. The length of the standard workweek varies both between and within countries and is the subject of policy debate, for example about shortening it to a four-day week. Despite this, relatively little is known about the productivity effects of weekly or daily working hours. Direct measures of workers’ productivity allow estimation of how the number of working hours or shifts in working hours affect performance; this might be crucial with respect to health and safety. Long working hours, for example, might result in increased fatigue, which is particularly dangerous in medical occupations [9]. This type of result can also provide direct advice about whether working hours should be expanded or reduced.

Gender differences

One advantage of direct performance measures is that they are often objective and thus less likely to be affected by subjective evaluations. This objectivity can be especially useful when analyzing wage gaps between groups in the labor market, for example, between men and women. Although gender pay gaps have decreased over time, women still earn considerably less than men. While there are many arguments for why gender pay gaps exist, it is important to understand to what degree the gender pay gap depends on underlying productivity differences. Direct measures of performance can help examine whether gender pay gaps are based on differences in productivity, how these gaps can be explained, and how possible gender differences in performance affect career choices. For lawyers, gender gaps in productivity are found to occur early in their careers, and have long-lasting effects on earnings and career advancement [5]. While objective measures of performance can help to identify gender gaps, it is more difficult to assess how accurately they reflect differential effort choices by men and women, for example, if workers expect to be discriminated against. For university teachers, studies have found teaching evaluations to be biased against female teachers [13], which might induce female teachers to put more time and effort into teaching-related activities.

Public policies and worker behavior

Not only managerial policies, but also public policies can affect worker behavior. If changes in employment protection rules increase the risk of becoming unemployed, or if unemployment benefits are decreased, workers might respond by providing higher effort to decrease their chances of becoming unemployed. In general, relatively little is known about the effect of policies or shocks external to the firm on worker effort, and thereby on performance measures at the worker level. For supermarket cashiers, a group of workers without specific skills, a decrease in unemployment insurance benefits has been shown to increase worker effort [3]. This is not only important at the firm level, but also when thinking about costs and benefits of public policies.

Working from home and measuring performance

Working from home (WFH) has gained increased traction as a result of the Covid-19 pandemic. A substantial share of the workforce was encouraged to work from home in order to decrease the spread of Covid-19. Even after most restrictions have been withdrawn, WFH remains more common than before the pandemic. With respect to measuring worker productivity, there are two important questions: first, how can worker performance be measured with higher take-up of WFH? Second, do workers perform better or worse when working from home? With regard to the first question, the same logic of performance measurement applies to jobs done at home. In some jobs, performance measurement is easier than in others; input measures, such as hours, might be more difficult to control, whereas output measures should be equally easy (or difficult) to measure. Employers might, however, be more reluctant to allow WFH for jobs where monitoring is difficult. With respect to the second question, the (scarce) evidence so far points to positive effects on performance measures [15].

Limitations and gaps

It is evident that there is no universal measure of workers’ productivity. Rather, there are various measures that capture workers’ performance in their specific settings. While this has the advantage of allowing highly detailed analysis of the determinants of workers’ productivity, it comes at a cost.

First, even if performance can be measured for a wide range of occupations, the measures used are based on single workplaces, which are not representative of the whole sector or economy. While these data allow for comparisons between workers within the same tasks or occupation, it is rarely possible to make direct comparisons of workers’ performance across different occupations. It is, however, possible to compare estimates based on several studies to draw more representative conclusions.

Second, the specificity of many performance measures makes comparisons over long time periods or across multiple countries difficult. While performance measures often allow for the analysis of day-to-day or week-to-week variations of worker behavior, they are generally only available for rather short periods of time [4], [6]. Only a few studies exploit longer data sets covering several years, which allow for the analysis of trends [8]. For these reasons, direct measures of workers’ productivity are not always appropriate. Instead, other measures, such as wages or firm value added may be more useful.

Lastly, although many occupations include some measures of performance, these often capture only one dimension. In some cases, for example, when designing and evaluating incentive schemes, observing just one measure of performance might hide important aspects of workers’ behavior, due to the multidimensional nature of productivity.

Summary and policy advice

Direct measures of workers’ productivity are not only used in personnel economics, but also in other fields, such as education and health, and are used in both the private and the public sectors. These measures are useful to inform policymakers about individuals’ behavior at work, and to improve decision making, such as setting appropriate incentives.

Measures of workers’ productivity represent a relatively new tool for economics research. Even if there is no universal definition of workers’ productivity, studies using these measures have made important contributions to a wide range of fields. These contributions are not limited to low-skilled routine jobs where performance measurement might be easier. Rather, measures of workers’ productivity are available for a wide range of jobs, including low- and high-skilled jobs in both the private and public sectors.

Although these studies are often based on single firms, which are not representative of a whole sector or economy, the detail at which workers’ behavior can be observed permits the examination of questions that are difficult to address using survey or register data. Given the specificity of the settings, an appropriate way to achieve more “representative” results is to conduct studies with similar questions in different settings, for example, in different firms. Meta studies can then be used to find a consensus, which could come close to the average effect for an economy.

Given the current state of research on using workers’ productivity measures, policymakers and managers alike should select the right measures that will help them make informed decisions, for instance, when it comes to setting and designing incentives, or when regulating working hours.

Acknowledgments

The author thanks two anonymous referees and the IZA World of Labor editors for many helpful suggestions on earlier drafts. The author also thanks Nora Döring. Version 2 of the article updates the “Illustration” and Figure 2, includes new research on public policies and worker behavior and working from home and performance measures, and adds new “Key references” [3], [5], [7]–[15].

Competing interests

The IZA World of Labor project is committed to the IZA Code of Conduct. The author declares to have observed the principles outlined in the code.


Choosing the right performance measures can inform and improve decision-making in policy and management
