Performance evaluation will not die, but it should
Kevin R. Murphy
Abstract
A wide range of systems for evaluating performance have been used in organisations, ranging from traditional annual performance appraisals to performance management systems built around informal, real‐time evaluations, and these systems almost always fail. Rather than continuing to make cosmetic adjustments to these systems, organisations should consider dropping the practice of regularly evaluating the performance of each of their employees, focusing rather on the small subset of situations in which evaluations of performance and performance feedback are actually useful. Four barriers to successful performance evaluation are reviewed: (a) the distribution of performance, (b) the continuing failure to devise reliable and valid methods for obtaining judgments about performance, (c) the limited utility of performance feedback to employees, and (d) the limited utility of performance evaluations to organisations. In this paper, I propose ways of managing performance without relying on regular performance evaluation, refocusing managers' activities from performance management to performance leadership.
1 INTRODUCTION
Organisations use a range of systems to evaluate, manage, reward, and direct the job performance of their employees. These often take the form of formal performance appraisal systems, which include annual reviews of employee performance, formal feedback sessions or appraisal interviews, efforts to calibrate evaluations across departments or divisions, and the use of appraisals to drive key human resource management decisions, such as salary increases, training, or even the separation of poor performers (Murphy, Cleveland, & Hanscom, 2018, provide the most recent review of research on performance appraisal). Some appraisal systems are built to motivate future performance, by linking evaluations of performance with valued rewards, whereas others are designed to identify poor performers and either correct their performance or separate them from the organisation (Murphy & Cleveland, 1995; Welch & Byrne, 2001). Other organisations employ performance management systems that are built to align the performance goals and activities of employees, work groups, departments, and divisions with the broad strategic goals of the organisation and support employees in executing the plans and strategies that are used to achieve unit goals (Aguinis, 2013; Pulakos, Mueller‐Hanson, Arad, & Moye, 2015; Pulakos & O'Leary, 2011). Performance management systems are sometimes built around traditional performance appraisal methods (e.g., annual evaluations of performance, see Pulakos et al., 2015), but increasingly these systems are built around more streamlined and informal evaluations, focusing on real‐time feedback rather than annual summaries of performance (Aguinis, 2013).
Regardless of how they are designed or configured, performance appraisal and performance management systems are almost always rated as failures by both employees and management (Adler et al., 2016; Murphy et al., 2018; Pulakos et al., 2015). In this paper, I will argue that there is a common feature in virtually all performance appraisal and performance management systems that contributes substantially to their failure—that is, they are built around subjective evaluations of job performance (Murphy et al., 2018).
The central thesis of this article is that the process of performance evaluation in organisations is fundamentally flawed, regardless of the specific form performance evaluation systems take and that radical changes are called for in organisational systems that depend upon these evaluations. That is, I call for the development of performance management systems in organisations in which the supervisor's judgment about the adequacy or the level of each employee's performance is not sought or conveyed to every employee on a regular basis.
It is useful to define precisely what I mean by “performance evaluation.” Performance evaluation is a process in which one or more individuals in organisations (typically supervisors) observe and obtain information about the job performance and effectiveness of individual employees. They use this information to make subjective, evaluative judgments about the performance of individuals. The term “subjective” is used in the same sense as in Landy and Farr (1983); that is, an evaluation is subjective if it requires judgment and cannot be arrived at by a simple objective count. “Subjective” does not imply that these judgments are biased or inaccurate, simply that they are not subject to external objective verification. This lack of external verifiability, however, does leave evaluations of job performance open to doubt and challenge. The term “evaluative” means that judgments about performance can be scaled on a negative to positive continuum. That is, performance might be described as poor versus good or as unacceptable versus acceptable; performance evaluations are ultimately statements about the value the evaluator places on the employee's performance. Regardless of the specific form performance appraisal or performance management systems take, all of these systems rely on evaluative judgments about the performance and effectiveness of employees, and that is their Achilles heel.
This paper is divided into two major sections. First, I document the persistent failure of performance appraisal and performance management systems in organisations and the lack of effective responses to these failures. In the second section, I take up the question of whether we should evaluate performance, and I show both why this process is so challenging and why organisations obtain so few benefits and incur so many costs in attempting to evaluate job performance. I offer the potentially radical suggestion that organisations should abandon the whole concept of regularly evaluating the performance of each employee; I end this paper by describing potential replacements for performance evaluation.
2 THE FAILURE OF PERFORMANCE APPRAISAL AND PERFORMANCE MANAGEMENT SYSTEMS
The coming demise of performance appraisal in organisations has long been a staple of the business press. A number of large organisations (e.g., Accenture, Deloitte, Microsoft, GAP, and Medtronic) have abandoned or substantially curtailed their use of formal performance appraisal systems (Buckingham & Goodall, 2015; Capelli & Tavis, 2016; Culbert & Rout, 2010; Cunningham, 2015), and it is easy to see why. Performance appraisals are often described as the “job managers love to hate” (Pettijohn, Parker, Pettijohn, & Kent, 2001, p. 754) and as “one of the most persistent problems in organizations” (Gordon & Stewart, 2009, p. 473). Supervisors and employees dread performance appraisals (Adler et al., 2016), and the great majority of appraisal systems in organisations are viewed as ineffective (Pulakos et al., 2015; Smith, Hornsby, & Shirmeyer, 1996).
Performance management systems do not seem to fare much better; the conclusion that performance management is broken is shared among many researchers and practitioners (Pulakos et al., 2015; Pulakos & O'Leary, 2011). Reviews of research on both performance appraisal and performance management have noted that there is little if any evidence that these systems have any real impact on the performance or effectiveness of employees (DeNisi & Murphy, 2017; DeNisi & Smith, 2014; Murphy et al., 2018; Pulakos et al., 2015; Pulakos & O'Leary, 2011). More than a century of research has been devoted to identifying and fixing the problems with performance appraisal and performance management systems in organisations (Austin & Villanova, 1992; DeNisi & Murphy, 2017), but to date, this research has not led to performance appraisal or performance management systems that are seen by their users as consistently accurate or useful.
Critiques of performance appraisal and performance management systems are often dominated by practical concerns. For example, Deloitte, a global firm offering consultation, auditing, and financial advisory services, found itself devoting a great deal of time and money to performance management; Deloitte's review suggested that the firm was spending two million hours per year completing forms, holding meetings, and conducting performance reviews (Buckingham & Goodall, 2015). The large investments of time and energy these systems require are not unique to Deloitte; many organisations devote significant resources to performance appraisal and performance management systems. Unfortunately, it is not always clear that this investment yields substantial benefits to organisations. On the contrary, the approval of and satisfaction with performance appraisal and performance management systems is often abysmal (Holbrook, 1999; Levy & Williams, 2004; Reinke, 2003; Russell & Goode, 1988; Taylor, Tracy, Renard, Harrison, & Carroll, 1995).
Surprisingly, widespread dissatisfaction has not led most organisations to abandon performance appraisal or performance management. On the contrary, the great majority of medium to large organisations continue to have some sort of formal programme for evaluating the performance of their employees, giving them performance feedback, and using the results of performance appraisals to inform and sometimes to drive decisions about rewards (such as salary increases, promotions, and developmental opportunities) and sanctions (such as layoffs or terminations; Lawler, Benson, & McDermott, 2012; Mercer, 2013). For example, Mercer (2013) surveyed more than 1,000 organisations in more than 50 countries and reported that the vast majority of organisations set individual goals (95%) and conduct formal year‐end review discussions (94%) and that most link individual ratings and compensation decisions (89%). Gorman, Meriac, Roch, Ray, and Gamble (2017) surveyed Fortune 500 firms in the United States and reported that a substantial majority of these firms use formal performance appraisals as a key part of their human resource management strategy. The recent (2015–2016) Cranet survey of organisational policies and practices (Dewenttick & Remue, 2011) provided information from over 4,800 organisations in 24 countries throughout the world. The great majority of firms included in this study (82%) had a formal performance appraisal system in operation, and it was common to use the results of appraisal to make decisions about pay, development, and career moves.
Rather than abandoning formal systems for performance appraisal or performance management, many organisations make periodic attempts to improve their systems, by using improved rating procedures or incorporating a wider array of information in evaluating employee performance. For example, Buckingham and Goodall (2015) describe the radical approach Deloitte took to simplifying their performance rating system, narrowing down a complex performance appraisal form to four simple items (e.g., given what I know of this person's performance, I would always want him or her on my team).¹ Other organisations go in the opposite direction, adding layers to their evaluation system in an effort to increase its accuracy and acceptance. For example, many organisations have adopted some form of multisource feedback (often referred to as 360‐degree feedback) in which performance evaluations and feedback are sought from peers, subordinates, customers or clients, or self‐evaluations in addition to the traditional reviews conducted by supervisors (Atwater, Brett, & Charles, 2007; Bracken, Rose, & Church, 2016; Bracken, Timmreck, & Church, 2001; Taylor & Bright, 2011). Although initially developed in North American organisations, these multisource appraisal systems have become increasingly popular across the globe (Bailey & Fletcher, 2002; Brutus et al., 2006; McCarthy & Garavan, 2001).
Still other organisations have moved away from traditional annual performance appraisals towards performance management systems that involve more frequent and informal performance evaluations and feedback (Aguinis, 2009, 2013; Buckingham & Goodall, 2015; Pulakos, 2009). These systems are often attractive to employers and employees because they seem simpler and more user‐friendly (Murphy et al., 2018), but there are persistent doubts about the effectiveness of performance management systems. There is little evidence that real‐time performance management systems provide better or more useful evaluations of performance than more traditional performance appraisal systems, and even less evidence that they do any better than traditional performance appraisal systems in increasing the performance and effectiveness of employees (DeNisi & Murphy, 2017; DeNisi & Smith, 2014; Murphy et al., 2018).
In thinking about why performance appraisal and performance management systems refuse to die, three conclusions stand out. First, many stakeholders believe that it is important and beneficial to measure performance and to use that information to drive decisions. Second, it is widely believed that performance feedback is valuable and that it helps to improve employee motivation and performance. Third, there does not seem to be any clear alternative to the type of evaluation system most organisations use; virtually every system that has been proposed to replace traditional performance appraisal (e.g., performance management systems, evaluation systems based on objective performance, and productivity measures) has fared as badly, if not worse. I believe that all of these assumptions and conclusions are flawed and that the strategies organisations typically follow in an effort to improve their performance appraisal and performance management systems are also flawed.
The assumption that tends to drive most efforts to improve performance appraisal and performance management systems in organisations is that some surface feature of the system (e.g., the rating scales being used, the schedule for providing feedback, and the way managers are trained to implement these systems) is the problem, and that replacing this feature or set of features with something better will lead to meaningful improvements in these systems. The persistent failure of a wide range of approaches to improving performance appraisal and performance management systems suggests that a new strategy is needed. Rather than continuing to rearrange the surface features of performance appraisal and performance management systems, I propose that it is time to critically examine the concept that underlies virtually all of these systems: performance evaluation.
3 SHOULD WE EVALUATE PERFORMANCE?
The belief that we should evaluate job performance is usually taken as a given, and the debate is almost always over how to measure performance rather than about whether the entire enterprise is misdirected. A critical examination of the belief that it is useful and beneficial to evaluate performance suggests that the problem is not that we are doing performance evaluation badly but rather that we are doing it at all.
The belief that you should evaluate job performance is based on a series of assumptions, none of which are well grounded. First, performance evaluation involves the assumption that there is something meaningful to measure—that is, that people differ in meaningful ways in their effectiveness in performing their jobs. Research on the distribution of job performance calls this assumption into question. Second, the belief that you should evaluate performance makes sense only if it is feasible to do this. The failure of nearly a century of research on performance evaluation to provide methods of obtaining judgments about performance that are reliable and valid calls that belief into question.
These first two assumptions deal with the feasibility of performance evaluation. If there are not broad and meaningful differences in performance, or if they cannot be reliably measured, performance evaluation may be a doomed enterprise. However, even if it is feasible to evaluate performance, it is not clear whether these evaluations have any real value for employees or organisations. For example, the most frequent justification for formal performance appraisal systems in organisations is that they provide performance feedback that is useful to employees (Murphy et al., 2018). Research on both reactions to feedback and the effectiveness of feedback calls this assumption into question. Similarly, organisations often claim that performance evaluations are an important component of their human resource management programmes, driving decisions such as salary increases or training. The reality is that many organisations make little effective use of performance evaluations, either ignoring them, attempting to use them for incompatible and conflicting purposes, or using them in a half‐hearted way that undermines the value of these evaluations.
4 THE DISTRIBUTION OF JOB PERFORMANCE CAN MAKE PERFORMANCE EVALUATION POINTLESS
It has long been assumed that job performance is distributed normally and that there is meaningful variability in job performance and effectiveness. This assumption is explicit in studies designed to estimate the economic utility of human resource interventions (Hunter & Hunter, 1984; Hunter & Schmidt, 1982), in which a central concern is expressing the variability in job performance in monetary terms (Bobko, Karren, & Kerkar, 1987; Bobko, Shetzer, & Russell, 1991). These studies suggest that the variability in job performance is substantial and meaningful, typically producing estimates of the standard deviation of job performance worth the equivalent of 40–70% of the annual salary attached to that job. For example, in a job where the annual salary is $50,000, these methods will often lead to the conclusion that employees near the lower end of the performance distribution (e.g., at the 15th percentile, or one standard deviation below the mean) produce goods and services worth approximately $25,000, whereas those nearer to the top of the distribution produce goods and services worth approximately $75,000. If you assume that performance is normally distributed, all you need to know is the mean and standard deviation of the distribution of job performance to make statements about the value of performance at any particular level and about the likely economic impact of a wide range of interventions.
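The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mean value of output is taken to equal the $50,000 salary, the standard deviation is set to 50% of salary (within the 40–70% range cited above), and the function name `value_at` is hypothetical rather than drawn from the utility literature.

```python
from statistics import NormalDist

SALARY = 50_000          # annual salary from the example in the text
SD_RATIO = 0.50          # assumed: SD of performance = 50% of salary
MEAN_VALUE = SALARY      # assumed: the average employee produces value equal to salary

def value_at(z: float) -> float:
    """Dollar value of output for an employee z standard deviations from the mean."""
    return MEAN_VALUE + z * SD_RATIO * SALARY

low = value_at(-1.0)
high = value_at(+1.0)
pct_low = NormalDist().cdf(-1.0)   # proportion of a normal distribution below z = -1

print(f"One SD below the mean: ${low:,.0f} (about percentile {pct_low * 100:.0f})")
print(f"One SD above the mean: ${high:,.0f}")
```

Under these assumptions, the two values reproduce the $25,000 and $75,000 figures in the example; a different assumed ratio within the 40–70% range would widen or narrow that gap accordingly.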
In a series of papers, Aguinis and his colleagues have strongly challenged the assumption that job performance is normally distributed (Aguinis & Bradley, 2015; Aguinis, Ji, & Joo, 2018; Aguinis & O'Boyle, 2014; Aguinis, O'Boyle, Gonzalez‐Mulé, & Joo, 2016; Crawford, Aguinis, Lichtenstein, Davidsson, & McKelvey, 2015). They note that many indices of performance and contribution to the organisation appear to follow a power law distribution rather than a normal distribution. This same trend has been noted by other researchers (e.g., Clark, 2012), and it is hardly limited to performance distributions. This power law can be used to describe phenomena ranging from the distribution of wealth and the ownership of resources to the magnitude of earthquakes and the size of cities; it can even be used to characterise the distribution of journal article citations.² Figure 1 illustrates the type of power law distribution that has been widely observed in studies of accomplishments and performance.
It is important to understand precisely what the power law says about job performance. This distribution does not imply that most people are poor performers. Rather, a power distribution suggests that (a) very few people are highly effective at their jobs and (b) the great majority of the distribution is made up of people who are markedly less effective. The bulk of the workforce is likely to represent people who are at least acceptable performers; they are simply not star performers.³
Arguments about the exact shape of this distribution of performance are interesting in an academic sense, but the true importance of the work of Aguinis and his colleagues becomes apparent when one focusses on the contributions different types of performers make.
The key argument Aguinis and his colleagues make is that a handful of top performers or “stars” contribute disproportionately to the overall output of most organisations. Aguinis and Bradley (2015) describe the management of star performers as the “secret sauce for organisational success” and recommend that organisations should devote their attention to a handful of stars, whose contributions are critical to success. Much in the same way that there are a small number of people who are immensely rich or a small number of research articles that attract large numbers of citations, the data these studies review make a compelling case that a small number of people perform at a high level of effectiveness and that differences among the rest of the employees in an organisation are comparatively small and unimportant.
It is useful to think through the implications of the argument by Aguinis and his colleagues that in many settings, there are a handful of stars who perform at a very high level and a very large number of employees whose performance is substantially less stellar and relatively homogeneous. If this is a good representation of the distribution of job performance, there is hardly any need for complex methods of performance evaluation. The stars should be easy to spot, and virtually everyone else will be performing at such a lower level that differentiating among these more average performers will be virtually pointless. Consider, for example, the job of car sales. Most salespersons sell about 10 cars a month, and eight cars a month is often described as poor performance, but a star salesperson might sell 20 or more cars a month.⁴ In a power law distribution, there is little meaningful variation in the performance or contribution of most workers, at least in comparison to the differences between the stars and the rest of the employees. This suggests that there might be no need for a complex system of performance measurement. The differences between stars and the rest will be so large that they will be quite easy to detect, and the differences in performance levels of the individuals who are not stars will be comparatively small and unimportant.
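The contrast between the two distributional assumptions can be made concrete with a small sketch. The code below is an illustration only, not an analysis from the studies cited: it compares the share of total output produced by the top 10% of performers under a hypothetical normal distribution (mean 100, SD 15) and a hypothetical Pareto (power law) distribution with shape parameter 1.5. All parameter values and the function name `top_share` are assumptions chosen for illustration.

```python
from statistics import NormalDist

def top_share(values, top_fraction=0.10):
    """Share of total output produced by the top `top_fraction` of performers."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

n = 10_000
# Evenly spaced quantile midpoints keep the example deterministic (no random draws).
quantiles = [(i + 0.5) / n for i in range(n)]

# Hypothetical normal performance: mean 100, SD 15 (illustrative values only).
normal = NormalDist(mu=100, sigma=15)
normal_output = [normal.inv_cdf(q) for q in quantiles]

# Hypothetical Pareto performance: minimum 1, shape alpha = 1.5 (assumed).
# Inverse CDF of a Pareto distribution: x = x_min / (1 - q) ** (1 / alpha).
alpha = 1.5
pareto_output = [1.0 / (1.0 - q) ** (1.0 / alpha) for q in quantiles]

normal_top = top_share(normal_output)
pareto_top = top_share(pareto_output)
print(f"Top 10% share of output, normal: {normal_top:.1%}")
print(f"Top 10% share of output, Pareto: {pareto_top:.1%}")
```

With these assumed parameters, the top tenth of performers accounts for only slightly more than a tenth of total output under normality but for a far larger share under the power law, which is the pattern the argument about stars relies on.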
The starting point for virtually any performance evaluation system is the belief that there is something worth measuring and evaluating. If most people perform at very similar levels, and the only distinction that is worth making is the simple and obvious distinction between stars and almost everyone else, the idea that it is worthwhile and important to put much effort into evaluating job performance loses a good deal of its lustre.
5 IT MAY NOT BE POSSIBLE TO OBTAIN RELIABLE AND VALID JUDGMENTAL MEASURES OF PERFORMANCE
Putting aside the question of the distribution of job performance, the belief that it is useful and important to evaluate job performance implies that it is feasible to obtain performance evaluations that provide reliable and credible evidence about job performance. There are some jobs in which objective measures capture many important aspects of job performance (e.g., piece‐rate production and sales), but even in these jobs, objective measures of job performance rarely capture the full range of behaviours that are thought to represent effective performance on the job (Landy & Farr, 1983, provide a detailed review and critique of objective measures of job performance). As a result, the evaluation of job performance almost always depends on the subjective judgments of supervisors, peers, or other sources that might be asked to evaluate particular employees (Murphy et al., 2018; Murphy & Cleveland, 1995).
There is a very large literature dealing with performance ratings and other subjective measures of job performance, and it is not possible to summarise it entirely here. Suffice it to say that important concerns have been raised about the feasibility of measuring job performance well with judgments. First, there is considerable evidence that ratings of job performance are not reliable. Viswesvaran and his colleagues (e.g., Ones, Viswesvaran, & Schmidt, 2008; Viswesvaran, Ones, & Schmidt, 1996) have documented the pervasive levels of disagreement in job performance ratings and have suggested that the reliability of job performance ratings is approximately .50.⁵ Well‐developed tests often show much higher levels of precision, with reliabilities in the .85–.90 range. Measures with reliabilities lower than .70 are often described as not sufficiently reliable to be used in making important decisions (Nunnally, 1978). Interrater reliability is poor when ratings are obtained from similar raters (e.g., managers); even lower levels of interrater reliability are reported when ratings are obtained from different sources (e.g., peers vs. supervisors; Conway & Huffcutt, 1997; Valle & Bozeman, 2002).
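One way to see what these reliability figures imply is the standard error of measurement from classical test theory, SEM = SD × sqrt(1 − reliability). The sketch below is an illustration rather than an analysis from the studies cited; scores are assumed to be standardised (SD = 1), and the function name is hypothetical.

```python
import math

def standard_error_of_measurement(reliability: float, sd: float = 1.0) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Reliability values taken from the text; SD = 1 is an assumption (standardised scores).
for label, r in [("performance ratings", 0.50),
                 ("decision threshold", 0.70),
                 ("well-developed tests", 0.90)]:
    sem = standard_error_of_measurement(r)
    print(f"{label:>22}: reliability {r:.2f} -> SEM {sem:.2f} SD")
```

Under these assumptions, a reliability of .50 leaves measurement error of roughly 0.71 standard deviations, compared with roughly 0.32 at a reliability of .90, so an individual rating locates an employee's standing far less precisely than a well‐developed test score would.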
The weakness of subjective evaluations as measures of performance is not limited to low interrater reliability. A number of studies have attempted to evaluate the validity of evaluative judgments about performance by modelling the sources of variability in ratings of performance. If performance ratings were a reasonably accurate index of job performance, you would expect that most of the variability in ratings would be due to the performance of the people being evaluated and that relatively little of the variance would be due to extraneous factors such as the source of ratings (were ratings obtained from supervisors or peers), the timing of performance ratings, or irrelevant attributes of the people being evaluated (e.g., gender and attractiveness). This is not what has been found. For example, Scullen, Mount, and Goff (2000) and Greguras and Robie (1998) attempted to explain the variability in ratings obtained from multiple raters. They found that about a third of the variance in performance evaluations is due to differences in ratee performance (i.e., some ratees are better performers than others) and that the majority of the variability in ratings of job performance is due to irrelevant factors such as unique rater biases. Studies by Greguras, Robie, Schleicher, and Goff (2003), Dierdorff and Surface (2007), Hoffman, Lance, Bynum, and Gentry (2010), and Woehr, Sheehan, and Bennett (2005) show a similar pattern, documenting the importance of factors other than the performance in determining the evaluations different employees receive.
Murphy (2008) and DeNisi and Murphy (2017) summarised substantial bodies of research that suggest that the types of performance measures most widely used in performance appraisal and performance management (i.e., ratings of performance by supervisors, peers, or some other source) are at best weak indicators of individual job performance. They are not reliable and are influenced by a host of factors that have little or nothing to do with the performance of the person being evaluated. Despite almost a century of research on developing better methods of evaluating performance, little progress has been made in developing methods for obtaining reliable and valid evaluations of job performance. A wide range of interventions (e.g., rater training, improving rating scales, and adding more levels of rating) have been tried over the last century, and none of them has led to substantial improvements in subjective ratings as measures of job performance. Arguably, it is time to admit defeat and conclude that subjective evaluations of job performance are simply not adequate measures of people's effectiveness (Adler et al., 2016).
6 PERFORMANCE FEEDBACK IS NOT ACCEPTED AND IS NOT USEFUL TO MOST RECIPIENTS
One of the major justifications for investing in performance appraisal and performance management systems is that they provide employees with valuable performance feedback. It is far from clear that the assumption that performance feedback is valuable is justified. There is evidence feedback can be useful, especially when people have little experience with their job, and the feedback contains information they might not normally have access to (Li, Harris, Boswell, & Xie, 2011). However, the proposition that giving most employees performance feedback is valuable is not one that receives strong support in the literature. Kluger and DeNisi (1996) conducted the most wide‐ranging review of studies of the effects of feedback. They found that feedback leads to improvements in performance in roughly a third of the studies they reviewed. However, feedback leads to decreases in performance in a roughly equal number of studies. The other third of the studies they reviewed suggested that feedback has little discernable effect.
Studies of performance feedback in organisations are similarly discouraging (Seifert, Yukl, & McDonald, 2003; Smither, London, & Reilly, 2005). The most widely studied programmes are those that provide feedback from multiple sources (e.g., peers, supervisors, subordinates, and clients), on the theory that each source might provide unique and valuable information. Like other feedback programmes, the effects of multisource feedback on subsequent behaviour and performance appear to be mixed, at best (Atwater et al., 2007; Atwater, Waldman, & Brett, 2002; Seifert et al., 2003; Smither et al., 2005). Research on performance management leads to similar conclusions. A defining feature of most performance management programmes in organisations is that they provide frequent informal feedback to employees (Aguinis, 2009, 2013; Pulakos, 2009). There is, however, little evidence that this feedback leads to improvements in performance (DeNisi & Smith, 2014). Indeed, the whole idea of giving frequent feedback may be dubious. Put yourself in the place of an employee who feels like the performance feedback he or she gets is inaccurate and biased (i.e., too low). Do you think a new system that features a lot more feedback will seem appealing? Many employees actively avoid performance feedback (Moss, Valenzi, & Taggart, 2003); adding more feedback may do little more than increase their stress.
In longitudinal studies of feedback systems, there is evidence that the largest performance improvements come early in the process and that subsequent feedback might have less influence on the behaviour of recipients (Reilly, Smither, & Vasilopoulos, 1996). That is, the first time you give an employee feedback, it might be useful. The fifth or sixth time you give similar feedback is likely to feel more like nagging than a sincere attempt to help the employee. This suggests that rather than giving everyone regular feedback, organisations would be better served by giving feedback only where it is useful. I will examine approaches for targeting feedback later in this paper.
Finally, it is important to note that while there are good reasons for skepticism about the general value of performance feedback, there is considerable variability in employees' interest in and willingness to act on performance feedback. Many employees dread receiving (and many supervisors dread giving) performance feedback (Adler et al., 2016). However, there are some employees who actively seek feedback and who are willing to use that information to change their behaviour. Research on feedback‐seeking behaviour (e.g., Anseel, Beatty, Shen, Lievens, & Sackett, 2015; Ashford, Blatt, & VandeWalle, 2003; Ashford & Cummings, 1983) has identified a wide range of individual and situational factors that can lead individuals to actively seek out performance feedback. Unfortunately, the people who need and would benefit from performance feedback (e.g., poor performers) often actively avoid feedback (e.g., Moss et al., 2003; Moss, Sanchez, Brumbaugh, & Borkowski, 2009), and some feedback seeking is probably an effort to get bosses to recognise good performance rather than an effort to improve future performance (Nae, Moon, & Choi, 2015). Although there are good reasons to doubt that all employees want or benefit from performance feedback, there are certainly some employees who do.
7 CONCERNS OVER FAIRNESS AND ACCURACY UNDERMINE THE POTENTIAL VALUE OF FEEDBACK
The track record on feedback is not encouraging, but it is also not entirely negative. This suggests there might be some situations in which feedback has the potential to lead to improvements in performance. However, research on the effects of performance feedback in organisations suggests a potentially serious barrier to the effectiveness of even the most carefully targeted feedback. There is considerable evidence that in order to work, performance feedback must be accepted by recipients as fair and valid (Anseel & Lievens, 2009; Hedge & Teachout, 2000; Keeping & Levy, 2000; Leung, Su, & Morris, 2001). Unfortunately, the type of evaluative judgments that underlie performance appraisal and performance management are rarely seen as accurate or fair.
First, performance ratings, even when they are truly accurate, are often seen by recipients as unduly harsh. There is extensive evidence (e.g., Campbell, Campbell, & Ho‐Beng, 1998; Harris & Schaubroeck, 1988; Meyer, 1980; Thornton, 1980) that people view their own performance more favourably than do their supervisors, their peers, or other external raters. Indeed, the tendency for people to view ratings they receive from others as unfairly low has been identified by Murphy et al. (2018) as one of the principal structural sources of failure in performance appraisal systems. If people receive performance ratings that are perfectly accurate, they are likely to perceive them as being too low and to dismiss them as inaccurate or distort the feedback they receive to make it feel more positive (Waung & Highhouse, 1997). One result is that performance feedback, even when it is accurate, is likely to be dismissed or distorted rather than accepted and acted upon.
The persistent gap between the feedback people believe they should get (i.e., that their performance is good) and the feedback they do get (i.e., that their performance is not that good) contributes to a destructive cycle of cynicism and distrust that undermines the value of performance evaluations and performance feedback. Murphy et al. (2018) discuss the “death spiral” of performance appraisal systems, describing ways in which disappointing experiences with performance evaluation lead to higher levels of cynicism and disengagement, and show how these negative experiences feed upon themselves. The belief that the systems used in organisations to evaluate performance are exercises in organisational politics rather than performance measurement is particularly destructive in this regard (Longenecker, Sims, & Gioia, 1987).
Employees who receive evaluations that are less positive than they believe they deserve will naturally ask themselves why they are being undervalued, and if they reach the conclusion that supervisors and managers are distorting evaluations to help advance their own interests and the interests of their favourites, trust in performance evaluations is likely to sink quickly (Curtis, Harvey, & Ravden, 2005; Tziner, 1999). There is evidence that political considerations of this sort do influence performance evaluations (Tziner, 1999; Tziner, Latham, Price, & Haccoun, 1996) and that a politicised work environment can contribute to less positive attitudes and even lower levels of performance (Hochwarter, Witt & Kacmar, 2000). If the gap between the evaluation you receive and the evaluation you believe you deserve is attributed to systematic biases or self‐seeking behaviour on the part of evaluators, this belief can fatally undermine your willingness to accept and act upon performance feedback.
This belief that feedback is not accurate or fair has been shown to lead to reductions in motivation and to lower levels of willingness to comply with suggestions for improvement (Kinicki, Prussia, Wu, & McKee‐Ryan, 2004). Because they are subjective judgments of value or worth, performance evaluations are more likely to breed resentment than feedback that might be obtained from objective sources (e.g., sales reports). Research documenting negative reactions to feedback is so well established that a considerable amount of attention has been given to ways of providing feedback that will dampen negative reactions. For example, rather than giving feedback about something that has already happened, Kluger and Nir (2010) suggest using feedforward.
The feedforward interview focuses on (a) articulating what has gone well, by eliciting descriptions of positive experiences from the target; (b) understanding how the strengths of the target and the context in which the event or experience occurred contributed to that positive experience; and (c) helping the target to apply those strengths and/or situational resources to increase the likelihood of success and positive experiences in the future. Feedback is often focused on mistakes or what went wrong, and even when it focuses on what went well, there is rarely any systematic exploration of why it went well. Building on the idea of feedforward rather than feedback, Bouskila‐Yam and Kluger (2011) proposed a fundamental reorientation of performance appraisal and feedback. Rather than attempting to measure performance levels or to give feedback about performance deficiencies, they proposed a strength‐based performance appraisal system. Borrowing key ideas from positive psychology (Seligman & Csikszentmihalyi, 2000), they proposed that performance reviews should focus on what people do well and should work towards establishing goals that are based on strengths rather than weaknesses.
The recent movement towards feedforward and strength‐based assessments suggests that it might be possible to reduce some of the negative reactions performance feedback so often engenders. However, the very fact that such programmes are needed is clear testimony to one of the fundamental weaknesses of most forms of performance evaluation. Many people do not want to receive evaluative feedback and are not likely to use that feedback as a tool for improving their performance (Murphy et al., 2018), especially if the feedback is less positive than what they believe they deserve.
8 PERFORMANCE EVALUATIONS ARE NOT PARTICULARLY USEFUL TO ORGANISATIONS
Organisations devote significant resources to obtaining performance evaluations, and it is fair to ask what they actually do with this information. Cleveland, Murphy, and Williams (1989) suggested that there were four broad purposes for evaluating job performance: (a) to make distinctions between individuals, such as identifying the best candidates for salary increases or promotions; (b) to make distinctions within individuals, such as identifying individual strengths and weaknesses for the purpose of determining training and development needs and priorities; (c) to support HR systems in organisations, such as validating personnel tests or evaluating the success of training programmes; and (d) documentation, such as providing a record to support decisions such as promotions or dismissal.
Several surveys have shown that performance evaluations are most commonly used in organisations for two purposes: to provide information that can be useful for making training and development decisions and to serve as input for decisions about salary, promotions, layoffs, and dismissals (Dewettinick & Remue, 2011; Mercer, 2013; Milliman, Nason, Zhu, & De Cieri, 2002). For example, in a recent study of performance appraisal and performance management practices in over 3,000 organisations in 24 countries, Morley, Murphy, Cleveland, Heraty, and McCarthy (2019) reported that over 80% of these organisations claim to use the results of performance evaluations to make decisions about employee development and over 75% reported using this same information when making decisions about career moves (e.g., promotions and transfers). Unfortunately, it has been known for over 50 years that these two purposes put conflicting demands on performance evaluation systems (Meyer, Kay, & French, 1965). For the purposes of assessing training and development needs, performance ratings that highlight differences within individuals, separating strengths and weaknesses, are most useful. However, if each performance appraisal has clear peaks and valleys (i.e., high ratings on some aspects of performance and low ratings on others), the average rating each individual receives will tend to be similar, making it difficult to distinguish between employees. On the other hand, performance appraisals that are designed to make distinctions between individuals function best when there is clear separation in terms of the average ratings people receive, and this only occurs when there are not substantial peaks and valleys in individual rating profiles (Murphy et al., 2018; Murphy & Cleveland, 1995). Thus, a performance appraisal system that is particularly useful for assessing training needs is likely to be nearly useless for identifying candidates for a raise or promotion, and vice versa.
Unfortunately, the majority of organisations that have formal performance appraisal systems appear to use them for these two conflicting purposes, often ending up with appraisals that are not useful for either purpose (Cleveland et al., 1989; Murphy & Cleveland, 1995).
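The tension between the two purposes can be illustrated with a small numerical sketch. The ratings below are invented for illustration only (they do not come from any study cited here), and the employee labels and the `summarise` helper are hypothetical. The sketch compares a set of "peaked" profiles, which separate strengths from weaknesses within each person, with a set of "flat" profiles, which separate people from one another.

```python
from statistics import mean, pstdev

# Hypothetical 5-point ratings on four performance dimensions (invented numbers).
peaked = {  # clear strengths and weaknesses within each person
    "A": [5, 2, 4, 1],
    "B": [1, 5, 2, 4],
    "C": [4, 1, 5, 2],
}
flat = {  # little variation within each person
    "A": [4, 4, 5, 4],
    "B": [3, 3, 3, 2],
    "C": [2, 1, 2, 2],
}

def summarise(profiles):
    """Return (spread of employee averages, mean within-person spread)."""
    averages = [mean(ratings) for ratings in profiles.values()]
    within = [pstdev(ratings) for ratings in profiles.values()]
    return pstdev(averages), mean(within)

peaked_between, peaked_within = summarise(peaked)
flat_between, flat_within = summarise(flat)

print(f"peaked: between={peaked_between:.2f}, within={peaked_within:.2f}")
print(f"flat:   between={flat_between:.2f}, within={flat_within:.2f}")
```

In this constructed example, the peaked profiles all average to the same value, so they say a great deal about each person's training needs and nothing about who should be promoted. The flat profiles do the reverse. Real rating data will be less extreme, but the trade-off runs in the same direction.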
The conflicting demands of using performance evaluations for administrative purposes (i.e., salary and promotions) and developmental purposes (e.g., training and developmental assignments) are only one factor limiting the usefulness of performance evaluations in organisations. There are also important characteristics of the evaluations themselves that greatly limit their utility. First, it is common for the vast majority of employees in an organisation to receive similar evaluations, with the great majority of employees rated as “above average” in most organisations (Bretz, Milkovich & Read, 1992; Murphy & Cleveland, 1995). This creates significant range restriction in performance evaluations. If almost everyone receives ratings of “4” or “5” on a 5‐point rating scale, it will be very difficult to distinguish employees who are actually performing at a high level from those who receive high ratings because they will complain and be demotivated if they receive the ratings they deserve (Miceli, Jung, Near, & Greenberger, 1991). Second, there are persistently high levels of correlations among evaluations of different aspects of performance (often referred to as halo; Cooper, 1981a, 1981b; Murphy, Jako, & Anhalt, 1993). This combination of range restriction and high levels of intercorrelation in ratings of separate aspects of performance creates an almost insurmountable barrier to effectively using performance evaluations for one of their most commonly cited uses, that is, training and development. The whole theory of using performance evaluations to guide the choice among training programmes is that each person has a potentially different set of strengths and weaknesses, and the training should be focused on building on strengths and addressing weaknesses. The sad fact is that most performance profiles are relatively flat, with few meaningful peaks and valleys (Cooper, 1981a; Murphy & Cleveland, 1995).
Thus, even if organisations offer the possibility of pursuing different courses of training and development depending on one's overall level of performance and one's unique strengths and weaknesses, performance evaluations seldom prove useful for distinguishing among people on either basis.
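A rough sense of what range restriction costs can be had by counting how many pairs of employees a set of ratings can tell apart at all. The two rating lists below are hypothetical, chosen only to contrast a compressed distribution with a more dispersed one, and `share_of_distinguishable_pairs` is an illustrative helper rather than an established index.

```python
from itertools import combinations

def share_of_distinguishable_pairs(ratings):
    """Proportion of employee pairs whose overall ratings differ."""
    pairs = list(combinations(ratings, 2))
    return sum(a != b for a, b in pairs) / len(pairs)

# Hypothetical overall ratings for ten employees on a 5-point scale.
restricted = [5] * 6 + [4] * 4              # almost everyone rated 4 or 5
dispersed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # ratings use the whole scale

print(round(share_of_distinguishable_pairs(restricted), 2))
print(round(share_of_distinguishable_pairs(dispersed), 2))
```

With these invented numbers, the compressed ratings leave nearly half of all pairs of employees tied, whereas the dispersed ratings separate most of them. The sketch says nothing about whether the differences that remain are accurate; it only shows how quickly compressed ratings lose the ability to rank people.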
The second major purpose of performance appraisal is to make decisions about rewards, such as salary increases. The range restriction that is typically observed in performance ratings substantially undermines the idea of using rewards to motivate performance or using performance to make decisions about rewards. If most employees end up receiving similar evaluations, it may not be possible to effectively link performance with rewards. In fact, it is not simply range restriction in performance evaluations that weakens links between performance and rewards. Organisations are often unwilling to take serious steps to link their decisions with employees' performance levels. This is most apparent in studies of merit pay.
Many organisations claim to provide some sort of performance‐based pay, in which good performance is rewarded and encouraged by higher levels of pay (Gerhart & Fang, 2015; Gerhart, Rynes, & Fulmer, 2009; Schaubroeck, Shaw, Duffy, & Mitra, 2008). Despite the widespread acceptance of the idea that better performance should lead to better pay, serious questions have been raised about the effectiveness of merit pay in most organisations. There is surprisingly little evidence that merit pay systems are effective (Heneman, 1992; Milkovich & Wigdor, 1991). Part of the problem is the discrepancy between what organisations are willing to commit to rewarding merit and what it appears to take to make a real difference.
Murphy et al. (2018) note that merit‐based salary increases are typically in the range of 2–3% of annual salary. There is a good deal of evidence that pay increases are not seen by recipients as meaningful unless they are at least 7% of one's annual salary (Mitra, Gupta, & Jenkins, 1997; Mitra, Tenihälä, & Shaw, 2016). Murphy et al. (2018) suggest that organisations that give merit‐based pay raises in the 2–3% range are more likely to breed cynicism than to motivate their employees. In a typical merit‐based pay system, truly superior performers receive salary increases that are virtually identical to those received by average or poor performers, and this is hardly a recipe for increasing employee motivation and commitment.
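The arithmetic behind this point is simple. The sketch below assumes a hypothetical annual salary of 50,000 (currency unspecified) and assumes, for illustration, that an average performer receives a 2% increase and a top performer 3%, the two ends of the typical range noted above. The 7% figure is the threshold for a meaningful increase reported in the studies cited above. The function and variable names are illustrative.

```python
def raise_amount(salary, percent):
    """Annual increase implied by a percentage raise."""
    return salary * percent / 100

salary = 50_000  # hypothetical annual salary, not a figure from the literature

average_raise = raise_amount(salary, 2)     # assumed raise for an average performer
top_raise = raise_amount(salary, 3)         # assumed raise for a top performer
meaningful_raise = raise_amount(salary, 7)  # threshold reported as meaningful

print(f"Reward for superior performance: {top_raise - average_raise:.0f} per year")
print(f"Shortfall from a meaningful raise: {meaningful_raise - top_raise:.0f} per year")
```

Under these assumptions, superior performance earns 500 more per year than average performance, and even the top performer's increase falls 2,000 short of the amount recipients are likely to regard as meaningful.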
9 IS IT POSSIBLE TO MANAGE PERFORMANCE WITHOUT EVALUATION?
The challenges of putting together performance appraisal and performance management systems are well known; these systems are expensive, time‐consuming, and stressful. Performance appraisal has survived multiple attempts to bury this practice largely because managers, supervisors, and even employees believe, at some level, that it is useful and important to formally evaluate employees' job performance. Performance management appears to be limping along on the same basis; supervisors, managers, and executives continue to believe that it is useful and important to evaluate their employees.
Suppose you accept the argument advanced in the preceding sections, namely that it is not useful, important, or even feasible to formally evaluate the performance of most employees. Could organisations function without performance evaluation? Is not evaluation central to the design of performance appraisals or performance management systems? There are several reasons to believe that getting rid of the practice of evaluating the performance of all of your employees on a regular basis would not substantially disrupt the way organisations manage their human resources.
First, there are many reasons to believe that organisations do function without genuine systems for evaluating the performance of their workers. To be sure, most organisations go through the motions of conducting performance appraisals, giving performance feedback, and incorporating performance ratings into their HR decisions. However, if performance ratings are unreliable and do not distinguish employees in terms of either their overall performance levels or their unique strengths and weaknesses, and if performance feedback is generally ignored or ineffective, organisations are carrying on without functioning performance appraisal or performance management systems. If they are not really measuring performance, and if the measures they do obtain are ignored by employees and have only minimal effects on organisational decisions, it is hard to argue that they have genuine performance appraisal or performance management systems, regardless of all the forms they fill out or the feedback meetings they hold.
Second, the idea that evaluating most employees is essentially pointless and sometimes positively harmful (e.g., negative experiences with evaluation can lead to cynicism and disengagement) does not mean that performance evaluation and attempts to manage performance have to be abandoned in all instances. There are many reasons to believe that organisations and their employees would benefit if the emphasis were switched from providing evaluations of all employees to providing coaching and assistance to the subset of employees who are most likely to benefit from it. For example, there is evidence that developmental feedback can be useful when the recipient is new to the task or is a newcomer in the organisation (Li, Harris, Boswell & Xie, 201; Nurse, 2005; Reilly et al., 1996). Rather than serving as judges who are resented and mistrusted by many of their subordinates, supervisors and managers might act as coaches, providing information, support, and suggestions for activities to achieve employees' goals. Feedback that is tightly focused on those employees who need it and that is oriented towards development rather than evaluation can prove valuable in organisations.
There is clear evidence that employee coaching can be effective (Carr, 2016; Gregory & Levy, 2010, 2011), but it is also clear that there are significant challenges to effective coaching. Coaches are most likely to succeed if they are willing to and skilled at communicating with employees about their job performance, are willing to work with employees and to believe that employees can improve, and are willing to try a range of approaches to helping their subordinates (Gregory & Levy, 2011; Heslin, VandeWalle, & Latham, 2006). Because the feedback coaches provide is typically focused on learning and development rather than evaluation, rewards, and sanctions, employees may be more receptive to this form of help and guidance than they are to traditional performance evaluations. A switch from evaluation to coaching also helps in resolving one of the longest standing controversies in performance appraisal and performance management.
For over 50 years, researchers have advocated separating the two functions that most performance appraisal and performance management systems are designed to serve, that is, making decisions about rewards and sanctions (e.g., pay raises) and making decisions about development (Cleveland et al., 1989; Meyer et al., 1965). Murphy et al. (2018) reviewed the most common explanations for the failure of traditional performance evaluation systems and concluded that the use of performance evaluations to drive decisions about raises, promotions, and other valued rewards represented the most serious barrier to the success of these systems. In traditional performance evaluation and performance management systems, supervisors and managers are strongly motivated to distort their evaluations to blur distinctions between employees and to give evaluations that will cause the fewest problems rather than giving evaluations that are truly useful. The idea of replacing broadly targeted evaluation with carefully targeted coaching avoids the trap of building performance evaluation systems that are doomed to fail because they are pursuing conflicting objectives (Cleveland et al., 1989; Murphy & Cleveland, 1995).
The term “performance evaluation” is one that is no longer frequently used, but it is an apt term. Evaluation is all about judgments of value, and the hard lesson of almost 100 years of research on performance appraisal and performance management is that these judgments are difficult to make and even more difficult to use effectively. Employees do not really want to hear whether or not you value their contributions, in part because they suspect that they will not like what they hear. Organisations pretend that these judgments of each employee's value are important, but they end up doing very little with these judgments. With exceptions at the very high and the very low end of the scale, performance evaluations often have little practical effect on the lives or the work of most employees. Their links to pay are weak, and their links to training, development, and career advancement are often uncertain. Dropping performance evaluations for most employees might seem like a bold, and even a reckless step, but it may in truth represent nothing more than a recognition of the limited roles these evaluations in fact play in the lives of organisations or their members. One of the most persistent complaints about traditional performance appraisal systems is that they involve collecting and recording a tremendous amount of information (see, e.g., Buckingham & Goodall, 2015), which is promptly filed away and forgotten (Murphy et al., 2018; Murphy & Cleveland, 1995). In too many organisations, performance evaluation is a sham and a tiresome exercise in checking the boxes. We would all be better off if organisations could redirect this effort and energy towards that limited set of employees who actually want and need feedback and towards that limited set of situations where this feedback can be put in the form of helpful guidance rather than judgments that will not be accepted or acted upon.
10 REORIENTING PERFORMANCE MANAGEMENT
What would a nonevaluative, coaching‐oriented system of performance management look like? I believe it would look remarkably like the type of system Aguinis (2009, 2013) advocates, with a few minor tweaks. In particular, I believe it would be possible to create successful performance management systems by focusing on the behaviours that have consistently been shown to be essential to successful leadership and coaching. One of the earliest substantive theories of leadership (Bass & Bass, 2008; Tracy, 1987) describes leader behaviour in terms of two key constructs, that is, consideration (exhibiting concern for the welfare of employees and work groups) and initiating structure (i.e., defining roles, plans, and strategies for accomplishing key tasks and responsibilities). There have been many advances in leadership theory since then (Avolio, Walumbwa, & Weber, 2009; Kozlowski, Mak, & Chao, 2016), but the key insights of this early theory still give managers concrete suggestions for improving performance appraisal. Both these early theories of leadership and more contemporary thinking about coaching (e.g., Office of Personnel Management, 2019) suggest that the first task in creating a successful performance management system emphasises initiating structure.
One of the core concepts of performance management is the idea of cascading goals (Aguinis, 2013; Pulakos, 2009; Pulakos et al., 2015). The first responsibility of a supervisor or manager should be to help translate broad unit‐level goals into plans and strategies individuals and work teams can pursue to help realise these goals. That is, managers and supervisors should focus first and foremost on their roles as communicators and translators. The cascaded goals the performance management literature describes tell managers and supervisors what must be done. Their first and most important task is to help in determining how these goals will be accomplished.
Organisations devote substantial amounts of time and resources to train managers to evaluate the performance of their subordinates. In their review of 100 years of performance appraisal research, DeNisi and Murphy (2017) noted that there is little convincing evidence that this training leads to better or more useful appraisals. Rather than training managers to be better judges, I believe it is more useful to devote training resources towards developing coaching skills. Graham, Wedman, and Garvin‐Kester (1994) identify specific skills that underlie successful coaching; two of the most critical of these skills are the ability to communicate clear performance objectives and the ability to link individual employees' behaviour with the performance goals of the unit. Reorienting training towards developing these skills would have the potential to contribute substantially to the success of performance appraisal systems.
In traditional performance appraisal and in many variations of performance management, managers and supervisors have also been responsible for enforcement, that is, monitoring employee behaviour to make sure it is consistent with these plans and goals and correcting deviations from these plans as they occur (Murphy et al., 2018). There is evidence (Milkovich & Wigdor, 1991) that this emphasis on using rewards and sanctions (e.g., pay increases, promotions, and dismissal) to enforce goals and plans is both wrong‐headed and ineffective. Rather than acting as a judge to evaluate deviations from plans and strategies that are imposed from above and forcing employees to get back on track, a coaching‐oriented system of performance management would be built around persuasion, inspiration, and assistance, that is, consideration. That is, the goal of the supervisor or manager should be to build commitment to and engagement with the organisation and its broad goals and to assist and support employees who are having difficulty carrying out their particular roles in executing these plans.
In traditional performance appraisal and performance management systems, there is a great deal of emphasis on managing the behaviour of individual employees by paying attention to whether or not they are doing what their performance plans say they should be doing and using rewards and sanctions to steer their behaviour back in the desired direction when it deviates from that plan. A reformed performance management system would focus on why employees are deviating from plans and identifying ways to help and encourage employees in executing the plans and strategies that have been agreed upon for accomplishing unit‐level goals. Rather than rewarding adherence to and punishing deviations from these plans, managers who are effective at demonstrating consideration will focus on identifying the barriers employees face (e.g., lack of information or resources and lack of understanding of plans) and removing them. To be sure, some performance deficiencies are likely to be a result of the employee's low level of motivation and commitment to the plan, but even here, the job of supervisors and managers should be to find ways to build motivation and commitment rather than cudgel employees into conforming with plans and strategies that have been imposed upon them.
At the heart of this proposal to reform performance management is a belief that forcing supervisors and managers to act as judges, evaluating employee behaviour, rewarding compliance, and sanctioning deviation from externally imposed plans and strategies, prevents them from functioning effectively as leaders. There are many theories and models of effective leadership (Avolio et al., 2009; Bass & Bass, 2008), and they offer a wide range of suggestions for ways of influencing the behaviour of individuals and teams. Very few of these suggest that the approach embodied by traditional performance appraisal systems (that is, evaluate each person's performance on several dimensions and provide feedback at the end of the year, with the possibility of a small raise if you are judged to be effective) is likely to be effective. Even the best performance management systems fall short of what most leadership theories suggest, because they depend on goals, plans, and strategies that are imposed on employees, coupled with close monitoring of employees to detect and correct deviations from those goals, plans, and strategies. One of the consistent themes running through a wide range of leadership theories is that leaders are most effective when they can inspire, guide, and assist employees (Bass & Bass, 2008). A performance management system built around coaching and incorporating the two key behaviours leaders need to exhibit (that is, consideration and initiating structure) holds real promise for organisations. Stop evaluating and start leading!
ENDNOTES
- 1 At the same time that they simplified their performance rating system, Deloitte added elements to their performance management system, such as increased feedback, that may have reduced the extent to which the new system is actually simpler than their old one.
- 2 http://blogs.sciencemag.org/sciencehound/2016/08/04/journal-impact-factors-fitting-citation-distribution-curves/
- 3 The same exact distribution form will not hold in all instances; Aguinis et al. (2016) explore aspects of jobs and organisations that may influence the exact shape of the distribution of performance, and Beck, Beatty, and Sackett (2014) note that different types of performance measures may show very different distributions.
- 4 https://www.huffingtonpost.com/quora/how-much-do-car-salesmen_b_7504680.html
- 5 There are debates among psychometricians about how to best interpret interrater agreement measures (LeBreton, Scherer, & James, 2014; Murphy & DeShon, 2000), but there is clear consensus that performance ratings obtained from multiple raters show lower levels of agreement than would be expected from reliable and valid measures.
CONFLICT OF INTEREST
The author declares that there is no conflict of interest.