Mission Statement

The Rant's mission is to offer information that is useful in business administration, economics, finance, accounting, and everyday life.

Tuesday, March 26, 2019

Performance Management: Changing Behavior That Drives Organizational Effectiveness (part 18)



Measurement Methods
by
Charles Lamson

There are two major measurement methods: counting and judging. While counting is preferred because of its greater reliability, judgment can be a valid way of measuring; it allows us to get the benefit of measurement in areas that would otherwise go unmeasured.


Counting

Counting is very straightforward. We can count the number of parts made, engineering drawings completed, lines of computer code written, or hours of overtime. Counting is the preferred method of measuring because practically everybody can do it - and with a high degree of reliability. It is easy and usually can be completed quickly.

Another advantage of counting is that when we do not need to count every instance of a behavior or result, we can sample. Sampling involves counting at random times or inspecting random units of production. Sampling is a well-developed scientific process, and there are rules about how to select a sample and how many observations or counts constitute a representative one. Most statistics texts cover this subject in detail. Readers who have been trained in Statistical Process Control will be familiar with sampling methods and processes.

Nevertheless, sampling errors are common in business and everyday life. We often draw conclusions about behavior and results based on too little information. If you are not counting every occurrence of something, you must make sure you have counted enough instances to have an adequate sample.
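The idea can be sketched in a few lines of code. This is only an illustration, not anything from the book: the production run, the 5% defect rate, and the sample size of 500 are all invented numbers, chosen to show how inspecting a random subset can stand in for counting every unit.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Simulated production run of 10,000 units; 1 marks a defective unit
# (true underlying defect rate of about 5%).
production = [1 if random.random() < 0.05 else 0 for _ in range(10_000)]

# Instead of counting all 10,000 units, inspect 500 chosen at random.
sample = random.sample(production, k=500)
estimated_rate = sum(sample) / len(sample)

print(f"Estimated defect rate from sample of {len(sample)}: {estimated_rate:.1%}")
```

With too small a sample - say, 10 units - the estimate would swing wildly from draw to draw, which is exactly the sampling error the paragraph above warns against.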


Judgment

When the pinpoints you have selected cannot be made specific enough to permit counting, you can use the judgment technique of measuring. Judgment is the process of forming an opinion or evaluation by discerning and comparing.

Even though judgment is less reliable and more subjective than counting, it has at least two very practical uses in Performance Management (PM). First, judgment allows you to measure any performance. Second, in most cases when you use judgment measures, you discover new ways to count, and counting is almost always preferred to judgment. When you cannot count, however, there are four techniques for making judgments about performance:
  1. Opinion-based Ranking
  2. Opinion-based Rating
  3. Criteria-based Ranking
  4. Criteria-based Rating
These alternatives are presented in the four cells of Figure 1. They range from a general opinion about who is better in Cell 1 to a much more specific rating of performance in Cell 4.

Figure 1 Four Techniques of Measuring by Judgment


When making judgments you need a frame of reference. As a starting point, this reference is usually your experience or point of view - in other words, your opinion (Figure 1, Cells 1 and 2). When you rank (assign a number to your opinion), you are almost always forced to be more specific. Ranking causes you to think things like, "Would she rank fifth or sixth? I'll rank her fifth because she is friendlier than Ted when she waits on customers," or, "I think George would rank above Jack in being responsive to customers because he asks more questions."

If each time you rate or rank you get a little more specific, you will be surprised at how quickly you increase the reliability of your judgments. For example, suppose we have a supervisor who wants to measure a mechanic's neatness but doesn't know exactly how to do it. Let's start by asking the supervisor to rate the performer on a neatness scale where 0 is the messiest mechanic he has ever known, and 10 is the neatest mechanic he has ever known. Also, at the end of the job, ask the mechanic to rate himself. Then compare the ratings. If they both agree, the supervisor has some evidence that they are measuring neatness in similar ways.

Most often they will not agree. What the mechanic sees as very good (an 8 on the scale), the supervisor may see as average (a 5). This will immediately raise questions like, "Why did you give me a 5?" or "Why did you give yourself an 8?" The answers to these questions add more definition to the scale, so that with repeated measures the two ratings will match on more items.


Over time, this opinion-based measure can be made more objective by specifying distinct criteria for judging performance. Once this happens, judgments can be made on the basis of observation (Figure 1, Cell 3 and Cell 4).

Let's assume one element of neatness is cleaning up the debris the job produces before leaving. This creates what is referred to as a pre-established criterion.

Look at the samples in Cells 3 and 4. Note that the determinations about neatness are based on observation, rather than simple opinion. Cells 3 and 4 are still somewhat subjective, but they are much more objective than Cells 1 and 2.

The measure in Cell 4 is an Anchored Rating Scale. Judgments are described as concretely as possible and given a weight along the scale from 1 to 10. These judgments are pre-established criteria for, in this case, neatness. This method of judgment gives the performer significantly more information on how their behavior is perceived and specific indications as to how to improve.

While this is the most effective judgment technique, it can be improved further by making it a Behaviorally Anchored Rating Scale (BARS). Here, written examples of the value (neatness, in this case) are given to each rater, who scores them on the selected scale (1-5, 1-7, etc.). The scores are averaged so that each example has a common perceived rating. The description "picks up tools and debris, organizes area, wipes surface clean," for example, might merit a value of 7.3. With BARS, we find enough descriptors to discriminate between several levels of performance. This method has the advantage of promoting greater inter-rater reliability, so that judgments are not perceived as capricious or as unique to an individual rater.
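The averaging step behind a BARS anchor is simple arithmetic, and a small sketch makes it concrete. The two behavior descriptions and the individual rater scores below are invented for illustration; the first set happens to average to the 7.3 mentioned above.

```python
# Hypothetical BARS calibration: several raters each score the same
# written behavior description; the average becomes the anchor value
# attached to that description on the scale.
descriptions = {
    "picks up tools and debris, organizes area, wipes surface clean":
        [8, 7, 7, 8, 6, 8, 7],   # invented scores from seven raters
    "picks up tools but leaves debris on the floor":
        [4, 3, 4, 5, 3, 4, 4],
}

for text, scores in descriptions.items():
    anchor = sum(scores) / len(scores)
    print(f"anchor {anchor:.1f}: {text}")
```

Once the anchors are fixed, a rater no longer assigns a free-floating number; they match what they observed to the nearest described behavior, which is where the gain in inter-rater reliability comes from.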


Rank or Rate?

As noted in Figure 1, we can rank or rate. Ranking involves comparing the performances of individuals against each other (Cells 1 and 3). You simply decide whether the performance you are considering is better or worse than the one closest to it. You are in effect lining up the performers from best to worst.

Using ranked scores may cause problems when the results are fed back to the performers. When you rank how well people do, you distribute performance measures across the whole range from best to worst. In other words, you establish only one winner, just as the typical contest does. If you have only one winner, you limit reinforcement for all other performers, thereby ultimately reducing the overall performance. Even if you have a hierarchy of winners (first, second, third place, etc.), you are creating a diminishing distribution of reinforcement. Since behavior is a function of its reinforcement, ranking should be used only when ratings are impractical.

Performances are judged independently when using ratings, thus avoiding the problem of limited reinforcement encountered in ranking systems. In a rating system, all performers could attain a perfect score at the same time, so the fact that you get reinforced for what you do does not reduce the reinforcement others experience for what they do.
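The contrast can be sketched with a few invented scores. Note how ranking spreads four nearly identical performances from first to last, while rating against a fixed standard lets all four qualify; the names, scores, and threshold here are made up for illustration.

```python
# Invented scores for four performers on a 10-point scale.
scores = {"Ana": 9.1, "Ben": 9.0, "Cara": 8.9, "Dev": 8.8}

# Ranking: even near-identical performances get forced into an order,
# producing a single "winner" and a diminishing ladder below.
ranked = sorted(scores, key=scores.get, reverse=True)
for place, name in enumerate(ranked, start=1):
    print(place, name)

# Rating: each performance is judged independently against the scale,
# so everyone who meets the standard can earn the reinforcer.
threshold = 8.5
meets = [name for name, s in scores.items() if s >= threshold]
print("Meets standard:", meets)  # all four qualify under a rating system
```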


Many performance appraisal processes use a ranking system but call the scores a rating. This is because so many systems try to accomplish incompatible outcomes with the same instrument. One is the attempt to distribute a limited amount of some reinforcer (money, promotion, etc.). This will always require ranking; only if the reinforcer is unlimited can you use a rating system. The other part of the appraisal process is to improve performance. This requires reinforcement based on what the individual accomplishes and should not be limited by what others do or don't do. In fact, it should not be limited at all, since increases in reinforcement increase the rate of change. Calling the hybrid score used in most appraisal systems by the more benign name (rating) does not change its nature or its function.

If your ratings are based on opinion, your labels will be terms like poor to excellent, or never to very often. If your judgments involve only your impressions of the performance (Cell 2) and never move toward rating pre-established criteria (Cell 4), your measure is probably not important to the performer. If it is important, the performers will want to know why they received the ratings they did and what they have to do to get a higher score. When you are able to tell them, you will have what you need to develop a pre-established criterion scale.

Figure 2 is another illustration of the four types of judgment techniques. This example includes only a few of the possible criteria that might be used to measure the value of a suggestion. However, it does demonstrate some of the ways you could measure a good idea.

Figure 2 Measurement Methods Using Judgment

Improving Rater Reliability

If you must use judgment as a measurement tool, consider training the observers so they use similar criteria. Increasing inter-rater reliability is the way to get the most value from this approach. Because your judgment is based on your personal life experiences and value system, you will differ from others in how you judge what you see. Everyone can recall a discussion of someone's personality in which the opinions expressed were all over the map. This is all too often an issue in performance appraisal systems. When people use opinion as a basis for measurement, you can encounter situations in which people give the same number for different reasons, or see the same things but give different numbers.



Training those who will rate the performers - the observers - to see the same things is the objective of inter-rater reliability training. In essence, this means that the different observers judge the same performance, using the same criteria, and then compare their numbers. They explain their reasoning and determine which criteria used are common and which are unique to the observer. This process is repeated until there is a high degree of consistency in the ratings. This is the technique that allows judges of competitive events, such as athletic contests, to rate performances in a consistent fashion.
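One simple way to track whether that training is working is to measure how far apart two observers' numbers are before and after it. The sketch below uses an average absolute gap between scores; the five performances and all the scores are invented for illustration.

```python
def mean_abs_diff(rater_a, rater_b):
    """Average absolute gap between two raters' scores (0 = perfect agreement)."""
    return sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Two observers rate the same five performances on a 10-point scale.
observer_1 = [7, 4, 9, 5, 6]
untrained  = [4, 6, 6, 8, 3]   # second observer, before reliability training
trained    = [6, 4, 8, 5, 6]   # second observer, after reliability training

print(f"Mean gap before training: {mean_abs_diff(observer_1, untrained):.1f}")
print(f"Mean gap after training:  {mean_abs_diff(observer_1, trained):.1f}")
```

A shrinking gap over repeated rounds is evidence that the observers are applying common criteria rather than private ones; a gap that stays large signals that the criteria still need discussion.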

*PERFORMANCE MANAGEMENT: CHANGING BEHAVIOR THAT DRIVES ORGANIZATIONAL EFFECTIVENESS, 4TH ED., 2004, AUBREY C. DANIELS & JAMES E. DANIELS, PGS. 139-144*

