Copyright © 2007-2018 Russ Dewey
More Reinforcement Techniques
Before you can reinforce a behavior, the behavior must occur. What if the behavior is not occurring?
Then you must use the technique called shaping, also known as the method of successive approximations. It was mentioned earlier in connection with teaching a rat to press a bar in a Skinner Box.
What is the technical name for "shaping"?
To approximate something is to get close to it. To do successive approximations is to get closer by small steps. Shaping works by starting with whatever the organism can already do and reinforcing closer and closer approximations to a goal.
Here are five simple rules for shaping.
What are five rules to observe, while using shaping?
1. Make sure the target behavior is realistic and biologically possible.
2. Specify the current (entering) behavior and desired (target) behavior.
3. Plan a small chain of behavioral changes leading from the entering behavior to the target behavior.
4. If a step proves too large, break it into smaller steps.
5. Use reinforcers in small quantities, to avoid satiation (getting full).
How are the five rules illustrated by teaching a dog to catch a Frisbee?
To illustrate the five rules, consider the task of teaching a dog to catch a Frisbee. If you have ever seen dogs catch a Frisbee, you know it is quite impressive. Suppose you want to teach your dog this trick. How do you do it?
According to rule #1, you have to decide whether your dog is physically capable of such an act. The national champion Frisbee-catching dogs are usually dogs like whippets with a lean, muscular build permitting them to leap high into the air. Other breeds–bulldogs, Pekingese, and dachshunds–might be less able to learn this skill.
Suppose you have a dog that is physically capable of catching a Frisbee. Rule #2 says "specify the current (entering) behavior." This must be a behavior the dog can already perform. It should be a behavior that can be transformed, in small steps, into the target behavior.
Frisbee catching requires that the dog take a Frisbee into its mouth, so you might start by reinforcing the dog for the entering behavior of holding the Frisbee in its mouth. Most dogs are capable of doing this without any training. They can puncture a Frisbee with their canine teeth, so use a cheap Frisbee you do not mind sacrificing.
The dog enters the experimental situation with this behavior already in its repertoire. That is why it is called an entering behavior.
Rule #3 says to devise a series of small steps leading from the entering behavior (holding the Frisbee in its mouth) to the target behavior (snatching the Frisbee from the air). Finding this sequence of steps is the trickiest part of shaping.
How can you get from "here" to "there"? One approach is to toss the Frisbee about a foot in the air toward the dog, hoping it will perform the skill so you can reinforce it. Unfortunately, this probably will not work.
The dog does not know what to do when it sees the Frisbee coming, even if the dog has chewed on it in the past. It hits the dog on the nose and falls to the ground.
This brings us to rule #4. If a step is too large (such as going directly from chewing the Frisbee to snatching it out of the air) you must break it into smaller steps. In the Frisbee-catching project, a good way to start is to hold the Frisbee in the air.
The dog will probably rise up on its hind legs to bite it. You let the dog grab it in its mouth, then you release it. That is a first, simple step.
Next, you release the Frisbee a split second before the dog grabs it. If you are holding the Frisbee above the dog, you might drop it about an inch through the air, right into the dog's mouth.
Now the most critical part of the shaping procedure takes place. You gradually allow the Frisbee to fall a greater and greater distance before the dog bites it.
You might start one inch above the dog's mouth, work up to two inches, then three, and so on, until finally the dog can grab the Frisbee when it falls a whole foot from your hand to the dog's mouth. (For literate dogs outside the U.S. and Britain, use centimeters and meters.)
Keep rule #4 in mind. If the dog cannot grab the Frisbee when it falls 8 inches, you go back to 6 inches for a while, then work back to 8, then 10, then a foot.
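The step-forward, step-back logic of rules #3 and #4 can be sketched in a few lines of Python. This is only an illustration: the function names, the 2-inch step size, the 1-inch minimum, and the 12-inch target are assumptions chosen to match the Frisbee example, not part of any standard procedure.

```python
def next_drop_distance(current, caught, step=2, minimum=1, target=12):
    """Return the drop distance (in inches) to try on the next trial.

    All default values are illustrative assumptions.
    """
    if caught:
        # Success: move one small step closer to the target behavior.
        return min(current + step, target)
    # Failure: the step was too large, so back up (rule #4).
    return max(current - step, minimum)


def run_session(outcomes, start, step=2, minimum=1, target=12):
    """Given True/False catch outcomes, list the distance tried on each trial."""
    distance = start
    tried = []
    for caught in outcomes:
        tried.append(distance)
        distance = next_drop_distance(distance, caught, step, minimum, target)
    return tried


# The dog catches at 6 inches, misses at 8, then succeeds from there on.
print(run_session([True, False, True, True, True, True], start=6))
# prints [6, 8, 6, 8, 10, 12]
```

The printed sequence mirrors the text: a miss at 8 inches sends the trainer back to 6, then forward again to 8, 10, and a foot.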
Eventually, if the dog gets into the spirit of the game, you should be able to work up to longer distances. Once the dog is lunging for Frisbees that you flip toward it from a distance of a few feet, you are in business. From there to a full-fledged Frisbee retrieval is a matter of degree.
Rule #5 says to have reinforcers available in small quantities to avoid satiation. Satiation (pronounced SAY-see-AY-shun) is "getting full" of a reinforcer: getting so much that the animal (or person) no longer wants it.
If satiation occurs, you lose your reinforcer. Then your behavior modification project grinds to a halt. Food reinforcement, if it is required, must therefore be used in small quantities.
Why is satiation unlikely to be a problem in this situation?
Dogs respond well to social reinforcement (praise and pats). That never gets old to a loving dog. So dog trainers do not necessarily have to use food reinforcement at all. Retrieval games are intrinsically reinforcing to many dogs.
When I took a dog obedience course, the trainer used retrieval of a tennis ball to reinforce her dog at the end of a training session. That was an example of the Premack Principle, because a preferred behavior (retrieving a tennis ball) was used to reinforce non-preferred behaviors (demonstrating obedience techniques).
Prompting and Fading
Prompting is the act of helping a behavior to occur. This is a useful way to start teaching a behavior. A coach who helps a small child hold a baseball bat, to teach a proper swing, is using prompting.
Fading is said to occur when the trainer gradually withdraws the prompt. For example, the baseball coach gradually allows the child to feel more and more of the bat's weight, until the coach is no longer holding it. Eventually the child swings the bat alone. The prompt has been "faded away."
What is prompting and fading?
Prompting and fading is commonly used in dog obedience training. For example, to teach a dog to sit, one gives the command ("sit"), then forces the dog to comply by gently sweeping an arm into the dog's back knees from behind, so the dog's back legs buckle gently and its rump goes down to the ground.
Meanwhile one holds the dog's collar so the head stays up. This forces the dog to sit. When the dog sits, the trainer praises it or offers a morsel of food.
How is prompting and fading used in dog obedience training?
The command is a stimulus that eventually functions as an S+. The upward tug on the collar and the arm behind the back knees are called a prompt because they help or prompt the behavior to occur.
The procedure of gradually removing the prompt is called fading. The prompt becomes weaker and weaker; it is "faded out." After about 20 repetitions there is no need to touch the back of the dog's legs; one says "sit" and the dog sits.
How did a city use prompting and fading?
Prompting and fading was used by one city when it switched from signs with the English words "No Parking" to signs with only an international symbol (a circle with a parked car in it and a diagonal line crossing it out). For the first three months, the new signs contained both the international symbol and the English words.
Then the words were removed. People hardly noticed the transition to the new signs, because their behavior was transferred smoothly from one controlling stimulus to another.
Differential Reinforcement
Differential reinforcement is selective reinforcement of one class of behaviors from among others. Unlike shaping, differential reinforcement is used when behavior already occurs and has good form (does not need shaping) but tends to get lost among other behaviors. The solution is to single out desired behaviors and reinforce them.
What is differential reinforcement? How is it distinguished from shaping? What is a "response class"?
Differential reinforcement is applied to a category or group of behaviors. For example, if one were working in a day care center for children, one might reinforce any sort of cooperative play, while discouraging any fighting. The "cooperative play" behaviors would form a group singled out for reinforcement.
Such a group is labeled a response class. A response class is a set of behaviors–a category of operants–singled out for reinforcement while other behaviors are ignored or (if necessary) punished.
The only limitation on the response class is that the organism being reinforced must be able to discriminate it. Differential reinforcement is limited by the pattern recognition capabilities of the animal being reinforced.
In the case of preschoolers at a day care center, the concept of cooperative play could be explained to them in simple terms. ("Play nicely, don't hit" etc.)
Children observed to engage in cooperative play would be reinforced in some way that worked. For example, they might be appreciated verbally. Or they might receive a star on a chart, or they could be allowed time to do what they enjoy most.
Karen Pryor is a porpoise trainer who became famous for demonstrating that porpoises could be reinforced directly for creative behavior. Evidently porpoises are intelligent enough to discriminate a response class consisting of novel behaviors.
Pryor reinforced two porpoises at the Sea Life Park in Hawaii any time the animals did something new. The response class, in this case, was any behavior the animal had never performed.
Pryor set up a contingency whereby the porpoise got fish only for performing novel (new) behaviors. At first this amounted to an extinction period. The animals were getting no fish.
How did Pryor reinforce creative behavior?
As usual when an extinction period begins, the porpoises showed an "extinction burst" or extinction-induced resurgence. The variety of behavior increased, and the porpoises showed a higher level of activity than normal.
They tried their old tricks but got no fish. Then they tried variations of old tricks. These were reinforced if the porpoise had never done them before.
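The novelty contingency amounts to a simple membership test: a behavior earns a fish only if it is not already in the animal's history. A minimal Python sketch follows. The function name and the behavior labels are illustrative assumptions, not a record of Pryor's actual sessions.

```python
def novelty_contingency(behaviors, known_repertoire=()):
    """Pair each behavior with True if it would be reinforced as novel.

    `known_repertoire` holds behaviors performed before the session;
    both arguments are hypothetical inputs for illustration.
    """
    seen = set(known_repertoire)
    results = []
    for behavior in behaviors:
        is_new = behavior not in seen
        results.append((behavior, is_new))
        # Once performed, a behavior is no longer novel.
        seen.add(behavior)
    return results


session = ["jump", "corkscrew", "corkscrew", "back flip"]
for behavior, reinforced in novelty_contingency(session, known_repertoire=["jump"]):
    print(behavior, "-> fish" if reinforced else "-> no fish")
```

Note that repeating a behavior the second time earns nothing, which is why the old tricks stopped paying off.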
The porpoises soon caught on to the fact that they were only being given a fish if they did new and different things. One porpoise "jumped from the water, skidded across 6 ft of wet pavement, and tapped the trainer on the ankle with its rostrum or snout, a truly bizarre act for an entirely aquatic animal" (Pryor, Haag, & O'Reilly, 1969).
The animals also emitted four new behaviors: the corkscrew, back flip, tailwave, and inverted leap. None had ever before been performed spontaneously by porpoises.
DRO and DRL
A special form of differential reinforcement is differential reinforcement of other behavior, abbreviated DRO. "Other" behavior means any behavior except the one you want to eliminate.
In the behavioral laboratory, DRO is technically defined as "delivery of a reinforcer if a specified time elapses without a designated response." In other words, the animal can do whatever it wants, as long as it does not do a particular behavior for a certain period of time. Then it receives a reinforcer.
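The laboratory definition can be read as a timer rule, sketched below in Python. The sketch assumes a "resetting" version in which every occurrence of the designated response restarts the clock. That detail, the function name, and the example numbers are assumptions for illustration. The definition quoted above says only that a specified time must elapse without the response.

```python
def dro_reinforcer_times(response_times, interval, session_length):
    """Return the times at which a reinforcer is delivered under DRO.

    response_times: times (in seconds) of the designated response.
    interval: how long the animal must go without that response.
    Assumes the timer restarts after each response and each delivery.
    """
    deliveries = []
    pending = sorted(response_times)
    i = 0
    timer_start = 0
    while timer_start + interval <= session_length:
        deadline = timer_start + interval
        if i < len(pending) and pending[i] < deadline:
            # The designated response occurred: no reinforcer, restart the timer.
            timer_start = pending[i]
            i += 1
        else:
            # The interval elapsed without the response: deliver a reinforcer.
            deliveries.append(deadline)
            timer_start = deadline
    return deliveries


# Designated response at 25 s and 70 s, 20 s interval, 120 s session.
print(dro_reinforcer_times([25, 70], interval=20, session_length=120))
# prints [20, 45, 65, 90, 110]
```

What the animal does during the interval never enters the calculation. Only the absence of the designated response matters.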
What is DRO? What are situations in which DRO might be useful?
DRO is used to eliminate a behavior without punishment. Suppose you have a roommate who complains constantly about poor health.
You could say, "Stop talking about your health" but that would be rude. So how can you encourage your roommate to stop talking about health?
One approach is to use DRO. If the roommate spends a minute without talking about health, you pay attention and act friendly. If the conversation turns to aches and pains, you stop talking.
Eventually, if the procedure works, your roommate will stop talking about health problems. As this example shows, DRO involves extinction of the problem behavior. You cut off reinforcements to the behavior you want to get rid of (extinction) and you reinforce any other behavior (DRO).
How can DRO supplement or replace punishment?
Whenever punishment is used, DRO should be used as well. If a parent feels the need to discipline a child, the parent should not merely punish the wrong response with a "No!" or other aversive stimulus. The parent should also positively reinforce a correct response.
Some students (who experienced "spare the rod, spoil the child") are surprised to learn that physical punishment can be avoided altogether with children. Children respond strongly to the social reinforcer of sincere appreciation (see "Catch them being good.")
Another variation of differential reinforcement is DRL or differential reinforcement of a low rate of behavior. DRL occurs when you reinforce slow or infrequent responses.
Psychologists were initially surprised that such a thing as DRL could exist. After all, reinforcement is defined as increasing the rate of behavior. However, many animals can learn a contingency in which responding slowly produces reinforcement.
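One way to picture such a contingency is as a rule about the time between responses: a response is reinforced only if enough time has passed since the previous one. The Python sketch below assumes that reading. The function name, the convention that the first wait is measured from the start of the session, and the example numbers are all illustrative assumptions.

```python
def drl_reinforced(response_times, min_gap):
    """Return the response times that earn a reinforcer under DRL.

    A response is reinforced only if at least `min_gap` seconds have
    passed since the previous response (or since the session began).
    Every response, reinforced or not, restarts the clock.
    """
    reinforced = []
    previous = 0  # assumption: the session starts at time 0
    for t in sorted(response_times):
        if t - previous >= min_gap:
            reinforced.append(t)
        previous = t
    return reinforced


# Responses at 3, 15, 18, and 40 seconds with a 10-second minimum gap.
print(drl_reinforced([3, 15, 18, 40], min_gap=10))
# prints [15, 40]
```

Rapid responding (the responses at 3 and 18 seconds) earns nothing, so the only way to collect reinforcers is to slow down.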
What is DRL?
A student reports using DRL to deal with a roommate problem:
My experience with my roommate is an example of DRL. My roommate is a wonderful person, but she talks too much.
A simple yes or no question receives a "15 minute lecture" answer. She talks constantly.
After the psychology lecture on differential reinforcement for a low rate of behavior, I decided to try this method. When I asked her a simple question and received a lengthy answer, I simply ignored her or left the room.
When she gave a simple reply, I tried to seem interested and even discussed her answer. Now my roommate talks less and I don't get as aggravated with her. [Author's files]
Pryor, K. W., Haag, R., & O'Reilly, J. (1969). The creative porpoise: Training for novel behavior. Journal of the Experimental Analysis of Behavior, 12, 653-661.