This article was first published by Clean Run – The Magazine for Dog Agility Enthusiasts, in August 2017. I changed the title for this version after publication; please see the note about that at the end of the article.
If I’ve trained recall on a variable ratio reinforcement schedule, how likely are my dogs to come away from the fascinating distraction behind the fence?
But do I have to carry around treats or toys forever?
This is a common question from trainers who are new to positive reinforcement techniques. And most of us have heard the following typical answer.
No, you’ll be able to wean the dogs off the treats. You do it like this.
When you first train a new behavior, reinforce the dog every single time she performs it. When she responds consistently to the cue, start reinforcing every other time she performs it. Then do it every third time, et cetera.
When she has learned that she might not get something every time she performs the behavior, randomize the reinforcement. You will reinforce on average every third or fourth time your dog performs the behavior. But avoid staying in a pattern she’ll figure out. This is called a variable ratio reinforcement schedule and it makes behavior resistant to extinction. You can also work in some life rewards.
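The mechanics of that "randomize it" advice can be made concrete with a short simulation. Here is a minimal sketch (in Python, which is my choice, not the article's) of the reinforcement decision on a variable ratio schedule; `mean_ratio` is an assumed parameter:

```python
import random

def vr_should_reinforce(mean_ratio: float) -> bool:
    """Decide whether to reinforce one correct response on a
    variable ratio schedule: reinforcement arrives on average
    every `mean_ratio` responses, but in no fixed pattern."""
    return random.random() < 1.0 / mean_ratio

# Simulate 12 correct responses on a VR-3 schedule
# (about 1 in 3 reinforced, unpredictably):
outcomes = [vr_should_reinforce(3) for _ in range(12)]
print(outcomes)
```

Note that the randomness is the whole point: a human trainer "winging it" tends to fall into patterns, which is exactly what this coin-flip approach avoids.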
Unfortunately, the “let’s thin out the reinforcement” plan is based on experimental practices that are hard to duplicate outside the lab. Vital bits of instruction are usually left out when the practice is suggested. And the results of the “thin it out” plan with performance dogs can be dire. For years we were encouraged to thin, thin, thin the reinforcement until it was time to perform in the ring. Then our dogs often performed with no reinforcement from us at all. Or they didn’t perform. Remember when lots of breeds were considered “untrainable”?
Times are changing, and the great trainers are showing us how to develop secondary reinforcers to help transform the ring into a fun place rather than a joyless or scary one. In the agility world, we have the advantage that many of the activities are already fun for our dogs. On top of that, we can easily associate them with good stuff. Even in competition, we can have a cooler with great treats at a seating area close to the ring. If our dogs like to tug, we can bring a reinforcer right into the ring in the form of a well-chosen leash.
Clara thinks training is fun, and that I’m fun too.
But the best thing we can bring into the ring with our dogs is our own self. If we have used rich schedules of reinforcement for daily behaviors as well as agility behaviors, we have likely built a bond with our dog and a beautiful classical association to the activities we do together.
Rather than “thinning out the food” we should consider maintaining the food and adding every other reinforcer we can think of. I aim for a continuous reinforcement schedule for the majority of behaviors I ask of my dogs. Because in the real world, thinning a reinforcement schedule does not have the same effects that it has in the lab. It’s almost impossible to bring along the controlled conditions that yield the desired result.
The problems with using intermittent reinforcement schedules in the real world fall into three areas. A problem in any area can be enough to punch holes in the expected benefits. First, “resistance to extinction” is not the best measure of behavior when our goal is to get enthusiastic, consistent responses exactly when we want them. Second, even if resistance to extinction were our goal, it’s difficult for humans to perform the necessary randomized schedules. Third, in the real world, there are many alternative sources of reinforcement (we call them distractions). That means even when done correctly, the possible value of an intermittent reinforcement schedule can be demolished by something called the Matching Law.
Intermittent Reinforcement and Extinction Trials
Much of the information we have on the effects of variable ratio reinforcement schedules comes from lab experiments called extinction trials (Mowrer & Jones, 1945). An animal confined in a small area is trained to perform a behavior. A monkey may press a lever, a pigeon may peck a disk, or a rat may run down a chute to jump on a platform at the end. The animal performs the behavior repeatedly, and the behavior is reinforced each time. After these reinforced repetitions, the reinforcement schedule is thinned according to a preplanned formula. Reinforcement is gradually reduced, and in some experiments taken down to zero. The pattern of the animal’s response is recorded under these conditions of reduced reinforcement.
It’s true that many studies have shown that a variable ratio reinforcement schedule is comparatively more resistant to extinction. But it’s important to note that not all studies have shown that: some have found that richer reinforcement schedules can lead to better resistance to extinction, a phenomenon called “behavioral momentum” (Nevin, Mandell, & Atak, 1983).
When a behavior is resistant to extinction, an animal will keep performing it as reinforcement becomes sporadic. But there is no guarantee that the behavior will happen when we want it. A behavior that is resistant to extinction is only more likely to be performed after decreased reinforcement. Also, just because a behavior is resistant to extinction doesn’t mean that it will be performed eagerly, enthusiastically, or with low latency. These are all qualities we value and need in our dogs’ behavior.
Finally, many extinction trials are performed in what is called a “free operant” setup. In this setup, there is a signal to the animal that reinforcement may be available for a certain behavior the animal has already learned. The signal stays on for a period and the animal is free to perform the behavior multiple times. The performances of the behavior are reinforced or not, according to the schedule. But counting free operant responses yields data that have little relevance to most of our training situations. Real world training usually incorporates what are called “discrete trials.” That is, we give one cue and we need the dog to perform a behavior right then. If instead the dog waits 90 seconds and then performs the behavior three times, those would count as “responses” in a free operant trial. In the lab, they would count towards “resistance to extinction.” But in real life, they wouldn’t help us at all (if we happened to wait long enough to find out about them).
In order to attempt to get the resistance to extinction that can be tied to variable ratio reinforcement schedules, we need to follow a precise plan.
First, we need to train the behavior to fluency. Behavioral fluency is defined as a combination of accuracy plus speed of responding (Binder, 1996). Fluency is a much bigger challenge in the real world than in the lab because we need our dogs to be able to respond in so many different situations. There’s a lot of generalization work to do before we can reduce the schedule. And our dog might never achieve the fluency that an animal alone in a Skinner box could.
Second, after the behavior is fluent and generalized, we need to change the schedule gradually. That’s one thing the science is in agreement on. For instance, if we changed from continuous reinforcement directly to a schedule where the dog was reinforced every eighth time on average, the dog would likely give up rather than transitioning to the new schedule (Schwartz, 2002, p. 219). Resistance to extinction only occurs if the thinning of the schedule is gradual.
And consider what withholding reinforcement means to the dog. When teaching a new behavior, we withhold reinforcement when the dog responds incorrectly. But when we switch to an intermittent schedule, we will withhold reinforcement when the dog responds correctly. We need a plan for explaining the new rules to the dog. Removal of reinforcement is a known cause of frustration and even aggression in animals.
Finally, we need a method to compute and track the schedule. It must average the right number of reinforcements and must be random.
Randomizing is hard for humans. Let’s say we’ve decided that our goal is to reinforce the dog for one out of every four sits, that is, 25% of the time. But it has to be random. So if the dog is going to sit 20 times, we will plan to reinforce five of them, but we can’t do it in a pattern.
If we try to wing it, we’ll likely become predictable. That’s what humans do. We may reinforce more often in the kitchen than in the den, or more often when the dog looks at us a certain way. Or we’ll reinforce when the dog sits in a more difficult situation and consistently skip it during the easier times. And the dog will learn the pattern, because that’s what dogs do. So in those situations where we tend not to reinforce, they will tend not to respond.
How would we address this problem? By preparing beforehand. We can use a random number generator or do it by hand. For example, we could plan ahead to reinforce sits #3, #9, #10, #15, and #18 of the dog’s first 20 sits of the day.
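That by-hand plan can be generated in a few lines of Python (a sketch; the 20 sits, 5 reinforcements, and the sample numbers are just the figures from the example above):

```python
import random

# Plan one day's schedule in advance: of the dog's first 20 sits,
# reinforce 5 (25%), chosen at random so no pattern can emerge.
planned_sits = sorted(random.sample(range(1, 21), k=5))
print(planned_sits)  # e.g. [3, 9, 10, 15, 18]
```

Using `random.sample` rather than picking numbers ourselves matters: it guarantees 5 distinct sit numbers with no human bias toward "nice" spacing.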
After we’ve memorized the sit numbers, what about the times we ask the dog for eye contact or to get on a mat? We will need a plan for those, too. Good luck with memorizing all that.
This is no joke. The data about the effects of intermittent reinforcement come from precisely computed schedules. If we are going to try to use variable ratio reinforcement, we need to use the methods that make it work.
The Matching Law
There is one law of learning that tends to overpower most others when training in real life, and that is the Matching Law (Herrnstein, 1961). The Matching Law deals with concurrent schedules of reinforcement, where more than one reinforcer is available at a given time. It says that the relative frequency of a behavior will match the relative rate of reinforcement it earns. So the more reinforcement one is likely to gain from a behavior compared to the alternatives, the more often one is likely to perform it.
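In its simplest two-choice form, the Matching Law predicts that the share of behavior allocated to one option equals that option's share of the total reinforcement. A minimal sketch (the reinforcement rates below are illustrative assumptions, not data from any study):

```python
def matching_prediction(r1: float, r2: float) -> float:
    """Herrnstein's matching law for two concurrent options:
    predicted share of behavior going to option 1, given
    reinforcement rates r1 and r2."""
    return r1 / (r1 + r2)

# A distraction that pays off on every approach (rate 1.0) versus
# a handler reinforcing one response in four (rate 0.25):
print(matching_prediction(1.0, 0.25))  # 0.8 -> 80% of behavior
```

Even with these made-up numbers, the shape of the prediction is the point: whichever option pays off more richly claims the larger share of the animal's behavior.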
When we walk out the front door with our dogs, or even out of the training room, the Matching Law hits us square in the face. Look at all those competing reinforcers! Why wouldn’t a dog want to sample all of them? We live in a Matching Law world and all creatures have evolved to take advantage of resources when they are available. It is natural to switch between reinforcers when given the opportunity.
Digging for a turtle: priceless!
This is the biggest problem of all. Our carefully crafted, randomized schedule of reinforcement is in direct competition with richer schedules. Many of the distractions around us are reinforcing every time the dog gains access to them. Popular lampposts don’t pay off with good pee-mail every third or fifth time the dog goes to sniff them. It’s a good bet that they will pay off every single time. Then there are birds. Squirrels. Cats. Other humans. Other dogs.
The Matching Law research approximates real-world conditions better than most lab studies. And the data are consistent. Activities that offer richer reinforcement schedules win.
Slot Machines or Vending Machines?
When discussing variable ratio reinforcement, people often bring up the slot machine. They talk about the player’s excitement in wondering whether this is the time she will get a payout, and they theorize about the excitement and persistence the parallel situation could evoke in their dogs.
But the slot machine model has a problem. Let’s say you are gambling on a slot machine that makes payouts up to $100. The most common payouts are $5 and $10. As you are gambling, someone regularly strolls through the casino, taps you on the shoulder, and hands you a $100 bill. Do you stop and accept the free money, or do you turn away and concentrate on your lever? Of course you take the money! Your machine will still be there after you pocket the cash. (Although you may decide to follow around the money guy instead!)
We are walking around in a world full of free $100 bills for our dogs. Being a slot machine putting out random $5’s and $10’s on a thin schedule is not good protection against them.
Instead, if we are rich and consistent providers of a variety of reinforcement for our dogs: food, play, fun, and social companionship, we have a better chance against those tempting $100 bills.
The agility ring environment is a controlled one. Yes, there are plenty of loose $100 bills in there, but we can proof for many of them. And if we have made agility a source of invigorating, partnering fun for our dogs, we can drive it up towards the $1,000 range.
Agility: Also priceless!
All in all, I’d rather be the much-maligned vending machine. I do plan to carry around the treats forever. I want to be a consistent source of fun and goodies for my dogs. I want to provide as close to continuous reinforcement for the things I ask them to do as I can.
Real life will teach them that occasional brief dry spells of one type of reinforcement are not the end of the world. My goal is not to get the most behavior out of them for the cheapest payout on my part. My goal is for them to have fun, enriching lives and fit into our human world with the most ease possible. Being generous with all sorts of reinforcers works beautifully for agility and daily life.
Addendum 9/21/18: Thank you to Jakub Beran and Eduardo Fernandez for pointing out that my inclusion of the phrase “variable reinforcement” in the title and article was problematic. Although that is generally what people say when they discuss this issue, there is no such thing. And that’s actually part of the problem. There are variable ratio schedules of reinforcement, variable interval schedules of reinforcement, and more. But there is no such thing as a variable reinforcement schedule. A better general term for non-continuous reinforcement schedules is “intermittent reinforcement.” For more information on schedules, check out my article on the matching law linked below.
Binder, C. (1996). Behavioral fluency: Evolution of a new paradigm. The Behavior Analyst, 19(2), 163-197.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4(3), 267-272.
Mowrer, O. H., & Jones, H. (1945). Habit strength as a function of the pattern of reinforcement. Journal of Experimental Psychology, 35(4), 293-311.
Nevin, J. A., Mandell, C., & Atak, J. R. (1983). The analysis of behavioral momentum. Journal of the Experimental Analysis of Behavior, 39(1), 49-59.
Schwartz, B. (2002). Psychology of Learning and Behavior (5th ed.). W. W. Norton & Co.
Copyright 2017 Eileen Anderson
Herrnstein’s Matching Law and Reinforcement Schedules
Thank you to Clean Run for publishing this article and for allowing me to republish it.