
“The Negative Effects of Positive Reinforcement” by Michael Perone: Another Misrepresented Article

Three orange and red bags of Cheetos snacks are standing up in a row

Note: I have been working on this paper for 18 months. When I published it today, I was unaware that Dr. Perone headed a recent task force that concluded that contingent electric skin shock of a population that could include people with developmental disabilities, emotional disorders, and autistic-like behaviors could be part of an “ethically sound treatment program.” It casts his paper in a different light. I’m leaving my writeup published for now because I think we need these answers about what is an often-quoted paper. Please don’t consider it in support of Dr. Perone in any way.

“The Negative Effects of Positive Reinforcement” by Dr. Michael Perone is a scholarly article some trainers like to use to muddy the waters about positive reinforcement training. They throw out Dr. Perone’s article title like a bogeyman and use it to defend aversive methods in dog training. That usually indicates they haven’t read it. It’s a thoughtful article and has some interesting things to consider, but it doesn’t say what they seem to think it does. Not even close.

I’m going to list and summarize here the effects of positive reinforcement mentioned in the article, then explain why they have almost nothing to do with well-executed dog training. They give us something to think about in our human lives. But they apply almost exclusively to humans and our lifestyles, and the ones that can apply to animals are easily avoided.

Positive Reinforcement Can Have Delayed Aversive Consequences

Perone attributes the first mention of these aversive consequences to Skinner and quotes him several times (1971, 1983).

Here’s what they are talking about. Let’s say I spend my whole weekend water-skiing. I may come home with a sunburn (but the sun felt so good!), sore or strained muscles (but every run was great!), and maybe even a hangover (gosh that socializing was the best!). Don’t drink and boat, folks, this is just an example. I may be so wrung out after my fun weekend that I won’t have enough energy to finish the report I was supposed to have completed by Monday. All the things I did were fun and reinforcing at the time and I kept doing them, to the detriment of my body.

These potential longer-term aversive effects are one category of “negative effects” Perone is talking about.

How much do they apply to positive reinforcement-based animal training? Hardly at all! We don’t choose training methods and activities with delayed aversive consequences. As animal guardians, we aim to protect our animals from such consequences in both training and the rest of their lives. For example, we don’t let dogs overdo playing in the water hose—we don’t want to risk obsession or water intoxication. We don’t let a dog with an injury play endless games of fetch, even if they beg us. We interrupt dogs playing with each other when they begin to ramp up into over-arousal. The equivalent of my water-skiing weekend shouldn’t happen.

Perone quotes Skinner about activities that are so reinforcing they exhaust him. Skinner wrote, “Fatigue is a ridiculous hangover from too much reinforcement” (1983). He was concerned that the pull of highly reinforcing activities would keep him from more important activities with less immediate reinforcement. This is a crucial concern for any human with control over their activity choices, and one many of us wrestle with for most of our lives. Should I do the immediately fun thing or the less fun thing that has good results over time?

But this is unlikely to be a concern for positive reinforcement-based animal trainers. On the contrary, well-executed positive reinforcement training is a highly reinforcing activity for both the human and animal. It also has delayed positive consequences for both parties.

Do I even need to point out that aversive methods often have long-term aversive consequences, even deadly consequences? There is just no comparison.

Positive Reinforcement Can Make People Vulnerable to Exploitation by Government and Business

This is true. Exploiters can use positive reinforcement (praise, social acceptance, money, tangible items) to draw people into dangerous or unfair situations from which they can’t escape. This happens on the large scale but also on the small, interpersonal scale. This danger, again, has very little application to training animals or to our lives with animals. We already have a ton of control over their lives, even those of us who do our best to give our animals freedom. We work hard to make even the onerous experiences of life fun for our animals: husbandry activities, taking meds, physical therapy. And we use positive reinforcement to give the animal more choices, more opportunities, a wider world. Plus remember: it’s fun.

Some Reinforcing Activities Naturally Have Delayed Aversive Consequences

This is a reiteration of the first point, but Perone includes a list of “more mundane” activities for short-term pleasure here.

Positive reinforcement is implicated in eating junk food instead of a balanced meal, watching television instead of exercising, buying instead of saving, playing instead of working, or working instead of spending time with one’s family. Positive reinforcement underlies our propensity toward heart disease, cancer, and other diseases that are related more to maladaptive lifestyles than to purely physiological or anatomical weaknesses.

Perone, 2003, referencing Skinner, 1971

Of course!

Here is my own example: Let’s say I eat a whole bag of Cheetos because they are engineered to taste good and cause me to want more and more. The behaviors of reaching into the bag or the bowl and putting a piece in my mouth and all other behaviors that get those Cheetos ingested are immediately and powerfully reinforced. Delayed aversive consequences can include stomachache, bloating, poor nutrition, and that “ick” feeling. Oh yeah, and getting the orange stuff all over my fingers. (See big important note at the bottom of the post. I am not food- or body-shaming here.)

Again, this doesn’t apply to animal training or living with our pets. For instance, with both horses and dogs, we educate ourselves about bloat and do our best to prevent the circumstances that can cause it. And I’m pretty sure I don’t have a single positive reinforcement dog training friend who would let their dog eat a whole bag of Cheetos.

But once during an agility trial, I gave Zani too many rich treats over the course of the day. On our last run, she had diarrhea in the ring. Was my conclusion, “Welp, better stop using positive reinforcement”? Of course not. My conclusion was, “You asshole, you made your dog sick with that Braunschweiger. It could have even been worse; dogs can suffer or even die of pancreatitis from too much fatty food. Don’t do that again.”


Aspects of Positive Reinforcement Schedules Can Be Aversive

Top-down view of a pigeon pecking a yellow button in a Skinner box

Perone describes two studies identifying aspects of positive reinforcement schedules that can be aversive. Yes, in a controlled laboratory environment, we can test to see whether an animal will work to avoid a certain positive reinforcement schedule.

In the first study, the researchers examined the effects on pigeons of a change from a rich reinforcement schedule (variable interval 30 seconds, or VI 30) to a leaner one (VI 120 seconds). With stimuli that signaled to the pigeons which schedule was in effect, they showed that the leaner schedule was aversive compared to the richer one, and that stimuli associated with the leaner schedule could act as conditioned punishers (Jwaideh & Mulvaney, 1976).
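If you’d like a concrete feel for what a “rich” VI 30 versus a “lean” VI 120 schedule means, here is a minimal simulation sketch in Python. It’s my own illustration, not anything from the study; the pecking rate, session length, and random-interval details are simplifying assumptions.

import random

def simulate_vi(mean_interval_s, peck_rate=1.0, session_s=3600, seed=1):
    # A reinforcer becomes available after a random wait averaging
    # mean_interval_s seconds; the first peck after that moment is
    # reinforced and the next wait begins. Pecks are modeled as
    # random events at peck_rate per second (a simplification).
    rng = random.Random(seed)
    t = 0.0
    available_at = rng.expovariate(1.0 / mean_interval_s)
    reinforcers = 0
    while t < session_s:
        t += rng.expovariate(peck_rate)   # time of the next peck
        if t >= available_at:             # a reinforcer was waiting
            reinforcers += 1
            available_at = t + rng.expovariate(1.0 / mean_interval_s)
    return reinforcers

for mean in (30, 120):  # the rich VI 30 s vs. the lean VI 120 s
    print(f"VI {mean} s: about {simulate_vi(mean)} reinforcers in an hour")

Run it and the rich schedule pays off roughly four times as often per hour. That contrast is what the pigeons learned to detect, to the point that the stimuli signaling the lean schedule became conditioned punishers.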

In the second study, pigeons were taught to recognize predictors of changes in reinforcement schedules and reinforcer magnitude. They were given the option to “escape,” to peck a key that would stop the trial until they pecked it again. When the trial was stopped, the indicator lights changed, the “house-light” color and intensity changed, and no pecks on any keys were reinforced. It turned out that within a schedule, the pigeons were most likely to take a time-out just after being reinforced. During schedule transitions, the pigeons were most likely to take a time-out when the indicators told them they were switching from high magnitude reinforcers to lower magnitude reinforcers (Everly et al., 2014). These situations meet the criteria for aversiveness because the birds were opting to escape, to “quit the game” for a time.

These are valuable lessons. It’s important to note that these were “free operant” experiments, rather than the discrete trials we generally use in training. This post discusses the difference. In life, we should have very few situations in which we make large step-downs in reinforcer magnitude or frequency for the same behavior. But it can happen by accident or out of ignorance. If there is likely to be a step-down of this sort, we need to take action about it.

Sable dog trotting toward camera with her mouth open and tail up (looking happy)
Summer in a competitive rally run

The example that comes to mind is competitive obedience. I used to compete in rally obedience with my dog Summer. While learning and practicing, I generally reinforced (and reinforced well, with meat or cheese) every behavior. Then I carefully stepped down to every second or third behavior. This was OK with her, and she maintained her enthusiasm. But what would have happened if, at that point, I had suddenly taken her into an obedience ring and performed a minute-and-a-half-long run of 25 behaviors with no reinforcement until the end? Well, maybe nothing bad performance-wise the first time. Her behaviors were strong and resistant to extinction. But it wouldn’t have been kind, and over time (it doesn’t take much time at all!) she would have learned the trial environment predicted no goodies while in the ring. This happened to a lot of dogs before skilled positive reinforcement trainers entered the obedience world.

Thanks to modern dog training methods, we now know lots of ways to make the ring experience happier for the dog without that huge step-down in fun. These include using conditioned reinforcers and putting some thought into our reinforcement schedules. Luckily, I had good teachers. What I did was gradually wean Summer off treats during the run in practice while teaching her she would get a mega-treat (a whole jar of chicken baby food) at the end of the run. We even practiced a fun “hurry from the ring to our crating area to get the treat” sequence as part of the routine when preparing. Believe me, this switch did not diminish her interest in and happiness with rally at all! And I was able to do the same during trials, so trials didn’t predict a leaner schedule to her.
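To make the “no huge step-down” idea concrete, here is a toy sketch of that kind of thinning plan in Python. Every number is invented for illustration; the point is only that the total value of a run never crashes, and that the final practice stage matches the trial stage, so the ring never signals a leaner schedule.

# A toy model of thinning reinforcement across a 25-behavior rally run.
# Treat counts and the jackpot's value are hypothetical.
PLAN = [
    # (stage, treats during the run, end-of-run jackpot in treat-equivalents)
    ("early practice: treat every behavior",       25,  0),
    ("practice: treat every 2nd or 3rd behavior",  10,  0),
    ("practice: a few treats plus end jackpot",     4, 15),
    ("practice: end jackpot only",                  0, 15),
    ("trial: end jackpot only (same as practice)",  0, 15),
]

for stage, treats, jackpot in PLAN:
    print(f"{stage:<46} ~{treats + jackpot} treat-equivalents per run")

Compare that with jumping straight from a treat for every behavior to a single treat after the whole run: that is exactly the kind of signaled rich-to-lean transition the pigeon studies identify as aversive.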

Conclusion

Please note what I have not said here. I have not said that training with positive reinforcement has no possible negative consequences. It can. When we humans hold access to all the good stuff, it takes a mindful approach to avoid coercion. But if we are positive reinforcement-based trainers, avoiding coercion is already a top goal. Schedule effects such as Perone describes are a very good thing for us to learn about to provide the best, happiest experience for our animals. Punitive schedule changes can be avoided.

In the meantime, keep in mind that the negative side effects of positive reinforcement training listed in this article by Perone are minimal in animal training. These effects are not at all comparable to the potential fallout from force-based training, which can ruin the lives of dogs and destroy relationships.

The title of the article causes some trainers who use highly aversive methods to hope it can work as a “gotcha” to support their stance. “Look, positive reinforcement is just as bad!” Except it doesn’t show that at all, and they would know if they had read it. Or they do know, and expect you not to read it. Next time you see it referenced, feel free to link to this post.

Training with positive reinforcement, even moderately well, is unlikely to have delayed aversive effects. It’s more likely to have both current and delayed beneficial effects.

A Note about Cheetos

I eat Cheetos and other snack foods. I’m aware they are engineered to be extremely tasty but not satisfying, so we eat more. I eat them anyway. I don’t food shame anybody. I don’t idealize thin body types. I hope everyone reading has the resources to treat themselves to plenty of their preferred pleasures in life, both short-term and long-term.

Further Reading

I find this article by Balsam and Bondy, The Negative Side Effects of Reward, a far better discussion of challenges we might encounter when doing positive reinforcement training. Before you get worried: this article is not at all damning of positive reinforcement-based animal training either. It gives some very practical information about challenges we already recognize. For instance, if you use a powerful food reinforcer, you may get more “food approaching” behavior than the behavior you are trying to capture and reinforce. (“My dog is distracted by the food!”) This is a fairly minor training challenge. The other points in the article are similar. Again, the “negative side effects” are not at all comparable to the fallout associated with force-based training.

Also, for advanced reading and more information about how to make positive reinforcement training the best it can possibly be, take a look at Nonlinear Contingency Analysis by Layng, Andronis, Codd, and Abdel-Jalil (2021).

Thank you to my well-qualified friend who looked over my post. All mistakes, of course, are my own.


References

Balsam, P. D., & Bondy, A. S. (1983). The negative side effects of reward. Journal of Applied Behavior Analysis, 16(3), 283–296.

Everly, J. B., Holtyn, A. F., & Perone, M. (2014). Behavioral functions of stimuli signaling transitions across rich and lean schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 101(2), 201–214.

Jwaideh, A. R., & Mulvaney, D. E. (1976). Punishment of observing by a stimulus associated with the lower of two reinforcement frequencies. Learning and Motivation, 7, 211–222.

Layng, T. J., Andronis, P. T., Codd, R. T., & Abdel-Jalil, A. (2021). Nonlinear contingency analysis: Going beyond cognition and behavior in clinical practice. Routledge.

Perone, M. (2003). Negative effects of positive reinforcement. The Behavior Analyst, 26, 1–14.

Skinner, B. F. (1971). Beyond freedom and dignity. New York: Knopf.

Skinner, B. F. (1983). A matter of consequences. New York: Knopf.

Errorless Learning II

Seems like I’ve been eating a lot of humble pie lately. Pull up a chair and have a slice with me, won’t you?

This is an addendum to and correction of my post on Errorless Learning.

A knowledgeable Internet friend gently let me know a few things about the origin of Errorless Learning that I had incorrect in my earlier post. She was generous with her time and patient with my learning curve. These misconceptions affected some of my conclusions as well. I’m letting my earlier article stand, since there is still much in there that I think is worthwhile and accurate. But I’m linking the two posts, and I hope anyone who read the first will read this as well.

Mine were common misconceptions. Up until today, the Wikipedia article on errorless learning, along with quite a few other posts and articles, attributed the term to Herbert Terrace and cited his 1963 experiments with pigeons as the beginning of its usage, as did I. But by then, behaviorists, psychologists, and educators had been discussing errorless learning for 30 years.

Errorless learning was an instructional design introduced by B.F. Skinner in the 1930s. There was controversy at the time, and afterward, about whether errors were necessary for learning a behavior. Jesus Rosales-Ruiz summarized the approach from Skinner’s 1968 book The Technology of Teaching as follows:

“In [Skinner’s] system, errors are not necessary for learning to occur. Errors are not a function of learning or vice-versa nor are they blamed on the learner. Errors are a function of poor analysis of behavior, a poorly designed shaping program, moving too fast from step to step in the program, and the lack of the prerequisite behavior necessary for success in the program” (Rosales-Ruiz, 2007).

It helps to know that Skinner was responding to Thorndike’s then-famous 1898 paper on trial-and-error learning, which posited that learning was a slow and laborious process. Skinner’s response was that it didn’t have to be: with proper planning, the teacher could grease the skids for the learner, and learning could be achieved through “trial” only. Hence “errorless learning.”

Skinner’s system has been summarized really well elsewhere, so I’m not going to go into it at length in this post. I may later, since it seems overshadowed by Terrace’s work and is fascinating. My friend summed it up as an anti-lumping program: through good planning, make the correct behavior so easy as to be almost inevitable; slice the desired behaviors thinly; and maintain a very high rate of reinforcement. This method is highly humane, in contrast to what I was objecting to in my other post. But it’s more than “setting the animal up for success” as we think of it in a general way. As I understand it, it entails much more planning and prompting than we usually see, even in clicker training these days.
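To show the “errors are a function of the program” logic in miniature, here is a hypothetical sketch in Python. The behavior, the slices, and the criteria are all invented; what it captures, as I understand the approach, is that you advance only when a thin slice is nearly effortless, and a rising error rate sends you back to re-slice the program rather than being blamed on the learner.

STEPS = [
    "orients toward the target",
    "takes one step toward the target",
    "approaches within a nose-length",
    "touches the target briefly",
    "touches and holds for one second",
]

def review_session(step_index, successes, trials):
    # In this framework an error signals a flaw in the program (a
    # slice cut too thick), so a low success rate sends the trainer
    # back a step to re-slice rather than faulting the learner.
    rate = successes / trials
    if rate >= 0.9 and step_index + 1 < len(STEPS):
        return step_index + 1   # slice nearly effortless: advance
    if rate < 0.7 and step_index > 0:
        return step_index - 1   # too many errors: back up and re-slice
    return step_index           # stay put; keep the reinforcement rate high

step = 0
for successes, trials in [(10, 10), (9, 10), (5, 10), (10, 10)]:
    step = review_session(step, successes, trials)
    print("next step:", STEPS[step])

The specific thresholds are arbitrary. The design choice that matters is the direction of the blame: a failed session changes the program, never the learner.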

Terrace’s work, on which I based my objections in the other post, was an example of how low the error rate could be pushed with extreme planning and control of the environment and training. I still maintain that the level of control he achieved with pigeons in the lab would be not only unfeasible in our home environments but in some aspects actually undesirable. And I maintain, if not outright skepticism, at least an appreciation of the challenges of later teaching a conflicting behavior when the first behavior was taught with controls similar to Terrace’s, especially with that number of repetitions. However, these concerns cancel each other out. If we can’t attain the control and the terrifically low error rate Terrace got in the first place, we aren’t going to have the problems that Marsh and Johnson‘s pigeons had in learning a conflicting behavior.

Geikie, a greater sulfur-crested cockatoo, in a color discrimination session with a high rate of reinforcement

Another astute reader questioned the logic of saying that the design of the experiment defined the concept or the term (thanks, Margery!). It turned out she was right on target, since the concept and the name predated the experiments. Not only that, but the educational work she described doing herself in her comment on the original post is directly related to the early learning methods for humans that Skinner worked on, which were also very humane, pleasant, and natural for the learner.

The Method vs the Term

Which brings me back to one of my original gripes: the term itself. But first, I can now say what is good about the term. It refers to the fact that a subject does not need to make errors to learn a behavior.

What I don’t like about the term remains: it is inaccurate and unattainable for lots of behaviors for most of us in real-world training. Jesus Rosales-Ruiz writes, in a piece that discusses errorless learning in a positive way,

“At each step of the program, the learner has a reasonable chance of success…. Good shaping is characterized by high rates of reinforcement and low use of extinction.” (emphasis mine)

Dr. Rosales-Ruiz trains in an academic setting, but also in the real world. A “reasonable chance” of success is not 100%. And “low use of extinction” is not 0%. These are much friendlier terms for the average trainer. But as my friend reminds me, if we treat the term “errorless” as an ideal and put our mind and heart into setting the animal up to succeed as much as possible, we will get really good training.

The Rosales-Ruiz article is available as a download in this post on Mary Hunter’s great blog. It is a must-read.

I have a final objection to the term, not the method. It is probably the most important of all that I have said.

I don’t like it because it can make people feel bad and thereby discourage them. Approaching zero errors requires not only great control over the learning environment but also great skill, which many of us don’t have. For literal-minded people, the term itself, coupled with the amazingly low error rate in Terrace’s article that is always cited, can make the whole concept daunting and unapproachable.

In my other post I suggested a couple of candidate behaviors for errorless learning: behaviors for which there is no potential conflicting opposite behavior, such as scent work like a diabetic alert. There is another obvious candidate in the dog world: house training. We really would prefer no mistakes, and hardly anybody anymore would say it is important for the dog to have an accident in the house and be punished for it for their understanding to become complete. It’s a great example. However, I know several people who thought they were utter failures because they failed to live up to the “no accidents” errorless approach that is written about in at least one puppy book, complete with dire predictions of terrible things that will happen if there are errors.

Is Summer’s use of a pee pad in error?

Which brings me to my final point. If the role and goal of the teacher is to set the student up for success and make the desired behavior the easiest and most likely, shouldn’t the term itself focus on success and not errors? Also, Skinner himself said that it was about the teaching, not the learning. I would never have written any of these rants if the term had been something like Enhanced Chances of Success Teaching instead of Errorless Learning. But I doubt the whole educational world is going to change its nomenclature just for me.

Grins. Thanks for reading!


P.S. A Project

The Wikipedia article has been amended to begin with Skinner’s role in errorless learning and now has several references for that. However, the bulk of the article is still about Terrace’s work, which is actually just one of the many offshoots. Skinner’s method and the discussions from the 1930s on need to be the focus. It would be great if somebody who knows about that history, or wants to read up on it, would further edit the article.
