A guest post by Eduardo Fernandez, first published in 2001 in the now out-of-print American Animal Trainer Magazine as “Click or Treat: A Trick or Two in the Zoo.”
Photo courtesy of Wikimedia Commons.
A recent discussion on an Association of Zoos and Aquariums listserv (specifically their ‘training’ list) caught my eye and my keystrokes, and it is one that has apparently become commonplace among many bridge trainers. The discussion began as a simple inquiry from another list member about whether it was appropriate to use a bridge without following it with a “treat” (whether food or some other backup reinforcer). I quickly answered that anything less than a 1:1 pairing would weaken the reinforcing value of the bridge, and considered the subject put to rest. But a strange thing happened. As I continued to read the posts on this listserv, many other list members took the exact opposite stance: that it was OK to ‘click’ and not treat, and that such ‘click or treating’ may even strengthen the bridge. Astounded by the ensuing discussions and arguments, I decided to gather up the data and attempt a thorough review of the appropriate way to go about this business of clicks with or without a treat. The following is the result.
Operants and Respondents: Behavior’s Double Helix
Important to understanding any behavioral process, especially those entailed in bridge training methods, are the dual roles that respondent and operant conditioning play. Many behaviorists recognize the importance of both processes for behavior, and many more recognize the practical impossibility of classifying any set of responses as only respondent or only operant behavior. Our training methods, whether one uses a bridge or not, are no different. For the sake of this article, however, I’ll focus on the use of a bridge (specifically a clicker, although any bridge could fit the equation) and the dual processes involved.
As those of us who use bridges know, one must first pair the sound of the clicker with some reinforcer for it to function as a conditioned reinforcer (a process referred to as “magazine training” in the laboratory). This pairing is best understood, however, through the process of respondent conditioning. Just as Pavlov paired the tone of a bell with food to elicit a conditioned response, so do we initially pair the sound of a clicker as a conditioned stimulus (CS) with some unconditioned stimulus (US), generally food.
The continual pairings during our training programs between the sound of the clicker and food should also be understood through the process of respondent conditioning. Even though we are now also using the clicks as conditioned reinforcers (CRs), the respondent conditioning process is still at work.
Pavlov’s work reveals two crucial discoveries relevant to bridge training: the importance of the temporal distance between the presentation of a CS and the US, and the importance of whether or not a US follows a CS (Pavlov, 1928). Pavlov found that the delay between the CS and the US that followed it mattered: the further apart in time the two were presented, the weaker the effects of the CS. Pavlov also found that each presentation of a CS without the following US weakened the effects of the CS.
Extinction and Ineffective CRs
Later researchers also examined the importance of the CS-US pairings and their temporal distance, as well as the conditioned reinforcer effects based on such pairings. The Rescorla-Wagner model (Rescorla & Wagner, 1972) gives us such an extinction curve, where one can graphically demonstrate the weakening of a CS over time when not paired with a US.
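To make the extinction curve concrete, here is a minimal Python sketch of the Rescorla-Wagner update, in which the associative strength V of the click moves a fixed fraction of the way toward 1 on a trial where the treat follows and toward 0 on a trial where it does not. The learning rate of 0.2, the trial counts, and the function name `rescorla_wagner` are illustrative assumptions, not values taken from the original work.

```python
def rescorla_wagner(trials, learning_rate=0.2, v_start=0.0):
    """Return the associative strength V of the CS after each trial.

    trials: sequence of booleans, True when the US (treat) follows the CS (click).
    learning_rate: the combined alpha * beta term (an assumed value).
    """
    v = v_start
    history = []
    for us_follows in trials:
        lam = 1.0 if us_follows else 0.0  # asymptote: 1 with the US, 0 without
        v += learning_rate * (lam - v)    # delta V = alpha * beta * (lambda - V)
        history.append(v)
    return history


# 20 click-treat pairings, then 20 clicks with no treat (extinction).
curve = rescorla_wagner([True] * 20 + [False] * 20)

for trial in (1, 10, 20, 21, 30, 40):
    print(f"trial {trial:2d}: V = {curve[trial - 1]:.3f}")
```

Under these assumptions, V climbs toward 1 during the paired trials and then falls on every unpaired trial, which is the extinction curve described above.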
Other researchers examined the reinforcing effects of a CR based on its previous pairings with a primary reinforcer/US. Again, the results were as Pavlov had demonstrated almost a century ago: the longer the temporal delay between the CS and US, the weaker the CS’s effects (Fantino, 1977). But what of the actual reinforcing effects of a CS? Does a weak CS necessarily mean a weak CR? Egger and Miller (1962) examined exactly this question. They conditioned rats by pairing two different stimuli (S1 and S2) with food on two different schedules. One stimulus (S1) was always followed by a US (a 1:1 CS-US or click-treat pairing). The other stimulus (S2) was occasionally not followed by a US, and was therefore not a 1:1 pairing. They then examined the conditioned reinforcing effects of each stimulus on lever pressing. The stimulus that was occasionally presented by itself (S2, or the non-1:1 pairing) did not become an effective reinforcer, while the other stimulus (S1, or the 1:1 pairing) did. Although this study was conducted to examine the ability of a stimulus to provide information about the delivery of primary reinforcement (its ability to function as a ‘marker’), its point, along with the other data presented here, should still be taken. At best, anything less than a 1:1 click-treat relation will produce a weaker CR than the direct 1:1 relation.
Beyond the Data
Now that we’ve examined the data-driven story, what about the arguments against using the 1:1 click-treat pairing? Here are a few comments, concerns, and arguments often leveled at the idea.
What about VRs?
As Skinner himself demonstrated, variable ratio (VR) schedules of reinforcement are highly effective, and often more effective than fixed ratio (FR) schedules (Skinner, 1938). However, clicking each response but not delivering food after each click is NOT a VR schedule. You are still presenting a reinforcer for each occurrence of the response, regardless of whether it’s a conditioned or a primary one. What you are doing, however, is weakening the CR’s reinforcing effects, since you are no longer pairing each click with food. This is still a continuous schedule of reinforcement (CRF). The only way this becomes a VR schedule is if the clicks no longer function as reinforcers, at which point the clicks would be meaningless. To effectively reinforce responses intermittently on a VR or other schedule, you would need to allow more than one response to occur, then click, and finally “treat” after each click.
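The bookkeeping behind this distinction can be sketched in Python. The two functions below simply count responses, clicks, and treats under each arrangement. The function names, the 50% treat probability, the mean ratio of 3, and the uniform draw of ratio requirements are all assumptions made for illustration.

```python
import random


def click_every_response(n_responses, treat_probability, rng):
    """Click after every response, but follow a click with a treat only sometimes."""
    treats = sum(rng.random() < treat_probability for _ in range(n_responses))
    return {"responses": n_responses, "clicks": n_responses, "treats": treats}


def variable_ratio(n_responses, mean_ratio, rng):
    """Click only after a variable number of responses, and treat after every click."""
    clicks = 0
    count = 0
    required = rng.randint(1, 2 * mean_ratio - 1)  # averages mean_ratio
    for _ in range(n_responses):
        count += 1
        if count >= required:
            clicks += 1  # the click is always followed by a treat
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
    return {"responses": n_responses, "clicks": clicks, "treats": clicks}


rng = random.Random(0)  # fixed seed so the run is repeatable
partial = click_every_response(300, treat_probability=0.5, rng=rng)
vr3 = variable_ratio(300, mean_ratio=3, rng=rng)

print("click every response, treat sometimes:", partial)
print("VR 3 with 1:1 click-treat pairing:   ", vr3)
```

In the first arrangement every response still produces a click, so the response is on a continuous schedule while the click:treat ratio drops below 1:1. In the second, only some responses produce a click, but every click is followed by a treat.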
The Occasional Click or Treat
The respondent conditioning processes described above are not only relevant to the initial pairings of the clicks with “treats”; they are an ongoing process. Although an extensive history of click-treat pairings will strengthen the CR effects of your clicks, this in no way renders your clicker invincible. At any point during your training, extinction of a response based on a lack of click-treat pairings is a threat. Each click that is not followed with a “treat” undergoes this process, regardless. It may be minimal, but you’ve still weakened your bridge.
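As a rough illustration of this ongoing process, the following Python sketch applies a Rescorla-Wagner style update to a bridge with a long 1:1 history, then to a single unpaired click, and finally to a history in which only every other click is followed by a treat. Treating associative strength as a stand-in for the strength of the bridge, the learning rate of 0.2, and the name `bridge_strength` are assumptions made for the example. The size of each drop depends entirely on the assumed learning rate.

```python
def bridge_strength(pairings, learning_rate=0.2, v_start=0.0):
    """Return the associative strength of the click after a series of clicks.

    pairings: sequence of booleans, True when the click is followed by a treat.
    learning_rate: assumed value; a smaller rate gives smaller changes per click.
    """
    v = v_start
    for treat_follows in pairings:
        target = 1.0 if treat_follows else 0.0
        v += learning_rate * (target - v)
    return v


# A long 1:1 history, then one click that is not followed by a treat.
after_history = bridge_strength([True] * 100)
after_one_miss = bridge_strength([False], v_start=after_history)

# The same number of clicks, but with a treat after only every other click.
every_other = bridge_strength([True, False] * 50)

print(f"after 100 paired clicks:       {after_history:.3f}")
print(f"after one unpaired click:      {after_one_miss:.3f}")
print(f"treating every other click:    {every_other:.3f}")
```

Even after 100 paired clicks, the single unpaired click lowers the value in this model, and the every-other-click history settles well below the 1:1 history.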
Too Much Food!
I’ve heard a few trainers comment that it’s just not possible to give as many “treats” as they do clicks. The argument is simple: “I can’t give the animal THAT much food!” Fortunately, the answer is just as simple: “Then don’t.” I use “treats” in quotes for a reason. Treats do not necessarily mean food. They simply refer to any stimulus that functions as a reinforcer for an organism, whether a primary or secondary one. This can include pets, hugs, ice cubes, play time, ball chases, escape from “work”, etc. Pairing a number of reinforcers with your bridge is not only an option, it’s ideal. While a click paired with one reinforcer is simply a conditioned reinforcer, a click paired with numerous reinforcers becomes a generalized conditioned reinforcer. Generalized conditioned reinforcers are more resistant to both satiation and extinction. The only thing to remember is that any item paired with the clicks should in fact be a reinforcer in and of itself.
Chain Them Up
Many trainers use chained schedules of reinforcement. Some trainers also insist that they are simply using the bridge as a ‘marker’ for some responses and a conditioned reinforcer for others. For example, you click without “treating” for a dog running through a tunnel (1st response), use the same clicking sound without a “treat” for running up a ladder (2nd response), and finally use the same click again with a “treat” for heeling on a stand (3rd and terminal response). However, you’re still clicking without treating, and the animals you’re training may not be as forgiving as you expect.
There is an important distinction between a chain schedule of reinforcement, where a different stimulus serves as the discriminative stimulus and conditioned reinforcer (SD/CR) for each response, and a tandem schedule, where the same SD/CR is used throughout. Although tandem schedules are usually described in terms of laboratory stimuli, such as lights, that do not change, using the same click for different responses in a chain without treating is generally the functional equivalent.
The distinction may seem complicated, but the previous example illustrates the difference. In the original instance, where the dog received the same click with or without food for each response, the trainer is using a tandem schedule. However, if the trainer were to use a different marker or bridge for the two previous responses, and only used a bridge followed by a “treat” for the terminal response, he/she would be using a chain schedule.
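The same distinction can be written down as a small Python sketch that labels a sequence of responses by the marker used for each one, following the way the two terms are used in this article. The function name `classify_schedule`, the marker names, and the extra “mixed” label are hypothetical and are introduced only for illustration.

```python
def classify_schedule(markers):
    """Label a response sequence by the marker stimulus used for each response.

    markers: one marker name per response, in order (at least two responses).
    Returns 'tandem' if every response gets the same stimulus, 'chain' if every
    response gets its own stimulus, and 'mixed' otherwise (a label assumed here).
    """
    if len(markers) < 2:
        raise ValueError("a sequence needs at least two responses")
    distinct = len(set(markers))
    if distinct == 1:
        return "tandem"
    if distinct == len(markers):
        return "chain"
    return "mixed"


# Tunnel, ladder, stand: the same click for all three responses.
same_click = ["click", "click", "click"]

# A different marker for the first two responses; the click (followed by
# the treat) is saved for the terminal response.
different_markers = ["whistle", "word", "click"]

print(classify_schedule(same_click))         # tandem
print(classify_schedule(different_markers))  # chain
```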
For those who insist on using markers, the solution is again simple. Use different stimuli as markers. By doing so, you’ve more accurately established a chain schedule. The marking stimuli do not have to be paired directly with a reinforcer, and you can now save your click followed by the treat for the terminal response. However, whether using markers is more or less beneficial than not using trainer-specified SDs/CRs for non-terminal responses is yet to be shown. The termination of a response (e.g., the dog making it to the end of the tunnel) will function as a CR for that response, as well as an SD for the next response, without additional bridges or markers. No specific conclusions can be drawn about the benefits, or lack thereof, of using trainer-specified markers until such empirical applied research has been conducted.
Ivory Tower Blues
A possible argument against any empirical support for using a 1:1 click-treat pairing method is that it’s strictly laboratory-based. Such arguments have been leveled against scientific communities on occasion, especially those within the behavioral sciences. Any such data are claimed to be too “basic”, and therefore not relevant to applied fields.
A valid aspect of this argument is that for animal training, the applied arena is drastically different than the lab. The animals we work with are distantly related to rats and pigeons, the behaviors we train are drastically different than lever presses and key pecks, and I have yet to see an animal trainer train in a setting that even slightly resembles a Skinner box. Our applied arena is different, and one that needs its own applied research. Animal training can lead to discovery too, and new phenomena not covered by the basic research are bound to occur.
However, little can be justifiably argued from this stance on this particular issue. Click-treat pairings are directly based upon concepts discovered in these same or similar basic laboratories. Also, all data that I know of to date are in support of the 1:1 click-treat pairing method. Although further applied research on this issue would be beneficial (as is generally the case in any science), there is no evidence at present that I am aware of to support the concept of a non-1:1 bridge pairing being as effective as a 1:1 pairing.
Which vehicle would you prefer for a drive in dangerous conditions? Photos courtesy of Wikimedia Commons.
Even with all this said, many have stated that a weaker bridge is still “good enough”. One does not necessarily need the world’s most potent reinforcer for a specific response to train effectively. Even with a considerably weaker bridge, one might still get the desired target responses.
Still, “good enough” is a dangerous basis for any trainer’s decisions, or for decisions in any other endeavor you choose to engage in. Imagine yourself forced to drive in the mountains in a blizzard. Now imagine being given the choice of driving a 4×4 Chevy or a Pinto. The Pinto might be “good enough” to get you up or down the mountain, but the Chevy is probably a much safer bet! You shouldn’t treat your clicking much differently, especially considering how comparable the two methods are in terms of time, effort, and money.
Training Beyond Hallows Eve
The choices we have for how to train may seem endless, but a few questions beg simple solutions. I believe this area is just that: one with a simple solution. Allow me to simplify this for you if the point hasn’t hit home yet: anything less than a 1:1 click-treat pairing will weaken your bridge. It’s that simple.
Although science creates no hard and fast rules, Pavlov, Skinner, and their colleagues have stood the test of time on this topic for nearly a century now. When it comes to pairing a “treat” with a bridge, always follow the bridge as immediately as possible with a “treat”. Eventually, our own applied research will bridge the gaps between basic research and applied phenomena. Until that time, we should let the basic research guide our behavior, and keep an eye out for areas that might demand such future research.
We all know that training methods are, for the most part, based on simple behavioral principles. We also know that regardless of that fact, training can become infinitely complicated. Therefore, if you need no other excuse, stick with the simple plan. Parsimony is next to godliness in the sciences, and little else gets simpler than “do x after y”, a.k.a. 1:1.
Egger, M.D., and Miller, N.E. (1962). Secondary reinforcement in rats as a function of information value and reliability of the stimulus. Journal of Experimental Psychology, 64, 97-104.
Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W.K. Honig & J.E.R. Staddon (Eds.), Handbook of operant behavior (pp. 313-339). Englewood Cliffs, NJ: Prentice-Hall.
Pavlov, I.P. (1928). Lectures on conditioned reflexes. New York: International Publishers.
Rescorla, R.A., and Wagner, A.R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A.H. Black & W.F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). New York: Appleton-Century-Crofts.
Skinner, B.F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
Many thanks to Eduardo Fernandez for letting me republish this important article. You can see more of Dr. Fernandez’s work on his ResearchGate page. He also runs a Facebook group: Animal Reinforcement Forum.
Copyright 2001 Eduardo Fernandez
Gorilla photo credit Tomáš Petřík via Wikimedia Commons.
Ford Pinto photo credit dave_7 via Wikimedia Commons.
Chevrolet Colorado 4×4 photo credit Kobak via Wikimedia Commons.
Clicker photo copyright Eileen Anderson.