1:1 Pairings: The Science Behind Clicking and Treating

A guest post by Eduardo Fernandez, first published in 2001 in the now out-of-print American Animal Trainer Magazine as “Click or Treat: A Trick or Two in the Zoo.”

Photo courtesy of Wikimedia Commons.

A recent discussion on an Association of Zoos and Aquariums listserv (specifically their ‘training’ list) caught my eye and my keystrokes. It is a discussion that has apparently become commonplace among many bridge trainers. It began as a simple inquiry by another list member on whether it was appropriate to use a bridge without following it with a “treat” (whether food or some other backup reinforcer). I quickly answered that anything less than a 1:1 pairing would weaken the reinforcing value of the bridge, and put the subject to rest. But a strange thing happened. As I continued to read the posts on this listserv, many other list members took the exact opposite stance: that it was OK to ‘click’ and not treat, and that such ‘click or treating’ may even strengthen the bridge. Astounded by the ensuing discussions and arguments, I decided to gather up the data and attempt a thorough review of the appropriate way to go about this business of clicks with or without a treat. The following is the result.

Operants and Respondents: Behavior’s Double Helix

Understanding any behavioral process, especially those involved in bridge training, requires recognizing the dual roles that respondent and operant conditioning play. Many behaviorists recognize the importance of both processes on behavior, and many more recognize the practical impossibility of classifying any set of responses as only respondent or only operant behavior. Our training methods, whether one uses a bridge or not, are no different. For the sake of this article, however, I’ll focus on the use of a bridge (specifically a clicker, although any bridge could fit the equation) and the dual processes involved.

As those of us who use bridges know, one must first pair the sound of the clicker with some reinforcer for it to function as a conditioned reinforcer (a process referred to as “magazine training” in the laboratory). This is best understood, however, through the process of respondent conditioning. Just as Pavlov paired the tone of a bell with food to elicit a conditioned response, so do we initially pair the sound of a clicker as a conditioned stimulus (CS) with some unconditioned stimulus (US), generally food.

The continual pairings during our training programs between the sound of the clicker and food should also be understood through the process of respondent conditioning. Even though we are now also using the clicks as conditioned reinforcers (CR’s), the respondent conditioning process is still at work.

Pavlov’s work reveals two crucial discoveries relevant to bridge training: the importance of the temporal distance between the presentation of a CS and the US, and the importance of the occurrence or nonoccurrence of a US following a CS (Pavlov, 1928). Pavlov found that the delay between a CS and the US that follows it was important. The further apart in time the two were presented, the weaker the effects of the CS. Pavlov also found that each presentation of a CS without the following US weakened the effects of the CS.

Extinction and Ineffective CR’s

Later researchers also examined the importance of CS-US pairings and their temporal distance, as well as the conditioned reinforcer effects based on such pairings. The Rescorla-Wagner model (Rescorla & Wagner, 1972) gives us such an extinction curve, which graphically demonstrates the weakening of a CS over time when it is not paired with a US.
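As a rough illustration of that extinction curve, the Rescorla-Wagner update rule can be simulated in a few lines. This is a minimal sketch and not part of the original article: the function name, the learning rate of 0.2, and the trial counts are assumptions chosen only for illustration. On each trial the associative strength V of the click moves a fraction of the way toward an asymptote, which is 1 when food follows the click and 0 when it does not.

```python
def rescorla_wagner(trials, learning_rate=0.2, v_start=0.0):
    """Return the associative strength V after each trial.

    trials: iterable of booleans, True if the US (food) followed the CS (click).
    learning_rate: stands in for the model's combined salience terms
                   (an assumed value, not one from the article).
    """
    v = v_start
    history = []
    for us_present in trials:
        asymptote = 1.0 if us_present else 0.0
        # Core update: change in V is proportional to (asymptote - V).
        v += learning_rate * (asymptote - v)
        history.append(v)
    return history


# 20 click-treat pairings (acquisition), then 20 clicks alone (extinction).
acquisition = rescorla_wagner([True] * 20)
extinction = rescorla_wagner([False] * 20, v_start=acquisition[-1])

for trial in (0, 4, 9, 19):
    print(f"extinction trial {trial + 1}: V = {extinction[trial]:.3f}")
```

In this sketch V climbs toward 1 while every click is followed by food, then decays toward 0 once the clicks are presented alone. The shape of the decay depends entirely on the assumed learning rate.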

Other researchers examined the reinforcing effects of a CR based on its previous pairings with a primary reinforcer/US. Again, the results were as Pavlov had demonstrated almost a century ago: the longer the temporal delay between the CS and US, the weaker the CS’s effects were (Fantino, 1977). But what of the actual reinforcing effects of a CS? Does a weak CS necessarily mean a weak CR? Egger and Miller (1962) examined this question directly. They conditioned rats by pairing two different stimuli (S1 and S2) with food on two different schedules. One stimulus (S1) was always followed by a US (a 1:1 CS-US or click-treat pairing). The other stimulus (S2) was occasionally not followed by a US, and therefore was not a 1:1 pairing. They then examined the conditioned reinforcing effects of each stimulus on lever pressing. The stimulus that was occasionally presented by itself (S2, or the non-1:1 pairing) did not become an effective reinforcer, while the other stimulus (S1, or the 1:1 pairing) did. Although this study was conducted to examine the ability of a stimulus to provide information about the delivery of primary reinforcement (its ability to function as a ‘marker’), its point, along with the other presented data, should still be taken. At best, anything less than a 1:1 click-treat relation will produce a weaker CR than the direct 1:1 relation.

Beyond the Data

Now that we’ve examined the data-driven story, what about the arguments against using the 1:1 click-treat pairing? Here are a few comments, concerns, and arguments often leveled at the idea.

What about VRs?

As Skinner himself demonstrated, variable ratio schedules of reinforcement (VR) are highly effective, and often more effective than fixed ratio schedules (FR) (Skinner, 1938). However, clicking each response but not delivering food after each click is NOT a VR schedule. You are still presenting a reinforcer for each occurrence of the response, regardless of whether it’s a conditioned or a primary one. What you are doing, however, is weakening the CR’s reinforcing effects, since you are now not pairing each click with food. This is still a continuous schedule of reinforcement (CRF). The only way this becomes a VR schedule is if the clicks were no longer functioning as reinforcers, at which point the clicks would be meaningless. To intermittently reinforce responses effectively on a VR or other schedule, you would need to allow more than one response to occur, then click, and finally “treat” after each click.
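The difference between the two arrangements can be made concrete with a small simulation. The sketch below is illustrative only: the function names, the 50% treat probability, and the mean ratio of 3 are assumptions, not values from this article or from any cited study. The first function clicks every response and treats only some clicks. The second delivers a click and a treat together, but only after a variable number of responses.

```python
import random


def click_every_response(n_responses, treat_probability, rng):
    """Click after every response; follow only some clicks with a treat."""
    events = []
    for _ in range(n_responses):
        treated = rng.random() < treat_probability
        events.append(("click", "treat" if treated else "no treat"))
    return events


def variable_ratio(n_responses, mean_ratio, rng):
    """Click and treat together after a variable number of responses."""
    events = []
    # Requirement drawn uniformly from 1..(2*mean_ratio - 1), averaging mean_ratio.
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(n_responses):
        count += 1
        if count >= required:
            events.append(("click", "treat"))
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
        else:
            events.append(("no click", "no treat"))
    return events


def summarize(events):
    """Count how many responses were clicked and how many were treated."""
    clicks = sum(1 for click, _ in events if click == "click")
    treats = sum(1 for _, treat in events if treat == "treat")
    return clicks, treats


rng = random.Random(0)
crf_events = click_every_response(300, treat_probability=0.5, rng=rng)
vr_events = variable_ratio(300, mean_ratio=3, rng=rng)

for name, events in (("click every response", crf_events),
                     ("variable ratio", vr_events)):
    clicks, treats = summarize(events)
    print(f"{name}: {len(events)} responses, {clicks} clicks, {treats} treats")
```

In the first arrangement every response still produces a click, so the schedule of clicks is continuous and only the click-treat pairing is intermittent. In the second, the number of clicks always equals the number of treats, so the 1:1 pairing is preserved while the responses themselves are reinforced intermittently.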

The Occasional Click or Treat

The respondent conditioning processes described above are not only relevant to the initial pairings of the clicks with “treats”; they are an ongoing process. Although an extensive history of click-treat pairings will strengthen the CR effects of your clicks, this in no way renders your clicker invincible. At any point during your training, extinction of a response based on a lack of click-treat pairings is a threat. Each click that is not followed with a “treat” undergoes this process, regardless. It may be minimal, but you’ve still weakened your bridge.
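One way to picture this ongoing process is with a Rescorla-Wagner-style update rule (Rescorla & Wagner, 1972) applied across a long run of clicks. The following is a hedged sketch, not an analysis from the original article: the function names, the learning rate of 0.1, and the "treat three of every four clicks" pattern are all assumptions made for illustration, and the output is a model prediction rather than data from animal training.

```python
def bridge_strength(pairing_pattern, n_clicks=200, learning_rate=0.1):
    """Track a click's associative strength V across n_clicks.

    pairing_pattern: list of booleans repeated cyclically; True means the
    click was followed by a treat. The learning rate is an assumed value.
    """
    v = 0.0
    history = []
    for i in range(n_clicks):
        treated = pairing_pattern[i % len(pairing_pattern)]
        asymptote = 1.0 if treated else 0.0
        v += learning_rate * (asymptote - v)  # move V toward the asymptote
        history.append(v)
    return history


def late_average(history, window=40):
    """Average V over the last `window` clicks, after learning has settled."""
    return sum(history[-window:]) / window


one_to_one = bridge_strength([True])
three_of_four = bridge_strength([True, True, True, False])

print(f"1:1 pairing:         V ~ {late_average(one_to_one):.2f}")
print(f"treat 3 of 4 clicks: V ~ {late_average(three_of_four):.2f}")
```

Under these assumptions the occasionally unpaired click settles at a lower strength than the 1:1 click, and each unpaired click produces a small drop that the following pairings only partly recover. How large that drop is for a real animal is an empirical question the sketch cannot answer.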

Too Much Food!

I’ve heard a few trainers comment that it’s just not possible to give as many “treats” as they do clicks. The argument is simple: “I can’t give the animal THAT much food!” Fortunately, the answer is just as simple: “Then don’t.” I use “treats” in quotes for a reason. Treats do not necessarily mean food. They simply refer to any stimulus that functions as a reinforcer for an organism, whether a primary or secondary one. This can include pets, hugs, ice cubes, play time, ball chases, escape from “work”, etc. Pairing a number of reinforcers with your bridge is not only an option, it’s ideal. While a click paired with one reinforcer is simply a conditioned reinforcer, a click paired with numerous reinforcers becomes a generalized conditioned reinforcer. Generalized conditioned reinforcers are more resistant to both satiation and extinction. The only thing to remember is that any item paired with the clicks should in fact be a reinforcer in and of itself.

Chain Them Up

Many trainers use chained schedules of reinforcement. Some trainers also insist that they are simply using the bridge as a ‘marker’ for some responses and a conditioned reinforcer for others. For example, you click without “treating” for a dog running through a tunnel (1st response), use the same clicking sound without a “treat” for running up a ladder (2nd response), and finally use the same click again with a “treat” for heeling on a stand (3rd and terminal response). However, you’re still clicking without treating, and the animals you’re training may not be as forgiving as you expect.

There is an important distinction between a chain schedule of reinforcement, where different stimuli are used as discriminative stimuli and conditioned reinforcers (SD/CRs) for each response, and a tandem schedule, where the same SD/CR is used. Although tandem schedules are often used to describe light stimuli that do not change, using the same click for different responses in a chain without treating is generally the functional equivalent.

The distinction may seem complicated, but the previous example illustrates the difference. In the original instance, where the dog received the same click with or without food for each response, the trainer is using a tandem schedule. However, if the trainer were to use a different marker or bridge for each of the two earlier responses, and only used a bridge followed by a “treat” for the terminal response, he or she would be using a chain schedule.

For those who insist on using markers, the solution is again simple. Use different stimuli as markers. By doing so, you’ve more accurately established a chain schedule. The marking stimuli do not have to be paired directly with a reinforcer, and you can now save your click followed by the treat for the terminal response. However, whether using markers is more or less beneficial than not using trainer-specified SD’s/CR’s for non-terminal responses is yet to be proven. The termination of a response (e.g., the dog making it to the end of the tunnel) will function as a CR for that response, as well as an SD for the next response, without additional bridges or markers. No specific conclusions can be made about the benefits or lack thereof of using trainer-specified markers until such empirical applied research has been conducted.

Ivory Tower Blues

A possible argument against any empirical support for using a 1:1 click-treat pairing method is that it’s strictly laboratory-based. Such arguments have been leveled against scientific communities on occasion, especially those within the behavioral sciences. Any such data are claimed to be too “basic”, and therefore not relevant to applied fields.

A valid aspect of this argument is that for animal training, the applied arena is drastically different than the lab. The animals we work with are distantly related to rats and pigeons, the behaviors we train are drastically different than lever presses and key pecks, and I have yet to see an animal trainer train in a setting that even slightly resembles a Skinner box. Our applied arena is different, and one that needs its own applied research. Animal training can lead to discovery too, and new phenomena not covered by the basic research are bound to occur.

However, little can be justifiably argued from this stance on this particular issue. Click-treat pairings are directly based upon concepts discovered from these same or similar basic laboratories. Also, all data that I know of to date is in support of the 1:1 click-treat-pairing method. Although further applied research on this issue would be beneficial (as is generally the case in any science), there is no evidence at present that I am aware of to support the concept of a non-1:1 bridge pairing being as effective as a 1:1 pairing.

Good Enough

Which vehicle would you prefer for a drive in dangerous conditions? Photos courtesy of Wikimedia Commons.

Even with all this said, many have stated that a weaker bridge is still “good enough”. One does not necessarily need the world’s most potent reinforcer for a specific response to train effectively. Even with a considerably weaker bridge, one might still get the desired target responses.

Still, this is a dangerous basis for any trainer’s decisions, or for any endeavor you choose to engage in. Imagine yourself forced to drive in the mountains in a blizzard. Now imagine being given the choice of driving a 4×4 Chevy or a Pinto. The Pinto might be “good enough” to get you up or down the mountain, but the Chevy is probably a much safer bet! You shouldn’t treat your clicking much differently, especially considering how comparable the two methods are in terms of time, effort, and money.

Training Beyond Hallows Eve

The choices we have for how to train may seem endless, but a few questions beg simple solutions. I believe this area is just that: one with a simple solution. Allow me to simplify this for you if the point hasn’t hit home yet: anything less than a 1:1 click-treat pairing will weaken your bridge. It’s that simple.

Although science creates no hard and fast rules, Pavlov, Skinner, and their colleagues have stood the test of time on this topic for nearly a century now. When it comes to pairing a “treat” with a bridge, always follow the bridge as immediately as possible with a “treat”. Eventually, our own applied research will bridge the gaps between basic research and applied phenomena. Until that time, we should let the basic research guide our behavior, and keep an eye out for areas that might demand such future research.

We all know that training methods are, for the most part, based on simple behavioral principles. We also know that regardless of that fact, training can become infinitely complicated. Therefore, if you need no other excuse, stick with the simple plan. Parsimony is next to godliness in the sciences, and little else gets simpler than “do x after y”, a.k.a. 1:1.

References

Egger, M.D., and Miller, N.E. (1962). Secondary reinforcement in rats as a function of information value and reliability of the stimulus. Journal of Experimental Psychology, 64, 97-104.
Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W.K. Honig & J.E.R. Staddon (Eds.), Handbook of operant behavior (pp. 313-339). Englewood Cliffs, NJ: Prentice-Hall.
Pavlov, I.P. (1928). Lectures on conditioned reflexes. New York: International Publishers.
Rescorla, R.A., and Wagner, A.R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A.H. Black & W.F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). New York: Appleton-Century-Crofts.
Skinner, B.F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.


Many thanks to Eduardo Fernandez for letting me republish this important article. You can see more of Dr. Fernandez’s work on his ResearchGate page. He also runs a Facebook group: Animal Reinforcement Forum.

Copyright 2001 Eduardo Fernandez

Photo Credits

Gorilla photo credit Tomáš Petřík via Wikimedia Commons.

Ford Pinto photo credit dave_7 via Wikimedia Commons.

Chevrolet Colorado 4×4 photo credit Kobak via Wikimedia Commons.

Clicker photo copyright Eileen Anderson.

 


3 Responses to 1:1 Pairings: The Science Behind Clicking and Treating

  2. How does this translate to when you are training more than one dog? How do they know it’s not “their” click?

    • Eileen Anderson says:

      It depends on the situation. I’ve been told that in whole classes full of dogs being clicker trained, the dogs quickly learn “their” click. Since they recognize their own click, the 1:1 pairing is maintained. When I train two dogs at home, I have never used a clicker, but I do use a verbal marker sometimes. It’s usually a dog being active and a dog stationing, so my placement and orientation to them is probably part of the bridging stimulus.

      It’s a great question, though, and hopefully a pro can chime in.
