Positive reinforcement occurs whenever an appetitive stimulus makes a behaviour more likely to be repeated (Mongillo et al., 2014); for example, giving the animal food each time they enter a particular area may positively reinforce the behaviour of entering that area. The internal mechanism of positive reinforcement involves complex interactions of neurotransmitters (dopamine, serotonin and opioids) and receptors (Sudakov, 2019). Although positive reinforcement is often considered from an operant perspective, the animal also learns, through classical conditioning, to associate the operant behaviour with the reinforcer, thereby creating a positive conditioned emotional response (Panksepp, 2011). Affective state (positive and negative valence) is believed to be one of the earliest forms of subjective experience in vertebrate evolution and may be adaptive by informing animals of how they are currently faring with regard to survival (Panksepp, 2011).
The efficacy of a positive reinforcer depends on motivating operations (Pfaller-Sadovsky et al., 2019), for example, deprivation or satiation. This may be an adaptation which increases fitness by providing a positive emotional response (PER) to the action of seeking required resources (for example, food) and then reducing the PER when the objective is achieved. However, deprivation of basic needs may raise ethical issues and contravene the Animal Welfare Act 2006 if animals are deprived of food to the degree that their welfare is affected. Sudakov (2019) discusses how dopamine rises and peaks during the seeking of (or the opportunity to obtain) reinforcers. The emotional response, which may be needed to create an operant response (Panksepp, 2011), therefore occurs prior to reinforcement of the operant behaviour.
Cognitive control is an important factor in positive reinforcement because it facilitates discrimination between important (for survival) and unimportant stimuli (Alexander and Brown, 2010). Functional imaging has been used to identify that the anterior cingulate region of the prefrontal cortex plays a major role in responding to conflicting stimuli and in decision making (Newman et al., 2015). In wild animals cognitive decision making may increase fitness; in the domestic environment, it facilitates the use of positive reinforcement training, for example, reinforcement of an alternative behaviour, to overcome behavioural issues (Vicars et al., 2014). However, different species and individuals within species have varying preferences, as may be evidenced through preference testing (Clay et al., 2009). The existence of alternative, preferred reinforcers in the presence of standard reinforcers may therefore diminish the value of the standard reinforcer, preventing a PER. For example, giving a piece of kibble to a border collie for an alternative behaviour to crouching in the presence of sheep may not act as a reinforcer (and may even become aversive) due to the dog’s strong (and innate) preference for crouching. For this reason, it is important to be aware of environmental influences during behaviour modification procedures and to understand that preferences are subject to change depending on current needs (Franks, 2019).
The term poisoned cue was introduced by Pryor (2002) and refers to cues which are associated with appetitive and aversive stimuli simultaneously. For example, giving a dog a piece of food and patting them on the head each time they respond appropriately to the cue word may poison the cue word. The dog may not be able to form (or maintain) a PER to the reinforcer (the food) because they have a negative emotional response to the aversive (the pat on the head). This is a clear example of how cues may be poisoned without any intention to use aversive training methods; many owners do not realise that dogs usually do not enjoy head pats (Nichols et al., 2012). Furthermore, the poisoned cue effect may extend further than the actual cue for a behaviour and become associated with the trainer. Sankey et al. (2010) found that horses (Equus caballus) trained using mildly aversive methods (a riding crop waved in front of their head) retained a negative emotional response to particular and unknown humans five months later. Particularly noteworthy is the short time needed to create this enduring emotional effect; training consisted of five sessions lasting only one to five minutes. Horses trained using only positive reinforcement showed no signs of negative emotional responses.
The use of stronger aversives, for example, yanking, shouting and hitting, is likely to cause greater conflicts of emotion because they are linked to more severe side effects; for example, aggression, escape and learned helplessness (Fernandes et al., 2017). China et al. (2020) studied the efficacy of e-collar training on dogs and found that professional trainers who used a mix of positive punishment (via e-collar) and positive reinforcement (via food treats) were less successful than trainers who used positive reinforcement without any form of punishment. This may be surprising considering that dogs in the e-collar group had the opportunity to avoid punishment (the shock) and receive reinforcement via food (increasing motivation). However, it may be that dogs in the e-collar group were suffering negative valence (due to fear of being shocked) and were therefore less able to form a positive conditioned emotional response to the available treats. This postulation concurs with the general agreement in psychology that negative experiences are more powerful than positive experiences because they are often more critical to survival (Baumeister et al., 2001).
Although aversive training methods can be effective, literature reviews by Ziv (2017) and Fernandes et al. (2017) found no suggestion that they are more effective than non-aversive methods. Considering the lack of any improvement in efficacy, and the unintended consequences of poisoned cues, behavioural side effects and negative valence, it is seemingly unnecessary and unethical to knowingly introduce aversive stimuli into animal training or behaviour modification.
Jackpots are reinforcers of a considerably higher value than the animal’s usual reinforcer and come as a surprise to the animal; although they are often used and recommended by trainers, there is no scientific definition for the term ‘jackpot’, nor have the behavioural effects been widely studied (Kuroda et al., 2020; Lattal, 2020). In the influential book, Don’t Shoot the Dog, Pryor (1984) described jackpots as reinforcers which are considerably larger than the animal’s usual reinforcer and delivered as a surprise event. However, she used the example of giving two fish to a dolphin for ‘nothing’. It is therefore unclear how the jackpots were contingent on behaviour. Ramirez (2017) discusses the use of jackpots for particularly well-executed behaviour, but seemingly contradicts this by suggesting giving a jackpot at the end of the day, if all criteria have been met for that day. Fisher (2009), meanwhile, recommends giving jackpots at the start of training. These anomalies by high-profile trainers may contribute to misunderstandings and inconsistencies amongst animal trainers in the application of jackpots.
For a jackpot to be considered effective, it would need to result in a greater behavioural effect than the standard reinforcement (Kuroda et al., 2020). Muir (2010) found no such effect; in fact, the jackpot was found to disrupt responding rates and training flow. Weatherly et al. (2004) found that although response rates may increase as reinforcer magnitude increases, they also decrease as the reinforcer value decreases. Furthermore, not receiving an expected reinforcer may lead to frustration (Jacovcevic et al., 2013), for example, during the process of extinction. The process of reducing reinforcement value from jackpot level to normal level therefore introduces the risk of disruption (of training flow) and frustration. This is further illustrated by the phenomenon of negative incentive contrast, whereby animals may respond to a reduction in comparative value (of the reinforcer) by refusing it, slowing down, vocalising or performing displacement behaviour (Brosnan and de Waal, 2012). This concurs with the findings of Lee et al. (2007), that reduction in reinforcement schedules influences variability of behaviour rather than making it more likely.
Animals are usually willing to make greater physical effort, or a greater number of responses, to gain greater reinforcers (Walton et al., 2006). However, because jackpots are usually a once-per-session surprise event (Lattal, 2020), it may be impossible for the animal to predict which operant behaviour will trigger the jackpot, and jackpots may therefore be of little value as positive reinforcement. Considering that emotional response may be reliant on anticipated reinforcers, jackpots may also fail to produce any beneficial conditioning.
Instinctive drift occurs when an operant behaviour drifts towards an instinctive behaviour (Burgos, 2015). Breland and Breland (1961) discuss multiple instances of animals drifting from trained behaviours towards instinctive behaviours; for example, pigs choosing to root rather than completing the positively reinforced behaviour (placing discs into a piggybank) they had been trained to do, and chickens pecking at objects rather than completing the operantly conditioned behaviour. Breland and Breland (1961) discuss how instinctive drift causes a breakdown of learning theory as it applies to positive reinforcement. Similar findings have been discussed by Bitterman (1975) and Boakes et al. (1978). However, considering that each example of instinctive drift that Breland and Breland (1961) encountered involved food acquisition, the phenomenon may be an example of how classical and operant conditioning are intrinsically linked. Just as dogs may be conditioned to drool at the sound of a bell (Domjan, 2005), the association between a behavioural cue and food may trigger innate, food-related, species-specific behaviours; for example, rooting and pecking (Burgos, 2015). The innate responses are unlikely to be followed by reinforcement, because they are not the behaviours required by the trainer; therefore, the cue may condition a negative emotional response (due to frustration) rather than a positive emotional response. The phenomenon of instinctive drift demonstrates the need to provide captive animals with outlets to express normal behaviour patterns, as stipulated in the five freedoms of the Farm Animal Welfare Council (2012), and raises ethical concerns where this is not provided.
I’m sure there are many more examples you could think of, but the point is, we shouldn’t assume that a reinforcer is always a reinforcer.
References available on request.
5 thoughts on “When is a reinforcer not a reinforcer?”
Thank you for this 🙂
Thank you so much for this. So interesting. Is it possible that instinctive drift is caused by the animal finding a natural behaviour more intrinsically reinforcing than the conditioned operant behaviour, rather than possibly being triggered by the use of food as a positive reinforcer? The natural behaviour would then end up being reinforced by the trainer stopping the training reinforcement as there isn’t a trained behaviour to reinforce. That would then make the rooting/pecking more likely to happen again.
Hi Carol, yes indeed, and I have given a similar argument for it in the past. I think maybe the real difference though is the intrinsic value of the rooting compared with a trained behaviour, and also we may be triggering food-related modal action patterns, or similar. I think both arguments hold water and are interesting to consider.
Please could I have a list of the references?
Hi Sue, please use the contact form to give me an email address and I’ll send it.