How Do You Know if a Stimulus Is Reinforcing?

Chapter 8. Learning

8.2 Changing Behaviour through Reinforcement and Punishment: Operant Conditioning

Learning Objectives

  1. Outline the principles of operant conditioning.
  2. Explain how learning can be shaped through the use of reinforcement schedules and secondary reinforcers.

In classical conditioning the organism learns to associate new stimuli with natural biological responses such as salivation or fear. The organism does not learn something new but rather begins to perform an existing behaviour in the presence of a new signal. Operant conditioning, on the other hand, is learning that occurs based on the consequences of behaviour and can involve the learning of new actions. Operant conditioning occurs when a dog rolls over on command because it has been praised for doing so in the past, when a classroom bully threatens his classmates because doing so allows him to get his way, and when a child gets good grades because her parents threaten to punish her if she doesn't. In operant conditioning the organism learns from the consequences of its own actions.

How Reinforcement and Punishment Influence Behaviour: The Research of Thorndike and Skinner

Psychologist Edward L. Thorndike (1874-1949) was the first scientist to systematically study operant conditioning. In his research Thorndike (1898) observed cats who had been placed in a "puzzle box" from which they tried to escape ("Video Clip: Thorndike's Puzzle Box"). At first the cats scratched, bit, and swatted haphazardly, without any idea of how to get out. But eventually, and accidentally, they pressed the lever that opened the door and exited to their prize, a scrap of fish. The next time the cat was constrained inside the box, it attempted fewer of the ineffective responses before carrying out the successful escape, and after several trials the cat learned to almost immediately make the correct response.

Observing these changes in the cats' behaviour led Thorndike to develop his law of effect, the principle that responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation (Thorndike, 1911). The essence of the law of effect is that successful responses, because they are pleasurable, are "stamped in" by experience and thus occur more often. Unsuccessful responses, which produce unpleasant experiences, are "stamped out" and subsequently occur less often.

When Thorndike placed his cats in a puzzle box, he found that they learned to engage in the important escape behaviour faster after each trial. Thorndike described the learning that follows reinforcement in terms of the law of effect.

"" Watch: "Thorndike'south Puzzle Box" [YouTube]: http://www.youtube.com/sentry?5=BDujDOLre-8

The influential behavioural psychologist B. F. Skinner (1904-1990) expanded on Thorndike's ideas to develop a more complete set of principles to explain operant conditioning. Skinner created specially designed environments known as operant chambers (usually called Skinner boxes) to systematically study learning. A Skinner box (operant chamber) is a structure that is big enough to fit a rodent or bird and that contains a bar or key that the organism can press or peck to release food or water. It also contains a device to record the animal's responses (Figure 8.5).

The most basic of Skinner's experiments was quite similar to Thorndike's research with cats. A rat placed in the chamber reacted as one might expect, scurrying about the box and sniffing and clawing at the floor and walls. Eventually the rat chanced upon a lever, which it pressed to release pellets of food. The next time around, the rat took a little less time to press the lever, and on successive trials, the time it took to press the lever became shorter and shorter. Soon the rat was pressing the lever as fast as it could eat the food that appeared. As predicted by the law of effect, the rat had learned to repeat the action that brought about the food and stop the actions that did not.

Skinner studied, in particular, how animals changed their behaviour through reinforcement and punishment, and he developed terms that explained the processes of operant learning (Table 8.1, "How Positive and Negative Reinforcement and Punishment Influence Behaviour"). Skinner used the term reinforcer to refer to any event that strengthens or increases the likelihood of a behaviour, and the term punisher to refer to any event that weakens or decreases the likelihood of a behaviour. And he used the terms positive and negative to refer to whether a reinforcement was presented or removed, respectively. Thus, positive reinforcement strengthens a response by presenting something pleasant after the response, and negative reinforcement strengthens a response by reducing or removing something unpleasant. For example, giving a child praise for completing his homework represents positive reinforcement, whereas taking Aspirin to reduce the pain of a headache represents negative reinforcement. In both cases, the reinforcement makes it more likely that the behaviour will occur again in the future.

""
Effigy eight.5 Skinner Box. B. F. Skinner used a Skinner box to study operant learning. The box contains a bar or key that the organism can printing to receive food and water, and a device that records the organism's responses.
Table 8.1 How Positive and Negative Reinforcement and Punishment Influence Behaviour.
Operant conditioning term | Description | Outcome | Example
Positive reinforcement | Add or increase a pleasant stimulus | Behaviour is strengthened | Giving a student a prize after he or she gets an A on a test
Negative reinforcement | Reduce or remove an unpleasant stimulus | Behaviour is strengthened | Taking painkillers that eliminate pain increases the likelihood that you will take painkillers again
Positive punishment | Present or add an unpleasant stimulus | Behaviour is weakened | Giving a student extra homework after he or she misbehaves in class
Negative punishment | Reduce or remove a pleasant stimulus | Behaviour is weakened | Taking away a teen's computer after he or she misses curfew

Reinforcement, either positive or negative, works by increasing the likelihood of a behaviour. Punishment, on the other hand, refers to any event that weakens or reduces the likelihood of a behaviour. Positive punishment weakens a response by presenting something unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something pleasant. A child who is grounded after fighting with a sibling (positive punishment) or who loses out on the opportunity to go to recess after getting a poor grade (negative punishment) is less likely to repeat these behaviours.

Although the distinction between reinforcement (which increases behaviour) and punishment (which decreases it) is usually clear, in some cases it is difficult to determine whether a reinforcer is positive or negative. On a hot day a cool breeze could be seen as a positive reinforcer (because it brings in cool air) or a negative reinforcer (because it removes hot air). In other cases, reinforcement can be both positive and negative. One may smoke a cigarette both because it brings pleasure (positive reinforcement) and because it eliminates the craving for nicotine (negative reinforcement).

It is also important to note that reinforcement and punishment are not simply opposites. The use of positive reinforcement in changing behaviour is almost always more effective than using punishment. This is because positive reinforcement makes the person or animal feel better, helping create a positive relationship with the person providing the reinforcement. Types of positive reinforcement that are effective in everyday life include verbal praise or approval, the awarding of status or prestige, and direct financial payment. Punishment, on the other hand, is more likely to create only temporary changes in behaviour because it is based on coercion and typically creates a negative and adversarial relationship with the person providing the reinforcement. When the person who provides the punishment leaves the situation, the unwanted behaviour is likely to return.

Creating Complex Behaviours through Operant Conditioning

Perhaps you remember watching a movie or being at a show in which an animal — maybe a dog, a horse, or a dolphin — did some pretty amazing things. The trainer gave a command and the dolphin swam to the bottom of the pool, picked up a ring on its nose, jumped out of the water through a hoop in the air, dived again to the bottom of the pool, picked up another ring, and then took both of the rings to the trainer at the edge of the pool. The animal was trained to do the trick, and the principles of operant conditioning were used to train it. But these complex behaviours are a far cry from the simple stimulus-response relationships that we have considered thus far. How can reinforcement be used to create complex behaviours such as these?

One way to expand the use of operant learning is to modify the schedule on which the reinforcement is applied. To this point we have only discussed a continuous reinforcement schedule, in which the desired response is reinforced every time it occurs; whenever the dog rolls over, for example, it gets a biscuit. Continuous reinforcement results in relatively fast learning but also rapid extinction of the desired behaviour once the reinforcer disappears. The problem is that because the organism is used to receiving the reinforcement after every behaviour, the responder may give up quickly when it doesn't appear.

Most real-world reinforcers are not continuous; they occur on a partial (or intermittent) reinforcement schedule, a schedule in which the responses are sometimes reinforced and sometimes not. In comparison to continuous reinforcement, partial reinforcement schedules lead to slower initial learning, but they also lead to greater resistance to extinction. Because the reinforcement does not appear after every behaviour, it takes longer for the learner to determine that the reward is no longer coming, and thus extinction is slower. The four types of partial reinforcement schedules are summarized in Table 8.2, "Reinforcement Schedules."

Table 8.2 Reinforcement Schedules.
Reinforcement schedule | Explanation | Real-world example
Fixed-ratio | Behaviour is reinforced after a specific number of responses. | Factory workers who are paid according to the number of products they produce
Variable-ratio | Behaviour is reinforced after an average, but unpredictable, number of responses. | Payoffs from slot machines and other games of chance
Fixed-interval | Behaviour is reinforced for the first response after a specific amount of time has passed. | People who earn a monthly salary
Variable-interval | Behaviour is reinforced for the first response after an average, but unpredictable, amount of time has passed. | Person who checks email for messages

Partial reinforcement schedules are determined by whether the reinforcement is presented on the basis of the time that elapses between reinforcements (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule. In a fixed-interval schedule, reinforcement occurs for the first response made after a specific amount of time has passed. For example, on a one-minute fixed-interval schedule the animal receives a reinforcement every minute, assuming it engages in the behaviour at least once during the minute. As you can see in Figure 8.6, "Examples of Response Patterns by Animals Trained under Different Partial Reinforcement Schedules," animals under fixed-interval schedules tend to slow down their responding immediately after the reinforcement but then increase the behaviour again as the time of the next reinforcement gets closer. (Most students study for exams the same way.) In a variable-interval schedule, the reinforcers appear on an interval schedule, but the timing is varied around the average interval, making the actual appearance of the reinforcer unpredictable. An example might be checking your email: you are reinforced by receiving messages that come, on average, say, every 30 minutes, but the reinforcement occurs only at random times. Interval reinforcement schedules tend to produce slow and steady rates of responding.

""
Figure 8.6 Examples of Response Patterns past Animals Trained under Different Fractional Reinforcement Schedules. Schedules based on the number of responses (ratio types) induce greater response rate than do schedules based on elapsed time (interval types). Likewise, unpredictable schedules (variable types) produce stronger responses than do predictable schedules (fixed types).

In a fixed-ratio schedule, a behaviour is reinforced after a specific number of responses. For example, a rat's behaviour may be reinforced after it has pressed a key 20 times, or a salesperson may receive a bonus after he or she has sold 10 products. As you can see in Figure 8.6, "Examples of Response Patterns by Animals Trained under Different Partial Reinforcement Schedules," once the organism has learned to act in accordance with the fixed-ratio schedule, it will pause only briefly when reinforcement occurs before returning to a high level of responsiveness. A variable-ratio schedule provides reinforcers after a specific but average number of responses. Winning money from slot machines or on a lottery ticket is an example of reinforcement that occurs on a variable-ratio schedule. For example, a slot machine (see Figure 8.7, "Slot Machine") may be programmed to provide a win every 20 times the user pulls the handle, on average. Ratio schedules tend to produce high rates of responding because reinforcement increases as the number of responses increases.

""
Effigy viii.7 Slot Machine. Slot machines are examples of a variable-ratio reinforcement schedule.
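
To make the four partial schedules concrete, the short Python sketch below simulates an organism responding about once every two seconds and counts how many reinforcements each schedule delivers. It is only an illustration of the definitions in Table 8.2, not material from the original chapter; the response rate, the value of 20, and the function names are assumptions chosen for the example.

```python
import random

def simulate(schedule, ticks=600, p_respond=0.5, seed=0):
    """Run `ticks` one-second steps; the organism responds with probability
    p_respond each second.  `schedule(seconds, responses)` receives the time and
    response count since the last reinforcement and returns True when the
    current response earns a reinforcement."""
    rng = random.Random(seed)
    seconds = responses = rewards = 0
    for _ in range(ticks):
        seconds += 1
        if rng.random() < p_respond:          # the organism responds this second
            responses += 1
            if schedule(seconds, responses):  # reinforced? deliver food, reset counters
                rewards += 1
                seconds = responses = 0
    return rewards

# The four partial schedules of Table 8.2, each set to 20 responses or 20 seconds:
schedules = {
    "fixed-ratio":       lambda sec, resp: resp >= 20,                    # every 20th response
    "variable-ratio":    lambda sec, resp: random.random() < 1 / 20,      # on average every 20th response
    "fixed-interval":    lambda sec, resp: sec >= 20,                     # first response after 20 s
    "variable-interval": lambda sec, resp: sec >= random.uniform(1, 39),  # first response after roughly 20 s
}

for name, schedule in schedules.items():
    print(f"{name}: {simulate(schedule)} reinforcements in 10 minutes")
```

In this toy model, raising p_respond earns noticeably more reinforcements under the ratio schedules but changes little under the interval schedules, which mirrors why ratio schedules produce the higher response rates shown in Figure 8.6.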

Complex behaviours are also created through shaping, the process of guiding an organism's behaviour to the desired outcome through the use of successive approximation to a final desired behaviour. Skinner made extensive use of this procedure in his boxes. For instance, he could train a rat to press a bar two times to receive food, by first providing food when the animal moved near the bar. When that behaviour had been learned, Skinner would begin to provide food only when the rat touched the bar. Further shaping limited the reinforcement to only when the rat pressed the bar, to when it pressed the bar and touched it a second time, and finally to only when it pressed the bar twice. Although it can take a long time, in this way operant conditioning can create chains of behaviours that are reinforced only when they are completed.
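
Shaping is essentially a loop that raises the criterion for reinforcement each time the current approximation has been reinforced often enough. The toy sketch below (not from the chapter; the stages, the 40% occurrence rate, and the five-occurrence rule are invented for illustration) walks a simulated rat through the bar-pressing sequence described above.

```python
import random

# Successive approximations toward the final behaviour (pressing the bar twice).
stages = ["moves near the bar", "touches the bar", "presses the bar once", "presses the bar twice"]

def shape(successes_to_advance=5, max_trials=200, seed=1):
    """Reinforce the current approximation until it has occurred
    `successes_to_advance` times, then move on to the next, stricter criterion."""
    rng = random.Random(seed)
    stage, successes = 0, 0
    for trial in range(1, max_trials + 1):
        # Assume the currently reinforced approximation occurs on 40% of trials (a made-up rate).
        if rng.random() < 0.4:
            successes += 1                   # reinforce: deliver food
            if successes == successes_to_advance:
                print(f"trial {trial}: '{stages[stage]}' established, raising the criterion")
                stage, successes = stage + 1, 0
                if stage == len(stages):
                    print("shaping complete: only the full double bar-press is now reinforced")
                    return trial
        # behaviour that falls short of the current criterion is simply not reinforced
    return None

shape()
```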

Reinforcing animals if they correctly discriminate between similar stimuli allows scientists to test the animals' ability to learn, and the discriminations that they can make are sometimes remarkable. Pigeons have been trained to distinguish between images of Charlie Brown and the other Peanuts characters (Cerella, 1980), and between different styles of music and art (Porter & Neuringer, 1984; Watanabe, Sakamoto & Wakita, 1995).

Behaviours can also be trained through the use of secondary reinforcers. Whereas a primary reinforcer includes stimuli that are naturally preferred or enjoyed by the organism, such as food, water, and relief from pain, a secondary reinforcer (sometimes called a conditioned reinforcer) is a neutral event that has become associated with a primary reinforcer through classical conditioning. An example of a secondary reinforcer would be the whistle given by an animal trainer, which has been associated over time with the primary reinforcer, food. An example of an everyday secondary reinforcer is money. We enjoy having money, not so much for the stimulus itself, but rather for the primary reinforcers (the things that money can purchase) with which it is associated.

Key Takeaways

  • Edward Thorndike developed the law of effect: the principle that responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation.
  • B. F. Skinner expanded on Thorndike's ideas to develop a set of principles to explain operant conditioning.
  • Positive reinforcement strengthens a response by presenting something that is typically pleasant after the response, whereas negative reinforcement strengthens a response by reducing or removing something that is typically unpleasant.
  • Positive punishment weakens a response by presenting something typically unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something that is typically pleasant.
  • Reinforcement may be either partial or continuous. Partial reinforcement schedules are determined by whether the reinforcement is presented on the basis of the time that elapses between reinforcements (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule.
  • Complex behaviours may be created through shaping, the process of guiding an organism's behaviour to the desired outcome through the use of successive approximation to a final desired behaviour.

Exercises and Critical Thinking

  1. Give an example from daily life of each of the following: positive reinforcement, negative reinforcement, positive punishment, negative punishment.
  2. Consider the reinforcement techniques that you might use to train a dog to catch and retrieve a Frisbee that you throw to it.
  3. Watch the following two videos from current television shows. Can you determine which learning procedures are being demonstrated?
    1. The Office: http://www.break.com/usercontent/2009/11/the-office-altoid-experiment-1499823
    2. The Big Bang Theory [YouTube]: http://www.youtube.com/watch?v=JA96Fba-WHk

References

Cerella, J. (1980). The pigeon's analysis of pictures. Pattern Recognition, 12, 1–6.

Kassin, S. (2003). Essentials of psychology. Upper Saddle River, NJ: Prentice Hall. Retrieved from Essentials of Psychology Prentice Hall Companion Website: http://wps.prenhall.com/hss_kassin_essentials_1/15/3933/1006917.cw/index.html

Porter, D., & Neuringer, A. (1984). Music discriminations by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 10(2), 138–148.

Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Washington, DC: American Psychological Association.

Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York, NY: Macmillan. Retrieved from http://www.archive.org/details/animalintelligen00thor

Watanabe, S., Sakamoto, J., & Wakita, M. (1995). Pigeons' discrimination of paintings by Monet and Picasso. Journal of the Experimental Analysis of Behavior, 63(2), 165–174.

Image Attributions

Figure 8.5: "Skinner box" (http://en.wikipedia.org/wiki/File:Skinner_box_photo_02.jpg) is licensed under the CC BY SA 3.0 license (http://creativecommons.org/licenses/by-sa/3.0/deed.en). "Skinner box scheme" by Andreas1 (http://en.wikipedia.org/wiki/File:Skinner_box_scheme_01.png) is licensed under the CC BY SA 3.0 license (http://creativecommons.org/licenses/by-sa/3.0/deed.en)

Figure 8.6: Adapted from Kassin (2003).

Figure 8.7: "Slot Machines in the Hard Rock Casino" by Ted Murphy (http://commons.wikimedia.org/wiki/File:HardRockCasinoSlotMachines.jpg) is licensed under CC BY 2.0 (http://creativecommons.org/licenses/by/2.0/deed.en).

Source: https://opentextbc.ca/introductiontopsychology/chapter/7-2-changing-behavior-through-reinforcement-and-punishment-operant-conditioning/
