Effects of Response Frequency Constraints on Learning in a Non-Stationary Multi-armed Bandit Task
General material designation
[Article]
First statement of responsibility
Young, Michael; Racey, Deborah E.
Summary or abstract notes
Text of note
An n-armed bandit task was used to investigate the trade-off between exploratory (choosing lesser-known options) and exploitive (choosing options with the greatest known probability of reinforcement) human choice in a trial-and-error learning problem. A different probability of reinforcement was assigned to each of eight response options using random-ratios (RRs), and participants chose by clicking buttons in a circular display on a computer screen using a computer mouse. To differentially increase exploration, relative frequency thresholds were randomly assigned to each participant and acted as task constraints limiting the proportion of total responses that could be attributed to any response option. The potential benefit of increased exploration in non-stationary environments was investigated by changing payoff probabilities so that the leanest options became the richest or the richest options became the leanest. On the average, forcing participants to explore at moderate to high levels always resulted in their earning less reinforcement, even when the payoffs changed. This outcome may be due to humans' natural level of exploration in our task being sufficiently high to create sensitivity to environmental dynamics.
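The task structure described in the abstract can be sketched as a small simulation. This is an illustrative sketch only, not the authors' procedure: the payoff probabilities, the trial counts, the rule for blocking an option once its response share exceeds the threshold, and the epsilon-greedy chooser standing in for a human participant are all assumptions introduced here. The function name `simulate` and its parameters are hypothetical, and the threshold is assumed to be at least 1/8 so that some option is always available.

```python
import random


def simulate(n_trials=400, threshold=0.5, reverse_at=200, epsilon=0.1, seed=0):
    """Simulate one constrained session; returns (total reinforcers, per-option counts)."""
    rng = random.Random(seed)
    # Hypothetical payoff probabilities for the eight options (not taken from the study).
    probs = [0.05 * (i + 1) for i in range(8)]
    counts = [0] * 8   # responses made to each option
    wins = [0] * 8     # reinforcers earned from each option
    earned = 0

    for t in range(n_trials):
        if t == reverse_at:
            probs.reverse()  # leanest options become richest, and vice versa

        # Assumed constraint rule: an option is unavailable while its share
        # of all responses so far exceeds the relative frequency threshold.
        allowed = [a for a in range(8) if t == 0 or counts[a] / t <= threshold]

        if rng.random() < epsilon:
            choice = rng.choice(allowed)  # explore among available options
        else:
            # Exploit the best observed payoff rate; untried options are
            # treated optimistically so each gets sampled at least once.
            choice = max(
                allowed,
                key=lambda a: wins[a] / counts[a] if counts[a] else 1.0,
            )

        reward = 1 if rng.random() < probs[choice] else 0
        counts[choice] += 1
        wins[choice] += reward
        earned += reward

    return earned, counts


if __name__ == "__main__":
    # Compare a tight constraint (forced exploration) with a loose one.
    for thr in (0.25, 0.5, 1.0):
        total, per_option = simulate(threshold=thr, seed=1)
        print(thr, total, per_option)
```

Lower thresholds force responses to be spread across more options, which is one way to picture the forced exploration the abstract describes. Whether the simulated agent earns more or less under a given threshold depends entirely on the assumed parameters and says nothing about the human data.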
Set
Date of publication
2014
Title
International Journal of Comparative Psychology
Volume number
27/2
Personal name as main entry heading - (primary intellectual responsibility)