عنوان

Reinforcement learning :

پدید آورنده

Richard S. Sutton and Andrew G. Barto.

موضوع

Reinforcement learning.,Machine Learning.,Reinforcement, Psychology.,54.72 artificial intelligence.,Reinforcement learning.

رده

Q325
.
6
.
R45

2018

کتابخانه

مرکز و کتابخانه مطالعات اسلامی به زبان‌های اروپایی

محل استقرار

استان: قم ـ شهر: قم

تماس با کتابخانه : 32910706-025

شابک

0262039249

شابک

9780262039246

عنوان و نام پديدآور

عنوان اصلي

Reinforcement learning :

نام عام مواد

[Book]

ساير اطلاعات عنواني

an introduction /

نام نخستين پديدآور

Richard S. Sutton and Andrew G. Barto.

وضعیت ویراست

وضعيت ويراست

Second edition.

وضعیت نشر و پخش و غیره

محل نشرو پخش و غیره

London, England :

نام ناشر، پخش کننده و غيره

The MIT Press,

تاریخ نشرو بخش و غیره

[2018]

تاریخ نشرو بخش و غیره

مشخصات ظاهری

نام خاص و کميت اثر

xxii, 526 pages :

ساير جزييات

illustrations (some color) ;

ابعاد

24 cm

فروست

عنوان فروست

Adaptive computation and machine learning

یادداشتهای مربوط به کتابنامه ، واژه نامه و نمایه های داخل اثر

متن يادداشت

Includes bibliographical references and index.

یادداشتهای مربوط به مندرجات

متن يادداشت

Machine generated contents note: 1. Introduction -- 1.1. Reinforcement Learning -- 1.2. Examples -- 1.3. Elements of Reinforcement Learning -- 1.4. Limitations and Scope -- 1.5. An Extended Example: Tic-Tac-Toe -- 1.6. Summary -- 1.7. Early History of Reinforcement Learning -- 2. Multi-armed Bandits -- 2.1.A k-armed Bandit Problem -- 2.2. Action-value Methods -- 2.3. The 10-armed Testbed -- 2.4. Incremental Implementation -- 2.5. Tracking a Nonstationary Problem -- 2.6. Optimistic Initial Values -- 2.7. Upper-Confidence-Bound Action Selection -- 2.8. Gradient Bandit Algorithms -- 2.9. Associative Search (Contextual Bandits) -- 2.10. Summary -- 3. Finite Markov Decision Processes -- 3.1. The Agent-Environment Interface -- 3.2. Goals and Rewards -- 3.3. Returns and Episodes -- 3.4. Unified Notation for Episodic and Continuing Tasks -- 3.5. Policies and Value Functions -- 3.6. Optimal Policies and Optimal Value Functions -- 3.7. Optimality and Approximation -- 3.8. Summary -- 4. Dynamic Programming

متن يادداشت

Note continued: 11.3. The Deadly Triad -- 11.4. Linear Value-function Geometry -- 11.5. Gradient Descent in the Bellman Error -- 11.6. The Bellman Error is Not Learnable -- 11.7. Gradient-TD Methods -- 11.8. Emphatic-TD Methods -- 11.9. Reducing Variance -- 11.10. Summary -- 12. Eligibility Traces -- 12.1. The A-return -- 12.2. TD(A) -- 12.3.n-step Truncated A-return Methods -- 12.4. Redoing Updates: Online A-return Algorithm -- 12.5. True Online TD(A) -- 12.6.*Dutch Traces in Monte Carlo Learning -- 12.7. Sarsa(A) -- 12.8. Variable A and ry -- 12.9. Off-policy Traces with Control Variates -- 12.10. Watkins's Q(A) to Tree-Backup(A) -- 12.11. Stable Off-policy Methods with Traces -- 12.12. Implementation Issues -- 12.13. Conclusions -- 13. Policy Gradient Methods -- 13.1. Policy Approximation and its Advantages -- 13.2. The Policy Gradient Theorem -- 13.3. REINFORCE: Monte Carlo Policy Gradient -- 13.4. REINFORCE with Baseline -- 13.5. Actor-Critic Methods

متن يادداشت

Note continued: 13.6. Policy Gradient for Continuing Problems -- 13.7. Policy Parameterization for Continuous Actions -- 13.8. Summary -- 14. Psychology -- 14.1. Prediction and Control -- 14.2. Classical Conditioning -- 14.2.1. Blocking and Higher-order Conditioning -- 14.2.2. The Rescorla-Wagner Model -- 14.2.3. The TD Model -- 14.2.4. TD Model Simulations -- 14.3. Instrumental Conditioning -- 14.4. Delayed Reinforcement -- 14.5. Cognitive Maps -- 14.6. Habitual and Goal-directed Behavior -- 14.7. Summary -- 15. Neuroscience -- 15.1. Neuroscience Basics -- 15.2. Reward Signals, Reinforcement Signals, Values, and Prediction Errors -- 15.3. The Reward Prediction Error Hypothesis -- 15.4. Dopamine -- 15.5. Experimental Support for the Reward Prediction Error Hypothesis -- 15.6. TD Error/Dopamine Correspondence -- 15.7. Neural Actor-Critic -- 15.8. Actor and Critic Learning Rules -- 15.9. Hedonistic Neurons -- 15.10. Collective Reinforcement Learning -- 15.11. Model-based Methods in the Brain

متن يادداشت

Note continued: 15.12. Addiction -- 15.13. Summary -- 16. Applications and Case Studies -- 16.1. TD-Gammon -- 16.2. Samuel's Checkers Player -- 16.3. Watson's Daily-Double Wagering -- 16.4. Optimizing Memory Control -- 16.5. Human-level Video Game Play -- 16.6. Mastering the Game of Go -- 16.6.1. AlphaGo -- 16.6.2. AlphaGo Zero -- 16.7. Personalized Web Services -- 16.8. Thermal Soaring -- 17. Frontiers -- 17.1. General Value Functions and Auxiliary Tasks -- 17.2. Temporal Abstraction via Options -- 17.3. Observations and State -- 17.4. Designing Reward Signals -- 17.5. Remaining Issues -- 17.6. Experimental Support for the Reward Prediction Error Hypothesis.

متن يادداشت

Note continued: 4.1. Policy Evaluation (Prediction) -- 4.2. Policy Improvement -- 4.3. Policy Iteration -- 4.4. Value Iteration -- 4.5. Asynchronous Dynamic Programming -- 4.6. Generalized Policy Iteration -- 4.7. Efficiency of Dynamic Programming -- 4.8. Summary -- 5. Monte Carlo Methods -- 5.1. Monte Carlo Prediction -- 5.2. Monte Carlo Estimation of Action Values -- 5.3. Monte Carlo Control -- 5.4. Monte Carlo Control without Exploring Starts -- 5.5. Off-policy Prediction via Importance Sampling -- 5.6. Incremental Implementation -- 5.7. Off-policy Monte Carlo Control -- 5.8.*Discounting-aware Importance Sampling -- 5.9.*Per-decision Importance Sampling -- 5.10. Summary -- 6. Temporal-Difference Learning -- 6.1. TD Prediction -- 6.2. Advantages of TD Prediction Methods -- 6.3. Optimality of TD(0) -- 6.4. Sarsa: On-policy TD Control -- 6.5.Q-learning: Off-policy TD Control -- 6.6. Expected Sarsa -- 6.7. Maximization Bias and Double Learning

متن يادداشت

Note continued: 6.8. Games, Afterstates, and Other Special Cases -- 6.9. Summary -- 7.n-step Bootstrapping -- 7.1.n-step TD Prediction -- 7.2.n-step Sarsa -- 7.3.n-step Off-policy Learning -- 7.4.*Per-decision Methods with Control Variates -- 7.5. Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm -- 7.6.*A Unifying Algorithm: n-step Q(u) -- 7.7. Summary -- 8. Planning and Learning with Tabular Methods -- 8.1. Models and Planning -- 8.2. Dyna: Integrated Planning, Acting, and Learning -- 8.3. When the Model Is Wrong -- 8.4. Prioritized Sweeping -- 8.5. Expected vs. Sample Updates -- 8.6. Trajectory Sampling -- 8.7. Real-time Dynamic Programming -- 8.8. Planning at Decision Time -- 8.9. Heuristic Search -- 8.10. Rollout Algorithms -- 8.11. Monte Carlo Tree Search -- 8.12. Summary of the Chapter -- 8.13. Summary of Part I: Dimensions -- 9. On-policy Prediction with Approximation -- 9.1. Value-function Approximation -- 9.2. The Prediction Objective (VE)

متن يادداشت

Note continued: 9.3. Stochastic-gradient and Semi-gradient Methods -- 9.4. Linear Methods -- 9.5. Feature Construction for Linear Methods -- 9.5.1. Polynomials -- 9.5.2. Fourier Basis -- 9.5.3. Coarse Coding -- 9.5.4. Tile Coding -- 9.5.5. Radial Basis Functions -- 9.6. Selecting Step-Size Parameters Manually -- 9.7. Nonlinear Function Approximation: Artificial Neural Networks -- 9.8. Least-Squares TD -- 9.9. Memory-based Function Approximation -- 9.10. Kernel-based Function Approximation -- 9.11. Looking Deeper at On-policy Learning: Interest and Emphasis -- 9.12. Summary -- 10. On-policy Control with Approximation -- 10.1. Episodic Semi-gradient Control -- 10.2. Semi-gradient n-step Sarsa -- 10.3. Average Reward: A New Problem Setting for Continuing Tasks -- 10.4. Deprecating the Discounted Setting -- 10.5. Differential Semi-gradient n-step Sarsa -- 10.6. Summary -- 11.*Off-policy Methods with Approximation -- 11.1. Semi-gradient Methods -- 11.2. Examples of Off-policy Divergence

بدون عنوان

یادداشتهای مربوط به خلاصه یا چکیده

متن يادداشت

"Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms."--

موضوع (اسم عام یاعبارت اسمی عام)

موضوع مستند نشده

Reinforcement learning.

موضوع مستند نشده

Machine Learning.

موضوع مستند نشده

Reinforcement, Psychology.

موضوع مستند نشده

54.72 artificial intelligence.

موضوع مستند نشده

Reinforcement learning.

رده بندی ديویی

شماره

006

3/1

ويراست

رده بندی کنگره

شماره رده

Q325

شماره رده

Q325

نشانه اثر

R45

2018

نشانه اثر

S88

2018

نام شخص به منزله سر شناسه - (مسئولیت معنوی درجه اول )

مستند نام اشخاص تاييد نشده

Sutton, Richard S.

نام شخص - (مسئولیت معنوی برابر )

مستند نام اشخاص تاييد نشده

Barto, Andrew G.

مبدا اصلی

تاريخ عمليات

20200822104751.0

قواعد فهرست نويسي ( بخش توصيفي )

rda

دسترسی و محل الکترونیکی

نام الکترونيکي

اطلاعات رکورد کتابشناسی

نوع ماده

[Book]

اطلاعات دسترسی رکورد

تكميل شده

عنوان Reinforcement learning :

پدید آورنده Richard S. Sutton and Andrew G. Barto.

موضوع Reinforcement learning.,Machine Learning.,Reinforcement, Psychology.,54.72 artificial intelligence.,Reinforcement learning.

رده Q325.6 .R45 2018

کتابخانه مرکز و کتابخانه مطالعات اسلامی به زبان‌های اروپایی

محل استقرار استان: قم ـ شهر: قم

شابک

عنوان و نام پديدآور

وضعیت ویراست

وضعیت نشر و پخش و غیره

مشخصات ظاهری

فروست

یادداشتهای مربوط به کتابنامه ، واژه نامه و نمایه های داخل اثر

یادداشتهای مربوط به مندرجات

یادداشتهای مربوط به خلاصه یا چکیده

موضوع (اسم عام یاعبارت اسمی عام)

رده بندی ديویی

رده بندی کنگره

نام شخص به منزله سر شناسه - (مسئولیت معنوی درجه اول )

نام شخص - (مسئولیت معنوی برابر )

مبدا اصلی

دسترسی و محل الکترونیکی

اطلاعات رکورد کتابشناسی

اطلاعات دسترسی رکورد

عنوان

Reinforcement learning :

پدید آورنده

Richard S. Sutton and Andrew G. Barto.

موضوع

Reinforcement learning.,Machine Learning.,Reinforcement, Psychology.,54.72 artificial intelligence.,Reinforcement learning.

رده

Q325
.
6
.
R45

2018

کتابخانه

مرکز و کتابخانه مطالعات اسلامی به زبان‌های اروپایی

محل استقرار

استان: قم ـ شهر: قم