Title

یادگیری با کارآموزی از طریق یادگیری تقویتی معکوس و کاربرد آن در بازی‌های رایانه‌ای, Apprenticeship Learning via Inversed Reinforcement Learning and its Application to Computer Games

Creator

Elaheh Adibi

Subject

Classification

Library

University of Tabriz Library, Documentation and Publication Center

Location

Province: East Azarbaijan, City: Tabriz

Library contact: 04133294120-04133294118

NATIONAL BIBLIOGRAPHY NUMBER

Number

۲۳۳۰۹پ

LANGUAGE OF THE ITEM

Language of Text, Soundtrack, etc.

per

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper

یادگیری با کارآموزی از طریق یادگیری تقویتی معکوس و کاربرد آن در بازی‌های رایانه‌ای

Parallel Title Proper

Apprenticeship Learning via Inversed Reinforcement Learning and its Application to Computer Games

First Statement of Responsibility

Elaheh Adibi

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.

Pardis

Date of Publication, Distribution, etc.

1394 (Solar Hijri calendar)

Name of Manufacturer

Rashedi

PHYSICAL DESCRIPTION

Specific Material Designation and Extent of Item

97 p.

NOTES PERTAINING TO PUBLICATION, DISTRIBUTION, ETC.

Text of Note

Print - Electronic

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree

Master's degree

Discipline of degree

Computer Engineering, Software specialization

Date of degree

1394/11/26 (Solar Hijri calendar)

Body granting the degree

Tabriz

SUMMARY OR ABSTRACT

Text of Note

Reinforcement learning is one of the most important learning methods and is applicable to many learning problems. It has so far been used successfully to solve a variety of problems; examples include playing chess and backgammon, designing unmanned helicopters, and traffic control. In such problems there is usually an agent that interacts with its environment: it receives information from the environment and, based on that information, performs an action that can affect the state of the environment. By performing a sequence of actions, the agent tries to steer the environment toward desirable states so that it achieves its goal or goals. In these problems the agent's goal is usually specified by defining a reward function. The reward function is one of the main pillars of reinforcement learning systems, and without it learning is usually impossible. For many problems, such as driving on a highway, defining a suitable reward function by hand is very difficult and time-consuming, and in some cases impossible. For this reason, reinforcement learning methods cannot be used in these problems. In this thesis we seek a method by which the agent can automatically learn the reward function it needs in the environment. To this end, the agent tries to imitate the behavior of an expert. Such a method enables us to use reinforcement learning for problems that we previously could not solve for lack of a suitable reward function. We also intend to use the Pacman game environment to evaluate the results. Although this game may seem very simple compared with games such as chess and backgammon, its environment involves challenges that have made computers perform very poorly at it. For a comprehensive evaluation, the proposed algorithm is compared with several other algorithms, such as minimax, alpha-beta pruning, and reinforcement learning, in terms of the score obtained and the time taken per agent action.

Text of Note

Reinforcement learning is one of the most important learning methods and is used to solve many learning problems. It has been applied successfully to a range of tasks, for example playing chess and backgammon, designing unmanned helicopters, and traffic control. In these problems there is usually an agent interacting with its environment: it receives information from the environment and, based on that information, takes actions that can influence the state of the environment. The agent thus performs a sequence of actions to steer the environment toward desirable states and achieve the goal defined in that environment. In such problems the goal is usually specified by a reward function. The reward function is one of the main pillars of reinforcement learning systems, and without it learning is usually impossible. For many problems, such as driving on a highway, defining a reward function manually is difficult and time-consuming, and in some cases impossible. In this dissertation, we look for methods that enable the agent to automatically learn an appropriate reward function for the underlying task. Here, the agent tries to imitate an expert's behavior in order to learn the required reward function. We also assume that the reward function can be represented as a linear combination of a set of features. The proposed method enables us to use reinforcement learning for problems that could not previously be solved for lack of a reward function. Moreover, we use the Pacman game environment to evaluate the results. Although this game seems simpler than games such as chess or backgammon, it poses challenges that have made computers perform poorly at it.
For a comprehensive evaluation, the proposed learning algorithms, including Perceptron and MIRA, are compared with traditional algorithms such as minimax and alpha-beta pruning and with more recent techniques such as reinforcement learning. The results support our claim that apprenticeship learning is a promising approach to Pacman and similar games.
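The abstract states only that the reward is assumed to be a linear combination of features and that a Perceptron is among the learners; it gives no further detail. As a rough illustration of that idea, and not the thesis's actual implementation, the sketch below learns feature weights from expert demonstrations with a perceptron-style update. The feature names (`food`, `ghost`), the action names, and the demonstration data are all hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
# One demonstration: candidate actions with their feature vectors,
# plus the action the expert actually chose.
Demo = Tuple[Dict[str, Features], str]


def score(weights: Features, feats: Features) -> float:
    """Linear score: weighted sum of feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def best_action(weights: Features, candidates: Dict[str, Features]) -> str:
    """Highest-scoring action; ties go to the alphabetically first action."""
    return max(sorted(candidates), key=lambda a: score(weights, candidates[a]))


def perceptron_fit(demos: List[Demo], epochs: int = 10) -> Features:
    """Adjust weights until the learner reproduces the expert's choices."""
    weights: Features = {}
    for _ in range(epochs):
        mistakes = 0
        for candidates, expert_action in demos:
            guess = best_action(weights, candidates)
            if guess != expert_action:
                mistakes += 1
                # Move toward the expert's features, away from the wrong guess.
                for name, value in candidates[expert_action].items():
                    weights[name] = weights.get(name, 0.0) + value
                for name, value in candidates[guess].items():
                    weights[name] = weights.get(name, 0.0) - value
        if mistakes == 0:
            break
    return weights


# Hypothetical demonstrations: the expert seeks food and avoids ghosts.
DEMOS: List[Demo] = [
    ({"North": {"food": 0.0, "ghost": 1.0},
      "South": {"food": 1.0, "ghost": 0.0}}, "South"),
    ({"East": {"food": 1.0, "ghost": 1.0},
      "West": {"food": 0.0, "ghost": 0.0}}, "West"),
    ({"East": {"food": 0.0, "ghost": 0.0},
      "West": {"food": 1.0, "ghost": 0.0}}, "West"),
]

if __name__ == "__main__":
    print(perceptron_fit(DEMOS))
```

On this toy data the learned weights end up positive for `food` and negative for `ghost`, so the induced linear score ranks the expert's choices first. MIRA, also named in the abstract, would differ in how the update step size is chosen; it is not shown here.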

PARALLEL TITLE PROPER

Parallel Title

Apprenticeship Learning via Inversed Reinforcement Learning and its Application to Computer Games

PERSONAL NAME - PRIMARY RESPONSIBILITY

ادیبی، الهه

Adibi, Elaheh

ELECTRONIC LOCATION AND ACCESS

Public note

Black and white
