Title

Using Deep Reinforcement Learning for Fast Training of an End-to-End Visuomotor Control Policy

Author

Seyed Reza Afzali (سید رضا افضلی)

Subject

Classification

Library

University of Tabriz Library, Documentation and Publication Center

Location

Province: East Azarbaijan, City: Tabriz

Library contact: 04133294120-04133294118

NATIONAL BIBLIOGRAPHY NUMBER

Number

پ۲۶۴۳۴

LANGUAGE OF THE ITEM

Language of Text, Soundtrack, etc.

per

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper

Using Deep Reinforcement Learning for Fast Training of an End-to-End Visuomotor Control Policy

First Statement of Responsibility

Seyed Reza Afzali (سید رضا افضلی)

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.

Mechanical Engineering

Date of Publication, Distribution, etc.

1400 (Solar Hijri)

PHYSICAL DESCRIPTION

Specific Material Designation and Extent of Item

60 p.

Accompanying Material

CD

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree

Master's

Discipline of degree

Mechanical Engineering - Mechatronics

Date of degree

1400/11/23 (Solar Hijri)

SUMMARY OR ABSTRACT

Text of Note

Using the Deep Deterministic Policy Gradient (DDPG) algorithm without a predetermined policy function has shown remarkable results in providing continuous control for robot arms. The importance of this model is that it can automatically learn any previously unknown action in the real environment. However, because these algorithms treat the control space as continuous, optimizing their performance is a design challenge, both in reducing the time the agent spends on trials to find the target and in keeping the model's convergence stable. Implementing these algorithms without a policy function in continuous environments, to train a robot arm that has seven joints with six degrees of freedom each, becomes a time-consuming process that makes the algorithms inefficient in practice. To make the DDPG reinforcement learning model efficient, this research presents a method that significantly reduces the number of trials the DDPG neural network needs to learn the target point. The implementation changes the structure of the DDPG network implementation and uses the buffer memory together with the previous optimal values of the Actor and Critic networks in the DDPG model. The results show a significant improvement in training time and in model stability for the experiments performed.

Text of Note

Abstract: Applying the Deep Deterministic Policy Gradient (DDPG) algorithm without a predetermined policy function has produced significant results in continuous control of robot arm manipulators. What distinguishes this approach in manipulator task learning is its ability to learn any previously unknown action automatically in a real, unstructured environment. Such algorithms require less manual data engineering to find the control function under unforeseen conditions. However, because the algorithm searches for the control function in a continuous state-action space, two things are challenging: reducing the training time and the amount of trial and error by optimizing the sample resolution of the state-action space, and guaranteeing stable convergence. Implementing this algorithm without the policy function, with continuous control over the state-action space, to train the control model of a robot arm manipulator with seven joints, each with six degrees of freedom, becomes so time-consuming that the algorithm is inefficient and impractical. To reduce the trial-and-error process and improve the performance of the DDPG reinforcement learning model, this research proposes a method that significantly reduces the number of trials needed to train the control function for the arm manipulator. The proposed solution is based on spiking the DDPG model during training while keeping the advantage of the buffer memory and applying the knowledge gained during the agent's previous interactions with the state-action space. The results show a noticeable improvement in training time as well as model stability in the experiments performed.

OTHER VARIANT TITLES

Variant Title

Intelligence End-to-End Visumotor Fast Policy Training by application of Deep Reinforcement Learning

PERSONAL NAME - PRIMARY RESPONSIBILITY

Entry Element

Afzali,

Part of Name Other than Entry Element

Seyed Reza

Relator Code

Author

PERSONAL NAME - SECONDARY RESPONSIBILITY

Entry Element

Shoaran,

Part of Name Other than Entry Element

Maryam

Dates

Supervisor

Entry Element

Karimian Khosroshahi,

Part of Name Other than Entry Element

Ghader

Dates

Advisor

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

Entry Element

Tabriz
