Title

A hybrid approach for clustering text documents using dimensionality reduction algorithms

Author

منیژه رئیسی دهکردی

Subject

Text documents clustering, Weighting features, Euclidean and non-Euclidean similarity measures

Classification

Library

Central Library and Documentation Center, University of Kurdistan

Location

Province: Kurdistan, City: Sanandaj

Library contact: 9-08733624006 and 08733664600

NATIONAL BIBLIOGRAPHY NUMBER

Number

۲۵۰۷پ

LANGUAGE OF THE ITEM

Language of Text, Soundtrack, etc.

Persian

Language of Original Work

Persian

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper

A hybrid approach for clustering text documents using dimensionality reduction algorithms

General Material Designation

[Thesis]

First Statement of Responsibility

/ منیژه رئیسی دهکردی

PUBLICATION, DISTRIBUTION, ETC.

Place of Publication, Distribution, etc.

Sanandaj

Name of Publisher, Distributor, etc.

: University of Kurdistan, Faculty of Engineering

Date of Publication, Distribution, etc.

, 1395 (Solar Hijri)

PHYSICAL DESCRIPTION

Specific Material Designation and Extent of Item

ز, 130 p.

Other Physical Details

: illustrations, tables

Accompanying Material

+ compact disc

GENERAL NOTES

Text of Note

Abstract in Persian and English

INTERNAL BIBLIOGRAPHIES/INDEXES NOTE

Text of Note

Bibliography: p. 105-108

CONTENTS NOTE

Text of Note

Appendix: glossary

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree

Master's degree

Discipline of degree

Artificial Intelligence and Robotics

Body granting the degree

Kurdistan

Text preceding or following the note

20

SUMMARY OR ABSTRACT

Text of Note

With the ever-growing volume of text documents, selecting the desired information in limited time is difficult. Tools such as clustering can help manage this mass of information. Clustering is a process in which a set of data samples is divided into distinct groups (clusters) such that the samples in one cluster are as similar to one another as possible and differ from the samples in other clusters. Clustering is used in many fields, including pattern recognition, machine learning, data mining, and information retrieval. The clustering problem poses several challenges; high dimensionality and the differing importance of features are among the most significant. In this thesis, four new clustering methods for text documents are presented, in which a dispersion-based (term variance) dimensionality reduction method is used to select an effective subset of features. In the first proposed method, a new objective function based on fuzzy clustering together with the entropy of feature weights is presented. Weighting in this method is global. Its advantages include updating the feature weights during the clustering process and coping with noise. Because in real-world problems the weight of each feature differs across clusters, the second and third proposed methods weight features locally. The second and third methods differ in their similarity measure. The third proposed method uses a non-Euclidean similarity measure, which yields more accurate clustering when noise is excessive. The fourth proposed method combines the bee colony algorithm with global weighting. It thus benefits from the advantages of swarm intelligence algorithms, while feature weighting also improves clustering accuracy. The performance of the proposed methods was evaluated on numerical and text data sets. In this evaluation, the proposed methods were compared with 9 well-known clustering methods on the basis of various evaluation criteria. The experimental results show the efficiency of the proposed methods and their improvement over previous clustering methods.

Text of Note

Large amounts of data are now stored in organizational databases, and with the advent of large memory systems and computer networks the volume of stored data is growing very quickly. These data contain useful but hidden information that can be extracted for various purposes. Data mining is an effective and powerful technique for extracting information and knowledge from very large amounts of data. Clustering is a major data mining task: it finds groups in a set of observations such that observations in the same group are similar and observations in different groups are distinct, according to some criterion of distance or likeness. Clustering algorithms are used in many fields, including document clustering, information retrieval, pattern recognition, machine learning, and data mining, both as the primary task for understanding the nature and structure of data and in pre-processing or post-processing phases for higher-level tasks. The clustering problem poses several challenges, such as high dimensionality and the differing importance of features. In this thesis, four new clustering methods are proposed for text documents. These methods use term variance (TV) as a feature selection method to select a relevant subset of features. The first proposed method introduces a new objective function based on fuzzy clustering with the entropy of feature weights. In this method, weighting is global. Its advantages include updating the feature weights during the clustering process and handling noise. Because in real-world problems the weight of each feature varies across clusters, the second and third proposed methods weight features locally. The second and third methods differ in their similarity measure: the third uses a non-Euclidean similarity measure, which yields more accurate clustering when noise is excessive. The fourth proposed method integrates the artificial bee colony algorithm with feature weighting. It thus benefits from the advantages of swarm intelligence algorithms, while feature weighting also improves clustering accuracy. The performance of the proposed methods is evaluated on numerical and text data sets, where they are compared with nine well-known clustering methods on the basis of several evaluation criteria. The experimental results show the efficiency and effectiveness of the proposed methods as well as improvements over previous related methods.
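
Illustrative note: the abstract names term variance (TV) as the feature selection step but this record does not give the formula the thesis uses. The sketch below assumes one common definition, the population variance of a term's raw frequency across documents, and keeps the k highest-scoring terms. The function names, the whitespace tokenizer, and the sample documents are hypothetical and chosen only for illustration.

```python
from collections import Counter


def term_variance(docs):
    """Map each term to the variance of its frequency across documents.

    Assumption: raw term counts and population variance (divide by N).
    The thesis may use a different weighting or normalization.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    scores = {}
    for term in vocab:
        freqs = [c[term] for c in counts]  # a Counter returns 0 for absent terms
        mean = sum(freqs) / n
        scores[term] = sum((f - mean) ** 2 for f in freqs) / n
    return scores


def select_top_terms(docs, k):
    """Return the k terms with the highest variance (ties broken alphabetically)."""
    scores = term_variance(docs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical toy corpus
docs = [
    "clustering text documents",
    "clustering clustering clustering features",
    "text features noise",
]
top_terms = select_top_terms(docs, 2)
```

Terms whose frequency barely changes from one document to the next score low and are dropped, which is how a filter of this kind reduces dimensionality before clustering.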

UNCONTROLLED SUBJECT TERMS

Subject Term

Text documents clustering

Subject Term

Weighting features

Subject Term

Euclidean and non-Euclidean similarity measures

PERSONAL NAME - PRIMARY RESPONSIBILITY

Entry Element

رئیسی دهکردی،

Part of Name Other than Entry Element

منیژه

Relator Code

Author

PERSONAL NAME - SECONDARY RESPONSIBILITY

Entry Element

مرادی،

Part of Name Other than Entry Element

پرهام

Relator Code

Supervisor

Entry Element

عبدالله پوری،

Part of Name Other than Entry Element

علیرضا

Relator Code

Advisor

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

Entry Element

University of Kurdistan

Subdivision

. Faculty of Engineering

ORIGINATING SOURCE

Country

Iran

Agency

Central Library, University of Kurdistan

LOCATION AND CALL NUMBER

Call Number

EAI۲۶۷۲ ۱۳۹۵ Central Library

92029
