Title

Compression-Based Parts-Of-Speech Tagger for the Arabic Language

Author

Alkhazi, Ibrahim Sulaiman B.

Subject

Computer science, Language

Classification

Library

Center and Library for Islamic Studies in European Languages

Location

Province: Qom, City: Qom

Library contact: 025-32910706

National bibliography number

Number

TL53080

Language of the work

Language of the text, written or spoken

English

Title and statement of responsibility

Title proper

Compression-Based Parts-Of-Speech Tagger for the Arabic Language

General material designation

[Thesis]

First statement of responsibility

Alkhazi, Ibrahim Sulaiman B.

Publication, distribution, etc.

Name of publisher, distributor, etc.

Bangor University (United Kingdom)

Date of publication, distribution, etc.

2019

General note

Text of note

160 p.

Dissertation notes

Dissertation details and type of degree

Ph.D.

Degree-granting body

Bangor University (United Kingdom)

Date of degree

2019

Summary or abstract notes

Text of note

Arabic is a morphologically complex language, which creates difficulties for NLP tasks such as part-of-speech (POS) tagging. This research investigates the development and training of a compression-based Arabic POS tagger using the Prediction by Partial Matching (PPM) algorithm. Adopting the algorithm for Arabic POS tagging may improve efficiency and reduce the ambiguity problem in Arabic.
The best text compression algorithms can often be applied to NLP tasks with state-of-the-art results. This research applies tag-based compression to larger Arabic resources in order to re-evaluate its performance, which may reveal linguistic aspects of Arabic parts of speech. It also finds that tag-based compression of Arabic text can be used to evaluate the performance and quality of Arabic POS taggers. The experimental results show that tag-based compression is effective both for assessing an Arabic POS tagger across different types of Arabic text and for comparing two Arabic POS taggers on the same text.
Despite the rapid growth of Arabic text on the Web, studies addressing the classification and segmentation of Arabic text are limited compared with those for other languages, and most of them rely on word-based and feature-extraction algorithms. This research adopts a character-based PPM compression scheme to classify and segment Classical Arabic (CA) and Modern Standard Arabic (MSA) texts. An initial experiment that classified text samples with PPM, using the concept of minimum cross-entropy, achieved an accuracy of 95.5%, an average precision of 0.958, an average recall of 0.955 and an average F-measure of 0.954. Segmenting CA and MSA text with the PPM compression algorithm achieved an accuracy of 86%, an average precision of 0.869, an average recall of 0.86 and an average F-measure of 0.859.
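The minimum cross-entropy idea mentioned above can be illustrated with a small sketch: train one character model per class, then assign a text to the class whose model encodes it in the fewest bits per character. This is not the thesis's implementation. It assumes a fixed-order character model with add-one smoothing as a simple stand-in for PPM (which blends several context orders using escape probabilities), and the names `CharModel`, `classify`, and the toy training strings are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharModel:
    """Order-k character model with add-one smoothing.

    Assumption: a simplified stand-in for PPM, used only to show how
    cross-entropy can drive classification. The smoothed scores are not a
    strictly normalised distribution over every possible character.
    """

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)  # context -> next-character counts
        self.alphabet = set()

    def train(self, text):
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            self.counts[context][padded[i]] += 1
            self.alphabet.add(padded[i])

    def cross_entropy(self, text):
        """Average number of bits per character needed to encode `text`."""
        padded = " " * self.order + text
        vocab = len(self.alphabet) + 1  # +1 reserves mass for unseen characters
        bits = 0.0
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            seen = self.counts.get(context, Counter())
            p = (seen[padded[i]] + 1) / (sum(seen.values()) + vocab)
            bits -= math.log2(p)
        return bits / max(len(text), 1)


def classify(text, models):
    """Return the label whose model gives the lowest cross-entropy for `text`."""
    return min(models, key=lambda label: models[label].cross_entropy(text))


# Toy placeholder data; in practice each model would be trained on a
# labelled corpus for its class (for example, CA and MSA training texts).
models = {"A": CharModel(), "B": CharModel()}
models["A"].train("abababababababab")
models["B"].train("cdcdcdcdcdcdcdcd")
print(classify("ababab", models))
```

The same comparison of per-model code lengths could in principle be applied over a sliding window to segment mixed text, although the record does not describe how the segmentation experiment was set up.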
This research also describes the creation of the new Bangor Arabic Annotated Corpus (BAAC), a Modern Standard Arabic (MSA) corpus of 50K words manually annotated with parts of speech. To evaluate the quality of the corpus, the Kappa coefficient and the direct percent agreement for each tag were calculated, giving a Kappa value of 0.956 and an average observed agreement of 94.25%. The corpus was used to evaluate the widely used Madamira Arabic POS tagger and to further investigate compression models for text compressed using POS tags. A new annotation tool was also developed and used to annotate the BAAC.
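As an illustration of the agreement measures mentioned above, the following sketch computes observed agreement and a Kappa coefficient for two annotators' tag sequences. It assumes Cohen's kappa for two annotators, since the record does not say which Kappa variant was used; the function names and the toy tag lists are hypothetical and are not taken from the BAAC.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of tokens on which the two annotators chose the same tag."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("tag sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(tags_a)
    p_o = observed_agreement(tags_a, tags_b)
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    # Chance agreement: probability both annotators pick the same tag
    # if each tagged independently according to their own tag frequencies.
    p_e = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical tag throughout
    return (p_o - p_e) / (1 - p_e)


# Toy example with made-up tags for four tokens.
annotator_1 = ["N", "V", "N", "P"]
annotator_2 = ["N", "V", "V", "P"]
print(observed_agreement(annotator_1, annotator_2))
print(cohen_kappa(annotator_1, annotator_2))
```

Per-tag agreement, as reported in the abstract, could be obtained by restricting the comparison to tokens that at least one annotator labelled with a given tag; the exact definition used in the thesis is not stated here.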

Uncontrolled subject terms

Subject term

Computer science

Subject term

Language

Personal name as main entry (primary intellectual responsibility)

Unverified personal name authority

Alkhazi, Ibrahim Sulaiman B.

Added entry (corporate body)

Unverified corporate body name authority

Bangor University (United Kingdom)

Electronic location and access

Electronic name

Publication status

Publication format

Bibliographic record information

Material type

[Thesis]

Worksheet code

276903

Record access information

Access level

Completed
