Title
Compression-based parts-of-speech tagger for the Arabic language

Author
Alkhazi, Ibrahim

Subject
Language modelling ; natural language processing

Classification

Library
Center and Library of Islamic Studies in European Languages

Location
Province: Qom, City: Qom

Library contact: 32910706-025

NATIONAL BIBLIOGRAPHY NUMBER

Number
TLets801316

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper
Compression-based parts-of-speech tagger for the Arabic language
General Material Designation
[Thesis]
First Statement of Responsibility
Alkhazi, Ibrahim
Subsequent Statement of Responsibility
Teahan, William

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.
Bangor University
Date of Publication, Distribution, etc.
2019

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree
Thesis (Ph.D.)
Text preceding or following the note
2019

SUMMARY OR ABSTRACT

Text of Note
Arabic is a morphologically complex language, which creates difficulties for NLP tasks such as POS tagging. The motivation for this research is to investigate the development and training of a compression-based Arabic POS tagger using the PPM algorithm. Adopting the algorithm for Arabic POS tagging may increase efficiency and reduce the ambiguity problem of the Arabic language. The best text compression algorithms can often be applied to NLP tasks with state-of-the-art results. This research examines the use of tag-based compression of larger Arabic resources to re-evaluate the performance of tag-based compression, which may reveal POS linguistic aspects of the Arabic language. We also found that tag-based compression of Arabic text can be used as a means of evaluating the performance and quality of Arabic POS taggers. The experimental results show that tag-based compression can effectively be used to assess the performance of Arabic POS taggers on different types of Arabic text, and also to compare the performance of two Arabic POS taggers on the same text.
Despite the rapid growth of Arabic text on the Web, studies that address the classification and segmentation of the Arabic language are limited compared to other languages, and most of them implement word-based and feature extraction algorithms. This research adopts a PPM character-based compression scheme to classify and segment Classical Arabic (CA) and Modern Standard Arabic (MSA) texts. An initial experiment using the PPM classification method on samples of text, based on the concept of minimum cross-entropy, achieved an accuracy of 95.5%, an average precision of 0.958, an average recall of 0.955 and an average F-measure of 0.954. Segmenting the CA and MSA text using the PPM compression algorithm achieved an accuracy of 86%, an average precision of 0.869, an average recall of 0.86 and an average F-measure of 0.859.
This research also describes the creation of the new Bangor Arabic Annotated Corpus (BAAC), an MSA corpus of 50K words manually annotated with parts-of-speech tags. To evaluate the quality of the corpus, the Kappa coefficient and the direct percent agreement for each tag were calculated; a Kappa value of 0.956 was obtained, with an average observed agreement of 94.25%. The corpus was used to evaluate the widely used Madamira Arabic POS tagger and to further investigate compression models for text compressed using POS tags. A new annotation tool was also developed and used for annotating the BAAC.

TOPICAL NAME USED AS SUBJECT

Language modelling ; natural language processing

PERSONAL NAME - PRIMARY RESPONSIBILITY

Alkhazi, Ibrahim

PERSONAL NAME - SECONDARY RESPONSIBILITY

Teahan, William

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

Bangor University

ELECTRONIC LOCATION AND ACCESS

Electronic name
Read the full text
