
Title
Fine-grained Arabic named entity recognition

Creator
Alotaibi, Fahd Saleh S.

Subject
PJ Semitic ; QA75 Electronic computers. Computer science ; QA76 Computer software

Classification

Library
Center and Library of Islamic Studies in European Languages

Location
Province: Qom, City: Qom

Library contact: 32910706-025

NATIONAL BIBLIOGRAPHY NUMBER

Number
TLets649342

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper
Fine-grained Arabic named entity recognition
General Material Designation
[Thesis]
First Statement of Responsibility
Alotaibi, Fahd Saleh S.

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.
University of Birmingham
Date of Publication, Distribution, etc.
2015

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree
Thesis (Ph.D.)
Text preceding or following the note
2015

SUMMARY OR ABSTRACT

Text of Note
This thesis addresses the problem of fine-grained NER for Arabic, which poses unique linguistic challenges to NER, such as the absence of capitalisation and short vowels, the complex morphology, and the highly inflectional process. Instead of classifying the detected NE phrases into a small set of classes, we target a broader range (i.e. 50 fine-grained classes organised in a two-level hierarchy) to increase the depth of the semantic knowledge extracted. Compared with traditional (coarse-grained) NER, this complicates the task, because the number of semantic classes increases and the semantic differences between fine-grained classes decrease. Our approach to developing fine-grained NER relies on two different supervised Machine Learning (ML) technologies (i.e. Maximum Entropy 'ME' and Conditional Random Fields 'CRF'), which require annotated training data in order to learn by extracting informative features. We develop a methodology which exploits the richness of Arabic Wikipedia (AW) in order to create a scalable fine-grained lexical resource and a corpus automatically. Moreover, two gold-standard corpora from different genres were also created to enable comparable evaluation. The thesis also develops a new approach to feature representation that relies on the dependency structure of the sentence to overcome the limitation of the traditional window-based (i.e. n-gram) representation. Furthermore, the richness of unannotated textual data was exploited to extract global informative features using a word-level clustering technique. Each contribution was evaluated via controlled experiments and reported using three commonly applied metrics, i.e. precision, recall and harmonic F-measure.
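The abstract reports results using precision, recall and harmonic F-measure. As an illustration only, the sketch below shows how these metrics are commonly computed at entity level under exact matching. The exact-match criterion, the (start, end, class) entity representation, and the two-level class labels such as "Person.Politician" are hypothetical assumptions, not the thesis's own evaluation code or class taxonomy.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_token, end_token_exclusive, class_label).
# The dotted labels ("Coarse.Fine") are invented to mimic a two-level hierarchy.
Entity = Tuple[int, int, str]


def prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and harmonic F-measure.

    A prediction counts as correct only if its span AND its fine-grained
    class both match a gold entity (an assumed matching criterion).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall (F1).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example: the second prediction has the right span but the wrong
# fine-grained subclass, so it is counted as an error.
gold = {(0, 2, "Person.Politician"), (5, 6, "Location.City"), (9, 11, "Organisation.Government")}
predicted = {(0, 2, "Person.Politician"), (5, 6, "Location.Country"), (9, 11, "Organisation.Government")}

p, r, f = prf(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # prints "P=0.667 R=0.667 F=0.667"
```

The toy case hints at why fine-grained NER is harder than coarse-grained NER: a span that would be correct under a coarse "Location" class becomes an error once the subclass must also match.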

TOPICAL NAME USED AS SUBJECT

PJ Semitic ; QA75 Electronic computers. Computer science ; QA76 Computer software

PERSONAL NAME - PRIMARY RESPONSIBILITY

Alotaibi, Fahd Saleh S.

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

University of Birmingham

ELECTRONIC LOCATION AND ACCESS

Electronic name
Read the full text
