Machine learning of antonyms in English and Arabic corpora
General material designation
[Thesis]
Name of first author
Aldhubayi, Luluh Basim M.
Names of other contributors
Atwell, Eric ; Bennett, Brandon
Publication, distribution, etc.
Name of publisher, distributor, etc.
University of Leeds
Date of publication, distribution, etc.
2019
Thesis notes
Thesis details and degree type
Thesis (Ph.D.)
Text score
2019
Summary or abstract notes
Text of note
Identifying lexical semantic relations in text has been a long-standing goal of artificial intelligence and the target of many researchers' attention over the past years. This thesis addresses the problem of automatically identifying antonymy relations such as (hot/cold). The work presents three key steps in capturing antonymous word pairs: extracting example word pairs from a textual corpus, representing antonymy in a pair vector space model, and using a machine learning classifier to predict the antonymy relation. Researchers have found that discriminating antonymy from synonymy is a non-trivial task. Both relations show similar semantic distributions because they occur in similar contexts. This issue affects many similarity-based applications, which display opposite words instead of synonyms. Moreover, both traditional and modern vector space models, such as Bag-of-Words and Word Embeddings models, discriminate poorly between antonyms and synonyms. Therefore, this work proposes an antonymy pair vector representation based on symmetric classified patterns extracted from a corpus. We are also motivated by the extraction of novel antonymy and opposition relations between word pairs. This research aims to capture and predict antonymy pairs found in a textual corpus so that computers can recognize opposition relations in text. Our research proposes the Antonymy classifier, which combines two approaches: a pattern-based approach and a machine learning classifier. We use the pattern-based approach to extract word pairs and patterns. We also propose using distant supervision to label the extracted pairs automatically. Distant supervision uses an external knowledge base (the Open Multilingual WordNet) to generate positive and negative antonymy instances.
The method also extracts every corpus sentence containing either canonical antonymy pairs such as (national/international) or non-canonical antonymy or opposite pairs such as (internal/international) that might provide statistical evidence for an antonymy relation. In addition, this work presents a pattern classifier model which automatically extracts and classifies antonymy patterns by computing the average co-occurrence association between positive (antonymy) and negative (non-antonymy) instances in the training set. Some of these patterns, such as (between X and Y, both X and Y, from X to Y), were found in related linguistic studies on manual pattern extraction and analysis. We also found novel textual patterns that are highly associated with antonymy pairs, such as (however X or Y, what is X and what is Y). This work also reports experiments in extracting and predicting antonymy on the English BNC and SkELL corpora and the Arabic ArTenTen corpus. The overall results showed improved prediction in distinguishing antonym pairs compared to previous attempts. We also present new antonymy pairs that are not found in the English and Arabic WordNet. The antonymy classifier model uses a machine learning algorithm to extract and classify novel adjectival and noun antonymous pairs such as (verbal/visual), (input/output), (life/death) and (material/spiritual). The work presented in this research is therefore a promising method for better extraction and classification of antonymy pairs and patterns in a corpus.
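The pipeline described in the abstract (pattern-based pair extraction, distant-supervision labeling, and pattern scoring) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The three patterns are taken from the abstract, but the regular expressions, the tiny `SEED_ANTONYMS` set (a hypothetical stand-in for a lookup in the Open Multilingual WordNet), the function names, and the scoring rule (share of a pattern's matches that are labeled antonyms) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Patterns named in the abstract; X and Y are single-word slots.
# The regular expressions themselves are illustrative assumptions.
PATTERNS = {
    "between X and Y": r"\bbetween (\w+) and (\w+)\b",
    "both X and Y": r"\bboth (\w+) and (\w+)\b",
    "from X to Y": r"\bfrom (\w+) to (\w+)\b",
}

# Hypothetical seed list standing in for a knowledge-base lookup.
SEED_ANTONYMS = {frozenset(("hot", "cold")), frozenset(("life", "death"))}


def extract_pairs(sentences):
    """Return a (pattern_name, x, y) tuple for every pattern match."""
    matches = []
    for sentence in sentences:
        text = sentence.lower()
        for name, regex in PATTERNS.items():
            for x, y in re.findall(regex, text):
                matches.append((name, x, y))
    return matches


def label_pair(x, y):
    """Distant supervision: a pair is positive if the seed list has it."""
    return frozenset((x, y)) in SEED_ANTONYMS


def pattern_scores(matches):
    """Score each pattern by the share of its matches labeled positive."""
    positive = Counter()
    total = Counter()
    for name, x, y in matches:
        total[name] += 1
        if label_pair(x, y):
            positive[name] += 1
    return {name: positive[name] / total[name] for name in total}


if __name__ == "__main__":
    corpus = [
        "The gap between hot and cold is large.",
        "She moved from life to death.",
        "He went from London to Paris.",
    ]
    print(pattern_scores(extract_pairs(corpus)))
```

In this toy setting a pattern that matches both antonym and non-antonym pairs receives a lower score than one that matches only seed antonyms, which is the intuition behind classifying patterns before using them as features.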
Personal name as main entry heading - (primary intellectual responsibility)