Place of publication: United States, Ann Arbor; ISBN=978-0-355-30142-7
Dissertation notes
Dissertation details and degree type
Ph.D.
Discipline of degree
Computer Science
Degree-granting body
Columbia University
Date of degree
2017
Summary or abstract notes
Text of note
This dissertation studies the problem of identifying the ideological perspective of people as expressed in their written text. A person's perspective is often expressed in their stance towards polarizing topics. We study how nuanced linguistic cues can be used to identify the perspective of a person in informal genres. Moreover, we explore the problem from a multilingual perspective, comparing and contrasting the linguistic devices used in English informal-genre datasets discussing American ideological issues and in Arabic discussion forum posts related to Egyptian politics. In doing so, we address several challenges. Our first and foremost goal is to build computational systems that can successfully identify the perspective from which a given informal text is written, while studying which linguistic cues work best for each language and drawing insights into the similarities and differences between the notion of perspective in the two languages studied. We build computational systems that can successfully identify a person's stance in English informal texts dealing with different topics that are determined by one's perspective, such as the legalization of abortion, the feminist movement, and gay and gun rights. Additionally, we are able to identify a more general notion of perspective, namely the choice of presidential candidate in 2012, and we build systems for automatically identifying different elements of a person's perspective given an Egyptian discussion forum comment. The systems utilize several lexical and semantic features for both languages.
Specifically, for English we explore the use of word sense disambiguation, opinion features, and latent and frame semantics, as well as Linguistic Inquiry and Word Count features. In Arabic, in addition to using sentiment and latent semantics, we study whether linguistic code-switching (LCS) between the standard and dialectal forms of the language can serve as a cue for uncovering the perspective from which a comment was written. This leads us to the challenge of devising computational systems that can handle LCS in Arabic. Arabic is diglossic: the standard form of the language (Modern Standard Arabic, MSA) coexists with the regional dialects (Dialectal Arabic, DA), which are the native mother tongue of Arabic speakers in different parts of the Arab world. DA is prevalent in written informal genres, and in most cases it is code-switched with MSA. The presence of code-switching degrades the performance of almost any Natural Language Processing tool trained only on MSA when it is applied to DA or to code-switched MSA-DA content. To address this challenge, we build a state-of-the-art system, AIDA, to computationally handle token-level and sentence-level code-switching.
Subject (topical name or topical noun phrase)
Subject not under authority control
Computer science
Uncontrolled subject terms
Subject term
Applied sciences;Informal
Personal name as main entry (primary intellectual responsibility)