Title
The computational analysis of morphosyntactic categories in Urdu

Author
Hardie, Andrew

Subject
P Philology. Linguistics

Classification

Library
Center and Library of Islamic Studies in European Languages

Location
Province: Qom, City: Qom

Library contact: 32910706-025

NATIONAL BIBLIOGRAPHY NUMBER

Number
TLets420555

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper
The computational analysis of morphosyntactic categories in Urdu
General Material Designation
[Thesis]
First Statement of Responsibility
Hardie, Andrew

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.
Lancaster University
Date of Publication, Distribution, etc.
2004

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree
Thesis (Ph.D.)
Text preceding or following the note
2004

SUMMARY OR ABSTRACT

Text of Note
Urdu is a language of the Indo-Aryan family, widely spoken in India and Pakistan, and an important minority language in Europe, North America, and elsewhere. This thesis describes the development of a computer-based system for part-of-speech tagging of Urdu texts, consisting of a tagset, a set of tagging guidelines for manual tagging or post-editing, and the tagger itself. The tagset is defined in accordance with a set of design principles, derived from a survey of good practice in the field of tagset design, including compliance with the EAGLES guidelines on morphosyntactic annotation. These are shown to be extensible to languages, such as Urdu, that are closely related to those languages for which the guidelines were originally devised. The description of Urdu grammar given by Schmidt (1999) is used as a model of the language for the purpose of tagset design. Manual tagging is undertaken using this tagset, a process that yields both a set of tagging guidelines and a set of manually tagged texts to serve as training data. Tagging is performed using a rule-based methodology, and the justification for this choice is discussed. A suite of programs that function together within the Unitag architecture is described. This system includes, in addition to a tokeniser, an analyser (Urdutag) based on lexical look-up and word-form analysis, and a disambiguator (Unirule) which removes contextually inappropriate tags using a set of 274 rules. While the system's final performance is not particularly impressive, this is largely due to a paucity of training data leading to a small lexicon, rather than any substantial flaw in the system.

TOPICAL NAME USED AS SUBJECT

P Philology. Linguistics

PERSONAL NAME - PRIMARY RESPONSIBILITY

Hardie, Andrew

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

Lancaster University

ELECTRONIC LOCATION AND ACCESS

Electronic name
Read the text of the book

