Urdu Stemmer
06-03-2015, 11:06 AM
Stemming is the term used in linguistic morphology and information retrieval for the process of reducing inflected (or sometimes derived) words to their word stem, base, or root form, generally a written word form.
(Wikipedia)

The goal of stemming is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. For instance:
am, are, is ⇒ be
car, cars, car's, cars' ⇒ car
(Stanford)
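To make the two examples concrete, here is a toy English sketch in Python (one of the languages the assignment allows). It is an illustration only: it assumes a hand-made exception table for irregular forms such as am/are/is, which no amount of suffix stripping can reduce to "be", and a short hypothetical suffix list for the "car" forms. Real stemmers use far larger rule sets.

```python
# Toy English stemmer for the two examples above (illustration only).
# ASSUMPTION: both lists are hypothetical and cover just these examples.

# Irregular forms whose base cannot be found by stripping characters.
EXCEPTIONS = {"am": "be", "are": "be", "is": "be"}

# Possessive and plural endings; two-character suffixes are tried before "s".
SUFFIXES = ["'s", "s'", "s"]


def stem(word: str) -> str:
    """Reduce an English word form to its base form."""
    w = word.lower()
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    for suffix in SUFFIXES:
        # Keep at least two letters so very short words are not destroyed.
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for w in ["am", "are", "is", "car", "cars", "car's", "cars'"]:
        print(w, "->", stem(w))
```

The exception lookup runs first so that irregular forms never reach the suffix rules; every form in both example lines maps to its stated base form.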

This application will apply stemming to Urdu words. Given an input of one or more words separated by spaces, it will reduce each word to its base form.

Input            Output
باتوں باتیں      بات
لاحاصل           حاصل
لازوال            زوال
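The examples in the table can be reproduced by a simple affix-stripping sketch: remove a known prefix (لا) or suffix (وں, یں) when enough of the word remains, and leave words on an exception list untouched. This is an illustration under stated assumptions, not the method from the research paper cited in this assignment, which must be followed for the actual deliverable. The affix lists, the exception word لاہور (Lahore, which begins with لا but is not prefixed), and the minimum stem length are all hypothetical.

```python
# Minimal affix-stripping sketch for the examples in the table above.
# ASSUMPTION: the affix and exception lists are hypothetical and cover only
# these examples; they are NOT the rules from the research paper.

PREFIXES = ["لا"]            # e.g. لاحاصل -> حاصل
SUFFIXES = ["وں", "یں"]      # e.g. باتوں -> بات, باتیں -> بات
EXCEPTIONS = {"لاہور"}       # looks prefixed (لا + ہور) but is not
MIN_STEM = 2                 # never cut a word below this many letters


def stem_word(word: str) -> str:
    """Strip at most one prefix and one suffix from an Urdu word."""
    if word in EXCEPTIONS:
        return word
    # Try longer affixes first so the most specific rule wins.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[: -len(s)]
            break
    return word


def stem_text(text: str) -> str:
    """Stem every space-separated word and join the stems with spaces."""
    return " ".join(stem_word(w) for w in text.split())


if __name__ == "__main__":
    for line in ["باتوں باتیں", "لاحاصل", "لازوال", "لاہور"]:
        print(line, "->", stem_text(line))
```

The table shows a single بات for the two-word input; the assignment does not say whether repeated stems are merged, so this sketch returns one stem per input word (بات بات).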

The complete method and the steps to implement such a stemmer are given in the following research paper. First read and understand it completely, then implement it the same way. Your final deliverable will be marked on how well your application implements the idea from the research paper:

http://www.aclweb.org/anthology/W09-34#page=50

You must implement everything yourself; ready-made or built-in solutions for any part of the application will not be accepted.

Don’t forget to cite this paper at the appropriate place in your final report.

Tools: Java, Microsoft .NET, Python, or any other modern programming language; SQL Server, MS Access, MySQL, Oracle, or any other DBMS.
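Since a DBMS is listed among the tools, one option is to keep the affix and exception lists in database tables so they can grow without code changes. The sketch below assumes SQLite (Python's built-in `sqlite3` module) as the DBMS and a hypothetical two-table layout (`affixes(kind, text)` and `exceptions(word)`); the entries are the same sample affixes as the table examples, not a real lexicon.

```python
import sqlite3

# ASSUMPTION: SQLite stands in for the DBMS; table names and columns are
# hypothetical, and the sample rows cover only the assignment's examples.


def create_lexicon(conn: sqlite3.Connection) -> None:
    """Create the lexicon tables and load a few sample entries."""
    conn.execute("CREATE TABLE IF NOT EXISTS affixes (kind TEXT, text TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS exceptions (word TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO affixes (kind, text) VALUES (?, ?)",
        [("prefix", "لا"), ("suffix", "وں"), ("suffix", "یں")],
    )
    conn.execute("INSERT OR IGNORE INTO exceptions (word) VALUES (?)", ("لاہور",))
    conn.commit()


def load_affixes(conn: sqlite3.Connection, kind: str) -> list[str]:
    """Return all affixes of one kind ('prefix' or 'suffix'), longest first."""
    rows = conn.execute(
        "SELECT text FROM affixes WHERE kind = ? ORDER BY length(text) DESC",
        (kind,),
    )
    return [text for (text,) in rows]


def load_exceptions(conn: sqlite3.Connection) -> set[str]:
    """Return the set of words that must never be stemmed."""
    return {word for (word,) in conn.execute("SELECT word FROM exceptions")}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_lexicon(conn)
    print(load_affixes(conn, "prefix"))
    print(load_affixes(conn, "suffix"))
    print(load_exceptions(conn))
```

Sorting by `length(text)` in the query hands the stemmer its affixes already ordered longest first, so the stripping loop does not need to sort them again.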


Attached file: Spring 2015_CS619_818.docx (51.64 KB)
 

