Document Type |
: |
Thesis |
Document Title |
: |
Automatic Question Answering System for Arabic Language Textual Data نظام أسئلة وإجابة آلي للبيانات النصية في اللغة العربية |
Subject |
: |
Faculty of Computing and Information Technology |
Document Language |
: |
Arabic |
Abstract |
: |
Question answering (QA) has a long tradition involving many disciplines, ranging from philosophy to database theory. Depending on the discipline, different aspects of the question answering process are investigated.
In this thesis, an Arabic Information Retrieval (IR) system has been implemented. Searching a large corpus is a hard and time-consuming task for the user, so an effective way to retrieve the relevant data is valuable. The main focus is the Prophetic Hadith. We assume that the corpus is divided into main topics, each of which is divided into sub-topics, and so on.
In addition, this thesis presents the application of a pattern recognition algorithm based on statistical learning, the Hidden Markov Model (HMM), which builds one model for each topic from that topic's training texts. Before training, the text goes through stemming, a processing step found in any IR system, which removes morphological information from the word. Stemming has a long tradition in document retrieval, and a variety of stemmers are available. Arabic is a highly inflected language with a complex morphology.
After stemming, a feature vector is generated for each word in the corpus for training purposes. A new approach has been implemented that builds each word's feature vector from its frequency within the topics. Labels are then generated by clustering the words into groups, with one label given to all words in a cluster. The clustering uses the k-means algorithm, which groups the stems based on their attributes/features.
Although we used a Prophetic Hadith corpus, the system could be used in other contexts. Several experiments were carried out in this research to improve the system's performance, and the highest accuracy achieved was 64%. |
Supervisor |
: |
Dr. Reda A. Alkhoribi |
Thesis Type |
: |
Master Thesis |
Publishing Year |
: |
1430 AH
2009 AD |
Co-Supervisor |
: |
Dr. Omar A. Batarfi |
Added Date |
: |
Monday, December 28, 2009 |
|
Researchers
توفيق زهير حسنين | Hasanain, Tawfeq ZUHAIR | Researcher | Master | TAWFEQ@TAWFEQ.COM |
|
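Illustrative sketch
The abstract describes building a feature vector for each stem from its frequency within the topics, then clustering the stems with k-means so that every stem in a cluster shares one label. The following is a minimal sketch of that idea under stated assumptions, not the thesis implementation: the function names, the toy transliterated stems, the two topics, the value of k, and the deterministic choice of initial centroids are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def topic_frequency_vectors(topics):
    """Map each stem to a vector of its counts in every topic.

    `topics` maps a topic name to a list of (already stemmed) words.
    Topics are ordered alphabetically so vector positions are stable.
    """
    names = sorted(topics)
    counts = {name: Counter(topics[name]) for name in names}
    vocabulary = sorted({stem for stems in topics.values() for stem in stems})
    return {stem: [counts[name][stem] for name in names] for stem in vocabulary}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(vectors, k, iterations=20):
    """Plain k-means; returns a cluster label (0..k-1) for each stem.

    Assumption: initial centroids are the first k distinct vectors in
    alphabetical stem order, which keeps the sketch deterministic.
    """
    stems = sorted(vectors)
    centroids = []
    for stem in stems:
        if vectors[stem] not in centroids:
            centroids.append(list(vectors[stem]))
        if len(centroids) == k:
            break

    labels = {}
    for _ in range(iterations):
        # Assignment step: each stem goes to its nearest centroid.
        new_labels = {
            stem: min(
                range(len(centroids)),
                key=lambda c: squared_distance(vectors[stem], centroids[c]),
            )
            for stem in stems
        }
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [vectors[s] for s in stems if labels[s] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (hypothetical): two topics with a handful of transliterated stems.
topics = {
    "fasting": ["sawm", "sawm", "ramadan", "niyya"],
    "prayer": ["salah", "salah", "wudu", "niyya"],
}
vectors = topic_frequency_vectors(topics)
labels = kmeans_labels(vectors, k=2)
for stem in sorted(labels):
    print(stem, vectors[stem], "cluster", labels[stem])
```

In this sketch the cluster index plays the role of the shared label; in the system described by the abstract, such labels would then serve as the observation symbols for the per-topic HMMs.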