Deanship of Graduate Studies
Document Details
Document Type: Thesis
Document Title: MISSING VALUES IMPUTATION APPROACH IN SPAM TWEETS CLASSIFICATION
نهج التعويض عن القيم المفقودة في تصنيف التغريدات غير المرغوبة
Subject: Faculty of Computing and Information Technology
Document Language: Arabic
Abstract: Missing data is an almost unavoidable problem in research that depends on collecting large amounts of data. Information may be missing when users are unwilling to provide personal data because of privacy concerns or a lack of motivation. This is especially true of optional data requested by systems such as Twitter. Omitting missing data can reduce the statistical power of a study and produce biased estimates, leading to invalid conclusions. The aim of this study is to improve the quality of data classification by applying a multiple imputation model to incomplete data collected from Twitter microblogging accounts, and to better understand how spammers operate when identifying their posts. The research presents a case study of data analysis using two methods for handling missing Twitter data: multiple imputation (MI) and expectation maximization (EM). The impact of the resulting imputed data is then tested in two ways: statistical analysis and classification algorithms. The developed predictive model achieved 96.2% accuracy using a Random Forest classifier on the complete dataset produced by Multiple Imputation by Chained Equations (MICE). The study also found that the number of user mentions and the number of URLs in each tweet are the two features with the most potential for detecting spam in posts.
Supervisor: Prof. Omaimah Bamasag
Thesis Type: Master Thesis
Publishing Year: 1439 AH / 2018 AD
Added Date: Sunday, July 1, 2018
Researchers
Researcher Name (Arabic): وفاء حسين دفع
Researcher Name (English): Daffa, Wafaa Hussein
Researcher Type: Researcher
Dr Grade: Master
Files
File Name: 43538.pdf
Type: pdf