Document Details

Document Type : Thesis 
Document Title :
MISSING VALUES IMPUTATION APPROACH IN SPAM TWEETS CLASSIFICATION
نهج التعويض عن القيم المفقودة في تصنيف التغريدات غير المرغوبة
 
Subject : Faculty of Computing and Information Technology 
Document Language : Arabic 
Abstract : Missing data is an almost unavoidable problem in research that depends on collecting large amounts of data. Information may be missing when users are unwilling to provide personal data because of privacy concerns or a lack of motivation. This is especially true of optional data requested by systems such as Twitter. Omitting missing data can reduce the statistical power of a study and produce biased estimates, leading to invalid conclusions. The aim of this study is to improve the quality of data classification by applying a multiple imputation model to incomplete data collected from Twitter microblogging accounts, and to better understand how spammers operate when identifying their posts. The research presents a case study of data analysis using two methods for handling missing Twitter data: multiple imputation (MI) and Expectation Maximization (EM). The impact of the resulting imputed data is then tested in two ways: statistical analysis and classification algorithms. The developed predictive model achieved 96.2% accuracy using a Random Forest classifier on the complete dataset produced by Multiple Imputation by Chained Equations (MICE). The study also found that the number of user mentions and the number of URLs in each tweet are the two features with the greatest potential for detecting spam in posts.
Supervisor : Prof. Omaimah Bamasag 
Thesis Type : Master Thesis 
Publishing Year : 1439 AH (2018 AD)
 
Added Date : Sunday, July 1, 2018 
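
The chained-equations idea named in the abstract can be illustrated with a minimal sketch. This is not the thesis's code. It is a simplified, deterministic single-imputation pass over two numeric features, whereas full MICE also adds random draws and produces several completed datasets. The feature names (mentions, URLs), the toy values, and the helper names `fit_line` and `chained_impute` are assumptions made for illustration only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant predictor: fall back to the mean of y
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def chained_impute(rows, n_iter=10):
    """Fill None entries in two-column rows by alternating regressions."""
    missing = [[v is None for v in row] for row in rows]
    # Step 1: start from the column means of the observed values.
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(2)]
    data = [[col_means[j] if r[j] is None else float(r[j]) for j in range(2)]
            for r in rows]
    # Step 2: cycle through the columns, re-predicting only the missing cells.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # index of the other column, used as the predictor
            obs = [i for i in range(len(data)) if not missing[i][j]]
            slope, intercept = fit_line([data[i][k] for i in obs],
                                        [data[i][j] for i in obs])
            for i in range(len(data)):
                if missing[i][j]:
                    data[i][j] = intercept + slope * data[i][k]
    return data


# Hypothetical toy data: (number of mentions, number of URLs) per tweet.
tweets = [(1, 2), (2, 4), (3, 6), (4, None), (None, 10), (5, 10)]
completed = chained_impute(tweets)
for row in completed:
    print(row)
```

Observed values are left untouched and only the two missing cells are filled. In practice the completed data would then be passed to a classifier such as the Random Forest mentioned in the abstract.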

Researchers

Researcher Name (Arabic) : وفاء حسين دفع
Researcher Name (English) : Daffa, Wafaa Hussein
Researcher Type : Researcher
Dr Grade : Master
Email :

Files

File Name : 43538.pdf
Type : pdf
Description :
