dc.contributor.author: Sarker, Shakila
dc.date.accessioned: 2019-12-09T08:51:12Z
dc.date.available: 2019-12-09T08:51:12Z
dc.date.issued: 2019-11-25
dc.identifier.uri: http://dspace.uiu.ac.bd/handle/52243/1507
dc.description.abstract: Features hold the distinctive characteristics and intrinsic values of data, but they are of no use if the important information and patterns cannot be extracted from data coming from disparate sources and applications. In the area of big data, feature selection is one of the most important pre-processing steps for removing the many unessential, irrelevant and noisy features that can seriously affect the outcomes of classifier models. The main motivation for applying feature selection is to reduce the high dimensionality of large-scale data. Because high-dimensional big data has more features for training, measuring performance becomes challenging and costly. The aim of this research is to build models with several hybrid feature selection techniques so that the classification algorithms receive only those features that are truly relevant and help to achieve better performance. A further aim is to find the informative features and group them so that knowledge can be extracted from big data. In this research, we collected 10 benchmark datasets from the UC Irvine Machine Learning Repository. We applied several feature selection methods and tested their performance: CFS, Chi-Square, Consistency Subset Evaluator, Gain Ratio, Information Gain, OneR, PCA, ReliefF, Symmetrical Uncertainty and Wrapper. The feature grouping methods are Random Grouping, Correlation-based Grouping and Attribute Weighting Grouping; these groups were evaluated with ensemble classifiers: Random Forest, Bagging and Boosting (AdaBoost). The observed results show that these groups achieve similar or even better results than the entire feature sets for the datasets. The Attribute Weighting Grouping method has shown promising performance for big data. (en_US)
dc.subject: Feature Selection (en_US)
dc.subject: Feature Weighting (en_US)
dc.subject: Machine Learning (en_US)
dc.subject: Data Mining (en_US)
dc.title: A Feature Group Weighting Method for Classifying High-Dimensional Big Data (en_US)
dc.type: Thesis (en_US)

