TY - JOUR
ID -
TI - An integrated robust semi-supervised framework for improving cluster reliability using ensemble method for heterogeneous datasets
AU - Smita Prava Mishra
AU - Debahuti Mishra
AU - Srikanta Patnaik
PY - 2015
VL - 1
IS - 4
SP - 200
EP - 211
JO - Karbala International Journal of Modern Science مجلة كربلاء العالمية للعلوم الحديثة
SN - 2405609X 24056103
AB - Data mining literature offers several clustering techniques. But even when an effective clustering technique is implemented, the results are often found to be unreliable, and the efficacy of the technique comes under scrutiny. Here, we propose an integrated framework that ensures the reliability of the class labels assigned to a dataset whose class labels are unknown. The model uses PSO-k-means, k-medoids, c-means and Expectation Maximization for data clustering, and integrates their results through a majority voting cluster ensemble technique to enhance reliability. The reliable outcomes serve as the training set for the classification process through a Bayesian classifier, Multi Layer Perceptron, Support Vector Machine and Decision tree. The class labels predicted by the majority of classifiers through a bagging classifier ensemble method are included with the training set and, in combination, designated as the set with known class labels. For heterogeneous datasets with unknown class labels but a known number of classes, the model is able to find class labels for a significant portion of the data, and these labels may be accepted as reliable. The evaluation follows the Dunn's, Davies-Bouldin and Modified Goodman-Kruskal indexing techniques for internal validation, along with probabilistic measures such as Normalized Mutual Information, Normalized Variation of Information and Adjusted Rand Index, which are appropriate measures of goodness-of-fit and robustness of the final clusters.
The predictive capacity of the model is also validated through probabilistic measures and external indexing techniques such as Purity Measure, Rand Index and F-measure.

ER -