Proposed Method to Enhance Text Document Clustering Using an Improved Fuzzy C-Means Algorithm with Named Entity Tags

Abstract

Text document clustering refers to grouping related text documents into clusters for unsupervised document organization, text data mining, and automatic topic extraction. The most common document representation model is the vector space model (VSM), which represents a set of documents as vectors of important terms. Traditional document clustering methods group related documents without any user interaction. The method proposed in this paper explores how clustering can be improved with user guidance in selecting features that separate documents. These features are tags that appear in documents, such as named entity tags, which carry important information for naming clusters. The method introduces a document representation model that creates combined features from named entity tags and applies an improved fuzzy c-means clustering algorithm.

The proposed method is tested at two levels. The first level uses only the vector space model with traditional fuzzy c-means; the second level uses the vector space model with combined named entity tag features and the improved fuzzy c-means algorithm. Both levels are evaluated on a subset of the Reuters-21578 dataset containing 1150 documents from ten topics (150 documents per topic). The results show that the second level achieves good performance for text document clustering, with an average categorization accuracy of 90%.
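The abstract does not specify how fuzzy c-means is improved or how the named entity tag features are combined, so the following is only a minimal sketch of the baseline (first-level) pipeline: documents mapped to vector space model vectors, then clustered with standard Bezdek fuzzy c-means. Whitespace tokenisation, raw term-frequency weighting with L2 normalisation, the fuzzifier m = 2, and the function names `build_vsm`, `fuzzy_c_means`, and `hard_labels` are illustrative assumptions, not the paper's implementation.

```python
import math
import random
from collections import Counter


def build_vsm(docs):
    """Assumed VSM: L2-normalised raw term-frequency vectors over a shared vocabulary."""
    tokenised = [doc.lower().split()  # naive whitespace tokenisation (assumption)
                 for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in tokenised:
        vec = [0.0] * len(vocab)
        for term, count in Counter(toks).items():
            vec[index[term]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard (not the paper's improved) fuzzy c-means with Euclidean distance."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial membership matrix U (n x c); each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = [[0.0] * dim for _ in range(c)]
    exp = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centroid update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers[j] = [sum(w[i] * X[i][d] for i in range(n)) / total
                          for d in range(dim)]
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        new_U = []
        for i in range(n):
            dists = [math.dist(X[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point coincides with a centroid: give it crisp membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(row)
                row = [r / s for r in row]
            else:
                row = [1.0 / sum((dists[j] / dists[k]) ** exp for k in range(c))
                       for j in range(c)]
            new_U.append(row)
        shift = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if shift < tol:
            break
    return centers, U


def hard_labels(U):
    """Assign each document to the cluster with its highest membership."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in U]


if __name__ == "__main__":
    docs = ["crude oil prices rise", "crude oil output falls",
            "wheat grain harvest record", "wheat grain exports record"]
    vocab, X = build_vsm(docs)
    centers, U = fuzzy_c_means(X, c=2)
    print(hard_labels(U))
```

In this toy example the two oil documents share no terms with the two grain documents, so the memberships should separate them into two clusters. Fuzzy c-means keeps a graded membership for every document rather than a single hard assignment, which is why `hard_labels` is needed only when a crisp partition is required, for example to compute categorization accuracy.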