Genetic Based Optimization Models for Enhancing Multi-Document Text Summarization

Abstract

Extractive multi-document text summarization, which aims to remove redundant information from a document collection while preserving its salient sentences, has recently attracted considerable interest in automatic models. This paper proposes two models for extractive multi-document summarization based on a genetic algorithm (GA). First, the problem is described and modeled as a discrete optimization problem with two candidate expressions, and a specific fitness function is designed to cope effectively with each candidate. Then, a binary-encoded representation, together with a heuristic mutation operator and a local repair operator, is proposed to characterize the adopted GA. The proposed model formulations exploit the semantic roles of three levels of similarity: sentence to sentence, sentence to the center of the document collection, and center of the summary to the center of the document collection. Experiments are conducted on ten clusters from the DUC2002 dataset (d061j through d070f), and the results are compared with another state-of-the-art model. The results demonstrate the effectiveness of the proposed models. Moreover, injecting several levels of text similarity into the model formulation has a positive impact on the overall performance of the proposed GA.
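
To make the binary encoding and the three similarity levels concrete, the following is a minimal Python sketch rather than the formulation used in this paper. It assumes bag-of-words term-frequency vectors, cosine similarity, an equally weighted fitness (relevance plus coverage minus redundancy), and a repair rule that drops the least relevant selected sentence until a word limit is met. The function names `cosine`, `centroid`, `fitness`, and `repair`, and the parameter `max_words`, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of term-frequency vectors; only its direction matters for cosine."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def fitness(mask, vectors, center):
    """Illustrative fitness (an assumption, not the paper's formula).

    mask[i] == 1 means sentence i is included in the summary.
    """
    chosen = [v for bit, v in zip(mask, vectors) if bit]
    if not chosen:
        return 0.0
    # Sentence to center of the document collection.
    relevance = sum(cosine(v, center) for v in chosen) / len(chosen)
    # Sentence to sentence (mean pairwise similarity, treated as redundancy).
    pairs = [(a, b) for i, a in enumerate(chosen) for b in chosen[i + 1:]]
    redundancy = sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    # Center of the summary to center of the document collection.
    coverage = cosine(centroid(chosen), center)
    return relevance + coverage - redundancy


def repair(mask, vectors, lengths, center, max_words):
    """Assumed repair rule: drop the least relevant selected sentence
    until the summary fits within max_words (max_words >= 0)."""
    mask = list(mask)
    while sum(n for bit, n in zip(mask, lengths) if bit) > max_words:
        selected = [i for i, bit in enumerate(mask) if bit]
        worst = min(selected, key=lambda i: cosine(vectors[i], center))
        mask[worst] = 0
    return mask
```

In a full GA, `fitness` would score each binary chromosome in the population and `repair` would be applied after crossover and mutation so that every candidate summary stays within the length limit.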