Improving Extractive Multi-Document Text Summarization Through Multi-Objective Optimization


Multi-document summarization is an optimization problem that requires more than one objective function to be optimized simultaneously. The proposed work balances two significant objectives, content coverage and diversity, when generating summaries from a collection of text documents. Any automatic text summarization system faces the challenge of producing a high-quality summary. Despite existing efforts to design and evaluate many text summarization techniques, their formulations lack a model that explicitly represents coverage and diversity, the two conflicting semantics of any summary. In this work, the design of a generic text summarization model based on sentence extraction is redirected toward a more semantic measure that reflects content coverage and content diversity individually, as two explicit optimization models. The problem is defined by casting the first criterion, content coverage, in terms of text similarity. The proposed model hypothesizes that text similarity can be decomposed into three levels of optimization formula. At the first level, aspiring to global optimization, the candidate summary should cover the summary of the document collection. At the second level, aiming at less global optimization, the sentences of the candidate summary should cover the summary of the document collection. The third level settles for local optimization, where the difference between the magnitude of the terms covered by the candidate summary and that of the terms covered by the document collection should be small. This coverage model is coupled with a proposed diversity model, and the two are defined together as a Multi-Objective Optimization (MOO) problem. Moreover, heuristic perturbation and heuristic local repair operators are proposed and injected into the adopted evolutionary algorithm to harness its strength.
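The three coverage levels and the diversity objective can be illustrated with a minimal Python sketch. It is not the formulation used in this work. It assumes that each sentence is a term-weight vector, that the "summary" of a set of sentences is its component-wise mean vector, that similarity is cosine similarity, and that diversity is one minus the mean pairwise similarity of the selected sentences. All function names are hypothetical. A standard Pareto-dominance test is included to show how two objectives are compared without collapsing them into a single score.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    """Component-wise mean; an assumed stand-in for the 'summary' of a set."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def coverage_levels(sentences, selected):
    """Return three illustrative coverage terms for a candidate summary.

    sentences: list of term-weight vectors, one per sentence in the collection.
    selected:  non-empty list of indices chosen for the candidate summary.
    """
    collection_mean = mean_vector(sentences)
    chosen = [sentences[i] for i in selected]
    candidate_mean = mean_vector(chosen)
    # Level 1 (global): the candidate as a whole against the collection.
    global_cov = cosine(candidate_mean, collection_mean)
    # Level 2 (less global): each selected sentence against the collection.
    sentence_cov = sum(cosine(s, collection_mean) for s in chosen) / len(chosen)
    # Level 3 (local): gap between term magnitudes; smaller is better.
    magnitude_gap = sum(abs(a - b) for a, b in zip(candidate_mean, collection_mean))
    return global_cov, sentence_cov, magnitude_gap


def diversity(sentences, selected):
    """One minus the mean pairwise cosine similarity of the selected sentences."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 1.0
    mean_sim = sum(cosine(sentences[i], sentences[j]) for i, j in pairs) / len(pairs)
    return 1.0 - mean_sim


def dominates(a, b):
    """Pareto dominance for objective tuples that are all maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# Toy collection of four sentences over a three-term vocabulary.
toy = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
print(coverage_levels(toy, [0, 1]), diversity(toy, [0, 1]))
```

In this toy setting, selecting sentences 0 and 1 gives maximal diversity because the two vectors share no terms, whereas selecting sentences 0 and 2 lowers diversity because they overlap on the first term. An evolutionary algorithm would search over such selections and keep those not dominated on the coverage and diversity objectives.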
The proposed model was assessed on the document sets supplied by the Document Understanding Conference 2002 (DUC2002) and compared with other state-of-the-art methods. Performance was measured with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit. The results provide strong evidence of the effectiveness of the proposed MOO model and show that it outperforms the other state-of-the-art models considered.
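To make the evaluation metric concrete, ROUGE-N recall is the fraction of reference n-grams that also appear in the candidate summary, with counts clipped. The following is a minimal sketch of that ratio only, not the ROUGE toolkit itself. It assumes lowercased whitespace tokenization and a single reference summary, and it omits stemming and stop-word handling. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))  # 0.5
print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=2))  # 0.4
```

In the example, the candidate matches three of the six reference unigrams and two of the five reference bigrams.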