Automatic News Summarization Using a Dependency Structure
This project aims to create a program that produces a coherent, indicative overview of a given news article. It builds a dependency structure, a graph of the relations between words, across the entire article, uses that structure to identify the important concepts, and recombines those concepts into a high-density summary.

Existing Methods:
Headlines
A very short statement of topic, to draw readers to the article.
- Pros
  - Very short
  - Very concise
- Cons
  - Often too short: a few more words would communicate much more about the article
  - Edited for space
  - Sometimes sensationalist
  - Not written by the author
Human-Written Summaries
A summary written by a person to fit the space it will occupy.
- Pros
  - Any length possible
  - Contains the major ideas
  - Concise and understandable
- Cons
  - Costs money to hire someone to write them
  - May need to be rewritten for each space in which it appears
Lead-Excerpt Summary
The first few sentences of the article, trimmed by hand and used as a summary.
- Pros
  - Easy to make
  - Gives as good an introduction to the topic as the article does
- Cons
  - Readers who go on to the article re-read the lead
  - Relies on the lead being an overview
  - Not very flexible in length
General Automatic Summaries
A generic summarization engine ranks the sentences in an article and compiles the top-ranked sentences into a summary.
- Pros
  - Easy to make
- Cons
  - Relies on overview sentences in the article
  - Not very flexible in length
  - Hard to include most of the main ideas
  - Designed for longer (100+ word) summaries
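
As a rough illustration of the sentence-ranking approach described above, the sketch below scores each sentence by the average frequency of its words in the article and keeps the top-ranked ones in their original order. The scoring rule, the naive sentence splitter, and the function names are assumptions made for this example; real summarization engines use more elaborate features.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def rank_sentences(text, top_n=2):
    """Return the top_n sentences, scored by average word frequency."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Restore document order so the extract still reads naturally.
    return [sentences[i] for i in sorted(ranked[:top_n])]


article = ("The council approved the budget. "
           "The budget funds the new library. "
           "Rain is expected.")
print(rank_sentences(article, top_n=1))
```

Because whole sentences are the smallest unit, the output length can only change in sentence-sized steps, which is the inflexibility noted in the list above.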
The Process
Input
The user either enters the article through a GUI on Windows XP or saves it as a text file and feeds it to a command-line utility.
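
The command-line path could look roughly like the sketch below. It is not the project's actual utility: the argument name `article`, the fallback to standard input, and the whitespace clean-up step are all assumptions made for illustration.

```python
import argparse
import sys


def parse_args(argv=None):
    # One optional positional argument: the path to a plain-text article.
    parser = argparse.ArgumentParser(description="Summarize a news article.")
    parser.add_argument("article", nargs="?", default=None,
                        help="path to a text file (default: read stdin)")
    return parser.parse_args(argv)


def load_article(path=None):
    """Read the article from a file, or from standard input if no path is given."""
    if path is None:
        return sys.stdin.read()
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def clean_article(raw):
    """Collapse runs of whitespace and drop empty lines, one paragraph per line."""
    paragraphs = [" ".join(line.split()) for line in raw.splitlines()]
    return "\n".join(p for p in paragraphs if p)
```

A caller would chain the three steps, for example `clean_article(load_article(parse_args().article))`, before handing the text to the tagger.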

Part of Speech Tagging
The words in the article are then tagged with their parts of speech (noun, verb, etc.). Tagging is an important prerequisite for automatically creating the dependencies. It is done with FreeLing's language analysis tools, which use a Hidden Markov Model.
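
To show what Hidden Markov Model tagging involves, here is a minimal Viterbi decoder over a toy model. It does not call FreeLing and does not reflect its API; the tag set and every probability below are invented for the example, and the sentence is assumed to be non-empty.

```python
def viterbi(words, tags, start, trans, emit, floor=1e-6):
    """Return the most probable tag sequence for words under a toy HMM."""
    # best[i][tag] = (probability of the best path ending in tag, previous tag)
    best = [{t: (start[t] * emit[t].get(words[0], floor), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] * trans[p][t] * emit[t].get(word, floor), p)
                for p in tags
            )
        best.append(column)
    # Backtrack from the most probable final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Made-up model parameters, for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6, "dog": 0.05},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
# → ['DET', 'NOUN', 'VERB']
```

The transition table is what lets context settle ambiguous words: "barks" could be a noun or a verb, but a verb is far more likely after a noun.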

Dependency Generation
To generate the dependency structure, the program uses the Link Generator from CMU, which describes many different types of relationships between words. Link generation approximates dependency generation when only a certain subset of link types is used.
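
The idea of keeping only a subset of link types can be sketched as a simple filter that also assigns a head to each kept link. The link labels (`S`, `O`, `D`, `A`), the choice of which end is the head, and the `(left, right, label)` tuple format are illustrative assumptions, not the output format of the CMU tool.

```python
# Hypothetical mapping: for each link label we keep, which end of the
# link is treated as the head of the dependency.
HEAD_SIDE = {"S": "right", "O": "left", "D": "right", "A": "right"}


def links_to_dependencies(links, head_side=HEAD_SIDE):
    """Convert (left_index, right_index, label) links into (head, dependent) pairs.

    Links whose label is not in head_side are dropped.
    """
    dependencies = []
    for left, right, label in links:
        side = head_side.get(label)
        if side is None:
            continue  # not one of the link types treated as a dependency
        head, dependent = (left, right) if side == "left" else (right, left)
        dependencies.append((head, dependent))
    return dependencies


words = ["the", "mayor", "signed", "the", "bill"]
links = [(0, 1, "D"), (1, 2, "S"), (2, 4, "O"), (3, 4, "D"), (0, 2, "X")]
for head, dependent in links_to_dependencies(links):
    print(f"{words[dependent]} depends on {words[head]}")
```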

Ranking
To determine which information is most important, each word is ranked by how many words depend on it and how many words depend on those words in turn.
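
A minimal sketch of this ranking, assuming the dependencies are available as `(head, dependent)` pairs and that direct and second-level dependents count equally (the text does not say how the two are weighted):

```python
from collections import defaultdict


def rank_words(dependencies):
    """Score each word by its dependents plus its dependents' dependents."""
    children = defaultdict(list)
    nodes = set()
    for head, dependent in dependencies:
        children[head].append(dependent)
        nodes.update((head, dependent))

    scores = {}
    for node in nodes:
        direct = children.get(node, [])
        second_level = sum(len(children.get(child, [])) for child in direct)
        scores[node] = len(direct) + second_level
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


deps = [("signed", "mayor"), ("signed", "bill"), ("mayor", "the"),
        ("bill", "budget"), ("budget", "annual")]
print(rank_words(deps)[:2])
# → [('signed', 4), ('bill', 2)]
```

Words at the leaves of the graph, such as determiners, score zero and are the first candidates to be left out of the summary.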

Generation
Finally, the program follows the dependency relations to generate new sentences containing the most important information from the original document.
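
One simple way to follow the dependency relations is to start at the root of a sentence and repeatedly add the highest-scoring word whose head has already been selected, until a word budget is reached. The sketch below takes that approach; the word budget, the greedy strategy, and the example scores are assumptions, and the project's actual generation step may differ.

```python
import heapq


def generate_summary(words, dependencies, scores, root, budget):
    """Pick up to budget words by walking down from root, best-scored first.

    words: list of tokens; dependencies: (head_index, dependent_index) pairs;
    scores: index -> importance score (missing entries count as 0).
    """
    children = {}
    for head, dependent in dependencies:
        children.setdefault(head, []).append(dependent)

    selected = {root}
    # Max-heap (via negated scores) of words whose head is already selected.
    frontier = [(-scores.get(c, 0), c) for c in children.get(root, [])]
    heapq.heapify(frontier)
    while frontier and len(selected) < budget:
        _, node = heapq.heappop(frontier)
        selected.add(node)
        for child in children.get(node, []):
            heapq.heappush(frontier, (-scores.get(child, 0), child))
    # Emit the chosen words in their original order.
    return " ".join(words[i] for i in sorted(selected))


words = ["the", "mayor", "quickly", "signed", "the", "budget"]
deps = [(3, 1), (3, 2), (3, 5), (1, 0), (5, 4)]
scores = {3: 5, 1: 2, 5: 2}  # hypothetical importance scores
print(generate_summary(words, deps, scores, root=3, budget=3))
# → mayor signed budget
```

Because a word is only added after its head, the selected words always form a connected piece of the dependency graph, which keeps the output closer to a grammatical fragment than picking top-scoring words independently would.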
