CHAPTER 2 REHYPOTHECATION, COLLATERAL MISMATCH
2.2 Model
El proceso de generación de un resumen (ver Figura 4-5) inicia con el pre-procesamiento del documento original, primero segmentando el texto en las oraciones que lo conforman, luego normalizando las oraciones convirtiendo mayúsculas a minúsculas y eliminando palabras vacías, después se aplica el proceso de lematización, y finalmente las oraciones son indexadas en una estructura de datos (ver sección 5.1).
El siguiente paso del proceso es llevar cada uno de los términos de las oraciones que se obtienen del pre-procesamiento a la representación del modelo espacio vectorial descrito en la Sección 2.2.1. Por cada término de la oración se almacena su frecuencia relativa en una matriz de pesos, de igual forma se realiza este proceso con los términos del título del documento.
A continuación, con base en la matriz de pesos, se determina la similitud de cosenos entre oraciones, entre cada oración y el documento, y entre cada oración con el título del documento para ser almacenados en una matriz de similitudes. Estos valores de similitud sirven para calcular algunos factores de la función objetivo como la cobertura, cohesión y relación con el título.
79
Por último se ejecutan los algoritmos SLFA-MT-SingleDocSum y SLC-ASC-SingleDocSum, obteniendo al final una solución candidata (vector solución), donde las posiciones con valor uno son organizadas descendentemente basándose en su posición original en el documento. Posteriormente el vector solución es decodificado para obtener las oraciones originales del documento, generando el resumen final, el cual es truncado a cien palabras para realizar la evaluación de calidad y compararlos con otros métodos del estado del arte.
Figura 4-5 Esquema de Generación de Resúmenes
A continuación se presenta un ejemplo con la noticia original y el resumen generado por el algoritmo SLFA-MT-SingleDocSum.
Documento de la noticia:
“US INSURERS expect to pay out an estimated Dollars 7.3bn (Pounds 3.7bn) in Florida as a result of Hurricane Andrew - by far the costliest disaster the industry has ever faced. The figure is the first official tally of the damage resulting from the hurricane, which ripped through southern Florida last week. In the battered region it is estimated that 275,000 people still have no electricity and at least 150,000 are either homeless or are living amid ruins. President George Bush yesterday made his second visit to the region since the hurricane hit. He pledged the government would see through the clean-up 'until the job is done'. Although there had already been some preliminary guesses at the level of insurance claims, yesterday's figure comes from the Property Claims Services division of the American Insurance Services Group, the property-casualty insurers' trade association. It follows an extensive survey of the area by the big insurance companies. Mr Gary Kerney, director of catastrophe services at the PCS, said the industry was expecting about 685,000 claims in Florida alone. It is reckoned the bulk of the damage - over Dollars 6bn in insured claims - is in Dade County, a rural region to the south of Miami. However, the final cost of Hurricane Andrew will be higher still. Yesterday's estimate does not include any projection for claims in Louisiana, which was also affected by the storm, although less severely than Florida. An estimate of the insured losses in this second state will be released later this week. But on the Florida losses alone, Hurricane Andrew becomes the most costly insured catastrophe in the US. Hurricane Hugo, which hit the east coast in September 1989, cost the insurance industry about Dollars 4.2bn. The Oakland fire disaster, in California last year, cost Dollars 1.2bn. By contrast, insurance claims resulting from the Los Angeles riots earlier this year - the most expensive civil disturbance in the US - totalled just Dollars 775m. Hurricane Andrew leaves the US property- casualty insurers facing their worst-ever year for catastrophe losses. The LA riots and a series of tornadoes, wind and hailstorms in states such as Kansas, Oklahoma and Iowa had already produced insured losses of Dollars 3.9bn. With Florida's Hurricane Andrew losses added in, the total rises to Dollars 11.2bn. This easily exceeds the record Dollars 7.6bn of
80
catastrophe losses seen in 1989, when the industry paid out on both Hurricane Hugo and the Loma Prieta earthquake in California. Wall Street, however, has reacted calmly to the record losses expected, and insurers' shares - although lower initially - have been firming recently. The property-casualty industry is thought to have adequate reserves to cover the disaster.”
Resumen generado:
“US INSURERS expect to pay out an estimated Dollars 7.3bn (Pounds 3.7bn) in Florida as a result of Hurricane Andrew - by far the costliest disaster the industry has ever faced. Although there had already been some preliminary guesses at the level of insurance claims, yesterday's figure comes from the Property Claims Services division of the American Insurance Services Group, the property-casualty insurers' trade association. Mr Gary Kerney, director of catastrophe services at the PCS, said the industry was expecting about 685,000 claims in Florida alone. Hurricane Hugo, which hit the east coast in September 1989, cost the insurance industry about Dollars 4.2bn. Hurricane Andrew leaves the US property- casualty insurers facing their worst-ever year for catastrophe losses.”
81
Capítulo 5
5
EVALUACIÓN
En este capítulo se describen todas las tareas involucradas con la experimentación, incluyendo el pre-procesamiento de los documentos, la descripción de los conjuntos de datos y métricas usadas, los valores de los parámetros de los algoritmos propuestos SFLA y SLC, los resultados obtenidos en la calidad de los resúmenes generados por los algoritmo en los tres ciclos de adaptación de los algoritmos, y la comparación de estos resultados con los métodos del estado del arte.