Companies in the software development business accumulate an enormous amount of data through their test and support teams. This data may take the form of defect records and their resolutions, or of customer queries and the responses to them. It can be classified as problem data, and it contains information that can be used for proactive problem diagnosis and resolution. In practice, however, it is rarely reused, because it is written in "human" (communication) language and lacks the structure that IT management tools need to use it effectively.
IBM Log and Trace Analyzer for Autonomic Computing (LTA) [1, 7] is one tool that helps IT administrators and support personnel diagnose problems more easily through its Symptom Database. A symptom database is a collection of XML files that adhere to a schema and contain records of incident and problem indications that could arise in the operation of the software or hardware infrastructure. For every symptom, the Symptom Database also records the cause of the problem and a recommended solution. Typically, the Symptom Database is created from scratch by the development teams; however, we have found that the unstructured data collected by the support and test teams can also be processed with natural language processing and unstructured information processing algorithms to build symptom databases.
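
To make the idea of a symptom record concrete, the following minimal Python sketch models the three pieces of information described above (a symptom, its cause, and a recommended solution) and looks up a log message against a small in-memory collection. It is an illustration only: the `Symptom` class, its field names, the `diagnose` function, and the sample entries are hypothetical and do not reproduce the actual LTA XML schema or its matching logic.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Symptom:
    # Hypothetical fields; the real LTA symptom schema is not reproduced here.
    pattern: str   # regular expression matched against a log message
    cause: str     # why the problem occurs
    solution: str  # recommended action


def diagnose(message: str, database: List[Symptom]) -> Optional[Symptom]:
    """Return the first symptom whose pattern occurs in the message, else None."""
    for symptom in database:
        if re.search(symptom.pattern, message):
            return symptom
    return None


# Invented sample entries, for illustration only.
database = [
    Symptom(
        pattern=r"connection refused",
        cause="The target service is not listening on the expected port.",
        solution="Start the service or correct the configured port.",
    ),
    Symptom(
        pattern=r"out of memory",
        cause="The process exceeded its available heap.",
        solution="Increase the memory limit or reduce the workload.",
    ),
]

match = diagnose("ERROR: connection refused by host db01", database)
if match is not None:
    print(match.cause)
    print(match.solution)
```

In a sketch like this, the records built from unstructured support and test data would populate `database`, while the lookup itself stays a simple pattern match.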