Automatic Recognition of Text Difficulty from Consumers Health Information
Salt Lake City, Utah June 22-June 23
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CBMS.2006.58
19th IEEE Symposium on Computer-Based ...
Yunli Wang, National Research Council Canada
The Internet is one of the major sources of health information. However, some studies show that the health information presented on health web sites is too difficult for many consumers to read. Readability formulas usually measure the difficulty of writing style rather than the difficulty of content. In order to recommend health information at an appropriate reading level to consumers, we investigate the feasibility of identifying the text difficulty of health information using machine learning methods. A Support Vector Machine is used to classify consumer health information into two levels: easy to read, and reading level for the general public. Three feature sets (surface linguistic features, word difficulty features, and unigrams) and their combinations are compared in terms of classification accuracy. Unigram features alone reach an accuracy of 80.71%, and the combination of all three feature sets is the most effective, with an accuracy of 84.06%. Both are significantly better than surface linguistic features, word difficulty features, and their combination.
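As a rough illustration of the three feature sets named in the abstract, the following minimal sketch extracts surface linguistic features, a word difficulty feature, and unigram counts from a text and merges them into one feature dictionary. The abstract does not specify the actual features, so everything here is an assumption: the function names, the two surface measures, the naive sentence splitter, and especially `EASY_WORDS`, a tiny hypothetical stand-in for a real word difficulty resource.

```python
import re
from collections import Counter

# Hypothetical easy-word list; the resource used in the paper is not given
# in the abstract, so this small set is for illustration only.
EASY_WORDS = {"the", "a", "of", "your", "is", "to", "and", "in",
              "heart", "blood", "take", "doctor", "can", "you", "it"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive split on '.', '!' and '?' (an assumption, not a real segmenter)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def surface_features(text):
    """Assumed surface measures: average sentence length and word length."""
    words = tokenize(text)
    sentences = split_sentences(text)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
    }


def word_difficulty_features(text):
    """Assumed difficulty measure: share of words outside EASY_WORDS."""
    words = tokenize(text)
    hard = [w for w in words if w not in EASY_WORDS]
    return {"hard_word_ratio": len(hard) / max(len(words), 1)}


def unigram_features(text):
    """Count each word; keys are prefixed so they cannot clash with others."""
    return {"uni=" + w: c for w, c in Counter(tokenize(text)).items()}


def combined_features(text):
    """Merge the three feature sets into a single dictionary."""
    feats = {}
    feats.update(surface_features(text))
    feats.update(word_difficulty_features(text))
    feats.update(unigram_features(text))
    return feats


if __name__ == "__main__":
    print(combined_features("Take your pill. See your doctor."))
```

Dictionaries like these could then be converted to numeric vectors and passed to any Support Vector Machine implementation, with each document labeled as easy to read or as general-public reading level.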
Citation:
Yunli Wang, "Automatic Recognition of Text Difficulty from Consumers Health Information," 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06), pp. 131-136, 2006