Title: Fu, J.H., and S.L. Lee (2012) “A Multi-class SVM Classification System Based On Learning Methods from Indistinguishable Chinese Official Documents”, Expert Systems with Applications, Vol.39, pp.3127–3134.
Publication date: 2014/04/16
Abstract: Support Vector Machines (SVM) has been developed for Chinese official document classification in One-against-All (OAA) multi-class scheme. Several data retrieving techniques including sentence segmentation, term weighting, and feature extraction are used in preprocess. We observe that most documents of which contents are indistinguishable make poor classification results. The traditional solution is to add misclassified documents to the training set in order to adjust classification rules. In this paper, indistinguishable documents are observed to be informative for strengthening prediction performance since their labels are predicted by the current model in low confidence. A general approach is proposed to utilize decision values in SVM to identify indistinguishable documents. Based on verified classification results and distinguishability of documents, four learning strategies that select certain documents to training sets are proposed to improve classification performance. Experiments report that indistinguishable documents are able to be identified in a high probability and are informative for learning strategies. Furthermore, LMID that adds both of misclassified documents and indistinguishable documents to training sets is the most effective learning strategy in SVM classification for large set of Chinese official documents in terms of computing efficiency and classification accuracy.
ID:
Provided by:
Related link: http://www.sciencedirect.com/science/article/pii/S0957417411013078
Download file:
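
Note: the selection idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact rule for identifying indistinguishable documents, so everything specific here is an assumption made for illustration: a document is treated as indistinguishable when the gap between its two highest One-against-All decision values falls below a hypothetical `threshold`, and the function names, category names, and decision values are invented. The last step follows the LMID description, which adds both misclassified and indistinguishable documents to the training set.

```python
from typing import Dict, List, Sequence, Tuple

# One verified document: (document id, verified label, OAA decision value per class).
Doc = Tuple[str, str, Dict[str, float]]


def predict_with_margin(decision_values: Dict[str, float]) -> Tuple[str, float]:
    """Return the OAA prediction and the gap between the two highest decision values."""
    ranked = sorted(decision_values.items(), key=lambda kv: kv[1], reverse=True)
    (best_label, best_value), (_, runner_up_value) = ranked[0], ranked[1]
    return best_label, best_value - runner_up_value


def select_for_retraining(docs: Sequence[Doc], threshold: float = 0.5) -> List[str]:
    """Pick documents that are misclassified or indistinguishable (LMID-style selection).

    Assumption: "indistinguishable" means the top-two margin is below `threshold`.
    The paper may define it differently.
    """
    selected = []
    for doc_id, verified_label, decision_values in docs:
        predicted, margin = predict_with_margin(decision_values)
        misclassified = predicted != verified_label
        indistinguishable = margin < threshold
        if misclassified or indistinguishable:
            selected.append(doc_id)
    return selected


# Hypothetical decision values for three made-up categories.
DOCS: List[Doc] = [
    ("d1", "personnel", {"personnel": 1.8, "finance": -1.2, "legal": -0.9}),
    ("d2", "finance", {"personnel": 0.2, "finance": 0.1, "legal": -1.0}),
    ("d3", "legal", {"personnel": -0.4, "finance": -0.6, "legal": -0.3}),
    ("d4", "finance", {"personnel": 1.5, "finance": -0.5, "legal": -1.0}),
]

if __name__ == "__main__":
    print(select_for_retraining(DOCS))
```

In this sketch, `d1` is predicted correctly with a wide margin and is left out, `d3` is predicted correctly but with a narrow margin, and `d2` and `d4` are misclassified, so the last three would be added to the training set before the classifier is retrained.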