Using Unlabeled Data to Improve Inductive Models by Incorporating Transductive Models

ShengJun Cheng; Jiafeng Liu; XiangLong Tang

doi:10.14569/IJARAI.2014.030207

International Journal of Advanced Research in Artificial Intelligence (IJARAI), Volume 3 Issue 2, 2014.

Abstract: This paper shows how to use labeled and unlabeled data to improve inductive models with the help of transductive models. We propose a solution for the self-training scenario. Self-training is an effective semi-supervised wrapper method that can generalize any type of supervised inductive model to the semi-supervised setting. It iteratively refines an inductive model by bootstrapping from unlabeled data. Standard self-training uses the classifier model (trained on labeled examples) to label and select candidates from the unlabeled training set. This can be problematic: because labeled training data is typically scarce, the initial classifier may not be able to provide highly confident predictions. As a result, standard self-training can introduce too many wrongly labeled candidates into the labeled training set, which may severely degrade performance. To tackle this problem, we propose a novel self-training style algorithm that incorporates a graph-based transductive model into the self-labeling process. Unlike standard self-training, our algorithm uses labeled and unlabeled data as a whole to label and select unlabeled examples for training set augmentation. We propose a robust transductive model based on a Markov random walk over a graph, which exploits the manifold assumption to output reliable predictions on unlabeled data even when the labeled examples are noisy. The proposed algorithm can greatly reduce the risk of performance degradation due to accumulated noise in the training set. Experiments show that the proposed algorithm can effectively use unlabeled data to improve classification performance.

Keywords: Inductive model, Transductive model, Semi-supervised learning, Markov random walk
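
The abstract does not give the graph construction, the random-walk update, or the candidate-selection rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a fully connected graph with Gaussian affinities, a generic random-walk label propagation with soft anchoring of the labeled points, and a toy nearest-centroid classifier standing in for the inductive model. All function names and parameters (`sigma`, `alpha`, `steps`, `per_round`) are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def rbf_transition(points, sigma=1.0):
    """Row-normalized Gaussian affinities, read as one-step random-walk probabilities."""
    P = []
    for i, p in enumerate(points):
        row = [0.0 if i == j else math.exp(-sq_dist(p, q) / (2 * sigma ** 2))
               for j, q in enumerate(points)]
        total = sum(row) or 1.0  # guard against an isolated point
        P.append([w / total for w in row])
    return P

def propagate(P, labels, n_classes, steps=50, alpha=0.9):
    """Assumed transductive step: labels[i] is a class index or None (unlabeled).

    Labeled points are softly anchored with weight (1 - alpha) instead of being
    clamped, so a noisy label can be outweighed by its neighborhood.
    """
    n = len(P)
    Y0 = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y0]
    for _ in range(steps):
        F = [[alpha * sum(P[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y0[i][c]
              for c in range(n_classes)] for i in range(n)]
    return F

def fit_centroids(X, y, n_classes):
    """Toy inductive model: one mean vector per class (each class needs >= 1 example)."""
    cents = []
    for c in range(n_classes):
        members = [x for x, lab in zip(X, y) if lab == c]
        cents.append([sum(col) / len(members) for col in zip(*members)])
    return cents

def predict(cents, x):
    """Assign x to the class with the nearest centroid."""
    return min(range(len(cents)), key=lambda c: sq_dist(cents[c], x))

def self_train(X_lab, y_lab, X_unl, n_classes, rounds=3, per_round=2):
    """Self-training loop in which the transductive scores, not the classifier,
    label and select the candidates added to the training set."""
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(rounds):
        if not X_unl:
            break
        points = X_lab + X_unl
        labels = y_lab + [None] * len(X_unl)
        F = propagate(rbf_transition(points), labels, n_classes)
        scored = []
        for k in range(len(X_unl)):
            row = F[len(X_lab) + k]
            total = sum(row)
            best = max(range(n_classes), key=lambda c: row[c])
            scored.append((row[best] / total if total > 0 else 0.0, k, best))
        scored.sort(reverse=True)          # most confident candidates first
        chosen = scored[:per_round]
        for _, k, lab in chosen:
            X_lab.append(X_unl[k])
            y_lab.append(lab)
        picked = {k for _, k, _ in chosen}
        X_unl = [x for k, x in enumerate(X_unl) if k not in picked]
    # Retrain the inductive model on the augmented labeled set.
    return fit_centroids(X_lab, y_lab, n_classes), X_lab, y_lab

# Two well-separated toy clusters with one labeled point each.
X_lab = [(0.0, 0.0), (5.0, 5.0)]
y_lab = [0, 1]
X_unl = [(0.2, 0.1), (0.1, 0.3), (4.8, 5.1), (5.2, 4.9)]
model, X_aug, y_aug = self_train(X_lab, y_lab, X_unl, n_classes=2)
```

The design point this sketch tries to mirror is the one stated in the abstract: candidate labels come from a model that sees labeled and unlabeled points together, while the inductive model is only retrained on the augmented set and remains usable on unseen points through `predict`.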

ShengJun Cheng, Jiafeng Liu and XiangLong Tang, “Using Unlabeled Data to Improve Inductive Models by Incorporating Transductive Models,” International Journal of Advanced Research in Artificial Intelligence (IJARAI), 3(2), 2014. http://dx.doi.org/10.14569/IJARAI.2014.030207

@article{Cheng2014,
title = {Using Unlabeled Data to Improve Inductive Models by Incorporating Transductive Models},
journal = {International Journal of Advanced Research in Artificial Intelligence},
doi = {10.14569/IJARAI.2014.030207},
url = {http://dx.doi.org/10.14569/IJARAI.2014.030207},
year = {2014},
publisher = {The Science and Information Organization},
volume = {3},
number = {2},
author = {ShengJun Cheng and Jiafeng Liu and XiangLong Tang}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.