Topic: An effective algorithm for big streaming data
Speaker: Zhide Fang, Louisiana State University
Time: Tuesday, December 26, 15:10-16:10
Place: Room 217, Guanghua Building 2
Rapid developments in IT and revolutionary monitoring and measuring technologies have made it possible to collect large-scale sequential data in many practical fields. Analyzing such data efficiently to discover hidden patterns, associations and correlations, and trends remains a challenge.
In machine learning and data mining, cost-sensitive learning and sparse online learning are two important research areas. Many algorithms have been proposed for each area separately. Generally, an algorithm that performs well in one area may perform less well in the other. Very little work has been published that combines the two.
To tackle high-dimensional, highly skewed data streams, we propose a framework of cost-sensitive sparse online learning, which greatly extends the influential Truncated Gradient (TG) method. By formulating a new convex optimization problem, the framework aims to balance misclassification cost and sparsity, two mutually constraining factors. We will present a theoretical analysis of the bounds on the regret and on the cost, and compare them with those of existing methods. Evaluated on eight real-life streaming, high-dimensional, severely skewed datasets, the proposed methods outperform traditional ones.
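For orientation, the basic Truncated Gradient update that the talk builds on can be sketched as below. This is only an illustration, not the speaker's framework: the hinge loss, the per-class cost weights `cost_pos` and `cost_neg`, and all parameter names and default values are assumptions made for this sketch.

```python
import numpy as np

def truncate(w, alpha, theta):
    """Shrink every coordinate with |w_j| <= theta toward zero by alpha."""
    out = w.copy()
    small = np.abs(w) <= theta
    out[small] = np.sign(w[small]) * np.maximum(np.abs(w[small]) - alpha, 0.0)
    return out

def cost_sensitive_tg(stream, dim, eta=0.1, gravity=0.01, theta=np.inf,
                      cost_pos=5.0, cost_neg=1.0):
    """One pass over (x, y) pairs with y in {-1, +1}.

    Assumption: the class imbalance is handled by weighting the hinge-loss
    subgradient with a per-class cost; the talk's actual objective may differ.
    """
    w = np.zeros(dim)
    for x, y in stream:
        cost = cost_pos if y > 0 else cost_neg
        if y * w.dot(x) < 1.0:                 # margin violated: subgradient step
            w = w + eta * cost * y * x
        w = truncate(w, eta * gravity, theta)  # sparsity-inducing truncation
    return w
```

A larger `gravity` drives more weights to exactly zero, while a larger `cost_pos` makes errors on the rare positive class more expensive; the two settings pull against each other, which is the trade-off the abstract describes.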
Zhide Fang, PhD, is Professor and Director of Biostatistics in the School of Public Health, and a Professor in the Department of Genetics, School of Medicine, Louisiana State University Health Sciences Center at New Orleans. He is also a Statistician at the Louisiana Clinical & Translational Science Center, funded by NIH.
Dr. Fang’s research interests encompass statistical theory and applications in different areas. He has made contributions to the theory of design of experiments for heteroscedastic linear models, dose-response toxicity models, and wavelet regression models. His contributions to the Theory of System Reliability in Engineering include providing algorithms to evaluate system state distribution of certain consecutive-k-out-of-n systems. Dr. Fang has made contributions to Bioinformatics, developing pipelines and statistical methodologies for analysis of high-throughput genomics and metagenomics data, generated via microarrays or next-generation sequencing technologies. Developing algorithms for high-dimensional, highly skewed big streaming data is his new research interest.
Your participation is warmly welcomed!