Talk title: Big Data Analysis and Mining in Microbiome
Speaker: Prof. Xiaohua Tony Hu, Drexel University
We know little about the microbial world. Microbiome sequencing (i.e., metagenome and 16S rRNA sequencing) extracts DNA directly from a microbial environment without culturing any species. Recently, huge amounts of data have been generated by microbiome projects such as the Human Microbiome Project (HMP) and Metagenomics of the Human Intestinal Tract (MetaHIT). Analyzing these data will help us better understand the function and structure of the microbial communities of the human body, the earth, and other environmental ecosystems. However, the huge data volume, the complexity of microbial communities, and the intricate properties of the data create both opportunities and challenges for data analysis and mining. For example, it is estimated that the microbial ecosystem of the human gut contains about 1,000 kinds of bacteria, 10 billion bacteria in total, and more than 4 million genes in more than 6,000 orthologous gene families. The challenges stem from the complex properties of microbiome data: large scale, complexity, diversity, correlation, composition, hierarchy, incompleteness, and so on. Current microbiome data analysis methods seldom consider these properties and often make assumptions, such as linearity, Euclidean space, metric space, or continuous data types, that conflict with the true properties of the data. For example, some similarity measures are non-metric because of the prevalence of certain species, and the interactions among species and their environment are complex and of high order. It is therefore urgent to develop novel computational methods that overcome these assumptions and account for microbiome data properties in the analysis procedure. In this talk, we will discuss some computational methods for analyzing and visualizing microbiome big data.
Our studies focus on 1) novel machine learning and computational techniques for dimension reduction and visualization of microbiome data based on non-Euclidean spaces (manifold learning), which discover nonlinear intrinsic features and patterns in these data and so overcome linear assumptions; and 2) novel statistical methods for variable selection in microbiome data that integrate group information among variables.
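The remark in the abstract that some similarity measures are non-metric can be illustrated with a small sketch. It assumes the Bray-Curtis dissimilarity, a measure commonly used for abundance data but not named in the abstract, and uses made-up counts for two taxa in three hypothetical samples. The function and variable names are illustrative only and are not taken from the speaker's work.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("at least one sample must have nonzero abundance")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical counts for two taxa in three samples (made up for illustration).
sample_a = [10, 0]    # only taxon 1 present
sample_b = [0, 10]    # only taxon 2 present
sample_c = [10, 10]   # both taxa present

# A metric must satisfy d(a, b) <= d(a, c) + d(c, b) for every triple.
direct = bray_curtis(sample_a, sample_b)
detour = bray_curtis(sample_a, sample_c) + bray_curtis(sample_c, sample_b)
violates_triangle_inequality = direct > detour

print(direct, detour, violates_triangle_inequality)
```

In this toy case the direct dissimilarity between samples a and b exceeds the sum of the two dissimilarities through sample c, so the triangle inequality fails. Methods that assume a metric or Euclidean space may therefore distort such data.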
Biography: Xiaohua Tony Hu is a full professor and the founding director of the data mining and bioinformatics lab at the College of Computing and Informatics. He also serves as the founding Co-Director of the NSF Center (I/U CRC) on Visual and Decision Informatics (NSF CVDI), Chair of the IEEE Computer Society Bioinformatics and Biomedicine Steering Committee, and Chair of the IEEE Computer Society Big Data Steering Committee. He joined Drexel University in 2002 and founded the International Journal of Data Mining and Bioinformatics (SCI indexed) in 2006. Earlier, he worked as a research scientist at world-leading R&D centers such as Nortel Research Center and Verizon Lab (the former GTE Labs). In 2001, he founded DMW Software in Silicon Valley, California. He has extensive experience and expertise in turning original ideas into research prototypes and eventually into commercial products. Many of his research ideas have been integrated into commercial products and applications in data mining, fraud detection, and database marketing.
Tony’s current research interests are in data/text/web mining, big data, bioinformatics, information retrieval and information extraction, social network analysis, healthcare informatics, and rough set theory and its applications. He has published more than 270 peer-reviewed research papers in various journals, conferences, and books. He has obtained more than US$8.5 million in research grants in the past 10 years as PI or Co-PI (including as PI of 9 NSF grants). He graduated 19 Ph.D. students between 2006 and 2017 and is currently supervising 9 Ph.D. students.