People

Principal Investigator

Sumanta Basu

I am broadly interested in developing statistical machine learning methods for structure learning and prediction of complex, high-dimensional systems arising in biological and social sciences. My current research focuses on:

Methods: Network modeling of high-dimensional time series and detecting high-order interactions using randomized tree ensembles.

Interdisciplinary Applications: Collaborations in prostate cancer progression, large-scale metabolomics, and systemic risk monitoring in financial markets.

Postdocs

Younghoon Kim

Younghoon is a postdoctoral associate jointly affiliated with the Department of Statistics and Data Science at Cornell University and the Department of Population Health Sciences at Weill Cornell Medicine. Younghoon received his doctoral degree in Statistics and Operations Research from the University of North Carolina at Chapel Hill. His ongoing postdoctoral research includes modeling and inference for high-dimensional graphical time series, numerical optimization for robust statistics, and machine learning applications to mental health.
Keywords: high-dimensional, time series, network, optimization, machine learning, brain connectome, mental health

PhD Students

Steve Broll

Steve is a fifth-year PhD candidate in Statistics, broadly interested in high-dimensional and tensor modeling with applications to the biomedical sciences. He is currently working on penalized models for longitudinal -omics variables in the large p, small n and t setting, paired with a clinical outcome of interest, with applications to tuberculosis metabolomics and precision nutrition.
Keywords: high-dimensional, lasso, group lasso, network, longitudinal, omics, tensors

Navonil Deb

Navonil is a fourth-year PhD student in the Department of Statistics and Data Science at Cornell University. Before his doctoral studies, Navonil completed his Bachelor's and Master's degrees in Statistics at the Indian Statistical Institute, Kolkata. He is interested in developing statistical methods to understand complex dependence structures in data arising in biology, neuroscience, finance, and other disciplines. His work draws on statistical machine learning, time series analysis (including spectral-domain methods), optimization, causal inference, graphical models, and high-dimensional methods. Navonil is currently active in two areas: (1) developing fast, practical coordinate-descent-based algorithms for statistical optimization problems in the Fourier domain, and (2) counterfactual estimation and forecasting techniques for longitudinal data with dynamic latent structures.
Keywords: machine learning, statistical optimization, time series, causal inference, graphical models, high-dimensional

Hao Xue

Hao is a fourth-year PhD student in Computational Biology. His research focuses mainly on developing statistical and machine learning methods for analyzing biomedical informatics data (e.g., genetics, genomics, and Electronic Health Records). Hao is currently studying environmental selection in flies under a high-sugar diet. Previously, he developed methods for learning temporal embeddings from Electronic Health Records and for selecting multi-omics time series data with similar patterns.
Keywords: multi-omics, Electronic Health Records, biostatistics, time series

Sanghee Kim

Sanghee is a third-year PhD student in Statistics and Data Science. Her research interests lie in high-dimensional statistics, time series analysis, and financial machine learning. Sanghee is currently working on (1) developing efficient algorithms for penalized quantile regression and (2) exploring new methods for predicting market behavior using market microstructure measures.
Keywords: quantile regression, high-dimensional, optimization, machine learning, econometrics

Ha Nguyen

Ha is a third-year PhD student in Statistics and Data Science. Her research lies broadly in high-dimensional statistics, graphical models, and matrix modeling. She is currently working on (1) developing an automatic approach to variable-specific tuning for high-dimensional Gaussian graphical models, with applications to fMRI-based functional connectivity, and (2) building spline-based shape representations from 2D coordinate data to model women's lower-torso curvature for computational apparel design.
Keywords: high-dimensional, graphical lasso, Gaussian graphical models, fMRI

Livia Popa

Livia is a third-year PhD student in Statistics, interested in Bayesian time series analysis and in estimating dynamical systems with time series and machine learning methods. She is currently developing a hybrid approach that combines classical time series models with variants of recurrent neural networks to estimate noisy dynamical systems.
Keywords: Bayesian, high-dimensional, time series, machine learning

Minjie Jia

Alumni

Sara Venkatraman

Sara is a postdoctoral research associate in Statistics at the Center for Global Health at Weill Cornell Medicine in New York City. She received her PhD from the Department of Statistics and Data Science at Cornell in 2024. Sara's research broadly concerns statistical methods for studying nonlinear dynamics, with a focus on estimating systems of differential equations from time series data, modeling biological rhythms, and spatiotemporal analysis of infectious diseases.
Keywords: dynamical systems, differential equations, spatiotemporal modeling, epidemiology, time series