Principal Investigator

Sumanta Basu
I am broadly interested in developing statistical machine learning methods for structure learning and prediction of complex, high-dimensional systems arising in biological and social sciences. My current research focuses on:
Methods: Network modeling of high-dimensional time series and detecting high-order interactions using randomized tree ensembles.
Interdisciplinary Applications: Collaborations in prostate cancer progression, large-scale metabolomics, and systemic risk monitoring in financial markets.
Postdocs
Younghoon Kim
Younghoon is a postdoctoral associate jointly affiliated with the Department of Statistics and Data Science at Cornell University and the Department of Population Health Sciences at Weill Cornell Medicine. Younghoon received his doctoral degree in Statistics and Operations Research from the University of North Carolina at Chapel Hill. His ongoing research as a postdoc includes modeling and inference for high-dimensional graphical time series, numerical optimization for robust statistics, and machine learning applications to mental health.
Keywords: high-dimensional, time series, network, optimization, machine learning, brain connectome, mental health
PhD Students

Steve Broll
Steve is a fifth-year PhD candidate in Statistics interested broadly in high-dimensional and tensor modeling with applications to biomedical sciences. Currently, he is working on penalized models for longitudinal -omics variables in the large p, small n and t setting, paired with a clinical outcome of interest, with applications to tuberculosis metabolomics and precision nutrition.
Keywords: high-dimensional, lasso, group lasso, network, longitudinal, omics, tensors

Navonil Deb
Navonil is a fourth-year PhD student in the Department of Statistics and Data Science at Cornell University. Prior to his doctoral studies, Navonil completed his Bachelor’s and Master’s degrees in Statistics at the Indian Statistical Institute, Kolkata. He is interested in developing statistical methods to understand complex dependence structure in data arising in biology, neuroscience, finance and other disciplines. His work draws on statistical machine learning approaches, including time series methods, optimization techniques, causal inference, graphical models, high-dimensional methods and spectral domain time series. Currently, Navonil is actively working in two areas: (1) developing fast and implementable coordinate-descent-based algorithms for statistical optimization problems in the Fourier domain, and (2) counterfactual estimation and forecasting techniques for longitudinal data with dynamic latent structures.
Keywords: machine learning, statistical optimization, time series, causal inference, graphical models, high dimensional

Hao Xue
Hao is a fourth-year PhD student in Computational Biology. His research mainly focuses on developing statistical and machine learning methods for analyzing biomedical informatics data (e.g., genetics, genomics, Electronic Health Records). Currently, Hao is working on environmental selection of flies under a high-sugar diet. Previously, he developed methods for learning temporal embeddings from Electronic Health Records and selecting multi-omics time-series data with similar patterns.
Keywords: multi-omics, Electronic Health Records, biostatistics, time series

Sanghee Kim

Ha Nguyen
Ha is a third-year PhD student in Statistics and Data Science. Her research lies broadly in high-dimensional statistics, graphical models and matrix modeling. Currently, she is developing (1) an automatic approach to variable-specific tuning for high-dimensional Gaussian graphical models, with applications to fMRI-based functional connectivity, and (2) spline-based shape representations from 2D coordinate data to model lower-torso curvature in women for computational apparel design.
Keywords: high-dimensional, graphical lasso, Gaussian graphical models, fMRI
Livia Popa
Livia is a third-year PhD student in Statistics, interested in Bayesian time series and in estimating dynamical systems using time series and machine learning methods. Currently, she is developing a hybrid approach to estimating noisy dynamical systems that combines classical time series methods with several types of recurrent neural networks.
Keywords: Bayesian, high dimensional, time series, machine learning
Minjie Jia
Alumni
Sara Venkatraman
Sara is a postdoctoral research associate in Statistics at the Center for Global Health at Weill Cornell Medicine in New York City. She received her PhD from the Department of Statistics and Data Science at Cornell in 2024. Sara’s research is broadly in statistical methods for studying nonlinear dynamics, with a focus on estimating systems of differential equations from time series data, modeling biological rhythms, and spatiotemporal analysis of infectious diseases.
Keywords: dynamical systems, differential equations, spatiotemporal modeling, epidemiology, time series

