My research develops vertically integrated inference systems for complex scientific data. By vertical integration, I mean designing inference architectures that maintain explicit control over the full generative chain: from physical source models and instrument response, through likelihood construction and noise modeling, to hierarchical population inference and scientific interpretation. When the full data-generating process is made explicit, operations can be performed at the level where they are well-defined, uncertainty propagates coherently, and historical approximations can be replaced with modern scalable algorithms.
Most scientific analysis pipelines evolve organically and become locally optimized: each component works in isolation, but the global structure is often fragmented. Derived quantities are treated as primary observables, and approximations compound across steps. My work proceeds in the opposite direction: first map out the full statistical data-generating process — the generative model — and then build inference as its principled inverse.
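To make the inversion concrete: in one schematic form (the notation here is illustrative, not tied to any specific pipeline), population-level hyperparameters $\Lambda$ generate per-source parameters $\theta_i$, which in turn generate the observed datasets $d_i$ through the instrument model, so the joint posterior factorizes as

$$
p(\Lambda, \{\theta_i\} \mid \{d_i\}) \;\propto\; p(\Lambda)\, \prod_i p(\theta_i \mid \Lambda)\, p(d_i \mid \theta_i).
$$

Inference inverts this chain as a whole: uncertainty in each dataset propagates through the source parameters up to the population level, rather than being collapsed into point estimates at intermediate steps.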
This perspective was shaped by six years as a senior data scientist at Microsoft, where vertical integration was not philosophy but operational practice. Each engagement began by understanding the full data-generating process — from telemetry and logging pipelines to production deployment. At petabyte scale, small statistical inconsistencies amplify, and approximate reasoning that "works" in a prototype can silently fail in production. When I returned to academia, I brought this architectural discipline back to astrophysics.
The clearest demonstration of this approach is MetaPulsar, a likelihood-level data combination framework for pulsar timing arrays. Rather than reconciling timing residuals — derived quantities — across collaborations, MetaPulsar operates at the likelihood level, preserving dataset-specific structure while maintaining global statistical consistency. This replaced a years-long manual process with a well-defined generative procedure. The framework was used by Yu & Allen (2026) in the first five-PTA stochastic background search based on public datasets.
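Schematically, likelihood-level combination means that each dataset keeps its own likelihood and noise parameters while sharing the gravitational-wave parameters (the symbols below are my shorthand, not MetaPulsar's actual interface):

$$
\mathcal{L}_{\mathrm{tot}}(\theta_{\mathrm{GW}}, \eta_1, \ldots, \eta_K) \;=\; \prod_{k=1}^{K} \mathcal{L}_k(\theta_{\mathrm{GW}}, \eta_k),
$$

where $\theta_{\mathrm{GW}}$ are the shared signal parameters and $\eta_k$ the noise parameters of dataset $k$. Each factor retains its collaboration's full noise model, the shared parameters are constrained jointly, and no residual-level reconciliation between datasets is needed.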
Pulsar timing arrays use networks of millisecond pulsars as a galaxy-scale gravitational-wave detector, sensitive to nanohertz-frequency signals from supermassive black hole binaries and other cosmological sources. The data analysis challenges are substantial: complex correlated noise, high-dimensional parameter spaces, weak-signal regimes, and datasets produced by multiple telescopes and collaborations worldwide.
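For context, the workhorse of PTA analysis is a multivariate Gaussian likelihood over the vector of timing residuals $\delta t$,

$$
p(\delta t \mid \theta) \;=\; \frac{\exp\!\left(-\tfrac{1}{2}\,\delta t^{\mathsf{T}} C(\theta)^{-1}\, \delta t\right)}{\sqrt{\det\!\left(2\pi\, C(\theta)\right)}},
$$

where the covariance $C(\theta)$ encodes white noise, per-pulsar red noise, and the inter-pulsar correlations (the Hellings–Downs pattern) induced by a gravitational-wave background. The size and dense correlation structure of $C$ across many pulsars and decades of observations are what make the computation demanding.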
My work in this area spans the foundational methods that underpin modern PTA science.
I am a member of the NANOGrav collaboration and the International Pulsar Timing Array, where I contribute to data analysis methodology and have played a role in key results, including the 2023 evidence for a gravitational-wave background.
A recurring theme in my work is the design of inference algorithms that scale to the complexity of real scientific problems.
Good inference requires more than good theory: the full chain from model design through implementation must be coherent. I am interested in numerical methods, efficient software, and computational strategies that allow sophisticated models to be applied to real datasets at scale.
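As a minimal sketch of what this coherence means at the numerical level (plain NumPy/SciPy, with illustrative names; real PTA codes exploit far more structure), here is a Gaussian log-likelihood of the form above evaluated through a Cholesky factorization, which avoids ever forming $C^{-1}$ or $\det C$ explicitly:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_loglike(residuals: np.ndarray, cov: np.ndarray) -> float:
    """Log-likelihood of zero-mean Gaussian residuals with covariance `cov`.

    Factors C = L L^T once, then uses triangular solves instead of an
    explicit inverse, and reads log det(C) off the factor's diagonal,
    avoiding the overflow a direct determinant would risk.
    """
    # Raises LinAlgError if C is not positive definite, which is
    # itself a useful model diagnostic.
    factor, lower = cho_factor(cov)

    # Quadratic form r^T C^{-1} r via triangular solves.
    quad = residuals @ cho_solve((factor, lower), residuals)

    # log det C from the Cholesky diagonal.
    logdet = 2.0 * np.sum(np.log(np.diag(factor)))

    n = residuals.size
    return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))
```

In production pipelines the covariance typically has low-rank-plus-diagonal structure that is exploited with Woodbury-type identities; the dense version here is only the conceptual baseline.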
Between 2016 and 2021, I worked at Microsoft on machine-learning problems including language models, text analysis, classification, and reinforcement learning. That period deepened my understanding of scalable modeling, production-quality software, and the operational discipline needed to make complex methods work reliably.
I am committed to bringing the best ideas from statistics, machine learning, and scientific computing into domain science in a way that is technically serious and scientifically meaningful. This includes both specific methodological contributions — such as embedding modern ML within principled generative frameworks — and broader efforts in mentoring, collaboration across fields, and developing shared methodological infrastructure.
Machine learning offers transformative opportunities for astronomy, but predictive performance alone is not sufficient in the exact sciences. Models must connect back to the underlying physics and provide controlled uncertainty quantification. My aim is to ensure that AI and ML methods are deployed as structured statistical tools within coherent generative models, rather than as black-box substitutes for reasoning.
For a full list of publications, see Publications, Google Scholar, or NASA/ADS.