Brian R.Y. Huang


I’m a recent graduate from MIT with Bachelor’s degrees in Math & CS as well as a Master’s in CS. I’m broadly interested in methods and evaluations that may engender more helpful, equitable, and harm-reducing outcomes for all stakeholders of frontier AI systems.

At MIT, I was fortunate to conduct research on the adversarial robustness of deep learning models, advised by Hadi Salman and Aleksander Mądry. I was also a teaching assistant for MIT’s flagship graduate-level machine learning class (6.867, now 6.7900) and for the statistical data analysis class (6.3720/6.3722). Before undergrad, I researched black hole formation in general relativity with Marcus Khuri at Stony Brook University, and I later worked on causal mediation analysis for mechanistic interpretability as a research intern at Redwood Research. I’m currently working on interesting perception problems as a research engineer at Matic. Since graduating, I’ve continued my involvement in various research collaborations, studying factuality, adversarial inputs, and other phenomena in language models.

I’m always open to nuanced discussions about safe, equitable, and rigorously guardrailed deployment of AI systems, as well as the challenges in tech policy and “science of language models” that we must address along the way. I’d also be excited to collaborate on research along the lines of, but not limited to, my research directions here. If our research interests align, feel free to reach out at branhung (at) alum (dot) mit (dot) edu!

selected works

  1. Does It Know?: Probing and Benchmarking Uncertainty in Language Model Latent Beliefs
    Brian R.Y. Huang and Joe Kwon
    ATTRIB Workshop @ NeurIPS, 2023
    We extend Contrast-Consistent Search (Burns et al., 2023) to detect uncertainty in the factual beliefs of language models. We create a toy dataset of timestamped news factoids as a true/false/uncertain classification benchmark for LLMs with a known training cutoff date. (A sketch of this style of probing appears below this list.)
  2. Adversarial Learned Soups: Neural Network Averaging for Joint Clean and Robust Performance
    Brian R.Y. Huang
    Master’s Thesis, 2023
    Supervised by Hadi Salman and Aleksander Mądry.
    We introduce weight-space interpolation methods to the adversarial robustness regime, devising a wrapper architecture that optimizes the interpolation coefficients of a "model soup" via adversarial training. Varying the intensity of adversarial training (perturbation distance, TRADES weightings, etc.) yields a smooth tradeoff between the clean and robust accuracy of the interpolated model. (A minimal sketch of the wrapper appears below.)
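
For the curious, here is a minimal sketch of the flavor of probing behind "Does It Know?", assuming PyTorch. The CCS objective follows Burns et al.; the probe class, the separate abstention head for the "uncertain" label, and all names (UncertaintyProbe, ccs_loss) are illustrative assumptions rather than the paper's actual code.

```python
import torch
import torch.nn as nn

class UncertaintyProbe(nn.Module):
    """Linear probe mapping a hidden state to P(statement is true), plus a
    second head scoring "uncertain" so the probe can abstain on factoids
    past the model's training cutoff. (Hypothetical design, for illustration.)"""
    def __init__(self, d_model: int):
        super().__init__()
        self.truth = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())
        self.abstain = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())

    def forward(self, h):
        return self.truth(h), self.abstain(h)

def ccs_loss(p_pos, p_neg):
    # Burns et al.'s CCS objective on a contrast pair:
    # consistency: P(true | "X is true") should equal 1 - P(true | "X is false");
    # confidence: penalize the degenerate solution p_pos = p_neg = 0.5.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = torch.min(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# Toy usage on stand-in hidden states for a batch of contrast pairs:
probe = UncertaintyProbe(d_model=4096)
h_pos, h_neg = torch.randn(8, 4096), torch.randn(8, 4096)
p_pos, _ = probe(h_pos)
p_neg, _ = probe(h_neg)
ccs_loss(p_pos, p_neg).backward()  # abstention head would be trained separately
```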
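
Similarly, a minimal sketch of the learned-soup idea from the thesis, assuming PyTorch's torch.func.functional_call to run an architecture under interpolated weights. The class and attack names (LearnedSoup, pgd_attack) are hypothetical, the real training loop (TRADES weightings, etc.) is omitted, and all state-dict entries are assumed to be float tensors.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

class LearnedSoup(nn.Module):
    """Wrapper whose only trainable parameters are the interpolation
    coefficients over K ingredient checkpoints of one architecture."""
    def __init__(self, model: nn.Module, state_dicts: list[dict]):
        super().__init__()
        self.model = model
        self.soups = state_dicts
        self.logits = nn.Parameter(torch.zeros(len(state_dicts)))

    def forward(self, x):
        alpha = torch.softmax(self.logits, dim=0)  # convex combination
        params = {
            name: sum(a * sd[name] for a, sd in zip(alpha, self.soups))
            for name in self.soups[0]
        }
        return functional_call(self.model, params, (x,))

def pgd_attack(soup, x, y, eps=8/255, steps=10, lr=2/255):
    # Standard L-infinity PGD; adversarial training then updates soup.logits
    # (not the ingredient weights) on these perturbed inputs.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(soup(x + delta), y)
        loss.backward()
        delta.data = (delta + lr * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    soup.zero_grad()  # discard gradients accumulated while crafting the attack
    return (x + delta).detach()
```

Because the soup exposes only K coefficients, sweeping the attack strength during training traces out the clean/robust frontier described above without retraining any ingredient model.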