Can we align language models using observational data instead of costly experiments like A/B tests? In our latest paper, "Aligning Language Models with Observational Data: Opportunities and Risks from a Causal Perspective," we show that the answer is yes, provided you treat causality carefully.
Large language models (LLMs) are increasingly used to generate content that drives business outcomes, such as click-through rates, engagement, and medication adherence. But these models aren't aligned with such goals out of the box. Fine-tuning on experimental data (e.g., A/B tests) helps, but it's costly and comes with significant engineering and logistical challenges. Meanwhile, most firms already sit on massive amounts of historical (observational) data. We find that this data can be used instead, but it demands careful attention to causality: hidden confounders (say, topic popularity) can influence both the content and the observed outcome, so naive fine-tuning on historical outcomes risks optimizing for spurious correlations rather than causal effects, as the toy simulation below illustrates.
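To make the risk concrete, here is a minimal sketch (my own toy simulation, not code from the paper): a hidden confounder drives both a logged content feature and the observed clicks, so a naive regression on the historical data badly overstates the feature's causal effect, while adjusting for the confounder recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: a hidden confounder (e.g., topic popularity)
# influences both a content feature logged in historical data and
# the observed click outcome.
confounder = rng.normal(size=n)                   # e.g., topic popularity
feature = 0.8 * confounder + rng.normal(size=n)   # logged content feature
true_effect = 0.1                                 # causal effect of feature on clicks
clicks = true_effect * feature + 1.0 * confounder + rng.normal(size=n)

# Naive "alignment" signal: regress clicks on the content feature alone.
naive_slope = np.polyfit(feature, clicks, 1)[0]

# Adjusting for the confounder (OLS with both regressors) recovers the effect.
X = np.column_stack([feature, confounder, np.ones(n)])
adjusted_slope = np.linalg.lstsq(X, clicks, rcond=None)[0][0]

print(f"naive estimate:    {naive_slope:.2f}")    # ~0.59, badly biased upward
print(f"adjusted estimate: {adjusted_slope:.2f}") # ~0.10, close to the true effect
```

A model fine-tuned on the naive signal would learn to produce more of the confounded feature, not more of what actually causes clicks; the paper studies when and how this bias can be corrected.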
Want to know more? Check out the project webpage at deconfoundlm.github.io or read the paper on arXiv.