Interpreting and Steering LLM Agents for Social Simulations
February 15, 2026 · Me
Sybil-Resilient Preference Aggregation for RLHF
December 20, 2025 · Me
Probing Circuit Robustness: How Syntactic Form Shapes Neural Circuit Activation in LLMs
December 15, 2025 · Me
Theorizing with LLMs
September 6, 2025 · Me