Hi there 👋

Thanks for dropping by! I’m Arul Murugan, currently a master’s student at UC Berkeley. This website is a work in progress; you can expect it to be ready in a week or two.

Interpreting and Steering LLM Agents for Social Simulations

We compare prompting, sparse autoencoders, and linear probes for interpreting and controlling LLM agent behavior in social science simulations. SAE-based steering outperforms prompting, offering fine-grained, predictable control over preferences and capabilities.

February 15, 2026 · Me

Sybil-Resilient Preference Aggregation for RLHF

We present the first formal framework for defending RLHF preference aggregation against sybil attacks. We prove standard Bradley-Terry is not sybil-safe, propose SQ-BT as a defense, and characterize a tight, fundamental safety-liveness tradeoff.

December 20, 2025 · Me

Probing Circuit Robustness: How Syntactic Form Shapes Neural Circuit Activation in LLMs

We investigate whether neural circuits in LLMs remain stable under semantic-preserving paraphrases. Our central finding: syntactic form, not semantic content, is the primary determinant of circuit activation.

December 15, 2025 · Me

Theorizing with LLMs

This is the first paper I ever contributed to; we did it exactly a year ago.

September 6, 2025 · Me