Publications
2026
- Agentified Agent Assessment Improves Standardization Across Heterogeneous Scenarios. 2026. Under review.
- Agents’ Last Exam. 2026. Under review.
- FaultLoc: Evaluating AI Coding Agents for Fault Localization from Crash to Cause. 2026. Under review.
- CyberCycle: A Scalable Real-World Benchmark for AI Agents’ End-to-End Cybersecurity Capabilities. In Proceedings of the International Conference on Machine Learning, 2026.
- FICO: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2026.
2025
- MLAN: Language-Based Instruction Tuning Preserves and Transfers Knowledge in Multimodal Language Models. In Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM), Aug 2025.
- Predicting Task Performance with Context-aware Scaling Laws. In Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM), Aug 2025.
- A Comprehensive Survey of Evaluating Multimodal Foundation Models: Hierarchical Perspective and Extensive Applications. May 2025. Under review at ARR.
2024
- ⭐ Failure in a population: Tauopathy disrupts homeostatic set-points in emergent dynamics despite stability in the constituent neurons. Neuron, May 2024. Cover Paper.
- Instruction-aware Visual Feature Extraction for Multimodal Large Language Model. Dec 2024. Preprint.