Multimodal Coding Agent
Investigating agentic frameworks that generate runnable code from UI design images and natural-language functional requirements, bridging visual interface understanding with end-to-end software implementation.
About
My research interests lie in multimodal learning and natural language processing for software engineering.
Currently, I focus on multimodal coding agents and image-assisted code intelligence, especially how visual interfaces and rendered code images can support code generation, understanding, and reasoning.
I am an M.Sc. student in Library and Information Studies at Hohai University, and I also work with the LLM for Software Engineering Lab at Shanghai Jiao Tong University.
I am particularly interested in connecting visual interface understanding, code representation, and end-to-end software implementation.
My work sits at the intersection of multimodal machine learning, natural language processing, and software engineering.
Studying how image-based code representations can support code generation, understanding, and reasoning, including fine-tuning multimodal models for more effective visual-assisted code intelligence.
Building and evaluating visual code representations, benchmarks, and experimental pipelines for multimodal LLMs in software engineering tasks.
Selected publications and ongoing work.
Proceedings of ISSTA 2026 · 2026
Studies code-as-image representations for multimodal code understanding and shows how visual encoding can improve efficiency while remaining competitive on downstream tasks.
Proceedings of AIware 2026, Benchmark & Dataset Track · 2026
Introduces ClassEval-Pro, a benchmark of 300 class-level code generation tasks across 11 domains, built through an automated three-stage pipeline with complexity enhancement, cross-domain class composition, and real-world GitHub code integration. Each task is validated by an LLM Judge Ensemble and test suites with over 90% line coverage. Experiments on five frontier LLMs under five generation strategies show that the best model reaches only 45.6% class-level Pass@1, while error analysis highlights logic and dependency errors as the main bottlenecks.
International Journal of Intelligent Systems · 2026
Presents MDCFN, a multimodal architecture for robust review credibility assessment across textual, visual, and relational signals.
Research-first, with industry experience that informs implementation and systems thinking.
LLM for Software Engineering Lab (LLMSE), Shanghai Jiao Tong University
Advisor: Prof. Xiaodong Gu
Institute of Management Science, Hohai University
Hohai University
Inspur Morning Cloud Technologies Co., Ltd.
A few implementation-heavy projects that reflect both experimentation and execution.