Chaoxiang Xie

About

Research Interests: Multimodal Learning and Natural Language Processing.

My research interests lie in multimodal learning and natural language processing for software engineering.

Currently, I focus on multimodal coding agents and image-assisted code intelligence, especially how visual interfaces and rendered code images can support code generation, understanding, and reasoning.

I am an M.Sc. student in Library and Information Studies at Hohai University and a research assistant at the LLM for Software Engineering Lab at Shanghai Jiao Tong University.

I am particularly interested in connecting visual interface understanding, code representation, and end-to-end software implementation.

News

  • 2026-04-21 ClassEval-Pro was accepted to AIware 2026.
  • 2026-04-17 CodeOCR was accepted to ISSTA 2026.
  • 2024 Joined the LLM for Software Engineering Lab at Shanghai Jiao Tong University as a research assistant.
  • 2024 Started the M.Sc. program in Library and Information Studies at Hohai University.
  • 2024 Won the Excellent Work Award at the Intel Mini Hackathon for a fine-tuned LLM project.

Research Interests

Current research interests

My work sits at the intersection of multimodal machine learning, natural language processing, and software engineering.

Multimodal Coding Agent

Investigating agentic frameworks that generate runnable code from UI design images and natural-language functional requirements, bridging visual interface understanding with end-to-end software implementation.

Image-assisted Code Intelligence

Studying how image-based code representations can support code generation, understanding, and reasoning, including fine-tuning multimodal models for more effective visual-assisted code intelligence.

Multimodal Code Understanding

Building and evaluating visual code representations, benchmarks, and experimental pipelines for multimodal LLMs in software engineering tasks.

Publications

Selected publications

Selected publications and ongoing work.

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

Yuling Shi, Chaoxiang Xie, Zhensu Sun, Yeheng Chen, Chenxu Zhang, Longfei Yun, Chengcheng Wan, Hongyu Zhang, David Lo, Xiaodong Gu

Proceedings of ISSTA 2026 · 2026

Studies code-as-image representations for multimodal code understanding and shows that rendering code as images can reduce token usage (up to 8x compression) while remaining competitive on downstream tasks.

Conference

ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

Yeheng Chen*, Chaoxiang Xie*, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu

* Equal contribution / co-first authors (Yeheng Chen and Chaoxiang Xie)

Proceedings of AIware 2026, Benchmark & Dataset Track · 2026

Introduces ClassEval-Pro, a benchmark of 300 class-level code generation tasks across 11 domains, built through an automated three-stage pipeline with complexity enhancement, cross-domain class composition, and real-world GitHub code integration. Each task is validated by an LLM Judge Ensemble and test suites with over 90% line coverage. Experiments on five frontier LLMs under five generation strategies show that the best model reaches only 45.6% class-level Pass@1, while error analysis highlights logic and dependency errors as the main bottlenecks.

Conference

Multi-Detector Credibility Fusion Network: A Neural Architecture for Robust Multimodal Review Credibility Assessment

Chaoxiang Xie, Ming Li

International Journal of Intelligent Systems · 2026

Presents MDCFN, a multimodal architecture for robust review credibility assessment across textual, visual, and relational signals.

Under Review

Experience

Research and professional experience

Research-first, with industry experience that informs implementation and systems thinking.

Research

Oct. 2024 - Present Shanghai, China

Research Assistant

LLM for Software Engineering Lab (LLMSE), Shanghai Jiao Tong University

Advisor: Prof. Xiaodong Gu

  • Spearheaded engineering for the CodeOCR experimental pipeline, a systematic study of multimodal LLMs for code understanding.
  • Implemented visual code representation methods that transform source code into rendered images, achieving up to 8x token compression while preserving semantic and structural information.
  • Contributed to ClassEval-Pro, a cross-domain benchmark for class-level code generation, including benchmark construction, evaluation design, and analysis across multiple generation strategies.
  • Conducted evaluations on code completion, clone detection, and class-level code generation tasks, analyzing how model capability, visual cues, and generation strategies affect code understanding and synthesis performance.
  • Investigating multimodal coding agents that generate runnable code from UI design images and natural-language functional requirements.
  • Studying image-assisted code intelligence, including fine-tuning multimodal models to support visual-assisted code generation, understanding, and reasoning.

Jun. 2023 - Present Nanjing, China

Independent Researcher

Institute of Management Science, Hohai University

  • Proposed the Multi-Detector Credibility Fusion Network for detecting sophisticated fake reviews, including human-written and AI-generated content.
  • Designed a hierarchical fusion mechanism combining textual-temporal, visual, and relational graph branches for complementary credibility cues.
  • Built a large annotated multimodal dataset with 33k+ reviews and 50k+ images, including LLM-generated content, and reached 98.77% accuracy, exceeding prior baselines.
  • Conducted VLM-based construction-site inspection research for vehicle-washing compliance, using vision-language recognition and chain-of-thought reasoning to localize inspection regions, judge compliance, and structure records for agent-based querying.

Education & Industry

Sep. 2024 - Present Nanjing, China

M.Sc. in Library and Information Studies

Hohai University

  • Focus: data mining. GPA: 87/100.
  • Relevant coursework: Business Intelligence Analysis and Mining, Advanced Information Retrieval, Machine Learning Applications.

Sep. 2018 - Jun. 2022 Nanjing, China

B.Sc. in Information Management and Information System

Hohai University

  • Graduated with GPA 85/100.
  • Relevant coursework: Statistics, Database Principles, Data Structures, Information Security, Calculus, Linear Algebra.

Jul. 2022 - Jun. 2024 Shenzhen, China

Software Engineer (Python Backend)

Inspur Morning Cloud Technologies Co., Ltd.

  • Developed and maintained backend modules for an enterprise Human Capital Management system, including data import/export, work alerts, data push, and single sign-on services.
  • Designed multi-model data import/export workflows for concurrent data processing and cross-model data association in enterprise business scenarios.
  • Implemented a tree-traversal export module for complex tables with multi-level headers and structured data relationships.
  • Built a configurable work-alert module with rule-based monitoring, notification targets, message templates, channels, and scheduled delivery.
  • Delivered additional platform features including message withdrawal, data-push conflict prevention, and permission verification during data import; received the R&D Rising Star Award and contributed to one pending patent.

Projects

Selected projects

A few implementation-heavy projects that reflect both experimentation and execution.

Jun. 2024 · Python, PyTorch, LoRA, MLX

Simulating Conversations with Fine-Tuned LLMs

  • Fine-tuned Qwen-7B-Chat with LoRA on a custom Rednote dataset to mimic persona-based conversational styles.
  • Deployed the model for real-time interaction and won the Excellent Work Award at the Intel Mini Hackathon.

Mar. 2022 - Apr. 2022 · Python, Scikit-learn

E-sports Player Style Clustering Analysis

  • Built a pipeline to crawl more than 100 matches of top-ranked players and extract 11 key performance indicators.
  • Applied K-means clustering to categorize playstyles and generate strategic insights for team composition.