Jie He

Ph.D. Student · University of Edinburgh

I am a fourth-year Ph.D. student at the University of Edinburgh, where I am fortunate to be advised by Prof. Jeff Z. Pan and Prof. Victor Gutierrez Basulto. My research is generously supported by the Edinburgh Doctoral College and the 2024 Apple Scholars in AI/ML Fellowship.

Before joining UoE, I had been at TJU-NLP since Sept. 2019, advised by Prof. Deyi Xiong and Prof. Qun Liu. I also interned under Prof. Pasquale Minervini in Fall 2021.

🔬 Research Interests

🔗 Augmentation

Investigating the role of knowledge in the era of LLMs, and how external knowledge sources can effectively complement parametric knowledge.

🧠 Reasoning / Agents

Enhancing LLMs' ability to leverage external knowledge and novel tools through supervised fine-tuning (SFT) and reinforcement learning.

⚡ Efficiency

Designing compression methods that reduce the length of external inputs, enabling models to improve efficiency without sacrificing performance.

🖼️ Multimodality

Studying efficient training strategies for multimodal LLMs and analyzing their reasoning deficiencies.

In addition, I am interested in LLM evaluation and benchmark construction. I also proposed CLARA, one of the earliest open-source frameworks that jointly learns retrieval and reasoning through latent vectors, which has received 900+ GitHub stars.

🎓 Education

University of Edinburgh
2022 – Present
Ph.D. in Computer Science (ILCC) · Edinburgh, UK
Advisors: Prof. Jeff Z. Pan & Prof. Victor Gutierrez Basulto · School of Informatics

Tianjin University
2019 – 2022
M.S. in Computer Science · Tianjin, China
Advisor: Prof. Deyi Xiong · TJU-NLP Lab

Shandong University
2015 – 2019
B.S. in Software Engineering · Shandong, China
Software College

💼 Work Experience

NVIDIA

Research Intern · NVIDIA Nemo Retriever Team
📌 Training a SOTA embedding model for reasoning agentic retrieval.
Oct 2025 – Feb 2026 · Remote  |  Mentors: Yauhen Babakhin, Ronay Ak

Apple

Research Intern · Apple MLR
📌 Improving the retrieval and generation efficiency of LLMs with latent vectors.
Jun 2025 – Sep 2025 · Seattle, US  |  Mentor: Dr. Yizhe Zhang

Microsoft

Research Intern · OAR & MSR
📌 Enhancing Tool Generalization in LLMs.
Jul 2024 – Oct 2024 · Redmond, US  |  Mentors: Pei Zhou, Longqi Yang, Jennifer Neville

Apple

Research Intern · Apple AI/ML
📌 Investigating rationale generation for knowledge-based QA without training.
Feb 2024 – May 2024 · Seattle, US  |  Mentors: Yiwen Sun, Benjamin Han

📝 Publications (* indicates equal contribution)

GenTool

GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation

Jie He, Jennifer Neville, Mengting Wan, Longqi Yang, Hui Liu, Xiaofeng Xu, Xia Song, Jeff Z. Pan, Pei Zhou
Findings of ACL 2025
Large Language Models (LLMs) can enhance their capabilities as AI assistants by integrating external tools, allowing them to access a wider range of information. We present GenTool, a novel training framework addressing two dimensions: Zero-to-One Generalization and Weak-to-Strong Generalization. Through extensive experiments across four generalization scenarios, our method significantly enhances tool-usage capabilities of LLMs ranging from 1B to 8B parameters, achieving performance that surpasses GPT-4o.
KG2Text

Exploring Knowledge Graph to Text Generation with Large Language Models: Techniques, Challenges, and Innovations

Jie He*, Yijun Yang*, Wanqiu Long, Victor Gutierrez Basulto, Jeff Z. Pan
NAACL 2025
We conduct a comprehensive evaluation of prompting open-source LLMs on graph-to-text generation tasks. We propose a novel diversity-difficulty-based few-shot sample selection method and introduce PlanGTG, a new graph-to-text dataset annotated with reordering and attribution sub-tasks. Extensive evaluations demonstrate significant improvements in generated text quality from both few-shot and fine-tuning perspectives.
MiCEval

MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps

Xiongtao Zhou*, Jie He*, Lanyu Chen, Jingyu Li, Haojing Chen, Victor Gutierrez Basulto, Jeff Z. Pan, Hanjie Chen
NAACL 2025
We propose MiCEval, a framework to assess the correctness of reasoning chains in Multimodal Chain of Thought (MCoT) by evaluating the quality of both the description and each reasoning step. MiCEval is built upon a fine-grained dataset with annotations rating each step according to correctness, relevance, and informativeness. Experiments on four MLLMs show step-wise evaluations align more closely with human judgments.
COLM2024

Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models

Simon Yu*, Jie He*, Pasquale Minervini, Jeff Z. Pan
COLM 2024
We study the robustness of retrieval-augmented ICL against adversarial attacks. Retrieval-augmented models enhance robustness against test sample attacks (4.87% ASR reduction) but exhibit overconfidence in demonstrations (2% ASR increase). We introduce DARD, a training-free adversarial defence method, achieving a 15% reduction in ASR over baselines.
ACL2024

An Empirical Study on Parameter-Efficient Fine-Tuning for Multimodal Large Language Models

Xiongtao Zhou*, Jie He*, Yuhua Ke, Guangyao Zhu, Victor Gutierrez Basulto, Jeff Z. Pan
Findings of ACL 2024
We study Parameter-Efficient Fine-Tuning (PEFT) methods for MLLMs, evaluating four PEFT methods on seven datasets. We show that the adapter is the best-performing PEFT method, and fine-tuning connector layers leads to improved performance in most MLLMs.
BUCA

BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering

Jie He, Simon Yu, Victor Gutierrez Basulto, Jeff Z. Pan
ACL 2023
We propose to transform the downstream multiple-choice QA task into a simpler binary classification task by ranking all candidate answers according to their reasonableness. We convert knowledge graph triples into reasonable and unreasonable texts for training. Extensive results show effectiveness on various QA benchmarks, with our approach being less data-hungry than existing methods.
EMNLP2023

Instances and Labels: Hierarchy-aware Joint Supervised Contrastive Learning for Hierarchical Multi-Label Text Classification

Simon Yu*, Jie He*, Victor Gutierrez Basulto, Jeff Z. Pan
Findings of EMNLP 2023
ACL2021

TGEA: An Error-Annotated Dataset and Benchmark Tasks for Text Generation from Pretrained Language Models

Jie He*, Bo Peng*, Yi Liao, Qun Liu, Deyi Xiong
ACL 2021
We propose TGEA, using carefully selected prompts to guide GPT-2 to generate candidate sentences, selecting 47K for error annotation. We create an error taxonomy covering 24 types of errors and propose a series of automatic diagnosis tasks including error detection, error type classification, associated span detection, and error rationale generation.
EMNLP2020

The Box is in the Pen: Evaluating Commonsense Reasoning in Neural Machine Translation

Jie He*, Tao Wang*, Deyi Xiong, Qun Liu
Findings of EMNLP 2020
We present a test suite to evaluate the commonsense reasoning capability of neural machine translation, covering lexical and syntactic ambiguity that requires commonsense knowledge to resolve. We manually create 1,200 triples and demonstrate that neural machine translation performs poorly on commonsense reasoning.

🏆 Honors & Awards

🍎
Apple Scholars in AI/ML Fellowship
2024 – 2026
🎓
PhD Scholarship, Edinburgh Doctoral College, University of Edinburgh
Oct 2022 – Sep 2025

🤝 Academic Services

Reviewer

AACL 2022, 2023; SIGDIAL 2023; EMNLP 2023; EACL 2024; NAACL 2024; ACL 2024; ARR 2023.10–now; COLM 2024; ICLR 2025; COLING 2025

Area Chair

EMNLP 2024; ARR 2024.06–now

Teaching

Tutor · Knowledge Graph · University of Edinburgh · Fall 2023

😊 About Me

I love philosophy. My favorite philosopher is Nietzsche, and I also enjoy reading Kant's Critique of Pure Reason. In the future, I hope to explore more studies combining linguistics and philosophy, such as Wittgenstein's language games.

💬 I am open to academic collaborations. Please drop me an email if you are interested in working with me!