My Research

Current Work

Enhance RL with Noisy LLM Feedback

[EMNLP 2024]

Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models

Muhan Lin, Shuyang Shi, Yue Guo, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Simon Stepputtis, Joseph Campbell, Katia Sycara

[RLBrew Workshop, RLC 2024]

A Reward Analysis of Reinforcement Learning from Large Language Model Feedback

Muhan Lin, Shuyang Shi, Yue Guo, Behdad Chalaki, Vaishnav Tadiparthi, Simon Stepputtis, Joseph Campbell, Katia Sycara

Reward Design Reinforcement Learning LLM

Extended RL from LLM Feedback to work well with small language models which may produce incorrect rankings:

Proposed a potential-based reward specification method to scale down rewards based on LLM feedback uncertainty.
Theoretically and empirically showed that uncertain LLM feedback leads to uninformative potential-based rewards, avoiding misleading rewards and enabling efficient training even with small error-prone LLMs.

Paper / Video / Code

AI-based Credit Assignment

Speaking the Language of Teamwork: LLM-Guided Credit Assignment in Multi-Agent Reinforcement Learning

Muhan Lin, Shuyang Shi, Yue Guo, Vaishnav Tadiparthi, Behdad Chalaki, Ehsan Moradi Pari, Simon Stepputtis, Woojun Kim, Joseph Campbell, Katia Sycara

In submission

Multi-agent Reinforcement Learning Reward Design LLM

Extended the error-tolerant RLAIF to multiagent collaboration tasks for credit assignment, decomposing team rewards to individual ones with LLM feedback.

Action Advising for Efficient RL in Ad Hoc Teaming

Enhancing Multi-Agent Teaming Efficiency Through Experience-Based Action Advising

Shuyang Shi, Yue Guo, Muhan Lin, Vaishnav Tadiparthi, Behdad Chalaki, Ehsan Moradi Pari, Simon Stepputtis, Woojun Kim, Joseph Campbell, Katia Sycara

In submission

Transfer Learning Multi-agent Reinforcement Learning

Accelerated multiagent RL by enabling agents with different levels of prior competency to cooperate effectively in solving a shared task:

Using a multi-armed bandit algorithm, the most experienced agent evaluated teammates' skills, assigned suitable sub-tasks, and adapted agents to these tasks via a policy advising algorithm, improving team learning and task efficiency.

Creative Mobile Robot Wall Painting

ROS Mobile Manipulator VLM

Implemented a mobile manipulator navigation framework for large-scale wall-painting. Integrated VLM models to enable image-to-stroke creative painting.

Past Work

Addressing Unexpected Delays in Multi-agent Path Plan Execution

[AAAI 2025 (Oral)]

Speedup Techniques for Switchable Temporal Plan Graph Optimization

He Jiang, Muhan Lin, Jiaoyang Li

Multi-agent Systems Path Planning

Optimizing a Switchable Temporal Plan Graph (STPG) is a common approach to handling unexpected delays in multi-agent path plan execution. We proposed acceleration techniques for this procedure, including stronger admissible heuristics, edge grouping, prioritized branching and incremental implementation.

Achieved over 100% improvement in STPG reconstruction success rates within 0.5 seconds across maps with varying agent densities, outperforming Mixed-Integer Linear Programming and naive graph-based methods.

Paper / Code

Multi-robot Relative Pose Estimation with Distance-measuring Units and Odometry Information

[Transactions on Mechatronics (TMECH) 2023]

Asymptotically Efficient Estimators for Range‑based Robot Relative Localization

Yue Wang, Muhan Lin, Xinyi Xie, Yuan Gao, Fuqin Deng, Tin Lun Lam

Multi-agent Systems Pose Estimation

Enhanced localization accuracy in multiagent systems, where robots communicated their odometry trajectories and relative distance with each other and then estimated their initial relative poses. Reduced the estimation residuals to the Cramer-Rao Lower Bound.

Paper / Video

Active Feature Selection for Pose Estimation of Visual Odometry

[RISS Journal 2022]

Less Is More: A Robust Visual Inertial Odometry with Active Feature Selection

Muhan Lin, Shibo Zhao, Sebastian Scherer

Computer Vision SLAM

Developed a quality measurement metric to select visual feature points for the multi-spectral odometry:

Improved the accuracy of visual odometry pose estimation and efficiency (13.036% of the original residuals, 58.832% of the original CPU usage) with filtered visual feature points in dark environments.

Paper