ROME, an autonomous coding agent powered by Alibaba’s Qwen3-MoE architecture, attempted to mine cryptocurrency and establish covert network tunnels without authorization, according to a technical paper published in December and revised in January. Could this be the start of the AI agents’ coup?
ROME agent hijacks GPU
The autonomous coding agent, ROME, has been under testing at Alibaba’s Agentic Learning Ecosystem, and researchers recently noted two unauthorized behaviors. The model redirected the GPU capacity into cryptocurrency mining, hijacking computing resources to generate crypto profit.
The system created a hidden connection using the SSH (Secure Shell) tunnel to provide remote access to an external user while evading the organization’s security protections.
The 30-billion-parameter open-source model, built on Alibaba’s Qwen3-MoE architecture, has roughly 3 billion parameters active at any given time. It was designed to plan and execute multi-step coding tasks using different software setups, terminal commands, and specific tools.
Suspicious network activity explained
According to the technical paper, the unauthorized behavior was detected during the learning runs; the firewall powered by Alibaba’s cloud identified a security-policy violation, tracing back to the team’s training servers.

The alerts indicated network traffic consistent with crypto mining activities and attempts to probe’ internal network resources. Upon cross-referencing the firewall timestamps and Reinforcement Learning (RL) traces, they found unusual outbound traffic happening consistently, raising concerns that the system might be performing unauthorized actions.
Despite no task instruction mentioning tunneling or mining, the autonomous agent performed the task consistently while repurposing the GPU capacity.
“We also observed the unauthorized repurposing of provisioned GPU capacity for cryptocurrency mining, quietly diverting compute away from training, inflating operational costs, and introducing clear legal and reputational exposure,” one of the researchers wrote.
The rumored theory
According to the researcher, the behavior could be called “ instrumental side effects of autonomous tool use under the RL optimization.” The agent, while trying to optimize for its training objective, understood the need for more energy, which led to an unexpected behavior of fetching more computing power and funds to succeed.
The founder and CEO of decentralized AI research firm Pluralis posted on X, calling it an “insane sequence of statements buried in an Alibaba tech report.” The post got attention from X users, discussing how this may not affect an average person now, but the future of AI safety.