HackerLinks

Tool Profile

Vocaela

Tiny 500M vision-language model for GUI clicking and control.

At a glance:
First seen: 2026-05-05
Last seen: 2026-05-05
Sightings: 1
Source: huggingface.co

What it is

Vocaela is a small, 500M-parameter vision-language model built for GUI clicking and control.

Why developers recommend it

A commenter singled it out as a small model that handles x,y clicking well.

Hacker News evidence

2026-05-05

A commenter said Vocaela, a tiny 500M model, is quite good at x,y clicking when larger models hallucinate coordinates.

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents