We are happy to release MMBench-GUI, a hierarchical, multi-platform benchmark framework and toolbox for evaluating GUI agents. MMBench-GUI comprises four evaluation levels: GUI Content Understanding ...
If you process raw evaluation data (optional; see “Evaluation data” below), use the environment suggested in its docs (some scripts assume Python 3.11). UI-TARS-1 ...