xiangyi-li / OS-World
mirrored 18 minutes ago
  • README.md (7.76 kB)
  • __init__.py (108 B)
  • getters/
  • metrics/
Make PPTX run-count comparison configurable for task b8adbc24 (#443)
Add an `examine_run_count` flag to `compare_pptx_files` (defaulting to true) and gate run-count mismatch checks for both text paragraphs and table cells. Disable this check in `b8adbc24-cef2-4b15-99d5-ecbe7ff445eb.json` to prevent false negatives from non-semantic LibreOffice run segmentation differences.
7 days ago
Updated misc:get_rule_relativeTime to support list in relativeRules[expected][time] (#447)
5 days ago
ver Dec22nd re-organized the evaluator structure to improve the extensibility
2 years ago
Clean code; Add todos in desktop_env README
2 years ago
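The `examine_run_count` change described in #443 can be illustrated with a minimal sketch. This is not the repository's `compare_pptx_files` implementation: `paragraphs_match` and the list-of-strings paragraph model are hypothetical stand-ins, and only the flag name and its default of true come from the commit message. The sketch shows how gating the run-count check lets two paragraphs with identical text but different run segmentation compare as equal.

```python
from typing import List

# Hypothetical model: a paragraph is just the list of its run texts.
Paragraph = List[str]


def paragraphs_match(
    original: Paragraph,
    modified: Paragraph,
    examine_run_count: bool = True,  # flag name and default taken from the commit message
) -> bool:
    """Compare two paragraphs, optionally requiring the same number of runs."""
    # The concatenated text must agree regardless of the flag.
    if "".join(original) != "".join(modified):
        return False
    # Run segmentation is only checked when the flag is enabled.
    if examine_run_count and len(original) != len(modified):
        return False
    return True


# Same text, but one side was saved as two runs and the other as one.
two_runs = ["Hello ", "world"]
one_run = ["Hello world"]

strict = paragraphs_match(two_runs, one_run)
relaxed = paragraphs_match(two_runs, one_run, examine_run_count=False)
```

With the default, the segmentation difference is reported as a mismatch; with the flag disabled, only the text content decides the result, which is the behavior the commit enables for the affected task.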
Dunjie Lu: Merge pull request #452 from xlang-ai/dev_djlu/gpt54_agent: optimize gpt5.4 prompt (cda933f)
/ desktop_env / evaluators