xiangyi-li/OS-World
mirrored 16 minutes ago
/ evaluators / desktop_env / getters
  • __init__.py (1.65 kB)
  • calc.py (522 B)
  • chrome.py (135 kB)
  • file.py (5.46 kB)
  • general.py (1.25 kB)
  • gimp.py (1.11 kB)
  • impress.py (7.03 kB)
  • info.py (1.5 kB)
  • misc.py (24 kB)
  • replay.py (709 B)
  • vlc.py (3.69 kB)
  • vscode.py (1.08 kB)
fix(chrome): recreation.gov getter timeouts for search result and new page (#438)
- Search result: wait for search URL and domcontentloaded before looking for .search-result-highlight--success; add fallback to attached + scroll into view so the element is found when visible but not yet "visible" to Playwright.
- New page: use wait_for_load_state(load) instead of networkidle so the popup is considered ready once the load event fires; recreation.gov keeps background requests so networkidle often never fires and caused 60s timeouts.
Tested with a full run.
13 days ago
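The wait strategy described in the recreation.gov getter fix above can be sketched roughly as follows. This is a minimal illustration, not the code in chrome.py: the function names, the `**/search**` URL pattern, and the timeout defaults are assumptions made for the example. Only the `.search-result-highlight--success` selector and the `load` versus `networkidle` choice come from the commit message. The calls used (`wait_for_url`, `locator`, `Locator.wait_for`, `scroll_into_view_if_needed`, `wait_for_load_state`) are from the Playwright Python sync API.

```python
try:
    from playwright.sync_api import TimeoutError as PlaywrightTimeout
except ImportError:
    # Lets the sketch be read and exercised with a stand-in page object
    # when Playwright is not installed.
    PlaywrightTimeout = TimeoutError


def wait_for_search_result(
    page,
    url_pattern="**/search**",  # hypothetical pattern, not taken from the repo
    selector=".search-result-highlight--success",
    timeout_ms=30_000,  # assumed default
):
    """Wait for the search page, then locate the highlighted result."""
    # Wait for the search URL and for DOMContentLoaded before querying.
    page.wait_for_url(url_pattern, wait_until="domcontentloaded", timeout=timeout_ms)
    target = page.locator(selector).first
    try:
        target.wait_for(state="visible", timeout=timeout_ms)
    except PlaywrightTimeout:
        # Fallback: accept an element that is attached to the DOM,
        # then scroll it into view.
        target.wait_for(state="attached", timeout=timeout_ms)
        target.scroll_into_view_if_needed(timeout=timeout_ms)
    return target


def wait_for_new_page(new_page, timeout_ms=60_000):
    """Treat a popup as ready once its load event fires."""
    # "networkidle" may never be reached on pages that keep issuing
    # background requests, so wait for "load" instead.
    new_page.wait_for_load_state("load", timeout=timeout_ms)
    return new_page
```

With a real Playwright `page`, `wait_for_search_result(page)` would return a locator for the first matching element, and a popup page would be passed to `wait_for_new_page`.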
feat: enhance VM wallpaper retrieval and image similarity checks
- Added logging to the VM wallpaper retrieval function to capture errors and warnings related to content retrieval and file creation.
- Implemented checks for None, empty, and invalid content types to ensure robustness in wallpaper handling.
- Enhanced the SSIM structure check function with size validation and improved error handling for image processing.
- Added logging for image size discrepancies and exceptions during SSIM computation to aid in debugging.
These changes improve error handling and logging, ensuring better maintainability and reliability of the evaluators.
8 months ago
Fix chrome dark-mode task evaluation for appearance settings
24 days ago
Updated misc:get_rule_relativeTime to support list in relativeRules[expected][time] (#447)
5 days ago
Clean code; Refactor environment to pass screenshot content instead of path
2 years ago
add multi-app examples
2 years ago
Fix minor errors in vscode and gimp about path and postconfig
2 years ago
update multi-apps
2 years ago
fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
8 months ago
Support Docker VM manager and provider (#75)
* Add docker provider framework
* Update VM download link
* Add stop container
* Update docker manager & provider
* Update
* Update
* Update provider
a year ago
Finish loading the vscode examples v1; Improve on the infra: Add accessibility tree into the observation; Add activate window function, etc
2 years ago
[Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld
* Debug
* Debug
* v1, internal version
* Add experiments script
* Fix minor bugs
* Update new endpoint
* Update ip
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Fix model name
* Fix docker close issues; update prompting
* Fix missed
* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'
* Fix server and chromium ports in setup
* Revert and add missed dependency
* Add VLC port for docker
* Update
* Clean
---------
Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
a year ago
Dunjie Lu: Merge pull request #452 from xlang-ai/dev_djlu/gpt54_agent: optimize gpt5.4 prompt (cda933f)