fix(chrome): recreation.gov getter timeouts for search result and new page (#438)
- Search result: wait for search URL and domcontentloaded before looking for
.search-result-highlight--success; add fallback to attached + scroll into
view so the element is found when visible but not yet "visible" to Playwright.
- New page: use wait_for_load_state(load) instead of networkidle so the
popup is considered ready once the load event fires; recreation.gov keeps
background requests so networkidle often never fires and caused 60s timeouts.
Tested with a full run.
13 days ago
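The waiting strategy described in the recreation.gov fix (#438) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual getter code: the function names, the `url_pattern` argument, and the timeout value are hypothetical, and `page` / `popup` are assumed to be Playwright sync-API `Page` objects. Only the selector and the choice of `load` over `networkidle` come from the commit message.

```python
# Sketch only. `page` and `popup` are assumed to be Playwright sync-API Page
# objects; function names, url_pattern and timeout_ms are hypothetical.

SUCCESS_SELECTOR = ".search-result-highlight--success"  # from the commit message


def wait_for_search_result(page, url_pattern, timeout_ms=60_000):
    """Return a locator for the highlighted search result."""
    # 1. Wait for the search URL and the DOMContentLoaded event first.
    page.wait_for_url(url_pattern, wait_until="domcontentloaded", timeout=timeout_ms)

    locator = page.locator(SUCCESS_SELECTOR).first
    try:
        # 2. Preferred path: the element passes Playwright's visibility check.
        locator.wait_for(state="visible", timeout=timeout_ms)
    except Exception:
        # Broad catch keeps this sketch import-free; real code would more
        # likely catch playwright.sync_api.TimeoutError specifically.
        # 3. Fallback: accept an attached element and scroll it into view.
        locator.wait_for(state="attached", timeout=timeout_ms)
        locator.scroll_into_view_if_needed()
    return locator


def wait_for_new_page(popup, timeout_ms=60_000):
    """Treat the popup as ready once its load event has fired."""
    # "networkidle" may never be reached on pages that keep issuing
    # background requests, so wait for "load" instead.
    popup.wait_for_load_state("load", timeout=timeout_ms)
    return popup
```

Waiting for `load` trades some certainty that all late content has arrived for a bounded wait, which appears to be the trade-off the commit is making.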
feat: enhance VM wallpaper retrieval and image similarity checks
- Added logging to the VM wallpaper retrieval function to capture errors and warnings related to content retrieval and file creation.
- Implemented checks for None, empty, and invalid content types to ensure robustness in wallpaper handling.
- Enhanced the SSIM structure check function with size validation and improved error handling for image processing.
- Added logging for image size discrepancies and exceptions during SSIM computation to aid in debugging.
These changes improve error handling and logging, ensuring better maintainability and reliability of the evaluators.
8 months ago
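The kind of guard checks this commit describes might look like the sketch below. It is an assumption-laden illustration rather than the evaluators' real code: the function names, the logger name, and the choice to treat only `bytes`-like values as valid content are all hypothetical. The SSIM computation itself is left out; only the size validation that would precede it is shown.

```python
import logging

logger = logging.getLogger("evaluators.sketch")  # hypothetical logger name


def is_valid_wallpaper_content(content):
    """Return True only for non-empty bytes-like content (assumed contract)."""
    if content is None:
        logger.warning("Wallpaper content is None")
        return False
    if not isinstance(content, (bytes, bytearray)):
        logger.error("Unexpected wallpaper content type: %s", type(content).__name__)
        return False
    if len(content) == 0:
        logger.warning("Wallpaper content is empty")
        return False
    return True


def sizes_match(size_a, size_b):
    """Compare two (width, height) tuples before any SSIM computation."""
    if size_a != size_b:
        logger.warning("Image size mismatch: %s vs %s", size_a, size_b)
        return False
    return True
```

Returning `False` with a log entry, instead of raising, would let an evaluator report a failed check without aborting the whole run; whether the actual code does this is not stated in the commit message.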
Fix chrome dark-mode task evaluation for appearance settings
24 days ago
Updated misc:get_rule_relativeTime to support list in relativeRules[expected][time] (#447)
5 days ago
Clean code; Refactor environment to pass screenshot content instead of path
2 years ago
add multi-app examples
2 years ago
Fix minor errors in vscode and gimp about path and postconfig
2 years ago
update multi-apps
2 years ago
fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
8 months ago
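As an illustration of the file existence and parsing checks mentioned for general.py, a defensive JSON loader could be sketched as follows. The function name `load_json_or_none`, the logger name, and the convention of returning `None` on failure are assumptions made for this example, not details taken from the repository.

```python
import json
import logging
import os

logger = logging.getLogger("evaluators.sketch")  # hypothetical logger name


def load_json_or_none(path):
    """Load a JSON file, logging and returning None on any expected failure."""
    if not path or not os.path.exists(path):
        logger.error("File does not exist: %s", path)
        return None
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (ValueError, OSError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError are ValueError subclasses.
        logger.error("Failed to parse %s: %s", path, exc)
        return None
```

A caller can then treat `None` as "evaluation input unavailable" and score the task accordingly, without an unhandled exception propagating out of the evaluator.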
Support Docker VM manager and provider (#75)
* Add docker provider framework
* Update VM download link
* Add stop container
* Update docker manager & provider
* Update
* Update
* Update provider
a year ago
Finish loading the vscode examples v1; Improve on the infra: Add accessibility tree into the observation; Add activate window function, etc
2 years ago
[Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld
* Debug
* Debug
* v1, internal version
* Add experiments script
* Fix minor bugs
* Update new endpoint
* Update ip
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Fix model name
* Fix docker close issues; update prompting
* Fix missed
* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'
* Fix server and chromium ports in setup
* Revert and add missed dependency
* Add VLC port for docker
* Update
* Clean
---------
Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
a year ago
Dunjie Lu
Merge pull request #452 from xlang-ai/dev_djlu/gpt54_agent
optimize gpt5.4 prompt
cda933f