Update 10a730d5-d414-4b40-b479-684bed1ae522.json
change "the ight" to "the night"
8 months ago
fix: refine task evaluators for Chrome and Thunderbird (#445)
- chrome/480bcfea: mark task as infeasible
- thunderbird/15c3b339: relax evaluator to only require email and password
(remove Anonym Tester and IMAP rules; instruction says stay on page)
- thunderbird/dd84e895: require every message starred in Bills folder
(SQL now checks starred count = total count, not just sum(1) > 0)
5 days ago
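
The thunderbird/dd84e895 change above can be illustrated with a small sketch of the two checks. The `messages` table, its `folder` and `starred` columns, and both function names are hypothetical, chosen only for this illustration; the real evaluator's schema and query are not shown in the commit message.

```python
import sqlite3


def build_demo_db(rows):
    """Create an in-memory database with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (folder TEXT, starred INTEGER)")
    conn.executemany("INSERT INTO messages VALUES (?, ?)", rows)
    return conn


def any_starred(conn, folder):
    """Looser check (assumed shape): passes if at least one message is starred."""
    row = conn.execute(
        "SELECT SUM(1) > 0 FROM messages WHERE folder = ? AND starred = 1",
        (folder,),
    ).fetchone()
    # SUM over zero rows is NULL, which arrives here as None.
    return bool(row[0])


def all_starred(conn, folder):
    """Stricter check: passes only if starred count equals total count."""
    total, starred = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(starred), 0) FROM messages WHERE folder = ?",
        (folder,),
    ).fetchone()
    # An empty folder is treated as a failure in this sketch.
    return total > 0 and starred == total


partial = build_demo_db([("Bills", 1), ("Bills", 0), ("Inbox", 0)])
complete = build_demo_db([("Bills", 1), ("Bills", 1), ("Inbox", 0)])
```

With one of two Bills messages starred, the looser check passes while the stricter one fails, which is the gap the commit describes closing.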
Mirror external setup/eval download links to HF cache
12 days ago
Dunjie Lu: Merge pull request #452 from xlang-ai/dev_djlu/gpt54_agent
optimize gpt5.4 prompt (cda933f)
feat: standardize configuration fields across all evaluation examples
- Add `fixed_ip` field to all 369 JSON files in examples directory
- Set to `true` for 8 files listed in google_chrome.json multi_apps
- Set to `false` for remaining 361 files
- Add `possibility_of_env_change` field to 363 JSON files missing this field
- Set to "low" for newly added fields
- Preserve existing values (4 medium, 2 high) for 6 files that already had this field
This ensures consistent configuration schema across all evaluation examples
while maintaining backward compatibility with existing settings.
8 months ago
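
A minimal sketch of the field standardization described above, operating on already-loaded example dictionaries. The function name `standardize_example`, the `fixed_ip_ids` set, and the sample IDs are assumptions for illustration; the commit message does not show the actual script or how the 8 files are identified.

```python
def standardize_example(example, example_id, fixed_ip_ids):
    """Return a copy of one example config with both fields present.

    fixed_ip_ids is a hypothetical set of IDs that should get fixed_ip = True.
    """
    updated = dict(example)
    # fixed_ip is written for every example: True only for the listed IDs.
    updated["fixed_ip"] = example_id in fixed_ip_ids
    # Add the field only when it is missing, so existing
    # "medium" and "high" values are preserved.
    updated.setdefault("possibility_of_env_change", "low")
    return updated


# Hypothetical IDs and configs, used only to exercise the function.
fixed_ip_ids = {"example-a"}
examples = {
    "example-a": {"instruction": "first task"},
    "example-b": {"instruction": "second task", "possibility_of_env_change": "high"},
}
standardized = {
    example_id: standardize_example(config, example_id, fixed_ip_ids)
    for example_id, config in examples.items()
}
```

Using `setdefault` rather than plain assignment is what keeps previously set values intact, matching the backward-compatibility note in the commit message.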