Hub
    Docs
Try for Free
xiangyi-li
/
OS-World
mirrored a minute ago
Benchmark CardFiles and versionsLeaderboard
  • Hub
  • Contact
DiscordGitHubXLinkedIn
0
  1. /
  2. examples
  3. evaluation_examples
  4. multi_apps
  • 00fa164e-2612-4439-992e-157d019a8436.json
    2.66 kB
    ​
  • 02ce9a50-7af2-47ed-8596-af0c230501f8.json
    2.44 kB
    ​
  • 09a37c51-e625-49f4-a514-20a773797a8a.json
    1.86 kB
    ​
  • 0c825995-5b70-4526-b663-113f4c999dd2.json
    4.28 kB
    ​
  • 0e5303d4-8820-42f6-b18d-daf7e633de21.json
    2.49 kB
    ​
  • 185f29bd-5da0-40a6-b69c-ba7f4e0324ef.json
    4.39 kB
    ​
  • 1f18aa87-af6f-41ef-9853-cdb8f32ebdea.json
    4.85 kB
    ​
  • 20236825-b5df-46e7-89bf-62e1d640a897.json
    2 kB
    ​
  • 227d2f97-562b-4ccb-ae47-a5ec9e142fbb.json
    1.99 kB
    ​
  • 22a4636f-8179-4357-8e87-d1743ece1f81.json
    2.85 kB
    ​
  • 236833a3-5704-47fc-888c-4f298f09f799.json
    2.2 kB
    ​
  • 2373b66a-092d-44cb-bfd7-82e86e7a3b4d.json
    1.51 kB
    ​
  • 26150609-0da3-4a7d-8868-0faf9c5f01bb.json
    2.91 kB
    ​
  • 26660ad1-6ebb-4f59-8cba-a8432dfe8d38.json
    1.82 kB
    ​
  • 2b9493d7-49b8-493a-a71b-56cd1f4d6908.json
    2.56 kB
    ​
  • 2c1ebcd7-9c6d-4c9a-afad-900e381ecd5e.json
    6.35 kB
    ​
  • 2c9fc0de-3ee7-45e1-a5df-c86206ad78b5.json
    2.65 kB
    ​
  • 2fe4b718-3bd7-46ec-bdce-b184f5653624.json
    1.5 kB
    ​
  • 337d318b-aa07-4f4f-b763-89d9a2dd013f.json
    2.16 kB
    ​
  • 36037439-2044-4b50-b9d1-875b5a332143.json
    1.69 kB
    ​
  • 3680a5ee-6870-426a-a997-eba929a0d25c.json
    2.98 kB
    ​
  • 3a93cae4-ad3e-403e-8c12-65303b271818.json
    6.08 kB
    ​
  • 3c8f201a-009d-4bbe-8b65-a6f8b35bb57f.json
    1.08 kB
    ​
  • 3e3fc409-bff3-4905-bf16-c968eee3f807.json
    2.3 kB
    ​
  • 3f05f3b9-29ba-4b6b-95aa-2204697ffc06.json
    8.96 kB
    ​
  • 415ef462-bed3-493a-ac36-ca8c6d23bf1b.json
    6.06 kB
    ​
  • 42d25c08-fb87-4927-8b65-93631280a26f.json
    6.69 kB
    ​
  • 42f4d1c7-4521-4161-b646-0a8934e36081.json
    1.63 kB
    ​
  • 46407397-a7d5-4c6b-92c6-dbe038b1457b.json
    4.04 kB
    ​
  • 47f7c0ce-a5fb-4100-a5e6-65cd0e7429e5.json
    2.65 kB
    ​
  • 48c46dc7-fe04-4505-ade7-723cba1aa6f6.json
    5.14 kB
    ​
  • 48d05431-6cd5-4e76-82eb-12b60d823f7d.json
    1.57 kB
    ​
  • 4c26e3f3-3a14-4d86-b44a-d3cedebbb487.json
    1.6 kB
    ​
  • 4e9f0faf-2ecc-4ae8-a804-28c9a75d1ddc.json
    2.47 kB
    ​
  • 510f64c8-9bcc-4be1-8d30-638705850618.json
    3.45 kB
    ​
  • 51f5801c-18b3-4f25-b0c3-02f85507a078.json
    1.82 kB
    ​
  • 58565672-7bfe-48ab-b828-db349231de6b.json
    2.16 kB
    ​
  • 5990457f-2adb-467b-a4af-5c857c92d762.json
    2.75 kB
    ​
  • 5bc63fb9-276a-4439-a7c1-9dc76401737f.json
    2.36 kB
    ​
  • 5df7b33a-9f77-4101-823e-02f863e1c1ae.json
    2.17 kB
    ​
  • 67890eb6-6ce5-4c00-9e3d-fb4972699b06.json
    2.87 kB
    ​
  • 68a25bd4-59c7-4f4d-975e-da0c8509c848.json
    2.37 kB
    ​
  • 69acbb55-d945-4927-a87b-8480e1a5bb7e.json
    1.44 kB
    ​
  • 6d72aad6-187a-4392-a4c4-ed87269c51cf.json
    773 B
    ​
  • 6f4073b8-d8ea-4ade-8a18-c5d1d5d5aa9a.json
    3.13 kB
    ​
  • 716a6079-22da-47f1-ba73-c9d58f986a38.json
    1.59 kB
    ​
  • 74d5859f-ed66-4d3e-aa0e-93d7a592ce41.json
    4.71 kB
    ​
  • 778efd0a-153f-4842-9214-f05fc176b877.json
    2.65 kB
    ​
  • 788b3701-3ec9-4b67-b679-418bfa726c22.json
    4.59 kB
    ​
  • 78aed49a-a710-4321-a793-b611a7c5b56b.json
    4.67 kB
    ​
  • 7e287123-70ca-47b9-8521-47db09b69b14.json
    7.48 kB
    ​
  • 7f35355e-02a6-45b5-b140-f0be698bcf85.json
    1.33 kB
    ​
  • 7ff48d5b-2df2-49da-b500-a5150ffc7f18.json
    4.04 kB
    ​
  • 81c425f5-78f3-4771-afd6-3d2973825947.json
    2.09 kB
    ​
  • 82e3c869-49f6-4305-a7ce-f3e64a0618e7.json
    4.25 kB
    ​
  • 869de13e-bef9-4b91-ba51-f6708c40b096.json
    6.84 kB
    ​
  • 873cafdd-a581-47f6-8b33-b9696ddb7b05.json
    1.88 kB
    ​
  • 881deb30-9549-4583-a841-8270c65f2a17.json
    7.99 kB
    ​
  • 897e3b53-5d4d-444b-85cb-2cdc8a97d903.json
    2.85 kB
    ​
  • 8df7e444-8e06-4f93-8a1a-c5c974269d82.json
    1.94 kB
    ​
  • 8e116af7-7db7-4e35-a68b-b0939c066c78.json
    6.51 kB
    ​
  • 91190194-f406-4cd6-b3f9-c43fac942b22.json
    1.4 kB
    ​
  • 9219480b-3aed-47fc-8bac-d2cffc5849f7.json
    2.69 kB
    ​
  • 937087b6-f668-4ba6-9110-60682ee33441.json
    693 B
    ​
  • 98e8e339-5f91-4ed2-b2b2-12647cb134f4.json
    1.69 kB
    ​
  • 9f3bb592-209d-43bc-bb47-d77d9df56504.json
    2.34 kB
    ​
  • a0b9dc9c-fc07-4a88-8c5d-5e3ecad91bcb.json
    3.62 kB
    ​
  • a503b07f-9119-456b-b75d-f5146737d24f.json
    1.27 kB
    ​
  • a74b607e-6bb5-4ea8-8a7c-5d97c7bbcd2a.json
    1.87 kB
    ​
  • a82b78bb-7fde-4cb3-94a4-035baf10bcf0.json
    3.17 kB
    ​
  • aad10cd7-9337-4b62-b704-a857848cedf2.json
    2.03 kB
    ​
  • acb0f96b-e27c-44d8-b55f-7cb76609dfcd.json
    1.38 kB
    ​
  • aceb0368-56b8-4073-b70e-3dc9aee184e0.json
    2.24 kB
    ​
  • b337d106-053f-4d37-8da0-7f9c4043a66b.json
    2.21 kB
    ​
  • b5062e3e-641c-4e3a-907b-ac864d2e7652.json
    3.75 kB
    ​
  • b52b40a5-ad70-4c53-b5b0-5650a8387052.json
    3.02 kB
    ​
  • bb83cab4-e5c7-42c7-a67b-e46068032b86.json
    2.21 kB
    ​
  • bc2b57f3-686d-4ec9-87ce-edf850b7e442.json
    3.44 kB
    ​
  • c2751594-0cd5-4088-be1b-b5f2f9ec97c4.json
    1.66 kB
    ​
  • c7c1e4c3-9e92-4eba-a4b8-689953975ea4.json
    2.57 kB
    ​
  • c867c42d-a52d-4a24-8ae3-f75d256b5618.json
    2.54 kB
    ​
  • ce2b64a2-ddc1-4f91-8c7d-a88be7121aac.json
    5.63 kB
    ​
  • d1acdb87-bb67-4f30-84aa-990e56a09c92.json
    4.08 kB
    ​
  • d68204bf-11c1-4b13-b48b-d303c73d4bf6.json
    1.52 kB
    ​
  • d9b7c649-c975-4f53-88f5-940b29c47247.json
    2.04 kB
    ​
  • da52d699-e8d2-4dc5-9191-a2199e0b6a9b.json
    2.53 kB
    ​
  • da922383-bfa4-4cd3-bbad-6bebab3d7742.json
    1.97 kB
    ​
  • dd60633f-2c72-42ba-8547-6f2c8cb0fdb0.json
    1.64 kB
    ​
  • deec51c9-3b1e-4b9e-993c-4776f20e8bb2.json
    2.72 kB
    ​
  • df67aebb-fb3a-44fd-b75b-51b6012df509.json
    2.5 kB
    ​
  • e135df7c-7687-4ac0-a5f0-76b74438b53e.json
    2.85 kB
    ​
  • e1fc0df3-c8b9-4ee7-864c-d0b590d3aa56.json
    1.66 kB
    ​
  • e2392362-125e-4f76-a2ee-524b183a3412.json
    3.19 kB
    ​
  • e8172110-ec08-421b-a6f5-842e6451911f.json
    11.1 kB
    ​
  • eb303e01-261e-4972-8c07-c9b4e7a4922a.json
    2.65 kB
    ​
  • ee9a3c83-f437-4879-8918-be5efbb9fac7.json
    2.76 kB
    ​
  • f5c13cdd-205c-4719-a562-348ae5cd1d91.json
    3.54 kB
    ​
  • f7dfbef3-7697-431c-883a-db8583a4e4f9.json
    3.1 kB
    ​
  • f8369178-fafe-40c2-adc4-b9b08a125456.json
    1 kB
    ​
  • f8cfa149-d1c1-4215-8dac-4a0932bad3c2.json
    1.83 kB
    ​
  • f918266a-b3e0-4914-865d-4faa564f1aef.json
    1.51 kB
    ​
Feature/task refine Fix task configurations and evaluators for improved robustness (#420) * Relax reference similarity threshold for APA formatting task Reduce the reference_base_result threshold from 0.93/0.92 to 0.6 for task 2c1ebcd7-9c6d-4c9a-afad-900e381ecd5e (APA 7th edition reference formatting). The original threshold was overly strict and failed submissions with minor formatting variations that are acceptable in APA style (e.g., spacing around journal volume numbers). The new threshold of 0.6 maintains validation of core content accuracy (authors, dates, titles, journals) while accepting reasonable formatting variants, better reflecting real-world APA formatting standards. Changes: - Ubuntu version: 0.93 -> 0.6 - Windows version: 0.92 -> 0.6 * Fix XPath selectors for TripAdvisor hotel search task Replace brittle absolute XPath expressions with robust relative XPath selectors for task b7895e80-f4d1-4648-bee0-4eb45a6f1fa8 (TripAdvisor hotel search in NYC). The original absolute XPath paths (e.g., /html/body/div[1]/main/div[3]/...) are fragile and break when the website's DOM structure changes. The new relative XPath expressions use stable attributes like data-automation and aria-label, making the evaluator resilient to minor UI updates. Changes: - Replaced absolute paths with attribute-based selectors - Check-in date: //button[@data-automation='checkin']//... - Check-out date: //button[@data-automation='checkout']//... - City header: //h2[@data-automation='header_geo_title'] - Room/guest selector: //button[@data-automation='roomsandguests']//... - Price sorting: //button[contains(@aria-label,'PRICE_LOW_TO_HIGH')]//... - Updated expected value for adult field in first test case to match actual output format: "Rooms/Guests1 Room, 2 Guests" This ensures the evaluation will work correctly even when TripAdvisor updates their page layout or CSS classes. * Fix expected output format for file compression task Correct the expected success message format for task 37887e8c-da15-4192-923c-08fa390a176d (compress files modified 30 days ago). The evaluation script eval_20250703.sh outputs "SUCCESS: The task was completed correctly." with "SUCCESS" in all uppercase, but the expected value was using "Success" with only the first letter capitalized. This case mismatch caused the evaluator to incorrectly fail valid submissions. Change: - Expected message: "Success: ..." -> "SUCCESS: ..." This ensures the string matching in check_include_exclude() correctly identifies successful task completion. * Add workspace trust settings to VS Code theme change task Update expected configuration for task 982d12a5-beab-424f-8d38-d2a48429e511 (change VS Code color theme) to include workspace trust security settings. When VS Code launches, it automatically adds workspace trust configuration to settings.json to handle security prompts. The original expected output only included the color theme setting, causing the evaluator to fail even when the task was completed correctly. Changes: - Added three workspace trust settings to expected configuration: - security.workspace.trust.enabled: false - security.workspace.trust.startupPrompt: "never" - security.workspace.trust.emptyWindow: false - Retained the required workbench.colorTheme setting These settings prevent workspace trust dialogs from interfering with automated testing and ensure the evaluator correctly matches the actual settings.json content generated during task execution. * Improve GIMP character extraction task evaluation robustness Enhance task e8172110-ec08-421b-a6f5-842e6451911f (extract pixel art character in GIMP) with more flexible evaluation and reference implementation. Changes: 1. Add reference Python implementation: - Provide complete example script for automated character extraction - Implements background removal using color detection and threshold - Includes helper functions for checkerboard background preview - Serves as learning reference while maintaining task difficulty 2. Add new evaluation function check_structure_sim_with_threshold(): - Customizable SSIM threshold (default 0.85, more lenient than 0.9) - Automatic image resizing when dimensions differ using LANCZOS - Comprehensive logging for debugging (comparison details, SSIM scores) - Handles edge cases: file not found, size mismatch, conversion errors 3. Update task configuration: - Switch from check_structure_sim to check_structure_sim_with_threshold - Set SSIM threshold to 0.85 for both GIMP and code outputs - Add options parameter to pass threshold to evaluator Rationale: - Manual GIMP extraction and Python script may produce slightly different pixel-level results due to different image processing methods (anti-aliasing, edge handling) - 0.85 threshold accepts minor pixel differences while ensuring visual similarity - Auto-resize prevents failures from slight dimension mismatches during cropping - Reference script helps users understand automation requirements without compromising the learning objective This makes evaluation more realistic while maintaining visual quality standards. * Delete evaluation_examples/examples/vs_code/982d12a5-beab-424f-8d38-d2a48429e511.json Just fixed it in a different way in the last commit. Thanks! --------- Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2 months ago
Feature/task refine Fix task configurations and evaluators for improved robustness (#420) * Relax reference similarity threshold for APA formatting task Reduce the reference_base_result threshold from 0.93/0.92 to 0.6 for task 2c1ebcd7-9c6d-4c9a-afad-900e381ecd5e (APA 7th edition reference formatting). The original threshold was overly strict and failed submissions with minor formatting variations that are acceptable in APA style (e.g., spacing around journal volume numbers). The new threshold of 0.6 maintains validation of core content accuracy (authors, dates, titles, journals) while accepting reasonable formatting variants, better reflecting real-world APA formatting standards. Changes: - Ubuntu version: 0.93 -> 0.6 - Windows version: 0.92 -> 0.6 * Fix XPath selectors for TripAdvisor hotel search task Replace brittle absolute XPath expressions with robust relative XPath selectors for task b7895e80-f4d1-4648-bee0-4eb45a6f1fa8 (TripAdvisor hotel search in NYC). The original absolute XPath paths (e.g., /html/body/div[1]/main/div[3]/...) are fragile and break when the website's DOM structure changes. The new relative XPath expressions use stable attributes like data-automation and aria-label, making the evaluator resilient to minor UI updates. Changes: - Replaced absolute paths with attribute-based selectors - Check-in date: //button[@data-automation='checkin']//... - Check-out date: //button[@data-automation='checkout']//... - City header: //h2[@data-automation='header_geo_title'] - Room/guest selector: //button[@data-automation='roomsandguests']//... - Price sorting: //button[contains(@aria-label,'PRICE_LOW_TO_HIGH')]//... - Updated expected value for adult field in first test case to match actual output format: "Rooms/Guests1 Room, 2 Guests" This ensures the evaluation will work correctly even when TripAdvisor updates their page layout or CSS classes. * Fix expected output format for file compression task Correct the expected success message format for task 37887e8c-da15-4192-923c-08fa390a176d (compress files modified 30 days ago). The evaluation script eval_20250703.sh outputs "SUCCESS: The task was completed correctly." with "SUCCESS" in all uppercase, but the expected value was using "Success" with only the first letter capitalized. This case mismatch caused the evaluator to incorrectly fail valid submissions. Change: - Expected message: "Success: ..." -> "SUCCESS: ..." This ensures the string matching in check_include_exclude() correctly identifies successful task completion. * Add workspace trust settings to VS Code theme change task Update expected configuration for task 982d12a5-beab-424f-8d38-d2a48429e511 (change VS Code color theme) to include workspace trust security settings. When VS Code launches, it automatically adds workspace trust configuration to settings.json to handle security prompts. The original expected output only included the color theme setting, causing the evaluator to fail even when the task was completed correctly. Changes: - Added three workspace trust settings to expected configuration: - security.workspace.trust.enabled: false - security.workspace.trust.startupPrompt: "never" - security.workspace.trust.emptyWindow: false - Retained the required workbench.colorTheme setting These settings prevent workspace trust dialogs from interfering with automated testing and ensure the evaluator correctly matches the actual settings.json content generated during task execution. * Improve GIMP character extraction task evaluation robustness Enhance task e8172110-ec08-421b-a6f5-842e6451911f (extract pixel art character in GIMP) with more flexible evaluation and reference implementation. Changes: 1. Add reference Python implementation: - Provide complete example script for automated character extraction - Implements background removal using color detection and threshold - Includes helper functions for checkerboard background preview - Serves as learning reference while maintaining task difficulty 2. Add new evaluation function check_structure_sim_with_threshold(): - Customizable SSIM threshold (default 0.85, more lenient than 0.9) - Automatic image resizing when dimensions differ using LANCZOS - Comprehensive logging for debugging (comparison details, SSIM scores) - Handles edge cases: file not found, size mismatch, conversion errors 3. Update task configuration: - Switch from check_structure_sim to check_structure_sim_with_threshold - Set SSIM threshold to 0.85 for both GIMP and code outputs - Add options parameter to pass threshold to evaluator Rationale: - Manual GIMP extraction and Python script may produce slightly different pixel-level results due to different image processing methods (anti-aliasing, edge handling) - 0.85 threshold accepts minor pixel differences while ensuring visual similarity - Auto-resize prevents failures from slight dimension mismatches during cropping - Reference script helps users understand automation requirements without compromising the learning objective This makes evaluation more realistic while maintaining visual quality standards. * Delete evaluation_examples/examples/vs_code/982d12a5-beab-424f-8d38-d2a48429e511.json Just fixed it in a different way in the last commit. Thanks! --------- Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
2 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
feat: standardize configuration fields across all evaluation examples - Add `fixed_ip` field to all 369 JSON files in examples directory - Set to `true` for 8 files listed in google_chrome.json multi_apps - Set to `false` for remaining 361 files - Add `possibility_of_env_change` field to 363 JSON files missing this field - Set to "low" for newly added fields - Preserve existing values (4 medium, 2 high) for 6 files that already had this field This ensures consistent configuration schema across all evaluation examples while maintaining backward compatibility with existing settings.
8 months ago
Expand accepted mountain aliases for picture2 in ce2b64a2 task
a month ago
Fix plugin-task evaluation
22 days ago
Add VM machine retrieval and enhance Chrome page info handling - Added `get_vm_machine` method to `PythonController` for retrieving the VM's machine type. - Updated `get_page_info` function in `chrome.py` to utilize the new VM machine retrieval for determining the appropriate Chrome launch command. - Improved error handling and logging for page loading and connection attempts. - Adjusted JSON comparison logic in `vscode.py` for better subset checking and handling of expected results. - Fixed minor typos in JSON example files for clarity.
2 months ago
Calc eval fix (#272) * ver Jun17th updating annotations * ver Jun17th corrected annotation of 1d17 added check for cell merge * ver Jun17th updated several annotations * ver Jun20th fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08 * fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations. * ver Jun21st updating calc evals * ver Jun22nd fixed an impress task * ver Jun22ndv2 adjusted several calc tasks * Clean scalfolds * ver Jul18th added two try-excepts to handle possible formula parsing and calculation failures --------- Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk> Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
8 months ago
Calc eval fix (#272) * ver Jun17th updating annotations * ver Jun17th corrected annotation of 1d17 added check for cell merge * ver Jun17th updated several annotations * ver Jun20th fixed set-up config of 2bd59342-0664-4ccb-ba87-79379096cc08 * fix: Enhance instructions in LibreOffice Calc examples for clarity and specificity, including details on using Pivot Tables, column placements, and revenue calculations. * ver Jun21st updating calc evals * ver Jun22nd fixed an impress task * ver Jun22ndv2 adjusted several calc tasks * Clean scalfolds * ver Jul18th added two try-excepts to handle possible formula parsing and calculation failures --------- Co-authored-by: BowenBryanWang <bryanwang.nlp@connect.hku.hk> Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
8 months ago
Add AutoGLM-OS agent (#309) * autoglm-os initialize * clean code * chore: use proxy for download setup * feat(autoglm-os): add parameter to toggle images * fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel * update * add client_password * update multienv * fix * fix prompt * fix prompt * fix prompt * fix sys prompt * feat: use proxy in file evaluator * fix client_password * fix note_prompt * fix autoglm agent cmd type * fix * revert: fix: use temporary directory for files pulled from the vm to prevent potential collision when running multiple instances of the same task in parallel reverts commit bab5473eea1de0e61b0e1d68b23ce324a5b0ee57 * feat(autoglm): setup tools * fix(autoglm): remove second time of get a11y tree * add osworld server restart * Revert "add osworld server restart" This reverts commit 7bd9d84122e246ce2a26de0e49c25494244c2b3d. * fix _launch_setup * fix autoglm agent tools & xml tree * fix desktop_env * fix bug for tool name capitalization * fix: always use proxy for setup download * add fail after exceeding max turns * fix(autoglm): avoid adding image to message when screenshot is empty * fix maximize_window * fix maximize_window * fix maximize_window * fix import browsertools module bug * fix task proxy config bug * restore setup * refactor desktop env * restore image in provider * restore file.py * refactor desktop_env * quick fix * refactor desktop_env.step * fix our env reset * add max truns constraint * clean run script * clean lib_run_single.py --------- Co-authored-by: hanyullai <hanyullai@outlook.com> Co-authored-by: JingBh <jingbohao@yeah.net>
7 months ago
Dunjie LuMerge pull request #452 from xlang-ai/dev_djlu/gpt54_agent optimize gpt5.4 promptcda933f
Mirror external setup/eval download links to HF cache
12 days ago
Update evaluation examples with clearer instructions and improved configurations - Revised instructions in multiple JSON examples to enhance clarity and specificity, ensuring users understand the tasks better. - Adjusted configurations for tasks involving Chrome, GIMP, LibreOffice, and VLC to reflect more precise requirements and expected behaviors. - These changes aim to improve the usability and effectiveness of the evaluation examples, making them more intuitive for users.
a month ago
Task fix batch (#383) * update 873cafdd-a581-47f6-8b33-b9696ddb7b05 task eval * c1fa57f3-c3db-4596-8f09-020701085416 fix, add tolerance to url matching * 8df7e444-8e06-4f93-8a1a-c5c974269d82 add more clear instruction to the filename for compress * add address string normalization for 6f4073b8-d8ea-4ade-8a18-c5d1d5d5aa9a --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
4 months ago