kleinanzeigen-bot

mirror of https://github.com/Second-Hand-Friends/kleinanzeigen-bot.git synced 2026-03-12 02:31:45 +01:00

Author	SHA1	Message	Date
Jens	ed6137c8ae	fix: use native page xpath api for xpath selectors (#853) ## ℹ️ Description - Link to the related issue(s): n/a This replaces the stacked XPath work from #845 with a standalone fix from `main`. It makes `By.XPATH` use the native page XPath API instead of routing XPath selectors through text lookup. ## 📋 Changes Summary - Add private XPath helpers in `WebScrapingMixin` for first-match and all-match lookups. - Route `By.XPATH` in `_web_find_once()` and `_web_find_all_once()` through `page.xpath(...)`. - Add unit coverage for XPath helper behavior, empty results, and unsupported parent scoping. - No configuration changes or new dependencies. ### ⚙️ Type of Change - [x] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist - [x] I have reviewed my changes to ensure they meet the project's standards. - [x] I have tested my changes and ensured that all tests pass (`pdm run test`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. ## Summary by CodeRabbit * Refactoring * Improved web scraping element selection reliability through streamlined XPath operations and better internal helper methods. * Tests * Added comprehensive unit tests for XPath-based element lookup operations to ensure consistent behavior.	2026-03-01 21:00:34 +01:00
Jens	38e0f97578	feat: add grouped selector timeout fallback for login detection (#843)	2026-02-27 19:11:49 +01:00
Jens	50fc8781a9	feat: collect timeout timing sessions for diagnostics (#814)	2026-02-13 16:45:52 +01:00
Jens	4282b05ff3	fix: add explicit workspace mode resolution for --config (#818)	2026-02-11 05:35:41 +01:00
Jens	c212113638	fix: improve Windows browser autodetection paths and diagnose fallback (#816) ## ℹ️ Description This pull request fixes Windows browser auto-detection failures reported by users where `diagnose`/startup could not find an installed browser even when Chrome or Edge were present in standard locations. It also makes diagnostics resilient when auto-detection fails by avoiding an assertion-driven abort and continuing with a clear failure log. - Link to the related issue(s): Issue #815 - Describe the motivation and context for this change. - Users reported `Installed browser could not be detected` on Windows despite having a browser installed. - The previous Windows candidate list used a mix of incomplete paths and direct `os.environ[...]` lookups that could raise when variables were missing. - The updated path candidates and ordering were aligned with common Windows install locations used by Playwright’s channel/executable resolution logic (Chrome/Edge under `LOCALAPPDATA`, `PROGRAMFILES`, and `PROGRAMFILES(X86)`). ## 📋 Changes Summary - Expanded Windows browser path candidates in `get_compatible_browser()` to include common Google Chrome and Microsoft Edge install paths, while keeping Chromium and PATH fallbacks. - Replaced unsafe direct env-var indexing with safe retrieval (`os.environ.get(...)`) and added a fallback derivation for `LOCALAPPDATA` via `USERPROFILE\\AppData\\Local` when needed. - Kept legacy Chrome path candidates (`...\\Chrome\\Application\\chrome.exe`) as compatibility fallback. - Updated diagnostics flow to catch browser auto-detection assertion failures and continue with `(fail) No compatible browser found` instead of crashing. - Added/updated unit tests to verify: - Windows detection for LocalAppData Chrome/Edge/Chromium paths. - Missing Windows env vars no longer cause key lookup failures and still surface the intended final detection assertion. - `diagnose_browser_issues()` handles auto-detection assertion failures without raising and logs the expected failure message. ### ⚙️ Type of Change - [x] 🐞 Bug fix (non-breaking change which fixes an issue) ## Summary by CodeRabbit * Bug Fixes * Hardened Windows browser auto-detection: checks additional common installation locations for Chrome/Chromium/Edge and treats detection failures as non-fatal, allowing diagnostics to continue with fallback behavior and debug logging when no browser is found. * Tests * Expanded Windows detection tests to cover more path scenarios and added cases verifying failure-mode diagnostics and logging. * Style * Minor formatting tweak in default configuration.	2026-02-09 19:55:05 +01:00
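The safe environment lookup described in #816 could look roughly like the sketch below. It is not the project's `get_compatible_browser()` code: the helper names `local_app_data` and `windows_browser_candidates` are hypothetical, the candidate list is deliberately shortened to the well-known Chrome and Edge install suffixes, and the environment is passed in as a mapping so the logic can be exercised on any OS.

```python
import ntpath
from typing import Mapping, Optional


def local_app_data(env: Mapping[str, str]) -> Optional[str]:
    """Return LOCALAPPDATA, deriving it from USERPROFILE when it is unset."""
    value = env.get("LOCALAPPDATA")  # .get() never raises KeyError
    if value:
        return value
    profile = env.get("USERPROFILE")
    return ntpath.join(profile, "AppData", "Local") if profile else None


def windows_browser_candidates(env: Mapping[str, str]) -> list:
    """Build candidate browser paths, skipping roots whose variable is missing."""
    roots = [
        local_app_data(env),
        env.get("PROGRAMFILES"),
        env.get("PROGRAMFILES(X86)"),
    ]
    # Illustrative suffixes only; the real candidate list is longer.
    suffixes = [
        ("Google", "Chrome", "Application", "chrome.exe"),
        ("Microsoft", "Edge", "Application", "msedge.exe"),
    ]
    return [
        ntpath.join(root, *suffix)
        for root in roots
        if root  # a missing variable drops the root instead of raising
        for suffix in suffixes
    ]
```

In real use the mapping would be `os.environ`; an empty mapping simply yields no candidates, which leaves the final "no browser found" decision to the caller.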
Jens	ba9b14b71b	fix: address codeql notes and warnings (#740)	2025-12-20 18:17:51 +01:00
Bjoern147	5f68c09899	feat: Improved WebSelect Handling: Added Combobox Support, Enhanced Element Detection, and Smarter Option Matching (#679) ## ℹ️ Description Adds a WebSelect function for input/dropdown comboboxes. PR for issue/missing feature #677. # Fixes / Enhancements Finding special-attribute elements can fail because they are currently selected only by the `name="..."` attribute of the HTML element. If that lookup fails, the element is now also looked up by ID as a fallback (for example, the "brands" input/combobox for men's shoes...). When selecting a value in a `<select>`, matching no longer relies only on the option's actual value (`xxx` in the example `<option value="xxx">yyy</...>`) but also on the displayed text (`yyy` in the same example). This improves UX because the user does not have to look up the underlying `value` of the option and can use the text shown in the browser instead. Test cases for Webselect_Combobox were not added because the author was unsure how to mock async code properly. ## 📋 Changes Summary ✅ Fixes & Enhancements - New WebSelect Functionality - Improved Element Detection for Special Attributes - Enhanced `<select>` Option Matching Logic This improves UX and test robustness: users no longer need to know the exact underlying value, as matching also works with the visible label shown in the browser. 🧩 Result These updates make dropdown and combobox interactions more intuitive, resilient, and user-friendly across diverse HTML structures. ### ⚙️ Type of Change - [x] 🐞 Bug fix (non-breaking change which fixes an issue) - [x] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist - [x] I have reviewed my changes to ensure they meet the project's standards. - [ ] I have tested my changes and ensured that all tests pass (`pdm run test`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. ## Summary by CodeRabbit * Bug Fixes * Field lookup now falls back to locating by ID when name lookup times out. * Option selection uses a two-pass match (value then displayed text); JS-path failures now surface as timeouts. * Error and log messages localized and clarified. * New Features * Support for combobox-style inputs: type into the input, open dropdown, and select by visible text (handles special characters). * Tests * Added tests for combobox selection, missing dropdowns, no-match errors, value-path selection, and special-character handling. --------- Co-authored-by: Jens <1742418+1cu@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com>	2025-12-05 21:03:31 +01:00
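The two-pass option match from #679 (value first, then displayed text) can be illustrated with a small pure-Python sketch. The `Option` dataclass and `match_option` function are hypothetical names used only for illustration, and the trimmed, case-insensitive comparison in the second pass is an assumption; the commit message does not say how the project normalizes labels.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    value: str  # the <option value="..."> attribute
    label: str  # the text displayed in the browser


def match_option(options: Sequence[Option], wanted: str) -> Optional[Option]:
    """Return the option matching `wanted`, preferring the value attribute."""
    # Pass 1: exact match on the underlying value attribute.
    for option in options:
        if option.value == wanted:
            return option
    # Pass 2: fall back to the displayed text.
    # Assumption: labels are compared trimmed and case-insensitively.
    needle = wanted.strip().casefold()
    for option in options:
        if option.label.strip().casefold() == needle:
            return option
    return None
```

Running the value pass to completion before looking at labels keeps existing configurations stable: a string that happens to be both one option's value and another option's label still resolves to the value match.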
Jens	220c01f257	fix: eliminate async safety violations and migrate to pathlib (#697) ## ℹ️ Description Eliminate all blocking I/O operations in async contexts and modernize file path handling by migrating from os.path to pathlib.Path. - Link to the related issue(s): #692 - Get rid of the TODO in pyproject.toml - The added debug logging will ease the troubleshooting for path related issues. ## 📋 Changes Summary - Enable ASYNC210, ASYNC230, ASYNC240, ASYNC250 Ruff rules - Wrap blocking urllib.request.urlopen() in run_in_executor - Wrap blocking file operations (open, write) in run_in_executor - Replace blocking os.path calls with async helpers using run_in_executor - Replace blocking input() with await ainput() - Migrate extract.py from os.path to pathlib.Path - Use Path() constructor and / operator for path joining - Use Path.mkdir(), Path.rename() in executor instead of os functions - Create mockable _path_exists() and _path_is_dir() helpers - Add debug logging for all file system operations ### ⚙️ Type of Change - [X] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## Summary by CodeRabbit * Refactor * Made user prompt non‑blocking to improve responsiveness. * Converted filesystem/path handling and prefs I/O to async‑friendly operations; moved blocking network and file work to background tasks. * Added async file/path helpers and async port‑check before browser connections. * Tests * Expanded unit tests for path helpers, image download success/failure, prefs writing, and directory creation/renaming workflows.	2025-12-05 20:53:40 +01:00
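The executor pattern listed in #697 can be sketched as below. This is an illustration of the general technique, not the project's helpers: `path_exists` and `write_text` are hypothetical names (the commit mentions `_path_exists()` and `_path_is_dir()` but does not show their bodies), and the default thread-pool executor is assumed.

```python
import asyncio
from pathlib import Path


async def path_exists(path: Path) -> bool:
    """Check for existence without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # Path.exists() hits the filesystem, so it runs in the default thread pool.
    return await loop.run_in_executor(None, path.exists)


async def write_text(path: Path, content: str) -> None:
    """Create parent directories and write a file from a worker thread."""

    def _write() -> None:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")

    await asyncio.get_running_loop().run_in_executor(None, _write)
```

Keeping each blocking call behind a small named coroutine also makes it easy to patch in tests, which matches the "mockable helpers" goal stated in the changes summary.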
Jens	89df56bf8b	test: strengthen coverage for sessions, logging, and update check (#686) ## ℹ️ Description * Strengthen the session/logging/update-check tests to exercise real resources and guards while bringing the update-check docs in line with the supported interval units. - Link to the related issue(s): Issue #N/A ## 📋 Changes Summary - Reworked the `WebScrapingMixin` session tests so they capture each `stop` handler before the browser reference is nulled, ensuring cleanup logic is exercised without crashing. - Added targeted publish and update-check tests that patch the async helpers, guard logic, and logging handlers while confirming `requests.get` is skipped when the state gate is closed. - Updated `docs/update-check.md` to list only the actually supported interval units (up to 30 days) and noted the new guard coverage in the changelog. ### ⚙️ Type of Change - [x] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## Summary by CodeRabbit * Tests * Expanded test coverage for publish workflow orchestration and update checking interval behavior. * Added comprehensive browser session cleanup tests, including idempotent operations and edge case handling. * Consolidated logging configuration tests with improved handler management validation. * Refined test fixtures and assertions for better test reliability.	2025-11-17 11:02:18 +01:00
Jens	a3ac27c441	feat: add configurable timeouts (#673) ## ℹ️ Description - Related issues: #671, #658 - Introduces configurable timeout controls plus retry/backoff handling for flaky DOM operations. We often see timeouts that are not reproducible in certain configurations. I suspect they stem from a combination of internet speed, browser, OS, age of the computer, and the weather. This PR introduces a comprehensive config model to tweak timeouts. ## 📋 Changes Summary - add TimeoutConfig to the main config/schema and expose timeouts in README/docs - wire WebScrapingMixin, extractor, update checker, and browser diagnostics to honor the configurable timeouts and retries - update translations/tests to cover the new behaviour and ensure lint/mypy/pyright pipelines remain green ### ⚙️ Type of Change - [ ] 🐞 Bug fix (non-breaking change which fixes an issue) - [x] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## Summary by CodeRabbit * New Features * Centralized, configurable timeout system for web interactions, detection flows, publishing, and pagination. * Optional retry with exponential backoff for operations that time out. * Improvements * Replaced fixed wait times with dynamic timeouts throughout workflows. * More informative timeout-related messages and diagnostics. * Tests * New and expanded test coverage for timeout behavior, pagination, diagnostics, and retry logic.	2025-11-13 15:08:52 +01:00
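A retry-with-backoff wrapper of the kind #673 describes might look like the following sketch. The function name `with_timeout_retry` and its parameters are hypothetical and are not taken from the project's `TimeoutConfig`. The commit message does not say whether the backoff grows the timeout itself or a pause between attempts; this version multiplies the timeout budget after every timed-out attempt.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_timeout_retry(
    operation: Callable[[], Awaitable[T]],
    *,
    timeout: float,
    max_attempts: int = 3,
    backoff_factor: float = 2.0,
) -> T:
    """Run `operation` under a timeout, retrying with a growing budget."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current = timeout
    for attempt in range(1, max_attempts + 1):
        try:
            # A fresh awaitable is created for every attempt.
            return await asyncio.wait_for(operation(), timeout=current)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last timeout
            current *= backoff_factor  # exponential growth of the budget
    raise AssertionError("unreachable")
```

Passing a zero-argument callable rather than an already-created coroutine matters here, because a coroutine object can only be awaited once.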
Jens	33d1964f86	feat: speed up and stabilise test suite (#676) ## ℹ️ Description - Link to the related issue(s): Issue # Refactors the test harness for faster and more reliable feedback: adds deterministic time freezing for update checks, accelerates and refactors smoke tests to run in-process, defaults pytest to xdist with durations tracking, and adjusts CI triggers so PRs run the test matrix only once. ## 📋 Changes Summary - add pytest-xdist + durations reporting defaults, force deterministic locale and slow markers, and document the workflow adjustments - run smoke tests in-process (no subprocess churn), mock update checks/logging, and mark slow specs appropriately - deflake update check interval tests by freezing datetime and simplify FixedDateTime helper - limit GitHub Actions `push` trigger to `main` so feature branches rely on the single pull_request run ### ⚙️ Type of Change - [ ] 🐞 Bug fix (non-breaking change which fixes an issue) - [x] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## Summary by CodeRabbit * Tests * Ensure tests run in a consistent English locale and restore prior locale after each run * Mark integration scraping tests as slow for clearer categorization * Replace subprocess-based CLI tests with an in-process runner that returns structured results and captures combined stdout/stderr/logs; disable update checks during smoke tests * Freeze current time in update-check tests for deterministic assertions * Add mock for process enumeration in web‑scraping unit tests to stabilize macOS-specific warnings	2025-11-12 21:29:51 +01:00
Jens	339d66ed47	feat: Replace custom RemoteObject wrapper with direct NoDriver 0.47+ usage (#652) ## ℹ️ Description Replace custom RemoteObject serialization wrapper with direct NoDriver 0.47+ RemoteObject API usage for better performance and maintainability. - Motivation: The custom wrapper was unnecessary complexity when NoDriver 0.47+ provides direct RemoteObject API - Context: Upgrading from NoDriver 0.39 to 0.47 introduced RemoteObject, and we want to use it as intended - Goal: Future-proof implementation using the standard NoDriver patterns ## 📋 Changes Summary - Replace custom serialization wrapper with direct RemoteObject API usage - Implement proper RemoteObject detection and conversion in web_execute() - Add comprehensive _convert_remote_object_value() method for recursive conversion - Handle key/value list format from deep_serialized_value.value - Add type guards and proper type checking for RemoteObject instances - Maintain internal API stability while using RemoteObject as intended - Add 19 comprehensive test cases covering all conversion scenarios - Application tested and working with real ad download, update and publish ### ⚙️ Type of Change - [x] ✨ New feature (adds new functionality without breaking existing usage) - [x] 🐞 Bug fix (non-breaking change which fixes an issue)	2025-10-20 08:52:06 +02:00
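The recursive conversion of the key/value list format mentioned in #652 could be sketched as follows. This is a guess at the general shape, not the project's `_convert_remote_object_value()`: it assumes each serialized node is a plain dict with a `type` field and an optional `value` field, that objects carry a list of `[key, node]` pairs, and that arrays carry a list of nodes. The function name `convert_serialized` is hypothetical, and real NoDriver `RemoteObject` instances are not used here.

```python
from typing import Any


def convert_serialized(node: Any) -> Any:
    """Recursively turn a deep-serialized node into plain Python data.

    Assumed node shape: {"type": <str>, "value": <payload>}.
    """
    if not isinstance(node, dict) or "type" not in node:
        return node  # already a plain value
    kind = node["type"]
    value = node.get("value")
    if kind == "object" and isinstance(value, list):
        # Objects arrive as a list of [key, serialized-node] pairs.
        return {key: convert_serialized(item) for key, item in value}
    if kind == "array" and isinstance(value, list):
        return [convert_serialized(item) for item in value]
    # Primitives carry their payload directly; nodes without one become None.
    return value
```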
Sebastian Thomschke	dadd08aedb	build: upgrade to Python 3.14 (#636) Co-authored-by: Jens <1742418+1cu@users.noreply.github.com>	2025-10-14 15:56:35 +02:00
Jens Bergmann	37a36988c3	fix: improve Chrome version detection to reuse existing browsers (#615)	2025-08-20 12:51:13 +02:00
Jens Bergmann	332926519d	feat: chrome version detection clean (#607)	2025-08-18 13:19:50 +02:00
Jens Bergmann	c9d04da70d	feat: browser connection improvements (#601)	2025-08-13 09:29:25 +02:00
Jens Bergmann	50656ad7e2	feat: Improve test coverage (#515) * test: implement comprehensive test coverage improvements This commit improves test coverage across multiple modules, adding unit tests for core functionality. Key improvements: 1. WebScrapingMixin: - Add comprehensive async error handling tests - Add session management tests (browser crash recovery, session expiration) - Add element interaction tests (custom wait conditions, timeouts) - Add browser configuration tests (extensions, preferences) - Add robust awaitable mocking infrastructure - Rename integration test file to avoid naming conflicts 2. Error Handlers: - Add tests for error message formatting - Add tests for error recovery scenarios - Add tests for error logging functionality 3. Network Utilities: - Add tests for port checking functionality - Add tests for network error handling - Add tests for connection management 4. Pydantic Models: - Add tests for validation cases - Add tests for error handling - Add tests for complex validation scenarios Technical details: - Use TrulyAwaitableMockPage for proper async testing - Add comprehensive mocking for browser and page objects - Add proper cleanup in session management tests - Add browser-specific configuration tests (Chrome/Edge) - Add proper type hints and docstrings Files changed: - Renamed: tests/integration/test_web_scraping_mixin.py → tests/integration/test_web_scraping_mixin_integration.py - Added: tests/unit/test_error_handlers.py - Added: tests/unit/test_net.py - Added: tests/unit/test_pydantics.py - Added: tests/unit/test_web_scraping_mixin.py * test: enhance test coverage with additional edge cases and scenarios This commit extends the test coverage improvements with additional test cases and edge case handling, focusing on browser configuration, error handling, and file utilities. Key improvements: 1. WebScrapingMixin: - Add comprehensive browser binary location detection tests - Add cross-platform browser path detection (Linux, macOS, Windows) - Add browser profile configuration tests - Add session state persistence tests - Add external process termination handling - Add session creation error cleanup tests - Improve browser argument configuration tests - Add extension loading validation tests 2. Error Handlers: - Add debug mode error handling tests - Add specific error type tests (AttributeError, ImportError, NameError, TypeError) - Improve error message formatting tests - Add traceback inclusion verification 3. Pydantic Models: - Add comprehensive validation error message tests - Add tests for various error codes and contexts - Add tests for pluralization in error messages - Add tests for empty error list handling - Add tests for context handling in validation errors 4. File Utilities: - Add comprehensive path resolution tests - Add tests for file and directory reference handling - Add tests for special path cases - Add tests for nonexistent path handling - Add tests for absolute and relative path conversion Technical details: - Add proper type casting for test fixtures - Improve test isolation and cleanup - Add platform-specific browser path detection - Add proper error context handling - Add comprehensive error message formatting tests - Add proper cleanup in session management tests - Add browser-specific configuration tests - Add proper path normalization and resolution tests * fix(test): handle Linux browser paths in web_scraping_mixin test Update mock_exists to properly detect Linux browser binaries in test_browser_profile_configuration, fixing the "Installed browser could not be detected" error. * fix(test): handle Windows browser paths in web_scraping_mixin test Add Windows browser paths to mock_exists function to properly detect browser binaries on Windows platform, fixing the "Specified browser binary does not exist" error.	2025-05-18 19:02:59 +02:00

17 Commits