feat: add configurable timeouts (#673)

## ℹ️ Description
- Related issues: #671, #658
- Introduces configurable timeout controls plus retry/backoff handling
for flaky DOM operations.

We often see timeouts that are not reproducible in certain
configurations. I suspect they stem from a combination of
internet speed, browser, OS, age of the computer, and the weather.

This PR introduces a comprehensive config model to tweak timeouts.

## 📋 Changes Summary
- add TimeoutConfig to the main config/schema and expose timeouts in
README/docs
- wire WebScrapingMixin, extractor, update checker, and browser
diagnostics to honor the configurable timeouts and retries
- update translations/tests to cover the new behaviour and ensure
lint/mypy/pyright pipelines remain green

### ⚙️ Type of Change
- [ ] 🐞 Bug fix (non-breaking change which fixes an issue)
- [x] New feature (adds new functionality without breaking existing
usage)
- [ ] 💥 Breaking change (changes that might break existing user setups,
scripts, or configurations)

## Checklist
- [x] I have reviewed my changes to ensure they meet the project's
standards.
- [x] I have tested my changes and ensured that all tests pass (`pdm run
test`).
- [x] I have formatted the code (`pdm run format`).
- [x] I have verified that linting passes (`pdm run lint`).
- [x] I have updated documentation where necessary.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Centralized, configurable timeout system for web interactions,
detection flows, publishing, and pagination.
* Optional retry with exponential backoff for operations that time out.

* **Improvements**
* Replaced fixed wait times with dynamic timeouts throughout workflows.
  * More informative timeout-related messages and diagnostics.

* **Tests**
* New and expanded test coverage for timeout behavior, pagination,
diagnostics, and retry logic.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This commit is contained in:
Jens
2025-11-13 15:08:52 +01:00
committed by GitHub
parent ac678ed888
commit a3ac27c441
16 changed files with 972 additions and 121 deletions

View File

@@ -187,6 +187,14 @@ All Python files must start with SPDX license headers:
- Use appropriate log levels (DEBUG, INFO, WARNING, ERROR)
- Log important state changes and decision points
#### Timeout configuration
- The default timeout (`timeouts.default`) already wraps all standard DOM helpers (`web_find`, `web_click`, etc.) via `WebScrapingMixin._timeout/_effective_timeout`. Use it unless a workflow clearly needs a different SLA.
- Reserve `timeouts.quick_dom` for transient overlays (shipping dialogs, payment prompts, toast banners) that should render almost instantly; call `self._timeout("quick_dom")` in those spots to keep the UI responsive.
- For single selectors that occasionally need more headroom, pass an inline override instead of creating a new config key, e.g. `custom = self._timeout(override = 12.5); await self.web_find(..., timeout = custom)`.
- Use `_timeout()` when you just need the raw configured value (with optional override); use `_effective_timeout()` when you rely on the global multiplier and retry backoff for a given attempt (e.g. inside `_run_with_timeout_retries`).
- Add a new timeout key only when a recurring workflow has its own timing profile (pagination, captcha detection, publishing confirmations, Chrome probes, etc.). Whenever you add one, extend `TimeoutConfig`, document it in the sample `timeouts:` block in `README.md`, and explain it in `docs/BROWSER_TROUBLESHOOTING.md`.
- Encourage users to raise `timeouts.multiplier` when everything is slow, and override existing keys in `config.yaml` before introducing new ones. This keeps the configuration surface minimal.
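The difference between the raw and the effective value can be sketched as follows. This is a minimal stand-in that assumes the `resolve`/`effective` semantics of `TimeoutConfig`; the `TimeoutSketch` dataclass and its reduced field set are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TimeoutSketch:
    """Hypothetical, reduced stand-in for TimeoutConfig (illustration only)."""

    multiplier: float = 1.0
    default: float = 5.0
    quick_dom: float = 2.0
    retry_backoff_factor: float = 1.5

    def resolve(self, key: str = "default", override: float | None = None) -> float:
        # Raw configured value: an explicit override wins,
        # unknown keys fall back to the default timeout.
        if override is not None:
            return float(override)
        value = getattr(self, key, None)
        if isinstance(value, (int, float)):
            return float(value)
        return float(self.default)

    def effective(self, key: str = "default", override: float | None = None, *, attempt: int = 0) -> float:
        # Effective value: raw value scaled by the global multiplier
        # and by the backoff factor once per retry attempt.
        backoff = self.retry_backoff_factor ** attempt if attempt > 0 else 1.0
        return self.resolve(key, override) * self.multiplier * backoff


timeouts = TimeoutSketch(multiplier=2.0)
print(timeouts.resolve("quick_dom"))             # raw configured value
print(timeouts.effective("quick_dom"))           # scaled by the multiplier
print(timeouts.effective("default", attempt=1))  # multiplier plus one backoff step
```

In this sketch the raw value ignores `multiplier` entirely, which is why a helper that relies on retries would ask for the effective value per attempt instead.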
#### Examples
```python ```python
def parse_duration(text: str) -> timedelta: def parse_duration(text: str) -> timedelta:
@@ -297,4 +305,3 @@ See the [LICENSE.txt](LICENSE.txt) file for our project's licensing. All source
- Use the translation system for all output—**never hardcode German or other languages** in the code.
- If you add or change a user-facing message, update the translation file and ensure that translation completeness tests pass (`tests/unit/test_translations.py`).
- Review the translation guidelines and patterns in the codebase for correct usage.

View File

@@ -277,6 +277,27 @@ categories:
Verschenken & Tauschen > Verleihen: 272/274
Verschenken & Tauschen > Verschenken: 272/192
# timeout tuning (optional)
timeouts:
multiplier: 1.0 # Scale all timeouts (e.g. 2.0 for slower networks)
default: 5.0 # Base timeout for web_find/web_click/etc.
page_load: 15.0 # Timeout for web_open page loads
captcha_detection: 2.0 # Timeout for captcha iframe detection
sms_verification: 4.0 # Timeout for SMS verification banners
gdpr_prompt: 10.0 # Timeout when handling GDPR dialogs
publishing_result: 300.0 # Timeout for publishing status checks
publishing_confirmation: 20.0 # Timeout for publish confirmation redirect
pagination_initial: 10.0 # Timeout for first pagination lookup
pagination_follow_up: 5.0 # Timeout for subsequent pagination clicks
quick_dom: 2.0 # Generic short DOM timeout (shipping dialogs, etc.)
update_check: 10.0 # Timeout for GitHub update requests
chrome_remote_probe: 2.0 # Timeout for local remote-debugging probes
chrome_remote_debugging: 5.0 # Timeout for remote debugging API calls
chrome_binary_detection: 10.0 # Timeout for chrome --version subprocess
retry_enabled: true # Enables DOM retry/backoff when timeouts occur
retry_max_attempts: 2
retry_backoff_factor: 1.5
# download configuration
download:
include_all_matching_shipping_options: false # if true, all shipping options matching the package size will be included
@@ -329,6 +350,8 @@ login:
password: ""
``` ```
Slow networks or sluggish remote browsers often just need a higher `timeouts.multiplier`, while truly problematic selectors can get explicit values directly under `timeouts`. Remember to regenerate the schemas after changing the configuration model so editors stay in sync.
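As a worked example of how the two knobs combine, the sketch below merges per-key overrides into a subset of the defaults from the sample `timeouts:` block and then applies the multiplier. The `DEFAULTS` dictionary and the `effective_timeouts` helper are hypothetical illustrations; the bot itself resolves these values through its configuration model.

```python
# Subset of the default values from the sample `timeouts:` block.
DEFAULTS = {
    "default": 5.0,
    "page_load": 15.0,
    "pagination_initial": 10.0,
    "quick_dom": 2.0,
}


def effective_timeouts(overrides: dict[str, float], multiplier: float = 1.0) -> dict[str, float]:
    """Merge per-key overrides into the defaults, then scale every value."""
    merged = {**DEFAULTS, **overrides}
    return {key: value * multiplier for key, value in merged.items()}


# A slow network (multiplier 2.0) plus one problematic selector
# (pagination_initial raised from 10 to 20 seconds).
result = effective_timeouts({"pagination_initial": 20.0}, multiplier=2.0)
for key, seconds in result.items():
    print(f"{key}: {seconds:.1f}s")
```

Under these assumptions the multiplier also scales explicitly overridden keys, so raising both at once compounds.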
### <a name="ad-config"></a>2) Ad configuration
Each ad is described in a separate JSON or YAML file with prefix `ad_<filename>`. The prefix is configurable in config file.

View File

@@ -59,6 +59,18 @@ Please update your configuration to include --user-data-dir for remote debugging
The bot will also provide specific instructions on how to fix your configuration.
### Issue: Slow page loads or recurring TimeoutError
**Symptoms:**
- `_extract_category_from_ad_page` fails intermittently due to breadcrumb lookups timing out
- Captcha/SMS/GDPR prompts appear right after a timeout
- Requests to GitHub's API fail sporadically with timeout errors
**Solutions:**
1. Increase `timeouts.multiplier` in `config.yaml` (e.g. `2.0` doubles every timeout consistently).
2. Override specific keys under `timeouts` (e.g. `pagination_initial: 20.0`) if only a single selector is problematic.
3. Keep `retry_enabled` on so that DOM lookups are retried with exponential backoff.
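How retry with exponential backoff interacts with the multiplier can be sketched as below. This is not the bot's actual retry helper: `run_with_retries` and its parameters are hypothetical, and the sketch assumes one initial attempt followed by up to `max_retries` retries, each with a timeout grown by `backoff_factor`.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional, TypeVar

T = TypeVar("T")


async def run_with_retries(
    operation: Callable[[float], Awaitable[T]],
    *,
    base_timeout: float = 5.0,
    multiplier: float = 1.0,
    backoff_factor: float = 1.5,
    max_retries: int = 2,
) -> T:
    """Call `operation(timeout)` again with a longer timeout after each TimeoutError."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error: Optional[TimeoutError] = None
    for attempt in range(max_retries + 1):
        # Attempt 0 uses the plain scaled timeout; later attempts grow exponentially.
        timeout = base_timeout * multiplier * backoff_factor ** attempt
        try:
            return await operation(timeout)
        except TimeoutError as exc:
            last_error = exc
    assert last_error is not None
    raise last_error


attempted_timeouts: list[float] = []


async def flaky_lookup(timeout: float) -> str:
    # Simulated DOM lookup that only succeeds when given at least 7 seconds.
    attempted_timeouts.append(timeout)
    if timeout < 7.0:
        raise TimeoutError(f"not found within {timeout:.1f}s")
    return "element"


found = asyncio.run(run_with_retries(flaky_lookup))
print(found, attempted_timeouts)
```

With the defaults shown, the first attempt waits 5.0 seconds and the first retry 7.5 seconds, so a lookup that is only slightly too slow succeeds without any configuration change.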
## Common Issues and Solutions
### Issue 1: "Failed to connect to browser" with "root" error

View File

@@ -359,6 +359,137 @@
"title": "PublishingConfig",
"type": "object"
},
"TimeoutConfig": {
"properties": {
"multiplier": {
"default": 1.0,
"description": "Global multiplier applied to all timeout values.",
"minimum": 0.1,
"title": "Multiplier",
"type": "number"
},
"default": {
"type": "number",
"minimum": 0.0,
"default": 5.0,
"description": "Baseline timeout for DOM interactions.",
"title": "Default"
},
"page_load": {
"default": 15.0,
"description": "Page load timeout for web_open.",
"minimum": 1.0,
"title": "Page Load",
"type": "number"
},
"captcha_detection": {
"default": 2.0,
"description": "Timeout for captcha iframe detection.",
"minimum": 0.1,
"title": "Captcha Detection",
"type": "number"
},
"sms_verification": {
"default": 4.0,
"description": "Timeout for SMS verification prompts.",
"minimum": 0.1,
"title": "Sms Verification",
"type": "number"
},
"gdpr_prompt": {
"default": 10.0,
"description": "Timeout for GDPR/consent dialogs.",
"minimum": 1.0,
"title": "Gdpr Prompt",
"type": "number"
},
"publishing_result": {
"default": 300.0,
"description": "Timeout for publishing result checks.",
"minimum": 10.0,
"title": "Publishing Result",
"type": "number"
},
"publishing_confirmation": {
"default": 20.0,
"description": "Timeout for publish confirmation redirect.",
"minimum": 1.0,
"title": "Publishing Confirmation",
"type": "number"
},
"pagination_initial": {
"default": 10.0,
"description": "Timeout for initial pagination lookup.",
"minimum": 1.0,
"title": "Pagination Initial",
"type": "number"
},
"pagination_follow_up": {
"default": 5.0,
"description": "Timeout for subsequent pagination navigation.",
"minimum": 1.0,
"title": "Pagination Follow Up",
"type": "number"
},
"quick_dom": {
"default": 2.0,
"description": "Generic short timeout for transient UI.",
"minimum": 0.1,
"title": "Quick Dom",
"type": "number"
},
"update_check": {
"default": 10.0,
"description": "Timeout for GitHub update checks.",
"minimum": 1.0,
"title": "Update Check",
"type": "number"
},
"chrome_remote_probe": {
"default": 2.0,
"description": "Timeout for local remote-debugging probes.",
"minimum": 0.1,
"title": "Chrome Remote Probe",
"type": "number"
},
"chrome_remote_debugging": {
"default": 5.0,
"description": "Timeout for remote debugging API calls.",
"minimum": 1.0,
"title": "Chrome Remote Debugging",
"type": "number"
},
"chrome_binary_detection": {
"default": 10.0,
"description": "Timeout for chrome --version subprocesses.",
"minimum": 1.0,
"title": "Chrome Binary Detection",
"type": "number"
},
"retry_enabled": {
"default": true,
"description": "Enable built-in retry/backoff for DOM operations.",
"title": "Retry Enabled",
"type": "boolean"
},
"retry_max_attempts": {
"default": 2,
"description": "Max retry attempts when retry is enabled.",
"minimum": 1,
"title": "Retry Max Attempts",
"type": "integer"
},
"retry_backoff_factor": {
"default": 1.5,
"description": "Exponential factor applied per retry attempt.",
"minimum": 1.0,
"title": "Retry Backoff Factor",
"type": "number"
}
},
"title": "TimeoutConfig",
"type": "object"
},
"UpdateCheckConfig": {
"description": "Configuration for update checking functionality.\n\nAttributes:\n enabled: Whether update checking is enabled.\n channel: Which release channel to check ('latest' for stable, 'preview' for prereleases).\n interval: How often to check for updates (e.g. '7d', '1d').\n If the interval is invalid, too short (<1d), or too long (>30d),\n the bot will log a warning and use a default interval for this run:\n - 1d for 'preview' channel\n - 7d for 'latest' channel\n The config file is not changed automatically; please fix your config to avoid repeated warnings.",
"properties": {
@@ -428,6 +559,10 @@
"update_check": {
"$ref": "#/$defs/UpdateCheckConfig",
"description": "Update check configuration"
},
"timeouts": {
"$ref": "#/$defs/TimeoutConfig",
"description": "Centralized timeout configuration."
}
},
"title": "Config",

View File

@@ -573,8 +573,9 @@ class KleinanzeigenBot(WebScrapingMixin):
async def check_and_wait_for_captcha(self, *, is_login_page:bool = True) -> None:
try:
+ captcha_timeout = self._timeout("captcha_detection")
await self.web_find(By.CSS_SELECTOR,
- "iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']", timeout = 2)
+ "iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']", timeout = captcha_timeout)
if not is_login_page and self.config.captcha.auto_restart:
LOG.warning("Captcha recognized - auto-restart enabled, abort run...")
@@ -624,7 +625,8 @@ class KleinanzeigenBot(WebScrapingMixin):
async def handle_after_login_logic(self) -> None:
try:
- await self.web_find(By.TEXT, "Wir haben dir gerade einen 6-stelligen Code für die Telefonnummer", timeout = 4)
+ sms_timeout = self._timeout("sms_verification")
+ await self.web_find(By.TEXT, "Wir haben dir gerade einen 6-stelligen Code für die Telefonnummer", timeout = sms_timeout)
LOG.warning("############################################")
LOG.warning("# Device verification message detected. Please follow the instruction displayed in the Browser.")
LOG.warning("############################################")
@@ -634,9 +636,12 @@ class KleinanzeigenBot(WebScrapingMixin):
try:
LOG.info("Handling GDPR disclaimer...")
- await self.web_find(By.ID, "gdpr-banner-accept", timeout = 10)
+ gdpr_timeout = self._timeout("gdpr_prompt")
+ await self.web_find(By.ID, "gdpr-banner-accept", timeout = gdpr_timeout)
await self.web_click(By.ID, "gdpr-banner-cmp-button")
- await self.web_click(By.XPATH, "//div[@id='ConsentManagementPage']//*//button//*[contains(., 'Alle ablehnen und fortfahren')]", timeout = 10)
+ await self.web_click(By.XPATH,
+ "//div[@id='ConsentManagementPage']//*//button//*[contains(., 'Alle ablehnen und fortfahren')]",
+ timeout = gdpr_timeout)
except TimeoutError:
pass
@@ -724,7 +729,8 @@ class KleinanzeigenBot(WebScrapingMixin):
count += 1
await self.publish_ad(ad_file, ad_cfg, ad_cfg_orig, published_ads, AdUpdateStrategy.REPLACE)
- await self.web_await(self.__check_publishing_result, timeout = 5 * 60)
+ publish_timeout = self._timeout("publishing_result")
+ await self.web_await(self.__check_publishing_result, timeout = publish_timeout)
if self.config.publishing.delete_old_ads == "AFTER_PUBLISH" and not self.keep_old_ads:
await self.delete_ad(ad_cfg, published_ads, delete_old_ads_by_title = False)
@@ -924,7 +930,8 @@ class KleinanzeigenBot(WebScrapingMixin):
# wait for payment form if commercial account is used
#############################
try:
- await self.web_find(By.ID, "myftr-shppngcrt-frm", timeout = 2)
+ short_timeout = self._timeout("quick_dom")
+ await self.web_find(By.ID, "myftr-shppngcrt-frm", timeout = short_timeout)
LOG.warning("############################################")
LOG.warning("# Payment form detected! Please proceed with payment.")
@@ -934,7 +941,8 @@ class KleinanzeigenBot(WebScrapingMixin):
except TimeoutError:
pass
- await self.web_await(lambda: "p-anzeige-aufgeben-bestaetigung.html?adId=" in self.page.url, timeout = 20)
+ confirmation_timeout = self._timeout("publishing_confirmation")
+ await self.web_await(lambda: "p-anzeige-aufgeben-bestaetigung.html?adId=" in self.page.url, timeout = confirmation_timeout)
# extract the ad id from the URL's query parameter
current_url_query_params = urllib_parse.parse_qs(urllib_parse.urlparse(self.page.url).query)
@@ -986,7 +994,8 @@ class KleinanzeigenBot(WebScrapingMixin):
count += 1
await self.publish_ad(ad_file, ad_cfg, ad_cfg_orig, published_ads, AdUpdateStrategy.MODIFY)
- await self.web_await(self.__check_publishing_result, timeout = 5 * 60)
+ publish_timeout = self._timeout("publishing_result")
+ await self.web_await(self.__check_publishing_result, timeout = publish_timeout)
LOG.info("############################################")
LOG.info("DONE: updated %s", pluralize("ad", count))
@@ -1080,6 +1089,7 @@ class KleinanzeigenBot(WebScrapingMixin):
LOG.debug("Successfully set attribute field [%s] to [%s]...", special_attribute_key, special_attribute_value_str)
async def __set_shipping(self, ad_cfg:Ad, mode:AdUpdateStrategy = AdUpdateStrategy.REPLACE) -> None:
+ short_timeout = self._timeout("quick_dom")
if ad_cfg.shipping_type == "PICKUP":
try:
await self.web_click(By.ID, "radio-pickup")
@@ -1091,7 +1101,7 @@ class KleinanzeigenBot(WebScrapingMixin):
if mode == AdUpdateStrategy.MODIFY:
try:
# when "Andere Versandmethoden" is not available, go back and start over new
- await self.web_find(By.XPATH, '//dialog//button[contains(., "Andere Versandmethoden")]', timeout = 2)
+ await self.web_find(By.XPATH, '//dialog//button[contains(., "Andere Versandmethoden")]', timeout = short_timeout)
except TimeoutError:
await self.web_click(By.XPATH, '//dialog//button[contains(., "Zurück")]')
@@ -1120,7 +1130,7 @@ class KleinanzeigenBot(WebScrapingMixin):
# (important for mode = UPDATE)
await self.web_find(By.XPATH,
'//input[contains(@placeholder, "Versandkosten (optional)")]',
- timeout = 2)
+ timeout = short_timeout)
except TimeoutError:
await self.web_click(By.XPATH, '//*[contains(@id, "INDIVIDUAL") and contains(@data-testid, "Individueller Versand")]')

View File

@@ -33,7 +33,7 @@ class AdExtractor(WebScrapingMixin):
def __init__(self, browser:Browser, config:Config) -> None:
super().__init__()
self.browser = browser
- self.config = config
+ self.config:Config = config
async def download_ad(self, ad_id:int) -> None:
"""
@@ -146,9 +146,10 @@ class AdExtractor(WebScrapingMixin):
# --- Pagination handling ---
multi_page = False
+ pagination_timeout = self._timeout("pagination_initial")
try:
# Correct selector: Use uppercase '.Pagination'
- pagination_section = await self.web_find(By.CSS_SELECTOR, ".Pagination", timeout = 10) # Increased timeout slightly
+ pagination_section = await self.web_find(By.CSS_SELECTOR, ".Pagination", timeout = pagination_timeout) # Increased timeout slightly
# Correct selector: Use 'aria-label'
# Also check if the button is actually present AND potentially enabled (though enabled check isn't strictly necessary here, only for clicking later)
next_buttons = await self.web_find_all(By.CSS_SELECTOR, 'button[aria-label="Nächste"]', parent = pagination_section)
@@ -204,9 +205,10 @@ class AdExtractor(WebScrapingMixin):
break
# --- Navigate to next page ---
+ follow_up_timeout = self._timeout("pagination_follow_up")
try:
# Find the pagination section again (scope might have changed after scroll/wait)
- pagination_section = await self.web_find(By.CSS_SELECTOR, ".Pagination", timeout = 5)
+ pagination_section = await self.web_find(By.CSS_SELECTOR, ".Pagination", timeout = follow_up_timeout)
# Find the "Next" button using the correct aria-label selector and ensure it's not disabled
next_button_element = None
possible_next_buttons = await self.web_find_all(By.CSS_SELECTOR, 'button[aria-label="Nächste"]', parent = pagination_section)
@@ -432,8 +434,19 @@ class AdExtractor(WebScrapingMixin):
# Fallback to legacy selectors in case the breadcrumb structure is unexpected.
LOG.debug(_("Falling back to legacy breadcrumb selectors; collected ids: %s"), category_ids)
+ fallback_timeout = self._effective_timeout()
+ try:
category_first_part = await self.web_find(By.CSS_SELECTOR, "a:nth-of-type(2)", parent = category_line)
category_second_part = await self.web_find(By.CSS_SELECTOR, "a:nth-of-type(3)", parent = category_line)
+ except TimeoutError as exc:
+ LOG.error(
+ "Legacy breadcrumb selectors not found within %.1f seconds (collected ids: %s)",
+ fallback_timeout,
+ category_ids
+ )
+ raise TimeoutError(
+ _("Unable to locate breadcrumb fallback selectors within %(seconds).1f seconds.") % {"seconds": fallback_timeout}
+ ) from exc
href_first:str = str(category_first_part.attrs["href"])
href_second:str = str(category_second_part.attrs["href"])
cat_num_first_raw = href_first.rsplit("/", maxsplit = 1)[-1]

View File

@@ -114,6 +114,55 @@ class CaptchaConfig(ContextualModel):
restart_delay:str = "6h"
class TimeoutConfig(ContextualModel):
multiplier:float = Field(
default = 1.0,
ge = 0.1,
description = "Global multiplier applied to all timeout values."
)
default:float = Field(default = 5.0, ge = 0.0, description = "Baseline timeout for DOM interactions.")
page_load:float = Field(default = 15.0, ge = 1.0, description = "Page load timeout for web_open.")
captcha_detection:float = Field(default = 2.0, ge = 0.1, description = "Timeout for captcha iframe detection.")
sms_verification:float = Field(default = 4.0, ge = 0.1, description = "Timeout for SMS verification prompts.")
gdpr_prompt:float = Field(default = 10.0, ge = 1.0, description = "Timeout for GDPR/consent dialogs.")
publishing_result:float = Field(default = 300.0, ge = 10.0, description = "Timeout for publishing result checks.")
publishing_confirmation:float = Field(default = 20.0, ge = 1.0, description = "Timeout for publish confirmation redirect.")
pagination_initial:float = Field(default = 10.0, ge = 1.0, description = "Timeout for initial pagination lookup.")
pagination_follow_up:float = Field(default = 5.0, ge = 1.0, description = "Timeout for subsequent pagination navigation.")
quick_dom:float = Field(default = 2.0, ge = 0.1, description = "Generic short timeout for transient UI.")
update_check:float = Field(default = 10.0, ge = 1.0, description = "Timeout for GitHub update checks.")
chrome_remote_probe:float = Field(default = 2.0, ge = 0.1, description = "Timeout for local remote-debugging probes.")
chrome_remote_debugging:float = Field(default = 5.0, ge = 1.0, description = "Timeout for remote debugging API calls.")
chrome_binary_detection:float = Field(default = 10.0, ge = 1.0, description = "Timeout for chrome --version subprocesses.")
retry_enabled:bool = Field(default = True, description = "Enable built-in retry/backoff for DOM operations.")
retry_max_attempts:int = Field(default = 2, ge = 1, description = "Max retry attempts when retry is enabled.")
retry_backoff_factor:float = Field(default = 1.5, ge = 1.0, description = "Exponential factor applied per retry attempt.")
def resolve(self, key:str = "default", override:float | None = None) -> float:
"""
Return the base timeout (seconds) for the given key without applying modifiers.
"""
if override is not None:
return float(override)
if key == "default":
return float(self.default)
attr = getattr(self, key, None)
if isinstance(attr, (int, float)):
return float(attr)
return float(self.default)
def effective(self, key:str = "default", override:float | None = None, *, attempt:int = 0) -> float:
"""
Return the effective timeout (seconds) with multiplier/backoff applied.
"""
base = self.resolve(key, override)
backoff = self.retry_backoff_factor ** attempt if attempt > 0 else 1.0
return base * self.multiplier * backoff
def _validate_glob_pattern(v:str) -> str:
if not v.strip():
raise ValueError("must be a non-empty, non-blank glob pattern")
@@ -154,6 +203,7 @@ Example:
login:LoginConfig = Field(default_factory = LoginConfig.model_construct, description = "Login credentials")
captcha:CaptchaConfig = Field(default_factory = CaptchaConfig)
update_check:UpdateCheckConfig = Field(default_factory = UpdateCheckConfig, description = "Update check configuration")
+ timeouts:TimeoutConfig = Field(default_factory = TimeoutConfig, description = "Centralized timeout configuration.")
def with_values(self, values:dict[str, Any]) -> Config:
return Config.model_validate(

View File

@@ -219,6 +219,8 @@ kleinanzeigen_bot/extract.py:
_extract_category_from_ad_page:
"Breadcrumb container 'vap-brdcrmb' not found; cannot extract ad category: %s": "Breadcrumb-Container 'vap-brdcrmb' nicht gefunden; kann Anzeigenkategorie nicht extrahieren: %s"
"Falling back to legacy breadcrumb selectors; collected ids: %s": "Weiche auf ältere Breadcrumb-Selektoren aus; gesammelte IDs: %s"
+ "Legacy breadcrumb selectors not found within %.1f seconds (collected ids: %s)": "Ältere Breadcrumb-Selektoren nicht innerhalb von %.1f Sekunden gefunden (gesammelte IDs: %s)"
+ "Unable to locate breadcrumb fallback selectors within %(seconds).1f seconds.": "Ältere Breadcrumb-Selektoren konnten nicht innerhalb von %(seconds).1f Sekunden gefunden werden."
#################################################
kleinanzeigen_bot/utils/i18n.py:
@@ -398,11 +400,6 @@ kleinanzeigen_bot/utils/web_scraping_mixin.py:
web_check:
"Unsupported attribute: %s": "Nicht unterstütztes Attribut: %s"
- web_find:
- "Unsupported selector type: %s": "Nicht unterstützter Selektor-Typ: %s"
- web_find_all:
- "Unsupported selector type: %s": "Nicht unterstützter Selektor-Typ: %s"
close_browser_session:
"Closing Browser session...": "Schließe Browser-Sitzung..."
@@ -417,6 +414,12 @@ kleinanzeigen_bot/utils/web_scraping_mixin.py:
web_request:
" -> HTTP %s [%s]...": " -> HTTP %s [%s]..."
+ _web_find_once:
+ "Unsupported selector type: %s": "Nicht unterstützter Selektor-Typ: %s"
+ _web_find_all_once:
+ "Unsupported selector type: %s": "Nicht unterstützter Selektor-Typ: %s"
diagnose_browser_issues:
"=== Browser Connection Diagnostics ===": "=== Browser-Verbindungsdiagnose ==="
"=== End Diagnostics ===": "=== Ende der Diagnose ==="
@@ -434,6 +437,8 @@ kleinanzeigen_bot/utils/web_scraping_mixin.py:
"(info) Remote debugging port configured: %d": "(Info) Remote-Debugging-Port konfiguriert: %d"
"(info) Remote debugging port is not open": "(Info) Remote-Debugging-Port ist nicht offen"
+ "(warn) Unable to inspect browser processes: %s": "(Warnung) Browser-Prozesse konnten nicht überprüft werden: %s"
"(info) No browser processes currently running": "(Info) Derzeit keine Browser-Prozesse aktiv"
"(fail) Running as root - this can cause browser issues": "(Fehler) Läuft als Root - dies kann Browser-Probleme verursachen"

@@ -49,6 +49,10 @@ class UpdateChecker:
         """
         return __version__
+    def _request_timeout(self) -> float:
+        """Return the effective timeout for HTTP calls."""
+        return self.config.timeouts.effective("update_check")
     def _get_commit_hash(self, version:str) -> str | None:
         """Extract the commit hash from a version string.
@@ -74,7 +78,7 @@ class UpdateChecker:
         try:
             response = requests.get(
                 f"https://api.github.com/repos/Second-Hand-Friends/kleinanzeigen-bot/releases/tags/{tag_name}",
-                timeout = 10
+                timeout = self._request_timeout()
             )
             response.raise_for_status()
             data = response.json()
@@ -97,7 +101,7 @@ class UpdateChecker:
         try:
             response = requests.get(
                 f"https://api.github.com/repos/Second-Hand-Friends/kleinanzeigen-bot/commits/{commit}",
-                timeout = 10
+                timeout = self._request_timeout()
             )
             response.raise_for_status()
             data = response.json()
@@ -148,7 +152,7 @@ class UpdateChecker:
                 # Use /releases/latest endpoint for stable releases
                 response = requests.get(
                     "https://api.github.com/repos/Second-Hand-Friends/kleinanzeigen-bot/releases/latest",
-                    timeout = 10
+                    timeout = self._request_timeout()
                 )
                 response.raise_for_status()
                 release = response.json()
@@ -160,7 +164,7 @@ class UpdateChecker:
                 # Use /releases endpoint and select the most recent prerelease
                 response = requests.get(
                     "https://api.github.com/repos/Second-Hand-Friends/kleinanzeigen-bot/releases",
-                    timeout = 10
+                    timeout = self._request_timeout()
                 )
                 response.raise_for_status()
                 releases = response.json()

@@ -78,23 +78,25 @@ def _normalize_browser_name(browser_name:str) -> str:
     return "Chrome"
-def detect_chrome_version_from_binary(binary_path:str) -> ChromeVersionInfo | None:
+def detect_chrome_version_from_binary(binary_path:str, *, timeout:float | None = None) -> ChromeVersionInfo | None:
     """
     Detect Chrome version by running the browser binary.
     Args:
         binary_path: Path to the Chrome binary
+        timeout: Optional timeout (seconds) for the subprocess call
     Returns:
         ChromeVersionInfo if successful, None if detection fails
     """
+    effective_timeout = timeout if timeout is not None else 10.0
     try:
         # Run browser with --version flag
         result = subprocess.run( # noqa: S603
             [binary_path, "--version"],
             check = False, capture_output = True,
             text = True,
-            timeout = 10
+            timeout = effective_timeout
         )
         if result.returncode != 0:
@@ -114,28 +116,30 @@ def detect_chrome_version_from_binary(binary_path:str) -> ChromeVersionInfo | No
         return ChromeVersionInfo(version_string, major_version, browser_name)
     except subprocess.TimeoutExpired:
-        LOG.debug("Browser version command timed out")
+        LOG.debug("Browser version command timed out after %.1fs", effective_timeout)
         return None
     except (subprocess.SubprocessError, ValueError) as e:
         LOG.debug("Failed to detect browser version: %s", str(e))
         return None
-def detect_chrome_version_from_remote_debugging(host:str = "127.0.0.1", port:int = 9222) -> ChromeVersionInfo | None:
+def detect_chrome_version_from_remote_debugging(host:str = "127.0.0.1", port:int = 9222, *, timeout:float | None = None) -> ChromeVersionInfo | None:
     """
     Detect Chrome version from remote debugging API.
     Args:
         host: Remote debugging host
         port: Remote debugging port
+        timeout: Optional timeout (seconds) for the HTTP request
     Returns:
         ChromeVersionInfo if successful, None if detection fails
     """
+    effective_timeout = timeout if timeout is not None else 5.0
     try:
         # Query the remote debugging API
         url = f"http://{host}:{port}/json/version"
-        response = urllib.request.urlopen(url, timeout = 5) # noqa: S310
+        response = urllib.request.urlopen(url, timeout = effective_timeout) # noqa: S310
         version_data = json.loads(response.read().decode())
         # Extract version information
@@ -200,7 +204,10 @@ def validate_chrome_136_configuration(browser_arguments:list[str], user_data_dir
 def get_chrome_version_diagnostic_info(
     binary_path:str | None = None,
     remote_host:str = "127.0.0.1",
-    remote_port:int | None = None
+    remote_port:int | None = None,
+    *,
+    remote_timeout:float | None = None,
+    binary_timeout:float | None = None
 ) -> dict[str, Any]:
     """
     Get comprehensive Chrome version diagnostic information.
@@ -209,6 +216,8 @@ def get_chrome_version_diagnostic_info(
         binary_path: Path to Chrome binary (optional)
         remote_host: Remote debugging host
         remote_port: Remote debugging port (optional)
+        remote_timeout: Timeout for remote debugging detection
+        binary_timeout: Timeout for binary detection
     Returns:
         Dictionary with diagnostic information
@@ -223,7 +232,7 @@ def get_chrome_version_diagnostic_info(
     # Try binary detection
     if binary_path:
-        version_info = detect_chrome_version_from_binary(binary_path)
+        version_info = detect_chrome_version_from_binary(binary_path, timeout = binary_timeout)
         if version_info:
             diagnostic_info["binary_detection"] = {
                 "version_string": version_info.version_string,
@@ -235,7 +244,7 @@ def get_chrome_version_diagnostic_info(
     # Try remote debugging detection
     if remote_port:
-        version_info = detect_chrome_version_from_remote_debugging(remote_host, remote_port)
+        version_info = detect_chrome_version_from_remote_debugging(remote_host, remote_port, timeout = remote_timeout)
         if version_info:
             diagnostic_info["remote_detection"] = {
                 "version_string": version_info.version_string,

@@ -2,9 +2,9 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
 # SPDX-ArtifactOfProjectHomePage: https://github.com/Second-Hand-Friends/kleinanzeigen-bot/
 import asyncio, enum, inspect, json, os, platform, secrets, shutil, subprocess, urllib.request # isort: skip # noqa: S404
-from collections.abc import Callable, Coroutine, Iterable
+from collections.abc import Awaitable, Callable, Coroutine, Iterable
 from gettext import gettext as _
-from typing import Any, Final, cast
+from typing import Any, Final, Optional, cast
 try:
     from typing import Never # type: ignore[attr-defined,unused-ignore] # mypy
@@ -15,10 +15,13 @@ import nodriver, psutil # isort: skip
 from typing import TYPE_CHECKING, TypeGuard
 from nodriver.core.browser import Browser
-from nodriver.core.config import Config
+from nodriver.core.config import Config as NodriverConfig
 from nodriver.core.element import Element
 from nodriver.core.tab import Tab as Page
+from kleinanzeigen_bot.model.config_model import Config as BotConfig
+from kleinanzeigen_bot.model.config_model import TimeoutConfig
 from . import loggers, net
 from .chrome_version_detector import (
     ChromeVersionInfo,
@@ -32,6 +35,7 @@ from .misc import T, ensure
 if TYPE_CHECKING:
     from nodriver.cdp.runtime import RemoteObject
 # Constants for RemoteObject conversion
 _KEY_VALUE_PAIR_SIZE = 2
@@ -102,6 +106,69 @@ class WebScrapingMixin:
         self.browser_config:Final[BrowserConfig] = BrowserConfig()
         self.browser:Browser = None # pyright: ignore[reportAttributeAccessIssue]
         self.page:Page = None # pyright: ignore[reportAttributeAccessIssue]
+        self._default_timeout_config:TimeoutConfig | None = None
+        self.config:BotConfig = cast(BotConfig, None)
+    def _get_timeout_config(self) -> TimeoutConfig:
+        config = getattr(self, "config", None)
+        timeouts:TimeoutConfig | None = None
+        if config is not None:
+            timeouts = cast(Optional[TimeoutConfig], getattr(config, "timeouts", None))
+        if timeouts is not None:
+            return timeouts
+        if self._default_timeout_config is None:
+            self._default_timeout_config = TimeoutConfig()
+        return self._default_timeout_config
+    def _timeout(self, key:str = "default", override:float | None = None) -> float:
+        """
+        Return the base timeout (seconds) for a given key without applying multipliers.
+        """
+        return self._get_timeout_config().resolve(key, override)
+    def _effective_timeout(self, key:str = "default", override:float | None = None, *, attempt:int = 0) -> float:
+        """
+        Return the effective timeout (seconds) with multiplier/backoff applied.
+        """
+        return self._get_timeout_config().effective(key, override, attempt = attempt)
+    def _timeout_attempts(self) -> int:
+        cfg = self._get_timeout_config()
+        if not cfg.retry_enabled:
+            return 1
+        # Always perform the initial attempt plus the configured number of retries.
+        return 1 + cfg.retry_max_attempts
+    async def _run_with_timeout_retries(
+        self,
+        operation:Callable[[float], Awaitable[T]],
+        *,
+        description:str,
+        key:str = "default",
+        override:float | None = None
+    ) -> T:
+        """
+        Execute an async callable with retry/backoff handling for TimeoutError.
+        """
+        attempts = self._timeout_attempts()
+        for attempt in range(attempts):
+            effective_timeout = self._effective_timeout(key, override, attempt = attempt)
+            try:
+                return await operation(effective_timeout)
+            except TimeoutError:
+                if attempt >= attempts - 1:
+                    raise
+                LOG.debug(
+                    "Retrying %s after TimeoutError (attempt %d/%d, timeout %.1fs)",
+                    description,
+                    attempt + 1,
+                    attempts,
+                    effective_timeout
+                )
+        raise TimeoutError(f"{description} failed without executing operation")
     async def create_browser_session(self) -> None:
         LOG.info("Creating Browser session...")
@@ -137,7 +204,7 @@ class WebScrapingMixin:
f"Make sure the browser is running and the port is not blocked by firewall.") f"Make sure the browser is running and the port is not blocked by firewall.")
try: try:
cfg = Config( cfg = NodriverConfig(
browser_executable_path = self.browser_config.binary_location # actually not necessary but nodriver fails without browser_executable_path = self.browser_config.binary_location # actually not necessary but nodriver fails without
) )
cfg.host = remote_host cfg.host = remote_host
@@ -207,7 +274,7 @@ class WebScrapingMixin:
if self.browser_config.user_data_dir: if self.browser_config.user_data_dir:
LOG.info(" -> Browser user data dir: %s", self.browser_config.user_data_dir) LOG.info(" -> Browser user data dir: %s", self.browser_config.user_data_dir)
cfg = Config( cfg = NodriverConfig(
headless = False, headless = False,
browser_executable_path = self.browser_config.binary_location, browser_executable_path = self.browser_config.binary_location,
browser_args = browser_args, browser_args = browser_args,
@@ -355,7 +422,8 @@ class WebScrapingMixin:
LOG.info("(ok) Remote debugging port is open") LOG.info("(ok) Remote debugging port is open")
# Try to get more information about the debugging endpoint # Try to get more information about the debugging endpoint
try: try:
response = urllib.request.urlopen(f"http://127.0.0.1:{remote_port}/json/version", timeout = 2) probe_timeout = self._effective_timeout("chrome_remote_probe")
response = urllib.request.urlopen(f"http://127.0.0.1:{remote_port}/json/version", timeout = probe_timeout)
version_info = json.loads(response.read().decode()) version_info = json.loads(response.read().decode())
LOG.info("(ok) Remote debugging API accessible - Browser: %s", version_info.get("Browser", "Unknown")) LOG.info("(ok) Remote debugging API accessible - Browser: %s", version_info.get("Browser", "Unknown"))
except Exception as e: except Exception as e:
@@ -378,6 +446,7 @@ class WebScrapingMixin:
except (AssertionError, TypeError): except (AssertionError, TypeError):
target_browser_name = "" target_browser_name = ""
try:
for proc in psutil.process_iter(["pid", "name", "cmdline"]): for proc in psutil.process_iter(["pid", "name", "cmdline"]):
try: try:
proc_name = proc.info["name"] or "" proc_name = proc.info["name"] or ""
@@ -402,6 +471,9 @@ class WebScrapingMixin:
browser_processes.append(proc.info) browser_processes.append(proc.info)
except (psutil.NoSuchProcess, psutil.AccessDenied): except (psutil.NoSuchProcess, psutil.AccessDenied):
pass pass
except (psutil.Error, PermissionError) as exc:
LOG.warning("(warn) Unable to inspect browser processes: %s", exc)
browser_processes = []
if browser_processes: if browser_processes:
LOG.info("(info) Found %d browser processes running", len(browser_processes)) LOG.info("(info) Found %d browser processes running", len(browser_processes))
@@ -486,15 +558,17 @@ class WebScrapingMixin:
         raise AssertionError(_("Installed browser could not be detected"))
     async def web_await(self, condition:Callable[[], T | Never | Coroutine[Any, Any, T | Never]], *,
-            timeout:int | float = 5, timeout_error_message:str = "") -> T:
+            timeout:int | float | None = None, timeout_error_message:str = "", apply_multiplier:bool = True) -> T:
         """
         Blocks/waits until the given condition is met.
-        :param timeout: timeout in seconds
+        :param timeout: timeout in seconds (base value, multiplier applied unless disabled)
         :raises TimeoutError: if element could not be found within time
         """
         loop = asyncio.get_running_loop()
         start_at = loop.time()
+        base_timeout = timeout if timeout is not None else self._timeout()
+        effective_timeout = self._effective_timeout(override = base_timeout) if apply_multiplier else base_timeout
         while True:
             await self.page
@@ -506,13 +580,13 @@ class WebScrapingMixin:
                     return result
             except Exception as ex1:
                 ex = ex1
-            if loop.time() - start_at > timeout:
+            if loop.time() - start_at > effective_timeout:
                 if ex:
                     raise ex
-                raise TimeoutError(timeout_error_message or f"Condition not met within {timeout} seconds")
+                raise TimeoutError(timeout_error_message or f"Condition not met within {effective_timeout} seconds")
             await self.page.sleep(0.5)
-    async def web_check(self, selector_type:By, selector_value:str, attr:Is, *, timeout:int | float = 5) -> bool:
+    async def web_check(self, selector_type:By, selector_value:str, attr:Is, *, timeout:int | float | None = None) -> bool:
         """
         Locates an HTML element and returns a state.
@@ -559,7 +633,7 @@ class WebScrapingMixin:
         """))
         raise AssertionError(_("Unsupported attribute: %s") % attr)
-    async def web_click(self, selector_type:By, selector_value:str, *, timeout:int | float = 5) -> Element:
+    async def web_click(self, selector_type:By, selector_value:str, *, timeout:int | float | None = None) -> Element:
         """
         Locates an HTML element by ID.
@@ -652,91 +726,130 @@ class WebScrapingMixin:
# Return primitive values as-is # Return primitive values as-is
return data return data
async def web_find(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float = 5) -> Element: async def web_find(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float | None = None) -> Element:
""" """
Locates an HTML element by the given selector type and value. Locates an HTML element by the given selector type and value.
:param timeout: timeout in seconds :param timeout: timeout in seconds (base value before multiplier/backoff)
:raises TimeoutError: if element could not be found within time :raises TimeoutError: if element could not be found within time
""" """
async def attempt(effective_timeout:float) -> Element:
return await self._web_find_once(selector_type, selector_value, effective_timeout, parent = parent)
return await self._run_with_timeout_retries(
attempt,
description = f"web_find({selector_type.name}, {selector_value})",
key = "default",
override = timeout
)
async def web_find_all(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float | None = None) -> list[Element]:
"""
Locates multiple HTML elements by the given selector type and value.
:param timeout: timeout in seconds (base value before multiplier/backoff)
:raises TimeoutError: if element could not be found within time
"""
async def attempt(effective_timeout:float) -> list[Element]:
return await self._web_find_all_once(selector_type, selector_value, effective_timeout, parent = parent)
return await self._run_with_timeout_retries(
attempt,
description = f"web_find_all({selector_type.name}, {selector_value})",
key = "default",
override = timeout
)
async def _web_find_once(self, selector_type:By, selector_value:str, timeout:float, *, parent:Element | None = None) -> Element:
timeout_suffix = f" within {timeout} seconds."
match selector_type: match selector_type:
case By.ID: case By.ID:
escaped_id = selector_value.translate(METACHAR_ESCAPER) escaped_id = selector_value.translate(METACHAR_ESCAPER)
return await self.web_await( return await self.web_await(
lambda: self.page.query_selector(f"#{escaped_id}", parent), lambda: self.page.query_selector(f"#{escaped_id}", parent),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML element found with ID '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML element found with ID '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
case By.CLASS_NAME: case By.CLASS_NAME:
escaped_classname = selector_value.translate(METACHAR_ESCAPER) escaped_classname = selector_value.translate(METACHAR_ESCAPER)
return await self.web_await( return await self.web_await(
lambda: self.page.query_selector(f".{escaped_classname}", parent), lambda: self.page.query_selector(f".{escaped_classname}", parent),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML element found with CSS class '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML element found with CSS class '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
case By.TAG_NAME: case By.TAG_NAME:
return await self.web_await( return await self.web_await(
lambda: self.page.query_selector(selector_value, parent), lambda: self.page.query_selector(selector_value, parent),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML element found of tag <{selector_value}> within {timeout} seconds.") timeout_error_message = f"No HTML element found of tag <{selector_value}>{timeout_suffix}",
apply_multiplier = False)
case By.CSS_SELECTOR: case By.CSS_SELECTOR:
return await self.web_await( return await self.web_await(
lambda: self.page.query_selector(selector_value, parent), lambda: self.page.query_selector(selector_value, parent),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML element found using CSS selector '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML element found using CSS selector '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
case By.TEXT: case By.TEXT:
ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}") ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
return await self.web_await( return await self.web_await(
lambda: self.page.find_element_by_text(selector_value, best_match = True), lambda: self.page.find_element_by_text(selector_value, best_match = True),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML element found containing text '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML element found containing text '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
case By.XPATH: case By.XPATH:
ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}") ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
return await self.web_await( return await self.web_await(
lambda: self.page.find_element_by_text(selector_value, best_match = True), lambda: self.page.find_element_by_text(selector_value, best_match = True),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML element found using XPath '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML element found using XPath '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
raise AssertionError(_("Unsupported selector type: %s") % selector_type) raise AssertionError(_("Unsupported selector type: %s") % selector_type)
async def web_find_all(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float = 5) -> list[Element]: async def _web_find_all_once(self, selector_type:By, selector_value:str, timeout:float, *, parent:Element | None = None) -> list[Element]:
""" timeout_suffix = f" within {timeout} seconds."
Locates an HTML element by ID.
:param timeout: timeout in seconds
:raises TimeoutError: if element could not be found within time
"""
match selector_type: match selector_type:
case By.CLASS_NAME: case By.CLASS_NAME:
escaped_classname = selector_value.translate(METACHAR_ESCAPER) escaped_classname = selector_value.translate(METACHAR_ESCAPER)
return await self.web_await( return await self.web_await(
lambda: self.page.query_selector_all(f".{escaped_classname}", parent), lambda: self.page.query_selector_all(f".{escaped_classname}", parent),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML elements found with CSS class '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML elements found with CSS class '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
case By.CSS_SELECTOR: case By.CSS_SELECTOR:
return await self.web_await( return await self.web_await(
lambda: self.page.query_selector_all(selector_value, parent), lambda: self.page.query_selector_all(selector_value, parent),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML elements found using CSS selector '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML elements found using CSS selector '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
case By.TAG_NAME: case By.TAG_NAME:
return await self.web_await( return await self.web_await(
lambda: self.page.query_selector_all(selector_value, parent), lambda: self.page.query_selector_all(selector_value, parent),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML elements found of tag <{selector_value}> within {timeout} seconds.") timeout_error_message = f"No HTML elements found of tag <{selector_value}>{timeout_suffix}",
apply_multiplier = False)
case By.TEXT: case By.TEXT:
ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}") ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
return await self.web_await( return await self.web_await(
lambda: self.page.find_elements_by_text(selector_value), lambda: self.page.find_elements_by_text(selector_value),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML elements found containing text '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML elements found containing text '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
case By.XPATH: case By.XPATH:
ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}") ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
return await self.web_await( return await self.web_await(
lambda: self.page.find_elements_by_text(selector_value), lambda: self.page.find_elements_by_text(selector_value),
timeout = timeout, timeout = timeout,
timeout_error_message = f"No HTML elements found using XPath '{selector_value}' within {timeout} seconds.") timeout_error_message = f"No HTML elements found using XPath '{selector_value}'{timeout_suffix}",
apply_multiplier = False)
raise AssertionError(_("Unsupported selector type: %s") % selector_type) raise AssertionError(_("Unsupported selector type: %s") % selector_type)
async def web_input(self, selector_type:By, selector_value:str, text:str | int, *, timeout:int | float = 5) -> Element: async def web_input(self, selector_type:By, selector_value:str, text:str | int, *, timeout:int | float | None = None) -> Element:
""" """
Enters text into an HTML input field. Enters text into an HTML input field.
@@ -749,10 +862,10 @@ class WebScrapingMixin:
await self.web_sleep() await self.web_sleep()
return input_field return input_field
async def web_open(self, url:str, *, timeout:int | float = 15_000, reload_if_already_open:bool = False) -> None: async def web_open(self, url:str, *, timeout:int | float | None = None, reload_if_already_open:bool = False) -> None:
""" """
:param url: url to open in browser :param url: url to open in browser
:param timeout: timespan in seconds within the page needs to be loaded :param timeout: timespan in seconds within the page needs to be loaded (base value)
:param reload_if_already_open: if False does nothing if the URL is already open in the browser :param reload_if_already_open: if False does nothing if the URL is already open in the browser
:raises TimeoutException: if page did not open within given timespan :raises TimeoutException: if page did not open within given timespan
""" """
@@ -761,10 +874,15 @@ class WebScrapingMixin:
LOG.debug(" => skipping, [%s] is already open", url) LOG.debug(" => skipping, [%s] is already open", url)
return return
self.page = await self.browser.get(url = url, new_tab = False, new_window = False) self.page = await self.browser.get(url = url, new_tab = False, new_window = False)
await self.web_await(lambda: self.web_execute("document.readyState == 'complete'"), timeout = timeout, page_timeout = self._effective_timeout("page_load", timeout)
timeout_error_message = f"Page did not finish loading within {timeout} seconds.") await self.web_await(
lambda: self.web_execute("document.readyState == 'complete'"),
timeout = page_timeout,
timeout_error_message = f"Page did not finish loading within {page_timeout} seconds.",
apply_multiplier = False
)
async def web_text(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float = 5) -> str: async def web_text(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float | None = None) -> str:
return str(await (await self.web_find(selector_type, selector_value, parent = parent, timeout = timeout)).apply(""" return str(await (await self.web_find(selector_type, selector_value, parent = parent, timeout = timeout)).apply("""
function (elem) { function (elem) {
let sel = window.getSelection() let sel = window.getSelection()
@@ -835,7 +953,7 @@ class WebScrapingMixin:
await self.web_execute(f"window.scrollTo(0, {current_y_pos})") await self.web_execute(f"window.scrollTo(0, {current_y_pos})")
await asyncio.sleep(scroll_length / scroll_speed / 2) # double speed await asyncio.sleep(scroll_length / scroll_speed / 2) # double speed
async def web_select(self, selector_type:By, selector_value:str, selected_value:Any, timeout:int | float = 5) -> Element: async def web_select(self, selector_type:By, selector_value:str, selected_value:Any, timeout:int | float | None = None) -> Element:
""" """
Selects an <option/> of a <select/> HTML element. Selects an <option/> of a <select/> HTML element.
@@ -895,7 +1013,11 @@ class WebScrapingMixin:
port_available = await self._check_port_with_retry(remote_host, remote_port) port_available = await self._check_port_with_retry(remote_host, remote_port)
if port_available: if port_available:
try: try:
version_info = detect_chrome_version_from_remote_debugging(remote_host, remote_port) version_info = detect_chrome_version_from_remote_debugging(
remote_host,
remote_port,
timeout = self._effective_timeout("chrome_remote_debugging")
)
if version_info: if version_info:
LOG.debug(" -> Detected version from existing browser: %s", version_info) LOG.debug(" -> Detected version from existing browser: %s", version_info)
else: else:
@@ -910,7 +1032,10 @@ class WebScrapingMixin:
binary_path = self.browser_config.binary_location binary_path = self.browser_config.binary_location
if binary_path: if binary_path:
LOG.debug(" -> No remote browser detected, trying binary detection") LOG.debug(" -> No remote browser detected, trying binary detection")
version_info = detect_chrome_version_from_binary(binary_path) version_info = detect_chrome_version_from_binary(
binary_path,
timeout = self._effective_timeout("chrome_binary_detection")
)
# Validate if Chrome 136+ detected # Validate if Chrome 136+ detected
if version_info and version_info.is_chrome_136_plus: if version_info and version_info.is_chrome_136_plus:
@@ -977,7 +1102,10 @@ class WebScrapingMixin:
binary_path = self.browser_config.binary_location binary_path = self.browser_config.binary_location
diagnostic_info = get_chrome_version_diagnostic_info( diagnostic_info = get_chrome_version_diagnostic_info(
binary_path = binary_path, binary_path = binary_path,
remote_port = remote_port if remote_port > 0 else None remote_host = "127.0.0.1",
remote_port = remote_port if remote_port > 0 else None,
remote_timeout = self._effective_timeout("chrome_remote_debugging"),
binary_timeout = self._effective_timeout("chrome_binary_detection")
) )
# Report binary detection results # Report binary detection results


@@ -1,7 +1,9 @@
# SPDX-FileCopyrightText: © Sebastian Thomschke and contributors # SPDX-FileCopyrightText: © Sebastian Thomschke and contributors
# SPDX-License-Identifier: AGPL-3.0-or-later # SPDX-License-Identifier: AGPL-3.0-or-later
# SPDX-ArtifactOfProjectHomePage: https://github.com/Second-Hand-Friends/kleinanzeigen-bot/ # SPDX-ArtifactOfProjectHomePage: https://github.com/Second-Hand-Friends/kleinanzeigen-bot/
from kleinanzeigen_bot.model.config_model import AdDefaults, Config import pytest
from kleinanzeigen_bot.model.config_model import AdDefaults, Config, TimeoutConfig
def test_migrate_legacy_description_prefix() -> None: def test_migrate_legacy_description_prefix() -> None:
@@ -74,3 +76,50 @@ def test_minimal_config_validation() -> None:
config = Config.model_validate(minimal_cfg) config = Config.model_validate(minimal_cfg)
assert config.login.username == "dummy" assert config.login.username == "dummy"
assert config.login.password == "dummy" # noqa: S105 assert config.login.password == "dummy" # noqa: S105
def test_timeout_config_defaults_and_effective_values() -> None:
cfg = Config.model_validate({
"login": {"username": "dummy", "password": "dummy"}, # noqa: S105
"timeouts": {
"multiplier": 2.0,
"pagination_initial": 12.0,
"retry_max_attempts": 3,
"retry_backoff_factor": 2.0
}
})
timeouts = cfg.timeouts
base = timeouts.resolve("pagination_initial")
multiplier = timeouts.multiplier
backoff = timeouts.retry_backoff_factor
assert base == 12.0
assert timeouts.effective("pagination_initial") == base * multiplier * (backoff ** 0)
# attempt 1 should apply backoff factor once in addition to multiplier
assert timeouts.effective("pagination_initial", attempt = 1) == base * multiplier * (backoff ** 1)
def test_validate_glob_pattern_rejects_blank_strings() -> None:
with pytest.raises(ValueError, match = "must be a non-empty, non-blank glob pattern"):
Config.model_validate({
"ad_files": [" "],
"ad_defaults": {"contact": {"name": "dummy", "zipcode": "12345"}},
"login": {"username": "dummy", "password": "dummy"}
})
cfg = Config.model_validate({
"ad_files": ["*.yaml"],
"ad_defaults": {"contact": {"name": "dummy", "zipcode": "12345"}},
"login": {"username": "dummy", "password": "dummy"}
})
assert cfg.ad_files == ["*.yaml"]
def test_timeout_config_resolve_returns_specific_value() -> None:
timeouts = TimeoutConfig(default = 4.0, page_load = 12.5)
assert timeouts.resolve("page_load") == 12.5
def test_timeout_config_resolve_falls_back_to_default() -> None:
timeouts = TimeoutConfig(default = 3.0)
assert timeouts.resolve("nonexistent_key") == 3.0
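
The assertions above imply a simple arithmetic for timeouts: a per-key base value (falling back to `default`), scaled by `multiplier`, then scaled by `retry_backoff_factor` once per retry attempt. The following is a minimal sketch of that arithmetic only. `SketchTimeoutConfig` and its `overrides` dict are hypothetical stand-ins and not the project's `TimeoutConfig`; the default values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTimeoutConfig:
    """Hypothetical stand-in for illustration; NOT the project's TimeoutConfig."""

    default: float = 5.0               # assumed fallback value
    multiplier: float = 1.0            # global scaling applied to every timeout
    retry_backoff_factor: float = 1.5  # assumed default growth per retry attempt
    overrides: dict[str, float] = field(default_factory=dict)  # hypothetical per-key values

    def resolve(self, key: str) -> float:
        # Per-key value if configured, otherwise the default.
        return self.overrides.get(key, self.default)

    def effective(self, key: str, attempt: int = 0) -> float:
        # base * multiplier * backoff ** attempt, as asserted in the tests above.
        return self.resolve(key) * self.multiplier * (self.retry_backoff_factor ** attempt)


cfg = SketchTimeoutConfig(
    multiplier=2.0,
    retry_backoff_factor=2.0,
    overrides={"pagination_initial": 12.0},
)
first_try = cfg.effective("pagination_initial")               # 12.0 * 2.0 * 2.0 ** 0
first_retry = cfg.effective("pagination_initial", attempt=1)  # 12.0 * 2.0 * 2.0 ** 1
fallback = cfg.resolve("nonexistent_key")                     # falls back to default
```

With these inputs the first attempt waits 24 seconds and the first retry 48 seconds, which is why a large `multiplier` combined with retries can stretch a single lookup considerably.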


@@ -412,6 +412,60 @@ class TestAdExtractorNavigation:
call(By.CLASS_NAME, "cardbox", parent = ad_list_container_mock), call(By.CLASS_NAME, "cardbox", parent = ad_list_container_mock),
], any_order = False) ], any_order = False)
@pytest.mark.asyncio
async def test_extract_own_ads_urls_paginates_with_enabled_next_button(self, test_extractor:AdExtractor) -> None:
"""Ensure the paginator clicks the first enabled next button and advances."""
ad_list_container_mock = MagicMock()
pagination_section_mock = MagicMock()
cardbox_page_one = MagicMock()
cardbox_page_two = MagicMock()
link_page_one = MagicMock(attrs = {"href": "/s-anzeige/page-one/111"})
link_page_two = MagicMock(attrs = {"href": "/s-anzeige/page-two/222"})
next_button_enabled = AsyncMock()
next_button_enabled.attrs = {}
disabled_button = MagicMock()
disabled_button.attrs = {"disabled": True}
link_queue = [link_page_one, link_page_two]
next_button_call = {"count": 0}
cardbox_call = {"count": 0}
async def fake_web_find(selector_type:By, selector_value:str, *, parent:Element | None = None,
timeout:int | float | None = None) -> Element:
if selector_type == By.ID and selector_value == "my-manageitems-adlist":
return ad_list_container_mock
if selector_type == By.CSS_SELECTOR and selector_value == ".Pagination":
return pagination_section_mock
if selector_type == By.CSS_SELECTOR and selector_value == "div h3 a.text-onSurface":
return link_queue.pop(0)
raise AssertionError(f"Unexpected selector {selector_type} {selector_value}")
async def fake_web_find_all(selector_type:By, selector_value:str, *, parent:Element | None = None,
timeout:int | float | None = None) -> list[Element]:
if selector_type == By.CSS_SELECTOR and selector_value == 'button[aria-label="Nächste"]':
next_button_call["count"] += 1
if next_button_call["count"] == 1:
return [next_button_enabled] # initial detection -> multi page
if next_button_call["count"] == 2:
return [disabled_button, next_button_enabled] # navigation on page 1
return [] # after navigating, stop
if selector_type == By.CLASS_NAME and selector_value == "cardbox":
cardbox_call["count"] += 1
return [cardbox_page_one] if cardbox_call["count"] == 1 else [cardbox_page_two]
raise AssertionError(f"Unexpected find_all selector {selector_type} {selector_value}")
with patch.object(test_extractor, "web_open", new_callable = AsyncMock), \
patch.object(test_extractor, "web_scroll_page_down", new_callable = AsyncMock), \
patch.object(test_extractor, "web_sleep", new_callable = AsyncMock), \
patch.object(test_extractor, "web_find", new_callable = AsyncMock, side_effect = fake_web_find), \
patch.object(test_extractor, "web_find_all", new_callable = AsyncMock, side_effect = fake_web_find_all):
refs = await test_extractor.extract_own_ads_urls()
assert refs == ["/s-anzeige/page-one/111", "/s-anzeige/page-two/222"]
next_button_enabled.click.assert_awaited() # awaited at least once during navigation
class TestAdExtractorContent: class TestAdExtractorContent:
"""Tests for content extraction functionality.""" """Tests for content extraction functionality."""
@@ -641,6 +695,24 @@ class TestAdExtractorCategory:
mock_web_find.assert_any_call(By.CSS_SELECTOR, "a:nth-of-type(3)", parent = category_line) mock_web_find.assert_any_call(By.CSS_SELECTOR, "a:nth-of-type(3)", parent = category_line)
mock_web_find_all.assert_awaited_once_with(By.CSS_SELECTOR, "a", parent = category_line) mock_web_find_all.assert_awaited_once_with(By.CSS_SELECTOR, "a", parent = category_line)
@pytest.mark.asyncio
async def test_extract_category_legacy_selectors_timeout(self, extractor:AdExtractor, caplog:pytest.LogCaptureFixture) -> None:
"""Ensure fallback timeout logs the error and re-raises with translated message."""
category_line = MagicMock()
async def fake_web_find(selector_type:By, selector_value:str, *, parent:Element | None = None,
timeout:int | float | None = None) -> Element:
if selector_type == By.ID and selector_value == "vap-brdcrmb":
return category_line
raise TimeoutError("legacy selectors missing")
with patch.object(extractor, "web_find", new_callable = AsyncMock, side_effect = fake_web_find), \
patch.object(extractor, "web_find_all", new_callable = AsyncMock, side_effect = TimeoutError), \
caplog.at_level("ERROR"), pytest.raises(TimeoutError, match = "Unable to locate breadcrumb fallback selectors"):
await extractor._extract_category_from_ad_page()
assert any("Legacy breadcrumb selectors not found" in record.message for record in caplog.records)
@pytest.mark.asyncio @pytest.mark.asyncio
# pylint: disable=protected-access # pylint: disable=protected-access
async def test_extract_special_attributes_empty(self, extractor:AdExtractor) -> None: async def test_extract_special_attributes_empty(self, extractor:AdExtractor) -> None:


@@ -95,6 +95,18 @@ class TestUpdateChecker:
with patch("requests.get", return_value = MagicMock(json = lambda: {"target_commitish": "e7a3d46"})): with patch("requests.get", return_value = MagicMock(json = lambda: {"target_commitish": "e7a3d46"})):
assert checker._get_release_commit("latest") == "e7a3d46" assert checker._get_release_commit("latest") == "e7a3d46"
def test_request_timeout_uses_config(self, config:Config, mocker:"MockerFixture") -> None:
"""Ensure HTTP calls honor the timeout configuration."""
config.timeouts.multiplier = 1.5
checker = UpdateChecker(config)
mock_response = MagicMock(json = lambda: {"target_commitish": "abc"})
mock_get = mocker.patch("requests.get", return_value = mock_response)
checker._get_release_commit("latest")
expected_timeout = config.timeouts.effective("update_check")
assert mock_get.call_args.kwargs["timeout"] == expected_timeout
def test_get_commit_date(self, config:Config) -> None: def test_get_commit_date(self, config:Config) -> None:
"""Test that the commit date is correctly retrieved from the GitHub API.""" """Test that the commit date is correctly retrieved from the GitHub API."""
checker = UpdateChecker(config) checker = UpdateChecker(config)


@@ -8,12 +8,14 @@ All rights reserved.
""" """
import json import json
import logging
import os import os
import platform import platform
import shutil import shutil
import zipfile import zipfile
from collections.abc import Awaitable, Callable
from pathlib import Path from pathlib import Path
from typing import NoReturn, Protocol, cast from typing import Any, NoReturn, Protocol, cast
from unittest.mock import AsyncMock, MagicMock, Mock, mock_open, patch from unittest.mock import AsyncMock, MagicMock, Mock, mock_open, patch
import nodriver import nodriver
@@ -22,6 +24,7 @@ import pytest
from nodriver.core.element import Element from nodriver.core.element import Element
from nodriver.core.tab import Tab as Page from nodriver.core.tab import Tab as Page
from kleinanzeigen_bot.model.config_model import Config
from kleinanzeigen_bot.utils import loggers from kleinanzeigen_bot.utils import loggers
from kleinanzeigen_bot.utils.web_scraping_mixin import By, Is, WebScrapingMixin, _is_admin # noqa: PLC2701 from kleinanzeigen_bot.utils.web_scraping_mixin import By, Is, WebScrapingMixin, _is_admin # noqa: PLC2701
@@ -32,7 +35,13 @@ class ConfigProtocol(Protocol):
browser_args:list[str] browser_args:list[str]
user_data_dir:str | None user_data_dir:str | None
def add_extension(self, ext:str) -> None: ... def add_extension(self, ext:str) -> None:
...
def _nodriver_start_mock() -> Mock:
"""Return the nodriver.start mock with proper typing."""
return cast(Mock, cast(Any, nodriver).start)
class TrulyAwaitableMockPage: class TrulyAwaitableMockPage:
@@ -82,6 +91,7 @@ def web_scraper(mock_browser:AsyncMock, mock_page:TrulyAwaitableMockPage) -> Web
scraper = WebScrapingMixin() scraper = WebScrapingMixin()
scraper.browser = mock_browser scraper.browser = mock_browser
scraper.page = mock_page # type: ignore[unused-ignore,reportAttributeAccessIssue] scraper.page = mock_page # type: ignore[unused-ignore,reportAttributeAccessIssue]
scraper.config = Config.model_validate({"login": {"username": "user@example.com", "password": "secret"}}) # noqa: S105
return scraper return scraper
@@ -156,6 +166,21 @@ class TestWebScrapingErrorHandling:
with pytest.raises(Exception, match = "Cannot clear input"): with pytest.raises(Exception, match = "Cannot clear input"):
await web_scraper.web_input(By.ID, "test-id", "test text") await web_scraper.web_input(By.ID, "test-id", "test text")
@pytest.mark.asyncio
async def test_web_input_success_returns_element(self, web_scraper:WebScrapingMixin, mock_page:TrulyAwaitableMockPage) -> None:
"""Successful web_input should send keys, wait, and return the element."""
mock_element = AsyncMock(spec = Element)
mock_page.query_selector.return_value = mock_element
mock_sleep = AsyncMock()
cast(Any, web_scraper).web_sleep = mock_sleep
result = await web_scraper.web_input(By.ID, "username", "hello world", timeout = 1)
assert result is mock_element
mock_element.clear_input.assert_awaited_once()
mock_element.send_keys.assert_awaited_once_with("hello world")
mock_sleep.assert_awaited_once()
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_web_open_timeout(self, web_scraper:WebScrapingMixin, mock_browser:AsyncMock) -> None: async def test_web_open_timeout(self, web_scraper:WebScrapingMixin, mock_browser:AsyncMock) -> None:
"""Test page load timeout in web_open.""" """Test page load timeout in web_open."""
@@ -173,6 +198,19 @@ class TestWebScrapingErrorHandling:
with pytest.raises(TimeoutError, match = "Page did not finish loading within"): with pytest.raises(TimeoutError, match = "Page did not finish loading within"):
await web_scraper.web_open("https://example.com", timeout = 0.1) await web_scraper.web_open("https://example.com", timeout = 0.1)
@pytest.mark.asyncio
async def test_web_open_skip_when_url_already_loaded(self, web_scraper:WebScrapingMixin, mock_browser:AsyncMock, mock_page:TrulyAwaitableMockPage) -> None:
"""web_open should short-circuit when the requested URL is already active."""
mock_browser.get.reset_mock()
mock_page.url = "https://example.com"
mock_execute = AsyncMock()
cast(Any, web_scraper).web_execute = mock_execute
await web_scraper.web_open("https://example.com", reload_if_already_open = False)
mock_browser.get.assert_not_awaited()
mock_execute.assert_not_called()
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_web_request_invalid_response(self, web_scraper:WebScrapingMixin, mock_page:TrulyAwaitableMockPage) -> None: async def test_web_request_invalid_response(self, web_scraper:WebScrapingMixin, mock_page:TrulyAwaitableMockPage) -> None:
"""Test invalid response handling in web_request.""" """Test invalid response handling in web_request."""
@@ -216,6 +254,179 @@ class TestWebScrapingErrorHandling:
with pytest.raises(Exception, match = "Attribute error"): with pytest.raises(Exception, match = "Attribute error"):
await web_scraper.web_check(By.ID, "test-id", Is.DISPLAYED) await web_scraper.web_check(By.ID, "test-id", Is.DISPLAYED)
@pytest.mark.asyncio
async def test_web_find_applies_timeout_multiplier_and_backoff(self, web_scraper:WebScrapingMixin) -> None:
"""Ensure multiplier/backoff logic is honored when timeouts occur."""
assert web_scraper.config is not None
web_scraper.config.timeouts.multiplier = 2.0
web_scraper.config.timeouts.retry_enabled = True
web_scraper.config.timeouts.retry_max_attempts = 2
web_scraper.config.timeouts.retry_backoff_factor = 2.0
recorded:list[tuple[float, bool]] = []
async def fake_web_await(condition:Callable[[], object], *, timeout:float, timeout_error_message:str = "",
apply_multiplier:bool = True) -> Element:
recorded.append((timeout, apply_multiplier))
raise TimeoutError(timeout_error_message or "timeout")
cast(Any, web_scraper).web_await = fake_web_await
with pytest.raises(TimeoutError):
await web_scraper.web_find(By.ID, "test-id", timeout = 0.5)
assert recorded == [(1.0, False), (2.0, False), (4.0, False)]
class TestTimeoutAndRetryHelpers:
"""Test timeout helper utilities in WebScrapingMixin."""
def test_get_timeout_config_prefers_config_timeouts(self, web_scraper:WebScrapingMixin) -> None:
"""_get_timeout_config should return the config-provided timeout model when available."""
custom_config = Config.model_validate({
"login": {"username": "user@example.com", "password": "secret"}, # noqa: S105
"timeouts": {"default": 7.5}
})
web_scraper.config = custom_config
assert web_scraper._get_timeout_config() is custom_config.timeouts
def test_timeout_attempts_respects_retry_switch(self, web_scraper:WebScrapingMixin) -> None:
"""_timeout_attempts should collapse to a single attempt when retries are disabled."""
web_scraper.config.timeouts.retry_enabled = False
assert web_scraper._timeout_attempts() == 1
web_scraper.config.timeouts.retry_enabled = True
web_scraper.config.timeouts.retry_max_attempts = 3
assert web_scraper._timeout_attempts() == 4
@pytest.mark.asyncio
async def test_run_with_timeout_retries_retries_operation(self, web_scraper:WebScrapingMixin) -> None:
"""_run_with_timeout_retries should retry when TimeoutError is raised before succeeding."""
attempts:list[float] = []
async def flaky_operation(timeout:float) -> str:
attempts.append(timeout)
if len(attempts) == 1:
raise TimeoutError("first attempt")
return "done"
web_scraper.config.timeouts.retry_max_attempts = 1
result = await web_scraper._run_with_timeout_retries(flaky_operation, description = "retry-op")
assert result == "done"
assert len(attempts) == 2
@pytest.mark.asyncio
async def test_run_with_timeout_retries_guard_clause(self, web_scraper:WebScrapingMixin) -> None:
"""_run_with_timeout_retries should guard against zero-attempt edge cases."""
async def never_called(timeout:float) -> None:
pytest.fail("operation should not run when attempts are zero")
with patch.object(web_scraper, "_timeout_attempts", return_value = 0), \
pytest.raises(TimeoutError, match = "guarded-op failed without executing operation"):
await web_scraper._run_with_timeout_retries(never_called, description = "guarded-op")
class TestSelectorTimeoutMessages:
"""Ensure selector helpers provide informative timeout messages."""
@pytest.mark.asyncio
@pytest.mark.parametrize(
("selector_type", "selector_value", "expected_message"),
[
(By.TAG_NAME, "section", "No HTML element found of tag <section> within 2.0 seconds."),
(By.CSS_SELECTOR, ".hero", "No HTML element found using CSS selector '.hero' within 2.0 seconds."),
(By.TEXT, "Submit", "No HTML element found containing text 'Submit' within 2.0 seconds."),
(By.XPATH, "//div[@class='hero']", "No HTML element found using XPath '//div[@class='hero']' within 2.0 seconds."),
]
)
async def test_web_find_timeout_suffixes(
self,
web_scraper:WebScrapingMixin,
selector_type:By,
selector_value:str,
expected_message:str
) -> None:
"""web_find should pass descriptive timeout messages for every selector strategy."""
mock_element = AsyncMock(spec = Element)
mock_wait = AsyncMock(return_value = mock_element)
cast(Any, web_scraper).web_await = mock_wait
result = await web_scraper.web_find(selector_type, selector_value, timeout = 2)
assert result is mock_element
call = mock_wait.await_args_list[0]
assert expected_message == call.kwargs["timeout_error_message"]
assert call.kwargs["apply_multiplier"] is False
@pytest.mark.asyncio
@pytest.mark.parametrize(
("selector_type", "selector_value", "expected_message"),
[
(By.CLASS_NAME, "hero", "No HTML elements found with CSS class 'hero' within 1 seconds."),
(By.CSS_SELECTOR, ".card", "No HTML elements found using CSS selector '.card' within 1 seconds."),
(By.TAG_NAME, "article", "No HTML elements found of tag <article> within 1 seconds."),
(By.TEXT, "Listings", "No HTML elements found containing text 'Listings' within 1 seconds."),
(By.XPATH, "//footer", "No HTML elements found using XPath '//footer' within 1 seconds."),
]
)
async def test_web_find_all_once_timeout_suffixes(
self,
web_scraper:WebScrapingMixin,
selector_type:By,
selector_value:str,
expected_message:str
) -> None:
"""_web_find_all_once should surface informative timeout errors for each selector."""
elements = [AsyncMock(spec = Element)]
mock_wait = AsyncMock(return_value = elements)
cast(Any, web_scraper).web_await = mock_wait
result = await web_scraper._web_find_all_once(selector_type, selector_value, 1)
assert result is elements
call = mock_wait.await_args_list[0]
assert expected_message == call.kwargs["timeout_error_message"]
assert call.kwargs["apply_multiplier"] is False
@pytest.mark.asyncio
async def test_web_find_all_delegates_to_retry_helper(self, web_scraper:WebScrapingMixin) -> None:
"""web_find_all should execute via the timeout retry helper."""
elements = [AsyncMock(spec = Element)]
async def fake_retry(operation:Callable[[float], Awaitable[list[Element]]], **kwargs:Any) -> list[Element]:
assert kwargs["description"] == "web_find_all(CLASS_NAME, hero)"
assert kwargs["override"] == 1.5
result = await operation(0.42)
return result
retry_mock = AsyncMock(side_effect = fake_retry)
once_mock = AsyncMock(return_value = elements)
cast(Any, web_scraper)._run_with_timeout_retries = retry_mock
cast(Any, web_scraper)._web_find_all_once = once_mock
result = await web_scraper.web_find_all(By.CLASS_NAME, "hero", timeout = 1.5)
assert result is elements
retry_call = retry_mock.await_args_list[0]
assert retry_call.kwargs["key"] == "default"
assert retry_call.kwargs["override"] == 1.5
once_call = once_mock.await_args_list[0]
assert once_call.args[:2] == (By.CLASS_NAME, "hero")
assert once_call.args[2] == 0.42
@pytest.mark.asyncio
async def test_web_check_unsupported_attribute(self, web_scraper:WebScrapingMixin, mock_page:TrulyAwaitableMockPage) -> None:
"""web_check should raise for unsupported attribute queries."""
mock_element = AsyncMock(spec = Element)
mock_element.attrs = {}
mock_page.query_selector.return_value = mock_element
with pytest.raises(AssertionError, match = "Unsupported attribute"):
await web_scraper.web_check(By.ID, "test-id", cast(Is, object()), timeout = 0.1)
class TestWebScrapingSessionManagement: class TestWebScrapingSessionManagement:
"""Test session management edge cases in WebScrapingMixin.""" """Test session management edge cases in WebScrapingMixin."""
@@ -299,6 +510,39 @@ class TestWebScrapingSessionManagement:
assert scraper.browser is None assert scraper.browser is None
assert scraper.page is None assert scraper.page is None
class TestWebScrolling:
"""Test scrolling helpers."""
@pytest.mark.asyncio
async def test_web_scroll_page_down_scrolls_and_returns(self, web_scraper:WebScrapingMixin) -> None:
"""web_scroll_page_down should scroll both directions when requested."""
scripts:list[str] = []
async def exec_side_effect(script:str) -> int | None:
scripts.append(script)
if script == "document.body.scrollHeight":
return 20
return None
cast(Any, web_scraper).web_execute = AsyncMock(side_effect = exec_side_effect)
with patch("kleinanzeigen_bot.utils.web_scraping_mixin.asyncio.sleep", new_callable = AsyncMock) as mock_sleep:
await web_scraper.web_scroll_page_down(scroll_length = 10, scroll_speed = 10, scroll_back_top = True)
assert scripts[0] == "document.body.scrollHeight"
# Expect four scrollTo operations: two down, two up
assert scripts.count("document.body.scrollHeight") == 1
scroll_calls = [script for script in scripts if script.startswith("window.scrollTo")]
assert scroll_calls == [
"window.scrollTo(0, 10)",
"window.scrollTo(0, 20)",
"window.scrollTo(0, 10)",
"window.scrollTo(0, 0)"
]
sleep_durations = [call.args[0] for call in mock_sleep.await_args_list]
assert sleep_durations == [1.0, 1.0, 0.5, 0.5]
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_session_expiration_handling(self, web_scraper:WebScrapingMixin, mock_browser:AsyncMock) -> None: async def test_session_expiration_handling(self, web_scraper:WebScrapingMixin, mock_browser:AsyncMock) -> None:
"""Test handling of expired browser sessions.""" """Test handling of expired browser sessions."""
@@ -468,7 +712,7 @@ class TestWebScrapingBrowserConfiguration:
def add_extension(self, ext:str) -> None: def add_extension(self, ext:str) -> None:
self._extensions.append(ext) # Use private extensions list self._extensions.append(ext) # Use private extensions list
# Mock nodriver.start to return a mock browser # type: ignore[attr-defined] # Mock nodriver.start to return a mock browser
mock_browser = AsyncMock() mock_browser = AsyncMock()
mock_browser.websocket_url = "ws://localhost:9222" mock_browser.websocket_url = "ws://localhost:9222"
monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser)) monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser))
@@ -557,7 +801,7 @@ class TestWebScrapingBrowserConfiguration:
def add_extension(self, ext:str) -> None: def add_extension(self, ext:str) -> None:
self.extensions.append(ext) self.extensions.append(ext)
# Mock nodriver.start to return a mock browser # type: ignore[attr-defined] # Mock nodriver.start to return a mock browser
mock_browser = AsyncMock() mock_browser = AsyncMock()
mock_browser.websocket_url = "ws://localhost:9222" mock_browser.websocket_url = "ws://localhost:9222"
monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser)) monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser))
@@ -576,7 +820,7 @@ class TestWebScrapingBrowserConfiguration:
await scraper.create_browser_session() await scraper.create_browser_session()
# Verify browser arguments # Verify browser arguments
config = cast(Mock, nodriver.start).call_args[0][0] # type: ignore[attr-defined] config = _nodriver_start_mock().call_args[0][0]
assert "--custom-arg=value" in config.browser_args assert "--custom-arg=value" in config.browser_args
assert "--another-arg" in config.browser_args assert "--another-arg" in config.browser_args
assert "--incognito" in config.browser_args assert "--incognito" in config.browser_args
@@ -589,7 +833,7 @@ class TestWebScrapingBrowserConfiguration:
await scraper.create_browser_session() await scraper.create_browser_session()
# Verify Edge-specific arguments # Verify Edge-specific arguments
config = cast(Mock, nodriver.start).call_args[0][0] # type: ignore[attr-defined] config = _nodriver_start_mock().call_args[0][0]
assert "-inprivate" in config.browser_args assert "-inprivate" in config.browser_args
assert os.environ.get("MSEDGEDRIVER_TELEMETRY_OPTOUT") == "1" assert os.environ.get("MSEDGEDRIVER_TELEMETRY_OPTOUT") == "1"
@@ -620,7 +864,7 @@ class TestWebScrapingBrowserConfiguration:
with zipfile.ZipFile(ext2, "w") as z: with zipfile.ZipFile(ext2, "w") as z:
z.writestr("manifest.json", '{"name": "Test Extension 2"}') z.writestr("manifest.json", '{"name": "Test Extension 2"}')
# Mock nodriver.start to return a mock browser # type: ignore[attr-defined] # Mock nodriver.start to return a mock browser
mock_browser = AsyncMock() mock_browser = AsyncMock()
mock_browser.websocket_url = "ws://localhost:9222" mock_browser.websocket_url = "ws://localhost:9222"
monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser)) monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser))
@@ -644,7 +888,7 @@ class TestWebScrapingBrowserConfiguration:
await scraper.create_browser_session() await scraper.create_browser_session()
# Verify extensions were loaded # Verify extensions were loaded
config = cast(Mock, nodriver.start).call_args[0][0] # type: ignore[attr-defined] config = _nodriver_start_mock().call_args[0][0]
assert len(config._extensions) == 2 assert len(config._extensions) == 2
for ext_path in config._extensions: for ext_path in config._extensions:
assert os.path.exists(ext_path) assert os.path.exists(ext_path)
@@ -713,7 +957,7 @@ class TestWebScrapingBrowserConfiguration:
def add_extension(self, ext:str) -> None: def add_extension(self, ext:str) -> None:
self._extensions.append(ext) self._extensions.append(ext)
# Mock nodriver.start to return a mock browser # type: ignore[attr-defined] # Mock nodriver.start to return a mock browser
mock_browser = AsyncMock() mock_browser = AsyncMock()
mock_browser.websocket_url = "ws://localhost:9222" mock_browser.websocket_url = "ws://localhost:9222"
monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser)) monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser))
@@ -772,7 +1016,7 @@ class TestWebScrapingBrowserConfiguration:
temp_file = tmp_path / "temp_resource" temp_file = tmp_path / "temp_resource"
temp_file.write_text("test") temp_file.write_text("test")
# Mock nodriver.start to raise an exception # type: ignore[attr-defined] # Mock nodriver.start to raise an exception
async def mock_start_fail(*args:object, **kwargs:object) -> NoReturn: async def mock_start_fail(*args:object, **kwargs:object) -> NoReturn:
if temp_file.exists(): if temp_file.exists():
temp_file.unlink() temp_file.unlink()
@@ -801,7 +1045,7 @@ class TestWebScrapingBrowserConfiguration:
assert scraper.browser is None assert scraper.browser is None
assert scraper.page is None assert scraper.page is None
# Now patch nodriver.start to return a new mock browser each time # type: ignore[attr-defined] # Now patch nodriver.start to return a new mock browser each time
mock_browser = make_mock_browser() mock_browser = make_mock_browser()
mock_page = TrulyAwaitableMockPage() mock_page = TrulyAwaitableMockPage()
mock_browser.get = AsyncMock(return_value = mock_page) mock_browser.get = AsyncMock(return_value = mock_page)
@@ -1445,6 +1689,46 @@ class TestWebScrapingDiagnostics:
# Should not raise any exceptions # Should not raise any exceptions
web_scraper.diagnose_browser_issues() web_scraper.diagnose_browser_issues()
def test_diagnose_browser_issues_handles_per_process_errors(
self, scraper_with_config:WebScrapingMixin, caplog:pytest.LogCaptureFixture
) -> None:
"""diagnose_browser_issues should ignore psutil errors raised per process."""
caplog.set_level(logging.INFO)
class FailingProcess:
@property
def info(self) -> dict[str, object]:
raise psutil.AccessDenied(pid = 999)
with patch("os.path.exists", return_value = True), \
patch("os.access", return_value = True), \
patch("psutil.process_iter", return_value = [FailingProcess()]), \
patch("platform.system", return_value = "Linux"), \
patch("kleinanzeigen_bot.utils.web_scraping_mixin._is_admin", return_value = False), \
patch.object(scraper_with_config, "_diagnose_chrome_version_issues"):
scraper_with_config.browser_config.binary_location = "/usr/bin/chrome"
scraper_with_config.diagnose_browser_issues()
assert "(info) No browser processes currently running" in caplog.text
def test_diagnose_browser_issues_handles_global_psutil_failure(
self, scraper_with_config:WebScrapingMixin, caplog:pytest.LogCaptureFixture
) -> None:
"""diagnose_browser_issues should log a warning if psutil.process_iter fails entirely."""
caplog.set_level(logging.WARNING)
with patch("os.path.exists", return_value = True), \
patch("os.access", return_value = True), \
patch("psutil.process_iter", side_effect = psutil.Error("boom")), \
patch("platform.system", return_value = "Linux"), \
patch("kleinanzeigen_bot.utils.web_scraping_mixin._is_admin", return_value = False), \
patch.object(scraper_with_config, "_diagnose_chrome_version_issues"):
scraper_with_config.browser_config.binary_location = "/usr/bin/chrome"
scraper_with_config.diagnose_browser_issues()
assert "(warn) Unable to inspect browser processes:" in caplog.text
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_validate_chrome_version_configuration_port_open_but_api_inaccessible( async def test_validate_chrome_version_configuration_port_open_but_api_inaccessible(
self, web_scraper:WebScrapingMixin self, web_scraper:WebScrapingMixin
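
The retry tests in this file pin down the observable behaviour of the retry helper: one initial attempt plus `retry_max_attempts` retries when retries are enabled, a timeout that grows by the backoff factor on each attempt, and a guard error when zero attempts are allowed. A rough sketch of such a loop follows, assuming a standalone function with explicit parameters. `run_with_timeout_retries_sketch` and all of its parameters are hypothetical and not the mixin's actual `_run_with_timeout_retries`; whether the real helper also sleeps between attempts is not shown in the diff, so the sketch only scales the timeout.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional, TypeVar

T = TypeVar("T")


async def run_with_timeout_retries_sketch(
    operation: Callable[[float], Awaitable[T]],
    *,
    base_timeout: float,
    multiplier: float = 1.0,
    backoff_factor: float = 2.0,
    retry_enabled: bool = True,
    retry_max_attempts: int = 2,
    description: str = "operation",
) -> T:
    """Hypothetical retry loop; NOT the project's implementation."""
    # One initial attempt, plus the configured retries when retries are enabled.
    attempts = 1 + retry_max_attempts if retry_enabled else 1
    last_error: Optional[TimeoutError] = None
    for attempt in range(attempts):
        # Each attempt gets a longer timeout: base * multiplier * backoff ** attempt.
        timeout = base_timeout * multiplier * (backoff_factor ** attempt)
        try:
            return await operation(timeout)
        except TimeoutError as ex:
            last_error = ex
    if last_error is None:
        # Only reachable when the attempt count is zero or negative.
        raise TimeoutError(f"{description} failed without executing operation")
    raise last_error


async def _demo() -> tuple[str, list[float]]:
    seen: list[float] = []

    async def flaky(timeout: float) -> str:
        # Fails twice with a simulated timeout, then succeeds.
        seen.append(timeout)
        if len(seen) < 3:
            raise TimeoutError("simulated timeout")
        return "done"

    result = await run_with_timeout_retries_sketch(flaky, base_timeout=0.5, multiplier=2.0)
    return result, seen


result, seen_timeouts = asyncio.run(_demo())
```

The demo mirrors the numbers in `test_web_find_applies_timeout_multiplier_and_backoff`: a 0.5 second base with a multiplier of 2.0 and a backoff factor of 2.0 yields timeouts of 1.0, 2.0, and 4.0 seconds across three attempts.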


@@ -41,8 +41,11 @@ class TestWebScrapingMixinChromeVersionValidation:
# Test validation # Test validation
await scraper._validate_chrome_version_configuration() await scraper._validate_chrome_version_configuration()
# Verify detection was called correctly # Verify detection was called correctly with timeout
mock_detect.assert_called_once_with("/path/to/chrome") assert mock_detect.call_count == 1
args, kwargs = mock_detect.call_args
assert args[0] == "/path/to/chrome"
assert kwargs["timeout"] == pytest.approx(10.0)
# Verify validation passed (no exception raised) # Verify validation passed (no exception raised)
# The validation is now done internally in _validate_chrome_136_configuration # The validation is now done internally in _validate_chrome_136_configuration
@@ -73,7 +76,10 @@ class TestWebScrapingMixinChromeVersionValidation:
 # Test validation should log error but not raise exception due to error handling
 await scraper._validate_chrome_version_configuration()
-# Verify error was logged
+# Verify detection call and logged error
+assert mock_detect.call_count == 1
+_, kwargs = mock_detect.call_args
+assert kwargs["timeout"] == pytest.approx(10.0)
 assert "Chrome 136+ configuration validation failed" in caplog.text
 assert "Chrome 136+ requires --user-data-dir" in caplog.text
 finally:
@@ -104,12 +110,37 @@ class TestWebScrapingMixinChromeVersionValidation:
 await scraper._validate_chrome_version_configuration()
 # Verify detection was called but no validation
-mock_detect.assert_called_once_with("/path/to/chrome")
+assert mock_detect.call_count == 1
+_, kwargs = mock_detect.call_args
+assert kwargs["timeout"] == pytest.approx(10.0)
 finally:
 # Restore environment
 if original_env:
 os.environ["PYTEST_CURRENT_TEST"] = original_env
+@patch("kleinanzeigen_bot.utils.chrome_version_detector.detect_chrome_version_from_binary")
+@patch("kleinanzeigen_bot.utils.web_scraping_mixin.detect_chrome_version_from_remote_debugging")
+async def test_validate_chrome_version_logs_remote_detection(
+self,
+mock_remote:Mock,
+mock_binary:Mock,
+scraper:WebScrapingMixin,
+caplog:pytest.LogCaptureFixture
+) -> None:
+"""When a remote browser responds, the detected version should be logged."""
+mock_remote.return_value = ChromeVersionInfo("136.0.6778.0", 136, "Chrome")
+mock_binary.return_value = None
+scraper.browser_config.arguments = ["--remote-debugging-port=9222"]
+scraper.browser_config.binary_location = "/path/to/chrome"
+caplog.set_level("DEBUG")
+with patch.dict(os.environ, {}, clear = True), \
+patch.object(scraper, "_check_port_with_retry", return_value = True):
+await scraper._validate_chrome_version_configuration()
+assert "Detected version from existing browser" in caplog.text
+mock_remote.assert_called_once()
 @patch("kleinanzeigen_bot.utils.chrome_version_detector.detect_chrome_version_from_binary")
 async def test_validate_chrome_version_configuration_no_binary_location(
 self, mock_detect:Mock, scraper:WebScrapingMixin
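
Review note: the new test in the hunk above clears the environment with `patch.dict(os.environ, {}, clear = True)`, whereas the neighbouring tests save and restore `PYTEST_CURRENT_TEST` by hand in a `try`/`finally`. A small sketch of how `patch.dict` behaves, using a made-up variable name (`DEMO_CURRENT_TEST`) so it does not interfere with a real test run:

```python
import os
from unittest.mock import patch

# Hypothetical variable standing in for an environment marker.
os.environ["DEMO_CURRENT_TEST"] = "example"

# clear=True empties os.environ for the duration of the block.
with patch.dict(os.environ, {}, clear=True):
    seen_inside = "DEMO_CURRENT_TEST" in os.environ

# On exit the original mapping is restored, so no manual finally is needed.
restored = os.environ.pop("DEMO_CURRENT_TEST")
```

Here `seen_inside` is `False` and `restored` is `"example"`, which is the same effect the manual save/restore blocks achieve with more code.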
@@ -145,7 +176,9 @@ class TestWebScrapingMixinChromeVersionValidation:
 await scraper._validate_chrome_version_configuration()
 # Verify detection was called
-mock_detect.assert_called_once_with("/path/to/chrome")
+assert mock_detect.call_count == 1
+_, kwargs = mock_detect.call_args
+assert kwargs["timeout"] == pytest.approx(10.0)
 # Verify debug log message (line 824)
 assert "Could not detect browser version, skipping validation" in caplog.text
@@ -201,10 +234,13 @@ class TestWebScrapingMixinChromeVersionDiagnostics:
 assert "Chrome 136+ detected - security validation required" in caplog.text
 # Verify mocks were called
-mock_get_diagnostic.assert_called_once_with(
-binary_path = "/path/to/chrome",
-remote_port = 9222
-)
+assert mock_get_diagnostic.call_count == 1
+kwargs = mock_get_diagnostic.call_args.kwargs
+assert kwargs["binary_path"] == "/path/to/chrome"
+assert kwargs["remote_port"] == 9222
+assert kwargs["remote_host"] == "127.0.0.1"
+assert kwargs["remote_timeout"] > 0
+assert kwargs["binary_timeout"] > 0
 finally:
 # Restore environment
 if original_env:
@@ -364,10 +400,12 @@ class TestWebScrapingMixinChromeVersionDiagnostics:
 assert "Chrome pre-136 detected - no special security requirements" in caplog.text
 # Verify that the diagnostic function was called with correct parameters
-mock_get_diagnostic.assert_called_once_with(
-binary_path = "/path/to/chrome",
-remote_port = None
-)
+assert mock_get_diagnostic.call_count == 1
+kwargs = mock_get_diagnostic.call_args.kwargs
+assert kwargs["binary_path"] == "/path/to/chrome"
+assert kwargs["remote_port"] is None
+assert kwargs["remote_timeout"] > 0
+assert kwargs["binary_timeout"] > 0
 finally:
 # Restore environment
 if original_env: