mirror of
https://github.com/Second-Hand-Friends/kleinanzeigen-bot.git
synced 2026-03-12 18:41:50 +01:00
feat: add configurable timeouts (#673)
## ℹ️ Description

- Related issues: #671, #658
- Introduces configurable timeout controls plus retry/backoff handling for flaky DOM operations.

We often see timeouts which are not reproducible in certain configurations. I suspect the timeout issues stem from a combination of internet speed, browser, OS, age of the computer, and the weather. This PR introduces a comprehensive config model to tweak timeouts.

## 📋 Changes Summary

- add `TimeoutConfig` to the main config/schema and expose timeouts in README/docs
- wire `WebScrapingMixin`, extractor, update checker, and browser diagnostics to honor the configurable timeouts and retries
- update translations/tests to cover the new behaviour and ensure lint/mypy/pyright pipelines remain green

### ⚙️ Type of Change

- [ ] 🐞 Bug fix (non-breaking change which fixes an issue)
- [x] ✨ New feature (adds new functionality without breaking existing usage)
- [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations)

## ✅ Checklist

- [x] I have reviewed my changes to ensure they meet the project's standards.
- [x] I have tested my changes and ensured that all tests pass (`pdm run test`).
- [x] I have formatted the code (`pdm run format`).
- [x] I have verified that linting passes (`pdm run lint`).
- [x] I have updated documentation where necessary.

## Summary by CodeRabbit

* **New Features**
  * Centralized, configurable timeout system for web interactions, detection flows, publishing, and pagination.
  * Optional retry with exponential backoff for operations that time out.
* **Improvements**
  * Replaced fixed wait times with dynamic timeouts throughout workflows.
  * More informative timeout-related messages and diagnostics.
* **Tests**
  * New and expanded test coverage for timeout behavior, pagination, diagnostics, and retry logic.
@@ -187,6 +187,14 @@ All Python files must start with SPDX license headers:
 - Use appropriate log levels (DEBUG, INFO, WARNING, ERROR)
 - Log important state changes and decision points
 
+#### Timeout configuration
+
+- The default timeout (`timeouts.default`) already wraps all standard DOM helpers (`web_find`, `web_click`, etc.) via `WebScrapingMixin._timeout/_effective_timeout`. Use it unless a workflow clearly needs a different SLA.
+- Reserve `timeouts.quick_dom` for transient overlays (shipping dialogs, payment prompts, toast banners) that should render almost instantly; call `self._timeout("quick_dom")` in those spots to keep the UI responsive.
+- For single selectors that occasionally need more headroom, pass an inline override instead of creating a new config key, e.g. `custom = self._timeout(override = 12.5); await self.web_find(..., timeout = custom)`.
+- Use `_timeout()` when you just need the raw configured value (with optional override); use `_effective_timeout()` when you rely on the global multiplier and retry backoff for a given attempt (e.g. inside `_run_with_timeout_retries`).
+- Add a new timeout key only when a recurring workflow has its own timing profile (pagination, captcha detection, publishing confirmations, Chrome probes, etc.). Whenever you add one, extend `TimeoutConfig`, document it in the sample `timeouts:` block in `README.md`, and explain it in `docs/BROWSER_TROUBLESHOOTING.md`.
+- Encourage users to raise `timeouts.multiplier` when everything is slow, and to override existing keys in `config.yaml` before introducing new ones. This keeps the configuration surface minimal.
+
 #### Examples
 ```python
 def parse_duration(text: str) -> timedelta:
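The contributing notes above distinguish raw `_timeout()` values from the attempt-aware `_effective_timeout()` used inside `_run_with_timeout_retries`. The sketch below illustrates that contract with an illustrative standalone helper; the name, signature, and control flow are assumptions for demonstration, not the project's actual implementation:

```python
import asyncio

async def run_with_timeout_retries(op, *, base_timeout: float = 5.0, multiplier: float = 1.0,
                                   max_attempts: int = 3, backoff_factor: float = 1.5):
    """Retry an async operation, growing its effective timeout on each attempt."""
    last_exc: Exception | None = None
    for attempt in range(max_attempts):
        # effective timeout = base * global multiplier * backoff_factor ** attempt
        effective = base_timeout * multiplier * backoff_factor ** attempt
        try:
            return await asyncio.wait_for(op(effective), effective)
        except TimeoutError as exc:
            last_exc = exc
    assert last_exc is not None
    raise last_exc

async def demo() -> list[float]:
    timeouts_seen: list[float] = []

    async def flaky(timeout: float) -> float:
        timeouts_seen.append(timeout)
        if len(timeouts_seen) < 2:  # simulate one timeout, then success
            raise TimeoutError
        return timeout

    await run_with_timeout_retries(flaky, base_timeout = 2.0)
    return timeouts_seen

print(asyncio.run(demo()))  # [2.0, 3.0]
```

Each failed attempt widens the window (2.0 s, then 2.0 × 1.5 = 3.0 s), which is why slow machines benefit from leaving retries enabled rather than inflating every base timeout.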
@@ -297,4 +305,3 @@ See the [LICENSE.txt](LICENSE.txt) file for our project's licensing. All source
 - Use the translation system for all output—**never hardcode German or other languages** in the code.
 - If you add or change a user-facing message, update the translation file and ensure that translation completeness tests pass (`tests/unit/test_translations.py`).
 - Review the translation guidelines and patterns in the codebase for correct usage.
README.md (23 lines changed)
@@ -277,6 +277,27 @@ categories:
 Verschenken & Tauschen > Verleihen: 272/274
 Verschenken & Tauschen > Verschenken: 272/192
 
+# timeout tuning (optional)
+timeouts:
+  multiplier: 1.0                # Scale all timeouts (e.g. 2.0 for slower networks)
+  default: 5.0                   # Base timeout for web_find/web_click/etc.
+  page_load: 15.0                # Timeout for web_open page loads
+  captcha_detection: 2.0         # Timeout for captcha iframe detection
+  sms_verification: 4.0          # Timeout for SMS verification banners
+  gdpr_prompt: 10.0              # Timeout when handling GDPR dialogs
+  publishing_result: 300.0       # Timeout for publishing status checks
+  publishing_confirmation: 20.0  # Timeout for publish confirmation redirect
+  pagination_initial: 10.0       # Timeout for first pagination lookup
+  pagination_follow_up: 5.0      # Timeout for subsequent pagination clicks
+  quick_dom: 2.0                 # Generic short DOM timeout (shipping dialogs, etc.)
+  update_check: 10.0             # Timeout for GitHub update requests
+  chrome_remote_probe: 2.0       # Timeout for local remote-debugging probes
+  chrome_remote_debugging: 5.0   # Timeout for remote debugging API calls
+  chrome_binary_detection: 10.0  # Timeout for chrome --version subprocess
+  retry_enabled: true            # Enables DOM retry/backoff when timeouts occur
+  retry_max_attempts: 2
+  retry_backoff_factor: 1.5
+
 # download configuration
 download:
   include_all_matching_shipping_options: false # if true, all shipping options matching the package size will be included
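For illustration, here is how the global `multiplier` is meant to interact with the per-key values in the sample block above; this is a plain-Python sketch of the documented behaviour, not code from the bot:

```python
# Sample values from the README block above; a multiplier of 2.0 doubles each of them.
timeouts = {
    "default": 5.0,
    "page_load": 15.0,
    "quick_dom": 2.0,
    "publishing_result": 300.0,
}
multiplier = 2.0  # e.g. for a slow network

effective = {key: value * multiplier for key, value in timeouts.items()}
print(effective["page_load"])  # 30.0
```

Scaling one knob keeps the relative proportions between keys intact, which is why raising `multiplier` is recommended before overriding individual keys.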
@@ -329,6 +350,8 @@ login:
 password: ""
 ```
 
+Slow networks or sluggish remote browsers often just need a higher `timeouts.multiplier`, while truly problematic selectors can get explicit values directly under `timeouts`. Remember to regenerate the schemas after changing the configuration model so editors stay in sync.
+
 ### <a name="ad-config"></a>2) Ad configuration
 
 Each ad is described in a separate JSON or YAML file with the prefix `ad_<filename>`. The prefix is configurable in the config file.
@@ -59,6 +59,18 @@ Please update your configuration to include --user-data-dir for remote debugging
 
 The bot will also provide specific instructions on how to fix your configuration.
 
+### Issue: Slow page loads or recurring TimeoutError
+
+**Symptoms:**
+- `_extract_category_from_ad_page` fails intermittently due to breadcrumb lookups timing out
+- Captcha/SMS/GDPR prompts appear right after a timeout
+- Requests to GitHub's API fail sporadically with timeout errors
+
+**Solutions:**
+1. Increase `timeouts.multiplier` in `config.yaml` (e.g. `2.0` doubles every timeout consistently).
+2. Override specific keys under `timeouts` (e.g. `pagination_initial: 20.0`) if only a single selector is problematic.
+3. Keep `retry_enabled` on so that DOM lookups are retried with exponential backoff.
+
 ## Common Issues and Solutions
 
 ### Issue 1: "Failed to connect to browser" with "root" error
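To see what those solutions mean in practice, the per-attempt timeouts under retry/backoff can be worked out with a back-of-the-envelope sketch. It assumes the formula `base * multiplier * backoff_factor ** attempt`; the bot's exact internals may differ:

```python
def attempt_timeouts(base: float, *, multiplier: float = 1.0,
                     max_attempts: int = 2, backoff_factor: float = 1.5) -> list[float]:
    """Effective timeout per retry attempt."""
    return [base * multiplier * backoff_factor ** attempt for attempt in range(max_attempts)]

# pagination_initial: 10.0 with the default retry settings
print(attempt_timeouts(10.0))                    # [10.0, 15.0]
# the same key after raising timeouts.multiplier to 2.0
print(attempt_timeouts(10.0, multiplier = 2.0))  # [20.0, 30.0]
```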
@@ -359,6 +359,137 @@
       "title": "PublishingConfig",
       "type": "object"
     },
+    "TimeoutConfig": {
+      "properties": {
+        "multiplier": {
+          "default": 1.0,
+          "description": "Global multiplier applied to all timeout values.",
+          "minimum": 0.1,
+          "title": "Multiplier",
+          "type": "number"
+        },
+        "default": {
+          "type": "number",
+          "minimum": 0.0,
+          "default": 5.0,
+          "description": "Baseline timeout for DOM interactions.",
+          "title": "Default"
+        },
+        "page_load": {
+          "default": 15.0,
+          "description": "Page load timeout for web_open.",
+          "minimum": 1.0,
+          "title": "Page Load",
+          "type": "number"
+        },
+        "captcha_detection": {
+          "default": 2.0,
+          "description": "Timeout for captcha iframe detection.",
+          "minimum": 0.1,
+          "title": "Captcha Detection",
+          "type": "number"
+        },
+        "sms_verification": {
+          "default": 4.0,
+          "description": "Timeout for SMS verification prompts.",
+          "minimum": 0.1,
+          "title": "Sms Verification",
+          "type": "number"
+        },
+        "gdpr_prompt": {
+          "default": 10.0,
+          "description": "Timeout for GDPR/consent dialogs.",
+          "minimum": 1.0,
+          "title": "Gdpr Prompt",
+          "type": "number"
+        },
+        "publishing_result": {
+          "default": 300.0,
+          "description": "Timeout for publishing result checks.",
+          "minimum": 10.0,
+          "title": "Publishing Result",
+          "type": "number"
+        },
+        "publishing_confirmation": {
+          "default": 20.0,
+          "description": "Timeout for publish confirmation redirect.",
+          "minimum": 1.0,
+          "title": "Publishing Confirmation",
+          "type": "number"
+        },
+        "pagination_initial": {
+          "default": 10.0,
+          "description": "Timeout for initial pagination lookup.",
+          "minimum": 1.0,
+          "title": "Pagination Initial",
+          "type": "number"
+        },
+        "pagination_follow_up": {
+          "default": 5.0,
+          "description": "Timeout for subsequent pagination navigation.",
+          "minimum": 1.0,
+          "title": "Pagination Follow Up",
+          "type": "number"
+        },
+        "quick_dom": {
+          "default": 2.0,
+          "description": "Generic short timeout for transient UI.",
+          "minimum": 0.1,
+          "title": "Quick Dom",
+          "type": "number"
+        },
+        "update_check": {
+          "default": 10.0,
+          "description": "Timeout for GitHub update checks.",
+          "minimum": 1.0,
+          "title": "Update Check",
+          "type": "number"
+        },
+        "chrome_remote_probe": {
+          "default": 2.0,
+          "description": "Timeout for local remote-debugging probes.",
+          "minimum": 0.1,
+          "title": "Chrome Remote Probe",
+          "type": "number"
+        },
+        "chrome_remote_debugging": {
+          "default": 5.0,
+          "description": "Timeout for remote debugging API calls.",
+          "minimum": 1.0,
+          "title": "Chrome Remote Debugging",
+          "type": "number"
+        },
+        "chrome_binary_detection": {
+          "default": 10.0,
+          "description": "Timeout for chrome --version subprocesses.",
+          "minimum": 1.0,
+          "title": "Chrome Binary Detection",
+          "type": "number"
+        },
+        "retry_enabled": {
+          "default": true,
+          "description": "Enable built-in retry/backoff for DOM operations.",
+          "title": "Retry Enabled",
+          "type": "boolean"
+        },
+        "retry_max_attempts": {
+          "default": 2,
+          "description": "Max retry attempts when retry is enabled.",
+          "minimum": 1,
+          "title": "Retry Max Attempts",
+          "type": "integer"
+        },
+        "retry_backoff_factor": {
+          "default": 1.5,
+          "description": "Exponential factor applied per retry attempt.",
+          "minimum": 1.0,
+          "title": "Retry Backoff Factor",
+          "type": "number"
+        }
+      },
+      "title": "TimeoutConfig",
+      "type": "object"
+    },
     "UpdateCheckConfig": {
       "description": "Configuration for update checking functionality.\n\nAttributes:\n    enabled: Whether update checking is enabled.\n    channel: Which release channel to check ('latest' for stable, 'preview' for prereleases).\n    interval: How often to check for updates (e.g. '7d', '1d').\n        If the interval is invalid, too short (<1d), or too long (>30d),\n        the bot will log a warning and use a default interval for this run:\n        - 1d for 'preview' channel\n        - 7d for 'latest' channel\n        The config file is not changed automatically; please fix your config to avoid repeated warnings.",
       "properties": {
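The `minimum` bounds in the schema above can be spot-checked without a full JSON Schema validator. This tiny checker implements only the `minimum` keyword and hard-codes a few bounds copied from the schema, purely for illustration:

```python
# A few "minimum" bounds copied from the TimeoutConfig schema above.
SCHEMA_MINIMUMS = {
    "multiplier": 0.1,
    "default": 0.0,
    "page_load": 1.0,
    "quick_dom": 0.1,
    "retry_backoff_factor": 1.0,
}

def minimum_violations(config: dict[str, float]) -> list[str]:
    """Return the keys whose values fall below the schema's minimum."""
    return [key for key, value in config.items()
            if key in SCHEMA_MINIMUMS and value < SCHEMA_MINIMUMS[key]]

print(minimum_violations({"multiplier": 0.05, "page_load": 30.0}))  # ['multiplier']
```

A full validation against the generated schema would use a proper JSON Schema library; the point here is just that e.g. `multiplier: 0.05` would be rejected by the `minimum: 0.1` constraint.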
@@ -428,6 +559,10 @@
         "update_check": {
           "$ref": "#/$defs/UpdateCheckConfig",
           "description": "Update check configuration"
+        },
+        "timeouts": {
+          "$ref": "#/$defs/TimeoutConfig",
+          "description": "Centralized timeout configuration."
         }
       },
       "title": "Config",
@@ -573,8 +573,9 @@ class KleinanzeigenBot(WebScrapingMixin):
 
     async def check_and_wait_for_captcha(self, *, is_login_page:bool = True) -> None:
         try:
+            captcha_timeout = self._timeout("captcha_detection")
             await self.web_find(By.CSS_SELECTOR,
-                "iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']", timeout = 2)
+                "iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']", timeout = captcha_timeout)
 
             if not is_login_page and self.config.captcha.auto_restart:
                 LOG.warning("Captcha recognized - auto-restart enabled, abort run...")
@@ -624,7 +625,8 @@ class KleinanzeigenBot(WebScrapingMixin):
 
     async def handle_after_login_logic(self) -> None:
         try:
-            await self.web_find(By.TEXT, "Wir haben dir gerade einen 6-stelligen Code für die Telefonnummer", timeout = 4)
+            sms_timeout = self._timeout("sms_verification")
+            await self.web_find(By.TEXT, "Wir haben dir gerade einen 6-stelligen Code für die Telefonnummer", timeout = sms_timeout)
             LOG.warning("############################################")
             LOG.warning("# Device verification message detected. Please follow the instruction displayed in the Browser.")
             LOG.warning("############################################")
@@ -634,9 +636,12 @@ class KleinanzeigenBot(WebScrapingMixin):
 
         try:
             LOG.info("Handling GDPR disclaimer...")
-            await self.web_find(By.ID, "gdpr-banner-accept", timeout = 10)
+            gdpr_timeout = self._timeout("gdpr_prompt")
+            await self.web_find(By.ID, "gdpr-banner-accept", timeout = gdpr_timeout)
             await self.web_click(By.ID, "gdpr-banner-cmp-button")
-            await self.web_click(By.XPATH, "//div[@id='ConsentManagementPage']//*//button//*[contains(., 'Alle ablehnen und fortfahren')]", timeout = 10)
+            await self.web_click(By.XPATH,
+                "//div[@id='ConsentManagementPage']//*//button//*[contains(., 'Alle ablehnen und fortfahren')]",
+                timeout = gdpr_timeout)
         except TimeoutError:
             pass
 
@@ -724,7 +729,8 @@ class KleinanzeigenBot(WebScrapingMixin):
             count += 1
 
             await self.publish_ad(ad_file, ad_cfg, ad_cfg_orig, published_ads, AdUpdateStrategy.REPLACE)
-            await self.web_await(self.__check_publishing_result, timeout = 5 * 60)
+            publish_timeout = self._timeout("publishing_result")
+            await self.web_await(self.__check_publishing_result, timeout = publish_timeout)
 
             if self.config.publishing.delete_old_ads == "AFTER_PUBLISH" and not self.keep_old_ads:
                 await self.delete_ad(ad_cfg, published_ads, delete_old_ads_by_title = False)
@@ -924,7 +930,8 @@ class KleinanzeigenBot(WebScrapingMixin):
         # wait for payment form if commercial account is used
         #############################
         try:
-            await self.web_find(By.ID, "myftr-shppngcrt-frm", timeout = 2)
+            short_timeout = self._timeout("quick_dom")
+            await self.web_find(By.ID, "myftr-shppngcrt-frm", timeout = short_timeout)
 
             LOG.warning("############################################")
             LOG.warning("# Payment form detected! Please proceed with payment.")
@@ -934,7 +941,8 @@ class KleinanzeigenBot(WebScrapingMixin):
         except TimeoutError:
             pass
 
-        await self.web_await(lambda: "p-anzeige-aufgeben-bestaetigung.html?adId=" in self.page.url, timeout = 20)
+        confirmation_timeout = self._timeout("publishing_confirmation")
+        await self.web_await(lambda: "p-anzeige-aufgeben-bestaetigung.html?adId=" in self.page.url, timeout = confirmation_timeout)
 
         # extract the ad id from the URL's query parameter
         current_url_query_params = urllib_parse.parse_qs(urllib_parse.urlparse(self.page.url).query)
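The ad id extraction that follows the confirmation wait relies on standard `urllib.parse` behaviour, which can be demonstrated in isolation; the URL and id below are made up for the example:

```python
from urllib import parse as urllib_parse

# Hypothetical confirmation URL; only the adId query parameter matters here.
url = "https://www.kleinanzeigen.de/p-anzeige-aufgeben-bestaetigung.html?adId=1234567890"
params = urllib_parse.parse_qs(urllib_parse.urlparse(url).query)
print(params["adId"][0])  # 1234567890
```

`parse_qs` maps each parameter to a list of values, hence the `[0]` when a single value is expected.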
@@ -986,7 +994,8 @@ class KleinanzeigenBot(WebScrapingMixin):
             count += 1
 
             await self.publish_ad(ad_file, ad_cfg, ad_cfg_orig, published_ads, AdUpdateStrategy.MODIFY)
-            await self.web_await(self.__check_publishing_result, timeout = 5 * 60)
+            publish_timeout = self._timeout("publishing_result")
+            await self.web_await(self.__check_publishing_result, timeout = publish_timeout)
 
         LOG.info("############################################")
         LOG.info("DONE: updated %s", pluralize("ad", count))
@@ -1080,6 +1089,7 @@ class KleinanzeigenBot(WebScrapingMixin):
         LOG.debug("Successfully set attribute field [%s] to [%s]...", special_attribute_key, special_attribute_value_str)
 
     async def __set_shipping(self, ad_cfg:Ad, mode:AdUpdateStrategy = AdUpdateStrategy.REPLACE) -> None:
+        short_timeout = self._timeout("quick_dom")
         if ad_cfg.shipping_type == "PICKUP":
             try:
                 await self.web_click(By.ID, "radio-pickup")
@@ -1091,7 +1101,7 @@ class KleinanzeigenBot(WebScrapingMixin):
             if mode == AdUpdateStrategy.MODIFY:
                 try:
                     # when "Andere Versandmethoden" is not available, go back and start over new
-                    await self.web_find(By.XPATH, '//dialog//button[contains(., "Andere Versandmethoden")]', timeout = 2)
+                    await self.web_find(By.XPATH, '//dialog//button[contains(., "Andere Versandmethoden")]', timeout = short_timeout)
                 except TimeoutError:
                     await self.web_click(By.XPATH, '//dialog//button[contains(., "Zurück")]')
 
@@ -1120,7 +1130,7 @@ class KleinanzeigenBot(WebScrapingMixin):
                     # (important for mode = UPDATE)
                     await self.web_find(By.XPATH,
                         '//input[contains(@placeholder, "Versandkosten (optional)")]',
-                        timeout = 2)
+                        timeout = short_timeout)
                 except TimeoutError:
                     await self.web_click(By.XPATH, '//*[contains(@id, "INDIVIDUAL") and contains(@data-testid, "Individueller Versand")]')
 
@@ -33,7 +33,7 @@ class AdExtractor(WebScrapingMixin):
     def __init__(self, browser:Browser, config:Config) -> None:
         super().__init__()
         self.browser = browser
-        self.config = config
+        self.config:Config = config
 
     async def download_ad(self, ad_id:int) -> None:
         """
@@ -146,9 +146,10 @@ class AdExtractor(WebScrapingMixin):
 
         # --- Pagination handling ---
         multi_page = False
+        pagination_timeout = self._timeout("pagination_initial")
         try:
             # Correct selector: Use uppercase '.Pagination'
-            pagination_section = await self.web_find(By.CSS_SELECTOR, ".Pagination", timeout = 10)  # Increased timeout slightly
+            pagination_section = await self.web_find(By.CSS_SELECTOR, ".Pagination", timeout = pagination_timeout)  # Increased timeout slightly
             # Correct selector: Use 'aria-label'
             # Also check if the button is actually present AND potentially enabled (though enabled check isn't strictly necessary here, only for clicking later)
             next_buttons = await self.web_find_all(By.CSS_SELECTOR, 'button[aria-label="Nächste"]', parent = pagination_section)
@@ -204,9 +205,10 @@ class AdExtractor(WebScrapingMixin):
                 break
 
             # --- Navigate to next page ---
+            follow_up_timeout = self._timeout("pagination_follow_up")
             try:
                 # Find the pagination section again (scope might have changed after scroll/wait)
-                pagination_section = await self.web_find(By.CSS_SELECTOR, ".Pagination", timeout = 5)
+                pagination_section = await self.web_find(By.CSS_SELECTOR, ".Pagination", timeout = follow_up_timeout)
                 # Find the "Next" button using the correct aria-label selector and ensure it's not disabled
                 next_button_element = None
                 possible_next_buttons = await self.web_find_all(By.CSS_SELECTOR, 'button[aria-label="Nächste"]', parent = pagination_section)
@@ -432,8 +434,19 @@ class AdExtractor(WebScrapingMixin):
 
         # Fallback to legacy selectors in case the breadcrumb structure is unexpected.
         LOG.debug(_("Falling back to legacy breadcrumb selectors; collected ids: %s"), category_ids)
-        category_first_part = await self.web_find(By.CSS_SELECTOR, "a:nth-of-type(2)", parent = category_line)
-        category_second_part = await self.web_find(By.CSS_SELECTOR, "a:nth-of-type(3)", parent = category_line)
+        fallback_timeout = self._effective_timeout()
+        try:
+            category_first_part = await self.web_find(By.CSS_SELECTOR, "a:nth-of-type(2)", parent = category_line)
+            category_second_part = await self.web_find(By.CSS_SELECTOR, "a:nth-of-type(3)", parent = category_line)
+        except TimeoutError as exc:
+            LOG.error(
+                "Legacy breadcrumb selectors not found within %.1f seconds (collected ids: %s)",
+                fallback_timeout,
+                category_ids
+            )
+            raise TimeoutError(
+                _("Unable to locate breadcrumb fallback selectors within %(seconds).1f seconds.") % {"seconds": fallback_timeout}
+            ) from exc
         href_first:str = str(category_first_part.attrs["href"])
         href_second:str = str(category_second_part.attrs["href"])
         cat_num_first_raw = href_first.rsplit("/", maxsplit = 1)[-1]
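The new error message above uses gettext-style `%`-interpolation after translation. Stubbing `_` shows the mechanics; the `_` stub below stands in for the project's actual translation lookup, purely for illustration:

```python
def _(text: str) -> str:
    """Stand-in for the project's translation lookup."""
    return text

message = _("Unable to locate breadcrumb fallback selectors within %(seconds).1f seconds.") % {"seconds": 7.5}
print(message)  # Unable to locate breadcrumb fallback selectors within 7.5 seconds.
```

Named `%(seconds).1f` placeholders survive translation unchanged, so translators can reorder them without breaking the formatting.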
@@ -114,6 +114,55 @@ class CaptchaConfig(ContextualModel):
     restart_delay:str = "6h"
 
 
+class TimeoutConfig(ContextualModel):
+    multiplier:float = Field(
+        default = 1.0,
+        ge = 0.1,
+        description = "Global multiplier applied to all timeout values."
+    )
+    default:float = Field(default = 5.0, ge = 0.0, description = "Baseline timeout for DOM interactions.")
+    page_load:float = Field(default = 15.0, ge = 1.0, description = "Page load timeout for web_open.")
+    captcha_detection:float = Field(default = 2.0, ge = 0.1, description = "Timeout for captcha iframe detection.")
+    sms_verification:float = Field(default = 4.0, ge = 0.1, description = "Timeout for SMS verification prompts.")
+    gdpr_prompt:float = Field(default = 10.0, ge = 1.0, description = "Timeout for GDPR/consent dialogs.")
+    publishing_result:float = Field(default = 300.0, ge = 10.0, description = "Timeout for publishing result checks.")
+    publishing_confirmation:float = Field(default = 20.0, ge = 1.0, description = "Timeout for publish confirmation redirect.")
+    pagination_initial:float = Field(default = 10.0, ge = 1.0, description = "Timeout for initial pagination lookup.")
+    pagination_follow_up:float = Field(default = 5.0, ge = 1.0, description = "Timeout for subsequent pagination navigation.")
+    quick_dom:float = Field(default = 2.0, ge = 0.1, description = "Generic short timeout for transient UI.")
+    update_check:float = Field(default = 10.0, ge = 1.0, description = "Timeout for GitHub update checks.")
+    chrome_remote_probe:float = Field(default = 2.0, ge = 0.1, description = "Timeout for local remote-debugging probes.")
+    chrome_remote_debugging:float = Field(default = 5.0, ge = 1.0, description = "Timeout for remote debugging API calls.")
+    chrome_binary_detection:float = Field(default = 10.0, ge = 1.0, description = "Timeout for chrome --version subprocesses.")
+    retry_enabled:bool = Field(default = True, description = "Enable built-in retry/backoff for DOM operations.")
+    retry_max_attempts:int = Field(default = 2, ge = 1, description = "Max retry attempts when retry is enabled.")
+    retry_backoff_factor:float = Field(default = 1.5, ge = 1.0, description = "Exponential factor applied per retry attempt.")
+
+    def resolve(self, key:str = "default", override:float | None = None) -> float:
+        """
+        Return the base timeout (seconds) for the given key without applying modifiers.
+        """
+        if override is not None:
+            return float(override)
+
+        if key == "default":
+            return float(self.default)
+
+        attr = getattr(self, key, None)
+        if isinstance(attr, (int, float)):
+            return float(attr)
+
+        return float(self.default)
+
+    def effective(self, key:str = "default", override:float | None = None, *, attempt:int = 0) -> float:
+        """
+        Return the effective timeout (seconds) with multiplier/backoff applied.
+        """
+        base = self.resolve(key, override)
+        backoff = self.retry_backoff_factor ** attempt if attempt > 0 else 1.0
+        return base * self.multiplier * backoff
+
+
 def _validate_glob_pattern(v:str) -> str:
     if not v.strip():
         raise ValueError("must be a non-empty, non-blank glob pattern")
@@ -154,6 +203,7 @@ Example:
|
|||||||
login:LoginConfig = Field(default_factory = LoginConfig.model_construct, description = "Login credentials")
|
login:LoginConfig = Field(default_factory = LoginConfig.model_construct, description = "Login credentials")
|
||||||
captcha:CaptchaConfig = Field(default_factory = CaptchaConfig)
|
captcha:CaptchaConfig = Field(default_factory = CaptchaConfig)
|
||||||
update_check:UpdateCheckConfig = Field(default_factory = UpdateCheckConfig, description = "Update check configuration")
|
update_check:UpdateCheckConfig = Field(default_factory = UpdateCheckConfig, description = "Update check configuration")
|
||||||
|
timeouts:TimeoutConfig = Field(default_factory = TimeoutConfig, description = "Centralized timeout configuration.")
|
||||||
|
|
||||||
def with_values(self, values:dict[str, Any]) -> Config:
|
def with_values(self, values:dict[str, Any]) -> Config:
|
||||||
return Config.model_validate(
|
return Config.model_validate(
|
||||||
|
@@ -219,6 +219,8 @@ kleinanzeigen_bot/extract.py:
   _extract_category_from_ad_page:
     "Breadcrumb container 'vap-brdcrmb' not found; cannot extract ad category: %s": "Breadcrumb-Container 'vap-brdcrmb' nicht gefunden; kann Anzeigenkategorie nicht extrahieren: %s"
     "Falling back to legacy breadcrumb selectors; collected ids: %s": "Weiche auf ältere Breadcrumb-Selektoren aus; gesammelte IDs: %s"
+    "Legacy breadcrumb selectors not found within %.1f seconds (collected ids: %s)": "Ältere Breadcrumb-Selektoren nicht innerhalb von %.1f Sekunden gefunden (gesammelte IDs: %s)"
+    "Unable to locate breadcrumb fallback selectors within %(seconds).1f seconds.": "Ältere Breadcrumb-Selektoren konnten nicht innerhalb von %(seconds).1f Sekunden gefunden werden."

 #################################################
 kleinanzeigen_bot/utils/i18n.py:
@@ -398,11 +400,6 @@ kleinanzeigen_bot/utils/web_scraping_mixin.py:
   web_check:
     "Unsupported attribute: %s": "Nicht unterstütztes Attribut: %s"

-  web_find:
-    "Unsupported selector type: %s": "Nicht unterstützter Selektor-Typ: %s"
-
-  web_find_all:
-    "Unsupported selector type: %s": "Nicht unterstützter Selektor-Typ: %s"
   close_browser_session:
     "Closing Browser session...": "Schließe Browser-Sitzung..."

@@ -417,6 +414,12 @@ kleinanzeigen_bot/utils/web_scraping_mixin.py:
   web_request:
     " -> HTTP %s [%s]...": " -> HTTP %s [%s]..."

+  _web_find_once:
+    "Unsupported selector type: %s": "Nicht unterstützter Selektor-Typ: %s"
+
+  _web_find_all_once:
+    "Unsupported selector type: %s": "Nicht unterstützter Selektor-Typ: %s"
+
   diagnose_browser_issues:
     "=== Browser Connection Diagnostics ===": "=== Browser-Verbindungsdiagnose ==="
     "=== End Diagnostics ===": "=== Ende der Diagnose ==="
@@ -434,6 +437,8 @@ kleinanzeigen_bot/utils/web_scraping_mixin.py:
     "(info) Remote debugging port configured: %d": "(Info) Remote-Debugging-Port konfiguriert: %d"
     "(info) Remote debugging port is not open": "(Info) Remote-Debugging-Port ist nicht offen"
+
+    "(warn) Unable to inspect browser processes: %s": "(Warnung) Browser-Prozesse konnten nicht überprüft werden: %s"

     "(info) No browser processes currently running": "(Info) Derzeit keine Browser-Prozesse aktiv"
     "(fail) Running as root - this can cause browser issues": "(Fehler) Läuft als Root - dies kann Browser-Probleme verursachen"
@@ -49,6 +49,10 @@ class UpdateChecker:
         """
         return __version__

+    def _request_timeout(self) -> float:
+        """Return the effective timeout for HTTP calls."""
+        return self.config.timeouts.effective("update_check")
+
     def _get_commit_hash(self, version:str) -> str | None:
         """Extract the commit hash from a version string.

@@ -74,7 +78,7 @@ class UpdateChecker:
         try:
             response = requests.get(
                 f"https://api.github.com/repos/Second-Hand-Friends/kleinanzeigen-bot/releases/tags/{tag_name}",
-                timeout = 10
+                timeout = self._request_timeout()
             )
             response.raise_for_status()
             data = response.json()
@@ -97,7 +101,7 @@ class UpdateChecker:
         try:
             response = requests.get(
                 f"https://api.github.com/repos/Second-Hand-Friends/kleinanzeigen-bot/commits/{commit}",
-                timeout = 10
+                timeout = self._request_timeout()
             )
             response.raise_for_status()
             data = response.json()
@@ -148,7 +152,7 @@ class UpdateChecker:
             # Use /releases/latest endpoint for stable releases
             response = requests.get(
                 "https://api.github.com/repos/Second-Hand-Friends/kleinanzeigen-bot/releases/latest",
-                timeout = 10
+                timeout = self._request_timeout()
             )
             response.raise_for_status()
             release = response.json()
@@ -160,7 +164,7 @@ class UpdateChecker:
             # Use /releases endpoint and select the most recent prerelease
             response = requests.get(
                 "https://api.github.com/repos/Second-Hand-Friends/kleinanzeigen-bot/releases",
-                timeout = 10
+                timeout = self._request_timeout()
             )
             response.raise_for_status()
             releases = response.json()
@@ -78,23 +78,25 @@ def _normalize_browser_name(browser_name:str) -> str:
     return "Chrome"


-def detect_chrome_version_from_binary(binary_path:str) -> ChromeVersionInfo | None:
+def detect_chrome_version_from_binary(binary_path:str, *, timeout:float | None = None) -> ChromeVersionInfo | None:
     """
     Detect Chrome version by running the browser binary.

     Args:
         binary_path: Path to the Chrome binary
+        timeout: Optional timeout (seconds) for the subprocess call

     Returns:
         ChromeVersionInfo if successful, None if detection fails
     """
+    effective_timeout = timeout if timeout is not None else 10.0
     try:
         # Run browser with --version flag
         result = subprocess.run( # noqa: S603
             [binary_path, "--version"],
             check = False, capture_output = True,
             text = True,
-            timeout = 10
+            timeout = effective_timeout
         )

         if result.returncode != 0:
@@ -114,28 +116,30 @@ def detect_chrome_version_from_binary(binary_path:str) -> ChromeVersionInfo | No
         return ChromeVersionInfo(version_string, major_version, browser_name)

     except subprocess.TimeoutExpired:
-        LOG.debug("Browser version command timed out")
+        LOG.debug("Browser version command timed out after %.1fs", effective_timeout)
         return None
     except (subprocess.SubprocessError, ValueError) as e:
         LOG.debug("Failed to detect browser version: %s", str(e))
         return None


-def detect_chrome_version_from_remote_debugging(host:str = "127.0.0.1", port:int = 9222) -> ChromeVersionInfo | None:
+def detect_chrome_version_from_remote_debugging(host:str = "127.0.0.1", port:int = 9222, *, timeout:float | None = None) -> ChromeVersionInfo | None:
     """
     Detect Chrome version from remote debugging API.

     Args:
         host: Remote debugging host
         port: Remote debugging port
+        timeout: Optional timeout (seconds) for the HTTP request

     Returns:
         ChromeVersionInfo if successful, None if detection fails
     """
+    effective_timeout = timeout if timeout is not None else 5.0
     try:
         # Query the remote debugging API
         url = f"http://{host}:{port}/json/version"
-        response = urllib.request.urlopen(url, timeout = 5) # noqa: S310
+        response = urllib.request.urlopen(url, timeout = effective_timeout) # noqa: S310
         version_data = json.loads(response.read().decode())

         # Extract version information
@@ -200,7 +204,10 @@ def validate_chrome_136_configuration(browser_arguments:list[str], user_data_dir
 def get_chrome_version_diagnostic_info(
     binary_path:str | None = None,
     remote_host:str = "127.0.0.1",
-    remote_port:int | None = None
+    remote_port:int | None = None,
+    *,
+    remote_timeout:float | None = None,
+    binary_timeout:float | None = None
 ) -> dict[str, Any]:
     """
     Get comprehensive Chrome version diagnostic information.
@@ -209,6 +216,8 @@ def get_chrome_version_diagnostic_info(
         binary_path: Path to Chrome binary (optional)
         remote_host: Remote debugging host
         remote_port: Remote debugging port (optional)
+        remote_timeout: Timeout for remote debugging detection
+        binary_timeout: Timeout for binary detection

     Returns:
         Dictionary with diagnostic information
@@ -223,7 +232,7 @@ def get_chrome_version_diagnostic_info(

     # Try binary detection
     if binary_path:
-        version_info = detect_chrome_version_from_binary(binary_path)
+        version_info = detect_chrome_version_from_binary(binary_path, timeout = binary_timeout)
         if version_info:
             diagnostic_info["binary_detection"] = {
                 "version_string": version_info.version_string,
@@ -235,7 +244,7 @@ def get_chrome_version_diagnostic_info(

     # Try remote debugging detection
     if remote_port:
-        version_info = detect_chrome_version_from_remote_debugging(remote_host, remote_port)
+        version_info = detect_chrome_version_from_remote_debugging(remote_host, remote_port, timeout = remote_timeout)
         if version_info:
             diagnostic_info["remote_detection"] = {
                 "version_string": version_info.version_string,
@@ -2,9 +2,9 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
 # SPDX-ArtifactOfProjectHomePage: https://github.com/Second-Hand-Friends/kleinanzeigen-bot/
 import asyncio, enum, inspect, json, os, platform, secrets, shutil, subprocess, urllib.request # isort: skip # noqa: S404
-from collections.abc import Callable, Coroutine, Iterable
+from collections.abc import Awaitable, Callable, Coroutine, Iterable
 from gettext import gettext as _
-from typing import Any, Final, cast
+from typing import Any, Final, Optional, cast

 try:
     from typing import Never # type: ignore[attr-defined,unused-ignore] # mypy
@@ -15,10 +15,13 @@ import nodriver, psutil # isort: skip
 from typing import TYPE_CHECKING, TypeGuard

 from nodriver.core.browser import Browser
-from nodriver.core.config import Config
+from nodriver.core.config import Config as NodriverConfig
 from nodriver.core.element import Element
 from nodriver.core.tab import Tab as Page

+from kleinanzeigen_bot.model.config_model import Config as BotConfig
+from kleinanzeigen_bot.model.config_model import TimeoutConfig
+
 from . import loggers, net
 from .chrome_version_detector import (
     ChromeVersionInfo,
@@ -32,6 +35,7 @@ from .misc import T, ensure
 if TYPE_CHECKING:
     from nodriver.cdp.runtime import RemoteObject

+
 # Constants for RemoteObject conversion
 _KEY_VALUE_PAIR_SIZE = 2

@@ -102,6 +106,69 @@ class WebScrapingMixin:
         self.browser_config:Final[BrowserConfig] = BrowserConfig()
         self.browser:Browser = None # pyright: ignore[reportAttributeAccessIssue]
         self.page:Page = None # pyright: ignore[reportAttributeAccessIssue]
+        self._default_timeout_config:TimeoutConfig | None = None
+        self.config:BotConfig = cast(BotConfig, None)
+
+    def _get_timeout_config(self) -> TimeoutConfig:
+        config = getattr(self, "config", None)
+        timeouts:TimeoutConfig | None = None
+        if config is not None:
+            timeouts = cast(Optional[TimeoutConfig], getattr(config, "timeouts", None))
+            if timeouts is not None:
+                return timeouts
+
+        if self._default_timeout_config is None:
+            self._default_timeout_config = TimeoutConfig()
+        return self._default_timeout_config
+
+    def _timeout(self, key:str = "default", override:float | None = None) -> float:
+        """
+        Return the base timeout (seconds) for a given key without applying multipliers.
+        """
+        return self._get_timeout_config().resolve(key, override)
+
+    def _effective_timeout(self, key:str = "default", override:float | None = None, *, attempt:int = 0) -> float:
+        """
+        Return the effective timeout (seconds) with multiplier/backoff applied.
+        """
+        return self._get_timeout_config().effective(key, override, attempt = attempt)
+
+    def _timeout_attempts(self) -> int:
+        cfg = self._get_timeout_config()
+        if not cfg.retry_enabled:
+            return 1
+        # Always perform the initial attempt plus the configured number of retries.
+        return 1 + cfg.retry_max_attempts
+
+    async def _run_with_timeout_retries(
+        self,
+        operation:Callable[[float], Awaitable[T]],
+        *,
+        description:str,
+        key:str = "default",
+        override:float | None = None
+    ) -> T:
+        """
+        Execute an async callable with retry/backoff handling for TimeoutError.
+        """
+        attempts = self._timeout_attempts()
+
+        for attempt in range(attempts):
+            effective_timeout = self._effective_timeout(key, override, attempt = attempt)
+            try:
+                return await operation(effective_timeout)
+            except TimeoutError:
+                if attempt >= attempts - 1:
+                    raise
+                LOG.debug(
+                    "Retrying %s after TimeoutError (attempt %d/%d, timeout %.1fs)",
+                    description,
+                    attempt + 1,
+                    attempts,
+                    effective_timeout
+                )
+
+        raise TimeoutError(f"{description} failed without executing operation")
+
     async def create_browser_session(self) -> None:
         LOG.info("Creating Browser session...")
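The retry wrapper above boils down to a small loop: each attempt gets a larger timeout budget, only `TimeoutError` triggers a retry, and the last failure is re-raised. A self-contained sketch of that logic (the names `run_with_timeout_retries` and `flaky` are illustrative, not the mixin's API):

```python
import asyncio


async def run_with_timeout_retries(operation, *, attempts=3, base_timeout=1.0, backoff=1.5):
    """Call `operation(timeout)` until it succeeds or attempts are exhausted.

    Only TimeoutError triggers a retry; each retry gets an exponentially
    larger timeout budget (attempt 0 uses the base value unchanged).
    """
    for attempt in range(attempts):
        timeout = base_timeout * (backoff ** attempt if attempt > 0 else 1.0)
        try:
            return await operation(timeout)
        except TimeoutError:
            if attempt >= attempts - 1:
                raise  # out of attempts: surface the last TimeoutError


calls = []


async def flaky(timeout):
    # Record the budget we were given, then fail twice before succeeding.
    calls.append(round(timeout, 2))
    if len(calls) < 3:
        raise TimeoutError("not yet")
    return "ok"


result = asyncio.run(run_with_timeout_retries(flaky))
print(result, calls)  # ok [1.0, 1.5, 2.25]
```

With `retry_enabled = false` the attempt count collapses to 1, which reproduces the old single-shot behaviour.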
@@ -137,7 +204,7 @@ class WebScrapingMixin:
                 f"Make sure the browser is running and the port is not blocked by firewall.")

         try:
-            cfg = Config(
+            cfg = NodriverConfig(
                 browser_executable_path = self.browser_config.binary_location # actually not necessary but nodriver fails without
             )
             cfg.host = remote_host
@@ -207,7 +274,7 @@ class WebScrapingMixin:
         if self.browser_config.user_data_dir:
             LOG.info(" -> Browser user data dir: %s", self.browser_config.user_data_dir)

-        cfg = Config(
+        cfg = NodriverConfig(
             headless = False,
             browser_executable_path = self.browser_config.binary_location,
             browser_args = browser_args,
@@ -355,7 +422,8 @@ class WebScrapingMixin:
             LOG.info("(ok) Remote debugging port is open")
             # Try to get more information about the debugging endpoint
             try:
-                response = urllib.request.urlopen(f"http://127.0.0.1:{remote_port}/json/version", timeout = 2)
+                probe_timeout = self._effective_timeout("chrome_remote_probe")
+                response = urllib.request.urlopen(f"http://127.0.0.1:{remote_port}/json/version", timeout = probe_timeout)
                 version_info = json.loads(response.read().decode())
                 LOG.info("(ok) Remote debugging API accessible - Browser: %s", version_info.get("Browser", "Unknown"))
             except Exception as e:
@@ -378,6 +446,7 @@ class WebScrapingMixin:
         except (AssertionError, TypeError):
             target_browser_name = ""

+        try:
             for proc in psutil.process_iter(["pid", "name", "cmdline"]):
                 try:
                     proc_name = proc.info["name"] or ""
@@ -402,6 +471,9 @@ class WebScrapingMixin:
                     browser_processes.append(proc.info)
                 except (psutil.NoSuchProcess, psutil.AccessDenied):
                     pass
+        except (psutil.Error, PermissionError) as exc:
+            LOG.warning("(warn) Unable to inspect browser processes: %s", exc)
+            browser_processes = []

         if browser_processes:
             LOG.info("(info) Found %d browser processes running", len(browser_processes))
@@ -486,15 +558,17 @@ class WebScrapingMixin:
         raise AssertionError(_("Installed browser could not be detected"))

     async def web_await(self, condition:Callable[[], T | Never | Coroutine[Any, Any, T | Never]], *,
-            timeout:int | float = 5, timeout_error_message:str = "") -> T:
+            timeout:int | float | None = None, timeout_error_message:str = "", apply_multiplier:bool = True) -> T:
         """
         Blocks/waits until the given condition is met.

-        :param timeout: timeout in seconds
+        :param timeout: timeout in seconds (base value, multiplier applied unless disabled)
         :raises TimeoutError: if element could not be found within time
         """
         loop = asyncio.get_running_loop()
         start_at = loop.time()
+        base_timeout = timeout if timeout is not None else self._timeout()
+        effective_timeout = self._effective_timeout(override = base_timeout) if apply_multiplier else base_timeout

         while True:
             await self.page
@@ -506,13 +580,13 @@ class WebScrapingMixin:
                     return result
             except Exception as ex1:
                 ex = ex1
-            if loop.time() - start_at > timeout:
+            if loop.time() - start_at > effective_timeout:
                 if ex:
                     raise ex
-                raise TimeoutError(timeout_error_message or f"Condition not met within {timeout} seconds")
+                raise TimeoutError(timeout_error_message or f"Condition not met within {effective_timeout} seconds")
             await self.page.sleep(0.5)

-    async def web_check(self, selector_type:By, selector_value:str, attr:Is, *, timeout:int | float = 5) -> bool:
+    async def web_check(self, selector_type:By, selector_value:str, attr:Is, *, timeout:int | float | None = None) -> bool:
         """
         Locates an HTML element and returns a state.

@@ -559,7 +633,7 @@ class WebScrapingMixin:
         """))
         raise AssertionError(_("Unsupported attribute: %s") % attr)

-    async def web_click(self, selector_type:By, selector_value:str, *, timeout:int | float = 5) -> Element:
+    async def web_click(self, selector_type:By, selector_value:str, *, timeout:int | float | None = None) -> Element:
         """
         Locates an HTML element by ID.

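The reworked `web_await` is essentially a polling loop with a deadline: exceptions from the condition are swallowed until the budget is spent, then the last exception (or a `TimeoutError`) surfaces. A minimal standalone sketch of that contract (function and variable names here are illustrative, and `asyncio.sleep` stands in for the page's own sleep):

```python
import asyncio


async def await_condition(condition, *, timeout=5.0, poll_interval=0.01):
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    loop = asyncio.get_running_loop()
    start_at = loop.time()
    last_exc = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:
            # Remember the failure but keep polling until the deadline.
            last_exc = exc
        if loop.time() - start_at > timeout:
            if last_exc:
                raise last_exc
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        await asyncio.sleep(poll_interval)


state = {"n": 0}


def ready():
    # Succeeds on the third poll.
    state["n"] += 1
    return "found" if state["n"] >= 3 else None


print(asyncio.run(await_condition(ready, timeout=1.0)))  # found
```

Because the deadline is computed once up front, applying the configured multiplier (or a retry attempt's backoff) only changes the `timeout` argument, not the loop itself.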
@@ -652,91 +726,130 @@ class WebScrapingMixin:
|
|||||||
# Return primitive values as-is
|
# Return primitive values as-is
|
||||||
return data
|
return data
|
||||||
|
|
||||||
async def web_find(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float = 5) -> Element:
|
async def web_find(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float | None = None) -> Element:
|
||||||
"""
|
"""
|
||||||
Locates an HTML element by the given selector type and value.
|
Locates an HTML element by the given selector type and value.
|
||||||
|
|
||||||
:param timeout: timeout in seconds
|
:param timeout: timeout in seconds (base value before multiplier/backoff)
|
||||||
:raises TimeoutError: if element could not be found within time
|
:raises TimeoutError: if element could not be found within time
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
async def attempt(effective_timeout:float) -> Element:
|
||||||
|
return await self._web_find_once(selector_type, selector_value, effective_timeout, parent = parent)
|
||||||
|
|
||||||
|
return await self._run_with_timeout_retries(
|
||||||
|
attempt,
|
||||||
|
description = f"web_find({selector_type.name}, {selector_value})",
|
||||||
|
key = "default",
|
||||||
|
override = timeout
|
||||||
|
)
|
||||||
|
|
||||||
|
async def web_find_all(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float | None = None) -> list[Element]:
|
||||||
|
"""
|
||||||
|
Locates multiple HTML elements by the given selector type and value.
|
||||||
|
|
||||||
|
:param timeout: timeout in seconds (base value before multiplier/backoff)
|
||||||
|
:raises TimeoutError: if element could not be found within time
|
||||||
|
"""
|
||||||
|
|
||||||
|
async def attempt(effective_timeout:float) -> list[Element]:
|
||||||
|
return await self._web_find_all_once(selector_type, selector_value, effective_timeout, parent = parent)
|
||||||
|
|
||||||
|
return await self._run_with_timeout_retries(
|
||||||
|
attempt,
|
||||||
|
description = f"web_find_all({selector_type.name}, {selector_value})",
|
||||||
|
key = "default",
|
||||||
|
override = timeout
|
||||||
|
)
|
||||||
|
|
||||||
|
async def _web_find_once(self, selector_type:By, selector_value:str, timeout:float, *, parent:Element | None = None) -> Element:
|
||||||
|
timeout_suffix = f" within {timeout} seconds."
|
||||||
|
|
||||||
match selector_type:
|
match selector_type:
|
||||||
case By.ID:
|
case By.ID:
|
||||||
escaped_id = selector_value.translate(METACHAR_ESCAPER)
|
escaped_id = selector_value.translate(METACHAR_ESCAPER)
|
||||||
return await self.web_await(
|
return await self.web_await(
|
||||||
lambda: self.page.query_selector(f"#{escaped_id}", parent),
|
lambda: self.page.query_selector(f"#{escaped_id}", parent),
|
||||||
timeout = timeout,
|
timeout = timeout,
|
||||||
timeout_error_message = f"No HTML element found with ID '{selector_value}' within {timeout} seconds.")
|
timeout_error_message = f"No HTML element found with ID '{selector_value}'{timeout_suffix}",
|
||||||
|
apply_multiplier = False)
|
||||||
case By.CLASS_NAME:
|
case By.CLASS_NAME:
|
||||||
escaped_classname = selector_value.translate(METACHAR_ESCAPER)
|
escaped_classname = selector_value.translate(METACHAR_ESCAPER)
|
||||||
return await self.web_await(
|
return await self.web_await(
|
||||||
lambda: self.page.query_selector(f".{escaped_classname}", parent),
|
lambda: self.page.query_selector(f".{escaped_classname}", parent),
|
||||||
timeout = timeout,
|
timeout = timeout,
|
||||||
-                    timeout_error_message = f"No HTML element found with CSS class '{selector_value}' within {timeout} seconds.")
+                    timeout_error_message = f"No HTML element found with CSS class '{selector_value}'{timeout_suffix}",
+                    apply_multiplier = False)
             case By.TAG_NAME:
                 return await self.web_await(
                     lambda: self.page.query_selector(selector_value, parent),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML element found of tag <{selector_value}> within {timeout} seconds.")
+                    timeout_error_message = f"No HTML element found of tag <{selector_value}>{timeout_suffix}",
+                    apply_multiplier = False)
             case By.CSS_SELECTOR:
                 return await self.web_await(
                     lambda: self.page.query_selector(selector_value, parent),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML element found using CSS selector '{selector_value}' within {timeout} seconds.")
+                    timeout_error_message = f"No HTML element found using CSS selector '{selector_value}'{timeout_suffix}",
+                    apply_multiplier = False)
             case By.TEXT:
                 ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
                 return await self.web_await(
                     lambda: self.page.find_element_by_text(selector_value, best_match = True),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML element found containing text '{selector_value}' within {timeout} seconds.")
+                    timeout_error_message = f"No HTML element found containing text '{selector_value}'{timeout_suffix}",
+                    apply_multiplier = False)
             case By.XPATH:
                 ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
                 return await self.web_await(
                     lambda: self.page.find_element_by_text(selector_value, best_match = True),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML element found using XPath '{selector_value}' within {timeout} seconds.")
+                    timeout_error_message = f"No HTML element found using XPath '{selector_value}'{timeout_suffix}",
+                    apply_multiplier = False)

         raise AssertionError(_("Unsupported selector type: %s") % selector_type)

-    async def web_find_all(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float = 5) -> list[Element]:
-        """
-        Locates an HTML element by ID.
-
-        :param timeout: timeout in seconds
-        :raises TimeoutError: if element could not be found within time
-        """
+    async def _web_find_all_once(self, selector_type:By, selector_value:str, timeout:float, *, parent:Element | None = None) -> list[Element]:
+        timeout_suffix = f" within {timeout} seconds."
         match selector_type:
             case By.CLASS_NAME:
                 escaped_classname = selector_value.translate(METACHAR_ESCAPER)
                 return await self.web_await(
                     lambda: self.page.query_selector_all(f".{escaped_classname}", parent),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML elements found with CSS class '{selector_value}' within {timeout} seconds.")
+                    timeout_error_message = f"No HTML elements found with CSS class '{selector_value}'{timeout_suffix}",
+                    apply_multiplier = False)
             case By.CSS_SELECTOR:
                 return await self.web_await(
                     lambda: self.page.query_selector_all(selector_value, parent),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML elements found using CSS selector '{selector_value}' within {timeout} seconds.")
+                    timeout_error_message = f"No HTML elements found using CSS selector '{selector_value}'{timeout_suffix}",
+                    apply_multiplier = False)
             case By.TAG_NAME:
                 return await self.web_await(
                     lambda: self.page.query_selector_all(selector_value, parent),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML elements found of tag <{selector_value}> within {timeout} seconds.")
+                    timeout_error_message = f"No HTML elements found of tag <{selector_value}>{timeout_suffix}",
+                    apply_multiplier = False)
             case By.TEXT:
                 ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
                 return await self.web_await(
                     lambda: self.page.find_elements_by_text(selector_value),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML elements found containing text '{selector_value}' within {timeout} seconds.")
+                    timeout_error_message = f"No HTML elements found containing text '{selector_value}'{timeout_suffix}",
+                    apply_multiplier = False)
             case By.XPATH:
                 ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
                 return await self.web_await(
                     lambda: self.page.find_elements_by_text(selector_value),
                     timeout = timeout,
-                    timeout_error_message = f"No HTML elements found using XPath '{selector_value}' within {timeout} seconds.")
+                    timeout_error_message = f"No HTML elements found using XPath '{selector_value}'{timeout_suffix}",
+                    apply_multiplier = False)

         raise AssertionError(_("Unsupported selector type: %s") % selector_type)

-    async def web_input(self, selector_type:By, selector_value:str, text:str | int, *, timeout:int | float = 5) -> Element:
+    async def web_input(self, selector_type:By, selector_value:str, text:str | int, *, timeout:int | float | None = None) -> Element:
         """
         Enters text into an HTML input field.

@@ -749,10 +862,10 @@ class WebScrapingMixin:
         await self.web_sleep()
         return input_field

-    async def web_open(self, url:str, *, timeout:int | float = 15_000, reload_if_already_open:bool = False) -> None:
+    async def web_open(self, url:str, *, timeout:int | float | None = None, reload_if_already_open:bool = False) -> None:
         """
         :param url: url to open in browser
-        :param timeout: timespan in seconds within the page needs to be loaded
+        :param timeout: timespan in seconds within the page needs to be loaded (base value)
         :param reload_if_already_open: if False does nothing if the URL is already open in the browser
         :raises TimeoutException: if page did not open within given timespan
         """
@@ -761,10 +874,15 @@ class WebScrapingMixin:
             LOG.debug(" => skipping, [%s] is already open", url)
             return
         self.page = await self.browser.get(url = url, new_tab = False, new_window = False)
-        await self.web_await(lambda: self.web_execute("document.readyState == 'complete'"), timeout = timeout,
-            timeout_error_message = f"Page did not finish loading within {timeout} seconds.")
+        page_timeout = self._effective_timeout("page_load", timeout)
+        await self.web_await(
+            lambda: self.web_execute("document.readyState == 'complete'"),
+            timeout = page_timeout,
+            timeout_error_message = f"Page did not finish loading within {page_timeout} seconds.",
+            apply_multiplier = False
+        )

-    async def web_text(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float = 5) -> str:
+    async def web_text(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float | None = None) -> str:
         return str(await (await self.web_find(selector_type, selector_value, parent = parent, timeout = timeout)).apply("""
             function (elem) {
                 let sel = window.getSelection()
@@ -835,7 +953,7 @@ class WebScrapingMixin:
             await self.web_execute(f"window.scrollTo(0, {current_y_pos})")
             await asyncio.sleep(scroll_length / scroll_speed / 2) # double speed

-    async def web_select(self, selector_type:By, selector_value:str, selected_value:Any, timeout:int | float = 5) -> Element:
+    async def web_select(self, selector_type:By, selector_value:str, selected_value:Any, timeout:int | float | None = None) -> Element:
         """
         Selects an <option/> of a <select/> HTML element.

@@ -895,7 +1013,11 @@
         port_available = await self._check_port_with_retry(remote_host, remote_port)
         if port_available:
             try:
-                version_info = detect_chrome_version_from_remote_debugging(remote_host, remote_port)
+                version_info = detect_chrome_version_from_remote_debugging(
+                    remote_host,
+                    remote_port,
+                    timeout = self._effective_timeout("chrome_remote_debugging")
+                )
                 if version_info:
                     LOG.debug(" -> Detected version from existing browser: %s", version_info)
                 else:
@@ -910,7 +1032,10 @@
         binary_path = self.browser_config.binary_location
         if binary_path:
             LOG.debug(" -> No remote browser detected, trying binary detection")
-            version_info = detect_chrome_version_from_binary(binary_path)
+            version_info = detect_chrome_version_from_binary(
+                binary_path,
+                timeout = self._effective_timeout("chrome_binary_detection")
+            )

         # Validate if Chrome 136+ detected
         if version_info and version_info.is_chrome_136_plus:
@@ -977,7 +1102,10 @@
         binary_path = self.browser_config.binary_location
         diagnostic_info = get_chrome_version_diagnostic_info(
             binary_path = binary_path,
-            remote_port = remote_port if remote_port > 0 else None
+            remote_host = "127.0.0.1",
+            remote_port = remote_port if remote_port > 0 else None,
+            remote_timeout = self._effective_timeout("chrome_remote_debugging"),
+            binary_timeout = self._effective_timeout("chrome_binary_detection")
         )

         # Report binary detection results
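The `_effective_timeout(...)` calls in the hunks above combine a per-operation base timeout with the configured global multiplier and, on retries, an exponential backoff factor. A minimal standalone sketch of that arithmetic (the function name and signature here are illustrative, not the PR's actual helper):

```python
def effective_timeout(base:float, multiplier:float = 1.0, backoff_factor:float = 2.0, attempt:int = 0) -> float:
    """Attempt 0 yields base * multiplier; every retry multiplies in the backoff factor once more."""
    return base * multiplier * (backoff_factor ** attempt)
```

With a base of 0.5 s, a multiplier of 2.0, and a backoff factor of 2.0, three attempts produce the 1.0 s, 2.0 s, 4.0 s sequence that the multiplier/backoff test in this patch asserts.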
@@ -1,7 +1,9 @@
 # SPDX-FileCopyrightText: © Sebastian Thomschke and contributors
 # SPDX-License-Identifier: AGPL-3.0-or-later
 # SPDX-ArtifactOfProjectHomePage: https://github.com/Second-Hand-Friends/kleinanzeigen-bot/
-from kleinanzeigen_bot.model.config_model import AdDefaults, Config
+import pytest
+
+from kleinanzeigen_bot.model.config_model import AdDefaults, Config, TimeoutConfig


 def test_migrate_legacy_description_prefix() -> None:
@@ -74,3 +76,50 @@ def test_minimal_config_validation() -> None:
     config = Config.model_validate(minimal_cfg)
     assert config.login.username == "dummy"
     assert config.login.password == "dummy" # noqa: S105
+
+
+def test_timeout_config_defaults_and_effective_values() -> None:
+    cfg = Config.model_validate({
+        "login": {"username": "dummy", "password": "dummy"}, # noqa: S105
+        "timeouts": {
+            "multiplier": 2.0,
+            "pagination_initial": 12.0,
+            "retry_max_attempts": 3,
+            "retry_backoff_factor": 2.0
+        }
+    })
+
+    timeouts = cfg.timeouts
+    base = timeouts.resolve("pagination_initial")
+    multiplier = timeouts.multiplier
+    backoff = timeouts.retry_backoff_factor
+    assert base == 12.0
+    assert timeouts.effective("pagination_initial") == base * multiplier * (backoff ** 0)
+    # attempt 1 should apply backoff factor once in addition to multiplier
+    assert timeouts.effective("pagination_initial", attempt = 1) == base * multiplier * (backoff ** 1)
+
+
+def test_validate_glob_pattern_rejects_blank_strings() -> None:
+    with pytest.raises(ValueError, match = "must be a non-empty, non-blank glob pattern"):
+        Config.model_validate({
+            "ad_files": [" "],
+            "ad_defaults": {"contact": {"name": "dummy", "zipcode": "12345"}},
+            "login": {"username": "dummy", "password": "dummy"}
+        })
+
+    cfg = Config.model_validate({
+        "ad_files": ["*.yaml"],
+        "ad_defaults": {"contact": {"name": "dummy", "zipcode": "12345"}},
+        "login": {"username": "dummy", "password": "dummy"}
+    })
+    assert cfg.ad_files == ["*.yaml"]
+
+
+def test_timeout_config_resolve_returns_specific_value() -> None:
+    timeouts = TimeoutConfig(default = 4.0, page_load = 12.5)
+    assert timeouts.resolve("page_load") == 12.5
+
+
+def test_timeout_config_resolve_falls_back_to_default() -> None:
+    timeouts = TimeoutConfig(default = 3.0)
+    assert timeouts.resolve("nonexistent_key") == 3.0
@@ -412,6 +412,60 @@ class TestAdExtractorNavigation:
             call(By.CLASS_NAME, "cardbox", parent = ad_list_container_mock),
         ], any_order = False)

+    @pytest.mark.asyncio
+    async def test_extract_own_ads_urls_paginates_with_enabled_next_button(self, test_extractor:AdExtractor) -> None:
+        """Ensure the paginator clicks the first enabled next button and advances."""
+        ad_list_container_mock = MagicMock()
+        pagination_section_mock = MagicMock()
+        cardbox_page_one = MagicMock()
+        cardbox_page_two = MagicMock()
+        link_page_one = MagicMock(attrs = {"href": "/s-anzeige/page-one/111"})
+        link_page_two = MagicMock(attrs = {"href": "/s-anzeige/page-two/222"})
+
+        next_button_enabled = AsyncMock()
+        next_button_enabled.attrs = {}
+        disabled_button = MagicMock()
+        disabled_button.attrs = {"disabled": True}
+
+        link_queue = [link_page_one, link_page_two]
+        next_button_call = {"count": 0}
+        cardbox_call = {"count": 0}
+
+        async def fake_web_find(selector_type:By, selector_value:str, *, parent:Element | None = None,
+                timeout:int | float | None = None) -> Element:
+            if selector_type == By.ID and selector_value == "my-manageitems-adlist":
+                return ad_list_container_mock
+            if selector_type == By.CSS_SELECTOR and selector_value == ".Pagination":
+                return pagination_section_mock
+            if selector_type == By.CSS_SELECTOR and selector_value == "div h3 a.text-onSurface":
+                return link_queue.pop(0)
+            raise AssertionError(f"Unexpected selector {selector_type} {selector_value}")
+
+        async def fake_web_find_all(selector_type:By, selector_value:str, *, parent:Element | None = None,
+                timeout:int | float | None = None) -> list[Element]:
+            if selector_type == By.CSS_SELECTOR and selector_value == 'button[aria-label="Nächste"]':
+                next_button_call["count"] += 1
+                if next_button_call["count"] == 1:
+                    return [next_button_enabled] # initial detection -> multi page
+                if next_button_call["count"] == 2:
+                    return [disabled_button, next_button_enabled] # navigation on page 1
+                return [] # after navigating, stop
+            if selector_type == By.CLASS_NAME and selector_value == "cardbox":
+                cardbox_call["count"] += 1
+                return [cardbox_page_one] if cardbox_call["count"] == 1 else [cardbox_page_two]
+            raise AssertionError(f"Unexpected find_all selector {selector_type} {selector_value}")
+
+        with patch.object(test_extractor, "web_open", new_callable = AsyncMock), \
+                patch.object(test_extractor, "web_scroll_page_down", new_callable = AsyncMock), \
+                patch.object(test_extractor, "web_sleep", new_callable = AsyncMock), \
+                patch.object(test_extractor, "web_find", new_callable = AsyncMock, side_effect = fake_web_find), \
+                patch.object(test_extractor, "web_find_all", new_callable = AsyncMock, side_effect = fake_web_find_all):
+
+            refs = await test_extractor.extract_own_ads_urls()
+
+        assert refs == ["/s-anzeige/page-one/111", "/s-anzeige/page-two/222"]
+        next_button_enabled.click.assert_awaited() # triggered once during navigation
+
+
 class TestAdExtractorContent:
     """Tests for content extraction functionality."""
@@ -641,6 +695,24 @@ class TestAdExtractorCategory:
         mock_web_find.assert_any_call(By.CSS_SELECTOR, "a:nth-of-type(3)", parent = category_line)
         mock_web_find_all.assert_awaited_once_with(By.CSS_SELECTOR, "a", parent = category_line)

+    @pytest.mark.asyncio
+    async def test_extract_category_legacy_selectors_timeout(self, extractor:AdExtractor, caplog:pytest.LogCaptureFixture) -> None:
+        """Ensure fallback timeout logs the error and re-raises with translated message."""
+        category_line = MagicMock()
+
+        async def fake_web_find(selector_type:By, selector_value:str, *, parent:Element | None = None,
+                timeout:int | float | None = None) -> Element:
+            if selector_type == By.ID and selector_value == "vap-brdcrmb":
+                return category_line
+            raise TimeoutError("legacy selectors missing")
+
+        with patch.object(extractor, "web_find", new_callable = AsyncMock, side_effect = fake_web_find), \
+                patch.object(extractor, "web_find_all", new_callable = AsyncMock, side_effect = TimeoutError), \
+                caplog.at_level("ERROR"), pytest.raises(TimeoutError, match = "Unable to locate breadcrumb fallback selectors"):
+            await extractor._extract_category_from_ad_page()
+
+        assert any("Legacy breadcrumb selectors not found" in record.message for record in caplog.records)
+
     @pytest.mark.asyncio
     # pylint: disable=protected-access
     async def test_extract_special_attributes_empty(self, extractor:AdExtractor) -> None:
@@ -95,6 +95,18 @@ class TestUpdateChecker:
         with patch("requests.get", return_value = MagicMock(json = lambda: {"target_commitish": "e7a3d46"})):
             assert checker._get_release_commit("latest") == "e7a3d46"

+    def test_request_timeout_uses_config(self, config:Config, mocker:"MockerFixture") -> None:
+        """Ensure HTTP calls honor the timeout configuration."""
+        config.timeouts.multiplier = 1.5
+        checker = UpdateChecker(config)
+        mock_response = MagicMock(json = lambda: {"target_commitish": "abc"})
+        mock_get = mocker.patch("requests.get", return_value = mock_response)
+
+        checker._get_release_commit("latest")
+
+        expected_timeout = config.timeouts.effective("update_check")
+        assert mock_get.call_args.kwargs["timeout"] == expected_timeout
+
     def test_get_commit_date(self, config:Config) -> None:
         """Test that the commit date is correctly retrieved from the GitHub API."""
         checker = UpdateChecker(config)
@@ -8,12 +8,14 @@ All rights reserved.
 """

 import json
+import logging
 import os
 import platform
 import shutil
 import zipfile
+from collections.abc import Awaitable, Callable
 from pathlib import Path
-from typing import NoReturn, Protocol, cast
+from typing import Any, NoReturn, Protocol, cast
 from unittest.mock import AsyncMock, MagicMock, Mock, mock_open, patch

 import nodriver
@@ -22,6 +24,7 @@ import pytest
 from nodriver.core.element import Element
 from nodriver.core.tab import Tab as Page

+from kleinanzeigen_bot.model.config_model import Config
 from kleinanzeigen_bot.utils import loggers
 from kleinanzeigen_bot.utils.web_scraping_mixin import By, Is, WebScrapingMixin, _is_admin # noqa: PLC2701

@@ -32,7 +35,13 @@ class ConfigProtocol(Protocol):
     browser_args:list[str]
     user_data_dir:str | None

-    def add_extension(self, ext:str) -> None: ...
+    def add_extension(self, ext:str) -> None:
+        ...
+
+
+def _nodriver_start_mock() -> Mock:
+    """Return the nodriver.start mock with proper typing."""
+    return cast(Mock, cast(Any, nodriver).start)


 class TrulyAwaitableMockPage:
@@ -82,6 +91,7 @@ def web_scraper(mock_browser:AsyncMock, mock_page:TrulyAwaitableMockPage) -> Web
     scraper = WebScrapingMixin()
     scraper.browser = mock_browser
     scraper.page = mock_page # type: ignore[unused-ignore,reportAttributeAccessIssue]
+    scraper.config = Config.model_validate({"login": {"username": "user@example.com", "password": "secret"}}) # noqa: S105
     return scraper


@@ -156,6 +166,21 @@ class TestWebScrapingErrorHandling:
         with pytest.raises(Exception, match = "Cannot clear input"):
             await web_scraper.web_input(By.ID, "test-id", "test text")

+    @pytest.mark.asyncio
+    async def test_web_input_success_returns_element(self, web_scraper:WebScrapingMixin, mock_page:TrulyAwaitableMockPage) -> None:
+        """Successful web_input should send keys, wait, and return the element."""
+        mock_element = AsyncMock(spec = Element)
+        mock_page.query_selector.return_value = mock_element
+        mock_sleep = AsyncMock()
+        cast(Any, web_scraper).web_sleep = mock_sleep
+
+        result = await web_scraper.web_input(By.ID, "username", "hello world", timeout = 1)
+
+        assert result is mock_element
+        mock_element.clear_input.assert_awaited_once()
+        mock_element.send_keys.assert_awaited_once_with("hello world")
+        mock_sleep.assert_awaited_once()
+
     @pytest.mark.asyncio
     async def test_web_open_timeout(self, web_scraper:WebScrapingMixin, mock_browser:AsyncMock) -> None:
         """Test page load timeout in web_open."""
@@ -173,6 +198,19 @@ class TestWebScrapingErrorHandling:
         with pytest.raises(TimeoutError, match = "Page did not finish loading within"):
             await web_scraper.web_open("https://example.com", timeout = 0.1)

+    @pytest.mark.asyncio
+    async def test_web_open_skip_when_url_already_loaded(self, web_scraper:WebScrapingMixin, mock_browser:AsyncMock, mock_page:TrulyAwaitableMockPage) -> None:
+        """web_open should short-circuit when the requested URL is already active."""
+        mock_browser.get.reset_mock()
+        mock_page.url = "https://example.com"
+        mock_execute = AsyncMock()
+        cast(Any, web_scraper).web_execute = mock_execute
+
+        await web_scraper.web_open("https://example.com", reload_if_already_open = False)
+
+        mock_browser.get.assert_not_awaited()
+        mock_execute.assert_not_called()
+
     @pytest.mark.asyncio
     async def test_web_request_invalid_response(self, web_scraper:WebScrapingMixin, mock_page:TrulyAwaitableMockPage) -> None:
         """Test invalid response handling in web_request."""
@@ -216,6 +254,179 @@ class TestWebScrapingErrorHandling:
         with pytest.raises(Exception, match = "Attribute error"):
             await web_scraper.web_check(By.ID, "test-id", Is.DISPLAYED)

+    @pytest.mark.asyncio
+    async def test_web_find_applies_timeout_multiplier_and_backoff(self, web_scraper:WebScrapingMixin) -> None:
+        """Ensure multiplier/backoff logic is honored when timeouts occur."""
+        assert web_scraper.config is not None
+        web_scraper.config.timeouts.multiplier = 2.0
+        web_scraper.config.timeouts.retry_enabled = True
+        web_scraper.config.timeouts.retry_max_attempts = 2
+        web_scraper.config.timeouts.retry_backoff_factor = 2.0
+
+        recorded:list[tuple[float, bool]] = []
+
+        async def fake_web_await(condition:Callable[[], object], *, timeout:float, timeout_error_message:str = "",
+                apply_multiplier:bool = True) -> Element:
+            recorded.append((timeout, apply_multiplier))
+            raise TimeoutError(timeout_error_message or "timeout")
+
+        cast(Any, web_scraper).web_await = fake_web_await
+
+        with pytest.raises(TimeoutError):
+            await web_scraper.web_find(By.ID, "test-id", timeout = 0.5)
+
+        assert recorded == [(1.0, False), (2.0, False), (4.0, False)]
+
+
+class TestTimeoutAndRetryHelpers:
+    """Test timeout helper utilities in WebScrapingMixin."""
+
+    def test_get_timeout_config_prefers_config_timeouts(self, web_scraper:WebScrapingMixin) -> None:
+        """_get_timeout_config should return the config-provided timeout model when available."""
+        custom_config = Config.model_validate({
+            "login": {"username": "user@example.com", "password": "secret"}, # noqa: S105
+            "timeouts": {"default": 7.5}
+        })
+        web_scraper.config = custom_config
+
+        assert web_scraper._get_timeout_config() is custom_config.timeouts
+
+    def test_timeout_attempts_respects_retry_switch(self, web_scraper:WebScrapingMixin) -> None:
+        """_timeout_attempts should collapse to a single attempt when retries are disabled."""
+        web_scraper.config.timeouts.retry_enabled = False
+        assert web_scraper._timeout_attempts() == 1
+
+        web_scraper.config.timeouts.retry_enabled = True
+        web_scraper.config.timeouts.retry_max_attempts = 3
+        assert web_scraper._timeout_attempts() == 4
+
+    @pytest.mark.asyncio
+    async def test_run_with_timeout_retries_retries_operation(self, web_scraper:WebScrapingMixin) -> None:
+        """_run_with_timeout_retries should retry when TimeoutError is raised before succeeding."""
+        attempts:list[float] = []
+
+        async def flaky_operation(timeout:float) -> str:
+            attempts.append(timeout)
+            if len(attempts) == 1:
+                raise TimeoutError("first attempt")
+            return "done"
+
+        web_scraper.config.timeouts.retry_max_attempts = 1
+        result = await web_scraper._run_with_timeout_retries(flaky_operation, description = "retry-op")
+
+        assert result == "done"
+        assert len(attempts) == 2
+
+    @pytest.mark.asyncio
+    async def test_run_with_timeout_retries_guard_clause(self, web_scraper:WebScrapingMixin) -> None:
+        """_run_with_timeout_retries should guard against zero-attempt edge cases."""
+        async def never_called(timeout:float) -> None:
+            pytest.fail("operation should not run when attempts are zero")
+
+        with patch.object(web_scraper, "_timeout_attempts", return_value = 0), \
+                pytest.raises(TimeoutError, match = "guarded-op failed without executing operation"):
+            await web_scraper._run_with_timeout_retries(never_called, description = "guarded-op")
+
+
+class TestSelectorTimeoutMessages:
+    """Ensure selector helpers provide informative timeout messages."""
+
+    @pytest.mark.asyncio
+    @pytest.mark.parametrize(
+        ("selector_type", "selector_value", "expected_message"),
+        [
+            (By.TAG_NAME, "section", "No HTML element found of tag <section> within 2.0 seconds."),
+            (By.CSS_SELECTOR, ".hero", "No HTML element found using CSS selector '.hero' within 2.0 seconds."),
+            (By.TEXT, "Submit", "No HTML element found containing text 'Submit' within 2.0 seconds."),
+            (By.XPATH, "//div[@class='hero']", "No HTML element found using XPath '//div[@class='hero']' within 2.0 seconds."),
+        ]
+    )
+    async def test_web_find_timeout_suffixes(
+        self,
+        web_scraper:WebScrapingMixin,
+        selector_type:By,
+        selector_value:str,
+        expected_message:str
+    ) -> None:
+        """web_find should pass descriptive timeout messages for every selector strategy."""
+        mock_element = AsyncMock(spec = Element)
+        mock_wait = AsyncMock(return_value = mock_element)
+        cast(Any, web_scraper).web_await = mock_wait
+
+        result = await web_scraper.web_find(selector_type, selector_value, timeout = 2)
+
+        assert result is mock_element
+        call = mock_wait.await_args_list[0]
+        assert expected_message == call.kwargs["timeout_error_message"]
+        assert call.kwargs["apply_multiplier"] is False
+
+    @pytest.mark.asyncio
+    @pytest.mark.parametrize(
+        ("selector_type", "selector_value", "expected_message"),
+        [
+            (By.CLASS_NAME, "hero", "No HTML elements found with CSS class 'hero' within 1 seconds."),
+            (By.CSS_SELECTOR, ".card", "No HTML elements found using CSS selector '.card' within 1 seconds."),
+            (By.TAG_NAME, "article", "No HTML elements found of tag <article> within 1 seconds."),
+            (By.TEXT, "Listings", "No HTML elements found containing text 'Listings' within 1 seconds."),
+            (By.XPATH, "//footer", "No HTML elements found using XPath '//footer' within 1 seconds."),
+        ]
+    )
+    async def test_web_find_all_once_timeout_suffixes(
+        self,
+        web_scraper:WebScrapingMixin,
+        selector_type:By,
+        selector_value:str,
+        expected_message:str
|
) -> None:
|
||||||
|
"""_web_find_all_once should surface informative timeout errors for each selector."""
|
||||||
|
elements = [AsyncMock(spec = Element)]
|
||||||
|
mock_wait = AsyncMock(return_value = elements)
|
||||||
|
cast(Any, web_scraper).web_await = mock_wait
|
||||||
|
|
||||||
|
result = await web_scraper._web_find_all_once(selector_type, selector_value, 1)
|
||||||
|
|
||||||
|
assert result is elements
|
||||||
|
call = mock_wait.await_args_list[0]
|
||||||
|
assert expected_message == call.kwargs["timeout_error_message"]
|
||||||
|
assert call.kwargs["apply_multiplier"] is False
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_web_find_all_delegates_to_retry_helper(self, web_scraper:WebScrapingMixin) -> None:
|
||||||
|
"""web_find_all should execute via the timeout retry helper."""
|
||||||
|
elements = [AsyncMock(spec = Element)]
|
||||||
|
|
||||||
|
async def fake_retry(operation:Callable[[float], Awaitable[list[Element]]], **kwargs:Any) -> list[Element]:
|
||||||
|
assert kwargs["description"] == "web_find_all(CLASS_NAME, hero)"
|
||||||
|
assert kwargs["override"] == 1.5
|
||||||
|
result = await operation(0.42)
|
||||||
|
return result
|
||||||
|
|
||||||
|
retry_mock = AsyncMock(side_effect = fake_retry)
|
||||||
|
once_mock = AsyncMock(return_value = elements)
|
||||||
|
cast(Any, web_scraper)._run_with_timeout_retries = retry_mock
|
||||||
|
cast(Any, web_scraper)._web_find_all_once = once_mock
|
||||||
|
|
||||||
|
result = await web_scraper.web_find_all(By.CLASS_NAME, "hero", timeout = 1.5)
|
||||||
|
|
||||||
|
assert result is elements
|
||||||
|
retry_call = retry_mock.await_args_list[0]
|
||||||
|
assert retry_call.kwargs["key"] == "default"
|
||||||
|
assert retry_call.kwargs["override"] == 1.5
|
||||||
|
|
||||||
|
once_call = once_mock.await_args_list[0]
|
||||||
|
assert once_call.args[:2] == (By.CLASS_NAME, "hero")
|
||||||
|
assert once_call.args[2] == 0.42
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_web_check_unsupported_attribute(self, web_scraper:WebScrapingMixin, mock_page:TrulyAwaitableMockPage) -> None:
|
||||||
|
"""web_check should raise for unsupported attribute queries."""
|
||||||
|
mock_element = AsyncMock(spec = Element)
|
||||||
|
mock_element.attrs = {}
|
||||||
|
mock_page.query_selector.return_value = mock_element
|
||||||
|
|
||||||
|
with pytest.raises(AssertionError, match = "Unsupported attribute"):
|
||||||
|
await web_scraper.web_check(By.ID, "test-id", cast(Is, object()), timeout = 0.1)
|
||||||
|
|
||||||
|
|
||||||
 class TestWebScrapingSessionManagement:
     """Test session management edge cases in WebScrapingMixin."""
@@ -299,6 +510,39 @@ class TestWebScrapingSessionManagement:
         assert scraper.browser is None
         assert scraper.page is None
+
+
+class TestWebScrolling:
+    """Test scrolling helpers."""
+
+    @pytest.mark.asyncio
+    async def test_web_scroll_page_down_scrolls_and_returns(self, web_scraper:WebScrapingMixin) -> None:
+        """web_scroll_page_down should scroll both directions when requested."""
+        scripts:list[str] = []
+
+        async def exec_side_effect(script:str) -> int | None:
+            scripts.append(script)
+            if script == "document.body.scrollHeight":
+                return 20
+            return None
+
+        cast(Any, web_scraper).web_execute = AsyncMock(side_effect = exec_side_effect)
+
+        with patch("kleinanzeigen_bot.utils.web_scraping_mixin.asyncio.sleep", new_callable = AsyncMock) as mock_sleep:
+            await web_scraper.web_scroll_page_down(scroll_length = 10, scroll_speed = 10, scroll_back_top = True)
+
+        assert scripts[0] == "document.body.scrollHeight"
+        # Expect four scrollTo operations: two down, two up
+        assert scripts.count("document.body.scrollHeight") == 1
+        scroll_calls = [script for script in scripts if script.startswith("window.scrollTo")]
+        assert scroll_calls == [
+            "window.scrollTo(0, 10)",
+            "window.scrollTo(0, 20)",
+            "window.scrollTo(0, 10)",
+            "window.scrollTo(0, 0)"
+        ]
+        sleep_durations = [call.args[0] for call in mock_sleep.await_args_list]
+        assert sleep_durations == [1.0, 1.0, 0.5, 0.5]
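The `TestWebScrolling` addition above fixes the expected behavior precisely: read the page height once, scroll down in fixed steps with a pause between steps, then optionally scroll back up at twice the speed. A dependency-free sketch of a helper satisfying those assertions (the function name and injected `execute_js`/`sleep` parameters are assumptions for testability, not the project's actual signature):

```python
import asyncio
from typing import Any, Awaitable, Callable


async def scroll_page_down(
    execute_js: Callable[[str], Awaitable[Any]],  # injected JS runner, e.g. a page.evaluate wrapper
    *,
    scroll_length: int = 10,
    scroll_speed: int = 10,
    scroll_back_top: bool = False,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,  # injectable so tests avoid real waits
) -> None:
    """Scroll down in fixed steps, optionally scrolling back to the top afterwards."""
    page_height = int(await execute_js("document.body.scrollHeight"))  # total scrollable height
    position = 0
    while position < page_height:  # scroll down step by step
        position = min(position + scroll_length, page_height)
        await execute_js(f"window.scrollTo(0, {position})")
        await sleep(scroll_length / scroll_speed)
    if scroll_back_top:  # scroll back up, pausing half as long per step
        while position > 0:
            position = max(position - scroll_length, 0)
            await execute_js(f"window.scrollTo(0, {position})")
            await sleep(scroll_length / (scroll_speed * 2))
```

With `scroll_length = 10`, `scroll_speed = 10` and a page height of 20, this emits exactly the `window.scrollTo` sequence and the `[1.0, 1.0, 0.5, 0.5]` sleep durations the test above checks for.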
     @pytest.mark.asyncio
     async def test_session_expiration_handling(self, web_scraper:WebScrapingMixin, mock_browser:AsyncMock) -> None:
         """Test handling of expired browser sessions."""
@@ -468,7 +712,7 @@ class TestWebScrapingBrowserConfiguration:
             def add_extension(self, ext:str) -> None:
                 self._extensions.append(ext)  # Use private extensions list
 
-        # Mock nodriver.start to return a mock browser  # type: ignore[attr-defined]
+        # Mock nodriver.start to return a mock browser
         mock_browser = AsyncMock()
         mock_browser.websocket_url = "ws://localhost:9222"
         monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser))
@@ -557,7 +801,7 @@ class TestWebScrapingBrowserConfiguration:
             def add_extension(self, ext:str) -> None:
                 self.extensions.append(ext)
 
-        # Mock nodriver.start to return a mock browser  # type: ignore[attr-defined]
+        # Mock nodriver.start to return a mock browser
         mock_browser = AsyncMock()
         mock_browser.websocket_url = "ws://localhost:9222"
         monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser))
@@ -576,7 +820,7 @@ class TestWebScrapingBrowserConfiguration:
         await scraper.create_browser_session()
 
         # Verify browser arguments
-        config = cast(Mock, nodriver.start).call_args[0][0]  # type: ignore[attr-defined]
+        config = _nodriver_start_mock().call_args[0][0]
         assert "--custom-arg=value" in config.browser_args
         assert "--another-arg" in config.browser_args
         assert "--incognito" in config.browser_args
@@ -589,7 +833,7 @@ class TestWebScrapingBrowserConfiguration:
         await scraper.create_browser_session()
 
         # Verify Edge-specific arguments
-        config = cast(Mock, nodriver.start).call_args[0][0]  # type: ignore[attr-defined]
+        config = _nodriver_start_mock().call_args[0][0]
         assert "-inprivate" in config.browser_args
         assert os.environ.get("MSEDGEDRIVER_TELEMETRY_OPTOUT") == "1"
 
@@ -620,7 +864,7 @@ class TestWebScrapingBrowserConfiguration:
         with zipfile.ZipFile(ext2, "w") as z:
             z.writestr("manifest.json", '{"name": "Test Extension 2"}')
 
-        # Mock nodriver.start to return a mock browser  # type: ignore[attr-defined]
+        # Mock nodriver.start to return a mock browser
         mock_browser = AsyncMock()
         mock_browser.websocket_url = "ws://localhost:9222"
         monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser))
@@ -644,7 +888,7 @@ class TestWebScrapingBrowserConfiguration:
         await scraper.create_browser_session()
 
         # Verify extensions were loaded
-        config = cast(Mock, nodriver.start).call_args[0][0]  # type: ignore[attr-defined]
+        config = _nodriver_start_mock().call_args[0][0]
         assert len(config._extensions) == 2
         for ext_path in config._extensions:
             assert os.path.exists(ext_path)
@@ -713,7 +957,7 @@ class TestWebScrapingBrowserConfiguration:
             def add_extension(self, ext:str) -> None:
                 self._extensions.append(ext)
 
-        # Mock nodriver.start to return a mock browser  # type: ignore[attr-defined]
+        # Mock nodriver.start to return a mock browser
         mock_browser = AsyncMock()
         mock_browser.websocket_url = "ws://localhost:9222"
         monkeypatch.setattr(nodriver, "start", AsyncMock(return_value = mock_browser))
@@ -772,7 +1016,7 @@ class TestWebScrapingBrowserConfiguration:
         temp_file = tmp_path / "temp_resource"
         temp_file.write_text("test")
 
-        # Mock nodriver.start to raise an exception  # type: ignore[attr-defined]
+        # Mock nodriver.start to raise an exception
         async def mock_start_fail(*args:object, **kwargs:object) -> NoReturn:
             if temp_file.exists():
                 temp_file.unlink()
@@ -801,7 +1045,7 @@ class TestWebScrapingBrowserConfiguration:
         assert scraper.browser is None
         assert scraper.page is None
 
-        # Now patch nodriver.start to return a new mock browser each time  # type: ignore[attr-defined]
+        # Now patch nodriver.start to return a new mock browser each time
         mock_browser = make_mock_browser()
         mock_page = TrulyAwaitableMockPage()
         mock_browser.get = AsyncMock(return_value = mock_page)
@@ -1445,6 +1689,46 @@ class TestWebScrapingDiagnostics:
         # Should not raise any exceptions
         web_scraper.diagnose_browser_issues()
 
+    def test_diagnose_browser_issues_handles_per_process_errors(
+        self, scraper_with_config:WebScrapingMixin, caplog:pytest.LogCaptureFixture
+    ) -> None:
+        """diagnose_browser_issues should ignore psutil errors raised per process."""
+        caplog.set_level(logging.INFO)
+
+        class FailingProcess:
+
+            @property
+            def info(self) -> dict[str, object]:
+                raise psutil.AccessDenied(pid = 999)
+
+        with patch("os.path.exists", return_value = True), \
+                patch("os.access", return_value = True), \
+                patch("psutil.process_iter", return_value = [FailingProcess()]), \
+                patch("platform.system", return_value = "Linux"), \
+                patch("kleinanzeigen_bot.utils.web_scraping_mixin._is_admin", return_value = False), \
+                patch.object(scraper_with_config, "_diagnose_chrome_version_issues"):
+            scraper_with_config.browser_config.binary_location = "/usr/bin/chrome"
+            scraper_with_config.diagnose_browser_issues()
+
+        assert "(info) No browser processes currently running" in caplog.text
+
+    def test_diagnose_browser_issues_handles_global_psutil_failure(
+        self, scraper_with_config:WebScrapingMixin, caplog:pytest.LogCaptureFixture
+    ) -> None:
+        """diagnose_browser_issues should log a warning if psutil.process_iter fails entirely."""
+        caplog.set_level(logging.WARNING)
+
+        with patch("os.path.exists", return_value = True), \
+                patch("os.access", return_value = True), \
+                patch("psutil.process_iter", side_effect = psutil.Error("boom")), \
+                patch("platform.system", return_value = "Linux"), \
+                patch("kleinanzeigen_bot.utils.web_scraping_mixin._is_admin", return_value = False), \
+                patch.object(scraper_with_config, "_diagnose_chrome_version_issues"):
+            scraper_with_config.browser_config.binary_location = "/usr/bin/chrome"
+            scraper_with_config.diagnose_browser_issues()
+
+        assert "(warn) Unable to inspect browser processes:" in caplog.text
+
     @pytest.mark.asyncio
     async def test_validate_chrome_version_configuration_port_open_but_api_inaccessible(
         self, web_scraper:WebScrapingMixin
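The two diagnostics tests above distinguish failure scopes: an error while reading a single process (e.g. `psutil.AccessDenied`) must be skipped silently, while a failure of the whole `psutil.process_iter` call warrants a warning and an early exit. A minimal sketch of that guard pattern (the function name and injected `process_iter` are hypothetical; the real code catches psutil's exception hierarchy, which the stand-in exception here only mimics):

```python
import logging
from typing import Any, Callable, Iterable, Optional

logger = logging.getLogger(__name__)


class ProcessInspectionError(Exception):
    """Stand-in for psutil.Error so the sketch stays dependency-free."""


def count_browser_processes(
    process_iter: Callable[[], Iterable[Any]],  # in real code: psutil.process_iter(["name"])
    browser_names: tuple[str, ...] = ("chrome", "chromium", "msedge"),
) -> Optional[int]:
    """Count running browser processes, tolerating per-process access errors."""
    try:
        processes = list(process_iter())
    except ProcessInspectionError as ex:
        # global failure: nothing can be inspected, so warn and give up
        logger.warning("Unable to inspect browser processes: %s", ex)
        return None
    found = 0
    for proc in processes:
        try:
            name = str(proc.info.get("name") or "").lower()
        except ProcessInspectionError:
            continue  # per-process errors (e.g. AccessDenied) are expected; skip this process
        if any(browser in name for browser in browser_names):
            found += 1
    if found == 0:
        logger.info("No browser processes currently running")
    return found
```

This mirrors the asymmetry the tests assert: the per-process case still reaches the "No browser processes currently running" info message, while the global case produces the "Unable to inspect browser processes" warning.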
@@ -41,8 +41,11 @@ class TestWebScrapingMixinChromeVersionValidation:
             # Test validation
             await scraper._validate_chrome_version_configuration()
 
-            # Verify detection was called correctly
-            mock_detect.assert_called_once_with("/path/to/chrome")
+            # Verify detection was called correctly with timeout
+            assert mock_detect.call_count == 1
+            args, kwargs = mock_detect.call_args
+            assert args[0] == "/path/to/chrome"
+            assert kwargs["timeout"] == pytest.approx(10.0)
 
             # Verify validation passed (no exception raised)
             # The validation is now done internally in _validate_chrome_136_configuration
@@ -73,7 +76,10 @@ class TestWebScrapingMixinChromeVersionValidation:
             # Test validation should log error but not raise exception due to error handling
             await scraper._validate_chrome_version_configuration()
 
-            # Verify error was logged
+            # Verify detection call and logged error
+            assert mock_detect.call_count == 1
+            _, kwargs = mock_detect.call_args
+            assert kwargs["timeout"] == pytest.approx(10.0)
             assert "Chrome 136+ configuration validation failed" in caplog.text
             assert "Chrome 136+ requires --user-data-dir" in caplog.text
         finally:
@@ -104,12 +110,37 @@ class TestWebScrapingMixinChromeVersionValidation:
             await scraper._validate_chrome_version_configuration()
 
             # Verify detection was called but no validation
-            mock_detect.assert_called_once_with("/path/to/chrome")
+            assert mock_detect.call_count == 1
+            _, kwargs = mock_detect.call_args
+            assert kwargs["timeout"] == pytest.approx(10.0)
         finally:
             # Restore environment
             if original_env:
                 os.environ["PYTEST_CURRENT_TEST"] = original_env
 
+    @patch("kleinanzeigen_bot.utils.chrome_version_detector.detect_chrome_version_from_binary")
+    @patch("kleinanzeigen_bot.utils.web_scraping_mixin.detect_chrome_version_from_remote_debugging")
+    async def test_validate_chrome_version_logs_remote_detection(
+        self,
+        mock_remote:Mock,
+        mock_binary:Mock,
+        scraper:WebScrapingMixin,
+        caplog:pytest.LogCaptureFixture
+    ) -> None:
+        """When a remote browser responds, the detected version should be logged."""
+        mock_remote.return_value = ChromeVersionInfo("136.0.6778.0", 136, "Chrome")
+        mock_binary.return_value = None
+        scraper.browser_config.arguments = ["--remote-debugging-port=9222"]
+        scraper.browser_config.binary_location = "/path/to/chrome"
+        caplog.set_level("DEBUG")
+
+        with patch.dict(os.environ, {}, clear = True), \
+                patch.object(scraper, "_check_port_with_retry", return_value = True):
+            await scraper._validate_chrome_version_configuration()
+
+        assert "Detected version from existing browser" in caplog.text
+        mock_remote.assert_called_once()
+
     @patch("kleinanzeigen_bot.utils.chrome_version_detector.detect_chrome_version_from_binary")
     async def test_validate_chrome_version_configuration_no_binary_location(
         self, mock_detect:Mock, scraper:WebScrapingMixin
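The `test_validate_chrome_version_logs_remote_detection` addition above relies on detecting the version of an already-running browser through its remote-debugging port. Chromium-based browsers started with `--remote-debugging-port` expose an HTTP endpoint `/json/version` whose `Browser` field carries a string like `Chrome/136.0.6778.0`. A sketch of that approach (function names and the split into a parse helper are assumptions, not the project's actual `chrome_version_detector` API):

```python
import json
import re
import urllib.request
from typing import Optional


def parse_browser_version(browser_field: str) -> Optional[tuple[str, int]]:
    """Parse a DevTools "Browser" field like "Chrome/136.0.6778.0" into (name, major)."""
    match = re.match(r"([A-Za-z]+)/(\d+)", browser_field)
    if not match:
        return None
    return match.group(1), int(match.group(2))


def detect_remote_browser_version(
    host: str = "127.0.0.1", port: int = 9222, timeout: float = 10.0
) -> Optional[tuple[str, int]]:
    """Query a running browser's remote-debugging endpoint for its version."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout = timeout) as response:
            data = json.loads(response.read().decode("utf-8"))
    except OSError:
        return None  # port closed or endpoint unreachable
    return parse_browser_version(data.get("Browser", ""))
```

Keeping the parsing separate from the network call is what makes the logging behavior testable with mocks, as the test above does with `detect_chrome_version_from_remote_debugging`.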
@@ -145,7 +176,9 @@ class TestWebScrapingMixinChromeVersionValidation:
         await scraper._validate_chrome_version_configuration()
 
         # Verify detection was called
-        mock_detect.assert_called_once_with("/path/to/chrome")
+        assert mock_detect.call_count == 1
+        _, kwargs = mock_detect.call_args
+        assert kwargs["timeout"] == pytest.approx(10.0)
 
         # Verify debug log message (line 824)
         assert "Could not detect browser version, skipping validation" in caplog.text
@@ -201,10 +234,13 @@ class TestWebScrapingMixinChromeVersionDiagnostics:
             assert "Chrome 136+ detected - security validation required" in caplog.text
 
             # Verify mocks were called
-            mock_get_diagnostic.assert_called_once_with(
-                binary_path = "/path/to/chrome",
-                remote_port = 9222
-            )
+            assert mock_get_diagnostic.call_count == 1
+            kwargs = mock_get_diagnostic.call_args.kwargs
+            assert kwargs["binary_path"] == "/path/to/chrome"
+            assert kwargs["remote_port"] == 9222
+            assert kwargs["remote_host"] == "127.0.0.1"
+            assert kwargs["remote_timeout"] > 0
+            assert kwargs["binary_timeout"] > 0
         finally:
             # Restore environment
             if original_env:
@@ -364,10 +400,12 @@ class TestWebScrapingMixinChromeVersionDiagnostics:
             assert "Chrome pre-136 detected - no special security requirements" in caplog.text
 
             # Verify that the diagnostic function was called with correct parameters
-            mock_get_diagnostic.assert_called_once_with(
-                binary_path = "/path/to/chrome",
-                remote_port = None
-            )
+            assert mock_get_diagnostic.call_count == 1
+            kwargs = mock_get_diagnostic.call_args.kwargs
+            assert kwargs["binary_path"] == "/path/to/chrome"
+            assert kwargs["remote_port"] is None
+            assert kwargs["remote_timeout"] > 0
+            assert kwargs["binary_timeout"] > 0
         finally:
             # Restore environment
             if original_env: