mirror of
https://github.com/Second-Hand-Friends/kleinanzeigen-bot.git
synced 2026-03-12 10:31:50 +01:00
fix: use native page xpath api for xpath selectors (#853)
## ℹ️ Description

*Provide a concise summary of the changes introduced in this pull request.*

- Link to the related issue(s): n/a
- Describe the motivation and context for this change.

This replaces the stacked XPath work from #845 with a standalone fix from `main`. It makes `By.XPATH` use the native page XPath API instead of routing XPath selectors through text lookup.

## 📋 Changes Summary

- Add private XPath helpers in `WebScrapingMixin` for first-match and all-match lookups.
- Route `By.XPATH` in `_web_find_once()` and `_web_find_all_once()` through `page.xpath(...)`.
- Add unit coverage for XPath helper behavior, empty results, and unsupported parent scoping.
- No configuration changes or new dependencies.

### ⚙️ Type of Change

Select the type(s) of change(s) included in this pull request:

- [x] 🐞 Bug fix (non-breaking change which fixes an issue)
- [ ] ✨ New feature (adds new functionality without breaking existing usage)
- [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations)

## ✅ Checklist

Before requesting a review, confirm the following:

- [x] I have reviewed my changes to ensure they meet the project's standards.
- [x] I have tested my changes and ensured that all tests pass (`pdm run test`).
- [x] I have formatted the code (`pdm run format`).
- [x] I have verified that linting passes (`pdm run lint`).
- [x] I have updated documentation where necessary.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Refactoring**
  * Improved web scraping element selection reliability through streamlined XPath operations and better internal helper methods.
* **Tests**
  * Added comprehensive unit tests for XPath-based element lookup operations to ensure consistent behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
```diff
@@ -1009,6 +1009,17 @@ class WebScrapingMixin:
 			# Return primitive values as-is
 			return data
 
+	async def _xpath_first(self, selector_value:str) -> Element | None:
+		matches = await self.page.xpath(selector_value, timeout = 0)
+		for match in matches:
+			if match is not None:
+				return cast(Element, match)
+		return None
+
+	async def _xpath_all(self, selector_value:str) -> list[Element]:
+		matches = await self.page.xpath(selector_value, timeout = 0)
+		return [cast(Element, match) for match in matches if match is not None]
+
 	async def web_find(self, selector_type:By, selector_value:str, *, parent:Element | None = None, timeout:int | float | None = None) -> Element:
 		"""
 		Locates an HTML element by the given selector type and value.
@@ -1084,7 +1095,7 @@ class WebScrapingMixin:
 			case By.XPATH:
 				ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
 				return await self.web_await(
-					lambda: self.page.find_element_by_text(selector_value, best_match = True),
+					lambda: self._xpath_first(selector_value),
 					timeout = timeout,
 					timeout_error_message = f"No HTML element found using XPath '{selector_value}'{timeout_suffix}",
 					apply_multiplier = False,
@@ -1129,7 +1140,7 @@ class WebScrapingMixin:
 			case By.XPATH:
 				ensure(not parent, f"Specifying a parent element currently not supported with selector type: {selector_type}")
 				return await self.web_await(
-					lambda: self.page.find_elements_by_text(selector_value),
+					lambda: self._xpath_all(selector_value),
 					timeout = timeout,
 					timeout_error_message = f"No HTML elements found using XPath '{selector_value}'{timeout_suffix}",
 					apply_multiplier = False,
```