mirror of https://github.com/Second-Hand-Friends/kleinanzeigen-bot.git (synced 2026-03-12 02:31:45 +01:00)
fix: harden category extraction breadcrumb parsing (#668)
## ℹ️ Description

- Link to the related issue(s): Issue #667
- Harden breadcrumb category extraction so downloads no longer fail when the breadcrumb structure changes.

## 📋 Changes Summary

- Parse breadcrumb anchors dynamically and fall back with debug logging when legacy selectors are needed.
- Added unit coverage for multi-anchor, single-anchor, and fallback scenarios to keep diff coverage above 80%.
- Documented required lint/format/test steps in PR checklist; no new dependencies.

### ⚙️ Type of Change

- [x] 🐞 Bug fix (non-breaking change which fixes an issue)
- [ ] ✨ New feature (adds new functionality without breaking existing usage)
- [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations)

## ✅ Checklist

- [x] I have reviewed my changes to ensure they meet the project's standards.
- [x] I have tested my changes and ensured that all tests pass (`pdm run test`).
- [x] I have formatted the code (`pdm run format`).
- [x] I have verified that linting passes (`pdm run lint`).
- [x] I have updated documentation where necessary.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved category extraction accuracy with enhanced breadcrumb parsing.
  * Better handling for listings with a single breadcrumb (returns stable category identifier).
  * More resilient fallback when breadcrumb data is missing or malformed.
  * Safer normalization of category identifiers to avoid incorrect parsing across site variations.
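The parsing strategy described above (collect breadcrumb anchors dynamically, then fall back to legacy selectors with a debug log) might look roughly like the following. This is a minimal, standard-library sketch and not the code from this commit: the class `BreadcrumbAnchors`, the function `extract_category`, the `/c<digits>` href pattern, the positional legacy rule, and the single-anchor return value are all assumptions made for illustration. Only the container id `vap-brdcrmb` and the wording of the fallback log message come from the translation entries in this change.

```python
import logging
import re
from html.parser import HTMLParser
from typing import List, Optional

LOG = logging.getLogger(__name__)

# Assumption: category links contain a path segment such as "c216".
_CATEGORY_ID = re.compile(r"/c(\d+)(?:[/?#]|$)")
_VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class BreadcrumbAnchors(HTMLParser):
    """Collects href values of <a> tags nested inside the breadcrumb container."""

    def __init__(self, container_id="vap-brdcrmb"):
        super().__init__()
        self.container_id = container_id
        self.depth = 0      # > 0 while the parser is inside the container
        self.found = False  # True once the container start tag was seen
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in _VOID_TAGS:
            return  # void elements never get an end tag, so do not track them
        attr_map = dict(attrs)
        if self.depth:
            self.depth += 1
            if tag == "a" and attr_map.get("href"):
                self.hrefs.append(attr_map["href"])
        elif attr_map.get("id") == self.container_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in _VOID_TAGS:
            self.depth -= 1


def _legacy_ids(hrefs: List[str]) -> List[str]:
    # Hypothetical legacy rule: 2nd and 3rd anchor, last path segment
    # with its first character dropped; keep only purely numeric results.
    candidates = [h.rstrip("/").split("/")[-1][1:] for h in hrefs[1:3]]
    return [c for c in candidates if c.isdigit()]


def extract_category(html: str) -> Optional[str]:
    """Returns 'parent/child', a single id, or None if nothing usable is found."""
    parser = BreadcrumbAnchors()
    parser.feed(html)
    if not parser.found:
        LOG.debug("Breadcrumb container %r not found", parser.container_id)
        return None

    # Dynamic pass: take every anchor that carries a category id.
    ids = [m.group(1) for h in parser.hrefs if (m := _CATEGORY_ID.search(h))]
    if not ids:
        LOG.debug("Falling back to legacy breadcrumb selectors; collected ids: %s", ids)
        ids = _legacy_ids(parser.hrefs)

    if len(ids) >= 2:
        return f"{ids[-2]}/{ids[-1]}"
    if len(ids) == 1:
        return ids[0]  # assumption: a lone breadcrumb yields just its own id
    return None
```

Because the sketch matches ids by pattern instead of by anchor position, an extra or missing breadcrumb level changes only how many ids are collected, not whether parsing succeeds. The positional rule is consulted only when the pattern finds nothing.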
@@ -216,6 +216,10 @@ kleinanzeigen_bot/extract.py:
   _extract_contact_from_ad_page:
     "No street given in the contact.": "Keine Straße in den Kontaktdaten angegeben."
 
+  _extract_category_from_ad_page:
+    "Breadcrumb container 'vap-brdcrmb' not found; cannot extract ad category: %s": "Breadcrumb-Container 'vap-brdcrmb' nicht gefunden; kann Anzeigenkategorie nicht extrahieren: %s"
+    "Falling back to legacy breadcrumb selectors; collected ids: %s": "Weiche auf ältere Breadcrumb-Selektoren aus; gesammelte IDs: %s"
+
 #################################################
 kleinanzeigen_bot/utils/i18n.py:
 #################################################