kleinanzeigen-bot

mirror of https://github.com/Second-Hand-Friends/kleinanzeigen-bot.git synced 2026-03-12 02:31:45 +01:00

Author	SHA1	Message	Date
Jens	71028ea844	fix: serialize downloaded ad timestamps as schema-compliant strings (#863) ## ℹ️ Description - Link to the related issue(s): Issue # - Fixes drift where `pdm run app download` wrote timestamp values in YAML-native datetime form that could violate `schemas/ad.schema.json` string expectations. - Ensures downloaded ads persist `created_on`/`updated_on` as JSON-serialized ISO-8601 strings and adds a regression test validating written YAML against the schema. ## 📋 Changes Summary - Updated downloader save path to use `ad_cfg.model_dump(mode = "json")` before writing YAML in `src/kleinanzeigen_bot/extract.py`. - Updated existing `download_ad` unit assertion to match JSON-mode serialization. - Added `test_download_ad_writes_schema_compliant_yaml` in `tests/unit/test_extract.py` that writes a real tmp YAML file and validates it against `schemas/ad.schema.json` with `jsonschema`. - Added dev dependency `jsonschema>=4.26.0` (and lockfile updates). - Dependencies/config updates introduced: new dev dependency (`jsonschema`) for full schema validation in tests. ### ⚙️ Type of Change - [x] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist - [x] I have reviewed my changes to ensure they meet the project's standards. - [x] I have tested my changes and ensured that all tests pass (`pdm run test`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. 
## Summary by CodeRabbit # Release Notes * Bug Fixes * Improved ad data serialization to ensure consistent JSON format when saving ad configurations. * Tests * Added schema validation tests to verify ad YAML output compliance.	2026-03-08 23:10:16 +01:00
Jens	c4a2d1c4f5	fix: continue own-ad extraction when links are incomplete (#854)	2026-03-02 06:05:21 +01:00
Jens	4282b05ff3	fix: add explicit workspace mode resolution for --config (#818)	2026-02-11 05:35:41 +01:00
Jens	a8051c3814	feat: cache published ads data to avoid repetitive API calls during ad download (#809)	2026-02-03 14:51:59 +01:00
Jens	96f465d5bc	fix: JSON API Pagination for >25 Ads (#797) ## ℹ️ Description Provide a concise summary of the changes introduced in this pull request. - Link to the related issue(s): Closes #789 (completes the fix started in #793) - Motivation: Fix JSON API pagination for accounts with >25 ads. Aligns pagination logic with weidi’s approach (starts at page 1), while hardening error handling and tests. Based on https://github.com/weidi/kleinanzeigen-bot/pull/1. ## 📋 Changes Summary - Added pagination helper to fetch all published ads and use it in delete/extend/publish/update flows - Added robust handling for malformed JSON payloads and unexpected ads types (with translated warnings) - Improved sell_directly extraction with pagination, bounds checks, and shared coercion helper - Added/updated tests for pagination and edge cases; updated assertions to pytest.fail style ### ⚙️ Type of Change Select the type(s) of change(s) included in this pull request: - [x] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist Before requesting a review, confirm the following: - [x] I have reviewed my changes to ensure they meet the project's standards. - [x] I have tested my changes and ensured that all tests pass (`pdm run test:cov:unified`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. ## Summary by CodeRabbit * New Features * Reliable multi-page fetching for published ads and buy-now eligibility checks. 
* Bug Fixes * Safer pagination with per-page JSON handling, limits and improved termination diagnostics; ensures pageNum is used when needed. * Tests * New comprehensive pagination tests and updates to existing tests to reflect multi-page behavior. * Chores * Added a utility to safely coerce page numbers; minor utility signature cleanup.	2026-01-31 22:17:37 +01:00
Jens	7098719d5b	fix: extend command fails with >25 ads due to pagination (#793)	2026-01-28 06:08:03 +01:00
Jens	6cc17f869c	fix: keep shipping_type SHIPPING for individual postage (#785)	2026-01-24 15:31:22 +01:00
Jens	eda1b4d0ec	feat: add browser profile XDG support and documentation (#777)	2026-01-23 22:45:22 +01:00
Jens	e8cf10101d	feat: integrate XDG paths into bot core (#776) ## ℹ️ Description Wire XDG path resolution into main bot components. - Link to the related issue(s): N/A (new feature) - Integrates installation mode detection into bot core ## 📋 Changes Summary - Added `finalize_installation_mode()` method for mode detection - UpdateChecker, AdExtractor now respect installation mode - Dynamic browser profile defaults (resolved at runtime) - German translations for installation mode messages - Comprehensive tests for installation mode integration Part 2 of 3 for XDG support - Depends on: PR #775 (must be merged first) - Will rebase on main after merge of previous PR ### ⚙️ Type of Change - [x] ✨ New feature (adds new functionality without breaking existing usage) ## ✅ Checklist - [x] I have reviewed my changes to ensure they meet the project's standards. - [x] I have tested my changes and ensured that all tests pass (`pdm run test`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. ## Summary by CodeRabbit * New Features * Support for portable and XDG (system-wide) installation modes with automatic detection and interactive first-run setup. * Config and paths standardized so app stores config, downloads, logs, and browser profiles in appropriate locations per mode. * Update checker improved for more reliable version/commit detection. * Chores * Moved dependency to runtime: platformdirs added to main dependencies. * Tests * Added comprehensive tests for installation modes, path utilities, and related behaviors.	2026-01-23 07:36:10 +01:00
Jens	183c01078e	fix: correct sell_directly extraction using JSON API (#765)	2026-01-17 16:34:31 +01:00
Heavenfighter	066ecc87b8	fix: take care of changed belen_conf keys (#758) ## ℹ️ Description This PR handles the changed belen_conf dictionary so that extracting special attributes and the third category works again. - Link to the related issue(s): Issue #757 ## 📋 Changes Summary - changed belen_conf keys from "dimension108" to "ad_attributes" and "dimension92" to "l3_category_id" ### ⚙️ Type of Change Select the type(s) of change(s) included in this pull request: - [x] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist Before requesting a review, confirm the following: - [x] I have reviewed my changes to ensure they meet the project's standards. - [x] I have tested my changes and ensured that all tests pass (`pdm run test`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. ## Summary by CodeRabbit * Chores * Updated internal data extraction sources for category and attribute information to align with current analytics configuration. * Updated test suite to reflect configuration changes. Co-authored-by: Jens <1742418+1cu@users.noreply.github.com>	2026-01-08 22:16:46 +01:00
Jens	ba9b14b71b	fix: address codeql notes and warnings (#740)	2025-12-20 18:17:51 +01:00
Jens	0b995fae18	fix: handle Unicode normalization in save_dict for umlauts (#728) (#729)	2025-12-15 20:46:10 +01:00
Jens	220c01f257	fix: eliminate async safety violations and migrate to pathlib (#697) ## ℹ️ Description Eliminate all blocking I/O operations in async contexts and modernize file path handling by migrating from os.path to pathlib.Path. - Link to the related issue(s): #692 - Get rid of the TODO in pyproject.toml - The added debug logging will ease the troubleshooting for path related issues. ## 📋 Changes Summary - Enable ASYNC210, ASYNC230, ASYNC240, ASYNC250 Ruff rules - Wrap blocking urllib.request.urlopen() in run_in_executor - Wrap blocking file operations (open, write) in run_in_executor - Replace blocking os.path calls with async helpers using run_in_executor - Replace blocking input() with await ainput() - Migrate extract.py from os.path to pathlib.Path - Use Path() constructor and / operator for path joining - Use Path.mkdir(), Path.rename() in executor instead of os functions - Create mockable _path_exists() and _path_is_dir() helpers - Add debug logging for all file system operations ### ⚙️ Type of Change Select the type(s) of change(s) included in this pull request: - [X] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist Before requesting a review, confirm the following: - [X] I have reviewed my changes to ensure they meet the project's standards. - [X] I have tested my changes and ensured that all tests pass (`pdm run test`). - [X] I have formatted the code (`pdm run format`). - [X] I have verified that linting passes (`pdm run lint`). - [X] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. 
## Summary by CodeRabbit * Refactor * Made user prompt non‑blocking to improve responsiveness. * Converted filesystem/path handling and prefs I/O to async‑friendly operations; moved blocking network and file work to background tasks. * Added async file/path helpers and async port‑check before browser connections. * Tests * Expanded unit tests for path helpers, image download success/failure, prefs writing, and directory creation/renaming workflows.	2025-12-05 20:53:40 +01:00
Jens	6c2cba50fa	fix: Handle missing dimension108 in special attributes extraction (#706)	2025-12-04 14:01:11 +01:00
Jens	a3ac27c441	feat: add configurable timeouts (#673) ## ℹ️ Description - Related issues: #671, #658 - Introduces configurable timeout controls plus retry/backoff handling for flaky DOM operations. We often see timeouts that are not reproducible in certain configurations. I suspect they stem from a combination of internet speed, browser, OS, age of the computer and the weather. This PR introduces a comprehensive config model to tweak timeouts. ## 📋 Changes Summary - add TimeoutConfig to the main config/schema and expose timeouts in README/docs - wire WebScrapingMixin, extractor, update checker, and browser diagnostics to honor the configurable timeouts and retries - update translations/tests to cover the new behaviour and ensure lint/mypy/pyright pipelines remain green ### ⚙️ Type of Change - [ ] 🐞 Bug fix (non-breaking change which fixes an issue) - [x] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist - [x] I have reviewed my changes to ensure they meet the project's standards. - [x] I have tested my changes and ensured that all tests pass (`pdm run test`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. ## Summary by CodeRabbit * New Features * Centralized, configurable timeout system for web interactions, detection flows, publishing, and pagination. * Optional retry with exponential backoff for operations that time out. * Improvements * Replaced fixed wait times with dynamic timeouts throughout workflows. * More informative timeout-related messages and diagnostics. * Tests * New and expanded test coverage for timeout behavior, pagination, diagnostics, and retry logic.	2025-11-13 15:08:52 +01:00
Jens	e76abc66e8	fix: harden category extraction breadcrumb parsing (#668) ## ℹ️ Description - Link to the related issue(s): Issue #667 - Harden breadcrumb category extraction so downloads no longer fail when the breadcrumb structure changes. ## 📋 Changes Summary - Parse breadcrumb anchors dynamically and fall back with debug logging when legacy selectors are needed. - Added unit coverage for multi-anchor, single-anchor, and fallback scenarios to keep diff coverage above 80%. - Documented required lint/format/test steps in PR checklist; no new dependencies. ### ⚙️ Type of Change - [x] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist - [x] I have reviewed my changes to ensure they meet the project's standards. - [x] I have tested my changes and ensured that all tests pass (`pdm run test`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. ## Summary by CodeRabbit * Bug Fixes * Improved category extraction accuracy with enhanced breadcrumb parsing. * Better handling for listings with a single breadcrumb (returns stable category identifier). * More resilient fallback when breadcrumb data is missing or malformed. * Safer normalization of category identifiers to avoid incorrect parsing across site variations.	2025-10-28 15:10:01 +01:00
Jens	7b4b7907d0	feat: cleanup test structure and remove BelenConf testing (#639)	2025-10-14 09:50:50 +02:00
Jens	36ca178574	feat: upgrade nodriver from 0.39 to 0.47 (#635) ## ℹ️ Description Upgrade nodriver dependency from pinned version 0.39.0 to latest 0.47.0 to resolve browser startup issues and JavaScript evaluation problems that affected versions 0.40-0.44. - Link to the related issue(s): Resolves nodriver compatibility issues - This upgrade addresses browser startup problems and window.BelenConf evaluation failures that were blocking the use of newer nodriver versions. ## 📋 Changes Summary - Updated nodriver dependency from pinned 0.39.0 to >=0.47.0 in pyproject.toml - Fixed RemoteObject handling in web_execute method for nodriver 0.47 compatibility - Added comprehensive BelenConf test fixture with real production data structure - Added integration test to validate window.BelenConf evaluation works correctly - Added German translation for new error message - Replaced real user data with privacy-safe dummy data in test fixtures ### 🔧 Type Safety Improvements Added explicit `str()` conversions to resolve type inference issues: The comprehensive BelenConf test fixture contains deeply nested data structures that caused pyright's type checker to infer complex dictionary types throughout the codebase. To ensure type safety and prevent runtime errors, I added explicit `str()` conversions in key locations: - CSRF tokens: `str(csrf_token)` - Ensures CSRF tokens are treated as strings - Special attributes: `str(special_attribute_value)` - Converts special attribute values to strings - DOM attributes: `str(special_attr_elem.attrs.id)` - Ensures element IDs are strings - URL handling: `str(current_img_url)` and `str(href_attributes)` - Converts URLs and href attributes to strings - Price values: `str(ad_cfg.price)` - Ensures price values are strings These conversions are defensive programming measures that ensure backward compatibility and prevent type-related runtime errors, even if the underlying data structures change in the future. 
### ⚙️ Type of Change - [x] ✨ New feature (adds new functionality without breaking existing usage) - [ ] 🐞 Bug fix (non-breaking change which fixes an issue) - [ ] 💥 Breaking change (changes that might break existing user setups, scripts, or configurations) ## ✅ Checklist Before requesting a review, confirm the following: - [x] I have reviewed my changes to ensure they meet the project's standards. - [x] I have tested my changes and ensured that all tests pass (`pdm run test`). - [x] I have formatted the code (`pdm run format`). - [x] I have verified that linting passes (`pdm run lint`). - [x] I have updated documentation where necessary. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.	2025-10-12 21:22:46 +02:00
Jens Bergmann	df24a675a9	fix: resolve #612 FileNotFoundError and improve ad download architecture (#613)	2025-08-17 17:49:00 +02:00
Jens Bergmann	91a40b0116	feat: enhanced folder naming (#599)	2025-08-12 10:43:26 +02:00
Heavenfighter	6b29b9d314	fix: "No HTML element found using CSS selector" during ad download (#594)	2025-08-06 15:15:11 +02:00
Jens Bergmann	c3499b3824	feat: add version to banner (#560)	2025-06-22 21:11:13 +02:00
Heavenfighter	0305a10eae	Refactored category and special attribute (#550)	2025-06-12 14:08:06 +02:00
sebthom	6ede14596d	feat: add type safe Ad model	2025-05-15 12:07:49 +02:00
sebthom	1369da1c34	feat: add type safe Config model	2025-05-15 12:07:49 +02:00
Heavenfighter	0faa022e4d	fix: Unable to download single ad (#509)	2025-05-14 11:24:16 +02:00
Benedikt	8b2d61b1d4	fix: improve login detection with fallback element (#493) - Add fallback check for user-email element when mr-medium is not found - Improve login detection reliability - Add test case for alternative login element	2025-04-30 17:50:58 +02:00
Benedikt	9bcc669c48	feat: add support for multiple matching shipping options (#483)	2025-04-29 21:02:09 +02:00
sebthom	bda0acf943	refact: enable ruff preview rules	2025-04-28 13:17:23 +02:00
sebthom	ef923a8337	refact: apply consistent formatting	2025-04-28 12:55:28 +02:00
sebthom	376ec76226	refact: use ruff instead of autopep8,bandit,pylint for linting	2025-04-28 12:51:51 +02:00
marvinkcode	79af6ba861	fix: Correct pagination selectors and logic for issue #477 (#479)	2025-04-21 20:26:02 +02:00
Jens Bergmann	4051620aed	enh: allow per-ad overriding of global description affixes (#416)	2025-02-11 23:39:26 +01:00
sebthom	2402ba2572	refact: reorganize utility modules	2025-02-10 06:23:17 +01:00
Jens Bergmann	affde0debf	test: Enhance test coverage for KleinanzeigenBot initialization and core functionality (#408)	2025-02-09 03:33:01 +01:00
1cu	f4f00b9563	test: Add comprehensive test suite for extract.py (#400)	2025-02-05 23:35:45 +01:00

37 Commits
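Commit 71028ea844 (#863) switches the download path to `ad_cfg.model_dump(mode = "json")`, so `created_on`/`updated_on` reach the YAML file as ISO-8601 strings rather than YAML-native timestamps. The following is a minimal stdlib sketch of the same effect. It assumes a plain `dict` instead of the project's pydantic `Ad` model, and the helper name `to_json_compatible` and the sample ad are hypothetical. Note that pydantic's JSON mode may format datetimes slightly differently (for example `Z` instead of `+00:00` for UTC).

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def to_json_compatible(ad: Dict[str, Any]) -> Dict[str, Any]:
    """Round-trip through JSON so datetimes become ISO-8601 strings.

    Hypothetical stand-in for pydantic's model_dump(mode="json") on a plain dict.
    """

    def encode(value: Any) -> str:
        # json.dumps calls this only for objects it cannot serialize natively.
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"not JSON serializable: {type(value).__name__}")

    return json.loads(json.dumps(ad, default = encode))


ad = {
    "title": "Fahrrad",
    "created_on": datetime(2026, 3, 8, 23, 10, 16, tzinfo = timezone.utc),
    "updated_on": None,
}
print(to_json_compatible(ad)["created_on"])  # 2026-03-08T23:10:16+00:00
```

Because every value is a string, number, list, dict, or `None` after the round trip, a YAML writer emits plain scalars that a JSON Schema `"type": "string"` check can accept.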
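Commit 96f465d5bc (#797) describes a helper that fetches all published ads page by page starting at page 1. The helper also handles malformed payloads, caps the number of pages, and coerces page numbers safely. The loop can be sketched as follows. The payload shape (an `ads` list plus a `paging.last` field), the function names, and the `max_pages` default are assumptions for illustration. They do not describe the real Kleinanzeigen API or the project's helper.

```python
from typing import Any, Callable, Dict, List, Optional

Payload = Dict[str, Any]


def coerce_page_number(value: Any) -> Optional[int]:
    """Return value as a page number >= 1, or None if it is missing or invalid."""
    try:
        number = int(value)
    except (TypeError, ValueError, OverflowError):
        return None
    return number if number >= 1 else None


def fetch_all_ads(fetch_page: Callable[[int], Payload], max_pages: int = 100) -> List[Dict[str, Any]]:
    """Collect ads from every page, starting at page 1 (hypothetical payload shape)."""
    all_ads: List[Dict[str, Any]] = []
    page_num = 1
    while page_num <= max_pages:
        payload = fetch_page(page_num)
        ads = payload.get("ads")
        if not isinstance(ads, list):
            # Malformed page: stop instead of looping on bad data.
            print(f"WARNING: unexpected 'ads' type on page {page_num}: {type(ads).__name__}")
            break
        all_ads.extend(ads)
        paging = payload.get("paging")
        last_page = coerce_page_number(paging.get("last") if isinstance(paging, dict) else None)
        if last_page is None or page_num >= last_page:
            break  # no further pages advertised
        page_num += 1
    else:
        # The loop condition became false without a break, so the safety cap was hit.
        print(f"WARNING: stopped after {max_pages} pages")
    return all_ads


def fake_fetch(page_num: int) -> Payload:
    """Simulate 60 ads served 25 per page (3 pages)."""
    start, stop = (page_num - 1) * 25, min(page_num * 25, 60)
    return {"ads": [{"id": i} for i in range(start, stop)], "paging": {"pageNum": page_num, "last": 3}}


print(len(fetch_all_ads(fake_fetch)))  # 60
```

Injecting `fetch_page` keeps the loop testable without a browser. In a real setup, it would wrap whatever request returns one page of JSON.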
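Commit 0b995fae18 (#729) handles Unicode normalization in `save_dict` so that umlauts are saved correctly. The commit title does not say how. A common approach, assumed here, is to NFC-normalize every string before writing. This turns a decomposed `o` followed by a combining diaeresis (U+0308) into the single code point `ö`. `normalize_strings` is a hypothetical helper, not the project's code.

```python
import unicodedata
from typing import Any


def normalize_strings(value: Any, form: str = "NFC") -> Any:
    """Recursively normalize every str (keys included) in nested dicts and lists."""
    if isinstance(value, str):
        return unicodedata.normalize(form, value)
    if isinstance(value, dict):
        return {normalize_strings(k, form): normalize_strings(v, form) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_strings(item, form) for item in value]
    return value  # numbers, None, etc. pass through unchanged


ad = {"title": "Mo\u0308bel", "tags": ["Ka\u0308se", "Gr\u00f6\u00dfe"], "price": 25}
clean = normalize_strings(ad)
print(clean["title"] == "M\u00f6bel", len(ad["title"]), len(clean["title"]))  # True 6 5
```

Both spellings of "Möbel" render identically. However, they compare unequal and have different lengths until they are normalized.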
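Commit 220c01f257 (#697) moves blocking file-system calls off the event loop with `run_in_executor` and introduces mockable path helpers. The sketch below shows that pattern with `asyncio.get_running_loop().run_in_executor(None, ...)` and `pathlib.Path`. The function names and the `downloaded-ads/ad_12345` layout are hypothetical stand-ins, not the project's `_path_exists()`/`_path_is_dir()` helpers.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Tuple


async def path_exists_async(path: Path) -> bool:
    """Run the blocking Path.exists() in the default thread pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, path.exists)


async def path_is_dir_async(path: Path) -> bool:
    """Run the blocking Path.is_dir() in the default thread pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, path.is_dir)


async def mkdir_async(path: Path) -> None:
    """Create a directory (and parents) without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional args, so wrap the keyword call in a lambda.
    await loop.run_in_executor(None, lambda: path.mkdir(parents = True, exist_ok = True))


async def demo(base: Path) -> Tuple[bool, bool]:
    target = base / "downloaded-ads" / "ad_12345"  # hypothetical layout
    await mkdir_async(target)
    return await path_exists_async(target), await path_is_dir_async(target)


with tempfile.TemporaryDirectory() as tmp:
    print(asyncio.run(demo(Path(tmp))))  # (True, True)
```

Keeping each blocking call behind a small async function makes it easy to patch in unit tests. This matches the commit's stated goal of mockable helpers.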
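Commit a3ac27c441 (#673) adds configurable timeouts with optional retry and exponential backoff for operations that time out. The sketch below shows one common way to express that with `asyncio.wait_for`. The parameter names (`timeout`, `retries`, `base_delay`) and the doubling delay are assumptions, not the project's `TimeoutConfig` fields or its retry implementation.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")


async def retry_on_timeout(
    operation: Callable[[], Awaitable[T]],
    timeout: float,
    retries: int = 2,
    base_delay: float = 0.5,
) -> T:
    """Await operation() with a timeout; on timeout, back off exponentially and retry."""
    for attempt in range(retries + 1):
        try:
            # A fresh awaitable is created on every attempt.
            return await asyncio.wait_for(operation(), timeout = timeout)
        except asyncio.TimeoutError:
            if attempt == retries:
                raise  # retries exhausted: surface the timeout to the caller
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


async def demo() -> Tuple[str, int]:
    calls = 0

    async def flaky_lookup() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            await asyncio.sleep(1)  # simulate a DOM query that hangs
        return "found"

    result = await retry_on_timeout(flaky_lookup, timeout = 0.05, retries = 2, base_delay = 0.01)
    return result, calls


print(asyncio.run(demo()))  # ('found', 3)
```

The timeout bounds each attempt, while the backoff spaces out retries. Making both configurable lets slow machines or connections trade latency for reliability.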