
47 Commits

Author SHA1 Message Date
Jens
a3ac27c441 feat: add configurable timeouts (#673)
## ℹ️ Description
- Related issues: #671, #658
- Introduces configurable timeout controls plus retry/backoff handling
for flaky DOM operations.

We often see timeouts which are not reproducible in certain
configurations. I suspect the timeout issues depend on a combination of
internet speed, browser, OS, age of the computer, and the weather.

This PR introduces a comprehensive config model to tweak timeouts.

## 📋 Changes Summary
- add TimeoutConfig to the main config/schema and expose timeouts in
README/docs
- wire WebScrapingMixin, extractor, update checker, and browser
diagnostics to honor the configurable timeouts and retries
- update translations/tests to cover the new behaviour and ensure
lint/mypy/pyright pipelines remain green
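
For readers skimming this log, the retry/backoff idea can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `TimeoutSettings`, its field names, and `with_retry` are hypothetical stand-ins, since the actual `TimeoutConfig` fields are not shown here.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TimeoutSettings:
    """Hypothetical stand-in for TimeoutConfig; all field names are assumptions."""
    default: float = 5.0               # seconds allowed per attempt
    retry_max_attempts: int = 2        # extra attempts after the first timeout
    retry_initial_delay: float = 0.5   # pause before the first retry, in seconds
    retry_backoff_factor: float = 2.0  # multiplier applied to the pause after each retry


async def with_retry(operation, settings: TimeoutSettings):
    """Await `operation()` under a timeout, retrying with exponential backoff."""
    delay = settings.retry_initial_delay
    for attempt in range(settings.retry_max_attempts + 1):
        try:
            return await asyncio.wait_for(operation(), timeout=settings.default)
        except asyncio.TimeoutError:
            if attempt == settings.retry_max_attempts:
                raise  # retries exhausted, surface the timeout to the caller
            await asyncio.sleep(delay)
            delay *= settings.retry_backoff_factor
```

A flaky DOM lookup would then be wrapped as `await with_retry(lambda: find_element(...), settings)`, where `find_element` is again only a placeholder name.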

### ⚙️ Type of Change
- [ ] 🐞 Bug fix (non-breaking change which fixes an issue)
- [x] New feature (adds new functionality without breaking existing
usage)
- [ ] 💥 Breaking change (changes that might break existing user setups,
scripts, or configurations)

## Checklist
- [x] I have reviewed my changes to ensure they meet the project's
standards.
- [x] I have tested my changes and ensured that all tests pass (`pdm run
test`).
- [x] I have formatted the code (`pdm run format`).
- [x] I have verified that linting passes (`pdm run lint`).
- [x] I have updated documentation where necessary.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Centralized, configurable timeout system for web interactions,
    detection flows, publishing, and pagination.
  * Optional retry with exponential backoff for operations that time out.

* **Improvements**
  * Replaced fixed wait times with dynamic timeouts throughout workflows.
  * More informative timeout-related messages and diagnostics.

* **Tests**
  * New and expanded test coverage for timeout behavior, pagination,
    diagnostics, and retry logic.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-13 15:08:52 +01:00
Jens
e76abc66e8 fix: harden category extraction breadcrumb parsing (#668)
## ℹ️ Description
- Link to the related issue(s): Issue #667
- Harden breadcrumb category extraction so downloads no longer fail when
the breadcrumb structure changes.

## 📋 Changes Summary
- Parse breadcrumb anchors dynamically and fall back with debug logging
when legacy selectors are needed.
- Added unit coverage for multi-anchor, single-anchor, and fallback
scenarios to keep diff coverage above 80%.
- Documented required lint/format/test steps in PR checklist; no new
dependencies.
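
The dynamic-parse-then-fallback approach described above might look roughly like this. It is a sketch under stated assumptions, not the code from the PR: the function name, the `legacy_hrefs` parameter, the `/c<digits>` href pattern, and the `/`-joined result format are all illustrative guesses.

```python
import logging
import re
from typing import Sequence

log = logging.getLogger(__name__)

# Assumption: a breadcrumb href ends in a segment such as "/c161".
_CATEGORY_SEGMENT = re.compile(r"/c(\d+)$")


def _category_ids(hrefs: Sequence[str]) -> list:
    """Collect the numeric category id from every href that carries one."""
    ids = []
    for href in hrefs:
        match = _CATEGORY_SEGMENT.search(href)
        if match:
            ids.append(match.group(1))
    return ids


def category_from_breadcrumb(hrefs: Sequence[str], legacy_hrefs: Sequence[str] = ()) -> str:
    """Build a category identifier from however many breadcrumb anchors exist."""
    ids = _category_ids(hrefs)
    if not ids:
        # Hypothetical fallback path: reuse anchors found via the old selectors.
        log.debug("No category anchors in breadcrumb, trying legacy selectors")
        ids = _category_ids(legacy_hrefs)
    if not ids:
        raise ValueError("no category id found in breadcrumb")
    return "/".join(ids)
```

Because the ids are joined rather than indexed by position, a single-anchor breadcrumb and a multi-anchor breadcrumb go through the same code path.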

### ⚙️ Type of Change
- [x] 🐞 Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (adds new functionality without breaking existing
usage)
- [ ] 💥 Breaking change (changes that might break existing user setups,
scripts, or configurations)

## Checklist
- [x] I have reviewed my changes to ensure they meet the project's
standards.
- [x] I have tested my changes and ensured that all tests pass (`pdm run
test`).
- [x] I have formatted the code (`pdm run format`).
- [x] I have verified that linting passes (`pdm run lint`).
- [x] I have updated documentation where necessary.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
  * Improved category extraction accuracy with enhanced breadcrumb
    parsing.
  * Better handling for listings with a single breadcrumb (returns stable
    category identifier).
  * More resilient fallback when breadcrumb data is missing or malformed.
  * Safer normalization of category identifiers to avoid incorrect parsing
    across site variations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-10-28 15:10:01 +01:00
Jens
36ca178574 feat: upgrade nodriver from 0.39 to 0.47 (#635)
## ℹ️ Description
Upgrade nodriver dependency from pinned version 0.39.0 to latest 0.47.0
to resolve browser startup issues and JavaScript evaluation problems
that affected versions 0.40-0.44.

- Link to the related issue(s): Resolves nodriver compatibility issues
- This upgrade addresses browser startup problems and window.BelenConf
evaluation failures that were blocking the use of newer nodriver
versions.

## 📋 Changes Summary

- Updated nodriver dependency from pinned 0.39.0 to >=0.47.0 in
pyproject.toml
- Fixed RemoteObject handling in web_execute method for nodriver 0.47
compatibility
- Added comprehensive BelenConf test fixture with real production data
structure
- Added integration test to validate window.BelenConf evaluation works
correctly
- Added German translation for new error message
- Replaced real user data with privacy-safe dummy data in test fixtures

### 🔧 Type Safety Improvements

**Added explicit `str()` conversions to resolve type inference issues:**

The comprehensive BelenConf test fixture contains deeply nested data
structures that caused pyright's type checker to infer complex
dictionary types throughout the codebase. To ensure type safety and
prevent runtime errors, I added explicit `str()` conversions in key
locations:

- **CSRF tokens**: `str(csrf_token)` - Ensures CSRF tokens are treated
as strings
- **Special attributes**: `str(special_attribute_value)` - Converts
special attribute values to strings
- **DOM attributes**: `str(special_attr_elem.attrs.id)` - Ensures
element IDs are strings
- **URL handling**: `str(current_img_url)` and `str(href_attributes)` -
Converts URLs and href attributes to strings
- **Price values**: `str(ad_cfg.price)` - Ensures price values are
strings

These conversions are defensive programming measures that ensure
backward compatibility and prevent type-related runtime errors, even if
the underlying data structures change in the future.

### ⚙️ Type of Change
- [x] New feature (adds new functionality without breaking existing
usage)
- [ ] 🐞 Bug fix (non-breaking change which fixes an issue)
- [ ] 💥 Breaking change (changes that might break existing user setups,
scripts, or configurations)

## Checklist
Before requesting a review, confirm the following:
- [x] I have reviewed my changes to ensure they meet the project's
standards.
- [x] I have tested my changes and ensured that all tests pass (`pdm run
test`).
- [x] I have formatted the code (`pdm run format`).
- [x] I have verified that linting passes (`pdm run lint`).
- [x] I have updated documentation where necessary.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
2025-10-12 21:22:46 +02:00
Jens Bergmann
df24a675a9 fix: resolve #612 FileNotFoundError and improve ad download architecture (#613) 2025-08-17 17:49:00 +02:00
Jens Bergmann
91a40b0116 feat: enhanced folder naming (#599) 2025-08-12 10:43:26 +02:00
Heavenfighter
6b29b9d314 fix: "No HTML element found using CSS selector" during ad download (#594) 2025-08-06 15:15:11 +02:00
Jeppy
15b3698114 fix: dimension92 may not be defined in universalAnalyticsOpts (#555) 2025-06-16 12:46:13 +02:00
Heavenfighter
0305a10eae Refactored category and special attribute (#550) 2025-06-12 14:08:06 +02:00
Heavenfighter
4d48427234 fix: detect payment form and wait for user input (#520)
Co-authored-by: Jens Bergmann <1742418+1cu@users.noreply.github.com>
2025-06-10 15:51:59 +02:00
sebthom
3978d85cb4 fix: ruff PLC0207 missing-maxsplit-arg 2025-06-09 20:58:04 +02:00
Heavenfighter
347c67a388 fixes #512 (#519)
Refactored images extraction. Now directly using galleryimage-elements instead of carousel.
2025-05-25 22:28:20 +02:00
sebthom
85a5cf5224 feat: improve content_hash calculation 2025-05-15 12:07:49 +02:00
sebthom
6ede14596d feat: add type safe Ad model 2025-05-15 12:07:49 +02:00
sebthom
1369da1c34 feat: add type safe Config model 2025-05-15 12:07:49 +02:00
Heavenfighter
0faa022e4d fix: Unable to download single ad (#509) 2025-05-14 11:24:16 +02:00
Benedikt
9bcc669c48 feat: add support for multiple matching shipping options (#483) 2025-04-29 21:02:09 +02:00
sebthom
bda0acf943 refact: enable ruff preview rules 2025-04-28 13:17:23 +02:00
sebthom
ef923a8337 refact: apply consistent formatting 2025-04-28 12:55:28 +02:00
sebthom
376ec76226 refact: use ruff instead of autopep8,bandit,pylint for linting 2025-04-28 12:51:51 +02:00
sebthom
7b0774874e fix: harden extract_ad_id_from_ad_url 2025-04-27 14:23:56 +02:00
marvinkcode
79af6ba861 fix: Correct pagination selectors and logic for issue #477 (#479) 2025-04-21 20:26:02 +02:00
Heavenfighter
20f3f87864 fixes #475 CSS selector 'button' not found
Element button was changed to em.
2025-04-18 13:44:00 +02:00
Jens Bergmann
4051620aed enh: allow per-ad overriding of global description affixes (#416) 2025-02-11 23:39:26 +01:00
Heavenfighter
820ae8966e fix: download all ads not working anymore #420 (#421)
renamed h2 to h3
2025-02-11 12:33:32 -06:00
sebthom
2402ba2572 refact: reorganize utility modules 2025-02-10 06:23:17 +01:00
1cu
f01109c956 feat: add hash-based ad change detection (#343) (#388)
Co-authored-by: sebthom <sebthom@users.noreply.github.com>
2025-01-26 23:37:33 +01:00
Heavenfighter
ca876e628b fix shipping options when downloading. Fixes #375 (#376) 2025-01-10 16:05:11 +01:00
Heavenfighter
f9eb6185c7 fix: failed to set special attributes #334 (#370) 2025-01-09 17:01:48 +01:00
sebthom
9d54a949e7 feat: add multi-language support 2024-12-27 13:04:30 +01:00
sebthom
26f05b5506 fix: category value incomplete when downloading ads 2024-11-25 00:03:48 +01:00
sebthom
a419c48805 refact: remove redundant comments 2024-11-22 12:30:50 +01:00
sebthom
6a315c97ce feat: remove default prefix/suffix text from downloaded ads 2024-11-21 23:28:13 +01:00
sebthom
735e564c76 fix: save location #296 2024-11-21 22:53:49 +01:00
sebthom
86c3aeea85 fix: downloaded images have wrong file extension #348 2024-11-21 22:53:35 +01:00
Saghalt
b9e1f8c327 fix: ValueError when downloading ads without special_attributes (#330) 2024-09-02 20:55:21 +02:00
Jeppy
71eb632191 FIX extract special attributes from ad page
Format of special attribute changed to "key:value|key:value".
Instead of transforming the string to JSON, directly create a dictionary from belen_conf.
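
As a rough illustration of turning the "key:value|key:value" format into a dictionary directly, a minimal sketch could look like this. The function name is hypothetical and the commit's actual implementation may differ.

```python
def parse_special_attributes(raw: str) -> dict:
    """Turn 'key:value|key:value' into a dict, skipping malformed pairs."""
    attributes = {}
    for pair in raw.split("|"):
        # partition splits on the first ':' only, so values may contain colons
        key, separator, value = pair.partition(":")
        if not separator or not key:
            continue  # empty segment or pair without a key/value separator
        attributes[key] = value
    return attributes
```

For example, `parse_special_attributes("color:red|size:42")` yields `{"color": "red", "size": "42"}`; the keys here are made up for the example.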
2024-07-23 11:42:41 +02:00
Saghalt
eab9874bdb fix: special attributes cannot be parsed as JSON #312 2024-06-11 10:55:03 +02:00
Jeppy
b30867ca48 FIX extract sell directly from ad page
Web element with id `j-buy-now` does not exist anymore. Fetch the `payment-buttons-sidebar` instead and check the text for `Direkt kaufen`
2024-05-30 19:26:37 +02:00
Kjell Knudsen
ba73ebb393 fix navigation button selector 2024-05-11 15:49:03 +02:00
Tobias Faber
2c7d165b6e Fix download on given IDs list 2024-04-01 23:03:27 +02:00
Tobias Faber
114afb6a73 fix: download of shipping info. Fixes #282 (#286) 2024-03-29 14:45:21 +01:00
Tobias Faber
db465af9b7 Fix VB Price with thousand separator 2024-03-29 13:39:09 +01:00
SphaeroX
5c8e00df52 fix: No HTML element found with ID 'my-manageads-adlist' (#284) 2024-03-28 19:45:42 +01:00
sebthom
7c982ad502 fix: don't hardcode republication_interval. Fixes #271 2024-03-14 12:51:19 +01:00
Samuel
d7fec9e4ce Fix: Crash on downloading ads with prices >=1000 Eur (#267)
Co-authored-by: Sebastian Thomschke <sebthom@users.noreply.github.com>
2024-03-08 12:06:47 +01:00
sebthom
a441c5de73 replace selenium with nodriver 2024-03-07 20:33:23 +01:00
sebthom
9caa7a7124 use venv 2024-03-04 10:07:47 +01:00