Files
kleinanzeigen-bot/docs/CONFIGURATION.md

411 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Configuration Reference
Complete reference for `config.yaml`, the main configuration file for kleinanzeigen-bot.
## Quick Start
To generate a default configuration file with all current defaults:
```bash
kleinanzeigen-bot create-config
```
For full JSON schema with IDE autocompletion support, see:
- [schemas/config.schema.json](https://raw.githubusercontent.com/Second-Hand-Friends/kleinanzeigen-bot/main/schemas/config.schema.json)
To enable IDE autocompletion in `config.yaml`, add this at the top of the file:
```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/Second-Hand-Friends/kleinanzeigen-bot/main/schemas/config.schema.json
```
For ad files, use the ad schema instead:
```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/Second-Hand-Friends/kleinanzeigen-bot/main/schemas/ad.schema.json
```
## Minimal Configuration Example
Here's the smallest viable `config.yaml` to get started. Only the **login** section is required—everything else uses sensible defaults:
```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/Second-Hand-Friends/kleinanzeigen-bot/main/schemas/config.schema.json
# REQUIRED: Your kleinanzeigen.de credentials
login:
username: "your_username"
password: "your_password"
# OPTIONAL: Where to find your ad files (default pattern shown)
# ad_files:
# - "./**/ad_*.{json,yml,yaml}"
# OPTIONAL: Default values for all ads
# ad_defaults:
# price_type: NEGOTIABLE
# shipping_type: SHIPPING
# republication_interval: 7
```
Run `kleinanzeigen-bot create-config` to generate a complete configuration with all available options and their default values.
The `ad_files` setting controls where the bot looks for your ad YAML files (default pattern: `./**/ad_*.{json,yml,yaml}`). The `ad_defaults` section lets you set default values that apply to all ads—things like price type, shipping options, and republication interval.
📖 **[Complete Ad Configuration Reference →](AD_CONFIGURATION.md)**
Full documentation for ad YAML files including automatic price reduction, description prefix/suffix, shipping options, category IDs, and special attributes.
## File Location
The bot looks for `config.yaml` in the current directory by default. You can specify a different location using `--config`:
```bash
kleinanzeigen-bot --config /path/to/config.yaml publish
```
`--config` selects the configuration file only. Workspace behavior is controlled by installation mode (`portable` or `xdg`) and can be overridden via `--workspace-mode=portable|xdg` (see [Installation Modes](#installation-modes)).
Valid file extensions: `.json`, `.yaml`, `.yml`
## Configuration Structure
### ad_files
Glob (wildcard) patterns to select ad configuration files. If relative paths are specified, they are relative to this configuration file.
```yaml
ad_files:
- "./**/ad_*.{json,yml,yaml}"
```
### ad_defaults
Default values for ads that can be overridden in each ad configuration file.
```yaml
ad_defaults:
active: true
type: OFFER # one of: OFFER, WANTED
description_prefix: ""
description_suffix: ""
price_type: NEGOTIABLE # one of: FIXED, NEGOTIABLE, GIVE_AWAY, NOT_APPLICABLE
shipping_type: SHIPPING # one of: PICKUP, SHIPPING, NOT_APPLICABLE
# NOTE: shipping_costs and shipping_options must be configured per-ad, not as defaults
sell_directly: false # requires shipping_type SHIPPING to take effect
contact:
name: ""
street: ""
zipcode: ""
phone: "" # IMPORTANT: surround phone number with quotes to prevent removal of leading zeros
republication_interval: 7 # every X days ads should be re-published
```
> **Tip:** For current defaults of all timeout and diagnostic settings, run `kleinanzeigen-bot create-config` or see the [JSON schema](https://raw.githubusercontent.com/Second-Hand-Friends/kleinanzeigen-bot/main/schemas/config.schema.json).
### categories
Additional name to category ID mappings. See the default list at:
[https://github.com/Second-Hand-Friends/kleinanzeigen-bot/blob/main/src/kleinanzeigen_bot/resources/categories.yaml](https://github.com/Second-Hand-Friends/kleinanzeigen-bot/blob/main/src/kleinanzeigen_bot/resources/categories.yaml)
```yaml
categories:
Verschenken & Tauschen > Tauschen: 272/273
Verschenken & Tauschen > Verleihen: 272/274
Verschenken & Tauschen > Verschenken: 272/192
```
### timeouts
Timeout tuning for various browser operations. Adjust these if you experience slow page loads or recurring timeouts.
```yaml
timeouts:
multiplier: 1.0 # Scale all timeouts (e.g. 2.0 for slower networks)
default: 5.0 # Base timeout for web_find/web_click/etc.
page_load: 15.0 # Timeout for web_open page loads
captcha_detection: 2.0 # Timeout for captcha iframe detection
sms_verification: 4.0 # Timeout for SMS verification banners
email_verification: 4.0 # Timeout for email verification prompts
gdpr_prompt: 10.0 # Timeout when handling GDPR dialogs
login_detection: 10.0 # Timeout for DOM-based login detection (primary method)
publishing_result: 300.0 # Timeout for publishing status checks
publishing_confirmation: 20.0 # Timeout for publish confirmation redirect
image_upload: 30.0 # Timeout for image upload and server-side processing
pagination_initial: 10.0 # Timeout for first pagination lookup
pagination_follow_up: 5.0 # Timeout for subsequent pagination clicks
quick_dom: 2.0 # Generic short DOM timeout (shipping dialogs, etc.)
update_check: 10.0 # Timeout for GitHub update requests
chrome_remote_probe: 2.0 # Timeout for local remote-debugging probes
chrome_remote_debugging: 5.0 # Timeout for remote debugging API calls
chrome_binary_detection: 10.0 # Timeout for chrome --version subprocess
retry_enabled: true # Enables DOM retry/backoff when timeouts occur
retry_max_attempts: 2
retry_backoff_factor: 1.5
```
**Timeout tuning tips:**
- Slow networks or sluggish remote browsers often just need a higher `timeouts.multiplier`
- For truly problematic selectors, override specific keys directly under `timeouts`
- Keep `retry_enabled` on so DOM lookups are retried with exponential backoff
For more details on timeout configuration and troubleshooting, see [Browser Troubleshooting](./BROWSER_TROUBLESHOOTING.md).
### download
Download configuration for the `download` command.
```yaml
download:
include_all_matching_shipping_options: false # if true, all shipping options matching the package size will be included
excluded_shipping_options: [] # list of shipping options to exclude, e.g. ['DHL_2', 'DHL_5']
folder_name_max_length: 100 # maximum length for folder names when downloading ads (default: 100)
rename_existing_folders: false # if true, rename existing folders without titles to include titles (default: false)
```
### publishing
Publishing configuration.
```yaml
publishing:
delete_old_ads: "AFTER_PUBLISH" # one of: AFTER_PUBLISH, BEFORE_PUBLISH, NEVER
delete_old_ads_by_title: true # only works if delete_old_ads is set to BEFORE_PUBLISH
```
### captcha
Captcha handling configuration. Enable automatic restart to avoid manual confirmation after captchas.
```yaml
captcha:
auto_restart: true # If true, the bot aborts when a Captcha appears and retries publishing later
# If false (default), the Captcha must be solved manually to continue
restart_delay: 1h 30m # Time to wait before retrying after a Captcha was encountered (default: 6h)
```
### browser
Browser configuration. These settings control how the bot launches and connects to Chromium-based browsers.
```yaml
browser:
# See: https://peter.sh/experiments/chromium-command-line-switches/
arguments:
# Example arguments
- --disable-dev-shm-usage
- --no-sandbox
# --headless
# --start-maximized
binary_location: # path to custom browser executable, if not specified will be looked up on PATH
extensions: [] # a list of .crx extension files to be loaded
use_private_window: true
user_data_dir: "" # see https://github.com/chromium/chromium/blob/main/docs/user_data_dir.md
profile_name: ""
```
**Common browser arguments:**
- `--disable-dev-shm-usage` - Avoids shared memory issues in Docker environments
- `--no-sandbox` - Required when running as root (not recommended)
- `--headless` - Run browser in headless mode (no GUI)
- `--start-maximized` - Start browser maximized
For detailed browser connection troubleshooting, including Chrome 136+ security requirements and remote debugging setup, see [Browser Troubleshooting](./BROWSER_TROUBLESHOOTING.md).
### update_check
Update check configuration to automatically check for newer versions on GitHub.
```yaml
update_check:
enabled: true # Enable/disable update checks
channel: latest # One of: latest, preview
interval: 7d # Check interval (e.g. 7d for 7 days)
```
**Interval format:**
- `s`: seconds, `m`: minutes, `h`: hours, `d`: days
- Examples: `7d` (7 days), `12h` (12 hours), `30d` (30 days)
- Validation: minimum 1 day, maximum 30 days
**Channels:**
- `latest`: Only final releases
- `preview`: Includes pre-releases
### login
Login credentials.
```yaml
login:
username: ""
password: ""
```
> **Security Note:** Never commit your credentials to version control. Keep your `config.yaml` secure and exclude it from git if it contains sensitive information.
### diagnostics
Diagnostics configuration for troubleshooting login detection issues and publish failures.
```yaml
diagnostics:
capture_on:
login_detection: false # Capture screenshot + HTML when login state is UNKNOWN
publish: false # Capture screenshot + HTML + JSON on each failed publish attempt (timeouts/protocol errors)
capture_log_copy: false # Copy entire bot log file when diagnostics are captured (may duplicate log content)
pause_on_login_detection_failure: false # Pause for manual inspection (interactive only)
timing_collection: true # Collect timeout timing data locally for troubleshooting and tuning
output_dir: "" # Custom output directory (see "Output locations (default)" below)
```
**Migration Note:**
Old diagnostics keys have been renamed/moved. Update configs and CI/automation accordingly:
- `login_detection_capture``capture_on.login_detection`
- `publish_error_capture``capture_on.publish`
`capture_log_copy` is a new top-level flag. It may copy the same log multiple times during a single run if multiple diagnostic events are triggered.
**Login Detection Behavior:**
The bot uses a layered approach to detect login state, prioritizing stealth over reliability:
1. **DOM check (primary method - preferred for stealth)**: Checks for user profile elements
- Looks for `.mr-medium` element containing username
- Falls back to `#user-email` ID
- Uses `login_detection` timeout (default: 10.0 seconds)
- Minimizes bot-like behavior by avoiding JSON API requests
2. **Auth probe fallback (more reliable)**: Sends a GET request to `{root_url}/m-meine-anzeigen-verwalten.json?sort=DEFAULT`
- Returns `LOGGED_IN` if response is HTTP 200 with valid JSON containing `"ads"` key
- Returns `LOGGED_OUT` if response is HTTP 401/403 or HTML contains login markers
- Returns `UNKNOWN` on timeouts, assertion failures, or unexpected response bodies
- Only used when DOM check is inconclusive (UNKNOWN or timed out)
**Optional diagnostics:**
- Enable `capture_on.login_detection` to capture screenshots and HTML dumps when state is `UNKNOWN`
- Enable `capture_on.publish` to capture screenshots, HTML dumps, and JSON payloads for each failed publish attempt (e.g., attempts 13).
- Enable `capture_log_copy` to copy the entire bot log file when a diagnostic event triggers (e.g., `capture_on.publish` or `capture_on.login_detection`):
- If multiple diagnostics trigger in the same run, the log will be copied multiple times
- Review or redact artifacts before sharing publicly
- Enable `pause_on_login_detection_failure` to pause the bot for manual inspection in interactive sessions. This requires `capture_on.login_detection=true`; if this is not enabled, the runtime will fail startup with a validation error.
- Use custom `output_dir` to specify where artifacts are saved
**Output locations (default):**
- **Portable mode + `--config /path/to/config.yaml`**: `/path/to/.temp/diagnostics/` (portable runtime files are placed next to the selected config file)
- **Portable mode without `--config`**: `./.temp/diagnostics/` (current working directory)
- **User directories mode**: `~/.cache/kleinanzeigen-bot/diagnostics/` (Linux), `~/Library/Caches/kleinanzeigen-bot/diagnostics/` (macOS), or `%LOCALAPPDATA%\kleinanzeigen-bot\Cache\diagnostics\` (Windows)
- **Custom**: Path resolved relative to your `config.yaml` if `output_dir` is specified
**Timing collection output (default):**
- **Portable mode**: `./.temp/timing/timing_data.json`
- **User directories mode**: `~/.cache/kleinanzeigen-bot/timing/timing_data.json` (Linux) or `~/Library/Caches/kleinanzeigen-bot/timing/timing_data.json` (macOS)
- Data is grouped by run/session and retained for 30 days via automatic cleanup during each data write
Example structure:
```json
[
{
"session_id": "abc12345",
"command": "publish",
"started_at": "2026-02-07T10:00:00+01:00",
"ended_at": "2026-02-07T10:04:30+01:00",
"records": [
{
"operation_key": "default",
"operation_type": "web_find",
"effective_timeout_sec": 5.0,
"actual_duration_sec": 1.2,
"attempt_index": 0,
"success": true
}
]
}
]
```
How to read it quickly:
- Group by `command` and `session_id` first to compare slow vs fast runs
- Look for high `actual_duration_sec` values near `effective_timeout_sec` and repeated `success: false` entries
- `attempt_index` is zero-based (`0` first attempt, `1` first retry)
- Use `operation_key` + `operation_type` to identify which timeout bucket (`default`, `page_load`, etc.) needs tuning
- For deeper timeout tuning workflow, see [Browser Troubleshooting](./BROWSER_TROUBLESHOOTING.md)
> **⚠️ PII Warning:** HTML dumps, JSON payloads, timing data JSON files (for example `timing_data.json`), and log copies may contain PII. Typical examples include account email, ad titles/descriptions, contact info, and prices. Log copies are produced by `capture_log_copy` when diagnostics capture runs, such as `capture_on.publish` or `capture_on.login_detection`. Review or redact these artifacts before sharing them publicly.
## Installation Modes
On first run, when the `--workspace-mode` flag is not provided, the app may ask which installation mode to use. In non-interactive environments, it defaults to portable mode.
1. **Portable mode (recommended for most users, especially on Windows):**
- Stores config, logs, downloads, and state in the current directory
- No admin permissions required
- Easy backup/migration; works from USB drives
2. **User directories mode (advanced users / multi-user setups):**
- Stores files in OS-standard locations
- Cleaner directory structure; better separation from working directory
- Requires proper permissions for user data directories
**OS notes:**
- **Windows:** User directories mode uses AppData (Roaming/Local); portable keeps everything beside the `.exe`.
- **Linux:** User directories mode uses `~/.config/kleinanzeigen-bot/config.yaml`, `~/.local/state/kleinanzeigen-bot/`, and `~/.cache/kleinanzeigen-bot/`; portable stays in the current working directory (for example `./config.yaml`, `./.temp/`, `./downloaded-ads/`).
- **macOS:** User directories mode uses `~/Library/Application Support/kleinanzeigen-bot/config.yaml` (config), `~/Library/Application Support/kleinanzeigen-bot/` (state/runtime), and `~/Library/Caches/kleinanzeigen-bot/` (cache/diagnostics); portable stays in the current directory.
### Mixed footprint cleanup
If both portable and XDG footprints exist, `--config` without `--workspace-mode` is intentionally rejected to avoid silent behavior changes.
A footprint is the set of files/directories the bot creates for one mode (configuration file, runtime state/cache directories, and `downloaded-ads`).
Use one explicit run to choose a mode:
```bash
kleinanzeigen-bot --workspace-mode=portable --config /path/to/config.yaml verify
```
or
```bash
kleinanzeigen-bot --workspace-mode=xdg --config /path/to/config.yaml verify
```
Then remove the unused footprint directories/files to make auto-detection unambiguous for future runs.
- Remove **portable footprint** items in your working location: `config.yaml`, `.temp/` (Windows: `.temp\`), and `downloaded-ads/` (Windows: `downloaded-ads\`). Back up or move `config.yaml` to your desired location before deleting it.
- Remove **user directories footprint** items:
Linux: `~/.config/kleinanzeigen-bot/`, `~/.local/state/kleinanzeigen-bot/`, `~/.cache/kleinanzeigen-bot/`.
macOS: `~/Library/Application Support/kleinanzeigen-bot/`, `~/Library/Caches/kleinanzeigen-bot/`.
Windows: `%APPDATA%\kleinanzeigen-bot\`, `%LOCALAPPDATA%\kleinanzeigen-bot\`, `%LOCALAPPDATA%\kleinanzeigen-bot\Cache\`.
## Getting Current Defaults
To see all current default values, run:
```bash
kleinanzeigen-bot create-config
```
This generates a config file with `exclude_none=True`, giving you all the non-None defaults.
For the complete machine-readable reference, see the [JSON schema](https://raw.githubusercontent.com/Second-Hand-Friends/kleinanzeigen-bot/main/schemas/config.schema.json).