Advanced Topics¶
Deep dives into Xeepy internals and advanced usage patterns.
Architecture¶
- Understand how Xeepy components work together
- Extend Xeepy with custom plugins
- Build your own scrapers
- Optimize for speed and efficiency
Infrastructure¶
- Configure proxy rotation for stealth
- Avoid detection and blocks
- Scale across multiple machines
- Production container deployment
Development¶
- Test your Xeepy integrations
- Graceful error recovery
- Event-driven integrations
- Run Xeepy as a service
Quick Links¶
| Topic | Description | Difficulty |
|---|---|---|
| Architecture | System design overview | Intermediate |
| Custom Scrapers | Build new scrapers | Advanced |
| Proxies | Proxy configuration | Intermediate |
| Stealth | Detection avoidance | Advanced |
| Distributed | Multi-machine scaling | Expert |
| Performance | Speed optimization | Intermediate |
| Docker | Container deployment | Intermediate |
| Testing | Testing strategies | Intermediate |
| Errors | Error handling | Beginner |
| Plugins | Plugin development | Advanced |
Architecture Overview¶
┌─────────────────────────────────────────────────────────────────┐
│                              Xeepy                              │
├─────────────────────────────────────────────────────────────────┤
│      ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│      │ Scrapers │  │ Actions  │  │ Monitor  │  │ Analytics│     │
│      └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘     │
│           │             │             │             │           │
│      ┌────▼─────────────▼─────────────▼─────────────▼────┐      │
│      │                       Core                        │      │
│      │  ┌────────────┐  ┌────────────┐  ┌────────────┐   │      │
│      │  │  Browser   │  │    Auth    │  │Rate Limiter│   │      │
│      │  └────────────┘  └────────────┘  └────────────┘   │      │
│      └───────────────────────┬───────────────────────────┘      │
│                              │                                  │
│      ┌───────────────────────▼───────────────────────────┐      │
│      │                      Storage                      │      │
│      │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐   │      │
│      │  │ SQLite │  │  CSV   │  │  JSON  │  │ Excel  │   │      │
│      │  └────────┘  └────────┘  └────────┘  └────────┘   │      │
│      └───────────────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────────────┘
Key Concepts¶
Browser Management¶
Xeepy uses Playwright for browser automation:
from xeepy.core.browser import BrowserManager

class BrowserManager:
    """Manages browser lifecycle and page pool."""

    async def start(self) -> None:
        """Launch browser with stealth configuration."""

    async def new_page(self) -> Page:
        """Create new page with rate limiting."""

    async def close(self) -> None:
        """Clean shutdown with session save."""
Rate Limiting¶
Adaptive rate limiting with backoff protects your account:
from xeepy.core.rate_limiter import RateLimiter

class RateLimiter:
    """Adaptive rate limiter with backoff."""

    def __init__(
        self,
        requests_per_minute: int = 20,
        burst_limit: int = 5,
        backoff_factor: float = 2.0
    ): ...

    async def wait(self) -> None:
        """Wait for rate limit clearance."""

    def record_response(self, status: int) -> None:
        """Adjust limits based on response."""
Event System¶
Xeepy emits events for monitoring:
from xeepy.core.events import EventEmitter

class Xeepy(EventEmitter):
    """Emits events during operations."""
    # Events:
    # - "scrape:start", "scrape:complete", "scrape:error"
    # - "action:start", "action:complete", "action:error"
    # - "auth:login", "auth:logout", "auth:expired"
    # - "rate_limit:warning", "rate_limit:hit"

# Subscribe to events
x.on("scrape:complete", lambda data: print(f"Scraped {len(data)} items"))
x.on("rate_limit:warning", lambda: print("Slowing down..."))
Configuration Hierarchy¶
Configuration is loaded in this order (later overrides earlier):
- Defaults - Built-in defaults
- System config - /etc/xeepy/config.toml
- User config - ~/.config/xeepy/config.toml
- Project config - ./xeepy.toml
- Environment variables - XEEPY_*
- CLI arguments - --option value
- Runtime - x.config.setting = value
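The "later overrides earlier" rule can be illustrated with a plain layered merge. This is a sketch under assumptions, not Xeepy's loader: the helper names `env_layer` and `resolve`, the setting keys `headless` and `timeout`, and the mapping of `XEEPY_TIMEOUT` to a lowercase `timeout` key are all hypothetical.

```python
from typing import Any, Mapping


def env_layer(environ: Mapping[str, str], prefix: str = "XEEPY_") -> dict:
    """Collect prefixed variables as lowercase keys (naming scheme is assumed)."""
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


def resolve(*layers: Mapping[str, Any]) -> dict:
    """Merge layers left to right so that later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)  # shallow merge; nested tables would need a deep merge
    return merged


defaults = {"headless": True, "timeout": 30}               # built-in defaults
project = {"timeout": 60}                                  # e.g. parsed from ./xeepy.toml
env = env_layer({"XEEPY_TIMEOUT": "90", "HOME": "/root"})  # a fake environment
cli = {"headless": False}                                  # e.g. parsed CLI arguments

config = resolve(defaults, project, env, cli)
# timeout comes from the environment layer, headless from the CLI layer.
```

Environment values arrive as strings, so a real loader would also need a type-coercion step before the merged value is used.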
Extension Points¶
Xeepy is designed for extensibility:
Custom Scrapers¶
from xeepy.scrapers.base import BaseScraper

class MyScraper(BaseScraper):
    """Custom scraper implementation."""

    async def scrape(self, target: str, **kwargs) -> ScrapeResult:
        # Your implementation
        pass
Custom Actions¶
from xeepy.actions.base import BaseAction

class MyAction(BaseAction):
    """Custom action implementation."""

    async def execute(self, **kwargs) -> ActionResult:
        # Your implementation
        pass
Custom Notifications¶
from xeepy.notifications.base import BaseNotifier

class MyNotifier(BaseNotifier):
    """Custom notification channel."""

    async def send(self, message: str, **kwargs) -> bool:
        # Your implementation
        pass
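As a filled-in example of the notifier extension point, the sketch below records messages in memory, which can be handy when testing an integration. It runs without Xeepy installed, so `BaseNotifier` here is a local stand-in that only copies the `send` signature shown above. `ListNotifier`, `max_length`, and the `level` keyword are hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseNotifier(ABC):
    """Local stand-in for xeepy.notifications.base.BaseNotifier (assumed shape)."""

    @abstractmethod
    async def send(self, message: str, **kwargs) -> bool:
        ...


class ListNotifier(BaseNotifier):
    """Hypothetical channel that keeps sent messages in a list."""

    def __init__(self, max_length: int = 280) -> None:
        self.max_length = max_length
        self.sent = []

    async def send(self, message: str, **kwargs) -> bool:
        # Report failure through the return value instead of raising.
        if not message or len(message) > self.max_length:
            return False
        self.sent.append((kwargs.get("level", "info"), message))
        return True


notifier = ListNotifier()
ok = asyncio.run(notifier.send("Scrape finished", level="success"))
```

Returning `False` for a rejected message, rather than raising, is one reading of the `-> bool` signature; the real base class may define different failure semantics.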
Performance Characteristics¶
| Operation | Typical Duration | Memory Usage |
|---|---|---|
| Profile scrape | 1-2 sec | ~50 MB |
| 100 tweets | 10-20 sec | ~100 MB |
| 1000 followers | 60-120 sec | ~200 MB |
| Follow action | 2-3 sec | ~50 MB |
| Like action | 1-2 sec | ~50 MB |
Security Considerations¶
- Sessions are stored encrypted by default
- Credentials are never logged
- Rate limiting protects accounts
- Proxy support for IP rotation
- Stealth mode to avoid detection
Next Steps¶
- Architecture Deep Dive - Understand the internals
- Performance Tuning - Optimize your usage
- Stealth Mode - Avoid detection
- Distributed Scraping - Scale up