Scraping Guide

Xeepy provides powerful, flexible scraping capabilities for X/Twitter. This guide covers all scraping features with detailed examples.

Overview

Xeepy can scrape virtually any public data from X/Twitter:

  • Replies: Scrape all replies to any tweet
  • Profiles: Get detailed user profile information
  • Followers: Extract follower lists with metadata
  • Following: Get who a user follows
  • Tweets: Scrape user tweets and timelines
  • Threads: Unroll and extract full threads
  • Search: Search tweets with advanced filters
  • Hashtags: Scrape tweets by hashtag
  • Media: Extract images and videos
  • Lists: Scrape list members and tweets

Quick Start

import asyncio
from xeepy import Xeepy

async def main():
    async with Xeepy() as x:
        # Scrape 100 replies to a tweet
        replies = await x.scrape.replies(
            "https://x.com/elonmusk/status/1234567890",
            limit=100,
        )

        # Export to CSV
        x.export.to_csv(replies, "replies.csv")

asyncio.run(main())

The remaining examples omit the asyncio.run() wrapper for brevity; run them inside an async function the same way.

Common Patterns

Scrape with Progress

async with Xeepy() as x:
    async for tweet in x.scrape.tweets_stream("username", limit=1000):
        print(f"Got tweet: {tweet.text[:50]}...")

        # Process each tweet as it arrives (process_tweet is your own coroutine)
        await process_tweet(tweet)

Scrape Multiple Users

async with Xeepy() as x:
    users = ["user1", "user2", "user3"]

    for user in users:
        tweets = await x.scrape.tweets(user, limit=100)
        x.export.to_csv(tweets, f"{user}_tweets.csv")
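
Independent users can also be scraped concurrently. A minimal sketch using asyncio.gather, assuming the shared client is safe for concurrent calls and that the built-in rate limiter still paces the overall request flow:

import asyncio

async with Xeepy() as x:
    users = ["user1", "user2", "user3"]

    async def scrape_and_save(user):
        tweets = await x.scrape.tweets(user, limit=100)
        x.export.to_csv(tweets, f"{user}_tweets.csv")

    # One task per user; pacing should still be governed by
    # the configured rate limit (assumption, not verified)
    await asyncio.gather(*(scrape_and_save(u) for u in users))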

Handle Large Datasets

async with Xeepy() as x:
    # Scrape in batches to avoid memory issues
    async for batch in x.scrape.followers_batched("popular_user", batch_size=100):
        # Process and save each batch
        x.export.append_csv(batch, "followers.csv")
        print(f"Processed {len(batch)} followers")

Rate Limiting

Xeepy automatically handles rate limiting to protect your account:

async with Xeepy() as x:
    # Default: 20 requests/minute (safe)
    replies = await x.scrape.replies(url, limit=1000)

    # Customize the rate limit for subsequent requests
    x.config.rate_limit.requests_per_minute = 30

Be Respectful

Higher rate limits increase detection risk. Stick to defaults unless you have a specific need.

Data Models

All scraped data uses typed models for consistency:

# Tweet model
reply.id             # Tweet ID
reply.text           # Tweet content
reply.author         # User model (see below)
reply.created_at     # Datetime
reply.likes          # Like count
reply.retweets       # Retweet count
reply.replies        # Reply count
reply.url            # Tweet URL

# User model
user.id               # User ID
user.username         # Handle (without @)
user.name             # Display name
user.bio              # Bio/description
user.followers_count  # Follower count
user.following_count  # Following count
user.tweet_count      # Total tweet count
user.verified         # Verification status
user.created_at       # Account creation datetime
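
Because the models are plain typed objects, results can be filtered and sorted with ordinary Python. A small sketch using the fields listed above (url is assumed to hold a tweet URL):

async with Xeepy() as x:
    replies = await x.scrape.replies(url, limit=100)

    # Rank replies by like count using the typed fields
    top = sorted(replies, key=lambda r: r.likes, reverse=True)[:5]
    for reply in top:
        print(f"{reply.likes:>5} likes @{reply.author.username}: {reply.text[:60]}")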

Export Options

The results of any scraping function can be passed straight to the export helpers:

async with Xeepy() as x:
    data = await x.scrape.replies(url, limit=100)

    # Multiple export formats
    x.export.to_csv(data, "data.csv")
    x.export.to_json(data, "data.json")
    x.export.to_excel(data, "data.xlsx")
    x.export.to_parquet(data, "data.parquet")

    # Database export
    await x.export.to_database(data, "sqlite:///data.db")
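
The file formats pair naturally with standard analysis tools. For example, a Parquet export loads straight back into pandas (assuming pandas plus a Parquet engine such as pyarrow is installed):

import pandas as pd

# Load the exported file for analysis with standard pandas readers
df = pd.read_parquet("data.parquet")
print(df.head())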

Best Practices

  1. Start small - Test with limit=10 before scaling up
  2. Use caching - Avoid re-scraping the same data
  3. Respect rate limits - Don't disable built-in protections
  4. Handle errors - Network issues happen; use try/except (see the sketch after this list)
  5. Store incrementally - Save data as you scrape for large jobs
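
A minimal sketch combining practices 4 and 5. The broad except Exception is a placeholder; substitute Xeepy's specific exception types once you know which ones your version raises:

async with Xeepy() as x:
    users = ["user1", "user2", "user3"]

    for user in users:
        try:
            tweets = await x.scrape.tweets(user, limit=100)
        except Exception as exc:  # placeholder; catch the library's real errors
            print(f"Skipping {user}: {exc}")
            continue

        # Store incrementally so a failure never loses earlier results
        x.export.append_csv(tweets, "all_tweets.csv")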

Detailed Guides

Choose a specific scraping topic: