Scraping Guide

Xeepy provides powerful, flexible scraping capabilities for X/Twitter. This guide covers all scraping features with detailed examples.

Overview

Xeepy can scrape virtually any public data from X/Twitter:

  • Replies: Scrape all replies to any tweet
  • Profiles: Get detailed user profile information
  • Followers: Extract follower lists with metadata
  • Following: Get who a user follows
  • Tweets: Scrape user tweets and timelines
  • Threads: Unroll and extract full threads
  • Search: Search tweets with advanced filters
  • Hashtags: Scrape tweets by hashtag
  • Media: Extract images and videos
  • Lists: Scrape list members and tweets

Quick Start

import asyncio
from xeepy import Xeepy

async def main():
    async with Xeepy() as x:
        # Scrape 100 replies to a tweet
        replies = await x.scrape.replies(
            "https://x.com/elonmusk/status/1234567890",
            limit=100,
        )

        # Export to CSV
        x.export.to_csv(replies, "replies.csv")

asyncio.run(main())

The remaining examples omit the asyncio.run() wrapper for brevity; run them inside an async function the same way.

Common Patterns

Scrape with Progress

async with Xeepy() as x:
    async for tweet in x.scrape.tweets_stream("username", limit=1000):
        print(f"Got tweet: {tweet.text[:50]}...")

        # Process each tweet as it arrives (process_tweet is your own coroutine)
        await process_tweet(tweet)

Scrape Multiple Users

async with Xeepy() as x:
    users = ["user1", "user2", "user3"]

    for user in users:
        tweets = await x.scrape.tweets(user, limit=100)
        x.export.to_csv(tweets, f"{user}_tweets.csv")
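
Independent users can also be scraped concurrently. A minimal sketch using asyncio.gather, assuming the shared client is safe for concurrent calls and that the built-in rate limiter still paces the overall request flow:

import asyncio

async with Xeepy() as x:
    users = ["user1", "user2", "user3"]

    async def scrape_and_save(user):
        tweets = await x.scrape.tweets(user, limit=100)
        x.export.to_csv(tweets, f"{user}_tweets.csv")

    # One task per user; pacing should still be governed by
    # the configured rate limit (assumption, not verified)
    await asyncio.gather(*(scrape_and_save(u) for u in users))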

Handle Large Datasets

async with Xeepy() as x:
    # Scrape in batches to avoid memory issues
    async for batch in x.scrape.followers_batched("popular_user", batch_size=100):
        # Process and save each batch
        x.export.append_csv(batch, "followers.csv")
        print(f"Processed {len(batch)} followers")

Rate Limiting

Xeepy automatically handles rate limiting to protect your account:

async with Xeepy() as x:
    # Default: 20 requests/minute (safe)
    replies = await x.scrape.replies(url, limit=1000)

    # Customize the rate limit for subsequent requests
    x.config.rate_limit.requests_per_minute = 30

Be Respectful

Higher rate limits increase detection risk. Stick to defaults unless you have a specific need.

Data Models

All scraped data uses typed models for consistency:

# Tweet model
reply.id             # Tweet ID
reply.text           # Tweet content
reply.author         # User model (see below)
reply.created_at     # Datetime
reply.likes          # Like count
reply.retweets       # Retweet count
reply.replies        # Reply count
reply.url            # Tweet URL

# User model
user.id               # User ID
user.username         # Handle (without @)
user.name             # Display name
user.bio              # Bio/description
user.followers_count  # Follower count
user.following_count  # Following count
user.tweet_count      # Total tweet count
user.verified         # Verification status
user.created_at       # Account creation datetime
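
Because the models are plain typed objects, results can be filtered and sorted with ordinary Python. A small sketch using the fields listed above (url is assumed to hold a tweet URL):

async with Xeepy() as x:
    replies = await x.scrape.replies(url, limit=100)

    # Rank replies by like count using the typed fields
    top = sorted(replies, key=lambda r: r.likes, reverse=True)[:5]
    for reply in top:
        print(f"{reply.likes:>5} likes @{reply.author.username}: {reply.text[:60]}")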

Export Options

The results of any scraping function can be passed straight to the export helpers:

async with Xeepy() as x:
    data = await x.scrape.replies(url, limit=100)

    # Multiple export formats
    x.export.to_csv(data, "data.csv")
    x.export.to_json(data, "data.json")
    x.export.to_excel(data, "data.xlsx")
    x.export.to_parquet(data, "data.parquet")

    # Database export
    await x.export.to_database(data, "sqlite:///data.db")
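
The file formats pair naturally with standard analysis tools. For example, a Parquet export loads straight back into pandas (assuming pandas plus a Parquet engine such as pyarrow is installed):

import pandas as pd

# Load the exported file for analysis with standard pandas readers
df = pd.read_parquet("data.parquet")
print(df.head())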

Best Practices

  1. Start small - Test with limit=10 before scaling up
  2. Use caching - Avoid re-scraping the same data
  3. Respect rate limits - Don't disable built-in protections
  4. Handle errors - Network issues happen; use try/except (see the sketch after this list)
  5. Store incrementally - Save data as you scrape for large jobs
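
A minimal sketch combining practices 4 and 5. The broad except Exception is a placeholder; substitute Xeepy's specific exception types once you know which ones your version raises:

async with Xeepy() as x:
    users = ["user1", "user2", "user3"]

    for user in users:
        try:
            tweets = await x.scrape.tweets(user, limit=100)
        except Exception as exc:  # placeholder; catch the library's real errors
            print(f"Skipping {user}: {exc}")
            continue

        # Store incrementally so a failure never loses earlier results
        x.export.append_csv(tweets, "all_tweets.csv")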

Detailed Guides

Choose a specific scraping topic: