Scraping Replies¶
Learn how to scrape all replies to any tweet, with filtering, threading, and export options.
Basic Usage¶
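import asyncio

from xeepy import Xeepy

async def main():
    async with Xeepy() as x:
        replies = await x.scrape.replies(
            "https://x.com/elonmusk/status/1234567890"
        )
        for reply in replies:
            print(f"@{reply.author.username}: {reply.text}")

# `async with` only works inside a coroutine, so run it through asyncio
asyncio.run(main())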
Parameters¶
replies = await x.scrape.replies(
    url: str,                             # Tweet URL or ID
    limit: int = 100,                     # Max replies to scrape (None for no limit)
    include_author_replies: bool = True,  # Include OP's replies
    sort_by: str = "top",                 # "top", "recent", "controversial"
    min_likes: int = 0,                   # Filter by minimum likes
    verified_only: bool = False,          # Only verified accounts
)
Advanced Examples¶
Get All Replies (No Limit)¶
async with Xeepy() as x:
    # Warning: Can be slow for viral tweets
    replies = await x.scrape.replies(
        tweet_url,
        limit=None  # Get all available replies
    )
Filter High-Engagement Replies¶
async with Xeepy() as x:
    replies = await x.scrape.replies(
        tweet_url,
        limit=500,
        min_likes=10,  # Only replies with 10+ likes
        sort_by="top"
    )
Stream Replies in Real-Time¶
async with Xeepy() as x:
    async for reply in x.scrape.replies_stream(tweet_url):
        print(f"New reply: {reply.text[:50]}...")
        # Process each reply as it arrives
        if "question" in reply.text.lower():
            # notify_about_question is a coroutine you define yourself
            await notify_about_question(reply)
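A stream like the one above can run for a long time, so it often helps to stop once enough matching replies have been collected. The following is a minimal sketch of that pattern. It does not use the library: `fake_stream` is a stand-in async generator that yields plain strings, and `take_matching` is a hypothetical helper name.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def fake_stream() -> AsyncIterator[str]:
    """Stand-in for a reply stream; yields plain strings, not Reply objects."""
    for text in ["Nice!", "Quick question: why?", "lol", "Another question here", "ok"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield text


async def take_matching(
    stream: AsyncIterator[str],
    predicate: Callable[[str], bool],
    limit: int,
) -> List[str]:
    """Collect up to `limit` items that satisfy `predicate`, then stop reading."""
    matches: List[str] = []
    if limit <= 0:
        return matches
    async for item in stream:
        if predicate(item):
            matches.append(item)
            if len(matches) >= limit:
                break  # stop consuming the stream early
    return matches


found = asyncio.run(
    take_matching(fake_stream(), lambda t: "question" in t.lower(), limit=2)
)
print(found)
```

With a real stream, the same helper would receive the async iterator and a predicate over reply objects instead of strings.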
Get Reply Threads¶
async with Xeepy() as x:
    # Get replies with their nested replies
    replies = await x.scrape.replies(
        tweet_url,
        include_nested=True,  # Get replies to replies
        max_depth=3           # How deep to go
    )
    for reply in replies:
        print(f"@{reply.author.username}: {reply.text}")
        # nested_replies holds the child replies; reply.replies is only the count
        for nested in reply.nested_replies:
            print(f"  └─ @{nested.author.username}: {nested.text}")
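The loop above prints only two levels, while max_depth=3 can return deeper threads. A recursive walk is one way to handle any depth. This is a minimal sketch that uses a stand-in `Node` class rather than the library's `Reply`; it assumes nested replies are exposed as a `nested_replies` list, as in the Reply data model on this page. The `walk` helper is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Stand-in for a reply; not the library's Reply class."""
    username: str
    text: str
    nested_replies: List["Node"] = field(default_factory=list)


def walk(replies: List[Node], depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, reply) pairs in depth-first order."""
    for reply in replies:
        yield depth, reply
        yield from walk(reply.nested_replies, depth + 1)


thread = [
    Node("alice", "Great point", [
        Node("bob", "Agreed", [Node("carol", "Same here")]),
    ]),
    Node("dave", "Not sure about this"),
]

for depth, reply in walk(thread):
    prefix = "  " * depth + ("└─ " if depth else "")
    print(f"{prefix}@{reply.username}: {reply.text}")
```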
Analyze Sentiment of Replies¶
from xeepy import Xeepy
from xeepy.ai import ContentGenerator

async with Xeepy() as x:
    replies = await x.scrape.replies(tweet_url, limit=100)
    ai = ContentGenerator(provider="openai")
    sentiments = {"positive": 0, "negative": 0, "neutral": 0}
    for reply in replies:
        result = await ai.analyze_sentiment(reply.text)
        sentiments[result.label] += 1
    print(f"Sentiment breakdown: {sentiments}")
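The loop above awaits one analysis call at a time, which can be slow for hundreds of replies. One option is to run the calls concurrently with a cap on how many are in flight. The sketch below shows that pattern with `asyncio.Semaphore` and `asyncio.gather`. It does not call the library: `fake_sentiment` is a stand-in classifier that returns a label string, and `tally` is a hypothetical helper name.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Iterable


async def fake_sentiment(text: str) -> str:
    """Stand-in classifier; a real one would call a model or an API."""
    await asyncio.sleep(0.01)  # simulate network latency
    lowered = text.lower()
    if "love" in lowered:
        return "positive"
    if "hate" in lowered:
        return "negative"
    return "neutral"


async def tally(
    texts: Iterable[str],
    classify: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> Counter:
    """Classify texts concurrently, with at most max_concurrency calls in flight."""
    gate = asyncio.Semaphore(max_concurrency)

    async def classify_one(text: str) -> str:
        async with gate:
            return await classify(text)

    labels = await asyncio.gather(*(classify_one(t) for t in texts))
    return Counter(labels)


counts = asyncio.run(
    tally(["Love it", "I hate this", "ok", "love love"], fake_sentiment)
)
print(dict(counts))
```

A `Counter` also avoids a `KeyError` if the classifier returns a label outside the three expected ones.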
Export Replies¶
async with Xeepy() as x:
    replies = await x.scrape.replies(tweet_url, limit=500)
    # CSV with all fields
    x.export.to_csv(replies, "replies.csv")
    # JSON for programmatic use
    x.export.to_json(replies, "replies.json")
    # Excel with formatting
    x.export.to_excel(replies, "replies.xlsx")
Customize Export Fields¶
async with Xeepy() as x:
    replies = await x.scrape.replies(tweet_url)
    # Export specific fields only
    x.export.to_csv(
        replies,
        "replies.csv",
        fields=["author.username", "text", "likes", "created_at"]
    )
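The fields list above mixes plain names with dotted names such as author.username. How the library resolves them is not shown here. One plausible approach, sketched below with the standard `csv` module, is to follow each dotted path through attribute lookups. The `Author` and `Row` classes are stand-ins, and `resolve` and `to_csv_text` are hypothetical helper names.

```python
import csv
import io
from dataclasses import dataclass
from functools import reduce
from typing import Any, Iterable, List


@dataclass
class Author:
    """Stand-in for a user record."""
    username: str


@dataclass
class Row:
    """Stand-in for a reply; not the library's Reply class."""
    author: Author
    text: str
    likes: int


def resolve(obj: Any, path: str) -> Any:
    """Follow a dotted path such as 'author.username' via attribute lookups."""
    return reduce(getattr, path.split("."), obj)


def to_csv_text(items: Iterable[Any], fields: List[str]) -> str:
    """Write one header row plus one row per item, using only the given fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    for item in items:
        writer.writerow([resolve(item, f) for f in fields])
    return buffer.getvalue()


rows = [Row(Author("alice"), "Hello, world", 3), Row(Author("bob"), "Hi", 0)]
print(to_csv_text(rows, ["author.username", "text", "likes"]))
```

The `csv` writer quotes values that contain commas, so reply text with punctuation stays in a single column.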
CLI Usage¶
# Basic scrape
xeepy scrape replies https://x.com/user/status/123

# With options
xeepy scrape replies https://x.com/user/status/123 \
    --limit 500 \
    --min-likes 5 \
    --sort top \
    --output replies.csv

# JSON output
xeepy scrape replies https://x.com/user/status/123 \
    --format json \
    --output replies.json
Reply Data Model¶
@dataclass
class Reply:
    id: str                         # Reply tweet ID
    text: str                       # Reply content
    author: User                    # Author details
    created_at: datetime            # When posted
    likes: int                      # Like count
    retweets: int                   # Retweet count
    replies: int                    # Reply count
    url: str                        # Reply URL
    in_reply_to: str                # Parent tweet ID
    conversation_id: str            # Thread root ID
    is_author_reply: bool           # Is OP replying?
    media: List[Media]              # Attached media
    hashtags: List[str]             # Hashtags used
    mentions: List[str]             # Users mentioned
    nested_replies: List["Reply"]   # Replies to this reply
Use Cases¶
Find Questions to Answer¶
async with Xeepy() as x:
    replies = await x.scrape.replies(my_tweet_url)
    questions = [r for r in replies if "?" in r.text]
    print(f"Found {len(questions)} questions to answer:")
    for q in questions:
        print(f"  @{q.author.username}: {q.text}")
Identify Influencer Replies¶
async with Xeepy() as x:
    replies = await x.scrape.replies(tweet_url, limit=500)
    # Find replies from accounts with 10k+ followers
    influencer_replies = [
        r for r in replies
        if r.author.followers_count >= 10000
    ]
    print(f"Influencer replies ({len(influencer_replies)}):")
    for r in influencer_replies:
        print(f"  @{r.author.username} ({r.author.followers_count:,} followers)")
Build Community List from Engagers¶
async with Xeepy() as x:
    # Get people who engage with your content
    replies = await x.scrape.replies(my_tweet_url)
    engaged_users = [r.author for r in replies]
    # Export for follow-up
    x.export.to_csv(engaged_users, "engaged_community.csv")
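Someone who replies several times appears several times in the list built above, so the exported file can contain duplicate rows. A minimal sketch of removing duplicates by username while keeping the original order follows. `Author` is a stand-in class, not the library's `User`, and `unique_authors` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Author:
    """Stand-in for a user record; not the library's User class."""
    username: str
    followers_count: int


def unique_authors(authors: Iterable[Author]) -> List[Author]:
    """Keep the first occurrence of each username, preserving order."""
    seen = set()
    unique = []
    for author in authors:
        key = author.username.lower()  # treat handles as case-insensitive
        if key not in seen:
            seen.add(key)
            unique.append(author)
    return unique


engaged = [
    Author("alice", 120),
    Author("bob", 15000),
    Author("Alice", 120),
    Author("bob", 15000),
]
print([a.username for a in unique_authors(engaged)])
```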
Troubleshooting¶
Why am I getting fewer replies than expected?
X/Twitter's API doesn't return all replies. Hidden replies, spam-filtered replies, and very old replies may not be accessible.
Replies are loading slowly
Large reply threads require pagination. Consider:
- Using limit to cap results
- Using min_likes to filter
- Using sort_by="top" to get best replies first