# Reddit Search Sources

## DuckDuckGo Site Search

### Endpoint
- **URL**: `https://duckduckgo.com/html/`
- **Method**: GET
- **Query format**: `site:reddit.com <query> [r/<subreddit>]`

### Parameters
| Parameter | Description | Example |
|-----------|-------------|---------|
| q | Search query | `site:reddit.com machine learning` |
| q | Search query with subreddit filter | `site:reddit.com vpn r/privacy` |

### How It Works

1. **Query**: DuckDuckGo searches the web for `site:reddit.com <query>`
2. **HTML Response**: Returns HTML page with search results
3. **Parse**: Extract results using regex (fast, no heavy dependencies)
4. **Redirects**: Decode the `uddg` parameter of DuckDuckGo's `duckduckgo.com/l/?uddg=` redirect links to get the actual Reddit URLs
5. **Filter**: Keep only `reddit.com` URLs
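
The filter in step 5 could be sketched as a hostname check rather than a substring match, so that look-alike domains are rejected. The function name `is_reddit_url` is hypothetical, and this sketch assumes that only `reddit.com` and its subdomains should pass (short links such as `redd.it` are excluded).

```python
from urllib.parse import urlparse


def is_reddit_url(url: str) -> bool:
    """Return True only for reddit.com and its subdomains (www, old, ...)."""
    # hostname is None for strings that are not absolute URLs
    host = (urlparse(url).hostname or "").lower()
    return host == "reddit.com" or host.endswith(".reddit.com")
```

Comparing the parsed hostname avoids accepting URLs such as `https://reddit.com.evil.example/`, which a plain `"reddit.com" in url` test would let through.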

### Example URLs
```
https://duckduckgo.com/html/?q=site%3Areddit.com+machine+learning
https://duckduckgo.com/html/?q=site%3Areddit.com+vpn+r%2Fprivacy
```
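
URLs of this shape can be built with the standard library instead of by hand. The sketch below is illustrative: `build_search_url` and `DDG_HTML_URL` are hypothetical names, and the subreddit filter is assumed to be the `r/<name>` keyword form shown in the Parameters table.

```python
from typing import Optional
from urllib.parse import urlencode

DDG_HTML_URL = "https://duckduckgo.com/html/"


def build_search_url(query: str, subreddit: Optional[str] = None) -> str:
    """Build a DuckDuckGo HTML search URL restricted to reddit.com."""
    terms = ["site:reddit.com", query.strip()]
    if subreddit:
        name = subreddit.strip()
        # Accept both "privacy" and "r/privacy"
        if name.lower().startswith("r/"):
            name = name[2:]
        terms.append(f"r/{name}")
    # urlencode percent-encodes ":" and "/" and turns spaces into "+"
    return f"{DDG_HTML_URL}?{urlencode({'q': ' '.join(terms)})}"
```

For example, `build_search_url("machine learning")` yields `https://duckduckgo.com/html/?q=site%3Areddit.com+machine+learning`.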

### Limitations

| Aspect | Status | Notes |
|---------|---------|--------|
| API Access | ❌ | No official API for HTML search |
| Rate Limits | ❌ | None documented, but be reasonable |
| Time Filter | ❌ | Cannot filter by post date |
| Engagement Data | ❌ | No upvote/comment counts |
| Freshness | ⚠️ | Depends on DuckDuckGo's crawling schedule |
| Result Count | ⚠️ | HTML parsing may return ~10-20 results |

### HTML Structure

```html
<a class="result__a" href="https://duckduckgo.com/l/?uddg=...">
    Post Title
</a>

<a class="result__snippet">
    Brief content preview...
</a>
```
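
A minimal sketch of pulling titles, links, and snippets out of markup like this with regular expressions. It assumes the class names shown above are still current and that the n-th snippet belongs to the n-th link; `parse_results` and its helpers are hypothetical names, not necessarily what the script uses.

```python
import re
from html import unescape
from typing import Dict, List

# DOTALL lets ".*?" span the newlines inside each anchor
LINK_RE = re.compile(
    r'<a[^>]*class="result__a"[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.DOTALL
)
SNIPPET_RE = re.compile(r'<a[^>]*class="result__snippet"[^>]*>(.*?)</a>', re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def _clean(fragment: str) -> str:
    """Drop inline tags such as <b>, decode entities, trim whitespace."""
    return unescape(TAG_RE.sub("", fragment)).strip()


def parse_results(page: str) -> List[Dict[str, str]]:
    """Return one dict per result with 'href', 'title', and 'snippet' keys."""
    links = LINK_RE.findall(page)
    snippets = SNIPPET_RE.findall(page)
    results = []
    for i, (href, title) in enumerate(links):
        # Assumption: snippets appear in the same order as links
        snippet = snippets[i] if i < len(snippets) else ""
        results.append(
            {"href": unescape(href), "title": _clean(title), "snippet": _clean(snippet)}
        )
    return results
```

Pairing by position is fragile if a result has no snippet; an HTML parser would be more robust, at the cost of the extra dependency the regex approach avoids.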

### Redirect Handling

DuckDuckGo wraps each result link in a redirect URL:
```
Original: https://duckduckgo.com/l/?uddg=https%3A%2F%2Fwww.reddit.com%2F...
Extracted: https://www.reddit.com/...
```

The script extracts the `uddg` parameter from these redirect links and URL-decodes it to get the actual Reddit URL.
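
One way to recover the target URL is sketched below. It uses `urllib.parse.parse_qs` instead of a regex plus `unquote`, which is a substitution made here for illustration; the function name `resolve_redirect` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def resolve_redirect(href: str) -> str:
    """Return the target of a duckduckgo.com/l/ redirect link, else href unchanged."""
    parsed = urlparse(href)  # also handles protocol-relative "//duckduckgo.com/l/..."
    if parsed.hostname != "duckduckgo.com" or not parsed.path.startswith("/l/"):
        return href
    # parse_qs percent-decodes the values, so no separate unquote() is needed
    targets = parse_qs(parsed.query).get("uddg")
    return targets[0] if targets else href
```

Links that are not DuckDuckGo redirects pass through untouched, so the function is safe to apply to every extracted `href`.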

---

## Comparison with Alternatives

| Source | Free | API | Rate Limit | Coverage | Stability |
|--------|------|-----|-----------|----------|-----------|
| DuckDuckGo | Yes | No | None documented | High | ⭐⭐⭐⭐⭐ |
| PullPush API | Yes | Yes | None | High | ⭐⭐⭐ |
| Old Reddit | Yes | No | Medium | High | ⭐⭐⭐⭐⭐ |
| Official Reddit API | No | Yes | Strict | Very High | ⭐⭐⭐⭐⭐ |

---

## Troubleshooting

### DuckDuckGo Search Issues

#### No Results
- Query may be too specific
- DuckDuckGo may not have indexed the content yet
- Try broader terms or a different subreddit

#### Wrong/Redirected URLs
- Script automatically decodes `uddg=` redirect links
- Check if URL extraction failed (console errors)
- Verify DuckDuckGo HTML structure hasn't changed

#### Slow Response
- Network connectivity issues
- DuckDuckGo server issues
- Try reducing `--limit` to 10

---

## Rate Limiting Best Practices

1. **Space out requests**: 1-2 seconds between queries
2. **Limit result size**: Use `--limit 10` for quick searches
3. **Cache results**: Store for 5-15 minutes if searching same query
4. **Use specific queries**: Avoid overly broad searches
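
Practices 1 and 3 can be combined in a small wrapper around whatever function performs the request. This is a sketch under stated assumptions: the class name `CachedThrottle`, the 1.5-second spacing, and the 10-minute TTL are illustrative choices within the ranges above, and the cache is in-memory only.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedThrottle:
    """Wrap a fetch function with a minimum delay between calls and a TTL cache."""

    def __init__(self, fetch: Callable[[str], str],
                 min_interval: float = 1.5, ttl: float = 600.0):
        self._fetch = fetch
        self._min_interval = min_interval  # seconds between real requests
        self._ttl = ttl                    # seconds a cached result stays valid
        self._last_call: Optional[float] = None
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, query: str) -> str:
        now = time.monotonic()
        cached = self._cache.get(query)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh cache hit: no request, no delay
        if self._last_call is not None:
            wait = self._min_interval - (now - self._last_call)
            if wait > 0:
                time.sleep(wait)  # space out consecutive requests
        result = self._fetch(query)
        self._last_call = time.monotonic()
        self._cache[query] = (self._last_call, result)
        return result
```

Repeated calls to `get()` with the same query inside the TTL return the stored result without contacting DuckDuckGo at all.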

---

## Future Improvements

Possible enhancements:

1. **Multiple Search Engines**: Add Bing or Brave search as fallback
2. **Time Filtering**: Use Google's `after:` syntax if possible
3. **Engagement Scraping**: Parse Reddit pages for upvote/comment counts
4. **Async Requests**: Query multiple search sources in parallel
5. **Result Caching**: Implement local cache for repeated queries

---

## Technical Notes

### Regex Patterns Used

**Link Extraction**:
```regex
<a[^>]*class="result__a"[^>]*href="([^"]*)"[^>]*>(.*?)</a>
```

**Snippet Extraction**:
```regex
<a[^>]*class="result__snippet"[^>]*>(.*?)</a>
```

**Subreddit Extraction**:
```regex
reddit\.com/r/([^/]+)
```

**Redirect URL Extraction**:
```regex
uddg=([^&]+)
```
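
As an illustration, the subreddit pattern can be applied to an already-decoded Reddit URL as follows. The helper name `extract_subreddit` is hypothetical.

```python
import re
from typing import Optional

SUBREDDIT_RE = re.compile(r"reddit\.com/r/([^/]+)")


def extract_subreddit(url: str) -> Optional[str]:
    """Return the subreddit name from a Reddit URL, or None if there is none."""
    match = SUBREDDIT_RE.search(url)
    return match.group(1) if match else None
```

Because `[^/]+` stops only at a slash, a URL such as `reddit.com/r/privacy?sort=new` would include the query string in the captured name; tightening the class to `[^/?#]+` avoids that if such URLs turn up.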

### HTML Entity Unescaping

```python
from html import unescape
title = unescape(title)  # decodes all entities in one pass (no double-unescaping of "&amp;lt;")
```

### URL Decoding

```python
from urllib.parse import unquote
url = unquote(redirected_url)
```

---

## Why DuckDuckGo?

**Advantages:**
- ✅ Free and no API key required
- ✅ Supports site search via `site:` operator
- ✅ Privacy-focused, less tracking
- ✅ Fast HTML response (no JSON parsing)
- ✅ Good coverage of Reddit content

**Disadvantages:**
- ❌ No official API for programmatic access
- ❌ HTML parsing may break if structure changes
- ❌ No time filtering capabilities
- ❌ No engagement metrics (upvotes, comments)

---

## Alternative Search Engines

### 1. Brave Search
- **URL**: `https://search.brave.com/search`
- **Pros**: Privacy-focused, good results
- **Cons**: Similar limitations to DuckDuckGo

### 2. Startpage
- **URL**: `https://www.startpage.com/sp/search`
- **Pros**: Google results with privacy
- **Cons**: May have CAPTCHA, rate limits

### 3. Bing
- **URL**: `https://www.bing.com/search`
- **Pros**: Large index, official API available
- **Cons**: Requires API key for reliable access

### 4. Google Custom Search
- **URL**: `https://www.googleapis.com/customsearch/v1`
- **Pros**: Official Google API, reliable
- **Cons**: Requires API key, free tier limited

---

## Legal and Ethical Considerations

1. **Terms of Service**: The HTML endpoint is not an official API; review DuckDuckGo's terms before automating searches and keep usage low-volume
2. **Rate Limiting**: Be respectful, don't spam
3. **User Agent**: Use legitimate user-agent string
4. **Attribution**: This skill uses DuckDuckGo as a search source
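
Point 3 might look like this with the standard library. The user-agent value and the name `make_request` are hypothetical placeholders; substitute a string that identifies your own tool and a way to contact you.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical identifier: replace with your tool's name and contact details
USER_AGENT = "reddit-search-skill/1.0 (+https://example.com/contact)"


def make_request(query: str) -> Request:
    """Build (but do not send) a GET request that carries an honest User-Agent."""
    url = "https://duckduckgo.com/html/?" + urlencode({"q": f"site:reddit.com {query}"})
    return Request(url, headers={"User-Agent": USER_AGENT})
```

The returned object can be passed to `urllib.request.urlopen(req, timeout=10)` to perform the search.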

---

## Security Notes

- **No Authentication**: No API keys or tokens needed
- **HTTPS**: All connections use HTTPS
- **Input Validation**: Query terms are URL-encoded
- **Output Sanitization**: HTML entities in titles and snippets are unescaped to plain text; re-escape them before rendering results as HTML