---
name: reddit-search
description: "Reddit search using self-hosted SearXNG metasearch"
version: 3.0.0
metadata:
  hermes:
    tags: [reddit, search, research, social-media, searxng]
---

# Reddit Search

Search Reddit through a self-hosted SearXNG metasearch engine, which aggregates results from multiple search engines (Google, Brave, Startpage).

## Features

- ⚡ **Performance**: Fast API-based search using self-hosted SearXNG
- 🎯 **Relevance**: Ranks results by keyword matches in the title and content
- 🔍 **Multi-engine**: Aggregates results from Google, Brave, Startpage
- 📊 **Top 20**: Returns the 20 most relevant results (adjustable with `--limit`)
- 🔗 **Direct Links**: Extracts actual Reddit URLs
- 🔄 **Deduplication**: Filters duplicate URLs across engines

## Usage

### Basic search
```bash
python ~/.hermes/skills/research/reddit-search/scripts/search_reddit.py "machine learning tutorial"
```

### Search in specific subreddit
```bash
python ~/.hermes/skills/research/reddit-search/scripts/search_reddit.py "best vpn" --subreddit r/vpn
```

### Limit results
```bash
python ~/.hermes/skills/research/reddit-search/scripts/search_reddit.py "crypto news" --limit 10
```

### Python script usage
```python
from hermes_tools import terminal

result = terminal("python ~/.hermes/skills/research/reddit-search/scripts/search_reddit.py 'machine learning'")
# Returns top 20 formatted results + JSON
```

## Results Format

Each result includes:
- **Title**: Post title
- **Subreddit**: r/subreddit_name
- **URL**: Direct link to the post
- **Snippet**: Brief content preview
- **Source**: Search engine name (google, brave, startpage)
- **Engines**: List of engines that returned this result
- **Relevance Score**: Match quality (0-100+)
- **SearXNG Score**: Aggregated score from metasearch
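
As an illustration, the fields above could be modelled as a small record type. This is only a sketch: the class name `RedditResult`, the attribute names, and the example values are hypothetical and may not match the keys that `search_reddit.py` actually emits.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RedditResult:
    """One search hit. Field names are illustrative, not the script's real keys."""
    title: str                      # post title
    subreddit: str                  # e.g. "r/selfhosted"
    url: str                        # direct link to the post
    snippet: str = ""               # brief content preview
    source: str = ""                # search engine name, e.g. "google"
    engines: List[str] = field(default_factory=list)  # all engines returning it
    relevance_score: float = 0.0    # match quality computed by the skill
    searxng_score: float = 0.0      # aggregated score reported by SearXNG


# Made-up example values, shown only to illustrate the shape of one record.
example = RedditResult(
    title="Docker Compose tips",
    subreddit="r/selfhosted",
    url="https://www.reddit.com/r/selfhosted/comments/abc123/docker_compose_tips/",
    source="google",
    engines=["google", "brave"],
)
print(json.dumps(asdict(example), indent=2))
```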

## How It Works

1. **Query SearXNG**: Searches `site:reddit.com <query>` via local API
2. **Multi-engine aggregation**: Queries Google, Brave, Startpage in parallel
3. **Parse JSON**: Fast JSON-based result extraction
4. **Extract Reddit URLs**: Filters and validates Reddit URLs
5. **Score Results**: Ranks by SearXNG score + title/content match + engine diversity
6. **Return Top 20**: Most relevant results
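
Steps 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `search_reddit.py`: the function names are hypothetical, the way `--subreddit` narrows the `site:` filter is an assumption, and the standard library's `urllib` is used here only to keep the sketch dependency-free (the skill itself lists `requests` as a requirement).

```python
import json
import urllib.parse
import urllib.request

SEARXNG_URL = "http://127.0.0.1:8080"  # default instance address from this document


def build_search_url(query, subreddit=None, base_url=SEARXNG_URL):
    """Build a SearXNG JSON search URL restricted to reddit.com."""
    site = "site:reddit.com"
    if subreddit:
        # Assumption: a subreddit is targeted by extending the site filter.
        name = subreddit[2:] if subreddit.startswith("r/") else subreddit
        site = f"site:reddit.com/r/{name}"
    params = {"q": f"{site} {query}", "format": "json"}
    return f"{base_url}/search?{urllib.parse.urlencode(params)}"


def search(query, subreddit=None, timeout=15):
    """Fetch raw result dicts from SearXNG (requires a running instance)."""
    url = build_search_url(query, subreddit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    # SearXNG's JSON response carries the hits under the "results" key.
    return payload.get("results", [])


if __name__ == "__main__":
    print(build_search_url("best vpn", subreddit="r/vpn"))
```

SearXNG fans the query out to the enabled engines itself, so the client makes a single HTTP request.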

## Ranking

Results are scored with the following relevance algorithm:

### Relevance Score
- **SearXNG base score**: Aggregated from multiple engines (scaled x5)
- **Title match**: +15 points per matching keyword
- **Content match**: +5 points per matching keyword
- **Engine diversity**: +3 points per engine that returned the result

**Example**: A post returned by three engines whose title contains two query keywords scores 39 points (2 × 15 + 3 × 3) before the scaled SearXNG base score is added, so it ranks highly.
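
The weights above can be combined as in the sketch below. The function name and the keyword matching rule (case-insensitive substring match on whitespace-separated query words) are assumptions made for illustration; the actual script may tokenize or match differently.

```python
def relevance_score(query, title, content, searxng_score, engines):
    """Combine the documented weights into a single relevance score."""
    # Assumption: keywords are the whitespace-separated words of the query.
    keywords = [word.lower() for word in query.split()]
    title_lower = title.lower()
    content_lower = content.lower()

    score = searxng_score * 5                                    # base score, scaled x5
    score += 15 * sum(1 for k in keywords if k in title_lower)   # title matches
    score += 5 * sum(1 for k in keywords if k in content_lower)  # content matches
    score += 3 * len(engines)                                    # engine diversity
    return score


if __name__ == "__main__":
    print(relevance_score(
        query="docker compose",
        title="Docker Compose tips",
        content="Some compose advice",
        searxng_score=1.0,
        engines=["google", "brave"],
    ))  # 5 + 30 + 5 + 6 = 46.0
```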

## Requirements

- Self-hosted SearXNG instance running (default: `http://127.0.0.1:8080`)
- SearXNG configuration requirements:
  - `limiter: false` (disables rate limiting so API calls are not blocked)
  - `formats: [html, json, csv, rss]` (JSON output enabled)
  - Multiple engines enabled (google, brave, startpage recommended for Reddit search)
- Python `requests` library installed on system

### SearXNG Setup Example

If SearXNG is not configured yet, this is the minimal configuration required:

**Settings file** (`~/searxng-settings/settings.yml` or `/etc/searxng/settings.yml`):
```yaml
use_default_settings: true

search:
  formats:
    - html
    - json  # ← Critical: Required for API output
    - csv
    - rss

server:
  limiter: false  # ← Critical: Disable rate limiting for API access
  secret_key: "your-secret-key"
```

**Start SearXNG**:
```bash
# From SearXNG directory
source ~/searxng-venv/bin/activate
export SEARXNG_SETTINGS_PATH=~/searxng-settings/settings.yml
export SEARXNG_PORT=8080
python ~/searxng/searx/webapp.py
```

**Verify SearXNG is working**:
```bash
# Test basic connectivity
curl http://127.0.0.1:8080/

# Test JSON API output
curl -s "http://127.0.0.1:8080/search?q=test&format=json" | python3 -m json.tool | head -20
```

### SearXNG Installation

If SearXNG is not installed yet, clone and install it first, then apply the configuration shown above:
```bash
# Clone and install SearXNG
git clone https://github.com/searxng/searxng.git ~/searxng
cd ~/searxng
python3 -m venv ~/searxng-venv
source ~/searxng-venv/bin/activate
pip install -e .
```


## Limitations

| Aspect | Status | Notes |
|---------|---------|--------|
| Time filter | ⚠️ | Limited: SearXNG includes `publishedDate` only when the source engine provides it |
| Engagement score | ❌ | No upvote/comment data available (search engines don't index it) |
| Real-time | ⚠️ | Depends on the search engines' index freshness |
| Rate limit | ✅ | Self-hosted instance, no external limits (configure `limiter: false`) |
| Bot Protection | ✅ | No bot issues: uses the JSON API instead of HTML scraping |
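
Because `publishedDate` is only sometimes present, any time filter has to tolerate missing values. The sketch below shows one way to do that on the client side; the function names, the `keep_undated` option, and the choice to treat timezone-less dates as UTC are all assumptions, not behaviour of the existing script.

```python
from datetime import datetime, timedelta, timezone


def parse_published(value):
    """Return an aware UTC datetime, or None if missing or unparseable."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: dates without a timezone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def filter_recent(results, max_age_days, now=None, keep_undated=True):
    """Keep results newer than the cutoff; undated ones are kept on request."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for result in results:
        published = parse_published(result.get("publishedDate"))
        if published is None:
            if keep_undated:
                kept.append(result)
        elif published >= cutoff:
            kept.append(result)
    return kept
```

Keeping undated results by default avoids silently discarding most hits, since many engines return no date at all.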

## Troubleshooting

### No results returned
- Query may be too specific - try broader terms
- Some subreddits may be private/banned
- Check SearXNG is running: `curl http://127.0.0.1:8080/`
- Verify engines are enabled in the SearXNG config (google, brave, startpage)

### Connection refused (Error 111)
- Check if SearXNG is running: `ps aux | grep searx`
- Verify correct port: Default is 8080, but may differ based on config
- Start SearXNG if needed:
  ```bash
  source ~/searxng-venv/bin/activate
  export SEARXNG_SETTINGS_PATH=~/searxng-settings/settings.yml
  export SEARXNG_PORT=8080
  python ~/searxng/searx/webapp.py
  ```

### JSON format not available (Error 403 Forbidden)
- **Error**: `403 Forbidden` when querying with `&format=json`
- **Cause**: JSON format not enabled in the SearXNG search formats
- **Fix**: Add the JSON format to the SearXNG config:
  ```yaml
  search:
    formats:
      - html
      - json  # ← Add this line
  ```
- Then restart SearXNG and retry search

### Rate limited (Error 429 Too Many Requests)
- **Error**: `429 Too Many Requests` from SearXNG
- **Cause**: Rate limiter enabled in server configuration
- **Fix**: Disable limiter in settings:
  ```yaml
  server:
    limiter: false  # ← Change from true
  ```
- Restart SearXNG: kill the process and start it again

### Slow response
- Network connectivity issues
- Some search engines may be slow (SearXNG aggregates multiple)
- Try reducing result count with `--limit 10`
- Disable slow engines in the SearXNG config (edit `settings.yml`)
- Check SearXNG logs for engine timeouts
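
The connection, 403, and 429 failures described in this section can also be told apart from Python. The helper below is a hypothetical diagnostic sketch (the names `explain_status` and `diagnose` are not part of the skill) that maps each symptom to the fix listed above; it uses only the standard library.

```python
import urllib.error
import urllib.request


def explain_status(status):
    """Map an HTTP status from SearXNG to the fix suggested in this document."""
    hints = {
        200: "OK: JSON API is reachable",
        403: "403 Forbidden: add 'json' to search.formats in settings.yml",
        429: "429 Too Many Requests: set server.limiter to false in settings.yml",
    }
    return hints.get(status, f"Unexpected HTTP status {status}: check the SearXNG logs")


def diagnose(base_url="http://127.0.0.1:8080", timeout=10):
    """Send one test query and return a human-readable diagnosis."""
    url = f"{base_url}/search?q=test&format=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return explain_status(resp.status)
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught before URLError, which is its parent class.
        return explain_status(exc.code)
    except (urllib.error.URLError, OSError) as exc:
        return f"Cannot reach SearXNG ({exc}): check that it is running and the port is correct"


if __name__ == "__main__":
    print(diagnose())
```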

## Examples

Search for tech discussions:
```bash
python ~/.hermes/skills/research/reddit-search/scripts/search_reddit.py "docker compose"
```

Find specific subreddit content:
```bash
python ~/.hermes/skills/research/reddit-search/scripts/search_reddit.py "self hosting" --subreddit r/selfhosted
```

Research product feedback:
```bash
python ~/.hermes/skills/research/reddit-search/scripts/search_reddit.py "vercel review"
```

## Data Sources

**SearXNG (Self-hosted)**
- **Type**: Metasearch engine API
- **Cost**: Free (self-hosted)
- **Auth**: None required
- **Coverage**: Aggregates multiple search engines (Google, Brave, Startpage, etc.)
- **Engines**: Configurable; recommended: google, brave, startpage, duckduckgo

**Search Engines (via SearXNG)**
- **Google**: Comprehensive index, good for recent content
- **Brave**: Privacy-focused, good for tech topics
- **Startpage**: Google results with privacy protection
- **DuckDuckGo**: Alternative index

See `references/sources.md` for details.

## Further Troubleshooting

### SearXNG-Specific Issues
For detailed SearXNG configuration and troubleshooting (rate limiting, JSON access, setup), see `references/searxng-troubleshooting.md`.

## Skill Implementation

The search logic is implemented in `scripts/search_reddit.py`. It:
1. Queries SearXNG with `site:reddit.com` filter
2. Parses JSON results
3. Extracts and validates Reddit URLs
4. Scores and ranks by relevance (SearXNG score + title/content match + engine diversity)
5. Returns top 20 formatted results

Use it via `terminal()` or call it directly from other skills.
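
Step 3, together with the cross-engine deduplication listed under Features, might look like the sketch below. It is an assumption-laden illustration rather than the script's actual logic: the helper names are hypothetical, only URLs under `/r/<name>` are treated as valid, and duplicates are detected by comparing the lowercased URL path.

```python
from urllib.parse import urlparse


def subreddit_of(url):
    """Return 'r/<name>' for a reddit.com URL under /r/, otherwise None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "r":
        return f"r/{parts[1]}"
    return None


def dedupe(results):
    """Drop non-Reddit URLs and merge duplicates, combining their engine lists."""
    seen = {}
    for result in results:
        url = result.get("url", "")
        if subreddit_of(url) is None:
            continue
        # Assumption: the same path on www/old/np subdomains is the same post.
        key = urlparse(url).path.rstrip("/").lower()
        if key in seen:
            merged = seen[key]["engines"]
            for engine in result.get("engines", []):
                if engine not in merged:
                    merged.append(engine)
        else:
            seen[key] = {**result, "engines": list(result.get("engines", []))}
    return list(seen.values())
```

Merging the engine lists while deduplicating keeps the information needed for the engine diversity bonus.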
