Substacks

API Documentation

REST API and real-time WebSocket interface for Substacks.

Introduction

All REST endpoints return JSON. The API is read-only; the scraper populates the data automatically from the configured newsletters.

Real-time updates (new articles, scrape events) are pushed over a WebSocket connection.

Rate Limiting

All endpoints are rate-limited per IP address.

Header                   Description
X-RateLimit-Limit        Maximum requests per window (default: 100)
X-RateLimit-Remaining    Requests remaining in the current window
X-RateLimit-Reset        Seconds until the window resets
Retry-After              Seconds to wait (only on 429 responses)

When the limit is exceeded, the server responds with HTTP 429 Too Many Requests and this body:

{
  "error": "Too many requests"
}

Wait the number of seconds given in the Retry-After header before retrying.
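A client can honor these headers by pausing for Retry-After seconds whenever it receives a 429. The sketch below assumes a runtime with a global fetch (modern browsers, Node.js 18+); the helper names retryDelayMs and fetchWithRetry, the one-second fallback, and the maxRetries limit are illustrative choices, not part of the API. It only understands the plain-seconds form of Retry-After that this API documents.

```javascript
// Compute how long to wait before retrying, in milliseconds.
// Returns null when the response is not a 429 (no retry needed).
// `headers` is anything with a get(name) method, such as fetch's Headers.
function retryDelayMs(status, headers, fallbackSeconds = 1) {
  if (status !== 429) return null;
  const raw = headers.get("Retry-After");
  const seconds = raw == null ? NaN : Number(raw);
  // Fall back to a default when the header is missing or not a plain number
  const wait = Number.isFinite(seconds) && seconds >= 0 ? seconds : fallbackSeconds;
  return wait * 1000;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch a URL, sleeping for the server-specified delay on each 429.
// After maxRetries retries it gives up and returns the last response as-is.
async function fetchWithRetry(url, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    const delay = retryDelayMs(res.status, res.headers);
    if (delay === null || attempt >= maxRetries) return res;
    await sleep(delay);
  }
}
```

For example, `await fetchWithRetry("http://localhost:3000/api/stats")` (host assumed from the WebSocket section) returns the first non-429 response, or the final 429 once the retries are used up.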

List Articles

GET /api/articles

Retrieve a paginated list of scraped articles, optionally filtered by newsletter.

Query param    Type       Description
limit          integer    Maximum number of articles to return. Default: 100
offset         integer    Number of articles to skip. Default: 0
newsletter     string     Filter by newsletter URL. Optional

Example

GET /api/articles?limit=10&offset=0&newsletter=https://example.substack.com

Response 200

[
  {
    "id": 1,
    "newsletter": "https://example.substack.com",
    "slug": "my-first-post",
    "title": "My First Post",
    "subtitle": "A subtitle",
    "canonical_url": "https://example.substack.com/p/my-first-post",
    "post_date": "2025-01-15T12:00:00.000Z",
    "content_html": "<p>...</p>",
    "content_markdown": "# My First Post\n...",
    "scraped_at": "2025-01-15T13:00:00.000Z",
    "notified": 1
  }
]
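Because newsletter values are full URLs, letting URLSearchParams build the query string avoids encoding mistakes. The sketch below builds the request URL and pages through results with limit and offset until a short page comes back. It assumes a global fetch and a server at a base URL such as http://localhost:3000; articlesUrl and fetchAllArticles are hypothetical helper names.

```javascript
// Build a /api/articles URL; URLSearchParams percent-encodes the newsletter URL.
function articlesUrl(baseUrl, { limit = 100, offset = 0, newsletter } = {}) {
  const url = new URL("/api/articles", baseUrl);
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("offset", String(offset));
  if (newsletter) url.searchParams.set("newsletter", newsletter);
  return url.toString();
}

// Fetch every article (optionally for one newsletter), one page at a time.
// Stops as soon as a page contains fewer than `pageSize` items.
async function fetchAllArticles(baseUrl, newsletter, pageSize = 100) {
  const all = [];
  for (let offset = 0; ; offset += pageSize) {
    const res = await fetch(articlesUrl(baseUrl, { limit: pageSize, offset, newsletter }));
    if (!res.ok) throw new Error(`GET /api/articles failed: ${res.status}`);
    const page = await res.json();
    all.push(...page);
    if (page.length < pageSize) return all;
  }
}
```

This stop condition assumes the server returns up to the requested limit; if it silently capped limit lower, the loop would end after the first page. The sort order is not documented either, so articles scraped while paging could shift offsets between requests.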

Get Article

GET /api/articles/:newsletter/:slug

Retrieve a single article by its newsletter URL and slug.

Path param     Type       Description
newsletter     string     URL-encoded newsletter URL
slug           string     URL-encoded article slug

Example

GET /api/articles/https%3A%2F%2Fexample.substack.com/my-first-post

Response 200

{
  "id": 1,
  "newsletter": "https://example.substack.com",
  "slug": "my-first-post",
  "title": "My First Post",
  ...
}

Response 404 (not found)

{ "error": "Article not found" }
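Both path segments must be percent-encoded, since the newsletter URL itself contains ":" and "/". encodeURIComponent produces exactly the form shown in the example above. The sketch below assumes a global fetch; articlePath and getArticle are hypothetical names, and mapping a 404 to null is a design choice rather than anything the API requires.

```javascript
// Build the path for a single article, encoding each segment separately.
function articlePath(newsletter, slug) {
  return `/api/articles/${encodeURIComponent(newsletter)}/${encodeURIComponent(slug)}`;
}

// Fetch one article; resolves to null on 404 instead of throwing.
async function getArticle(baseUrl, newsletter, slug) {
  const res = await fetch(new URL(articlePath(newsletter, slug), baseUrl));
  if (res.status === 404) return null;
  if (!res.ok) throw new Error(`GET article failed: ${res.status}`);
  return res.json();
}
```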

List Newsletters

GET /api/newsletters

Retrieve all configured newsletters.

Response 200

[
  {
    "url": "https://example.substack.com",
    "name": "Example Newsletter",
    "last_checked_at": "2025-01-15T13:00:00.000Z"
  }
]
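One use of last_checked_at is spotting newsletters the scraper has not visited recently. The sketch below assumes last_checked_at is an ISO 8601 timestamp as in the example. It treats a missing or null value as "never checked", which is an assumption because the schema does not say whether the field can be null. The one-hour threshold and the function name staleNewsletters are arbitrary illustrations.

```javascript
// Return newsletters whose last check is older than maxAgeMs, or that were never checked.
function staleNewsletters(newsletters, maxAgeMs = 60 * 60 * 1000, now = Date.now()) {
  return newsletters.filter((n) => {
    if (!n.last_checked_at) return true; // assumed meaning: never checked
    const checkedAt = Date.parse(n.last_checked_at);
    // Unparseable timestamps are reported as stale so they get noticed
    return Number.isNaN(checkedAt) || now - checkedAt > maxAgeMs;
  });
}
```

Given the parsed body of GET /api/newsletters, `staleNewsletters(body)` returns the entries that are overdue for a check.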

Stats

GET /api/stats

Retrieve aggregate statistics about scraped content.

Response 200

{
  "totalArticles": 42,
  "totalNewsletters": 3,
  "newsletters": [
    {
      "url": "https://example.substack.com",
      "name": "Example Newsletter",
      "last_checked_at": "2025-01-15T13:00:00.000Z",
      "articleCount": 42
    }
  ]
}
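The stats payload is convenient for a quick dashboard or log line. The sketch below turns a parsed /api/stats response into one summary line plus one line per newsletter, largest first. summarizeStats is a hypothetical helper, and falling back to the URL when name is empty is an assumption.

```javascript
// Summarize a parsed /api/stats response as an array of human-readable lines.
function summarizeStats(stats) {
  const lines = [...stats.newsletters] // copy so the input is not mutated by sort
    .sort((a, b) => b.articleCount - a.articleCount)
    .map((n) => `${n.name || n.url}: ${n.articleCount} articles`);
  return [`${stats.totalArticles} articles across ${stats.totalNewsletters} newsletters`, ...lines];
}
```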

Health Check

GET /health

Simple health check endpoint. Not rate-limited.

Response 200

{
  "status": "ok",
  "uptime": 12345.678
}
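Because /health is not rate-limited, it suits frequent polling by a load balancer or container health probe. The sketch below treats the server as healthy only for HTTP 200 with "status": "ok", and reports network errors, timeouts, and non-JSON bodies as unhealthy. It assumes a global fetch and AbortSignal.timeout (Node.js 18+ or a modern browser); the function names and the two-second timeout are illustrative.

```javascript
// Decide whether a /health response indicates a live server.
function isHealthyResponse(status, body) {
  return status === 200 && body !== null && typeof body === "object" && body.status === "ok";
}

// Probe /health with a timeout; resolves to a boolean and never rejects.
async function checkHealth(baseUrl, timeoutMs = 2000) {
  try {
    const res = await fetch(new URL("/health", baseUrl), {
      signal: AbortSignal.timeout(timeoutMs),
    });
    const body = await res.json();
    return isHealthyResponse(res.status, body);
  } catch {
    // Network error, timeout, or a body that is not JSON
    return false;
  }
}
```

In a Node.js probe script, `checkHealth("http://localhost:3000").then((ok) => process.exit(ok ? 0 : 1))` maps the result to an exit code.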

WebSocket — Connecting

WS ws://localhost:3000/ws

Connect via a standard WebSocket upgrade request. Use wss:// if running behind TLS.

Example (JavaScript)

function connectWs() {
  const ws = new WebSocket("ws://localhost:3000/ws");

  ws.onopen = () => {
    console.log("Connected");
  };

  ws.onmessage = (event) => {
    // Every server message is a JSON object with a "type" field
    const msg = JSON.parse(event.data);
    console.log(msg.type, msg);
  };

  ws.onclose = () => {
    // Reconnect after a 3-second delay by opening a fresh socket
    setTimeout(connectWs, 3000);
  };
}

connectWs();

The connection is one-way: the server only sends, and any messages from the client are ignored.

WebSocket — Message Types

All messages are JSON-encoded with a type field.

new_article

Sent when a new article has been scraped and stored.

{
  "type": "new_article",
  "article": {
    "id": 1,
    "newsletter": "https://example.substack.com",
    "slug": "my-first-post",
    "title": "My First Post",
    "subtitle": "A subtitle",
    "canonical_url": "https://example.substack.com/p/my-first-post",
    "post_date": "2025-01-15T12:00:00.000Z",
    "content_html": "<p>...</p>",
    "content_markdown": "# My First Post\n...",
    "scraped_at": "2025-01-15T13:00:00.000Z",
    "notified": 1
  }
}

scrape_started

Sent when the scraper begins checking a newsletter for new posts.

{
  "type": "scrape_started",
  "newsletter": "https://example.substack.com"
}

scrape_complete

Sent when the scraper finishes checking a newsletter.

{
  "type": "scrape_complete",
  "newsletter": "https://example.substack.com",
  "newCount": 3
}
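Since every message carries a type field, a client can route messages through a lookup table of handlers. The sketch below ignores malformed JSON and unknown types, which keeps older clients working if new message types appear later (a forward-compatibility assumption, not a documented guarantee). dispatchMessage and the example handlers are hypothetical; only the three type names and their fields come from this page.

```javascript
// Route a raw WebSocket message to the handler registered for its "type".
// Returns true if a handler ran, false for bad JSON or unknown types.
function dispatchMessage(raw, handlers) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return false;
  }
  if (msg === null || typeof msg !== "object") return false;
  // Own-property check so types like "toString" never hit Object.prototype
  if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) return false;
  handlers[msg.type](msg);
  return true;
}

// Example handlers for the three documented message types.
const handlers = {
  new_article: (msg) => console.log(`New: ${msg.article.title} (${msg.article.canonical_url})`),
  scrape_started: (msg) => console.log(`Checking ${msg.newsletter}...`),
  scrape_complete: (msg) => console.log(`${msg.newsletter}: ${msg.newCount} new`),
};

// Wire-up inside the connection code: ws.onmessage = (event) => dispatchMessage(event.data, handlers);
```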