API Documentation
REST API and real-time WebSocket interface for Substacks.
Introduction
All REST endpoints return JSON. The API is read-only — the scraper populates data automatically based on the configured newsletters.
Real-time updates (new articles, scrape events) are pushed over a WebSocket connection.
Rate Limiting
All endpoints are rate-limited per IP address.
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per window (default: 100) |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Seconds until the window resets |
| Retry-After | Seconds to wait (only on 429 responses) |
When the limit is exceeded, the server responds with HTTP 429 and this body:
```json
{
  "error": "Too many requests"
}
```
Wait the number of seconds given in the Retry-After header before retrying.
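A client can honor Retry-After automatically. The sketch below is illustrative, not part of the API: `getWithRetry`, `retryDelayMs`, `fetchImpl`, `maxRetries` and the 1-second fallback are hypothetical names and defaults, and it assumes a runtime with a global `fetch` (browsers, Node 18+).

```javascript
// Minimal sketch: retry GET requests that hit the rate limit (HTTP 429).
// fetchImpl, maxRetries and fallbackSeconds are hypothetical options, not API features.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Milliseconds to wait before retrying, read from the Retry-After header (in seconds).
function retryDelayMs(response, fallbackSeconds = 1) {
  const raw = response.headers.get("Retry-After");
  const seconds = raw === null ? NaN : Number(raw);
  // Fall back when the header is missing or not a non-negative number.
  return (Number.isFinite(seconds) && seconds >= 0 ? seconds : fallbackSeconds) * 1000;
}

async function getWithRetry(url, { fetchImpl = fetch, maxRetries = 3 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetchImpl(url);
    // Return anything that is not a 429, or the last 429 once retries are used up.
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await sleep(retryDelayMs(response));
  }
}
```

For example, `await getWithRetry("http://localhost:3000/api/articles?limit=10")` returns the first non-429 response. Capping retries keeps a misbehaving client from looping forever against a server that keeps refusing it.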
List Articles
Retrieve a paginated list of scraped articles, optionally filtered by newsletter.
| Query Param | Type | Description |
|---|---|---|
| limit | integer | Max articles to return. Default: 100 |
| offset | integer | Number of articles to skip. Default: 0 |
| newsletter | string | Filter by newsletter URL. Optional |
Example
GET /api/articles?limit=10&offset=0&newsletter=https://example.substack.com
Response 200
```json
[
  {
    "id": 1,
    "newsletter": "https://example.substack.com",
    "slug": "my-first-post",
    "title": "My First Post",
    "subtitle": "A subtitle",
    "canonical_url": "https://example.substack.com/p/my-first-post",
    "post_date": "2025-01-15T12:00:00.000Z",
    "content_html": "<p>...</p>",
    "content_markdown": "# My First Post\n...",
    "scraped_at": "2025-01-15T13:00:00.000Z",
    "notified": 1
  }
]
```
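The `limit` and `offset` parameters can be combined to page through every stored article. This is a sketch under assumptions: the server is at `http://localhost:3000` (the host used in the WebSocket example), `articlesUrl` and `fetchAllArticles` are hypothetical helpers, and a page shorter than `limit` is treated as the last page, a stopping rule inferred from limit/offset semantics rather than documented.

```javascript
// Hypothetical base URL; point it at wherever the server runs.
const BASE_URL = "http://localhost:3000";

// Build a GET /api/articles URL. URLSearchParams percent-encodes the newsletter URL.
function articlesUrl({ limit = 100, offset = 0, newsletter } = {}, baseUrl = BASE_URL) {
  const url = new URL("/api/articles", baseUrl);
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("offset", String(offset));
  if (newsletter) url.searchParams.set("newsletter", newsletter);
  return url.toString();
}

// Page through all articles (optionally for one newsletter) using limit/offset.
// Assumption: a page shorter than pageSize means no more articles remain.
async function fetchAllArticles({ newsletter, pageSize = 100, fetchImpl = fetch } = {}) {
  const all = [];
  for (let offset = 0; ; offset += pageSize) {
    const res = await fetchImpl(articlesUrl({ limit: pageSize, offset, newsletter }));
    if (!res.ok) throw new Error(`GET /api/articles failed with status ${res.status}`);
    const page = await res.json();
    all.push(...page);
    if (page.length < pageSize) return all;
  }
}
```

Each page is one request against the rate limit, so a larger `pageSize` (up to whatever the server accepts) means fewer requests for the same archive.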
Get Article
Retrieve a single article by its newsletter URL and slug.
| Path Param | Type | Description |
|---|---|---|
| newsletter | string | URL-encoded newsletter URL |
| slug | string | URL-encoded article slug |
Example
GET /api/articles/https%3A%2F%2Fexample.substack.com/my-first-post
Response 200
```json
{
  "id": 1,
  "newsletter": "https://example.substack.com",
  "slug": "my-first-post",
  "title": "My First Post",
  ...
}
```
Response 404 (not found)
```json
{ "error": "Article not found" }
```
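Because the newsletter URL contains `:` and `/`, it must be percent-encoded so it occupies a single path segment; `encodeURIComponent` produces exactly the form shown in the example request. The sketch below is illustrative: `articlePath`, `getArticle`, the `http://localhost:3000` base URL and the choice to map 404 to `null` are assumptions, not part of the API.

```javascript
// Build the path for GET /api/articles/:newsletter/:slug.
// encodeURIComponent turns "https://..." into "https%3A%2F%2F...".
function articlePath(newsletter, slug) {
  return `/api/articles/${encodeURIComponent(newsletter)}/${encodeURIComponent(slug)}`;
}

// Resolve to the parsed article, or null when the server answers 404.
// baseUrl and fetchImpl are hypothetical parameters for illustration.
async function getArticle(newsletter, slug, { baseUrl = "http://localhost:3000", fetchImpl = fetch } = {}) {
  const res = await fetchImpl(baseUrl + articlePath(newsletter, slug));
  if (res.status === 404) return null; // body: { "error": "Article not found" }
  if (!res.ok) throw new Error(`Unexpected status ${res.status}`);
  return res.json();
}
```

Returning `null` for a missing article, and throwing only for unexpected statuses, lets callers treat "not scraped yet" as an ordinary case.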
List Newsletters
Retrieve all configured newsletters.
Response 200
```json
[
  {
    "url": "https://example.substack.com",
    "name": "Example Newsletter",
    "last_checked_at": "2025-01-15T13:00:00.000Z"
  }
]
```
Stats
Retrieve aggregate statistics about scraped content.
Response 200
```json
{
  "totalArticles": 42,
  "totalNewsletters": 3,
  "newsletters": [
    {
      "url": "https://example.substack.com",
      "name": "Example Newsletter",
      "last_checked_at": "2025-01-15T13:00:00.000Z",
      "articleCount": 42
    }
  ]
}
```
Health Check
Simple health check endpoint. Not rate-limited.
Response 200
```json
{
  "status": "ok",
  "uptime": 12345.678
}
```
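Since the health check is not rate-limited, it is a reasonable target for a startup or deployment script that waits for the service to come up. The endpoint's path is not given in this section, so the sketch takes it as a parameter; `waitForHealthy`, its attempt count and interval are hypothetical.

```javascript
// Poll a health URL until it reports { "status": "ok" } or attempts run out.
// healthUrl is supplied by the caller; attempts and intervalMs are illustrative defaults.
async function waitForHealthy(healthUrl, { attempts = 10, intervalMs = 1000, fetchImpl = fetch } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchImpl(healthUrl);
      if (res.ok) {
        const body = await res.json();
        if (body.status === "ok") return body; // also carries "uptime"
      }
    } catch {
      // Server not reachable yet (e.g. still starting); try again after the interval.
    }
    if (i < attempts - 1) await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Service not healthy after ${attempts} attempts`);
}
```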
WebSocket — Connecting
Connect via a standard WebSocket upgrade request. Use wss:// if running behind TLS.
Example (JavaScript)
```javascript
function connectWs() {
  const ws = new WebSocket("ws://localhost:3000/ws");

  ws.onopen = () => {
    console.log("Connected");
  };

  ws.onmessage = (event) => {
    // Every message is a JSON object with a "type" field
    const msg = JSON.parse(event.data);
    console.log(msg.type, msg);
  };

  ws.onclose = () => {
    // Reconnect after a 3-second delay
    setTimeout(connectWs, 3000);
  };
}

connectWs();
```
The server is send-only — client-to-server messages are ignored.
WebSocket — Message Types
All messages are JSON-encoded with a type field.
new_article
Sent when a new article has been scraped and stored.
```json
{
  "type": "new_article",
  "article": {
    "id": 1,
    "newsletter": "https://example.substack.com",
    "slug": "my-first-post",
    "title": "My First Post",
    "subtitle": "A subtitle",
    "canonical_url": "https://example.substack.com/p/my-first-post",
    "post_date": "2025-01-15T12:00:00.000Z",
    "content_html": "<p>...</p>",
    "content_markdown": "# My First Post\n...",
    "scraped_at": "2025-01-15T13:00:00.000Z",
    "notified": 1
  }
}
```
scrape_started
Sent when the scraper begins checking a newsletter for new posts.
```json
{
  "type": "scrape_started",
  "newsletter": "https://example.substack.com"
}
```
scrape_complete
Sent when the scraper finishes checking a newsletter.
```json
{
  "type": "scrape_complete",
  "newsletter": "https://example.substack.com",
  "newCount": 3
}
```
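Since every message carries a `type` field, a client can route messages with a single switch. The sketch below is one possible shape, not a client library: `dispatchMessage` and the `onNewArticle` / `onScrapeStarted` / `onScrapeComplete` handler names are hypothetical, and unknown or malformed messages are ignored so the client keeps working if new types are added.

```javascript
// Route one raw WebSocket payload to the matching handler by its "type" field.
// Returns true if the message type was recognized, false otherwise.
function dispatchMessage(rawData, handlers) {
  let msg;
  try {
    msg = JSON.parse(rawData);
  } catch {
    return false; // not valid JSON: ignore
  }
  if (!msg || typeof msg !== "object") return false;

  switch (msg.type) {
    case "new_article":
      handlers.onNewArticle?.(msg.article);
      return true;
    case "scrape_started":
      handlers.onScrapeStarted?.(msg.newsletter);
      return true;
    case "scrape_complete":
      handlers.onScrapeComplete?.(msg.newsletter, msg.newCount);
      return true;
    default:
      return false; // unknown type: ignore for forward compatibility
  }
}
```

In a browser this plugs into the connection example as `ws.onmessage = (event) => dispatchMessage(event.data, { onNewArticle: (a) => console.log("New:", a.title) });`.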