The GitHub connector synchronizes files from a GitHub repository branch into your knowledge base. It uses the Git tree API for efficient change detection without downloading file contents during listing.
Authentication
API Key — Personal access token (classic or fine-grained). Requires:
- Repository Contents: Read permission
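As a rough illustration, the sketch below shows how a token could be attached to a recursive Git tree request. The endpoint and header names follow GitHub's public REST API. The helper `buildTreeRequest` and the `TreeRequest` type are hypothetical names introduced here and are not part of the connector.

```typescript
// Hypothetical helper (not the connector's code): builds, but does not send,
// a request for the recursive Git tree of a branch.
interface TreeRequest {
  url: string;
  headers: Record<string, string>;
}

function buildTreeRequest(repository: string, branch: string, token: string): TreeRequest {
  // The Repository field is expected in "owner/repo" form.
  const [owner, repo] = repository.split("/");
  if (!owner || !repo) {
    throw new Error(`Expected "owner/repo", got "${repository}"`);
  }
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/git/trees/${encodeURIComponent(branch)}?recursive=1`,
    headers: {
      // Works for both classic and fine-grained personal access tokens.
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
    },
  };
}

// Usage (network call left commented out):
// const req = buildTreeRequest("facebook/react", "main", process.env.GITHUB_TOKEN ?? "");
// const response = await fetch(req.url, { headers: req.headers });
```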
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| Repository | Input | Yes | Owner/repo (e.g., facebook/react) |
| Branch | Input | No | Branch name (default: main) |
| Path Filter | Input | No | Only sync files in this path (e.g., docs/) |
| File Extensions | Input | No | Comma-separated extensions (e.g., .md,.txt,.mdx) |
| Max Files | Input | No | Limit the number of synced files |
How It Works
- Tree Fetch — Fetches the full recursive Git tree for the branch in one API call
- Filtering — Filters by path prefix and file extensions
- Stub Creation — Creates lightweight stubs using Git blob SHA as content hash
- Content Deferred — File contents are fetched only for new/changed files via getDocument
- Pagination — Offset-based batching (30 files per page) over the filtered tree
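The filtering, stub creation, and pagination steps above can be sketched as follows. This is a simplified illustration, not the connector's implementation: `TreeEntry` mirrors a subset of the fields in GitHub's tree response, while `Stub`, `filterTree`, `toStub`, and `listPage` are hypothetical names. Treating Max Files as a cap on the filtered list is also an assumption.

```typescript
// Subset of the fields GitHub returns for each entry in a recursive tree.
interface TreeEntry {
  path: string;
  type: "blob" | "tree" | "commit";
  sha: string;
  size?: number;
}

// Hypothetical stub shape; the connector's real type may differ.
interface Stub {
  path: string;
  contentHash: string;
  size: number;
}

const PAGE_SIZE = 30;

// Keep only files (blobs) under the path prefix with an allowed extension.
function filterTree(
  entries: TreeEntry[],
  pathFilter: string,
  extensions: string[],
  maxFiles?: number,
): TreeEntry[] {
  const filtered = entries.filter(
    (e) =>
      e.type === "blob" &&
      e.path.startsWith(pathFilter) &&
      (extensions.length === 0 || extensions.some((ext) => e.path.endsWith(ext))),
  );
  return maxFiles === undefined ? filtered : filtered.slice(0, maxFiles);
}

// A stub carries the blob SHA as its content hash; no file content is fetched.
function toStub(entry: TreeEntry): Stub {
  return { path: entry.path, contentHash: `git-sha:${entry.sha}`, size: entry.size ?? 0 };
}

// Offset-based paging over the already-filtered tree.
function listPage(
  filtered: TreeEntry[],
  offset: number,
): { stubs: Stub[]; nextOffset: number | null } {
  const stubs = filtered.slice(offset, offset + PAGE_SIZE).map(toStub);
  const next = offset + PAGE_SIZE;
  return { stubs, nextOffset: next < filtered.length ? next : null };
}
```

In this sketch a `nextOffset` of `null` signals the last page; content for any stub whose hash differs from the stored one would then be fetched separately.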
Content Hash Strategy
The content hash is the Git blob SHA prefixed with git-sha:. Because the SHA is already present in the tree listing, changes can be detected without downloading file contents, which is critical for large repositories.
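For reference, a Git blob SHA (in SHA-1 repositories, which is what GitHub serves) is the SHA-1 of the header `blob <byte length>\0` followed by the file bytes. The sketch below reproduces that calculation and shows how a stored hash might be compared against a tree entry. The function names `gitBlobSha` and `hasChanged` are hypothetical, and the connector itself reads the SHA from the tree response rather than computing it.

```typescript
import { createHash } from "node:crypto";

// Computes the Git blob object ID for raw file bytes (SHA-1 object format).
function gitBlobSha(content: Buffer): string {
  const header = Buffer.from(`blob ${content.length}\0`, "utf8");
  return createHash("sha1").update(header).update(content).digest("hex");
}

// Hypothetical comparison: a file needs re-fetching when the stored hash is
// missing or differs from the prefixed SHA reported by the tree listing.
function hasChanged(storedHash: string | undefined, treeSha: string): boolean {
  return storedHash !== `git-sha:${treeSha}`;
}

// The empty blob has a well-known ID in every SHA-1 Git repository.
console.log(gitBlobSha(Buffer.from(""))); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Any edit to a file changes its blob SHA, so comparing two short strings is enough to decide whether the content needs to be downloaded again.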
For repositories with 50K+ files, the entire filtered tree is cached in syncContext to avoid repeated API calls.
Metadata Tags
| Tag | Type | Description |
|---|---|---|
| File Path | Text | Full repository path |
| Repository | Text | Owner/repo name |
| Branch | Text | Git branch |
| File Size | Number | File size in bytes |
| Last Modified | Date | Last commit date from HTTP header |
Limitations
- Files larger than GitHub's content API limit are skipped
- Binary files are not supported
- Rate limiting: 5000 requests/hour for authenticated users
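One way a client could respect the rate limit is to read GitHub's `x-ratelimit-remaining` and `x-ratelimit-reset` response headers and pause until the reset time once the quota is exhausted. The sketch below is an assumption about how this might be handled, not a description of the connector's behavior; `HeaderSource` and `msUntilReset` are hypothetical names.

```typescript
// Minimal header accessor; the standard fetch Headers object satisfies it.
interface HeaderSource {
  get(name: string): string | null | undefined;
}

// Returns how long to wait (in milliseconds) before the next request.
// x-ratelimit-reset is expressed in UTC epoch seconds.
function msUntilReset(headers: HeaderSource, nowMs: number = Date.now()): number {
  const remaining = Number(headers.get("x-ratelimit-remaining") ?? "1");
  const resetSec = Number(headers.get("x-ratelimit-reset") ?? "0");
  if (remaining > 0) {
    return 0; // Quota still available, no need to wait.
  }
  return Math.max(0, resetSec * 1000 - nowMs);
}
```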

