GitHub Connector

The GitHub connector synchronizes files from a GitHub repository branch into your knowledge base. It uses the Git tree API for efficient change detection without downloading file contents during listing.

Authentication

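API Key: a personal access token (classic or fine-grained). Requires:

Fine-grained token: Contents repository permission with Read-only access
Classic token: `repo` scope for private repositories (public repositories can be read without any scope)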
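
As a rough illustration of how a personal access token is usually presented to the GitHub REST API, the sketch below builds the request headers. The helper name `build_headers` is hypothetical and not part of the connector; the header names and values follow GitHub's published REST API conventions.

```python
def build_headers(token: str) -> dict[str, str]:
    """Build request headers for GitHub REST API calls (hypothetical helper)."""
    if not token or not token.strip():
        raise ValueError("a personal access token is required")
    return {
        # Classic and fine-grained tokens are both sent as a bearer token.
        "Authorization": f"Bearer {token.strip()}",
        # Media type GitHub recommends for REST API responses.
        "Accept": "application/vnd.github+json",
        # Pin the REST API version so response shapes stay stable.
        "X-GitHub-Api-Version": "2022-11-28",
    }
```

The token is stripped of surrounding whitespace because values pasted into a form field often carry a trailing newline or space.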

Configuration

Field	Type	Required	Description
Repository	Input	Yes	Owner/repo (e.g., `facebook/react`)
Branch	Input	No	Branch name (default: `main`)
Path Filter	Input	No	Only sync files in this path (e.g., `docs/`)
File Extensions	Input	No	Comma-separated extensions (e.g., `.md,.txt,.mdx`)
Max Files	Input	No	Limit the number of synced files
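
To make the field semantics concrete, here is a minimal sketch of how the raw form values might be normalized before a sync. The `GitHubConfig` class, the `parse_config` function, and the raw key names (`repository`, `branch`, `pathFilter`, `fileExtensions`, `maxFiles`) are all assumptions made for illustration; the connector's actual field keys are not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GitHubConfig:
    """Hypothetical in-memory form of the configuration fields above."""
    owner: str
    repo: str
    branch: str = "main"
    path_filter: str = ""
    extensions: tuple[str, ...] = ()
    max_files: Optional[int] = None


def parse_config(raw: dict[str, str]) -> GitHubConfig:
    """Normalize raw form input (assumed to arrive as strings)."""
    owner, sep, repo = raw.get("repository", "").strip().partition("/")
    if not sep or not owner or not repo or "/" in repo:
        raise ValueError("Repository must look like 'owner/repo'")

    # Accept 'md', '.md', or ' .MD ' and normalize to lowercase with a leading dot.
    cleaned = (part.strip().lower() for part in raw.get("fileExtensions", "").split(","))
    extensions = tuple(ext if ext.startswith(".") else f".{ext}" for ext in cleaned if ext)

    max_files_raw = raw.get("maxFiles", "").strip()
    max_files = int(max_files_raw) if max_files_raw else None
    if max_files is not None and max_files <= 0:
        raise ValueError("Max Files must be a positive integer")

    return GitHubConfig(
        owner=owner,
        repo=repo,
        # An empty Branch field falls back to the documented default.
        branch=raw.get("branch", "").strip() or "main",
        path_filter=raw.get("pathFilter", "").strip().lstrip("/"),
        extensions=extensions,
        max_files=max_files,
    )
```

For example, `parse_config({"repository": "facebook/react", "fileExtensions": ".md, txt"})` yields a config for branch `main` with extensions `.md` and `.txt` and no file limit.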

How It Works

Tree Fetch — Fetches the full recursive Git tree for the branch in one API call
Filtering — Filters by path prefix and file extensions
Stub Creation — Creates lightweight stubs using Git blob SHA as content hash
Content Deferred — File contents are fetched only for new/changed files via getDocument
Pagination — Offset-based batching (30 files per page) over the filtered tree
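
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the connector's implementation: the function names, the stub field names, and the choice to apply Max Files after filtering are hypothetical. The entry shape (`path`, `type`, `sha`, `size`) mirrors the `tree` array returned by GitHub's `GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1` endpoint.

```python
from typing import Optional

PAGE_SIZE = 30  # files per page, as described above


def filter_tree(entries: list[dict], path_filter: str = "",
                extensions: tuple[str, ...] = (),
                max_files: Optional[int] = None) -> list[dict]:
    """Keep file entries under path_filter that end in one of the extensions."""
    exts = tuple(e.lower() for e in extensions)
    kept = []
    for entry in entries:
        if entry["type"] != "blob":  # skip directories ("tree") and submodules ("commit")
            continue
        path = entry["path"]
        if path_filter and not path.startswith(path_filter):
            continue
        if exts and not path.lower().endswith(exts):
            continue
        kept.append(entry)
    # Assumption: the Max Files limit is applied after filtering.
    return kept if max_files is None else kept[:max_files]


def make_stub(entry: dict) -> dict:
    """Lightweight stub: identity and content hash only, no file content."""
    return {
        "externalId": entry["path"],               # hypothetical field name
        "contentHash": f"git-sha:{entry['sha']}",  # blob SHA from the tree listing
        "size": entry.get("size"),
    }


def page_of(stubs: list[dict], offset: int) -> tuple[list[dict], Optional[int]]:
    """Offset-based batch: returns one page and the next offset (None when done)."""
    page = stubs[offset:offset + PAGE_SIZE]
    next_offset = offset + PAGE_SIZE
    return page, (next_offset if next_offset < len(stubs) else None)
```

Because a stub carries only the path and blob SHA, the whole listing phase needs no per-file requests; content is requested later, and only for stubs whose hash differs from the stored one.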

Content Hash Strategy

The content hash is the file's Git blob SHA prefixed with `git-sha:`. Because the SHA is already part of the tree listing, changes can be detected without downloading file contents, which is critical for large repositories.
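
For context, a Git blob SHA is the SHA-1 of a short header followed by the file bytes, so it changes whenever the content changes. The sketch below shows that computation and a comparison against a stored hash; the helper names (`git_blob_sha`, `content_hash`, `has_changed`) are hypothetical, and the connector itself reads the SHA from the tree listing rather than computing it.

```python
import hashlib
from typing import Optional


def git_blob_sha(content: bytes) -> str:
    """SHA-1 that Git assigns to a blob: header 'blob <size>' + NUL byte + content."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def content_hash(blob_sha: str) -> str:
    """Content hash in the prefixed form described above."""
    return f"git-sha:{blob_sha}"


def has_changed(stored_hash: Optional[str], blob_sha: str) -> bool:
    """True for new files (no stored hash) or files whose blob SHA differs."""
    return stored_hash != content_hash(blob_sha)
```

As a sanity check, `git_blob_sha(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known SHA of Git's empty blob.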

For repositories with 50K+ files, the entire filtered tree is cached in `syncContext`, so later pages are served from the cache instead of repeating the tree API call.
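
A minimal sketch of that caching pattern, assuming `syncContext` behaves like a mutable dictionary that lives for the duration of one sync run. The key name `filteredTree` and the function names are hypothetical.

```python
from typing import Callable


def get_filtered_tree(sync_context: dict,
                      fetch_and_filter: Callable[[], list[dict]]) -> list[dict]:
    """Return the filtered tree, fetching it only on the first call of a sync run."""
    if "filteredTree" not in sync_context:
        # First page: one tree API call plus filtering, then store the result.
        sync_context["filteredTree"] = fetch_and_filter()
    # Later pages reuse the stored list and slice it by offset.
    return sync_context["filteredTree"]
```

With 30 files per page, a filtered tree of 50,000 files spans well over a thousand pages, so refetching the tree on each page would be costly.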

Metadata Tags

Tag	Type	Description
File Path	Text	Full repository path
Repository	Text	Owner/repo name
Branch	Text	Git branch
File Size	Number	File size in bytes
Last Modified	Date	Last commit date from HTTP header
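
The tags could be assembled along these lines. This is a sketch under assumptions: the tag keys and the `build_tags` name are hypothetical, and it assumes the HTTP header in question is `Last-Modified`, which is not stated explicitly above.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def build_tags(repository: str, branch: str, path: str,
               size: Optional[int], last_modified: Optional[str]) -> dict:
    """Assemble metadata tags for one file (hypothetical key names)."""
    tags: dict = {"filePath": path, "repository": repository, "branch": branch}
    if size is not None:
        tags["fileSize"] = size  # bytes
    if last_modified:
        try:
            # HTTP dates look like 'Wed, 21 Oct 2015 07:28:00 GMT'.
            tags["lastModified"] = parsedate_to_datetime(last_modified).isoformat()
        except (TypeError, ValueError):
            pass  # omit the tag rather than store an unparseable date
    return tags
```

Omitting a tag when its source value is missing or malformed keeps downstream filters from matching on placeholder values.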

Limitations

Files larger than GitHub's content API limit are skipped
Binary files are not supported
Rate limiting: 5000 requests/hour for authenticated users
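
One way a client can stay within the rate limit is to read the `x-ratelimit-remaining` and `x-ratelimit-reset` headers that GitHub returns on REST API responses. The sketch below is illustrative only: the function name `seconds_to_wait` is hypothetical, it assumes header keys have already been lowercased, and nothing above states that the connector handles rate limits this way.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to pause before the next request; 0.0 while quota remains."""
    # Assumption: header names were lowercased by the HTTP client.
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    # The reset header is a UTC timestamp in epoch seconds.
    reset_at = float(headers.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it falls back to the system clock.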