• handfuloflight 2 hours ago

    You talk about context savings but quote the numbers in kilobytes. Can you confirm the token savings data?

    And when you say it only returns summaries, does this mean there are LLM calls happening in the sandbox?

    • mksglu an hour ago

      For your second question: No LLM calls. Context Mode uses algorithmic processing — FTS5 indexing with BM25 ranking and Porter stemming. Raw output gets chunked and indexed in a SQLite database inside the sandbox, and only the relevant snippets matching your intent are returned to context. It's purely deterministic text processing, no model inference involved.
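
      To make the indexing step concrete, here is a minimal sketch of FTS5 with Porter stemming and BM25 ranking, using Python's built-in sqlite3 module. It is not Context Mode's actual code: the table name, column names, and sample chunks are hypothetical, and it assumes your SQLite build has FTS5 enabled.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(title, body, tokenize='porter')")

# Made-up chunks standing in for raw tool output.
raw_chunks = [
    ("build log", "Compiling 214 modules. All modules compiled without warnings."),
    ("test log", "3 tests failed: connection timed out while connecting to the database."),
    ("deploy log", "Uploaded artifacts to the staging bucket."),
]
conn.executemany("INSERT INTO chunks (title, body) VALUES (?, ?)", raw_chunks)


def search(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Return (title, snippet) pairs, best BM25 match first."""
    sql = """
        SELECT title, snippet(chunks, 1, '[', ']', '...', 12)
        FROM chunks
        WHERE chunks MATCH ?
        ORDER BY bm25(chunks)  -- lower bm25() values mean better matches
        LIMIT ?
    """
    return conn.execute(sql, (query, limit)).fetchall()


# Porter stemming lets "failing connections" match "failed" and "connection".
results = search("failing connections")
for title, snippet in results:
    print(title, "->", snippet)
```

      Only the matching chunk comes back. The other rows stay in the index and can still be reached by a later query.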

      • handfuloflight an hour ago

        Excellent, thank you for your responses. Will be putting it through a test drive.

        • mksglu 35 minutes ago

          Sure, thank you for your comment!

      • mksglu 2 hours ago

        Hey! Thank you for your comment! There are test examples in the README. Could you please try them? Your feedback is valuable.

      • rcarmo 17 minutes ago

        Nice trick. I’m going to see how I can apply it to tool calls in pi.dev as well.

        • mksglu 10 minutes ago

          That means a lot, thank you! Would love to hear your feedback once you try it, and an upvote would be much appreciated if you find it useful.

        • vicchenai an hour ago

          The BM25+FTS5 approach without LLM calls is the right call - deterministic, no added latency, no extra token spend on compression itself.

          The tradeoff I want to understand better: how does it handle cases where the relevant signal is in the "low-ranked" 310 KB, but you just haven't formed the query that would surface it yet? The compression is necessarily lossy - is there a raw mode fallback for when the summarized context produces unexpected downstream results?

          Also curious about the token count methodology - are you measuring Claude's tokenizer specifically, or a proxy?

          • mksglu an hour ago

            Great questions.

            --

            On lossy compression and the "unsurfaced signal" problem:

            Nothing is thrown away. The full output is indexed into a persistent SQLite FTS5 store — the 310 KB stays in the knowledge base, only the search results enter context. If the first query misses something, you (or the model) can call search(queries: ["different angle", "another term"]) as many times as needed against the same indexed data. The vocabulary of distinctive terms is returned with every intent-search result specifically to help form better follow-up queries.

            The fallback chain: if intent-scoped search returns nothing, it splits the intent into individual words and ranks by match count. If that still misses, batch_execute has a three-tier fallback — source-scoped search → boosted search with section titles → global search across all indexed content.
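
            As a rough illustration of the first fallback step (split the intent into words, rank by match count), here is a minimal sketch. The function name is hypothetical, and plain substring matching is an assumption made for brevity, so the real matching logic may differ.

```python
def rank_by_word_matches(intent: str, chunks: list[str], limit: int = 3) -> list[str]:
    """Fallback ranking: count how many distinct intent words each chunk contains."""
    words = set(intent.lower().split())
    scored = []
    for chunk in chunks:
        text = chunk.lower()
        # Assumption: simple substring test, no stemming or tokenization.
        hits = sum(1 for word in words if word in text)
        if hits:
            scored.append((hits, chunk))
    # Highest match count first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:limit]]


sample_chunks = [
    "disk usage is at 91 percent",
    "memory usage normal",
    "network latency spike on eth0",
]
ranked = rank_by_word_matches("disk usage alert", sample_chunks)
print(ranked)
```

            Chunks that match none of the words are dropped, so an empty list here is what would trigger the next tier.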

            There's no explicit "raw mode" toggle, but if you omit the intent parameter, execute returns the full stdout directly (smart-truncated at 60% head / 40% tail if it exceeds the buffer). So the escape hatch is: don't pass intent, get raw output.
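
            The 60% head / 40% tail truncation can be sketched like this. It assumes a character budget rather than a byte buffer, and the omission marker text is made up for the example.

```python
def truncate_head_tail(text: str, limit: int, head_percent: int = 60) -> str:
    """Keep the first 60% and last 40% of the budget when text exceeds limit characters."""
    if len(text) <= limit:
        return text  # fits in the budget, return unchanged
    head_len = limit * head_percent // 100
    tail_len = limit - head_len
    omitted = len(text) - limit
    # Hypothetical marker format; the real tool may use different wording.
    marker = f"\n... [{omitted} characters omitted] ...\n"
    return text[:head_len] + marker + text[len(text) - tail_len:]


long_output = "a" * 100 + "b" * 100
truncated = truncate_head_tail(long_output, 50)
print(truncated)
```

            Keeping both ends covers the usual case where the start of the output has the command context and the end has the error or exit status.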

            On token counting:

            It's a bytes/4 estimate using Buffer.byteLength() (UTF-8), not an actual tokenizer. Marked as "estimated (~)" in stats output. It's a rough proxy — Claude's tokenizer would give slightly different numbers — but directionally accurate for measuring relative savings. The percentage reduction (e.g., "98%") is measured in bytes, not tokens, comparing raw output size vs. what actually enters the conversation context.
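
            For reference, a Python analogue of the bytes/4 estimate and the byte-based percentage. The function names are hypothetical, rounding up is my own choice, and the sizes in the usage lines are made-up illustrations rather than measured results.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: UTF-8 byte length divided by 4, rounded up."""
    byte_length = len(text.encode("utf-8"))  # analogue of Buffer.byteLength(text, "utf8")
    return -(-byte_length // 4)  # ceiling division


def percent_reduction(raw: str, returned: str) -> float:
    """Byte-based reduction between raw output and what enters the context."""
    raw_bytes = len(raw.encode("utf-8"))
    returned_bytes = len(returned.encode("utf-8"))
    if raw_bytes == 0:
        return 0.0
    return 100.0 * (raw_bytes - returned_bytes) / raw_bytes


# Illustrative sizes only.
raw_output = "x" * 1000
context_snippet = "x" * 20
print(estimate_tokens(raw_output), percent_reduction(raw_output, context_snippet))
```

            Because the percentage is a ratio of byte counts, it does not depend on the divide-by-4 factor at all. Only the absolute token figures do.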

          • sim04ful 2 hours ago

            Looks pretty interesting. How could I use this with other MCP clients, e.g. OpenCode?

            • mksglu 2 hours ago

              Hey! Thank you for your comment! Since this is an MCP server, it should work with other MCP clients, but I haven't tested it yet. I'll look into it as soon as possible. Your feedback is valuable.

              • nightmunnas an hour ago

                nice, I'd love to see it for codex and opencode

                • mksglu an hour ago

                  Thanks! Context Mode is a standard MCP server, so it should work with any client that supports MCP, including Codex and opencode.

                  Codex CLI:

                    codex mcp add context-mode -- npx -y context-mode
                  
                  Or in ~/.codex/config.toml:

                    [mcp_servers.context-mode]
                    command = "npx"
                    args = ["-y", "context-mode"]
                  
                  opencode:

                  In opencode.json:

                    {
                      "mcp": {
                        "context-mode": {
                          "type": "local",
                          "command": ["npx", "-y", "context-mode"],
                          "enabled": true
                        }
                      }
                    }
                  
                  We haven't tested these yet. Would love to hear if anyone tries them!