Contact

  • Sydney
    NSW, Australia
    hello@korijacobsen.au

Case Study - Gemini Competition Entry

A Chrome sidebar extension built for a Google Gemini competition, enabling page-aware chat, selective element querying, and voice-driven interaction directly within the browser.

Date
Roles: Developer
Tech: Chrome Extension API, Gemini API, TypeScript, IndexedDB, Web Speech API

Core Capabilities

Page-aware conversational context: When the sidebar was opened on a page, a content script parsed the page's structure and text content and injected them into the conversation context. This let users ask questions like 'Summarise this page', 'What should I do next?', 'Explain this section in simpler terms', or 'Translate this content'.

The competition explicitly encouraged leveraging Gemini's large context window, and this design focused on making the current page itself the primary source of context, rather than relying on manual copy/paste.
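
As a rough illustration of the page-to-context step, the sketch below turns an extracted page snapshot into a prompt. It is not the extension's actual code: the `PageSnapshot` shape, the function names, and the `maxChars` cap are all assumptions made for the example.

```typescript
// Hypothetical shape for what a content script might extract from a page.
interface PageSnapshot {
  title: string;
  url: string;
  headings: string[];
  text: string;
}

// Collapse whitespace runs so the extracted text wastes fewer tokens.
function normaliseText(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}

// Build the context block that precedes the user's question.
// maxChars is an assumed safety cap, not a documented Gemini limit.
function buildPageContext(page: PageSnapshot, maxChars = 200000): string {
  const body = normaliseText(page.text).slice(0, maxChars);
  const outline = page.headings
    .map((heading) => `- ${normaliseText(heading)}`)
    .join("\n");
  return [
    `Page title: ${normaliseText(page.title)}`,
    `URL: ${page.url}`,
    outline ? `Outline:\n${outline}` : "",
    `Content:\n${body}`,
  ]
    .filter((part) => part !== "")
    .join("\n\n");
}

// Combine the page context with the question typed into the sidebar.
function buildPrompt(page: PageSnapshot, question: string): string {
  return `${buildPageContext(page)}\n\nUser question: ${question.trim()}`;
}
```

In a content script, the snapshot could plausibly be filled from `document.title`, `location.href`, and `document.body.innerText`. Keeping the prompt builder free of DOM access makes it easy to test outside the browser.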

Selective Interaction Mode

Beyond whole-page context, users could enter a selective mode via the sidebar action bar.

In this mode, users could highlight specific elements on the page, select text or images, and pass them to Gemini. Responses were scoped tightly to the selected element.

This made it possible to make targeted requests, such as clarifying a single paragraph in a dense article, describing an image, or explaining a specific form field or UI element.

Selective mode improved response accuracy by reducing irrelevant context, and it enabled more assistive use cases, such as describing page elements for low-vision users.
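
One way to picture the scoping is as a small function that turns the selection into request parts. This is a sketch under assumptions: the `Selection` and `PromptPart` types and the instruction wording are hypothetical, and the `inlineData` field names are only loosely modelled on multimodal request formats, so they should be checked against the Gemini API reference before use.

```typescript
// Hypothetical description of what the user picked in selective mode.
type Selection =
  | { kind: "text"; text: string }
  | { kind: "image"; alt: string; mimeType: string; base64Data: string };

// Illustrative request parts: plain text, or inline binary data for images.
type PromptPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Assumed instruction wording that keeps the answer scoped to the selection.
const SCOPE_INSTRUCTION =
  "Answer using only the selected element below. Say so if it does not contain the answer.";

function buildScopedParts(selection: Selection, question: string): PromptPart[] {
  const ask = `Question: ${question.trim()}`;
  if (selection.kind === "text") {
    // Text selections travel as a single text part.
    return [
      { text: `${SCOPE_INSTRUCTION}\n\nSelected text:\n${selection.text.trim()}\n\n${ask}` },
    ];
  }
  // Image selections send the instruction first, then the image bytes.
  const altNote = selection.alt ? `Alt text: ${selection.alt}\n` : "";
  return [
    { text: `${SCOPE_INSTRUCTION}\n\n${altNote}${ask}` },
    { inlineData: { mimeType: selection.mimeType, data: selection.base64Data } },
  ];
}
```

Because only the selected element is sent, the rest of the page never enters the request, which is the "reduced irrelevant context" effect described above.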

Voice Output with Synchronized Highlighting

All Gemini responses could be read aloud using text-to-speech.

Key features:

  • Audio blobs generated per response and stored in IndexedDB
  • Word-by-word transcript highlighting during playback
  • Persistent audio tied to conversation history

This was intentionally designed to feel closer to an assistive reading experience than to a generic 'read aloud' button.
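
The write-up does not say how word timings were obtained, so the sketch below makes an explicit assumption: with no per-word timestamps available, the audio duration is spread across words in proportion to their length. The `WordTiming` type and both function names are hypothetical.

```typescript
interface WordTiming {
  word: string;
  startMs: number; // inclusive
  endMs: number;   // exclusive
}

// Assumption: no per-word timestamps exist, so each word gets a share of the
// total duration proportional to its character count.
function estimateWordTimings(transcript: string, durationMs: number): WordTiming[] {
  const words = transcript.split(/\s+/).filter((w) => w.length > 0);
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  const timings: WordTiming[] = [];
  let cursor = 0;
  for (const word of words) {
    const span = (word.length / totalChars) * durationMs;
    timings.push({ word, startMs: cursor, endMs: cursor + span });
    cursor += span;
  }
  return timings;
}

// Binary search for the word whose interval contains currentMs.
// Returns -1 before the first word or after the last one.
function activeWordIndex(timings: WordTiming[], currentMs: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const timing = timings[mid];
    if (currentMs < timing.startMs) {
      hi = mid - 1;
    } else if (currentMs >= timing.endMs) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

In the sidebar, `activeWordIndex` could be called from the audio element's `timeupdate` event with `audio.currentTime * 1000`, toggling a highlight class on the matching word. A speech service that returns real word timestamps would replace the estimate and leave the lookup unchanged.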

Voice Input for Contextual Queries

Users could also ask questions by voice: speech was converted to text, the query was combined with the current page or selected element context, Gemini generated a response scoped to that context, and the response could then be read aloud using TTS.

This created a fully voice-driven loop for interacting with web content, especially useful for accessibility-oriented scenarios.
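
The loop described above can be sketched as a single function with its dependencies injected. Everything here is illustrative: `VoiceLoopDeps`, `runVoiceTurn`, and the prompt layout are hypothetical names and choices, and the real speech-to-text, Gemini, and TTS calls are left behind plain function types.

```typescript
// Each stage is injected so the loop itself stays independent of any
// particular speech or LLM service.
interface VoiceLoopDeps<TAudio> {
  transcribe: (audio: TAudio) => Promise<string>; // speech-to-text
  getContext: () => string;                       // current page or selected element
  generate: (prompt: string) => Promise<string>;  // LLM request
  speak: (text: string) => Promise<void>;         // text-to-speech playback
}

interface VoiceTurn {
  question: string;
  answer: string;
}

// Run one voice turn: transcribe, attach context, generate, then read aloud.
// Returns null when nothing intelligible was transcribed.
async function runVoiceTurn<TAudio>(
  audio: TAudio,
  deps: VoiceLoopDeps<TAudio>
): Promise<VoiceTurn | null> {
  const question = (await deps.transcribe(audio)).trim();
  if (question === "") {
    return null;
  }
  const prompt = `${deps.getContext()}\n\nUser question: ${question}`;
  const answer = await deps.generate(prompt);
  await deps.speak(answer);
  return { question, answer };
}
```

Injecting the four stages keeps the ordering logic testable with fakes and lets the context source switch between whole-page and selected-element modes without touching the loop.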

Technical Implementation

  • Custom sidebar UI with chat history and action bar
  • Conversation memory and history persisted in IndexedDB
  • Markdown-rendered Gemini outputs
  • User-supplied API keys (Gemini for LLM, Whisper for speech-to-text)
  • Content scripts for DOM extraction and element selection
  • Background/service worker coordination for request handling
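
To make the request-handling and persistence pieces more concrete, here is a minimal sketch of a background message handler. The message types, the `HistoryStore` interface, and the in-memory store are assumptions for illustration; a real build would back the store with IndexedDB and call Gemini inside `generate`.

```typescript
// Hypothetical messages the sidebar might send to the background worker.
type SidebarMessage =
  | { type: "ASK"; conversationId: string; prompt: string }
  | { type: "GET_HISTORY"; conversationId: string };

interface ChatEntry {
  role: "user" | "model";
  text: string;
}

// Minimal storage interface; an IndexedDB-backed version would implement it.
interface HistoryStore {
  append(conversationId: string, entry: ChatEntry): Promise<void>;
  list(conversationId: string): Promise<ChatEntry[]>;
}

type Reply = { ok: true; entries: ChatEntry[] } | { ok: false; error: string };

// Route one message: persist the question, call the model, persist the answer,
// then reply with the conversation so far.
async function handleMessage(
  msg: SidebarMessage,
  store: HistoryStore,
  generate: (prompt: string) => Promise<string>
): Promise<Reply> {
  try {
    if (msg.type === "ASK") {
      await store.append(msg.conversationId, { role: "user", text: msg.prompt });
      const answer = await generate(msg.prompt);
      await store.append(msg.conversationId, { role: "model", text: answer });
    }
    return { ok: true, entries: await store.list(msg.conversationId) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// In-memory stand-in used here so the sketch runs without a browser.
function createMemoryStore(): HistoryStore {
  const data: Record<string, ChatEntry[]> = {};
  return {
    append: (id, entry) => {
      (data[id] = data[id] || []).push(entry);
      return Promise.resolve();
    },
    list: (id) => Promise.resolve((data[id] || []).slice()),
  };
}
```

In an extension service worker, a handler like this would typically be registered with `chrome.runtime.onMessage.addListener`, calling `sendResponse` once the promise settles and returning `true` from the listener so the message channel stays open for the asynchronous reply.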

The entire system was built in a limited timeframe (a few weeks of part-time work), prioritizing core interaction flows over polish.

Constraints & Trade-offs

The project deliberately stopped short of becoming a full product due to Chrome extension constraints.

In particular: sensitive API keys cannot safely live in extension client code, supporting first-party authentication would require a separate backend service, and proxying requests through an external service would fundamentally change the architecture.

Rather than overengineering around these limitations, I treated the extension as a prototype and capability exploration, not a production deployment.

Reflection

This project reinforced several practical lessons: browser extensions are powerful but heavily constrained environments, context quality matters more than model complexity, selective scoping dramatically improves AI usefulness, and accessibility-oriented features often emerge naturally from good interaction design.

Google has since released native Gemini functionality in Chrome that overlaps with many of these ideas, validating the direction even if my implementation was intentionally lightweight and time-bound.

More case studies

JobRef

A product for field service businesses to document work transparently using images as the primary source of truth. Designed to make job documentation feel natural rather than like admin.


Multi-Tenant SaaS Template (Remix, RBAC & Billing)

A reusable SaaS foundation with authentication, role-based access control, billing, and audit logging. Built to handle the boring but critical parts of production SaaS correctly from day one.


FactDat

An exploratory system for extracting and verifying factual claims from spoken content. It initially focused on end-to-end 'claim checking'; the new focus is claim atomisation, traceability, and grounded reasoning.
