How I Built an Embeddable Chat Widget Script (Like Intercom)
Why build a script-based widget?
I wanted a chat widget I can embed on any page with one <script> tag—similar to Intercom / Tawk / Crisp:
- The host page shouldn’t need React.
- The widget UI should be isolated (styles + layout) and easy to deploy.
- Configuration should be done with data-* attributes (no build step).
- It should work well on both desktop and mobile.
In this portfolio, the script is loaded on the home page via Next.js next/script:
```tsx
<Script
  id="chat-widget"
  src="https://vuthanhthien.com/widget.js"
  data-uniqueKey="uclone"
  data-icon=""
  data-showAnimation="true"
  data-color="#000000"
  data-icon-color="#ffffff"
  data-welcomeMessage="Hi! How can we help you?"
/>
```
High-level architecture
The key idea: the embed script only manipulates DOM and renders the actual chat UI inside an iframe that points to a dedicated route in my Next.js app (/chat-widget).
Rendering diagram…
This split keeps the embed script lightweight and makes the UI a normal web app page that I can iterate on with React + Tailwind.
The embed script: public/widget.js
The script is served as a static asset at public/widget.js.
Configuration via data-* attributes
Inside the script, I use document.currentScript to read attributes from the same <script> tag:
- data-uniqueKey: an optional identifier you can use to map the widget to a tenant/site (included in my embed snippet, but not used by the current widget.js yet)
- data-color: button background color
- data-icon-color: icon color
- data-icon: icon variant
- data-showAnimation: toggle an animated icon
- data-welcomeMessage: the little “bubble” text above the button
- data-showMessageBubble: force show/hide bubbles (overrides local defaults)
This is the simplest configuration surface for consumers: no global variables, no bundler, no import.
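To make that concrete, here is a minimal sketch of the attribute reading; the attribute names match the embed snippet above, while the defaults are illustrative and the real widget.js may differ:

```ts
// Minimal sketch of reading configuration from the embedding <script> tag.
// Note: document.currentScript is only set while the script is first executing.
const scriptEl = document.currentScript as HTMLScriptElement | null;

const config = {
  uniqueKey: scriptEl?.getAttribute("data-uniqueKey") ?? "",
  color: scriptEl?.getAttribute("data-color") ?? "#000000",
  iconColor: scriptEl?.getAttribute("data-icon-color") ?? "#ffffff",
  icon: scriptEl?.getAttribute("data-icon") ?? "",
  showAnimation: scriptEl?.getAttribute("data-showAnimation") === "true",
  welcomeMessage: scriptEl?.getAttribute("data-welcomeMessage") ?? "",
  showMessageBubble: scriptEl?.getAttribute("data-showMessageBubble"),
};
```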
Injecting UI with DOM APIs
widget.js creates:
- A fixed-position container for the iframe (initially scaled down and non-interactive).
- A fixed-position circular toggle button (the floating “chat” button).
- Optional “message bubbles” that appear above the button and can be dismissed.
On desktop, the widget opens as an overlay panel above the button. On close, it scales down to nothing and restores background scrolling.
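A stripped-down sketch of that injection, with illustrative styles and names (the real script also handles bubbles, icon animation, and the mobile path):

```ts
// Sketch: a fixed iframe container (closed = scaled to zero) plus a floating
// toggle button. Sizes, z-indexes and the lazy iframe.src are illustrative.
const container = document.createElement("div");
container.style.cssText =
  "position:fixed;bottom:90px;right:20px;width:380px;height:560px;" +
  "transform:scale(0);transform-origin:bottom right;transition:transform .2s ease;" +
  "pointer-events:none;z-index:2147483000";

const iframe = document.createElement("iframe");
iframe.style.cssText = "width:100%;height:100%;border:0;border-radius:16px";
container.appendChild(iframe);

const button = document.createElement("button");
button.style.cssText =
  "position:fixed;bottom:20px;right:20px;width:56px;height:56px;" +
  "border-radius:50%;border:0;cursor:pointer;background:#000;z-index:2147483001";

let isOpen = false;
button.addEventListener("click", () => {
  // Don't load the iframe until the first open (keeps the host page fast).
  if (!iframe.src) iframe.src = "https://vuthanhthien.com/chat-widget";

  isOpen = !isOpen;
  container.style.transform = isOpen ? "scale(1)" : "scale(0)";
  container.style.pointerEvents = isOpen ? "auto" : "none";
  document.body.style.overflow = isOpen ? "hidden" : ""; // restore scrolling on close
});

document.body.append(container, button);
```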
Mobile behavior: go full-screen
On mobile devices I avoid a cramped overlay. The script redirects the user to:
/chat-widget?isMobile=true
That route renders the same React UI, but it can use a different layout (full-screen) and handle viewport height properly.
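The decision itself can be a simple width check before opening; the breakpoint below is an assumption, not the script's actual test:

```ts
// Sketch: on small screens, navigate to the full-screen route instead of
// opening the overlay. The 640px breakpoint is illustrative.
function openChat() {
  if (window.matchMedia("(max-width: 640px)").matches) {
    window.location.href = "https://vuthanhthien.com/chat-widget?isMobile=true";
    return;
  }
  // ...desktop path: scale up the iframe container as shown above
}
```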
The iframe app: /chat-widget route
The embedded UI is a regular Next.js route:
- src/app/chat-widget/page.tsx renders ChatWidgetClient
- src/components/chat-widget/index.tsx portals the widget to document.body
- src/components/chat-widget/Portal.tsx is the UI itself
Why a portal?
Even though it’s inside an iframe, I still use createPortal(..., document.body) to ensure the widget is not constrained by any parent wrappers and stays fixed to the viewport reliably.
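A minimal sketch of that pattern (the real Portal.tsx may accept more props, but the idea is the same):

```tsx
"use client";

// Sketch of a Portal component: render children into document.body after mount,
// so SSR never touches the DOM and the widget escapes any parent wrappers.
import { useEffect, useState, type ReactNode } from "react";
import { createPortal } from "react-dom";

export default function Portal({ children }: { children: ReactNode }) {
  const [mounted, setMounted] = useState(false);

  useEffect(() => {
    setMounted(true);
  }, []);

  return mounted ? createPortal(children, document.body) : null;
}
```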
creatorUID: a lightweight “anonymous identity”
To support conversation continuity without auth, the widget stores a random UUID:
localStorage["creatorUID"] = crypto.randomUUID()
Then every chat request includes creatorUID in the body, and on mount the widget loads history from:
GET /api/chat/conversations?creatorUID=...
This gives a nice “it remembers me” experience while keeping implementation simple.
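In sketch form (the function names are mine, and the response shape of /api/chat/conversations is assumed):

```ts
// Sketch: reuse an existing creatorUID or mint one, then load prior messages.
function getCreatorUID(): string {
  const existing = localStorage.getItem("creatorUID");
  if (existing) return existing;

  const uid = crypto.randomUUID();
  localStorage.setItem("creatorUID", uid);
  return uid;
}

async function loadHistory(creatorUID: string) {
  const res = await fetch(
    `/api/chat/conversations?creatorUID=${encodeURIComponent(creatorUID)}`
  );
  return res.json(); // passed to setMessages(...) on mount
}
```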
Communication and lifecycle
The embed script toggles the widget and also calls:
iframe.contentWindow.postMessage({ isWidgetOpen }, WIDGET_HOST)
This is the hook point for more advanced behavior (e.g. focusing input on open, syncing theme, resizing, etc.). Even if you don’t do much with postMessage today, designing for it early makes the widget easier to evolve later.
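Both sides of that channel fit in a few lines; the open-handler below (focusing an input) is only an example of what the iframe could do with the signal:

```ts
// Embed script side: notify the iframe whenever the panel opens or closes.
const WIDGET_HOST = "https://vuthanhthien.com"; // origin the iframe is served from

function notifyWidget(iframe: HTMLIFrameElement, isWidgetOpen: boolean) {
  iframe.contentWindow?.postMessage({ isWidgetOpen }, WIDGET_HOST);
}

// Iframe side (inside /chat-widget): react to the signal, e.g. focus the input.
window.addEventListener("message", (event: MessageEvent) => {
  if (typeof event.data?.isWidgetOpen !== "boolean") return; // ignore unrelated messages
  if (event.data.isWidgetOpen) {
    document.querySelector<HTMLTextAreaElement>("textarea")?.focus();
  }
});
```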
Local development workflow
There are two moving parts you typically want to run locally:
- The Next.js app (pnpm dev) to serve /chat-widget
- The embed script (public/widget.js), pointed at the local host
In public/widget.js, I keep a commented local host:
https://vuthanhthien.com/chat-widget
http://localhost:3000/chat-widget
So I can flip a single line while iterating.
On the host page (src/app/page.tsx) I also keep a commented local script URL:
https://vuthanhthien.com/widget.js
http://localhost:3000/widget.js
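In both files that is literally one constant with its local twin commented out (the constant names here are illustrative):

```ts
// public/widget.js — the iframe target:
const WIDGET_URL = "https://vuthanhthien.com/chat-widget";
// const WIDGET_URL = "http://localhost:3000/chat-widget";

// src/app/page.tsx — the <Script src>:
const SCRIPT_SRC = "https://vuthanhthien.com/widget.js";
// const SCRIPT_SRC = "http://localhost:3000/widget.js";
```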
The backend: streaming chat + Pinecone persistence
The widget UI uses useChat({ api: "/api/chat", body: { creatorUID } }), so the core backend entrypoint is:
POST /api/chat (src/app/api/chat/route.ts)
I explicitly run this route on Node.js:
export const runtime = "nodejs"
Request shape
The client sends:
- messages: an array of chat messages (from @ai-sdk/react)
- creatorUID: anonymous identifier stored in localStorage (so conversations persist without login)
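As a type, that body looks roughly like this (the interface name is mine, assuming the Message type exported by the AI SDK):

```ts
import type { Message } from "ai";

// Rough shape of the POST /api/chat body described above.
interface ChatRequestBody {
  messages: Message[]; // chat history managed by useChat on the client
  creatorUID: string;  // anonymous ID persisted in localStorage
}
```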
What happens on each request
On every POST /api/chat request, the server does three things:
- Persist the latest user message: look up the most recent “user” message in messages and save it to Pinecone via saveChatMessage(...).
- Generate a streamed assistant response: call generateAIChatResponse(messages, creatorUID), which uses the Vercel AI SDK (streamText) to stream tokens back to the client.
- Persist the assistant response: clone the streaming response and read it as text (aiResponse.clone().text()), extract the message text from the stream format (see extractMessagesText(...) in the route), and save the assistant reply to Pinecone as another message.
This gives me a simple “chat memory” loop: every user + assistant turn becomes a vector record in Pinecone.
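Put together, the route looks roughly like this; the import paths follow the files mentioned in this post, while the helper signatures and the extractMessagesText stand-in are assumptions:

```ts
// Sketch of src/app/api/chat/route.ts: persist the user turn, stream the
// assistant turn, and persist the assistant turn from a cloned response.
import { saveChatMessage } from "@/lib/pinecone";      // signature assumed
import { generateAIChatResponse } from "@/lib/aiChat"; // signature assumed

export const runtime = "nodejs";

export async function POST(req: Request) {
  const { messages, creatorUID } = await req.json();

  // 1. Persist the latest user message.
  const lastUserMessage = [...messages]
    .reverse()
    .find((m: { role: string; content: string }) => m.role === "user");
  if (lastUserMessage) {
    await saveChatMessage(creatorUID, "user", lastUserMessage.content);
  }

  // 2. Generate a streamed assistant response.
  const aiResponse = await generateAIChatResponse(messages, creatorUID);

  // 3. Persist the assistant reply from a clone, so the original stream
  //    still flows to the client untouched.
  aiResponse
    .clone()
    .text()
    .then((raw: string) =>
      saveChatMessage(creatorUID, "assistant", extractMessagesText(raw))
    );

  return aiResponse;
}

// Stand-in: the real extractMessagesText parses the AI SDK data-stream
// format into plain text before saving.
function extractMessagesText(raw: string): string {
  return raw;
}
```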
Pinecone + OpenAI embeddings (how memory is stored)
All chat messages are stored in Pinecone; the persistence code lives in src/lib/pinecone.ts.
Environment variables
The persistence layer expects:
- PINECONE_API_KEY
- PINECONE_INDEX (defaults to chat-context)
- OPENAI_API_KEY (used for embeddings)
Embeddings
When saving a message, I generate an embedding with OpenAI:
- Model: text-embedding-ada-002
Then I truncate the vector to 1024 dimensions before upserting to Pinecone:
embedding.slice(0, 1024)
This is important because the Pinecone index dimension must match exactly; in my code, I assume a 1024-dim index and consistently query with 1024-length vectors later.
Record structure
Each stored message is a Pinecone record with:
- id: a UUID
- values: the embedding (1024 dims)
- metadata: creatorUID, messageId, timestamp, role (user/assistant), content
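A sketch of saveChatMessage that matches this record structure; the Pinecone and OpenAI calls follow their official SDKs, while everything else (including how messageId is produced) is an assumption:

```ts
// Sketch of src/lib/pinecone.ts: embed the text, truncate to the index
// dimension, and upsert one record with the metadata listed above.
import { Pinecone } from "@pinecone-database/pinecone";
import OpenAI from "openai";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const index = pinecone.index(process.env.PINECONE_INDEX ?? "chat-context");

export async function saveChatMessage(
  creatorUID: string,
  role: "user" | "assistant",
  content: string
) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: content,
  });

  await index.upsert([
    {
      id: crypto.randomUUID(),
      values: data[0].embedding.slice(0, 1024), // must match the index dimension
      metadata: {
        creatorUID,
        messageId: crypto.randomUUID(), // assumption: a fresh ID per message
        timestamp: Date.now(),
        role,
        content,
      },
    },
  ]);
}
```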
Fetching history (for “it remembers me”)
The widget UI calls:
GET /api/chat/conversations?creatorUID=...
and the UI sets those messages into useChat via setMessages(...).
On the storage side, history is pulled with a Pinecone query that’s effectively “metadata-filtered”:
- Query vector is a placeholder zero vector (new Array(1024).fill(0))
- Filter: creatorUID == ...
- Sort by timestamp
It’s a pragmatic approach: I’m using Pinecone as both a vector store and a lightweight message store.
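In sketch form, reusing the index handle from the previous snippet (the topK cap and the metadata typing are assumptions):

```ts
// Sketch: metadata-filtered history pull, sorted client-side by timestamp.
export async function getConversationHistory(creatorUID: string) {
  const result = await index.query({
    vector: new Array(1024).fill(0),             // placeholder; the filter does the work
    topK: 100,                                   // illustrative cap
    filter: { creatorUID: { $eq: creatorUID } },
    includeMetadata: true,
  });

  return (result.matches ?? [])
    .map((m) => m.metadata as { role: string; content: string; timestamp: number })
    .sort((a, b) => a.timestamp - b.timestamp);
}
```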
OpenRouter model + system instructions (how responses are controlled)
Response generation is implemented in src/lib/aiChat.ts.
Model provider
I route chat completions through OpenRouter:
- Provider: @openrouter/ai-sdk-provider (src/lib/openrouter.ts)
- Env var: OPENROUTER_KEY
In generateAIChatResponse, I pick:
- Model: anthropic/claude-3.7-sonnet
Streaming response
The API uses the Vercel AI SDK:
- streamText({ ... })
- then returns result.toDataStreamResponse()
That’s why the frontend can render token-by-token updates.
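The core of generateAIChatResponse is roughly the following; the context and system-prompt helpers are shown in the next section, and their exact shapes are my approximation:

```ts
// Sketch of src/lib/aiChat.ts: OpenRouter provider + streamText, returned as a
// data-stream Response the useChat hook can consume.
import { createOpenRouter } from "@openrouter/ai-sdk-provider";
import { streamText, type Message } from "ai";

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_KEY });

export async function generateAIChatResponse(messages: Message[], creatorUID: string) {
  const context = await getConversationContext(creatorUID); // see next section

  const result = streamText({
    model: openrouter("anthropic/claude-3.7-sonnet"),
    system: buildSystemPrompt(context), // persona + rules + "YOUR INFORMATION" block
    messages,
    temperature: 0.2,
  });

  return result.toDataStreamResponse();
}
```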
Conversation “memory” injected into the prompt
Before generating, I pull recent conversation context:
const context = await getConversationContext(creatorUID)
That function formats recent messages like:
user: ...
assistant: ...
Then the system prompt includes:
YOUR INFORMATION: ${context}
END OF YOUR INFORMATION
So the model has short-term memory across turns, even though each call is stateless.
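A sketch of that wiring; getConversationContext is named in the repo, but its body here (and buildSystemPrompt) are my approximation:

```ts
// Sketch: format recent turns as "role: content" lines and splice them into
// the system prompt between the markers mentioned above.
async function getConversationContext(creatorUID: string): Promise<string> {
  const history = await getConversationHistory(creatorUID); // see the Pinecone section
  return history.map((m) => `${m.role}: ${m.content}`).join("\n");
}

function buildSystemPrompt(context: string): string {
  return [
    "You are the assistant for this portfolio site.", // persona/rules shortened here
    "YOUR INFORMATION:",
    context,
    "END OF YOUR INFORMATION",
  ].join("\n");
}
```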
Behavior rules / instruction style
The system prompt is opinionated:
- It defines the assistant persona (me: background, skills, contact info)
- It includes conversation guidelines (professional, concise, technical accuracy)
- It includes a “don’t hallucinate” constraint:
- “Make sure your answer is based on the information provided… that is the source of truth.”
- It includes “Respond back in the language of the user message.”
- It embeds the current date/time (Asia/Ho_Chi_Minh) to make responses time-aware
Temperature is set lower during normal chatting:
temperature = 0.2 for messages != null
(Optional) Knowledge-base retrieval (RAG-style)
Besides “chat memory”, this repo also has a helper to retrieve external knowledge snippets from Pinecone:
src/lib/context.ts exports getContext(message, namespace)
It:
- Creates an embedding for the user question
- Queries Pinecone for top matches (topK = 4)
- Filters by similarity score (> 0.7)
- Returns formatted “Document Name / Document Knowledge” strings
This is a classic RAG building block for answering questions using a document corpus.
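Sketched with the same Pinecone/OpenAI handles as before; the metadata field names are assumptions:

```ts
// Sketch of getContext(message, namespace): embed the question, query the
// namespace, keep confident matches, and format them for the prompt.
export async function getContext(message: string, namespace: string): Promise<string> {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: message,
  });

  const result = await index.namespace(namespace).query({
    vector: data[0].embedding.slice(0, 1024),
    topK: 4,
    includeMetadata: true,
  });

  return (result.matches ?? [])
    .filter((m) => (m.score ?? 0) > 0.7)
    .map(
      (m) =>
        `Document Name: ${m.metadata?.name}\nDocument Knowledge: ${m.metadata?.text}`
    )
    .join("\n\n");
}
```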
Hard-earned lessons
- Isolation wins: iframe avoids CSS collisions and lets the widget be deployed like a standalone app.
- Don’t load the iframe until first open: keeps initial page load fast.
- Treat mobile as a separate UX: full-screen is often better than a tiny overlay.
- data-* config scales surprisingly well: easy for users and easy for docs.
- Vector stores can double as simple storage: using Pinecone metadata + filters is a fast way to get persistence before you introduce a database.
Summary
This widget is essentially two layers:
- A tiny DOM injection script (public/widget.js) that creates a button + iframe and toggles visibility.
- A full React chat UI served at /chat-widget, powered by useChat and your own APIs.
If you want to build your own, start with the same separation of concerns: script = loader + shell, iframe page = real app.
Written by Vũ Thanh Thiên