How I Built an Embeddable Chat Widget Script (Like Intercom)
Why build a script-based widget?
I wanted a chat widget I can embed on any page with one <script> tag—similar to Intercom / Tawk / Crisp:
- The host page shouldn’t need React.
- The widget UI should be isolated (styles + layout) and easy to deploy.
- Configuration should be done with data-* attributes (no build step).
- It should work well on both desktop and mobile.
In this portfolio, the script is loaded on the home page via Next.js next/script:
```tsx
<Script
  id="chat-widget"
  src="https://vuthanhthien.com/widget.js"
  data-uniqueKey="uclone"
  data-icon=""
  data-showAnimation="true"
  data-color="#000000"
  data-icon-color="#ffffff"
  data-welcomeMessage="Hi! How can we help you?"
/>
```
High-level architecture
The key idea: the embed script only manipulates DOM and renders the actual chat UI inside an iframe that points to a dedicated route in my Next.js app (/chat-widget).
Rendering diagram…
This split keeps the embed script lightweight and makes the UI a normal web app page that I can iterate on with React + Tailwind.
The embed script: public/widget.js
The script is served as a static asset at public/widget.js.
Configuration via data-* attributes
Inside the script, I use document.currentScript to read attributes from the same <script> tag:
- data-uniqueKey: an optional identifier you can use to map the widget to a tenant/site (included in my embed snippet, but not used by the current widget.js yet)
- data-color: button background color
- data-icon-color: icon color
- data-icon: icon variant
- data-showAnimation: toggle an animated icon
- data-welcomeMessage: the little “bubble” text above the button
- data-showMessageBubble: force show/hide bubbles (overrides local defaults)
This is the simplest configuration surface for consumers: no global variables, no bundler, no import.
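To make that concrete, here is a minimal sketch of the attribute reading; the attribute names match the embed snippet above, while the defaults are illustrative and the real widget.js may differ:

```ts
// Minimal sketch of reading configuration from the embedding <script> tag.
// Note: document.currentScript is only set while the script is first executing.
const scriptEl = document.currentScript as HTMLScriptElement | null;

const config = {
  uniqueKey: scriptEl?.getAttribute("data-uniqueKey") ?? "",
  color: scriptEl?.getAttribute("data-color") ?? "#000000",
  iconColor: scriptEl?.getAttribute("data-icon-color") ?? "#ffffff",
  icon: scriptEl?.getAttribute("data-icon") ?? "",
  showAnimation: scriptEl?.getAttribute("data-showAnimation") === "true",
  welcomeMessage: scriptEl?.getAttribute("data-welcomeMessage") ?? "",
  showMessageBubble: scriptEl?.getAttribute("data-showMessageBubble"),
};
```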
Injecting UI with DOM APIs
widget.js creates:
- A fixed-position container for the iframe (initially scaled down and non-interactive).
- A fixed-position circular toggle button (the floating “chat” button).
- Optional “message bubbles” that appear above the button and can be dismissed.
On desktop, the widget opens as an overlay panel above the button. On close, it scales down to nothing and restores background scrolling.
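A stripped-down sketch of that injection, with illustrative styles and names (the real script also handles bubbles, icon animation, and the mobile path):

```ts
// Sketch: a fixed iframe container (closed = scaled to zero) plus a floating
// toggle button. Sizes, z-indexes and the lazy iframe.src are illustrative.
const container = document.createElement("div");
container.style.cssText =
  "position:fixed;bottom:90px;right:20px;width:380px;height:560px;" +
  "transform:scale(0);transform-origin:bottom right;transition:transform .2s ease;" +
  "pointer-events:none;z-index:2147483000";

const iframe = document.createElement("iframe");
iframe.style.cssText = "width:100%;height:100%;border:0;border-radius:16px";
container.appendChild(iframe);

const button = document.createElement("button");
button.style.cssText =
  "position:fixed;bottom:20px;right:20px;width:56px;height:56px;" +
  "border-radius:50%;border:0;cursor:pointer;background:#000;z-index:2147483001";

let isOpen = false;
button.addEventListener("click", () => {
  // Don't load the iframe until the first open (keeps the host page fast).
  if (!iframe.src) iframe.src = "https://vuthanhthien.com/chat-widget";

  isOpen = !isOpen;
  container.style.transform = isOpen ? "scale(1)" : "scale(0)";
  container.style.pointerEvents = isOpen ? "auto" : "none";
  document.body.style.overflow = isOpen ? "hidden" : ""; // restore scrolling on close
});

document.body.append(container, button);
```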
Mobile behavior: go full-screen
On mobile devices I avoid a cramped overlay. The script redirects the user to:
/chat-widget?isMobile=true
That route renders the same React UI, but it can use a different layout (full-screen) and handle viewport height properly.
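The decision itself can be a simple width check before opening; the breakpoint below is an assumption, not the script's actual test:

```ts
// Sketch: on small screens, navigate to the full-screen route instead of
// opening the overlay. The 640px breakpoint is illustrative.
function openChat() {
  if (window.matchMedia("(max-width: 640px)").matches) {
    window.location.href = "https://vuthanhthien.com/chat-widget?isMobile=true";
    return;
  }
  // ...desktop path: scale up the iframe container as shown above
}
```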
The iframe app: /chat-widget route
The embedded UI is a regular Next.js route:
- src/app/chat-widget/page.tsx renders ChatWidgetClient
- src/components/chat-widget/index.tsx portals the widget to document.body
- src/components/chat-widget/Portal.tsx is the UI itself
Why a portal?
Even though it’s inside an iframe, I still use createPortal(..., document.body) to ensure the widget is not constrained by any parent wrappers and stays fixed to the viewport reliably.
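A minimal sketch of that pattern (the real Portal.tsx may accept more props, but the idea is the same):

```tsx
"use client";

// Sketch of a Portal component: render children into document.body after mount,
// so SSR never touches the DOM and the widget escapes any parent wrappers.
import { useEffect, useState, type ReactNode } from "react";
import { createPortal } from "react-dom";

export default function Portal({ children }: { children: ReactNode }) {
  const [mounted, setMounted] = useState(false);

  useEffect(() => {
    setMounted(true);
  }, []);

  return mounted ? createPortal(children, document.body) : null;
}
```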
creatorUID: a lightweight “anonymous identity”
To support conversation continuity without auth, the widget stores a random UUID:
localStorage["creatorUID"] = crypto.randomUUID()
Then every chat request includes creatorUID in the body, and on mount the widget loads history from:
GET /api/chat/conversations?creatorUID=...
This gives a nice “it remembers me” experience while keeping implementation simple.
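In sketch form (the function names are mine, and the response shape of /api/chat/conversations is assumed):

```ts
// Sketch: reuse an existing creatorUID or mint one, then load prior messages.
function getCreatorUID(): string {
  const existing = localStorage.getItem("creatorUID");
  if (existing) return existing;

  const uid = crypto.randomUUID();
  localStorage.setItem("creatorUID", uid);
  return uid;
}

async function loadHistory(creatorUID: string) {
  const res = await fetch(
    `/api/chat/conversations?creatorUID=${encodeURIComponent(creatorUID)}`
  );
  return res.json(); // passed to setMessages(...) on mount
}
```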
Communication and lifecycle
The embed script toggles the widget and also calls:
iframe.contentWindow.postMessage({ isWidgetOpen }, WIDGET_HOST)
This is the hook point for more advanced behavior (e.g. focusing input on open, syncing theme, resizing, etc.). Even if you don’t do much with postMessage today, designing for it early makes the widget easier to evolve later.
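Both sides of that channel fit in a few lines; the open-handler below (focusing an input) is only an example of what the iframe could do with the signal:

```ts
// Embed script side: notify the iframe whenever the panel opens or closes.
const WIDGET_HOST = "https://vuthanhthien.com"; // origin the iframe is served from

function notifyWidget(iframe: HTMLIFrameElement, isWidgetOpen: boolean) {
  iframe.contentWindow?.postMessage({ isWidgetOpen }, WIDGET_HOST);
}

// Iframe side (inside /chat-widget): react to the signal, e.g. focus the input.
window.addEventListener("message", (event: MessageEvent) => {
  if (typeof event.data?.isWidgetOpen !== "boolean") return; // ignore unrelated messages
  if (event.data.isWidgetOpen) {
    document.querySelector<HTMLTextAreaElement>("textarea")?.focus();
  }
});
```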
Local development workflow
There are two moving parts you typically want to run locally:
- The Next.js app (pnpm dev) to serve /chat-widget
- The embed script (public/widget.js), pointed at the local host
In public/widget.js, I keep a commented local host:
https://vuthanhthien.com/chat-widget
http://localhost:3000/chat-widget
So I can flip a single line while iterating.
On the host page (src/app/page.tsx) I also keep a commented local script URL:
https://vuthanhthien.com/widget.js
http://localhost:3000/widget.js
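In both files that is literally one constant with its local twin commented out (the constant names here are illustrative):

```ts
// public/widget.js — the iframe target:
const WIDGET_URL = "https://vuthanhthien.com/chat-widget";
// const WIDGET_URL = "http://localhost:3000/chat-widget";

// src/app/page.tsx — the <Script src>:
const SCRIPT_SRC = "https://vuthanhthien.com/widget.js";
// const SCRIPT_SRC = "http://localhost:3000/widget.js";
```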
The backend: streaming chat + Pinecone persistence
The widget UI uses useChat({ api: "/api/chat", body: { creatorUID } }), so the core backend entrypoint is:
POST /api/chat (src/app/api/chat/route.ts)
I explicitly run this route on Node.js:
export const runtime = "nodejs"
Request shape
The client sends:
- messages: an array of chat messages (from @ai-sdk/react)
- creatorUID: anonymous identifier stored in localStorage (so conversations persist without login)
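As a type, that body looks roughly like this (the interface name is mine, assuming the Message type exported by the AI SDK):

```ts
import type { Message } from "ai";

// Rough shape of the POST /api/chat body described above.
interface ChatRequestBody {
  messages: Message[]; // chat history managed by useChat on the client
  creatorUID: string;  // anonymous ID persisted in localStorage
}
```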
What happens on each request
On every POST /api/chat request, the server does three things:
- Persist the latest user message: look up the most recent “user” message in messages and save it to Pinecone via saveChatMessage(...).
- Generate a streamed assistant response: call generateAIChatResponse(messages, creatorUID), which uses the Vercel AI SDK (streamText) to stream tokens back to the client.
- Persist the assistant response: clone the streaming response and read it as text (aiResponse.clone().text()), extract the message text from the stream format (see extractMessagesText(...) in the route), and save the assistant reply to Pinecone as another message.
This gives me a simple “chat memory” loop: every user + assistant turn becomes a vector record in Pinecone.
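Put together, the route looks roughly like this; the import paths follow the files mentioned in this post, while the helper signatures and the extractMessagesText stand-in are assumptions:

```ts
// Sketch of src/app/api/chat/route.ts: persist the user turn, stream the
// assistant turn, and persist the assistant turn from a cloned response.
import { saveChatMessage } from "@/lib/pinecone";      // signature assumed
import { generateAIChatResponse } from "@/lib/aiChat"; // signature assumed

export const runtime = "nodejs";

export async function POST(req: Request) {
  const { messages, creatorUID } = await req.json();

  // 1. Persist the latest user message.
  const lastUserMessage = [...messages]
    .reverse()
    .find((m: { role: string; content: string }) => m.role === "user");
  if (lastUserMessage) {
    await saveChatMessage(creatorUID, "user", lastUserMessage.content);
  }

  // 2. Generate a streamed assistant response.
  const aiResponse = await generateAIChatResponse(messages, creatorUID);

  // 3. Persist the assistant reply from a clone, so the original stream
  //    still flows to the client untouched.
  aiResponse
    .clone()
    .text()
    .then((raw: string) =>
      saveChatMessage(creatorUID, "assistant", extractMessagesText(raw))
    );

  return aiResponse;
}

// Stand-in: the real extractMessagesText parses the AI SDK data-stream
// format into plain text before saving.
function extractMessagesText(raw: string): string {
  return raw;
}
```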
Pinecone + OpenAI embeddings (how memory is stored)
All chat messages are stored in Pinecone; the persistence code lives in src/lib/pinecone.ts.
Environment variables
The persistence layer expects:
- PINECONE_API_KEY
- PINECONE_INDEX (defaults to chat-context)
- OPENAI_API_KEY (used for embeddings)
Embeddings
When saving a message, I generate an embedding with OpenAI:
- Model: text-embedding-ada-002
Then I truncate the vector to 1024 dimensions before upserting to Pinecone:
embedding.slice(0, 1024)
This is important because the Pinecone index dimension must match exactly; in my code, I assume a 1024-dim index and consistently query with 1024-length vectors later.
Record structure
Each stored message is a Pinecone record with:
- id: a UUID
- values: the embedding (1024 dims)
- metadata: creatorUID, messageId, timestamp, role (user/assistant), content
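A sketch of saveChatMessage that matches this record structure; the Pinecone and OpenAI calls follow their official SDKs, while everything else (including how messageId is produced) is an assumption:

```ts
// Sketch of src/lib/pinecone.ts: embed the text, truncate to the index
// dimension, and upsert one record with the metadata listed above.
import { Pinecone } from "@pinecone-database/pinecone";
import OpenAI from "openai";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const index = pinecone.index(process.env.PINECONE_INDEX ?? "chat-context");

export async function saveChatMessage(
  creatorUID: string,
  role: "user" | "assistant",
  content: string
) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: content,
  });

  await index.upsert([
    {
      id: crypto.randomUUID(),
      values: data[0].embedding.slice(0, 1024), // must match the index dimension
      metadata: {
        creatorUID,
        messageId: crypto.randomUUID(), // assumption: a fresh ID per message
        timestamp: Date.now(),
        role,
        content,
      },
    },
  ]);
}
```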
Fetching history (for “it remembers me”)
The widget UI calls:
GET /api/chat/conversations?creatorUID=...
and the UI sets those messages into useChat via setMessages(...).
On the storage side, history is pulled with a Pinecone query that’s effectively “metadata-filtered”:
- Query vector is a placeholder zero vector (new Array(1024).fill(0))
- Filter: creatorUID == ...
- Sort by timestamp
It’s a pragmatic approach: I’m using Pinecone as both a vector store and a lightweight message store.
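In sketch form, reusing the index handle from the previous snippet (the topK cap and the metadata typing are assumptions):

```ts
// Sketch: metadata-filtered history pull, sorted client-side by timestamp.
export async function getConversationHistory(creatorUID: string) {
  const result = await index.query({
    vector: new Array(1024).fill(0),             // placeholder; the filter does the work
    topK: 100,                                   // illustrative cap
    filter: { creatorUID: { $eq: creatorUID } },
    includeMetadata: true,
  });

  return (result.matches ?? [])
    .map((m) => m.metadata as { role: string; content: string; timestamp: number })
    .sort((a, b) => a.timestamp - b.timestamp);
}
```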
OpenRouter model + system instructions (how responses are controlled)
Response generation is implemented in src/lib/aiChat.ts.
Model provider
I route chat completions through OpenRouter:
- Provider: @openrouter/ai-sdk-provider (src/lib/openrouter.ts)
- Env var: OPENROUTER_KEY
In generateAIChatResponse, I pick:
- Model: anthropic/claude-3.7-sonnet
Streaming response
The API uses the Vercel AI SDK:
- streamText({ ... })
- then returns result.toDataStreamResponse()
That’s why the frontend can render token-by-token updates.
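The core of generateAIChatResponse is roughly the following; the context and system-prompt helpers are shown in the next section, and their exact shapes are my approximation:

```ts
// Sketch of src/lib/aiChat.ts: OpenRouter provider + streamText, returned as a
// data-stream Response the useChat hook can consume.
import { createOpenRouter } from "@openrouter/ai-sdk-provider";
import { streamText, type Message } from "ai";

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_KEY });

export async function generateAIChatResponse(messages: Message[], creatorUID: string) {
  const context = await getConversationContext(creatorUID); // see next section

  const result = streamText({
    model: openrouter("anthropic/claude-3.7-sonnet"),
    system: buildSystemPrompt(context), // persona + rules + "YOUR INFORMATION" block
    messages,
    temperature: 0.2,
  });

  return result.toDataStreamResponse();
}
```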
Conversation “memory” injected into the prompt
Before generating, I pull recent conversation context:
const context = await getConversationContext(creatorUID)
That function formats recent messages like:
user: ...
assistant: ...
Then the system prompt includes:
YOUR INFORMATION: ${context}
END OF YOUR INFORMATION
So the model has short-term memory across turns, even though each call is stateless.
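A sketch of that wiring; getConversationContext is named in the repo, but its body here (and buildSystemPrompt) are my approximation:

```ts
// Sketch: format recent turns as "role: content" lines and splice them into
// the system prompt between the markers mentioned above.
async function getConversationContext(creatorUID: string): Promise<string> {
  const history = await getConversationHistory(creatorUID); // see the Pinecone section
  return history.map((m) => `${m.role}: ${m.content}`).join("\n");
}

function buildSystemPrompt(context: string): string {
  return [
    "You are the assistant for this portfolio site.", // persona/rules shortened here
    "YOUR INFORMATION:",
    context,
    "END OF YOUR INFORMATION",
  ].join("\n");
}
```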
Behavior rules / instruction style
The system prompt is opinionated:
- It defines the assistant persona (me: background, skills, contact info)
- It includes conversation guidelines (professional, concise, technical accuracy)
- It includes a “don’t hallucinate” constraint:
- “Make sure your answer is based on the information provided… that is the source of truth.”
- It includes “Respond back in the language of the user message.”
- It embeds the current date/time (Asia/Ho_Chi_Minh) to make responses time-aware
Temperature is set lower during normal chatting:
temperature = 0.2 for messages != null
(Optional) Knowledge-base retrieval (RAG-style)
Besides “chat memory”, this repo also has a helper to retrieve external knowledge snippets from Pinecone:
src/lib/context.ts exports getContext(message, namespace)
It:
- Creates an embedding for the user question
- Queries Pinecone for top matches (topK = 4)
- Filters by similarity score (> 0.7)
- Returns formatted “Document Name / Document Knowledge” strings
This is a classic RAG building block for answering questions using a document corpus.
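Sketched with the same Pinecone/OpenAI handles as before; the metadata field names are assumptions:

```ts
// Sketch of getContext(message, namespace): embed the question, query the
// namespace, keep confident matches, and format them for the prompt.
export async function getContext(message: string, namespace: string): Promise<string> {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: message,
  });

  const result = await index.namespace(namespace).query({
    vector: data[0].embedding.slice(0, 1024),
    topK: 4,
    includeMetadata: true,
  });

  return (result.matches ?? [])
    .filter((m) => (m.score ?? 0) > 0.7)
    .map(
      (m) =>
        `Document Name: ${m.metadata?.name}\nDocument Knowledge: ${m.metadata?.text}`
    )
    .join("\n\n");
}
```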
Hard-earned lessons
- Isolation wins: iframe avoids CSS collisions and lets the widget be deployed like a standalone app.
- Don’t load the iframe until first open: keeps initial page load fast.
- Treat mobile as a separate UX: full-screen is often better than a tiny overlay.
- data-* config scales surprisingly well: easy for users and easy for docs.
- Vector stores can double as simple storage: using Pinecone metadata + filters is a fast way to get persistence before you introduce a database.
Summary
This widget is essentially two layers:
- A tiny DOM injection script (public/widget.js) that creates a button + iframe and toggles visibility.
- A full React chat UI served at /chat-widget, powered by useChat and your own APIs.
If you want to build your own, start with the same separation of concerns: script = loader + shell, iframe page = real app.
Written by Vũ Thanh Thiên