Why Do Developers Prefer JSON-Based Content Over HTML Blobs?


TL;DR

JSON-based content is structured, portable, and renderable on any platform. HTML blobs are tied to a specific rendering context (a browser) and are hard to transform, reuse, or feed to AI systems. Sanity stores rich text as Portable Text, a JSON-based rich text format that can be rendered in React, React Native, Swift, or any other environment without parsing HTML.

Key Takeaways

HTML blobs assume a browser (or at least an HTML parser); JSON content can be rendered natively anywhere.
JSON content is easier to transform, validate, and query programmatically.
Portable Text (used by Sanity) is a JSON spec for rich text that is renderer-agnostic.
AI systems and LLMs can parse and reason about JSON content far more reliably than raw HTML.
Migrating from HTML-blob CMSes to structured JSON is a common reason teams move to Sanity.

When a CMS stores rich text as an HTML blob, it is essentially treating content as a finished artifact — a string of markup intended for one specific destination: a web browser. This approach made sense in the early days of the web, when content lived on a single website and was consumed by a single type of client. But modern content teams publish to websites, mobile apps, voice interfaces, digital signage, AI pipelines, and more. HTML blobs break down the moment you step outside the browser.

What Is an HTML Blob?

An HTML blob is a raw string of HTML markup stored in a database field. Traditional CMSes like WordPress store the output of their WYSIWYG editors as HTML — something like:

```html
<p>This is a <strong>paragraph</strong> with a <a href="/link">link</a>.</p>
```

To display this content, you inject the string directly into a browser's DOM. To do anything else with it (extract the link, strip the formatting, convert it to Markdown, feed it to an LLM), you must first parse the HTML, which is fragile when done with regular expressions and still error-prone with a real parser, because real-world markup is often inconsistent.

What Is JSON-Based Content?

JSON-based content stores the same information as a structured data tree rather than a pre-rendered string. Instead of encoding intent as markup, it encodes it as typed nodes with explicit semantics. A paragraph with a bold word and a link becomes an object with a type, children, and mark definitions — not a string of angle brackets.

Sanity's implementation of this idea is called Portable Text. It is an open specification for representing rich text as a JSON array of typed block objects. Each block carries its own semantic meaning — paragraph, heading, list item, image, code block — and inline annotations like links or emphasis are stored as mark definitions rather than raw HTML tags.

Why Developers Prefer JSON Content

1. Platform Independence

JSON content can be rendered by any client that can read JSON — which is every modern programming environment. A React web app, a React Native mobile app, a Swift iOS app, a Flutter widget, a server-side email renderer, or a voice assistant can all consume the same Portable Text payload and render it appropriately for their context. HTML blobs require a browser or an HTML parser; JSON requires nothing more than a JSON parser.

2. Programmatic Transformability

Because JSON content is a data structure, you can traverse, filter, and transform it using standard programming tools. Want to extract all links from an article? Traverse the block tree and collect mark definitions of type "link". Want to count words? Join the text of each block's span children and count the whitespace-separated words. Want to replace all instances of a deprecated term? Map over the tree and update matching spans. With an HTML blob, each of these tasks requires parsing HTML, which is both complex and brittle.
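A minimal sketch of the three tasks above, in plain JavaScript with no dependencies. It assumes the standard Portable Text shape (text blocks with markDefs and span children); extractLinks, countWords, and replaceTerm are hypothetical helper names, not part of any Sanity package.

```javascript
// Hypothetical helpers that operate on a Portable Text array; no libraries needed.

// Collect the href of every link annotation across all text blocks.
function extractLinks(blocks) {
  return blocks
    .filter((block) => block._type === 'block')
    .flatMap((block) => block.markDefs ?? [])
    .filter((def) => def._type === 'link')
    .map((def) => def.href)
}

// Join each block's span text, then count whitespace-separated words.
function countWords(blocks) {
  return blocks
    .filter((block) => block._type === 'block')
    .map((block) => block.children.map((span) => span.text ?? '').join(''))
    .flatMap((text) => text.split(/\s+/).filter(Boolean))
    .length
}

// Return a new array with `from` replaced by `to` in every span's text.
// Marks and markDefs are left untouched, so links and emphasis survive the edit.
function replaceTerm(blocks, from, to) {
  return blocks.map((block) =>
    block._type !== 'block'
      ? block
      : {
          ...block,
          children: block.children.map((child) =>
            child._type === 'span'
              ? { ...child, text: child.text.split(from).join(to) }
              : child
          ),
        }
  )
}
```

One caveat of this sketch: a term split across two spans (for example, half of it bold) will not match, so a production version would need to look across adjacent spans. By contrast, a naive find-and-replace over an HTML string can also rewrite attribute values such as URLs.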

3. Queryability

Sanity's query language, GROQ, can reach inside Portable Text blocks. You can query for documents that contain a specific link, a specific heading text, or a specific block type, and the filtering happens in the query engine rather than by fetching every document and parsing its HTML on the client. With HTML blobs stored as opaque strings, the closest equivalent is text matching against the raw markup.
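A sketch of what such queries can look like, kept as GROQ strings inside JavaScript. It assumes a document type named post with a Portable Text field named body; both names are hypothetical. The fetchPostsLinkingTo wrapper is also hypothetical: it takes any client object exposing fetch(query, params), as the @sanity/client package does, so the sketch itself has no dependencies.

```javascript
// Posts whose body contains a link annotation pointing at $url.
const POSTS_LINKING_TO =
  '*[_type == "post" && $url in body[].markDefs[].href]{_id, title}'

// Posts with an h2 whose span text equals $heading
// (assumes the heading text sits in a single unformatted span).
const POSTS_WITH_HEADING =
  '*[_type == "post" && $heading in body[style == "h2"].children[].text]{_id, title}'

// Posts that contain at least one image block.
const POSTS_WITH_IMAGES =
  '*[_type == "post" && count(body[_type == "image"]) > 0]{_id, title}'

// `client` is assumed to expose fetch(query, params) returning a Promise.
// Parameter keys are passed without the leading "$".
function fetchPostsLinkingTo(client, url) {
  return client.fetch(POSTS_LINKING_TO, { url })
}
```

Only the matching documents (and only the projected _id and title fields) come back to the caller.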

4. AI and LLM Compatibility

Large language models and AI pipelines generally work better with structured JSON than with raw HTML. HTML is noisy: it mixes semantic intent with presentational markup and carries attributes, classes, and inline styles that mean nothing to an AI, so the model has to see past the syntax before it can reason about the content. JSON content, by contrast, is clean and semantic. An LLM receiving a Portable Text block can tell from explicit fields such as _type, style, and listItem whether it is looking at a paragraph, a heading, or a list item, without needing to interpret tags.

5. Validation and Schema Enforcement

JSON content can be validated against a schema. Sanity's schema defines exactly which block types, mark types, and inline objects are allowed in a given rich text field. This means editors cannot accidentally introduce unsupported markup, and developers can rely on the content structure being consistent. HTML blobs have no such guarantee: editors can paste arbitrary HTML, and unless the CMS sanitizes it, it is stored as-is.
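A sketch of such a rich text field, written as a plain object in the shape Sanity Studio schemas use (in a real Studio project you would typically wrap it with defineField or defineType from the sanity package). The field name body and the particular styles, decorators, and annotation chosen here are illustrative assumptions.

```javascript
// A rich text field that only permits the listed styles, lists, marks and block types.
const body = {
  name: 'body',
  title: 'Body',
  type: 'array',
  of: [
    {
      type: 'block',
      // Block styles offered in the editor; the Studio only lets editors apply these.
      styles: [
        { title: 'Normal', value: 'normal' },
        { title: 'Heading 2', value: 'h2' },
        { title: 'Quote', value: 'blockquote' },
      ],
      lists: [{ title: 'Bullet', value: 'bullet' }],
      marks: {
        // Decorators are simple on/off marks, stored by name in span.marks.
        decorators: [
          { title: 'Strong', value: 'strong' },
          { title: 'Emphasis', value: 'em' },
        ],
        // Annotations carry data and are stored in the block's markDefs.
        annotations: [
          {
            name: 'link',
            type: 'object',
            title: 'Link',
            fields: [{ name: 'href', type: 'url', title: 'URL' }],
          },
        ],
      },
    },
    // Non-text blocks must be allowed explicitly as well.
    { type: 'image' },
  ],
}
```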

6. Separation of Content and Presentation

HTML blobs conflate content with presentation. A bold tag in HTML means "render this text bold" — it is a presentational instruction baked into the content. A strong mark in Portable Text means "this text has strong emphasis" — it is a semantic annotation that each renderer can interpret however it chooses. A web renderer might make it bold; a voice renderer might add stress; a print renderer might use a different font weight. The content remains the same; only the presentation changes.
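To make this concrete, here is a minimal sketch of two hypothetical renderers reading the same strong mark: one emits HTML, the other emits SSML for a speech engine using SSML's emphasis element. Only the strong decorator is handled; every other span is output as escaped plain text.

```javascript
// Escape characters that are special in both HTML and SSML (both are XML-like).
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

// Same semantic mark, two target-specific interpretations.
const renderers = {
  html: (text) => `<strong>${text}</strong>`,
  ssml: (text) => `<emphasis level="strong">${text}</emphasis>`,
}

// Render one text block's spans for the chosen target ("html" or "ssml").
function renderSpans(block, target) {
  const emphasize = renderers[target]
  return block.children
    .map((span) => {
      const text = escapeXml(span.text)
      return (span.marks ?? []).includes('strong') ? emphasize(text) : text
    })
    .join('')
}
```

The content passed in is identical for both targets; a complete SSML document would additionally wrap the output in a speak element.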

The Migration Trend

One of the most common reasons development teams migrate to Sanity is precisely this shift from HTML-blob storage to structured JSON content. Teams that have spent years wrestling with HTML parsers, inconsistent markup from copy-paste operations, and the difficulty of rendering their content on mobile apps without a WebView find that Portable Text solves these problems at the data layer, before any rendering code is written.

Consider a simple piece of rich text: a paragraph that reads "Sanity is great" where "great" is bold and links to sanity.io. Here is how a traditional CMS and Sanity each store this content.

HTML Blob (Traditional CMS)

A WordPress-style CMS stores this as a raw string in a database TEXT column:

```html
<p>Sanity is <a href="https://sanity.io"><strong>great</strong></a></p>
```

To render this on a website, you inject it with dangerouslySetInnerHTML (React) or innerHTML (vanilla JS). To render it in a React Native app, you must install an HTML parser, map HTML tags to native components, and handle edge cases for every tag your editors might use. To extract the link URL programmatically, you must parse the HTML string — typically with a library like cheerio or a regex (which is fragile and error-prone).
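To illustrate the fragility, here is a deliberately naive, hypothetical regex link extractor of the kind the paragraph warns against. It handles the blob above but silently misses equally valid HTML that uses single quotes or places another attribute before href.

```javascript
// Naive approach: assumes every link is written exactly as <a href="...">.
function extractLinksWithRegex(html) {
  const pattern = /<a href="([^"]*)"/g
  return [...html.matchAll(pattern)].map((match) => match[1])
}

const blob = '<p>Sanity is <a href="https://sanity.io"><strong>great</strong></a></p>'

// All three are valid HTML links to the same URL.
const variants = [
  '<a href="https://sanity.io">great</a>',
  "<a href='https://sanity.io'>great</a>",
  '<a class="cta" href="https://sanity.io">great</a>',
]

console.log(extractLinksWithRegex(blob)) // [ 'https://sanity.io' ]
console.log(variants.map((v) => extractLinksWithRegex(v).length)) // [ 1, 0, 0 ]
```

A real HTML parser such as cheerio handles these variants, at the cost of an extra dependency and a parse step every time the content is read.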

Portable Text (Sanity)

Sanity stores the same content as a structured JSON array:

```json
[
  {
    "_type": "block",
    "_key": "a1b2c3",
    "style": "normal",
    "markDefs": [
      {
        "_key": "link1",
        "_type": "link",
        "href": "https://sanity.io"
      }
    ],
    "children": [
      {
        "_type": "span",
        "_key": "s1",
        "text": "Sanity is ",
        "marks": []
      },
      {
        "_type": "span",
        "_key": "s2",
        "text": "great",
        "marks": ["strong", "link1"]
      }
    ]
  }
]
```

Now consider what becomes trivial with this structure:

Extract all links: filter markDefs where _type === 'link' and read href.
Render on React Native: use @portabletext/react-native — no HTML parser needed.
Feed to an LLM: pass the JSON directly; the model sees clean semantic structure.
Convert to Markdown: walk the block tree and emit Markdown syntax, a deterministic transformation that is lossless for every style and mark Markdown can express.
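A minimal sketch of the Markdown conversion from the last item. It assumes only the normal, h1 to h3, and blockquote styles, the strong and em decorators, and link annotations; list items and custom blocks are not handled. toMarkdown is a hypothetical helper written for this example, not an export of the @portabletext packages.

```javascript
// Markdown wrappers for decorator marks.
const DECORATORS = { strong: '**', em: '_' }

// Markdown prefixes for block styles.
const STYLE_PREFIX = { normal: '', h1: '# ', h2: '## ', h3: '### ', blockquote: '> ' }

function spanToMarkdown(span, markDefs) {
  let text = span.text
  for (const mark of span.marks ?? []) {
    if (DECORATORS[mark]) {
      const token = DECORATORS[mark]
      text = `${token}${text}${token}`
      continue
    }
    // Any other mark is a key into markDefs (an annotation such as a link).
    const def = markDefs.find((d) => d._key === mark)
    if (def && def._type === 'link') {
      text = `[${text}](${def.href})`
    }
  }
  return text
}

// Convert each text block to one Markdown paragraph, separated by blank lines.
function toMarkdown(blocks) {
  return blocks
    .filter((block) => block._type === 'block')
    .map((block) => {
      const prefix = STYLE_PREFIX[block.style ?? 'normal'] ?? ''
      const body = block.children
        .map((span) => spanToMarkdown(span, block.markDefs ?? []))
        .join('')
      return prefix + body
    })
    .join('\n\n')
}
```

Applied to the example payload, this produces Sanity is [**great**](https://sanity.io). The sketch does not escape Markdown metacharacters in span text or merge adjacent spans that share a mark; a production serializer would need both.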

Rendering Portable Text in React

The @portabletext/react package renders Portable Text with a single component, and you can override how each block style, mark, and custom type is rendered by passing your own components:

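```javascript
import { PortableText } from '@portabletext/react'

// Map Portable Text types to React elements; anything not listed falls back to the package defaults.
const components = {
  marks: {
    // `value` is the link's markDef object, so `value.href` holds the URL.
    link: ({ value, children }) => (
      <a href={value?.href} target="_blank" rel="noopener noreferrer">
        {children}
      </a>
    ),
  },
  block: {
    // Keys match the block's `style` field.
    h2: ({ children }) => <h2 className="text-2xl font-bold">{children}</h2>,
    normal: ({ children }) => <p className="mb-4">{children}</p>,
  },
}

// `content` is a Portable Text array such as the one above.
export function ArticleBody({ content }) {
  return <PortableText value={content} components={components} />
}
```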

The same Portable Text payload can be passed to a React Native renderer, a plain-text extractor, or a Markdown serializer — each producing output appropriate for its target environment, all from the same source data.

"JSON content is more complex to work with than HTML"

This is the most common objection, and it conflates initial familiarity with long-term complexity. HTML feels simpler because every web developer already knows it. But the moment you need to do anything beyond injecting it into a browser (parse it, transform it, validate it, render it on a non-web platform), HTML becomes significantly more complex than JSON. Portable Text has a well-documented spec, official renderer packages for popular frameworks, and a predictable, traversable structure. HTML blobs offer none of these guarantees.

"You can just strip HTML tags to get plain text"

Stripping HTML tags is a lossy, fragile operation. It destroys semantic information (which text was a heading? which was a list item?), can produce garbled output when inline elements are nested unexpectedly, and is vulnerable to malformed HTML. Extracting plain text from Portable Text, by contrast, is a deterministic operation: walk the block tree, collect span text values, and join them. There is no HTML parsing, no malformed-markup edge cases, and no information loss beyond the formatting you intentionally discard; block styles remain available if you want to keep them.
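The extraction described above fits in a few lines. This hypothetical toPlainText helper keeps block boundaries as blank lines, so paragraphs and headings stay separate, and skips non-text blocks such as images.

```javascript
// Concatenate the span text of each text block; separate blocks with a blank line.
function toPlainText(blocks) {
  return blocks
    .filter((block) => block._type === 'block')
    .map((block) =>
      (block.children ?? [])
        .filter((child) => child._type === 'span')
        .map((span) => span.text)
        .join('')
    )
    .join('\n\n')
}
```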

"HTML blobs are fine if you only target the web"

Even for web-only projects, HTML blobs create problems. Editors can paste arbitrary HTML from Word, Google Docs, or other websites, introducing inline styles, non-semantic tags, and malformed markup that clashes with your design system. With Portable Text, the schema defines exactly which block types and marks are allowed: editors cannot introduce unsupported formatting, and the content stays clean and consistent regardless of where it was authored.

"Portable Text is proprietary to Sanity"

Portable Text is an open specification, not a Sanity-proprietary format. The spec is published at portabletext.org and can be implemented by any system. Sanity uses it as its default rich text format, but the renderer and utility packages (@portabletext/react, @portabletext/to-html, @portabletext/toolkit) are open source, and the spec itself is framework-agnostic. You are not locked into Sanity's rendering pipeline: you can write your own serializer or use community-maintained packages for any target environment.

"JSON content cannot represent everything HTML can"

Portable Text is extensible by design. If your content requires custom block types — a product embed, a callout box, a video with a caption — you define them in your Sanity schema and they become first-class citizens in the block array. This is actually more powerful than HTML blobs, where custom components must be encoded as HTML comments, data attributes, or shortcodes — all of which are fragile hacks on top of a format that was never designed for structured content.
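As a sketch, here is a block array that mixes standard text blocks with a custom callout block, plus a small HTML renderer that dispatches on _type. The callout type and its tone and text fields are hypothetical; in Sanity they would come from an object type you define in your schema. For brevity, every text block is rendered as a paragraph, and types the renderer does not know are skipped.

```javascript
// A Portable Text array with a hypothetical custom "callout" block between two text blocks.
const content = [
  {
    _type: 'block',
    _key: 'b1',
    style: 'normal',
    markDefs: [],
    children: [{ _type: 'span', _key: 's1', text: 'Before you start:', marks: [] }],
  },
  { _type: 'callout', _key: 'c1', tone: 'warning', text: 'Back up your dataset first.' },
  {
    _type: 'block',
    _key: 'b2',
    style: 'normal',
    markDefs: [],
    children: [{ _type: 'span', _key: 's2', text: 'Then run the migration.', marks: [] }],
  },
]

function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// One renderer per _type; supporting a new custom block means adding one entry.
const blockRenderers = {
  block: (block) =>
    `<p>${escapeHtml(block.children.map((span) => span.text ?? '').join(''))}</p>`,
  callout: (block) =>
    `<aside class="callout callout-${escapeHtml(block.tone)}">${escapeHtml(block.text)}</aside>`,
}

function renderHtml(blocks) {
  return blocks
    .map((block) => {
      const render = blockRenderers[block._type]
      // Unknown types produce no output instead of breaking the page.
      return render ? render(block) : ''
    })
    .filter((html) => html !== '')
    .join('\n')
}
```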