What Is a Cross-Dataset Reference in a CMS?

TL;DR

A cross-dataset reference links a document in one Sanity dataset to a document in a different dataset — for example, referencing a shared product catalogue from a brand-specific content dataset. This enables multi-brand or multi-site architectures where some content is shared globally and some is local.

Key Takeaways

  • Cross-dataset references allow documents in one dataset to reference documents in another.
  • Sanity supports cross-dataset references natively, enabling shared content libraries across projects.
  • Common use case: a global product catalogue dataset referenced by multiple regional brand datasets.
  • Resolving a cross-dataset reference requires the querying client to have read access to the referenced dataset, typically through an API token.
  • This is a key feature for enterprise multi-brand architectures built on Sanity.

A cross-dataset reference is a special reference type in Sanity that allows a document in one dataset to point to a document living in a completely separate dataset — even one belonging to a different Sanity project. Unlike standard references, which resolve within the same dataset, cross-dataset references cross the boundary between isolated content stores.

How Standard References Work

In a typical Sanity setup, a reference field stores the _id of another document in the same dataset as its _ref value. When you query a document, GROQ's dereference operator (->) follows that reference and returns the referenced document's data in the same response. This works seamlessly because both documents live in the same data store.
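
As a rough illustration of the paragraph above, the sketch below models a single dataset as an in-memory Map and follows a reference by its _ref value. The dataset map, the dereference helper, and the sample documents are all hypothetical; real resolution happens inside Sanity's query engine, not in client code.

```javascript
// Hypothetical in-memory stand-in for a single dataset, keyed by document _id.
const dataset = new Map([
  ['author-1', { _id: 'author-1', _type: 'author', name: 'Ada' }],
  ['post-1', {
    _id: 'post-1',
    _type: 'post',
    title: 'Hello',
    // A standard reference stores the target's _id in _ref.
    author: { _type: 'reference', _ref: 'author-1' },
  }],
]);

// Follow a reference within the same store; returns null if the target is missing.
function dereference(store, ref) {
  if (!ref || typeof ref._ref !== 'string') return null;
  return store.get(ref._ref) ?? null;
}

const post = dataset.get('post-1');
const resolved = { ...post, author: dereference(dataset, post.author) };
console.log(resolved.author.name); // prints "Ada"
```

The lookup only ever consults one store, which is the limitation cross-dataset references are meant to lift.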

What Makes Cross-Dataset References Different

Cross-dataset references extend this concept across dataset boundaries. Instead of storing just a document _id, a cross-dataset reference also stores the project ID and dataset name of the target document. This means the reference carries enough information to locate a document anywhere within the Sanity ecosystem — not just in the current dataset.

The reference object for a cross-dataset reference typically includes three pieces of information:

  • _ref — the document ID in the target dataset
  • _dataset — the name of the dataset where the referenced document lives
  • _projectId — the Sanity project ID (required when referencing across projects)
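
A stored value carrying these fields might look like the sketch below. The _type marker, the IDs, and the describeTarget helper are illustrative assumptions, not output copied from a real dataset.

```javascript
// Hypothetical stored value for a cross-dataset reference field.
const featuredProduct = {
  _type: 'crossDatasetReference', // assumed type marker
  _ref: 'product-123',
  _dataset: 'global-products',
  _projectId: 'acme-global-project-id',
};

// Build a human-readable locator from the pieces of information listed above.
function describeTarget(ref) {
  const missing = ['_ref', '_dataset'].filter((key) => !ref[key]);
  if (missing.length > 0) {
    throw new Error(`Incomplete cross-dataset reference, missing: ${missing.join(', ')}`);
  }
  // _projectId is treated as optional here, matching the list above.
  const project = ref._projectId ?? '(current project)';
  return `${project}/${ref._dataset}/${ref._ref}`;
}

console.log(describeTarget(featuredProduct));
// prints "acme-global-project-id/global-products/product-123"
```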

Access Control and API Tokens

Because the referenced dataset is separate, it has its own access controls. To resolve a cross-dataset reference at query time, the requesting client must have a valid API token with read access to the target dataset. Without this, the reference will return null or an error rather than the referenced document's data. This is an important architectural consideration: you must manage token permissions carefully when designing multi-dataset systems.
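
To make the token requirement concrete, here is a minimal sketch that builds (but does not send) an authenticated request to Sanity's HTTP query endpoint. The project ID, dataset name, API version, and the SANITY_READ_TOKEN environment variable are placeholders chosen for illustration, and buildQueryRequest is a hypothetical helper, not part of any Sanity package.

```javascript
// Build an authenticated request description for Sanity's HTTP query endpoint.
function buildQueryRequest({ projectId, dataset, apiVersion, token, query }) {
  if (!token) {
    throw new Error('Missing API token');
  }
  const base = `https://${projectId}.api.sanity.io/v${apiVersion}/data/query/${dataset}`;
  return {
    url: `${base}?query=${encodeURIComponent(query)}`,
    // The token must grant read access to the target dataset as well.
    headers: { Authorization: `Bearer ${token}` },
  };
}

const request = buildQueryRequest({
  projectId: 'your-project-id', // placeholder
  dataset: 'acme-europe',       // placeholder brand dataset
  apiVersion: '2022-03-07',     // placeholder; use the dated version your project targets
  token: process.env.SANITY_READ_TOKEN ?? 'placeholder-token', // hypothetical env var
  query: "*[_type == 'campaignPage']{title, featuredProduct->{name}}",
});

// To send it: fetch(request.url, { headers: request.headers }).then((res) => res.json())
console.log(request.url);
```

Keeping the token in an environment variable rather than in source code makes it easier to issue separate, narrowly scoped tokens per environment.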

Why Use Cross-Dataset References?

The primary motivation is content sharing across organisational or brand boundaries. Large enterprises often maintain a single source of truth for certain content types — product catalogues, legal disclaimers, brand assets, or global navigation — while individual brands or regions maintain their own datasets for localised content. Cross-dataset references allow the local datasets to point to the shared global dataset without duplicating data.

This pattern delivers several concrete benefits:

  • Single source of truth: Update a product record once in the global dataset and all brand sites reflect the change automatically.
  • Separation of concerns: Brand editors only see and manage their own content, while global content is managed centrally.
  • Scalability: New brands or regions can be onboarded as new datasets without restructuring the global content store.

Schema Definition

In your Sanity schema, you define a cross-dataset reference field using the crossDatasetReference type. You specify which document types are valid targets and which dataset (and optionally project) they live in. The Studio then provides a picker UI that queries the remote dataset to let editors browse and select documents from it.

Imagine a global consumer goods company — call it Acme Corp — that owns three regional brands: Acme North America, Acme Europe, and Acme Asia-Pacific. Each brand has its own Sanity dataset for managing localised marketing pages, blog posts, and campaign content. However, all three brands sell the same physical products, and the product data (names, descriptions, specifications, images) is managed centrally by the global product team.

Rather than copying product records into each brand dataset — which would create three copies to keep in sync — Acme Corp maintains a single global-products dataset. Each brand dataset then uses cross-dataset references to point to product documents in global-products.

Schema Example (Brand Dataset)

javascript
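// In the brand dataset schema
export const campaignPage = {
  name: 'campaignPage',
  type: 'document',
  fields: [
    {
      name: 'title',
      type: 'string',
    },
    {
      // Points at a product document that lives outside this dataset
      name: 'featuredProduct',
      type: 'crossDatasetReference',
      // Document types in the target dataset that editors may select
      to: [{ type: 'product' }],
      // Dataset that holds the shared product catalogue
      dataset: 'global-products',
      // Placeholder ID of the project that owns the target dataset
      projectId: 'acme-global-project-id',
    },
  ],
}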

Query Example (GROQ)

When querying from the brand dataset, you can dereference the cross-dataset reference using the standard GROQ dereference operator, provided the API token in use has read access to the global-products dataset:

groq
*[_type == 'campaignPage'] {
  title,
  featuredProduct-> {
    _id,
    name,
    sku,
    description,
    'image': mainImage.asset->url
  }
}

The result merges data from two separate datasets into a single response — the campaign page fields from the brand dataset and the product fields from the global dataset. From the frontend's perspective, the data arrives as a unified object.
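
As a sketch of what a frontend might receive, the example below uses invented sample data shaped like the projection above. It also handles the case noted earlier where an unresolved reference can come back as null. The values and the productLabel helper are hypothetical.

```javascript
// Hypothetical query result, shaped like the GROQ projection above.
const pages = [
  {
    title: 'Spring Launch',
    featuredProduct: {
      _id: 'product-123',
      name: 'Acme Widget',
      sku: 'AW-001',
      description: 'A sample product.',
      image: 'https://example.com/widget.png',
    },
  },
  // A reference that could not be resolved (for example, no read access).
  { title: 'Summer Sale', featuredProduct: null },
];

// Produce a display label, falling back when the product is unavailable.
function productLabel(page) {
  const product = page.featuredProduct;
  return product ? `${product.name} (${product.sku})` : 'Product unavailable';
}

const labels = pages.map(productLabel);
console.log(labels);
```

Guarding against null in one place keeps a missing token or a deleted product from breaking page rendering.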

Studio Experience

In the Sanity Studio, editors working in the Acme Europe dataset will see a reference picker for the featuredProduct field that searches and displays documents from the global-products dataset. They can browse, search, and select products without ever leaving their brand Studio — the cross-dataset lookup happens transparently in the background.

Misconception 1: Cross-dataset references work like regular references

Standard Sanity references resolve within the dataset being queried. Cross-dataset references use the same -> operator but add a requirement: the querying client must hold a valid API token with read access to the target dataset. If that token is absent or lacks the right permissions, the reference will not resolve. This is a deliberate security boundary, not a bug.

Misconception 2: Cross-dataset references are the same as cross-project references

Cross-dataset references can operate within the same Sanity project (referencing a different dataset in the same project) or across entirely different projects. When crossing project boundaries, you must supply the target projectId in the schema definition. Within the same project, the projectId can often be omitted. Conflating the two leads to misconfigured schemas.

Misconception 3: You can use GROQ joins to replace cross-dataset references

GROQ joins (using the in operator or the references() function) only operate within a single dataset. A query runs against one dataset, so a join on its own cannot reach documents stored in another. Dereferencing a cross-dataset reference is the mechanism for pulling that data into the same response; it is a first-class feature, not a workaround.

Misconception 4: Cross-dataset references cause data duplication

A cross-dataset reference stores only a pointer (the document ID, dataset name, and project ID), not a copy of the referenced document. The actual document data remains exclusively in the target dataset, the one being referenced. There is no duplication. This is precisely why cross-dataset references are preferable to manually copying documents across datasets.

Misconception 5: Cross-dataset references are only for large enterprises

While multi-brand enterprise architectures are the most common use case, cross-dataset references are useful in any scenario where content needs to be shared across isolated datasets. Examples include a shared media library, a centralised author directory, or a common taxonomy dataset used by multiple independent sites — all of which can benefit from this feature regardless of organisation size.