AISENSE AI Data Feed Specification v1.0
Purpose
The AISENSE AI Data Feed format defines a machine-readable JSON structure designed to make text content easily ingestible by AI systems such as:
Large Language Models (LLMs)
Retrieval Augmented Generation (RAG) systems
Knowledge graph builders
AI search engines
Semantic indexing pipelines
The format is designed to reduce the need for HTML parsing and enable direct AI ingestion.
Versioning
Every feed item must include a version identifier in the spec_version field.
Example:
{
  "spec_version": "1.0",
  "content": { ... }
}
Version rules:
Minor version changes do not break compatibility.
Major version changes may introduce structural changes.
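A consumer could apply the version rules above roughly as follows. This is a minimal sketch: it assumes spec_version always has the form "major.minor" (the document only shows "1.0"), and the function names parse_spec_version and is_compatible are hypothetical.

```python
def parse_spec_version(value: str) -> tuple[int, int]:
    """Split a 'major.minor' version string into two integers."""
    major, minor = value.split(".")
    return int(major), int(minor)


def is_compatible(item_version: str, supported_major: int = 1) -> bool:
    """Accept any minor version within the supported major version."""
    major, _minor = parse_spec_version(item_version)
    return major == supported_major


print(is_compatible("1.0"))  # True
print(is_compatible("1.3"))  # True
print(is_compatible("2.0"))  # False
```

Comparing only the major number reflects the rule that minor changes stay compatible; a stricter consumer might also reject minor versions newer than the one it was written against.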
Document Structure
Each feed item is a standalone JSON document.
Example:
{
  "spec_version": "1.0",
  "content": {
    "text": "Example article content.",
    "summary": "Short summary of the content.",
    "keywords": ["ai", "example"],
    "entities": [],
    "links": []
  },
  "structure": {
    "@context": "https://schema.org",
    "@type": "Article"
  },
  "ai_meta": {
    "token_est": 120,
    "chars": 780,
    "crawler_hint": "normal",
    "richness_score": 3,
    "embedding_ready": true
  }
}
Core Fields
spec_version
Specifies the version of the AISENSE AI Data Feed format.
Type: string
Required: yes
Example:
"spec_version": "1.0"
content Object
Contains the main text content and semantic metadata.
text
Cleaned primary content text.
Type: string
Required: yes
summary
Short summary of the content.
Type: string
Required: no
keywords
Keywords describing the content topic.
Type: array of strings
Example:
"keywords": ["ai", "example"]
entities
Named entities extracted from the text.
Supported types include:
Person
Organization
Location
Product
Event
Example:
"entities": [
  {
    "type": "Organization",
    "name": "Example Corp"
  }
]
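To illustrate how a consumer might use the entities array, the sketch below groups entity names by type. The helper name group_entities is hypothetical, and skipping entries whose type is outside the listed set is an assumption; the wording "Supported types include" suggests the list may not be exhaustive.

```python
from collections import defaultdict

# Types listed in this section; the spec may allow others.
SUPPORTED_TYPES = {"Person", "Organization", "Location", "Product", "Event"}


def group_entities(entities):
    """Group entity names by type, skipping unknown or incomplete entries."""
    grouped = defaultdict(list)
    for entity in entities:
        entity_type = entity.get("type")
        name = entity.get("name")
        if entity_type in SUPPORTED_TYPES and name:
            grouped[entity_type].append(name)
    return dict(grouped)


entities = [{"type": "Organization", "name": "Example Corp"}]
print(group_entities(entities))  # {'Organization': ['Example Corp']}
```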
links
URLs related to the content.
Typical uses:
canonical source
raw text version
related documentation
Example:
"links": [
  "https://example.com/article",
  "https://data.example.com/article.txt"
]
structured_data
Schema.org compatible structured metadata.
Example:
"structured_data": {
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is AI?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Artificial intelligence refers to..."
      }
    }
  ]
}
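As one possible use of structured_data, the sketch below pulls question and answer pairs out of the mainEntity array shown in the example. The function name extract_faq_pairs is hypothetical, and the code assumes the shape of the example above rather than any layout mandated by the format.

```python
def extract_faq_pairs(structured_data):
    """Return (question, answer) tuples from a mainEntity list of Question items."""
    pairs = []
    for entry in structured_data.get("mainEntity", []):
        if entry.get("@type") != "Question":
            continue  # ignore entries that are not questions
        answer = entry.get("acceptedAnswer", {})
        pairs.append((entry.get("name", ""), answer.get("text", "")))
    return pairs


structured_data = {
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is AI?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Artificial intelligence refers to...",
            },
        }
    ]
}
print(extract_faq_pairs(structured_data))
# [('What is AI?', 'Artificial intelligence refers to...')]
```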
structure Object
Defines the semantic structure of the document using schema.org.
Example:
"structure": {
  "@context": "https://schema.org",
  "@type": "FAQ"
}
Common types:
Article
FAQ
HowTo
Product
Dataset
Documentation
Guide
ai_meta Object
Metadata intended for AI ingestion pipelines.
token_est
Estimated token count of the text.
Used for cost estimation in LLM pipelines.
Type: integer
chars
Character length of the main text.
Type: integer
crawler_hint
Hint describing the content density.
Allowed values:
"normal" for normal content
"rich" for rich content
richness_score
Semantic richness score.
Example scale:
1 = minimal content
3 = standard content
5 = highly structured content
embedding_ready
Indicates that the text is already cleaned and suitable for direct embedding.
Type: boolean
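A publisher might fill the ai_meta object along these lines. This is a sketch under stated assumptions: the format does not say how token_est is computed, so the 4-characters-per-token ratio is only a rough placeholder (the document's own example, 780 characters and 120 tokens, implies a different ratio), and build_ai_meta is a hypothetical helper name.

```python
import math

# Assumption: rough heuristic only; the spec does not define the estimator.
CHARS_PER_TOKEN = 4


def build_ai_meta(text, crawler_hint="normal", richness_score=3, embedding_ready=True):
    """Build an ai_meta object for already-cleaned text."""
    return {
        "token_est": math.ceil(len(text) / CHARS_PER_TOKEN),
        "chars": len(text),
        "crawler_hint": crawler_hint,
        "richness_score": richness_score,
        "embedding_ready": embedding_ready,
    }


meta = build_ai_meta("Example article content.")
print(meta["chars"], meta["token_est"])  # 24 6
```

A production pipeline would more likely take token_est from the tokenizer of the target model, since character-based estimates vary by language and content.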
Optional Fields
reasoning
Explanation of how the content structure was generated.
Example:
This field is optional and mainly intended for debugging or transparency.
File Naming Convention
Recommended naming:
content_<timestamp>_<hash>.json
Example:
content_1773235424_xxxxx.json
Where:
timestamp = Unix timestamp
hash = unique identifier
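A file name combining the timestamp and hash could be generated as sketched below. The pattern content_<timestamp>_<hash>.json is inferred from the file name shown in the directory layout example (content_1773235424_xxxxx.json). Deriving the hash from a truncated SHA-256 digest of the text is purely an assumption, as is the helper name feed_filename; the format only asks for a unique identifier.

```python
import hashlib
import time


def feed_filename(text, timestamp=None):
    """Return a name of the form content_<timestamp>_<hash>.json."""
    if timestamp is None:
        timestamp = int(time.time())  # current Unix timestamp in seconds
    # Assumption: a 12-character hex prefix of SHA-256 serves as the identifier.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"content_{timestamp}_{digest}.json"


print(feed_filename("Example article content.", timestamp=1773235424))
```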
Feed Directory Layout
Example structure:
/2026
  /03
    /11
      content_1773235424_xxxxx.json
This structure allows efficient chronological crawling.
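The year/month/day layout above can be derived from the Unix timestamp in the file name. The sketch below assumes the date directories use UTC, which the format does not state, and feed_path is a hypothetical helper name.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def feed_path(filename, timestamp):
    """Return the path YYYY/MM/DD/<filename> relative to the feed root."""
    # Assumption: directory dates are computed in UTC.
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return PurePosixPath(day.strftime("%Y"), day.strftime("%m"), day.strftime("%d"), filename)


print(feed_path("content_1773235424_xxxxx.json", 1773235424))
# 2026/03/11/content_1773235424_xxxxx.json
```

Because the path is zero-padded, sorting paths as plain strings also sorts them chronologically, which is what makes chronological crawling cheap.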
Feed Discovery
Publishers should expose a discovery endpoint.
Example:
Example discovery file:
{
  "spec_version": "1.0",
  "feed_url": "https://data.example.com/content/",
  "updated": "2026-03-11T10:00:00Z"
}
JSON Schema
Example simplified schema:
{
  "type": "object",
  "required": ["spec_version", "content"],
  "properties": {
    "spec_version": {
      "type": "string"
    },
    "content": {
      "type": "object",
      "required": ["text"],
      "properties": {
        "text": { "type": "string" },
        "summary": { "type": "string" },
        "keywords": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "ai_meta": {
      "type": "object",
      "properties": {
        "token_est": { "type": "integer" },
        "chars": { "type": "integer" },
        "embedding_ready": { "type": "boolean" }
      }
    }
  }
}
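The checks expressed by the simplified schema can be mirrored by hand with the standard library, as in the sketch below. It is not a general JSON Schema validator; validate_feed_item is a hypothetical helper that covers only the fields listed in the schema above.

```python
def _is_int(value):
    """JSON integers only; Python's bool is a subclass of int, so exclude it."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_feed_item(item):
    """Return a list of error messages; an empty list means the item passed."""
    if not isinstance(item, dict):
        return ["item must be an object"]
    errors = []
    if not isinstance(item.get("spec_version"), str):
        errors.append("spec_version must be a string")

    content = item.get("content")
    if not isinstance(content, dict):
        errors.append("content must be an object")
    else:
        if not isinstance(content.get("text"), str):
            errors.append("content.text must be a string")
        if "summary" in content and not isinstance(content["summary"], str):
            errors.append("content.summary must be a string")
        keywords = content.get("keywords", [])
        if not (isinstance(keywords, list) and all(isinstance(k, str) for k in keywords)):
            errors.append("content.keywords must be an array of strings")

    ai_meta = item.get("ai_meta", {})
    if not isinstance(ai_meta, dict):
        errors.append("ai_meta must be an object")
    else:
        for field in ("token_est", "chars"):
            if field in ai_meta and not _is_int(ai_meta[field]):
                errors.append(f"ai_meta.{field} must be an integer")
        if "embedding_ready" in ai_meta and not isinstance(ai_meta["embedding_ready"], bool):
            errors.append("ai_meta.embedding_ready must be a boolean")
    return errors


item = {"spec_version": "1.0", "content": {"text": "Example article content."}}
print(validate_feed_item(item))  # []
```

In practice a full JSON Schema validator library would be the more robust choice; the hand-written version is shown only to make the rules explicit.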
Example AI Ingestion Pipeline
Example workflow for AI systems consuming the feed:
↓
crawl new JSON files
↓
extract content.text
↓
generate embeddings
↓
store in vector database
↓
index metadata
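The workflow above could be sketched as follows for a feed mirrored to a local directory. Everything here is illustrative: embed is a toy stand-in for a real embedding model, the two dictionaries stand in for a vector database and a metadata index, and the function names are hypothetical.

```python
import json
from pathlib import Path


def embed(text, dims=8):
    """Toy placeholder for an embedding model: a normalized character histogram."""
    vector = [0.0] * dims
    for index, char in enumerate(text):
        vector[index % dims] += ord(char)
    total = sum(vector) or 1.0
    return [value / total for value in vector]


def ingest(feed_root):
    """Crawl JSON files, embed content.text, and index the remaining metadata."""
    vector_store = {}    # stand-in for a vector database
    metadata_index = {}  # stand-in for a metadata index
    # Sorted paths follow the YYYY/MM/DD layout, so the crawl is chronological.
    for path in sorted(Path(feed_root).rglob("*.json")):
        item = json.loads(path.read_text(encoding="utf-8"))
        text = item["content"]["text"]
        vector_store[path.name] = embed(text)
        metadata_index[path.name] = {
            "keywords": item["content"].get("keywords", []),
            "ai_meta": item.get("ai_meta", {}),
        }
    return vector_store, metadata_index
```

Calling ingest on the feed root returns both stores keyed by file name; a real consumer would likely skip files it has already processed and validate each item before embedding it.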
Design Principles
The format is designed to:
minimize parsing complexity
preserve source attribution
support semantic structure
enable fast ingestion for AI systems
remain human-readable
Open Adoption
The AISENSE AI Data Feed format is intended as an open format that can be implemented by any publisher or platform.
No dependency on AISENSE infrastructure is required.