AISENSE AI Data Feed Specification v1.0
Overview
The AISENSE AI Data Feed format is a machine-readable JSON structure designed to make text content easily ingestible by AI systems such as large language models, RAG pipelines, search engines, and knowledge graph builders.
The format focuses on:
clean text extraction
structured metadata
schema.org compatibility
AI ingestion hints
source attribution
The goal is to allow AI systems to consume content without complex HTML parsing.
Document Structure
Each feed item is published as a JSON document.
Example:
{
  "content": {
    "text": "Example text content.",
    "summary": "Short summary of the content.",
    "keywords": ["example", "ai", "content"],
    "entities": [],
    "links": [],
    "structured_data": {}
  },
  "structure": {
    "@context": "https://schema.org",
    "@type": "Article"
  },
  "ai_meta": {
    "token_est": 100,
    "chars": 650,
    "crawler_hint": "normal",
    "richness_score": 3,
    "embedding_ready": true
  }
}
Top-Level Fields
content
Contains the primary textual data and semantic metadata.
text
The cleaned main text content.
Type: string
Required: yes
Example:
"text": "Example text content."
summary
Short machine-generated or user-provided summary.
Type: string
Required: no
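Because text is the only field marked as required, a consumer might check for it before doing anything else. The following Python sketch illustrates one way to do that and is not part of the specification. The function name load_feed_item is hypothetical, and treating an empty string as invalid is an assumption.

```python
import json


def load_feed_item(raw: str) -> dict:
    """Parse one feed document and check the required content.text field."""
    item = json.loads(raw)
    content = item.get("content")
    if not isinstance(content, dict):
        raise ValueError("missing 'content' object")
    text = content.get("text")
    # Assumption: an empty or non-string value is treated as missing.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'content.text' is required and must be a non-empty string")
    # 'summary' is optional, so fall back to None when it is absent.
    content.setdefault("summary", None)
    return item


item = load_feed_item('{"content": {"text": "Example text content."}}')
print(item["content"]["text"])
```

Raising an error on a missing text field keeps malformed items out of later pipeline stages. A more lenient consumer could log and skip them instead.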
keywords
Keywords describing the content topic.
Type: array of strings
Example:
"keywords": ["example", "ai", "content"]
entities
Named entities extracted from the text.
Possible types:
Person
Organization
Location
Product
Event
Type: array
Example:
"entities": [
  {
    "type": "Organization",
    "name": "Example Corp"
  }
]
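A consumer that builds a knowledge graph will often want entity names grouped by type. The sketch below assumes each entity is an object with "type" and "name" keys, as in the example above. The helper name group_entities is hypothetical, and silently ignoring unlisted types is an assumption, not something the specification requires.

```python
from collections import defaultdict

# Entity types listed in the specification.
KNOWN_TYPES = {"Person", "Organization", "Location", "Product", "Event"}


def group_entities(entities: list) -> dict:
    """Group entity names by type, skipping malformed entries."""
    grouped = defaultdict(list)
    for entity in entities:
        if not isinstance(entity, dict):
            continue
        etype, name = entity.get("type"), entity.get("name")
        # Assumption: types outside the listed set are ignored, not rejected.
        if isinstance(etype, str) and etype in KNOWN_TYPES and isinstance(name, str):
            grouped[etype].append(name)
    return dict(grouped)


entities = [{"type": "Organization", "name": "Example Corp"}]
print(group_entities(entities))
```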
links
Links related to the content.
Typical uses:
original source
raw text version
related resources
Type: array of URLs
Example:
"links": [
  "https://example.com/article",
  "https://data.example.com/article.txt"
]
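Since links is a plain array of URLs with no role labels, a consumer may want to discard entries that are not usable URLs before following them. This is a minimal sketch using the standard library. The function name valid_links and the restriction to http and https are assumptions.

```python
from urllib.parse import urlparse


def valid_links(links: list) -> list:
    """Keep only entries that parse as absolute http(s) URLs."""
    kept = []
    for link in links:
        if not isinstance(link, str):
            continue
        parts = urlparse(link)
        # Assumption: only http and https links are useful to a crawler.
        if parts.scheme in ("http", "https") and parts.netloc:
            kept.append(link)
    return kept


links = ["https://example.com/article", "https://data.example.com/article.txt"]
print(valid_links(links))
```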
structured_data
Schema.org compatible structured metadata.
Type: object
Example:
"structured_data": {
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is AI?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Artificial intelligence is..."
      }
    }
  ]
}
structure
Defines the semantic structure of the document using schema.org.
Type: object
Example:
"structure": {
  "@context": "https://schema.org",
  "@type": "FAQ"
}
Possible values:
Article
FAQ
HowTo
Product
Dataset
Documentation
Guide
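A consumer could use the declared type to route items to type-specific handling. The sketch below checks the "@type" value against the values listed above. The helper name structure_type is hypothetical, and returning None for unlisted values is an assumption about how a consumer might treat them.

```python
# Values listed in the specification for structure["@type"].
LISTED_TYPES = ("Article", "FAQ", "HowTo", "Product", "Dataset", "Documentation", "Guide")


def structure_type(item: dict):
    """Return the declared @type if it is one of the listed values, else None."""
    structure = item.get("structure")
    if not isinstance(structure, dict):
        return None
    declared = structure.get("@type")
    return declared if declared in LISTED_TYPES else None


item = {"structure": {"@context": "https://schema.org", "@type": "FAQ"}}
print(structure_type(item))
```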
ai_meta
Metadata specifically intended for AI ingestion pipelines.
token_est
Estimated token size of the content.
Used by AI pipelines to estimate processing cost.
Type: integer
chars
Character length of the main content.
Type: integer
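The specification does not say how token_est is computed, so a publisher has to choose a method. The sketch below is one rough possibility. It assumes chars counts Unicode code points and uses a ratio of about four characters per token, which is only a rule of thumb, so the ratio is left as a parameter. The function name estimate_ai_meta is hypothetical.

```python
import math


def estimate_ai_meta(text: str, chars_per_token: float = 4.0) -> dict:
    """Return rough 'chars' and 'token_est' values for a text."""
    # Assumption: 'chars' counts Unicode code points, not bytes.
    chars = len(text)
    # Assumption: a fixed characters-per-token ratio; real tokenizers differ.
    token_est = math.ceil(chars / chars_per_token) if chars else 0
    return {"token_est": token_est, "chars": chars}


meta = estimate_ai_meta("Example text content.")
print(meta)
```

A publisher that targets a specific model could replace the ratio with a count from that model's tokenizer.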
crawler_hint
Hint for crawlers about content density.
Possible values:
low density content
normal content
rich content
richness_score
Approximate semantic richness of the content.
Scale example:
1 = very simple text
3 = normal content
5 = complex structured content
embedding_ready
Indicates whether the text is clean enough to be directly embedded.
Type: boolean
Example:
"embedding_ready": true
Optional Field
reasoning
Explanation of how the system structured the content.
Used for transparency and debugging.
File Naming Convention
Pattern:
content_<timestamp>_<hash>.json
Example:
content_1773235424_xxxxx.json
Where:
timestamp = Unix timestamp
hash = unique identifier
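A crawler can recover the timestamp and identifier from a file name without opening the file. The sketch below assumes the pattern content_<timestamp>_<hash>.json, inferred from the sample name content_1773235424_xxxxx.json. It also assumes the timestamp is in seconds and that the hash contains no underscores, dots, or slashes. The function name parse_feed_filename is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed pattern: content_<unix seconds>_<hash>.json
FILENAME_RE = re.compile(r"^content_(\d+)_([^_./]+)\.json$")


def parse_feed_filename(name: str):
    """Split a feed file name into (timestamp, hash, UTC datetime)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected feed file name: {name!r}")
    timestamp = int(match.group(1))
    published = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return timestamp, match.group(2), published


timestamp, hash_id, published = parse_feed_filename("content_1773235424_xxxxx.json")
print(timestamp, hash_id, published.date())
```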
Feed Layout Example
/2026
/03
/11
content_1773235424_xxxxx.json
This allows crawlers to ingest new data efficiently by date.
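The date-based layout can be derived from the timestamp in the file name. The following sketch assumes the directories are /YYYY/MM/DD based on the UTC date of the timestamp. The specification does not state the time zone, and the function name feed_path is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def feed_path(timestamp: int, hash_id: str) -> str:
    """Build the dated path for a feed item from its timestamp and hash."""
    # Assumption: directory names follow the UTC date of the timestamp.
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    filename = f"content_{timestamp}_{hash_id}.json"
    path = PurePosixPath("/") / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / filename
    return str(path)


path = feed_path(1773235424, "xxxxx")
print(path)
```

With the sample values from the layout above, this yields a path under /2026/03/11.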
Typical AI Pipeline
Example ingestion workflow:
download JSON
↓
read content.text
↓
generate embedding
↓
store in vector database
↓
link back to source
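The workflow above can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: toy_embedding is a hash-based placeholder for a real embedding model, a plain list plays the role of the vector database, the download step is assumed to have already produced the JSON string, and skipping items whose embedding_ready flag is false is an assumption about how a consumer might use that flag.

```python
import hashlib
import json


def toy_embedding(text: str, dims: int = 8) -> list:
    """Placeholder for a real embedding model: deterministic values from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


def ingest(raw_json: str, store: list):
    """Run one downloaded feed document through the ingestion steps."""
    item = json.loads(raw_json)                     # JSON already downloaded by the caller
    content = item["content"]
    # Assumption: items flagged as not embedding-ready are skipped.
    if not item.get("ai_meta", {}).get("embedding_ready", True):
        return None
    text = content["text"]                          # read content.text
    record = {
        "vector": toy_embedding(text),              # generate embedding
        "text": text,
        "sources": content.get("links", []),        # link back to source
    }
    store.append(record)                            # store in vector database (a list here)
    return record


sample = json.dumps({
    "content": {"text": "Example text content.", "links": ["https://example.com/article"]},
    "ai_meta": {"embedding_ready": True},
})
store = []
record = ingest(sample, store)
print(len(store), record["sources"])
```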
Design Goals
The format is designed to:
minimize parsing complexity
preserve attribution
support schema.org semantics
enable fast ingestion into AI systems
remain human-readable
Live system
Resources:
AISENSE – AI DATA FEED GENERATOR
Live service: https://data.aisenseapi.com/
License and Usage
The AISENSE AI Data Feed format is intended as an open format that can be implemented by any platform or publisher.
No dependency on AISENSE infrastructure is required to adopt the format.