RAG Articles - Enterprise Knowledge https://enterprise-knowledge.com/tag/rag/ Mon, 03 Nov 2025
LLM Solutions PoC to Production: From RAGs to Riches (Part 1) https://enterprise-knowledge.com/llm-solutions-poc-to-production-from-rags-to-riches-part-1/ Wed, 30 Jul 2025

The post LLM Solutions PoC to Production: From RAGs to Riches (Part 1) appeared first on Enterprise Knowledge.

In the past year, many of the organizations EK has partnered with have been developing Large Language Model (LLM) based Proof-of-Concepts (PoCs). These projects are often championed by an enthusiastic IT team or internal initiative, since the low barrier to entry and cost of LLM development make them easy for executives to greenlight. Despite initial optimism, these LLM PoCs rarely reach the enterprise-grade implementations promised, due to factors such as organizational buy-in, technical complexity, security concerns, misalignment on content readiness for AI solutions, and a lack of investment in key infrastructure. Gartner, for example, has predicted that at least 30% of generative AI projects will be abandoned after PoC by the end of 2025. This blog provides an overview of EK’s approach to evaluating and roadmapping an LLM solution from PoC to production, and highlights several dimensions important to successfully scaling an LLM-based enterprise solution.

 

Organizational Implementation Considerations:

Before starting the technical journey from “RAGs to Riches”, an organization should weigh several considerations before, during, and after building a production solution. Accounting for each of these gives a production LLM solution a much higher chance of success.

Before: Aligning Business Outcomes

Prior to building out a production LLM solution, a team will have developed a PoC LLM solution that can answer a limited set of use cases. Before production development begins, it is imperative that business outcomes and the priorities of key stakeholders are aligned with project goals. This often looks like mapping business outcomes – such as enhanced customer interactions, operational efficiency, or reduced compliance risk – to quantifiable measures such as shorter response times and improved findability of information. It is important to ensure these business goals carry through from development to production and customer adoption. Beyond meeting technical functionality, setting clear customer and organizational goals helps ensure the production LLM solution retains organizational support throughout its entire lifecycle.

During: Training Talent and Proving Solutions

Building out a production LLM solution requires a team with specialized skills in natural language processing (NLP), prompt engineering, semantic integration, and embedding strategies. In addition, EK recommends investing in content strategists and SMEs who understand the state of the organization’s data and content. These roles are critical for preparing content for AI solutions, ensuring the LLM solution has comprehensive and semantically meaningful content to draw on. Organizations EK has worked with have successfully launched and maintained production LLM solutions by proactively investing in these skills for their staff, building resilience into the overall solution and driving success in LLM solution development.

After: Infrastructure Planning and Roadmapping

To maintain a production LLM solution after it has been deployed to end users, organizations must account for the required infrastructure investments and operational costs, as well as ongoing content and data maintenance. These resources might include enterprise licensing, additional software infrastructure, and ongoing support costs. While many of these costs can be mitigated by effectively aligning business outcomes and training organizational talent, organizations still need a roadmap and investment plan for the future infrastructure – both systems and content – of the production LLM solution.

 

Technical Criteria for Evaluating LLM PoCs:

In parallel with these organizational implementation considerations – and drawing on EK’s depth of experience developing LLM MVPs, designing enterprise AI architecture, and implementing more advanced LLM solutions such as Semantic RAG – EK has developed seven key dimensions for evaluating the effectiveness of an LLM PoC:

Figure 1: Dimensions for Evaluating an LLM Solution

1. Depth of Interaction: refers to how deeply and dynamically users can engage with the LLM solution. At a lower level, interaction might simply involve asking questions and receiving direct answers, while at the highest level, intelligent agents act on behalf of the user autonomously to leverage multiple tools and execute tasks.

2. Freshness of Information: describes how frequently the content and data behind the semantic search solution are updated and how quickly users receive these updates. Lower freshness implies data that is updated infrequently; at higher freshness levels, data is updated frequently or even continuously, ensuring users always interact with the most current and accurate information available.

3. Level of Explanation: refers to how transparently the LLM solution communicates the rationale behind its responses. At a lower level of explanation, users simply receive answers without clear reasoning. In contrast, a high level of explanation includes evidence, citations, audit trails, and a clear path showing how information was retrieved.

4. Personalization, Access & Entitlements Requirements: describes how specifically content and data are tailored and made accessible based on user identity, roles, behavior, or needs. At lower levels, content is available to all users without personalization or adaptations. At higher levels, content personalization is integrated with user profiles, entitlements, and explicit access controls, ensuring users only see highly relevant, permissioned content. 

5. Accuracy of Information: refers to how reliably and correctly the LLM solution can answer user queries. At lower levels, users receive reasonable answers that may have minor ambiguities or occasional inaccuracies. At the highest accuracy level, each response is traced back to original source materials and cross-validated against authoritative sources.

6. Enterprise Agentic Support: describes how the LLM solution interacts with the broader enterprise AI ecosystem, and coordinates with other AI agents. At the lowest level, the solution acts independently without any coordination with external AI agents. At the highest level, the solution seamlessly integrates as a consumer and provider within an ecosystem of other intelligent agents.

7. Enterprise Embedding Strategy: refers to how the LLM solution converts information into vector representations (embeddings) to support retrieval. At a lower level, embeddings are simple vector representations with minimal or no structured metadata. At the highest levels, embeddings include robust metadata and are integrated with enterprise context through semantic interpretation and ontology-based linkages.
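As a concrete illustration of the higher end of this dimension, the sketch below pairs each embedding with structured metadata (taxonomy tags, source, ontology class). The `embed` function is a toy stand-in for a real embedding model, and all field names and values are illustrative assumptions, not EK's implementation:

```python
from dataclasses import dataclass, field

# Toy stand-in for an embedding model: hashes characters into a tiny
# fixed-size, L2-normalized vector purely for illustration. A real
# system would call a sentence-transformer or hosted embedding API.
def embed(text: str, dim: int = 8) -> list[float]:
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

@dataclass
class EnrichedEmbedding:
    """A vector plus the enterprise context a bare embedding lacks."""
    text: str
    vector: list[float] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)  # taxonomy tags, source, ontology class

chunk = EnrichedEmbedding(
    text="Submit onboarding forms within 30 days of hire.",
    metadata={"topic": "Onboarding", "source": "HR Handbook", "ontology_class": "Policy"},
)
chunk.vector = embed(chunk.text)
```

At retrieval time, the metadata lets the system filter and contextualize matches (e.g., only `Policy`-class content) rather than relying on vector similarity alone.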

For an organization, each of the technical criteria will be weighed differently based on the unique use cases and requirements of the LLM solution. For example, an organization that is working on a content generation use case could have a greater emphasis on Level of Explanation and Freshness of Information while an organization that is working on an information retrieval use case may care more about Personalization, Access, & Entitlements Requirements. This is an integral part of the evaluation process, with an organization coming to agreement on the level of proficiency needed within each factor. Leveraging this standard, EK has worked with organizations across various industries and diverse LLM use cases to optimize their solutions.

Additionally, EK recommends that an organization undergoing an LLM PoC evaluation also conduct an in-depth analysis of content relevant to their selected use case(s). This enables them to gain a more comprehensive understanding of its quality – including factors like completeness, relevancy, and currency – and can help unearth gaps in what the LLM may be able to answer. All of this informs the testing phase by guiding the creation of each test, as well as the expected outcomes, and can be generally categorized across three main areas of remediation:

  • Content Quality – The content regarding a certain topic doesn’t explicitly exist and is not standardized – this may necessitate creating dummy data to enable certain types of tests.
  • Content Structure – The way certain content is structured varies – we can likely posit that one particular structure will give more accurate results than another. This may necessitate creating headings to indicate clear hierarchy on pages, and templates to consistently structure content. 
  • Content Metadata – Contextual information that may be useful to users is missing from content. This may necessitate establishing a taxonomy to tag with a controlled vocabulary, or an ontology to establish relationships between concepts. 

 

Technical Evaluation of LLM PoCs In Practice:

Putting the organizational and technical considerations into practice, EK recently completed an engagement with a leading semiconductor manufacturer, employing this standard process to evaluate their PoC LLM search solution. The organization had developed a PoC search solution for answering questions against a series of user-selected PDFs relating to the company’s technical onboarding documentation. EK worked with the organization to align on key functional requirements via a capability assessment for a production LLM solution based on the seven dimensions EK has identified. Additionally, EK completed a simultaneous analysis of in-scope content for the use case. The results of this content evaluation informed which content components should be prioritized as candidates for the testing plan.

After aligning on priority requirements – in this case, accuracy and freshness of information – EK developed and conducted a testing plan for parts of the PoC LLM. To operationalize the testing plan, EK created a four-phase RAG Evaluation & Optimization Workflow to turn the testing plan into actionable insights. This workflow produced a present-state snapshot of the LLM solution, a target-state benchmark, and a bridging roadmap that prioritizes retriever tuning, prompt adjustments, and content enrichment. Based on the workflow results, stakeholders at the organization could easily interpret how improved semantics, content quality, structure, and metadata would improve the results of their LLM search solution.

In the following blogs of the “RAGs to Riches” series, EK will explain the process for developing a capability assessment and testing plan for LLM-based PoCs. These blogs will expand further on how each of the technical criteria can be measured, as well as how to develop a long-term strategy for production solutions.

 

Conclusion

Moving an LLM solution from proof-of-concept to enterprise production is no small feat. It requires careful attention to organizational alignment, strong business cases, technical planning, compliance readiness, content optimization, and a commitment to ongoing talent development. Addressing these dimensions systematically will ensure that your organization is well positioned to turn AI innovation into a durable competitive advantage.

If you are interested in having EK evaluate your LLM-based solution and help build out an enterprise-grade implementation, contact us here.

Sara Nash Presenting at Data Architecture Online https://enterprise-knowledge.com/sara-nash-presenting-at-data-architecture-online/ Fri, 18 Jul 2025

The post Sara Nash Presenting at Data Architecture Online appeared first on Enterprise Knowledge.


Sara Nash, Principal Consultant at Enterprise Knowledge, will be moderating the keynote session titled “Data Architecture for AI” at Data Architecture Online’s annual event on Wednesday, July 23rd at 11:30am EST.

Through this session, attendees will gain valuable insights into best practices, common pitfalls, and forward-looking strategies to align their data architecture with the accelerating pace of AI. The panelists will discuss topics such as:

  • The shift from traditional data warehouses to real-time, scalable, and decentralized frameworks;
  • The role of data governance and quality in training reliable AI models; and 
  • How organizations can future-proof their infrastructure for emerging AI capabilities.

For more information on the conference, check out the schedule and registration here.

The Semantic Exchange: Humanitarian Foundation – SemanticRAG POC https://enterprise-knowledge.com/the-semantic-exchange-humanitarian-foundation-semanticrag-poc/ Thu, 17 Jul 2025

The post The Semantic Exchange: Humanitarian Foundation – SemanticRAG POC appeared first on Enterprise Knowledge.


Enterprise Knowledge is concluding the first round of our new webinar series, The Semantic Exchange. In this webinar series, we follow a Q&A style to provide participants an opportunity to engage with our semantic design experts on a variety of topics about which they have written. This webinar is designed for a variety of audiences, ranging from those working in the semantic space as taxonomists or ontologists, to folks who are just starting to learn about structured data and content, and how they may fit into broader initiatives around artificial intelligence or knowledge graphs.

This 30-minute session invites you to engage with James Egan’s case study, Humanitarian Foundation – SemanticRAG POC. Come ready to hear and ask about:

  • How various types of organizations can leverage standards-based semantic graph technologies;
  • How leveraging semantics addresses data integration challenges; and
  • What value semantics can provide to an organization’s overall data ecosystem.

This webinar will take place on Wednesday July 23rd, from 2:00 – 2:30PM EDT. Can’t make it? The session will also be recorded and published to registered attendees. View the recording here!

Data Governance for Retrieval-Augmented Generation (RAG) https://enterprise-knowledge.com/data-governance-for-retrieval-augmented-generation-rag/ Thu, 20 Feb 2025

The post Data Governance for Retrieval-Augmented Generation (RAG) appeared first on Enterprise Knowledge.

Retrieval-Augmented Generation (RAG) has emerged as a powerful approach for injecting organizational knowledge into enterprise AI systems. By combining the capabilities of large language models (LLMs) with access to relevant, up-to-date organizational information, RAG enables AI solutions to deliver context-aware, accurate, and actionable insights. 

Unlike standalone LLMs, which often struggle with outdated or irrelevant information, RAG architectures enable domain-specific knowledge transfer by supplying the organizational context in which an AI model operates within the enterprise. This makes RAG a critical tool for aligning AI outputs with an organization’s unique expertise, reducing errors, and enhancing decision-making. As organizations increasingly rely on RAG for tailored AI solutions, a strong data governance framework becomes essential to ensure the quality, integrity, and relevance of the knowledge fueling these systems.

At the heart of RAG’s success lies the data driving the process. The quality, structure, and accessibility of this data directly influence the effectiveness of the RAG architecture. For RAG to deliver context-aware insights, it must rely on information that is accurate, current, well-organized, and readily retrievable. Without a robust framework to manage this data, RAG solutions risk being hampered by inconsistencies, inaccuracies, or gaps in the information pipeline. This is where RAG-specific data governance becomes indispensable. Unlike general data governance, which focuses on managing enterprise-wide data assets, RAG data governance specifically addresses the curation, structuring, and accessibility of knowledge used in retrieval and generation processes. It ensures that the data fed into RAG models remains relevant, up-to-date, and aligned with business objectives, enabling AI-driven insights that are both accurate and actionable.

A strong data governance framework is foundational to ensuring the quality, integrity, and relevance of the knowledge that fuels RAG systems. Such a framework encompasses the processes, policies, and standards necessary to manage data assets effectively throughout their lifecycle. From data ingestion and storage to processing and retrieval, governance practices ensure that the data driving RAG solutions remain trustworthy and fit for purpose.

To establish this connection, this article delves into key governance strategies tailored for two major types of RAG: general/vector-based RAG and graph-based RAG. These strategies are designed to address each approach’s unique data requirements while highlighting shared practices essential to both. The tables below illustrate the governance practices specific to each RAG type, as well as the overlapping principles that form the foundation of effective data governance across both methods.

What is Vector-Based RAG?

Vector-based RAG leverages vector embeddings – mathematical representations of text that help systems understand the semantic meaning of words, sentences, and documents – to retrieve semantically similar data from dense vector databases such as Pinecone or Weaviate. The approach is built on vector search, a technique that converts text into numerical representations (vectors) and then finds the documents most similar to a user’s query. This approach is ideal for unstructured text and multimedia data, making it particularly reliant on robust data governance.
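A minimal sketch of the retrieval step described above, using cosine similarity over toy vectors. In practice, the embeddings come from a model and are indexed in a vector database; the documents and vectors here are invented for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" — a real system would store model-generated vectors
# in a vector database such as Pinecone or Weaviate.
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "vacation-policy": [0.1, 0.8, 0.2],
    "expense-reports": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "how do I reset my password?"

# Retrieval = rank documents by similarity to the query vector.
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # -> reset-password
```

The governance practices below matter precisely because this ranking is only as good as the embeddings and metadata behind it.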

What is Graph RAG?

Graph RAG combines generative models with graph databases such as Neo4j, AWS Neptune, Graphwise GraphDB, or Stardog, which represent relationships between data points. This approach is particularly suited for knowledge graphs and ontology-driven AI.
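To make relationship-based retrieval concrete, here is a minimal sketch using an in-memory list of subject-predicate-object triples. A production Graph RAG system would issue Cypher or SPARQL queries against one of the databases named above; the triples here are invented for illustration:

```python
# Minimal in-memory "knowledge graph" as subject-predicate-object triples.
triples = [
    ("Onboarding Guide", "covers", "Benefits Enrollment"),
    ("Onboarding Guide", "covers", "Equipment Setup"),
    ("Benefits Enrollment", "managedBy", "HR Department"),
    ("Equipment Setup", "managedBy", "IT Department"),
]

def neighbors(subject: str) -> list[tuple[str, str]]:
    """Return (predicate, object) pairs linked from a subject node."""
    return [(p, o) for s, p, o in triples if s == subject]

# Two-hop traversal: which departments own topics covered by the guide?
for _, topic in neighbors("Onboarding Guide"):
    for pred, dept in neighbors(topic):
        if pred == "managedBy":
            print(f"{topic} -> {dept}")
# prints:
# Benefits Enrollment -> HR Department
# Equipment Setup -> IT Department
```

Unlike vector similarity, this retrieval follows explicit relationships, which is why ontology and taxonomy governance (covered below) are central to Graph RAG.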

 

Key Data Governance Practices for RAG

Practices Applicable to Both Vector-Based and Graph-Based RAG

  • Data Quality and Consistency
    Why it matters: Ensures accurate, reliable, and relevant AI-generated responses.
    Governance actions: Implement data profiling, quality checks, and cleansing processes; conduct regular audits to validate accuracy and resolve redundancies.
  • Metadata Management
    Why it matters: Provides context for AI to retrieve the most relevant data.
    Governance actions: Maintain comprehensive metadata and implement a data catalog for efficient tagging, classification, and retrieval.
  • Role-Based Access Control (RBAC)
    Why it matters: Protects sensitive data from unauthorized access.
    Governance actions: Enforce RBAC policies for granular control over access to data, embeddings, and graph relationships.
  • Data Versioning and Lineage
    Why it matters: Tracks changes to ensure reproducibility and transparency.
    Governance actions: Implement data versioning to align vectors and graph entities with source data; map data lineage to ensure provenance.
  • Compliance with Data Sovereignty Laws
    Why it matters: Ensures compliance with regulations on storing and processing sensitive data.
    Governance actions: Store and process data in regions that comply with local regulations (e.g., GDPR, HIPAA).
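As a hedged sketch of the RBAC practice above: retrieved chunks can be filtered against the requesting user's roles before they ever reach the LLM prompt. The chunk structure and role names below are illustrative assumptions, not a specific product's schema:

```python
# Role-based filtering applied at retrieval time: a chunk is only passed
# to the LLM if the user's roles intersect the chunk's allowed roles.
retrieved_chunks = [
    {"text": "Q3 revenue forecast ...", "allowed_roles": {"finance", "exec"}},
    {"text": "Public product FAQ ...", "allowed_roles": {"all"}},
    {"text": "Pending layoff plan ...", "allowed_roles": {"exec"}},
]

def filter_by_role(chunks, user_roles: set[str]):
    return [
        c for c in chunks
        if "all" in c["allowed_roles"] or c["allowed_roles"] & user_roles
    ]

# A finance user sees the finance chunk and the public chunk,
# but never the exec-only material.
visible = filter_by_role(retrieved_chunks, {"finance"})
print(len(visible))  # -> 2
```

Applying the filter before generation (rather than after) ensures sensitive content never enters the prompt at all.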

 

Practices Unique to Vector-Based RAG

  • Embedding Quality and Standards
    Why it matters: Ensures accurate and relevant content retrieval.
    Governance actions: Standardize embedding generation techniques; validate embeddings against real-world use cases.
  • Efficient Indexing and Cataloging
    Why it matters: Optimizes the performance and relevance of vector-based queries.
    Governance actions: Create and maintain dynamic data catalogs linking metadata to vector representations.
  • Data Retention and Anonymization
    Why it matters: RAG often pulls from historical data, making it essential to manage retention periods and anonymize sensitive information.
    Governance actions: Implement policies that balance data usability with compliance and privacy standards.

 

Practices Unique to Graph-Based RAG

  • Ontology Management
    Why it matters: Ensures the accurate representation of relationships and semantics in the knowledge graph.
    Governance actions: Collaborate with domain experts to define and maintain ontologies; regularly validate and update relationships.
  • Taxonomy Management
    Why it matters: Supports the hierarchical classification of knowledge for efficient data organization and retrieval.
    Governance actions: Use automated tools to evolve taxonomies; validate taxonomy accuracy with domain-specific experts.
  • Reference Data Management
    Why it matters: Ensures consistency and standardization of data attributes across the graph.
    Governance actions: Define and govern reference datasets; monitor for changes and propagate updates to dependent systems.
  • Data Modeling for Graphs
    Why it matters: Provides the structural framework necessary for efficient query execution and graph traversal.
    Governance actions: Design graph models that align with business requirements; optimize models for scalability and performance.
  • Graph Query Optimization
    Why it matters: Improves the efficiency of complex queries in graph databases.
    Governance actions: Maintain indexed nodes and monitor query performance.
  • Knowledge Graph Governance
    Why it matters: Ensures the integrity, security, and scalability of the graph-based RAG system.
    Governance actions: Implement version control for graph updates; define governance policies for merging, splitting, and retiring nodes.
  • Provenance Tracking
    Why it matters: Tracks the origin and history of data in the graph to ensure trust and auditability.
    Governance actions: Enable provenance metadata for all graph nodes and edges; integrate with lineage tracking tools.
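The provenance-tracking practice above can be sketched as metadata attached to every edge, so each retrieved relationship can be cited back to its source. The field names and values are illustrative assumptions, not a specific product's schema:

```python
import datetime

# An edge in the graph carries provenance alongside the relationship
# itself, so answers built from it can cite source and load time.
edge = {
    "subject": "Benefits Enrollment",
    "predicate": "managedBy",
    "object": "HR Department",
    "provenance": {
        "source": "hr-handbook-2025.pdf",       # hypothetical source document
        "ingested_at": datetime.date(2025, 1, 15).isoformat(),
        "pipeline_version": "v2.3",
    },
}

def cite(e: dict) -> str:
    """Render a relationship with its source for auditability."""
    p = e["provenance"]
    return f'{e["subject"]} {e["predicate"]} {e["object"]} (source: {p["source"]})'

print(cite(edge))
```

In RDF-based stores, the same idea is typically implemented with named graphs or reification rather than inline dictionaries.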

Refer to Top 5 Tips for Managing and Versioning an Ontology for suggestions on ontology governance. 

Refer to Taxonomy Design Best Practices for more on taxonomy governance.

 

Case Study: Impact of Lack of RAG Governance

  • Inaccurate and Irrelevant Insights: Without proper RAG governance, AI systems may pull outdated or inconsistent information, leading to inaccurate insights and flawed decision-making that can cost organizations time and resources. 
    • “Garbage In, Garbage Out: How Poor Data Governance Poisons AI”
      This article discusses how inadequate data governance can lead to unreliable AI outcomes, emphasizing the importance of proper data management.
      labs.sogeti.com
    • “AI’s Achilles’ Heel: The Consequence of Bad Data”
      This article highlights the critical role of data quality in AI performance and the risks associated with poor data governance.
      versium.com
    • “Understanding the Impact of Lack of Data Governance”
      This resource outlines the risks and consequences of poor data governance, providing insights into how it can affect business operations.
      actian.com
  • Difficulty in Scaling AI Systems: A lack of structured governance limits the scalability of RAG solutions. As the volume of data grows, it becomes harder to ensure that the right information is retrieved and used, resulting in inefficient AI models.
  • Data Silos and Inaccessibility: Without proper metadata management and access control, important knowledge may remain isolated or inaccessible, reducing the effectiveness of AI in providing actionable insights across departments.
  • Compliance and Security Risks: The absence of governance may lead to failures in data sovereignty and privacy requirements, exposing the organization to compliance risks, potential breaches, and reputational damage.
  • Loss of Stakeholder Confidence: As RAG outputs become unreliable and inconsistent, stakeholders may lose confidence in AI-driven decisions, affecting future investment and buy-in from key decision-makers.

 

Conclusion

Effective data governance is crucial for RAG, regardless of the retrieval method. Vector-based RAG relies on embedding standards, efficient indexing, quality controls, and strong metadata management, while Graph RAG demands careful management of ontologies and taxonomies and tracking of data lineage. By applying tailored governance strategies to each type, organizations can maximize the value of their AI systems, ensuring accurate, secure, and compliant data retrieval.

Graph RAG AI is the future of contextual intelligence, offering unparalleled potential to unlock insights from interconnected data. By combining advanced graph technologies with industry-best data governance practices, EK helps organizations transform their data into actionable knowledge while maintaining security and scalability.

As organizations look to unlock the full potential of their data-driven solutions, robust data governance becomes key. EK delivers Graph RAG AI solutions that reflect domain-specific needs, with governance frameworks that ensure data integrity, security, and compliance, and optimizes graph performance for scalable AI-driven insights. Please check out our case studies for more details on how we have helped organizations in similar domains.

Is your organization ready to elevate its RAG initiatives with robust data governance? Contact us to unlock the full potential of your data-driven solutions.

Multimodal Graph RAG (mmGraphRAG): Incorporating Vision in Search and Analytics https://enterprise-knowledge.com/multimodal-graph-rag-mmgraphrag-incorporating-vision-in-search-and-analytics/ Wed, 29 Jan 2025

The post Multimodal Graph RAG (mmGraphRAG): Incorporating Vision in Search and Analytics appeared first on Enterprise Knowledge.

David Hughes, Principal Data & AI Solution Architect at Enterprise Knowledge, presented “Unleashing the Power of Multimodal GraphRAG: Integrating Image Features for Deeper Insights” at Data Day Texas 2025 in Austin, TX on Saturday, January 25th.

In this presentation, Hughes discussed an underexplored dimension of GraphRAG – the integration of images – by introducing Multimodal GraphRAG, an innovative framework that brings image data to the forefront of graph-based reasoning and retrieval. He demonstrated how this approach enables a more comprehensive understanding of images, amplifying both the depth and accuracy of insights. Attendees gained insight into:

  • How mmGraphRAG works;
  • The integration of vision models, hypervectors, and graph databases;
  • BAML agentic workflows; and
  • Real-world applications and benefits for mmGraphRAG.

 

How to Prepare Content for AI https://enterprise-knowledge.com/how-to-prepare-content-for-ai/ Wed, 21 Feb 2024

The post How to Prepare Content for AI appeared first on Enterprise Knowledge.

Artificial Intelligence (AI) enables organizations to leverage and manage their content in exciting new ways, from chatbots and content summarization to auto-tagging and personalization. Most organizations have a copious amount of content and are looking to use AI to improve their operations and efficiency while enabling end users to find relevant information quickly and intuitively. 

With the rise of ChatGPT and other generative AI tools in the last year, there’s a common misconception that you can “do” AI on any content with no preparation. If you want accurate and useful results and insights, however, some upfront work is required. Understanding how AI interacts with your content, and how your content strategy supports AI readiness, will set you up for an effective AI implementation. 

How AI Interacts with Content

While AI can help in many phases of the content lifecycle, from planning and authoring to discovery, AI usually interacts with existing content in two key ways:

1) Comprehension: AI must parse existing content to “understand” an organization’s vernacular or common language. This helps the AI model create statistical models, cluster content and concepts, and create a baseline for addressing future inputs.
2) Search: AI often needs to quickly identify snippets of content related to an input query, chunking longer content into smaller components and searching those components for relevant material. These smaller snippets are often used to gain an understanding of new or updated material.
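The chunking step described in point 2 can be sketched with a naive fixed-size splitter with overlap. Production systems typically split on semantic boundaries such as headings or paragraphs instead, and the sizes here are arbitrary:

```python
# Fixed-size word chunking with overlap: each chunk shares `overlap`
# words with the previous one so context isn't cut mid-thought.
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    words = text.split()
    chunks, step = [], size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last chunk reached the end of the document
    return chunks

# A 100-word document yields three overlapping 40-word chunks.
chunks = chunk_text(" ".join(f"word{i}" for i in range(100)))
print(len(chunks))  # -> 3
```

Each chunk is then embedded and indexed individually, which is why content structure (the next sections) directly affects retrieval quality.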

When AI examines existing content, it is trying to understand what it is about and how it relates to other concepts within the knowledge domain, and there are steps we can take to help. While this blog is mostly considering how large language models (LLMs) and retrieval augmented generation (RAG) AI interact with content, the steps listed below will prepare content for a variety of other types of AI for both insight and action.

Developing a Content Strategy

The best way to prepare content for AI is to develop a content strategy that addresses the relationships, the structure, the cleanup, and the componentization of the content. One key preliminary activity will be to audit your content with the specific lens of AI-readiness, and to assess your organization’s content against the steps listed below.

Model the Knowledge Domain

In most situations, AI creates internal models to group and cluster information to help the AI respond efficiently to new inputs. AI does a decent job of inferring the relationships between information, but organizations can significantly assist this process by defining an ontology. Ontologies enable organizations to define and relate organizational information, codifying how people, tools, content, topics, and other concepts are related. These models improve findability, support advanced search use cases, and form semantic layers that facilitate the integration of data from multiple sources into consumable formats and user-intuitive structures. 

Once created, an ontology can be used with content to:

  • auto-tag content with related organizational information (topics, people, etc.);
  • enable navigation through an organization’s knowledge domain by following relationships; and
  • supply AI with curated models that explain how content connects to the organization’s information, surfacing connections that can lead to key business insights. 

Modeling an organization’s knowledge domain with an ontology improves AI’s ability to utilize content more effectively and produce more accurate results.
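As a rough illustration of the auto-tagging use case above, the sketch below represents a tiny ontology as a Python dictionary. Real ontologies are typically modeled in RDF/OWL with SKOS-style labels; the concepts, synonyms, and department names here are entirely made up for the example.

```python
# A miniature "ontology": concepts, their synonyms, and a relationship to
# another organizational entity. Purely illustrative, not a real model.
ONTOLOGY = {
    "Onboarding": {
        "synonyms": ["new hire", "orientation", "first day"],
        "relatedDepartment": "Human Resources",
    },
    "Expense Reporting": {
        "synonyms": ["reimbursement", "travel costs", "receipts"],
        "relatedDepartment": "Finance",
    },
}

def auto_tag(text):
    """Tag content with every concept whose label or synonym appears in it."""
    lowered = text.lower()
    tags = []
    for concept, details in ONTOLOGY.items():
        terms = [concept] + details["synonyms"]
        if any(term.lower() in lowered for term in terms):
            tags.append(concept)
    return tags
```

Because each tag is a node in the ontology, a tagged document is immediately connected to related people, departments, and topics, which is what enables the navigation and insight use cases described above.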

Clean Up and Deduplicate the Content 

Today’s organizations have too much content and duplicated information. Content is often split between multiple systems due to limitations with legacy tools, user permissions, or the need to support new features and displays. While auditing all of an organization’s content may seem daunting, there are steps an organization can take to streamline the process. Organizations should focus on their NERDy content, identifying the new, essential, reliable, and dynamic content users need to perform their jobs. As part of this focus, organizations reduce content ROT (Redundant, Outdated, Trivial), improving user trust and experience with organizational information. 

As part of the cleaning effort, an organization may want to adopt a centralized authoring platform rather than maintaining content in siloed locations. Managing content in one place reduces the effort to update it and enables content reuse. Reuse, in turn, helps deduplicate content by removing the need to replicate and update the same content in multiple places. A content audit, analysis, and clean-up will organize content in an intuitive way for AI and reduce bias from repeated or incorrect information.
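One common mechanical aid for the deduplication step is near-duplicate detection over normalized text. The sketch below uses word-shingle Jaccard similarity, a simple and well-known technique; the threshold, document IDs, and sample text are illustrative assumptions, and production systems often use more scalable variants such as MinHash.

```python
import re

def shingles(text, n=3):
    """Normalize text and build the set of overlapping n-word shingles."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def find_near_duplicates(docs, threshold=0.8):
    """Return pairs of document ids whose textual overlap exceeds the threshold."""
    pairs = []
    ids = list(docs)
    for i, d1 in enumerate(ids):
        for d2 in ids[i + 1:]:
            if jaccard(docs[d1], docs[d2]) >= threshold:
                pairs.append((d1, d2))
    return pairs
```

Flagged pairs still need a human decision about which copy is the authoritative one, but automating the detection makes a full-corpus audit far less daunting.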

Add Structure and Standardization

Once your organization’s knowledge domain is defined, the next step is to create the content models and content types that support that ontology; this is often referred to as content engineering.

Content types are the reusable templates that standardize the structure for a particular format of content such as articles, infographics, webinars, and product information, as well as the standard metadata that should be included with that content type (created date, author, department, related subjects, etc.).

Example of how each type of cake (bundt, round layered, and cupcake) needs its own cake pan, or content type template.

If we think of content types as the cake pan in this analogy, a content model is the cake recipe. While the content type defines the structure of the content, the content model defines the meaning of that content. In the cake analogy, you may have a chocolate cake, a vanilla cake, and a carrot cake; theoretically, any of those recipes could be used in any of the pans. If the content type dictates how, the content model dictates what. In an organization this could look like a content model of a product that includes parts like the product title, the product value proposition, the product features, etc. This content model of a product could then be fit into many content types, such as a brochure, a web page, and an infographic. By creating content models and content types we give the AI model better insight into how the content is connected and the purpose it serves.

The structure of these templates provides AI with content in a consumable and semantically meaningful format where content sections and metadata are given to the AI model. A crucial part of content engineering is the creation of a taxonomy to describe the content. Taxonomies should be user-centric, highlighting users’ terminology to talk about content. The terms within a taxonomy and the associated synonyms improve an AI’s capability to utilize the content. Additionally, content types and content models facilitate the consistent display of information and configuration of advanced search features, improving the user experience when looking for and viewing content.
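A content type can be expressed in code as a simple schema. The sketch below models a hypothetical “Article” content type as a Python dataclass, with named sections plus the standard metadata fields mentioned above; the field names and the record format are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Article:
    """An illustrative "Article" content type: named sections plus metadata."""
    title: str
    summary: str
    body: str
    author: str
    department: str
    created: date
    related_subjects: list = field(default_factory=list)  # taxonomy terms

    def to_record(self):
        """Emit a structured, machine-readable form suitable for an AI pipeline."""
        return {
            "type": "article",
            "sections": {"title": self.title, "summary": self.summary, "body": self.body},
            "metadata": {
                "author": self.author,
                "department": self.department,
                "created": self.created.isoformat(),
                "relatedSubjects": self.related_subjects,
            },
        }
```

Handing an AI model this kind of labeled structure, rather than an undifferentiated wall of text, is what lets it distinguish a summary from a body and filter or rank content by its metadata.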

Componentize the Content

Once the content is structured and cleaned, a common next step is to break up the content into smaller sections according to the content model. This process has many names, such as content deconstruction, content chunking, or the creation of content components. In content deconstruction, structured content is split into smaller semantically meaningful sections. Each section or component has a standalone purpose, even without the context of the original document. Content components are often managed in a component content management system (CCMS), providing the following benefits:

  • Users (and AI) can quickly identify relevant sections of larger content.
  • Authors can reuse content components across multiple documents.
  • Content components can have associated metadata, enabling systems to personalize the content that users see based on their profiles.
  • Dynamic content delivery becomes possible, assembling components on the fly.

Similar to the user benefits, content components provide AI with human-curated chunks of content as opposed to requiring the AI to perform statistical chunking. These content chunks allow an AI to identify relevant text quickly and more accurately than if it were fed entire large documents.
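The deconstruction step described above can be sketched as follows: split a structured document along its content model (here, markdown-style section headings stand in for the model’s sections) and attach metadata to each resulting component. The content type name, heading convention, and metadata fields are illustrative assumptions.

```python
import re

def componentize(document, content_type, metadata):
    """Split structured content into standalone components along its content
    model, attaching content-type and metadata information to each one."""
    components = []
    # Split on headings like "## Features"; keep each heading with its body.
    parts = re.split(r"^## +", document, flags=re.MULTILINE)
    for part in filter(None, (p.strip() for p in parts)):
        heading, _, body = part.partition("\n")
        components.append({
            "contentType": content_type,
            "section": heading.strip(),
            "body": body.strip(),
            "metadata": metadata,  # inherited; a CCMS lets you refine per component
        })
    return components
```

Unlike the fixed-size statistical chunking in the earlier retrieval sketch, each component here is a semantically complete unit with a purpose of its own, which is precisely what makes it reusable and independently retrievable.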

Conclusion

Through effective content strategy, content audit, and content engineering, organizations can efficiently manage information and ensure that AI has correct, comprehensive content with semantically meaningful relationships. A well-defined content strategy provides a framework to curate old and new information, allowing organizations to continuously feed information into AI, keeping its internal models up-to-date. A well-structured content audit will ensure preparation time is spent on the areas that will make the most difference in AI-readiness, such as structure, standardization, componentization, and relationships across content. Well-thought-out content engineering will enable content reuse and personalization at scale through machine-readable structure. 

Are you seeking help defining a content strategy, auditing your content for AI-readiness, or training your AI to understand your domain? Contact us and let us know how we can help!

Special thank you to James Midkiff for his contributions to the first draft of this blog post!

]]>