Urmi Majumder, Author at Enterprise Knowledge https://enterprise-knowledge.com

Enterprise AI Architecture Series: How to Inject Business Context into Structured Data using a Semantic Layer (Part 3) https://enterprise-knowledge.com/enterprise-ai-architecture-inject-business-context-into-structured-data-semantic-layer/ Wed, 26 Mar 2025 14:55:28 +0000

Introduction

AI has attracted significant attention in recent years, prompting me to explore enterprise AI architectures through a multi-part blog series this year. Part 1 of this series introduced the key technical components required for implementing an enterprise AI architecture. Part 2 discussed our typical approaches and experiences in structuring unstructured content with a semantic layer. In the third installment, we will focus on leveraging structured data to power enterprise AI use cases.

Today, many organizations have developed the technical ability to capture enormous amounts of data to power improved business operations or compliance with regulatory bodies. For large organizations, this data collection process is typically decentralized so that organizations can move quickly in the face of competition and regulations. Over time, such decentralization results in increased complexities with data management, such as inconsistent data formats across various data platforms and multiple definitions for the same data concept. A common example in EK’s engagements includes reviewing customer data from different sources with variations in spelling and abbreviations (such as “Bob Smith” vs. “Robert Smith” or “123 Main St” vs. “123 Main Street”), or seeing the same business concept (such as customer or supplier) being referred to differently across various departments in an organization.  Obviously, with such extensive data quality and inconsistency issues, it is often impossible to integrate and harmonize data from the diverse underlying systems for a 360-degree view of the enterprise and enable cross-functional analysis and reporting. This is exactly the problem a semantic layer solves.  

A semantic layer is a business representation of data that offers a unified and consolidated view of data across an organization. It establishes common data definitions, metadata, categories and relationships, thereby enabling data mapping and interpretation across all organizational data assets. A semantic layer injects intelligence into structured data assets in an organization by providing standardized meaning and context to the data in a machine-readable format, which can be readily leveraged by Artificial Intelligence (AI) systems. We call this process of embedding business context into organizational data assets for effective use by AI systems knowledge intelligence (KI).  Providing a common understanding of structured data using a semantic layer will be the focus of this blog. 

How a Semantic Layer Provides Context for Structured Data 

A semantic layer provides AI with a programmatic framework to make organizational context and domain knowledge machine readable. It does so by using one or more components such as metadata, business glossaries, taxonomies, ontologies, and knowledge graphs. Specifically, it helps enterprise AI systems:

  • Leverage metadata to power understanding of the operational context;
  • Improve shared understanding of organizational nomenclature using business glossaries;
  • Provide a mechanism to categorize and organize the same data through taxonomies and controlled vocabularies;
  • Encode domain-specific business logic and rules in ontologies; and
  • Enable a normalized view of siloed datasets via knowledge graphs.

Embedding Business Context into Structured Data: An Architectural Perspective

The figure below illustrates how the semantic layer components work together to enable Enterprise AI. It shows the key integration patterns via which structured data sources can be connected using a knowledge graph in the KI layer, including batch and incremental data pull using declarative and custom data mappings, as well as data virtualization.

Enterprise AI Architecture: Injecting Business Context into Structured Data using a Semantic Layer

AI models can reason and infer based on explicit knowledge encoded in the graph. This is achieved when both the knowledge or data schema (e.g. ontology) and its instantiation are represented in the knowledge graph. This representation is made possible through a custom service that allows the ontology to be synchronized with the graph (labeled as Ontology Sync with Graph in the figure) and graph construction pipelines described above.

Enterprise AI can derive additional context on linked data when taxonomies are ingested into the same graph via a custom service that allows the taxonomy to be synchronized with the graph (labeled as Taxonomy Sync with Graph in the figure). This is because taxonomies can be used to consistently organize this data and provide clear relationships between different data points. Finally, technical metadata collected from structured data sources can be connected with other semantic assets in the knowledge graph through a custom service that allows this metadata to be loaded into the graph (labeled as Metadata Load into Graph in the figure). This brings in additional context regarding data sourcing, ownership, versioning, access levels, entitlements, consuming systems and applications into a single location.
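
To make these sync and load services more concrete, the sketch below uses the open-source rdflib library (an assumption for illustration; any RDF toolkit or graph database loader would work) to show how an ontology class, a taxonomy concept, an instantiated record, and technical metadata can coexist in one graph. All URIs, class names, and property names are illustrative placeholders rather than details of a client implementation.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, SKOS

# Illustrative namespaces; a real deployment would use the organization's own URIs
EX = Namespace("https://example.org/ontology/")
DATA = Namespace("https://example.org/data/")

g = Graph()
g.bind("ex", EX)
g.bind("skos", SKOS)

# Ontology sync: a class and a property from the domain ontology
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.hasSegment, RDF.type, RDF.Property))

# Taxonomy sync: a controlled vocabulary concept for customer segments
g.add((EX.RetailSegment, RDF.type, SKOS.Concept))
g.add((EX.RetailSegment, SKOS.prefLabel, Literal("Retail", lang="en")))

# Graph construction pipeline: an instance pulled from a structured source
customer = DATA["customer/123"]
g.add((customer, RDF.type, EX.Customer))
g.add((customer, RDFS.label, Literal("Robert Smith")))
g.add((customer, EX.hasSegment, EX.RetailSegment))

# Metadata load: technical context about where the record came from
g.add((customer, EX.sourceSystem, Literal("crm_postgres")))
g.add((customer, EX.sourceTable, Literal("public.customers")))

print(g.serialize(format="turtle"))
```

Because the schema, the vocabulary, and the instances live in the same graph, a downstream AI application can traverse from a customer record to its segment definition and its source system in a single query.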

As is evident from the figure above, a semantic layer enables data from different sources to be quickly mapped and connected using a variety of mapping techniques, thus enabling a unified, consistent, and single view of data for use in advanced analytics. In addition, by injecting business context into this unified view via semantic assets such as taxonomies, ontologies, and glossaries, organizations can power AI applications ranging from semantic recommenders and knowledge panels to traditional machine learning (ML) model training and LLM-powered AI agents.

Case Studies & Enterprise Applications

In many engagements, EK has used semantic layers with structured data to power various use cases, from enterprise 360 to AI enablement. As part of enterprise AI engagements, a common issue we’ve seen is a lack of business context surrounding data. AI engineers continue to struggle to locate relevant data and ensure its suitability for specific tasks, hindering model selection and leading to suboptimal results and abandoned AI initiatives. These experiences show that raw data lacks inherent value; it becomes valuable only when contextualized for its users. Semantic layers provide this context to both AI models and AI teams, driving successful Enterprise AI endeavors.

Last year, a global retailer partnered with EK to overcome delays in retrieving store performance metrics and creating executive dashboards. Their centralized data lakehouse lacked sufficient metadata, hindering engineers from locating and understanding crucial metrics. By standardizing metadata, aligning business glossaries, and establishing a taxonomy, we empowered their data visualization engineers to perform self-service analytics and rapidly create dashboards. This streamlined their insight generation without relying on source data system owners and IT teams. You can read more about how we helped this organization democratize their AI efforts using a semantic layer here.

In a separate case, EK facilitated the rapid development of AI models for a multinational financial institution by integrating business context into the company’s structured risk data through a semantic layer. The semantic layer expedited data exploration, connection, and feature extraction for the AI team, leading to the efficient implementation of enterprise AI systems like intelligent search engines, recommendation engines, and anomaly detection applications. EK also integrated AI model outputs into the risk management graph, enabling the development of proactive alerts for critical changes or potential risks, which, in turn, improved the productivity and decision-making of the risk assessment team.

Finally, consider the significant role a semantic layer plays in reducing data cleaning efforts and streamlining data management. Research consistently shows AI teams spend more time cleaning data than modeling it to produce valuable insights. By connecting previously siloed data using an identity graph, EK helped a large digital marketing firm gain a deeper understanding of its customer base through behavior and trend analytics. This solution resolved the discrepancy between 2 billion distinct records in their relational databases and the actual user base of 240 million.

Closing

Semantic layers effectively represent complex relationships between data objects, unlike traditional applications built for structured data. This allows them to support highly interconnected use cases like analyzing supply chains and recommendation systems. To adopt this framework, organizations must shift from an application-centric to a data-centric enterprise architecture. A semantic layer ensures that data retains its meaning and context when extracted from a relational database. In the AI era, this metadata-first framework is crucial for staying competitive. Organizations need to provide their AI systems with a consolidated, context-rich view of all transactional data for more accurate predictions. 

This article completes our discussion about the technical integration between semantic layers and enterprise AI, introduced here. In the next segment of this KI architecture blog series, we will move on to the second KI component and discuss the technical approaches for encoding expert knowledge into enterprise AI systems.

To get started with leveraging structured data, building a semantic layer, and the KI journey at your organization, contact EK!

Beyond Content Management for Real Knowledge Sharing https://enterprise-knowledge.com/beyond-content-management-for-real-knowledge-sharing/ Wed, 19 Feb 2025 15:41:42 +0000

Enterprise Knowledge’s Urmi Majumder and Maryam Nozari presented “AI-Based Access Management: Ensuring Real-time Data and Knowledge Control” on November 21 at KMWorld in Washington, D.C.

In this presentation, Majumder and Nozari explored the crucial role of AI in enhancing data governance through Role-Based Access Control (RBAC), and discussed how the Adaptable Rule Foundation (ARF) system uses metadata and labels to classify and manage data effectively across three levels: Core, Common, and Unique. This system allows domain experts to participate actively in the AI-driven governance processes without needing deep technical expertise, promoting a secure and collaborative development environment.

Check out the presentation below to learn how to:

  • Implement AI to streamline RBAC processes, enhancing data security and operational efficiency.
  • Understand the impact of real-time data control on organizational dynamics.
  • Enable domain experts to contribute securely and effectively to the AI development process.
  • Leverage the ARF system to adapt security measures tailored to the specific needs of various organizational units.

Enterprise AI Architecture Series: How to Extract Knowledge from Unstructured Content (Part 2) https://enterprise-knowledge.com/enterprise-ai-architecture-series-how-to-extract-knowledge-from-unstructured-content-part-2/ Fri, 14 Feb 2025 18:06:15 +0000

Our CEO, Zach Wahl, recently noted in his annual KM trends blog for 2025 that Knowledge Management (KM) and Artificial Intelligence (AI) are really two sides of the same coin, detailing this idea further in his seminal blog introducing the term Knowledge Intelligence (KI). In particular, KM can play a big role in structuring unstructured content and making it more suitable for use by enterprise AI. Injecting knowledge into unstructured data using taxonomies, ontologies, and knowledge graphs will be the focus of this blog, which is Part 2 in the Knowledge Intelligence Architecture Series. I will also describe our typical approaches and experience with mining knowledge out of unstructured content to develop taxonomies and knowledge graphs. As a refresher, you can review Part 1 of this series, where I introduced the high-level technical components needed for implementing any KI architecture.

 

Role of NLP in Structuring Unstructured Content

Natural language processing (NLP) is a machine learning technique that gives computers the ability to interpret and understand human language. According to most industry estimates, 80-90% of an organization’s data is considered to be unstructured, most of which originates from emails, chat messages, documents, presentations, videos, and social media posts. Extracting meaningful insights from such unstructured content can be difficult due to its lack of predefined structure. This is where NLP techniques can be immensely useful. NLP works through the differences in dialects, metaphors, variations in sentence structure, grammatical irregularities, and usage exceptions that are common in such data and structures it effectively. A common NLP task for analyzing unstructured content and making it machine readable is content classification. This process categorizes text into predefined classes by identifying keywords that indicate the topic of the text. 

Over the past decade, we have employed numerous NLP techniques across our typical knowledge and data management engagements, focusing on unstructured content classification. With the emergence of Large Language Models (LLMs), traditional NLP tasks can now be executed with higher precision and recall while requiring significantly less development effort. The section below presents a broad, though not exhaustive, range of NLP strategies incorporating both traditional ML and cutting-edge LLMs, along with the inherent pattern recognition capabilities in vendor platforms, for content understanding and classification. Specifically, for each approach it describes the underlying architecture, illustrating the steps involved in adding context to unstructured content using semantic data assets, along with some relevant case studies.

 

1. Transfer Learning for Content Classification

Transfer learning is a method in which a deep learning model trained on a large dataset is applied to a similar task using a different dataset. Starting with a pre-trained model that has already learned linguistic patterns and structures from a significant volume of data eliminates the need for extensive labeled datasets and reduces training time. Since the release of the BERT (Bidirectional Encoder Representations from Transformers) language model in 2018, we have extensively utilized transfer learning to analyze and categorize unstructured content using a predefined classification scheme for our clients. In fact, it is often our preferred approach for entity extraction when instantiating a knowledge graph as it supports a scalable and maintainable solution for the enterprise. 

 

Enterprise AI Architecture: Transfer Learning for Content Classification

 

As illustrated in the figure above, unstructured data in the enterprise can originate from many different systems beyond the conventional ones such as content management systems and websites. Such sources can include communication channels such as emails, instant messaging systems, and social media platforms, as well as digital asset management platforms to centrally store, organize, manage, and distribute media files such as images, video, and audio files within the organization. As machine learning can only work with textual data, depending on the type of content, the first step in implementing transfer learning is to employ appropriate text extraction and transformation algorithms to make the data suitable for use. Next, domain SMEs label a small chunk of the clean data to fine-tune the selected pretrained AI model with a predefined classification scheme (also provided by the domain SMEs). Post-training, the fine-tuned model is deployed to production and is available for content classification. At this stage, organizations can run their content through the operationalized transfer learning based content classification pipeline and store it in a centralized metadata repository such as a data catalog or even a simple object store that, in turn, can be used to power multiple enterprise use cases from data discovery to data analytics.
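
As a minimal sketch of the fine-tuning step, the example below assumes the Hugging Face transformers and datasets libraries; the model name, the two-label classification scheme, and the sample texts are placeholders standing in for the SME-labelled data described above.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Small SME-labelled sample; in practice this comes from the cleaned text extracts
examples = {
    "text": ["Invoice dispute raised by supplier", "Onboarding guide for new analysts"],
    "label": [0, 1],  # 0 = Finance, 1 = Human Resources (placeholder scheme)
}
dataset = Dataset.from_dict(examples)

model_name = "bert-base-uncased"  # any suitable pre-trained encoder works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf-checkpoints", num_train_epochs=3),
    train_dataset=tokenized,
)
trainer.train()  # the fine-tuned model can now classify new content in the pipeline
```

Once trained, the resulting model can be served behind the operationalized classification pipeline and retrained as SMEs label additional content.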

Transfer learning is one of the popular techniques we employ for entity extraction from unstructured content in our typical knowledge graph accelerator engagements. It is one of our criteria when evaluating data fabric solution vendors – especially in the case of a multinational pharmaceutical company. This is because transfer learning can easily grow with inputs from domain SMEs (with respect to data labelling and classification scheme definition) to tailor the machine prediction to the organizational needs and sustain the classification effort without extensive machine learning training. However, this does not mean that machine learning (ML) expertise is not required. For organizations that lack the internal skills to build and maintain custom ML pipelines, the following content classification approaches may be useful.

 

2. Taxonomy Manager-Driven Content Classification

Most modern Taxonomy Ontology Management Systems (TOMS) include a classification engine that supports automatic text classification based on a defined taxonomy. In our experience, organizations with access to a TOMS but without dedicated AI teams to develop and maintain custom ML models prefer using built-in classification capabilities of TOMS to categorize and structure their unstructured content. 

 

Enterprise AI Architecture: Taxonomy Manager Driven Content Classification

 

While there are variations across TOMS vendors in how they classify unstructured content using a taxonomy (such as leveraging just textual metadata or using structural relationships between taxonomy concepts to categorize content), as shown in the figure above, the high-level architecture integrating a TOMS with enterprise systems managing unstructured content and leveraging TOMS-generated metadata is generally independent of specific TOMS platforms. In this architecture, when an information architect deems a taxonomy ready for use, they publish the corresponding classification rules to the TOMS-specific classification engine. Typically, organizations configure custom change listeners for taxonomy publication. This helps them decide when to tag their unstructured content with the published rules and store these tags in a central metadata repository to power many use cases in the enterprise. Sometimes, however, TOMS platforms offer native connectors for specific CMSs such as SharePoint or WordPress to manage automatic tagging of their delta content upon the publication of a new taxonomy version.
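
As a rough sketch of this pattern, the example below shows a publication listener that re-tags changed content through a TOMS classification endpoint and writes the resulting tags to a central metadata repository. Both endpoints and their payloads are hypothetical; each TOMS vendor exposes its own API, so this illustrates the shape of the integration rather than a specific product's interface.

```python
import requests

TOMS_CLASSIFY_URL = "https://toms.example.org/api/classify"   # hypothetical endpoint
METADATA_REPO_URL = "https://metadata.example.org/api/tags"   # hypothetical endpoint

def on_taxonomy_published(taxonomy_id: str, changed_documents: list[dict]) -> None:
    """Invoked by a webhook or scheduler when a new taxonomy version is published."""
    for doc in changed_documents:
        # Ask the TOMS classification engine to tag the document text
        response = requests.post(
            TOMS_CLASSIFY_URL,
            json={"taxonomyId": taxonomy_id, "text": doc["text"]},
            timeout=30,
        )
        response.raise_for_status()
        tags = response.json().get("concepts", [])

        # Persist the tags in the central metadata repository
        requests.post(
            METADATA_REPO_URL,
            json={"documentId": doc["id"], "tags": tags},
            timeout=30,
        ).raise_for_status()
```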

We work with many leading TOMS vendors in our typical taxonomy accelerator engagements, and you can learn more about specific use cases and success stories in our knowledge base regarding the application of this approach when it comes to powering content discovery – from a knowledge portal in a global investment firm to creating more personalized customer experiences using effective content assembly at a financial solutions provider organization.

 

3. LLM-Powered Content Classification

With the rise of LLMs in recent years, we have been working with various prompting techniques to effectively classify text using LLMs in our engagements. Based on our experimentation, we have found that a few-shot prompting approach, in which the language model is provided with a small set of labelled examples along with a prompt to guide the classification of unstructured content, achieves high accuracy in text classification tasks even with limited labeled data. This does not, however, deemphasize the need for designing effective prompts to improve accuracy of the in-context learning approach that is central to any prompt engineering technique.

 

Enterprise AI Architecture: LLM-Powered Content Classification

 

As illustrated in the figure above, a prompt in a few-shot learning approach to content classification includes the classification scheme and labelled examples from domain SMEs in addition to the raw text we need the LLM to classify. But because of the limitations of the context window for most state-of-the-art (SOTA) LLMs, the input text often needs to be chunked after preprocessing and cleaning to abide by the length limitations of the prompt (also shown in the figure above). What is not included in the image, however, are the LLM optimization techniques we often employ to improve the classification task performance at scale. It is widely accepted that any natural language processing (NLP) task that requires interaction with an LLM, which is often hosted on a remote server, will not be performant by default. Therefore, in our typical engagements, we go beyond basic prompt engineering and employ optimization techniques such as caching prior responses, batching multiple requests into one prompt, and classifying multiple chunks in parallel to implement a scalable content classification solution for the enterprise.
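
The sketch below illustrates this pattern with the OpenAI Python client (an assumed choice; any hosted or local LLM endpoint could be substituted). The model name, classification scheme, few-shot examples, and chunk size are placeholders, and the in-memory cache stands in for whatever response store a production deployment would use.

```python
from functools import lru_cache
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
FEW_SHOT = (
    "Classify the text into exactly one of: Risk, Compliance, Operations.\n"
    "Text: 'Quarterly audit checklist' -> Compliance\n"
    "Text: 'Vendor outage contingency plan' -> Risk\n"
)

def chunk(text: str, max_chars: int = 4000) -> list[str]:
    """Naive chunking to stay within the prompt length limit."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

@lru_cache(maxsize=10_000)  # cache prior responses for repeated chunks
def classify_chunk(chunk_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"{FEW_SHOT}Text: '{chunk_text}' ->"}],
    )
    return response.choices[0].message.content.strip()

def classify_document(text: str) -> list[str]:
    # Chunks could also be batched into one prompt or classified in parallel
    return [classify_chunk(c) for c in chunk(text)]
```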

Last year, we used the LLM-powered content classification approach when we completed a knowledge graph accelerator project with a public safety agency in Europe, where we could not use a TOMS-driven content classification approach to instantiate a knowledge graph. This is because of the risks associated with sensitive data transfer out of Azure's Northern European region where the solution was hosted and into the infrastructure of the hosted TOMS platform (which was outside the allowed region). In this case, an LLM-powered content classification approach such as few-shot prompting allowed us to develop the solution by extracting entities from their unstructured content and instantiating a knowledge graph that facilitated context-based, data-driven decision making for construction site planners at the agency.

More recently, we used the LLM-powered content classification approach when we engaged with a non-profit charitable organization to analyze their healthcare product survey data to understand its adoption in a given market and demographic and ultimately inform future product development. We developed a comprehensive list of product adoption factors that are not easily identified and included in product research. We then leveraged this controlled vocabulary of product adoption factors and Azure OpenAI models to classify the free form survey responses and understand the distinct ways in which these factors influence each other, thus contributing to a more nuanced understanding of how users make decisions related to the product. This enhanced pattern detection approach enabled a holistic view of influencing factors, addressing knowledge gaps in future product development efforts at the organization.

 

4. AI-Augmented Topic Taxonomy Generation

Up until this point in the article, we have focused on using taxonomies to structure unstructured content. We will now shift to using machine learning to analyze unstructured content, propose taxonomies, and create knowledge graphs with AI. We will discuss how, in recent years, LLMs have simplified entity and relationship extraction, enabling more organizations to incorporate knowledge graphs into their data management.

While we generally do not advise our clients to use LLMs without a human-in-the-loop process to create production grade domain taxonomies, we have used LLMs in past engagements to augment and support our taxonomic experts in naming latent topics in semantically grouped unstructured content and therefore create a very rough draft version of a topic taxonomy.  

Elaborating on the figure below, our approach centers on four key tasks: 

  1. Unsupervised clustering of the dataset, 
  2. Discovering latent themes within each cluster, 
  3. Creating a topic taxonomy based on these themes, and
  4. Engaging taxonomists and domain experts to validate and enhance taxonomy.

Because of the token limits inherent in all SOTA embedding models, once raw text is extracted from unstructured content, preprocessed, and cleaned, it has to be chunked before numerical representations that encapsulate semantic information called embeddings can be created by the embedding generation service and stored in the vector database. The embedding generation service may optionally include quantization techniques to address the high memory requirements for managing embeddings of a large dataset. Post-embedding generation, the taxonomy generation pipeline focuses on semantic similarity calculation. While semantic similarity between the underlying content or corpora can be trivially computed as the inner product of embeddings, for scalability reasons, we typically project the embeddings from their original high-dimensional space to lower dimensions, while also preserving their local and global data structures. At this point, the content clustering service will be able to use the embeddings as input features of a clustering algorithm, enabling the identification of related categories based on embedding distances. The next step in the process of autogenerating taxonomy concepts is to infer the latent topic of each cluster using an LLM as part of the latent topic identification service. Finally, a draft taxonomy is available for validation and update by domain experts before it can be used to power enterprise use cases from data discovery to analytics. 

 

Enterprise AI Architecture: AI-Augmented Topic Taxonomy Generation
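
The sketch below traces those four tasks with commonly used open-source components: sentence-transformers for embeddings, UMAP for dimensionality reduction, HDBSCAN for clustering, and an LLM call for naming latent topics. The specific libraries, model names, and placeholder texts are illustrative assumptions rather than a fixed stack, and the output is only a draft for taxonomists and domain experts to validate.

```python
from collections import defaultdict

import hdbscan
import umap
from openai import OpenAI
from sentence_transformers import SentenceTransformer

# Placeholder corpus standing in for thousands of chunked free-text descriptions
texts = [f"placeholder risk description {i}" for i in range(500)]

# 1. Unsupervised clustering: generate embeddings for each chunk
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)

# Project to a lower-dimensional space while preserving local and global structure
reduced = umap.UMAP(n_components=5, metric="cosine").fit_transform(embeddings)

# Cluster semantically similar chunks; -1 marks noise points
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(reduced)

clusters = defaultdict(list)
for text, label in zip(texts, labels):
    if label != -1:
        clusters[label].append(text)

# 2 & 3. Discover the latent theme of each cluster and draft a topic concept
client = OpenAI()
draft_concepts = {}
for label, members in clusters.items():
    prompt = "Suggest a concise topic label for these texts:\n" + "\n".join(members[:10])
    reply = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    draft_concepts[label] = reply.choices[0].message.content.strip()

# 4. Hand draft_concepts to taxonomists and domain experts for validation and refinement
```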

 

We have used this very approach to taxonomy generation in production at a multinational bank, enabling consumer-grade semantic capabilities for non-financial risk management by collapsing their original risk dataset of 20,000 free-text risk descriptions into a streamlined set of 1,100 standardized risk taxonomy concepts.

 

5. AI-Augmented Knowledge Graph Construction

AI assistance for extracting entities and relationships from unstructured content can utilize methods ranging from transfer learning to LLM prompting. In our experience, incorporating the schema as part of the latter technique greatly enhances the consistency of entity and relationship labeling. Before loading the extracted entities and relationships into a knowledge graph, LLMs, as well as heuristics defined by domain SMEs, can be used to further disambiguate those entities.

 

Enterprise AI Architecture: AI-Augmented Knowledge Graph Construction

 

Our typical approach for leveraging AI to construct a knowledge graph is depicted in the figure above. It starts with unstructured content processing techniques to generate raw text from which entities can be extracted. Coreference resolution, where all mentions of the same real-world entity are replaced by the corresponding noun phrase, often forms the first step of the entity extraction process. In the next step, whether or not we can employ some of the techniques described in the taxonomy-driven content classification section for entity extraction depends on the underlying ontology (knowledge model or data schema) and how many of the classes in this data model can be instantiated with a corresponding taxonomy. Even for non-taxonomy classes, we can use transfer learning and prompt engineering to accelerate the extraction of instances of ontological classes from the raw text. Next, we can optionally process the extracted entities through an entity resolution pipeline to identify and connect instances of the same real-world entity within and across content sources into a distilled representation. In the last step of the entity extraction process, if applicable, we can further disambiguate extracted entities by linking them to corresponding entries in a public or private knowledge base (such as Wikidata). Once entities are available, it is time to relate these entities following the ontology to complete the knowledge graph instantiation process. Similar to entity extraction, an array of machine learning techniques, ranging from traditional supervised and unsupervised learning to more modern transfer learning and prompt engineering, can be used for relationship classification. For example, when developing a knowledge graph-powered recommendation engine connecting learning content and product data, we compared the efficacy of an unsupervised learning approach (e.g., a similarity index) for predicting relationships between entities with that of a supervised learning approach (e.g., a link classifier).
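
As a simplified illustration of the extraction and instantiation steps, the sketch below prompts an LLM for schema-constrained entities and relationships and loads the result into an RDF graph using rdflib. The ontology classes, prompt wording, and model name are illustrative assumptions; a production pipeline would wrap this core with the coreference resolution, entity resolution, and SME-defined heuristics described above.

```python
import json

from openai import OpenAI
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://example.org/ontology/")   # illustrative ontology namespace
client = OpenAI()

SCHEMA_PROMPT = (
    "Extract entities and relationships from the text using only these classes: "
    "Person, Organization, Project, and only the relationship worksOn. "
    'Return JSON: {"entities": [{"id": "...", "type": "...", "label": "..."}], '
    '"relationships": [{"source": "...", "type": "worksOn", "target": "..."}]}'
)

def text_to_graph(raw_text: str, graph: Graph) -> Graph:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"{SCHEMA_PROMPT}\n\nText: {raw_text}"}],
        response_format={"type": "json_object"},
    )
    extraction = json.loads(reply.choices[0].message.content)

    # Instantiate entities as typed nodes following the ontology
    for entity in extraction["entities"]:
        node = EX[entity["id"]]
        graph.add((node, RDF.type, EX[entity["type"]]))
        graph.add((node, RDFS.label, Literal(entity["label"])))

    # Relate the entities according to the extracted relationships
    for rel in extraction["relationships"]:
        graph.add((EX[rel["source"]], EX[rel["type"]], EX[rel["target"]]))
    return graph

kg = text_to_graph("Ana Lopez works on the Atlas migration project at Acme Corp.", Graph())
print(kg.serialize(format="turtle"))
```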

 

Closing

While structuring unstructured content with semantic assets has been the focus of this blog, it is clear that this work is only effective when it incorporates an organization's most valuable knowledge asset, its human expertise, alongside all types of data. While I will delve deeper into the technical details of how to encode this expert knowledge into enterprise AI systems in a later segment of this KI architecture blog series, it is evident from the discussion above that mining knowledge from an organization's vast amount of unstructured content will not be possible without domain expertise. As our case studies illustrate, these complementary techniques for knowledge extraction, topic modeling, and text classification, when combined with domain expertise, can help organizations achieve true KI. In the next segment of this blog series, I will explore the technical approaches for providing standardized meaning and context to structured data in the enterprise using a semantic layer. In the meantime, if our case studies describing how we brought structure to our clients' unstructured content through metadata resonate with you, contact us to help get you started with KI.

Enterprise AI Architecture Series: How to Build a Knowledge Intelligence Architecture (Part 1) https://enterprise-knowledge.com/enterprise-ai-architecture-series-how-to-build-a-knowledge-intelligence-architecture-part-1/ Tue, 04 Feb 2025 18:07:47 +0000

Since the launch of ChatGPT over two years ago, we have observed that our clients are increasingly drawn to the promise of AI. They also recognize that large language models (LLMs), trained on public data sets, may not effectively solve their domain-specific problems. Consequently, it is essential to integrate domain knowledge into these AI systems to furnish them with a structured understanding of the organization. Recently, my colleague Lulit Tesfaye described three key strategies to enable such knowledge intelligence (KI) in the organization via expert knowledge capture, business context embedding, and knowledge extraction using semantic layer assets and Retrieval Augmented Generation (RAG). Incorporating such a knowledge intelligence layer into enterprise architecture is not just a theoretical concept anymore but a critical necessity in the age of AI. It is a practical enhancement that transforms the way in which organizations can inject knowledge into their AI systems to allow better interpretation of data, effective reasoning, and informed decision-making.

When designing and implementing KI layers at client organizations, our goal is always to recommend an architecture that aligns closely with their existing enterprise architecture,  providing a minimally disruptive starting point.

In this article, I will describe the common architectural patterns we have utilized over the last decade to design and implement KI strategies such as automated knowledge capture, semantic layers, and RAG across a diverse set of organizations. I will describe the key components of a KI layer, outlining their relationship with organizational data sources and applications through a high-level conceptual framework. In subsequent blogs, I will delve deeper into each of the three main strategies, detailing how KI integrates institutional knowledge, business context, and human expertise to deliver on the promise of AI for the enterprise.

Enterprise AI Architecture: Knowledge Intelligence

Semantic Layer

A semantic layer provides standardized meaning and business context to aggregated data assets in an organization, allowing AI models to understand and process information more accurately and generate more relevant insights. Specifically, it can offer a more intuitive and connected representation of organizational data entities without having to physically move the data, and it does so through the use of metadata, business glossaries, taxonomies, ontologies, and knowledge graphs.

When implementing a semantic layer, we often encounter this common misconception that a semantic layer is a single product such as a graph database or a data catalog. While we have been developing the individual components of a semantic layer for almost a decade, we have only been integrating them all into a semantic layer in the last couple of years. You can learn more about the typical semantic layer architectures that we have implemented for our clients here. For implementing specific components of the semantic layer before they can all be integrated into a logical abstraction layer over enterprise data, we work with most top vendors in the space and leverage our proprietary vendor evaluation matrix to identify the appropriate tool for our client whether it is a taxonomy ontology management platform (TOMS), a graph database or a data catalog. You can read this article to learn more about our high level considerations when choosing any knowledge management platform including semantic layer tools.

Expert Knowledge Capture

This KI component programmatically encodes both implicit and explicit domain expert knowledge into a structured repository of information, allowing AI systems to incorporate an organization's most valuable assets, its tacit knowledge and human expertise, into their decision-making processes. While tacit knowledge is difficult to articulate, record, and disseminate, it can be easily mined from recorded interactions with domain experts (such as meeting transcripts and chat history) using modern AI tools. Explicit knowledge, although documented, is often not easily discoverable. State-of-the-art LLMs and taxonomies, however, make tagging this knowledge with meaningful metadata quite straightforward. In other words, in the age of AI, while content capture may be a breeze, transforming the captured content into knowledge requires some thought. You can learn more about the best practices we often share with our clients for effective knowledge base management here. In particular, we have written extensively about improving the quality of knowledge bases using metadata and the big role taxonomy plays in it. With a taxonomy in place, it all comes down to teaching a machine learning (ML) model the domain-specific language that is used to describe the content so that it can accurately auto-classify it. See this article to learn more about our auto-tagging approach.

Another aspect of expert knowledge capture is to engage domain experts in annotating datasets with contextual information or providing them with an embedded feedback loop to review AI outputs and provide corrections and enhancements. While annotation and feedback capabilities can be included in a vendor platform such as a data science workbench in a data management platform or taxonomy concept approval workflow in a taxonomy management system, we have implemented custom workflows to capture this domain knowledge for our clients as well. For example, you can read more about our human-in-the-loop taxonomy development process here or SME validation of taxonomy tag application process here.
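
As a flavor of what such a custom workflow captures, the minimal sketch below records SME verdicts on model-suggested tags and turns accepted or corrected suggestions into labelled examples for the next training iteration. The data structures are purely illustrative, not a specific client implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TagReview:
    document_id: str
    suggested_tag: str            # tag proposed by the auto-classification model
    sme_verdict: str              # "accept", "reject", or "correct"
    corrected_tag: Optional[str] = None

def build_training_examples(reviews: List[TagReview]) -> List[Tuple[str, str]]:
    """Turn SME feedback into (document_id, label) pairs for the next model iteration."""
    examples = []
    for review in reviews:
        if review.sme_verdict == "accept":
            examples.append((review.document_id, review.suggested_tag))
        elif review.sme_verdict == "correct" and review.corrected_tag:
            examples.append((review.document_id, review.corrected_tag))
        # rejected suggestions are dropped rather than used for training
    return examples
```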

Retrieval Augmented Generation

A Retrieval Augmented Generation (RAG) framework allows LLMs to access up-to-date organizational knowledge bases instead of relying solely on the LLM’s pre-trained knowledge to provide more accurate and contextually relevant outputs. An enterprise RAG application may even require reasoning based on specific relationships between knowledge fragments to collect information related to answering who/what/when/how/where questions as opposed to relying only on semantic similarity with complete knowledge base items. Thus we typically leverage two or more types of information retrieval systems when solving KI use cases through RAG for our customers. 

In its most basic form, a RAG application can be developed with an LLM, an embedding model, and a vector database. You can read more about how we implemented this architecture to power semantic search in a multinational development bank here. In reality, however, RAG implementations rely on additional information retrieval systems in the enterprise, such as search engines or data warehouses, as well as semantic layer assets such as knowledge graphs. In addition, RAG applications require elaborate data orchestration between the available knowledge bases and the LLM; popular frameworks such as LangChain and LlamaIndex can greatly simplify this orchestration by providing abstractions for common RAG steps such as indexing, retrieval, and workflows. Finally, to take any POC implementation of a RAG application to production, we need to leverage some of the data integration capabilities and shared services, such as monitoring and security, described below.
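
The sketch below shows that most basic form, assuming the OpenAI client for both embeddings and generation and a simple in-memory index in place of a vector database; a production implementation would swap in the additional retrieval systems and orchestration frameworks mentioned above.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
knowledge_base = [
    "The risk appetite statement is reviewed annually by the board.",
    "Travel expense reports must be submitted within 30 days.",
]

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

kb_vectors = embed(knowledge_base)  # index the knowledge base once

def answer(question: str) -> str:
    # Retrieve: rank knowledge base items by cosine similarity to the question
    q_vec = embed([question])[0]
    scores = kb_vectors @ q_vec / (np.linalg.norm(kb_vectors, axis=1) * np.linalg.norm(q_vec))
    context = knowledge_base[int(np.argmax(scores))]

    # Generate: ground the LLM's answer in the retrieved context
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}],
    )
    return reply.choices[0].message.content

print(answer("How often is the risk appetite statement reviewed?"))
```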

Data Integration

Just like any data integration, aggregation and transformation layer, a KI layer depends on various tools to extract, connect, transform and unify both structured and unstructured data sources. These tools include ELT (Extract, Load and Transform) and ETL (Extract, Transform and Load) tools, like Apache Airflow, API management platforms, like MuleSoft, and data virtualization platforms, like Tibco Data Virtualization. Typically, these integration and transformation patterns are well-established within organizations; hence, we often recommend that our clients reuse proven design patterns wherever possible. Additionally, we advise our clients to leverage established data cleansing techniques before sending the data to the KI layer for further enrichment and standardization.
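
As one example of reusing an established pattern, the sketch below outlines a small Apache Airflow DAG that extracts from a source system, applies an existing cleansing routine, and hands the result to the KI layer for enrichment. The DAG id, task names, and helper functions are placeholders for whatever the organization's proven pipelines already do.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_source_data(**context):
    ...  # placeholder: pull records from a source system API or database

def cleanse_records(**context):
    ...  # placeholder: reuse the organization's established data cleansing routine

def load_into_ki_layer(**context):
    ...  # placeholder: send cleansed records to the KI layer for enrichment

with DAG(
    dag_id="ki_layer_ingestion",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_source_data)
    cleanse = PythonOperator(task_id="cleanse", python_callable=cleanse_records)
    load = PythonOperator(task_id="load", python_callable=load_into_ki_layer)

    extract >> cleanse >> load
```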

KI Applications

While chatbots remain the most common application of KI, we have leveraged KI to power intelligent search, recommendation engines, agentic AI workflows and business intelligence applications for our clients. In our experience, KI applications range from fully custom applications such as AI agents to configurable Software-as-a-Service (SaaS) platforms such as AI search engines.

Shared Services

Services including data security management, user and system access management, logging, monitoring and other centralized IT functions within an organization will need to be integrated with the KI layer in accordance with established organizational protocols.

Case Study

While we have been implementing individual KI components at client organizations over the past decade, only recently have we begun to implement and integrate multiple KI components to enable organizations to extract maximum value from their AI efforts. For example, over the last two years we established a data center of excellence at a multinational bank to enable effective non-financial risk management by implementing and integrating two distinct KI components: a semantic layer, and expert knowledge capture and transfer. Using a semantic layer, we injected business context into their structured datasets by enriching them using a standardized categorization structure, contextualizing them using a domain ontology, and connecting them via a knowledge graph. As a result, when instantiated and deployed to production, the graph became an authoritative source of truth and provided a solid foundation for advanced analytics and AI capabilities to improve the efficiency and accuracy of the end-to-end risk management process. We also implemented the expert knowledge capture component of KI by programmatically encoding domain knowledge and business context into the taxonomies and ontologies we developed for this initiative. For example, we created a new risk taxonomy by mining free-text risk descriptions using an ML pipeline but significantly shortened the overall development time by embedding human feedback in the pipeline. Specifically, we provided domain experts with embedded tools and processes to review model outputs and provide corrections and additional annotations that were in turn leveraged to refine the ML models and create the finalized taxonomy in an iterative fashion. In the end, both KI components enabled the firm to establish a robust foundation for enhanced risk management; this foundation powered consumer-grade AI capabilities running on the semantic layer that streamlined access to critical insights through intelligent search, linked data view and query, thereby improving regulatory reporting and fostering a more data-driven risk management culture at the firm.

Closing

While there are a number of approaches to designing and implementing the core KI components described above, there are best practices to ensure the quality and scalability of the solution. The upcoming blogs in this series zoom into each of these components, enumerate the approaches for implementing each component, discuss how to achieve KI from a technical perspective, and detail how each component would support the development of Enterprise AI with real-life case studies. As with any technical implementation, we recommend grounding any KI implementation effort in a business case, starting small and iterating, beginning with a few source systems to lay a solid foundation for an enterprise KI layer. Once the initial KI layer has been established, it is easier to expand the KI ecosystem while enabling foundational AI models to generate meaningful content, make intelligent predictions, discover hidden insights, and drive valuable business outcomes.

Looking for technical advisory on how to get your KI layer off the ground? Contact us to get started.

Hybrid Approaches to Green Information Management: A Case Study https://enterprise-knowledge.com/hybrid-approaches-to-green-information-management-a-case-study/ Wed, 18 Dec 2024 15:13:12 +0000

Today, enterprises have more tools than ever for creating and sharing information, which leads to significant challenges in managing duplicate content. Enterprise Knowledge's Urmi Majumder, Principal Consultant, and Nina Spoelker, Consultant, presented "Hybrid Approaches to Green Information Management: A Case Study" on Thursday, November 21 at Text Analytics Forum—one of five events under KMWorld 2024—in Washington, D.C.

In this presentation, Majumder and Spoelker explored how a large supply chain organization implemented green information management best practices to support their sustainability goals, showcasing a hybrid AI framework combining heuristic and LLM-based approaches to effectively analyze and reduce duplicate content across enterprise repositories at scale. They demonstrated the environmental benefits of reducing duplicate content, focusing on carbon footprint reduction, and addressed how this information ultimately encourages a cultural shift, motivating employees to contribute to greener information management within their organizations.

Participants in this session gained insights into:

  • What “green” information management means;
  • The practical implementation of AI-driven content analysis frameworks;
  • The environmental impact of effective data management, and the importance of integrating ESG goals into information management strategies; and
  • How modern AI techniques can transform their enterprise’s data practices and support a sustainable future.

Semantic Maturity Spectrum: Search with Context https://enterprise-knowledge.com/semantic-maturity-spectrum-search-with-context/ Tue, 26 Nov 2024 20:57:09 +0000

EK’s Urmi Majumder and Madeleine Powell jointly delivered the presentation ‘Semantic Maturity Spectrum: Search with Context’ at the MarkLogic World Conference on September 24, 2024.

Semantic search has long proven to be a powerful tool in creating intelligent search experiences. By leveraging a semantic data model, it can effectively understand the searcher’s intent and the contextual meaning of the terms to improve search accuracy. In this session, Majumder and Powell presented case studies for three different organizations across three different industries (finance, pharmaceuticals, and federal research) that started their semantic search journey at three very different maturity levels. For each case study, they described the business use case, solution architecture, implementation approach, and outcomes. Finally, Majumder and Powell rounded out the presentation with a practical guide to getting started with semantic search projects using the organization’s current maturity in the space as a starting point.

Mastering the Dark Data Challenge: Harnessing AI for Enhanced Data Governance and Quality https://enterprise-knowledge.com/mastering-the-dark-data-challenge-harnessing-ai-for-enhanced-data-governance-and-quality/ Tue, 06 Aug 2024 18:14:56 +0000

Enterprise Knowledge’s Maryam Nozari, Senior Data Scientist, and Urmi Majumder, Principal Data Architecture Consultant, presented a talk on “Mastering the Dark Data Challenge: Harnessing AI for Enhanced Data Governance and Quality” at the Data Governance & Information Quality Conference (DGIQ) on June 5, 2024, in San Diego.

In this engaging session, Nozari and Majumder explored the challenges and opportunities presented by the rapid evolution of Large Language Models (LLMs) and the exponential growth of unstructured data within enterprises. They also addressed the critical intersection of technology and data governance necessary for managing AI responsibly in an era dominated by data breaches and privacy concerns. 

Check out the presentation below to learn more about: 

  • A comprehensive framework to define and identify dark data
  • Innovative AI solutions to secure data effectively
  • Actionable insights to help organizations enhance data privacy and achieve regulatory compliance within the AI-driven data ecosystem

Driving Behavioral Change for Information Management through Data-Driven Green Strategy (EDW 2024) https://enterprise-knowledge.com/driving-behavioral-change-for-information-management-through-data-driven-green-strategy-edw-2024/ Fri, 03 May 2024 17:46:00 +0000

Enterprise Knowledge’s Urmi Majumder, Principal Data Architecture Consultant, and Fernando Aguilar Islas, Senior Data Science Consultant, presented “Driving Behavioral Change for Information Management through Data-Driven Green Strategy” on March 27, 2024 at Enterprise Data World (EDW) in Orlando, Florida. 

In this presentation, Majumder and Aguilar Islas discussed a case study describing how the information management division in a large supply chain organization drove user behavior change through awareness of the carbon footprint of their duplicated and near-duplicated content, identified via advanced data analytics. Check out their presentation to gain valuable perspectives on utilizing data-driven strategies to influence positive behavioral shifts and support sustainability initiatives within your organization.

In this session, participants gained answers to the following questions:

  • What is a Green Information Management (IM) Strategy, and why should you have one?
  • How can Artificial Intelligence (AI) and Machine Learning (ML) support your Green IM Strategy through content deduplication? 
  • How can an organization use insights into their data to influence employee behavior for IM?
  • How can you reap additional benefits from content reduction that go beyond Green IM?

Semantic Layers from a Solutions Architect's Point of View https://enterprise-knowledge.com/semantic-layers-from-a-solutions-architects-point-of-view/ Thu, 04 Apr 2024 19:21:00 +0000

In this Enterprise Knowledge video, Principal Consultant Urmi Majumder speaks about her technical experience with a Semantic Layer. Urmi shares her transition from conventional data management to the nuanced field of semantics, beginning her semantic career at EK. She delves into the intricacies of using large-scale content and IT extraction to power discovery through semantics, contrasting it with previous methods that relied on simple string matching.

Additionally, Urmi speaks to the skill gap concerns for professionals entering the field of semantic technology, offering reassurance about the learnability of relevant concepts and tools. She underscores the increasing adoption of semantic layers in data management platforms across industries and the critical role data architects play in leveraging this technology for data democratization.

For those contemplating the adoption of semantic technologies or seeking to deepen their understanding, this interview provides valuable guidance on starting small, leveraging in-house resources, and the importance of maintaining quality through human oversight in the era of AI. Watch now to discover the significance of semantic search in modern data analysis and how it mirrors the intuitive search experiences users have come to expect from decades of using Google, and explore how semantic layers can revolutionize data management practices and pave the way for a data-centric future.

 

Scaling Knowledge Graph Architectures with AI https://enterprise-knowledge.com/scaling-knowledge-graph-architectures-with-ai/ Thu, 30 Nov 2023 16:45:28 +0000

Sara Nash and Urmi Majumder, Principal Consultants at Enterprise Knowledge, presented “Scaling Knowledge Graph Architectures with AI” on November 9th, 2023 at KM World in Washington D.C. In this presentation, Nash and Majumder defined a Knowledge Graph architecture and reviewed how AI can support the creation and growth of Knowledge Graphs. Drawing from their experience in designing enterprise Knowledge Graphs based on knowledge embedded in unstructured content, Nash and Majumder defined approaches for entity and relationship extraction depending on Enterprise AI maturity and highlighted other key considerations to incorporate AI capabilities into the development of a Knowledge Graph. Check out the presentation below to learn how to: 

  • Assess entity and relationship extraction readiness according to EK’s Extraction Maturity Spectrum and Relationship Extraction Maturity Spectrum.
  • Utilize knowledge extraction from content to translate important insights into organizational data.
  • Extract knowledge with three approaches:
    • RegEx Rule
    • Auto-Classification Rule
    • Custom ML Model
  • Examine key factors such as how to leverage SMEs, iterate AI processes, define use cases, and invest in establishing robust AI models.
