Technology Solutions Articles - Enterprise Knowledge
http://enterprise-knowledge.com/category/software-development/

Optimizing Historical Knowledge Retrieval: Leveraging an LLM for Content Cleanup
https://enterprise-knowledge.com/optimizing-historical-knowledge-retrieval-leveraging-an-llm-for-content-cleanup/
Wed, 02 Jul 2025

The Challenge

Enterprise Knowledge (EK) recently worked with a Federally Funded Research and Development Center (FFRDC) that was having difficulty retrieving relevant content in a large volume of archival scientific papers. Researchers were burdened with excessive search times and the potential for knowledge loss when target documents could not be found at all. To learn more about the client’s use case and EK’s initial strategy, please see the first blog in the Optimizing Historical Knowledge Retrieval series: Standardizing Metadata for Enhanced Research Access.

To make these research papers more discoverable, part of EK’s solution was to add “about-ness” tags to the document metadata through a classification process. Many of the files in this document management system (DMS) were lower quality PDF scans of older documents, such as typewritten papers and pre-digital technical reports that often included handwritten annotations. To begin classifying the content, the team first needed to transform the scanned PDFs into machine-readable text. EK utilized an Optical Character Recognition (OCR) tool, which can “read” non-text file formats for recognizable language and convert it into digital text. When processing the archival documents, even the most advanced OCR tools still introduced a significant amount of noise in the extracted text. This frequently manifested as:

  • Tables, figures, and handwriting rendered as random symbols and white space.
  • Random punctuation inserted where a spot or pen mark appeared on the page, breaking up words and sentences.
  • Excessive or misplaced line breaks separating related content.
  • Other miscellaneous irregularities that made the text less comprehensible.
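
For illustration, a minimal sketch of this extraction step is shown below, using the open-source pdf2image and pytesseract libraries (not the client’s actual OCR tooling). With low-quality scans, the raw output of a step like this typically contains exactly the kinds of noise listed above.

```python
# Minimal OCR sketch using open-source tools (pdf2image + pytesseract).
# Illustrative only; the engagement used the client's own OCR tooling.
# Requires the Tesseract and Poppler system packages to be installed.
from pdf2image import convert_from_path  # renders PDF pages as images
import pytesseract                       # Python wrapper around Tesseract OCR


def extract_text_from_scanned_pdf(pdf_path: str) -> str:
    """Convert each scanned page to an image and run OCR over it."""
    pages = convert_from_path(pdf_path, dpi=300)  # higher DPI helps older scans
    page_texts = []
    for page_number, page_image in enumerate(pages, start=1):
        raw_text = pytesseract.image_to_string(page_image)
        page_texts.append(f"--- page {page_number} ---\n{raw_text}")
    return "\n".join(page_texts)


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(extract_text_from_scanned_pdf("archival_report_1974.pdf"))
```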

The first round of text extraction using out-of-the-box OCR capabilities resulted in many of the above issues across the output text files. This starter batch of text extracts was sent to the classification model to be tagged. The results were assessed by examining the classifier’s evidence within the document for tagging (or failing to tag) a concept. Through this inspection, the team found that there was enough clutter or inconsistency within the text extracts that some irrelevant concepts were misapplied and other, applicable concepts were being missed entirely. It was clear from the negative impact on classification performance that document comprehension needed to be enhanced.

Auto-Classification
Auto-Classification (also referred to as auto-tagging) is an advanced process that automatically applies relevant terms or labels (tags) from a defined information model (such as a taxonomy) to your data.

The Solution

To address this challenge, the team explored several potential solutions for cleaning up the text extracts. However, there was concern that direct text manipulation might lead to the loss of critical information if blanket applied to the entire corpus. Rather than modifying the raw text directly, the team decided to leverage a client-side Large Language Model (LLM) to generate additional text based on the extracts. The idea was that the LLM could potentially better interpret the noise from OCR processing as irrelevant and produce a refined summary of the text that could be used to improve classification.

The team tested various summarization strategies via careful prompt engineering, generating different kinds of summaries (such as abstractive vs. extractive) of varying lengths and levels of detail, and used a human-in-the-loop grading process to manually assess the effectiveness of each approach. To determine the prompt to be used in the application, graders evaluated the quality of the summaries generated by each trial prompt over a sample set of documents with particularly low-quality source PDFs. Evaluation metrics included the complexity of the prompt, summary generation time, human readability, errors, hallucinations, and, of course, the precision of the auto-classification results.
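
As a simplified illustration of this kind of prompt engineering (the client’s actual prompts and internal LLM are not reproduced here), the sketch below assumes an OpenAI-compatible chat API and asks for a roughly four-sentence abstractive summary that ignores OCR noise.

```python
# Hedged sketch: abstractive summarization of noisy OCR text.
# Assumes an OpenAI-compatible endpoint and model name for illustration;
# the engagement used a client-side LLM instead.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

SUMMARY_INSTRUCTIONS = (
    "The following text was extracted from a scanned document with OCR and may "
    "contain stray symbols, broken words, and misplaced line breaks. Ignore that "
    "noise and write an abstractive summary of about four complete sentences "
    "describing what the document is about. Do not invent details."
)


def summarize_extract(ocr_text: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SUMMARY_INSTRUCTIONS},
            {"role": "user", "content": ocr_text},
        ],
        temperature=0.2,  # keep summaries conservative and repeatable
    )
    return response.choices[0].message.content.strip()
```

In a grading workflow like the one described above, outputs from several such prompt variants would be compared side by side by human reviewers before one is selected.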

The EK Difference

Through this iterative process, the team determined that the most effective summaries for this use case were abstractive summaries (summaries that paraphrase content) of around four complete sentences in length. The selected prompt generated summaries with a sufficient level of detail (for both human readers and the classifier) while maintaining brevity. To improve classification, the LLM-generated summaries are meant to supplement the full text extract, not to replace it. The team incorporated the new summaries into the classification pipeline by creating a new metadata field for the source document. The new ‘summary’ metadata field was added to the auto-classification submission along with the full text extracts to provide additional clarity and context. This required adjusting classification model configurations, such as the weights (or priority) for the new and existing fields.

Large Language Models (LLMs)
A Large Language Model is an advanced AI model designed to perform Natural Language Processing (NLP) tasks, including interpreting, translating, predicting, and generating coherent, contextually relevant text.

The Results

By including the LLM-generated summaries in the classification request, the team was able to provide more context and structure to the existing text. This additional information filled in previous gaps and allowed the classifier to better interpret the content, leading to more precise subject tags compared to using the original OCR text alone. As a bonus, the LLM-generated summaries were also added to the document metadata in the DMS, further improving the discoverability of the archived documents.

By leveraging the power of LLMs, the team was able to clean up noisy OCR output to improve auto-tagging capabilities and further enrich document metadata with content descriptions. If your organization is facing similar challenges managing and archiving older or difficult-to-parse documents, consider how Enterprise Knowledge can assist in optimizing your content findability with advanced AI techniques.

Enhancing Insurance Fraud Detection through Graph-Based Link Analysis
https://enterprise-knowledge.com/insurance-fraud-detection-through-graph-link-analysis/
Wed, 21 May 2025


The Challenge

Technology is increasingly used both as a force for good and as a means to exploit vulnerabilities that greatly damage organizations – whether financially, reputationally, or through the release of classified information. Consequently, efforts to combat fraud must evolve to become more sophisticated with each passing day. The field of fraud analytics is rapidly emerging and, over the past 10 years, has expanded to include graph analytics as a critical method for detecting suspicious behavior.

In one such application, a national agency overseeing insurance claims engaged EK to advise on developing and implementing graph-based analytics to support fraud detection. The agency had a capable team of data scientists, program analysts, and engineers focused on identifying suspicious activity among insurance claims, such as:

  • Personal information being reused across multiple claims;
  • Claims being filed under the identities of deceased individuals; or
  • Individuals claiming insurance from multiple locations. 

However, they were reliant on relational databases to accomplish these tasks. This made it difficult for program analysts to identify subtle connections between records in tabular format, with data points often differing by just a single digit or character. Additionally, while the organization was effective at flagging anomalies and detecting potentially suspicious behavior, they faced challenges relating to legacy software applications and limited traditional data analytics processes. 

EK was engaged to provide the agency with guidance on standing up graph capabilities. This graph-based solution would transform claim information into interconnected nodes, revealing hidden relationships and patterns among potentially fraudulent claims. In addition, EK was asked to build the agency’s internal expertise in graph analytics by sharing the methods and processes required to uncover deeper, previously undetectable patterns of suspicious behavior.
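
To make the idea concrete, the hedged sketch below uses the open-source NetworkX library and fabricated claim records (not the agency’s data or tooling) to show how, once claims and identifiers become nodes, reused personal information surfaces as identifiers connected to more than one claim.

```python
# Illustrative link-analysis sketch with NetworkX and made-up example records.
import networkx as nx

claims = [
    {"claim_id": "C-101", "ssn": "123-45-6789", "address": "12 Oak St"},
    {"claim_id": "C-102", "ssn": "123-45-6789", "address": "98 Elm Ave"},
    {"claim_id": "C-103", "ssn": "555-11-2222", "address": "12 Oak St"},
]

G = nx.Graph()
for claim in claims:
    G.add_node(claim["claim_id"], kind="claim")
    for field in ("ssn", "address"):
        identifier = f"{field}:{claim[field]}"
        G.add_node(identifier, kind=field)
        G.add_edge(claim["claim_id"], identifier)

# Any identifier linked to more than one claim is a candidate for review.
for node, attributes in G.nodes(data=True):
    if attributes["kind"] != "claim" and G.degree(node) > 1:
        linked_claims = sorted(G.neighbors(node))
        print(f"{node} is shared by claims: {linked_claims}")
```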

The Solution

To design a solution suitable for the agency’s production environment, EK began by assessing the client’s existing data infrastructure and analytical capabilities. Their initial cloud solution featured a relational database, which EK suggested extending with a graph database through the same cloud computing platform vendor for easy integration. Additionally, to identify suspicious connections between claims in a visual format, EK recommended an approach for the agency to select and integrate a link analysis visualization tool. These tools are crucial to a link analysis solution and allow for the graphical visualization of entities alongside behavior detection features that identify data anomalies, such as timeline views of relationship formation. EK made this recommendation using a custom and proprietary tooling evaluation matrix that facilitates informed decision-making based on a client’s priority factors. Once the requisite link analysis components were identified, EK delivered a solution architecture with advanced graph machine learning functionality and an intuitive user experience that promoted widespread adoption among technical and nontechnical stakeholders alike.

EK also assessed the agency’s baseline understanding of graphical link analysis and developed a plan for upskilling existing data scientists and program analysts on the foundations of link analysis. Through a series of primer sessions, EK’s subject matter experts introduced key concepts such as knowledge graphs, graph-based link analysis for detecting potentially suspicious behavior, and the underlying technology architecture required to instantiate a fully functional solution at the agency.

Finally, EK applied our link analysis experience to address client challenges by laying out a roadmap and implementation plan that detailed challenges along with proposed solutions to overcome them. This took the form of 24 separate recommendations and the delivery of bespoke materials meant to serve as quick-start guides for client reference.

The EK Difference

A standout feature of this project is its novel, generalizable technical architecture.

During the course of the engagement, EK relied on its deep expertise in unique domains such as knowledge graph design, cloud-based SaaS architecture, graph analytics, and graph machine learning to propose an easily implementable solution. To support this goal, EK developed an architecture recommendation that prompted as few modifications to existing programs and processes as possible. With the proposed novel architecture utilizing the same cloud platform that already hosted client data, the agency could implement the solution in production with minimal effort.

Furthermore, EK adapted a link analysis maturity benchmark and tool evaluation matrix to meet the agency’s needs and ensure that all solutions were aligned with the agency’s goal. Recognizing that no two clients face identical challenges, EK delivered a customized suite of recommendations and supporting materials that directly addressed the agency’s priorities, constraints, and long-term needs for scale.

The Results

Through this engagement, EK provided the agency with the expertise and tools necessary to begin constructing a production-ready solution that will:

  • Instantiate claims information into a knowledge graph;
  • Allow users to graphically explore suspicious links and claims through intuitive, no-code visualizations;
  • Alert partner agencies and fraud professionals to suspicious activity using graph-based machine learning algorithms; and
  • Track changes in data over time by viewing claims through a temporal lens.

In parallel, key agency stakeholders gained practical skills related to knowledge graphs, link analysis, and suspicious behavior detection using graph algorithms and machine learning, significantly enhancing their ability to address complex insurance fraud cases and support partner agency enforcement efforts.

Interested in strengthening your organization’s fraud detection capabilities? Want to learn what graph analytics can do for you? Contact us today!


Graph Solutions PoC to Production: Overcoming the Barriers to Success (Part I)
https://enterprise-knowledge.com/graph-solutions-poc-to-production-overcoming-the-barriers-to-success-part-i/
Thu, 15 May 2025

Part I: A Review of Why Graph PoCs Struggle to Demonstrate Success or Progress to Production

This is Part 1 of a two-part series on graph database PoC success and production deployment.

 

Introduction

I began my journey with graphs around 2014 when I discovered network theory and tools like NetworkX and Neo4j. As our world becomes increasingly connected, it makes sense to work with data by leveraging its inherent connections. Soon, every problem I faced seemed ideally suited for graph solutions.

Early in my experiences, I worked with a biotech startup, exploring how graphs could surface insights into drug-protein interactions (DPI). The team was excited about graphs’ potential to reveal latent signals that traditional analytics missed. With a small budget, we created a Proof-of-Concept (PoC) to demonstrate the “art of the possible.” After a quick kick-off meeting, we loaded data into a free graph database and wrote queries exploring the DPI network. In just three months, we established novel insights that advanced the team’s understanding.

Despite what we considered success, the engagement wasn’t extended. More troubling, I later learned our PoC had been put into a production-like environment where it failed to scale in performance or handle new data sources. What went wrong? How had we lost the potential scientific value of what we’d built?

This experience highlights a common problem in the graph domain: many promising PoCs never make it to production. Through reflection, I’ve developed strategies for avoiding these issues and increasing the likelihood of successful transitions to production. This blog explores why graph PoCs fail and presents a holistic approach for success. It complements the blog Why Graph Implementations Fail (Early Signs & Successes).

Why Graph Database Solutions and Knowledge Graph PoCs Often Fail

Organizational Challenges

Lack of Executive Sponsorship and Alignment

Successful production deployments require strong top-level support. Without executive buy-in, graph initiatives seldom become priorities or receive funding. Executives often don’t understand the limitations of existing approaches or the paradigm shift that graphs represent.

The lack of sponsorship is compounded by how graph practitioners approach stakeholders. We often begin with technical explanations of graph theory, ontologies, and the differences between Resource Description Framework (RDF) and Label Property Graphs (LPG), rather than focusing on business value. No wonder executives struggle to understand why graph initiatives deserve funding over other projects. I’ve been guilty of this myself, starting conversations with “Let me tell you about Leonhard Euler and graph theory…” instead of addressing business problems directly.

Middle Management Resistance and Data Silos

Even with executive interest, mid-level managers can inhibit progress. Many have vested interests in existing systems and fear losing control over their data domains. They’re comfortable with familiar relational databases and may view knowledge graphs as threats to their “systems of record.” This presents an opportunity to engage managers and demonstrate how graphs can integrate with existing systems and support their goals. For example, a graph database may load data “just in time” to perform a connected data analysis and then drop the data after returning the analytic results; this is an ephemeral use of graph analytics.
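
A minimal sketch of what such an ephemeral analysis could look like is shown below; it assumes the source data is already reachable as a pandas DataFrame, builds the graph purely in memory, and discards it as soon as the results are returned.

```python
# Hedged sketch: ephemeral, in-memory graph analysis over tabular data.
# No second copy of the data is persisted in a graph database.
import networkx as nx
import pandas as pd


def shared_attribute_groups(df: pd.DataFrame, id_col: str, attr_col: str):
    """Group records that are transitively linked through a shared attribute."""
    G = nx.Graph()
    for _, row in df.iterrows():
        G.add_edge(f"rec:{row[id_col]}", f"attr:{row[attr_col]}")
    groups = [
        sorted(node for node in component if node.startswith("rec:"))
        for component in nx.connected_components(G)
    ]
    return groups  # the graph is garbage-collected once the function returns


# Made-up example data:
df = pd.DataFrame(
    {"record_id": [1, 2, 3, 4], "email": ["a@x.com", "a@x.com", "b@y.com", "c@z.com"]}
)
print(shared_attribute_groups(df, "record_id", "email"))
```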

Bureaucracy and Data Duplication Concerns

Large organizations have lengthy approval processes for new technologies. Infrastructure teams may be reluctant to support experimental technology without an established return on investment (ROI).

A critical but often undiscussed factor is that graph databases typically require extracting data from existing sources and creating another copy—raising security risks, infrastructure costs, and data synchronization concerns. This is the Achilles heel of graph databases. However, an emerging paradigm that decouples data from the query engine offers an alternative: data in data lakes can be analyzed through a graph lens at rest, without ETL ingestion into a graph database. Graph query engines enable the same data to be viewed through both traditional relational and connected-data lenses.

Isolated Use Cases and Limited Understanding

Many graph initiatives start as isolated projects tackling narrow use cases. While this limits upfront risk, it can make the impact seem trivial. Conventional technologies might solve that single problem adequately, leading skeptics to question whether a new approach is needed. The real value of knowledge graphs emerges when connecting data across silos—something that’s hard to demonstrate in limited-scope PoCs.

A practical approach I’ve found effective is asking stakeholders to diagram their problem at a whiteboard. This naturally reveals how they’re already thinking in graph terms, making it easier to demonstrate the value of a graph approach.

Talent and Skills Gap

Graph technologies require specialized skills that are in short supply. Learning curve issues affect even experienced developers, who must master new query languages and paradigms. This shortage of expertise can lead to reliance on a few key individuals, putting projects at risk if they leave.

 

Technical Challenges

Complex Data Modeling

Graph data models require a different mindset than relational schemas. Designing an effective graph schema or ontology is complex, and mistakes can lead to poor performance. Equally, an effective semantic layer is critical to understanding the meaning of an organization’s data. The schema-less flexibility of graphs can be a double-edged sword—without careful planning, a PoC might be built ad-hoc and prove inefficient or lead to data quality issues when scaled up. Refactoring a graph model late in development can be a major undertaking that casts doubt on the technology itself.

Integration Challenges

Enterprise data rarely lives in one place. Integrating graphs and other solutions with legacy systems requires extensive data mapping and transformation. Without smooth interoperability via connectors, APIs, or virtualization layers, the graph becomes an isolated silo with limited production value. The decoupled approaches mentioned above address this challenge by offering graph and connected data analytics as a standalone feature of graph query engines, and tooling optimized for graphs is making ETL and integration of graph databases easier and more efficient.

Performance Trade-offs

Graph databases excel at traversing complex relationships but may underperform for simple transactions compared to optimized relational databases. In a PoC with a small dataset, this may not be immediately noticeable, but production workloads expose these limitations. As data volumes grow, even traversals that were fast in the PoC can slow significantly, requiring careful performance tuning and possibly hybrid approaches.

Evolving Standards and Tooling

The graph ecosystem is still evolving, with multiple database models and query languages (Cypher, Gremlin, SPARQL). More recently, decoupled graph query engines enable analyzing tabular and columnar data as if it were a graph, supporting the concept of “Single Copy Analytics” and concurrently increasing the breadth of options for graph analytics. Unlike the relational world with SQL and decades of tooling, graph technologies lack standardization, making it difficult to find mature tools for monitoring, validation, and analytics integration. This inconsistency means organizations must develop more in-house expertise and custom tools. 

Production Readiness Gaps

Production deployment requires high availability, backups, and disaster recovery—considerations often overlooked during PoCs. Some graph databases lack battle-tested replication, clustering, and monitoring solutions. Integrating with enterprise logging and DevOps pipelines requires additional effort that can derail production transitions. In the next blog on this topic, we will present strategies for integrating logging into a PoC and production releases.

Scaling Limitations

Graph databases often struggle with horizontal scaling compared to relational databases. While this isn’t apparent in small PoCs, production deployment across multiple servers can reveal significant challenges. As graphs grow larger and more complex, query performance can degrade dramatically without careful tuning and indexing strategies. We will explore how to thoughtfully scale graph efforts in the next blog on taking projects from PoC to Production.

 

Security and Compliance Challenges

Access Control Complexity

Graphs connect data in ways that complicate fine-grained access control. In a relational system, you might restrict access to certain tables; in a graph, queries traverse multiple node types and relationships. Implementing security after the fact is tremendously complex. Demonstrating that a graph solution can respect existing entitlements and implement role-based access control is crucial. 

Sensitive Data and Privacy Risks

Graphs can amplify privacy concerns because of their connected nature. An unauthorized user gaining partial access might infer much more from relationship patterns. This interconnectedness raises security stakes—you must protect not just individual data points but relationships as well.

Regulatory Compliance

Regulations like GDPR, HIPAA, or PCI present unique challenges for graphs. For instance, GDPR’s “right to be forgotten” is difficult to implement when deleting a node might leave residual links or inferred knowledge. Auditing requires tracking which relationships were traversed, and demonstrating data lineage becomes complex. If compliance wasn’t planned for in the PoC, retrofitting it can stall production deployment.

 

Financial and ROI Challenges

Unclear Business Value

Justifying a graph project financially is tricky, especially when benefits are long-term or indirect. A PoC might show an interesting capability, but translating that into clear ROI is difficult if only one use case is demonstrated. Without a strong business case tied to measurable Key Performance Indicators (KPIs), projects struggle to secure production funding.

Scaling Costs

PoCs often leverage free or low-cost resources. However, production deployment requires enterprise licenses, robust infrastructure, and high-availability configurations. An enterprise-level knowledge graph spanning multiple use cases can incur significant long-term costs. These financial requirements can shock organizations that didn’t plan for them.

Operational and Talent Expenses

Beyond technology costs, successfully operating a knowledge graph requires specialized talent—data engineers, knowledge engineers, and graph database administrators. While a PoC might be built by a single person or small team, maintaining a production graph could require several dedicated staff. This represents a significant ongoing expense that organizations often underestimate.

Competing Priorities

Every project competes for finite resources. Graph initiatives promise strategic long-term benefits but may seem less immediately impactful than customer-facing applications. Organizations focused on quarterly results may abandon graph projects if they don’t show quick wins. Breaking the roadmap into phased deliverables demonstrating incremental value can help maintain support.

 

Data Governance and Scalability Challenges

Ontology and Data Stewardship

Knowledge graphs require consistent definitions across the enterprise. Many organizations lack ontology expertise, leading to inconsistent data modeling. Strong governance is essential to manage how data elements are defined, connected, and updated. Without data stewards responsible for accuracy, production graphs can become unreliable or inconsistent, undermining user trust.

 

Conclusion

Transitioning a graph database or knowledge graph from PoC to production involves multiple challenges across organizational, technical, security, financial, governance, and talent dimensions. Many promising PoCs fail to cross this “last mile” due to one or more of these issues.

In Part Two, I’ll outline a holistic strategy for successful graph initiatives that can effectively transition to production—incorporating executive alignment, technical best practices, emerging trends like GraphRAG and semantic layers, and the critical people-process factors that make the difference between a stalled pilot and a thriving production deployment.

Enhancing Taxonomy Management Through Knowledge Intelligence
https://enterprise-knowledge.com/enhancing-taxonomy-management-through-knowledge-intelligence/
Wed, 30 Apr 2025

In today’s data-driven world, managing taxonomies has become increasingly complex, requiring a balance between precision and usability. The Knowledge Intelligence (KI) framework – a strategic integration of human expertise, AI capabilities, and organizational knowledge assets – offers a transformative approach to taxonomy management. This blog explores how KI can revolutionize taxonomy management while maintaining strict compliance standards.

The Evolution of Taxonomy Management

Traditional taxonomy management has long relied on Subject Matter Experts (SMEs) manually curating terms, relationships, and hierarchies. While this time-consuming approach ensures accuracy, it struggles with scale. Modern organizations generate millions of documents across multiple languages and domains, and manual curation simply cannot keep pace with the large variety and velocity of organizational data while maintaining the necessary precision. Even with well-defined taxonomies, organizations must continuously analyze massive amounts of content to verify that their taxonomic structures accurately reflect and capture the concepts present in their rapidly growing data repositories.

In the scenario above, traditional AI tools might help classify new documents, but an expert-guided recommender brings intelligence to the process.

KI-Driven Taxonomy Management

KI represents a fundamental shift from traditional AI systems, moving beyond data processing to true knowledge understanding and manipulation. As Zach Wahl explains in his blog, From Artificial Intelligence to Knowledge Intelligence, KI enhances AI’s capabilities by making systems contextually aware of an organization’s entire information ecosystem and creating dynamic knowledge systems that continuously evolve through intelligent automation and semantic understanding.

At its core, KI-driven taxonomy management works through a continuous cycle of enrichment, validation, and refinement. This approach integrates domain expertise at every stage of the process:

1. During enrichment, SMEs guide AI-powered discovery of new terms and relationships.

2. In validation, domain specialists ensure accuracy and compliance of all taxonomy modifications.

3. Through refinement, experts interpret usage patterns to continuously improve taxonomic structures.

By systematically injecting domain expertise into each stage, organizations transform static taxonomies into adaptive knowledge frameworks that continue to evolve with user needs while maintaining accuracy and compliance. This expert-guided approach ensures that AI augments rather than replaces human judgement in taxonomy development.

Figure: A taxonomy management system using Knowledge Intelligence.

Enrichment: Augmenting Taxonomies with Domain Intelligence

When augmenting the taxonomy creation process with AI, SMEs begin by defining core concepts and relationships, which then serve as seeds for AI-assisted expansion. Using these expert-validated foundations, systems employ Natural Language Processing (NLP) and Generative AI to analyze organizational content and extract relevant phrases that relate to existing taxonomy terms. 

Topic modeling, a set of algorithms that discover abstract themes within collections of documents, further enhances this enrichment process. Topic modeling techniques like BERTopic, which uses transformer-based language models to create coherent topic clusters, can identify concept hierarchies within organizational content. The experts evaluate these AI-generated suggestions based on their specialized knowledge, ensuring that automated discoveries align with industry standards and organizational needs. This human-AI collaboration creates taxonomies that are both technically sound and practically useful, balancing precision with accessibility across diverse user groups.
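
As a hedged illustration of how topic modeling can surface candidate taxonomy terms, the sketch below uses the open-source BERTopic library with its default transformer embeddings (a generic example, not EK’s production configuration); the `load_corpus` helper is hypothetical and stands in for whatever retrieves an organization’s documents as plain text.

```python
# Illustrative sketch: surfacing candidate taxonomy concepts with BERTopic.
from bertopic import BERTopic

documents = load_corpus()  # hypothetical helper returning a list of strings

topic_model = BERTopic(min_topic_size=10)  # transformer embeddings under the hood
topics, probabilities = topic_model.fit_transform(documents)

# Each topic's top keywords become candidate terms for SME validation.
for topic_id in sorted(set(topics)):
    if topic_id == -1:
        continue  # -1 is BERTopic's outlier bucket
    keywords = [word for word, _ in topic_model.get_topic(topic_id)]
    print(f"Topic {topic_id}: {', '.join(keywords[:8])}")
```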

Validation: Maintaining Compliance Through Structured Governance

What sets the KI framework apart is its unique ability to maintain strict compliance while enabling taxonomy evolution. Every suggested change, whether generated through user behavior or content analysis, goes through a structured governance process that includes:

  • Automated compliance checking against established rules;
  • Human expert validation for critical decisions;
  • Documentation of change justifications; and
  • Version control with complete audit trails.

Figure: The structured taxonomy governance process.

Organizations implementing KI-driven taxonomy management see transformative results including improving search success rates and decreasing the time required for taxonomy updates. More importantly, taxonomies become living knowledge frameworks that continuously adapt to organizational needs while maintaining compliance standards.

Refinement: Learning From Usage to Improve Taxonomies

By systematically analyzing how users interact with taxonomies in real-world scenarios, organizations gain invaluable insights into potential improvements. This intelligent system extends beyond simple keyword matching—it identifies emerging patterns, uncovers semantic relationships, and bridges gaps between formal terminology and practical usage. This data-driven refinement process:

  • Analyzes search patterns to identify semantic relationships;
  • Generates compliant alternative labels that match user behavior;
  • Routes suggestions through appropriate governance workflows; and
  • Maintains an audit trail of changes and justifications.

Figure: An example of KI for risk analysts.

The refinement process analyzes the conceptual relationship between terms, evaluates usage contexts, and generates suggestions for terminological improvements. These suggestions—whether alternative labels, relationship modifications, or new term additions—are then routed through governance workflows where domain experts validate their accuracy and compliance alignment. Throughout this process, the system maintains a comprehensive audit trail documenting not only what changes were made but why they were necessary and who approved them. 

Figure: KI-driven taxonomy evolution.

Case Study: KI in Action at a Global Investment Bank

To show the practical application of the continuous, knowledge-enhanced taxonomy management cycle, in the following section we describe a real-world implementation at a global investment bank.

Challenge

The bank needed to standardize risk descriptions across multiple business units, creating a consistent taxonomy that would support both regulatory compliance and effective risk management. With thousands of risk descriptions in various formats and terminology, manual standardization would have been time-consuming and inconsistent.

Solution

Phase 1: Taxonomy Enrichment

The team began by applying advanced NLP and topic modeling techniques to analyze existing risk descriptions. Risk descriptions were first standardized through careful text processing. Using the BERTopic framework and sentence transformers, the system generated vector embeddings of risk descriptions, allowing for semantic comparison rather than simple keyword matching. This AI-assisted analysis identified clusters of semantically similar risks, providing a foundation for standardization while preserving the important nuances of different risk types. Domain experts guided this process by defining the rules for risk extraction and validating the clustering approach, ensuring that the technical implementation remained aligned with risk management best practices.

Phase 2: Expert Validation

SMEs then reviewed the AI-generated standardized risks, validating the accuracy of clusters and relationships. The system’s transparency was critical so experts could see exactly how risks were being grouped. This human-in-the-loop approach ensured that:

  • All source risk IDs were properly accounted for;
  • Clusters maintained proper hierarchical relationships; and
  • Risk categorizations aligned with regulatory requirements.

The validation process transformed the initial AI-generated taxonomy into a production-ready, standardized risk framework, approved by domain experts.

Phase 3: Continuous Refinement

Once implemented, the system began monitoring how users actually searched for and interacted with risk information. The bank recognized that users often do not know the exact standardized terminology when searching, so the solution developed a risk recommender that displayed semantically similar risks based on both text similarity and risk dimension alignment. This approach allowed users to effectively navigate the taxonomy despite being unfamiliar with standardized terms. By analyzing search patterns, the system continuously refined the taxonomy with alternative labels reflecting actual user terminology, and created a dynamic knowledge structure that evolved based on real usage.
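
A simplified sketch of how such a recommender might work is shown below, using the open-source sentence-transformers library and fabricated risk descriptions; the bank’s actual solution also weighted alignment across risk dimensions, which is omitted here.

```python
# Hedged sketch: recommending standardized risks semantically similar to a
# user's free-text query. Example data is fabricated for illustration.
from sentence_transformers import SentenceTransformer, util

standardized_risks = [
    "Unauthorized access to customer account data",
    "Failure to report suspicious transactions to regulators",
    "Incorrect valuation of complex derivative positions",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
risk_embeddings = model.encode(standardized_risks, convert_to_tensor=True)


def recommend_risks(query: str, top_k: int = 2):
    """Return the standardized risks closest in meaning to the query."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, risk_embeddings)[0]
    ranked = sorted(
        zip(standardized_risks, scores.tolist()), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:top_k]


print(recommend_risks("someone looked at client records without permission"))
```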

This case study demonstrates the power of knowledge-enhanced taxonomy management, combining domain expertise with AI capabilities through a structured cycle of enrichment, validation, and refinement to create a living taxonomy that serves both regulatory and practical business needs.

Taxonomy Standards

For taxonomies to be truly effective and scalable in modern information environments, they must adhere to established semantic web standards and follow best practices developed by information science experts. Modern taxonomies need to support enterprise-wide knowledge initiatives, break down data silos, and enable integration with linked data and knowledge graphs. This is where standards like the Simple Knowledge Organization System (SKOS) become essential. By using universal standards like SKOS, organizations can:

  • Enable interoperability between systems and across organizational boundaries
  • Facilitate data migration between different taxonomy management tools
  • Connect taxonomies to ontologies and knowledge graphs
  • Ensure long-term sustainability as technology platforms evolve

Beyond SKOS, taxonomy professionals should be familiar with related semantic web standards such as RDF and SPARQL, especially as organizations move toward more advanced semantic technologies like ontologies and enterprise knowledge graphs. Well-designed taxonomies following these standards become the foundation upon which more advanced Knowledge Intelligence capabilities can be built. By adhering to established standards, organizations ensure their taxonomies remain both technically sound and semantically precise, capable of scaling effectively as business requirements evolve.
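
To ground these standards in something concrete, the small sketch below uses the open-source rdflib library to model a taxonomy concept in SKOS with a preferred label, an alternative label drawn from user terminology, and a broader relationship; the namespace and concept names are invented for illustration.

```python
# Illustrative sketch: expressing taxonomy concepts in SKOS with rdflib.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/taxonomy/")  # hypothetical namespace

g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

operational_risk = EX.OperationalRisk
data_breach_risk = EX.DataBreachRisk

g.add((operational_risk, RDF.type, SKOS.Concept))
g.add((operational_risk, SKOS.prefLabel, Literal("Operational Risk", lang="en")))

g.add((data_breach_risk, RDF.type, SKOS.Concept))
g.add((data_breach_risk, SKOS.prefLabel, Literal("Data Breach Risk", lang="en")))
g.add((data_breach_risk, SKOS.altLabel, Literal("Unauthorized data disclosure", lang="en")))
g.add((data_breach_risk, SKOS.broader, operational_risk))  # hierarchy via skos:broader

print(g.serialize(format="turtle"))
```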

The Future of Taxonomy Management

The future of taxonomy management lies not just in automation, but in intelligent collaboration between human expertise and AI capabilities. KI provides the framework for this collaboration, ensuring that taxonomies remain both precise and practical. 

For organizations considering this approach, the key is to start with a clear understanding of their taxonomic needs and challenges, and to ensure their taxonomy efforts are built on solid foundations of semantic web standards like SKOS. These standards are essential for taxonomies to effectively scale, support interoperability, and maintain long-term value across evolving technology landscapes. Success comes not from replacement of existing processes, but from thoughtful integration of KI capabilities into established workflows that respect these standards and best practices.

Ready to explore how KI can transform your taxonomy management? Contact our team of experts to learn more about implementing these capabilities in your organization.

 

EK’s Joe Hilger, Lulit Tesfaye, Sara Nash, and Urmi Majumder to Speak at Data Summit 2025
https://enterprise-knowledge.com/tesfaye-and-majumder-speaking-at-data-summit-conference-2025/
Thu, 27 Mar 2025

Enterprise Knowledge’s Joe Hilger, Chief Operating Officer, and Sara Nash, Principal Consultant, will co-present a workshop, and Lulit Tesfaye, Partner and Vice President of Knowledge and Data Services, and Urmi Majumder, Principal Data Architect, will present a conference session at the Data Summit Conference in Boston. The premier data management and analytics conference will take place May 14-15 at the Hyatt Regency Boston, with pre-conference workshops on May 13, and will feature workshops, panel discussions, and provocative talks from industry leaders.

Hilger and Nash will be giving an in-person half-day workshop titled “Building the Semantic Layer of Your Data Platform,” on Tuesday, May 13. Semantic layers stand out as a key approach to solving business problems for organizations grappling with the complexities of managing and understanding the meaning of their content and data. Join Hilger and Nash to learn what a semantic layer is, how it is implemented, and how it can be used to support your Enterprise AI, search, and governance initiatives. Participants will get hands-on experience building a key component of the semantic layer, knowledge graphs, and the foundational elements required to scale it within an enterprise.

Tesfaye and Majumder’s session, “Implementing Semantic Layer Architectures,” on May 15 will focus on the real-world applications of how semantic layers enable generative AI (GenAI) to integrate organizational context, content, and domain knowledge in a machine-readable format, making them essential for enterprise-scale data transformation. Tesfaye and Majumder will highlight how enterprise AI can be realized through semantic components such as metadata, business glossaries, taxonomy/ontology, and graph solutions – uncovering the technical architectures behind successful semantic layer implementations. Key topics include federated metadata management, data catalogs, ontologies and knowledge graphs, and enterprise AI infrastructure. Attendees will learn how to establish a foundation for explainable GenAI solutions and facilitate data-driven decision-making by connecting disparate data and unstructured content using a semantic layer.

For further details and registration, please visit the conference website.

EK’s David Hughes Interviewed on The Data Exchange Podcast by Gradient Flow
https://enterprise-knowledge.com/eks-david-hughes-interviewed-on-the-data-exchange-podcast-by-gradient-flow/
Mon, 24 Mar 2025

David Hughes, Principal Data & AI Solution Architect at Enterprise Knowledge (EK), was recently interviewed by Ben Lorica on Gradient Flow’s The Data Exchange podcast to discuss “How BAML Transforms AI Development.” The Data Exchange is an independent podcast series focused on data, machine learning, and AI.

The episode centers on BAML, a domain-specific language that transforms prompts into structured functions with defined inputs and outputs, enabling developers to create more deterministic and maintainable AI applications. This approach fundamentally changes prompt engineering by focusing on output schemas rather than crafting perfect prompt text, resulting in more robust applications that can adapt to new models without significant refactoring. BAML’s polyglot nature, testing capabilities, and runtime adaptability make it particularly valuable for enterprise environments and agentic AI applications, including multimodal systems.

Ahead of the episode’s release, an excerpt of their conversation was made available in Gradient Flow’s newsletter – in the article, “Faster Iteration, Lower Costs: BAML’s Impact on AI Projects,” Hughes discusses his BAML “aha moment,” how he approaches using BAML in AI solutioning, and the pros and cons of BAML when it comes to the AI project lifecycle. He also shares his excitement for what’s next, and provides a few tips and tricks for new adopters as they get started.

You can now listen to the episode here, available through Apple, Spotify, or wherever you get your podcasts.

Tesfaye and Majumder Speaking at EDW 2025
https://enterprise-knowledge.com/tesfaye-and-majumder-speaking-at-edw-2025/
Wed, 19 Mar 2025

Enterprise Knowledge’s Lulit Tesfaye, Partner and Vice President of Knowledge and Data Services, and Urmi Majumder, Principal Data Architect, will deliver an in-depth tutorial on semantic layer architectures at the DGIQ West + EDW 2025 conference on May 5th.

The tutorial will delve into the components of a semantic layer and how they interconnect organizational knowledge and data assets to power AI systems such as chatbots and intelligent search functions. Tesfaye and Majumder will examine the top semantic layer architectural patterns and best practices for enabling enterprise AI. This interactive workshop will provide attendees with the opportunity to learn about semantic solutions, connect them to use cases, and architect a semantic layer from the ground up. Key topics include federated metadata management, data catalogs, ontologies and knowledge graphs, and enterprise AI infrastructure.

Register for this session at the conference website.

Incorporating Unified Entitlements in a Knowledge Portal
https://enterprise-knowledge.com/incorporating-unified-entitlements-in-a-knowledge-portal/
Wed, 12 Mar 2025

Recently, we have had a great deal of success developing a certain breed of application for our customers—Knowledge Portals. These knowledge-centric applications holistically connect an organization’s information—its data, content, people and knowledge—from disparate source systems. These portals provide a “single pane of glass” to enable an aggregated view of the knowledge assets that are most important to the organization. 

The ultimate goal of the Knowledge Portal is to provide the right people access to the right information at the right time. This blog focuses on the first part of that statement—“the right people.” This securing of information assets is called entitlements. As our COO Joe Hilger eloquently points out, entitlements are vital in “enabling consistent and correct privileges across every system and asset type in the organization.” The trick is to ensure that an organization’s security model is maintained when aggregating this disparate information into a single view so that users only see what they are supposed to.

 

The Knowledge Portal Security Challenge

The Knowledge Portal’s core value lies in its ability to aggregate information from multiple source systems into a single application. However, any access permissions established outside of the portal—whether in the source systems or an organization-wide security model—need to be respected. There are many considerations to take into account when doing this. For example, how does the portal know:

  • Who am I?
  • Am I the same person specified in the various source systems?
  • Which information should I be able to see?
  • How will my access be removed if my role changes?

Once I have logged in, the portal needs to know that I have Role A in the content management system, Role B in our HR system, and Role C in our financial system. Since the portal aggregates information from all of these systems, it uses this information to ensure that what I see in the portal reflects what I would see in any of the individual systems.

 

The Tenets of Unified Entitlements in a Knowledge Portal

At EK, we have a common set of principles that guide us when implementing entitlements for a Knowledge Portal. They include:

  • Leveraging a single identity via an Identity Provider (IdP).
  • Creating a universal set of groups for access control.
  • Respecting access permissions set in source systems when available.
  • Developing a security model for systems without access permissions.

 

Leverage an Identity Provider (IdP)

When I first started working in search over 20 years ago, most source systems had their own user stores—the feature that allows a user to log into a system and uniquely identifies them within the system. One of the biggest challenges for implementing security was correctly mapping a user’s identity in the search application to their various identities in the source systems sending content to the search engine.

Thankfully, enterprise-wide Identity Providers (IdPs) like Okta, Microsoft Entra ID (formerly Azure Active Directory), and Google Cloud Identity are ubiquitous these days. An Identity Provider (IdP) is like a digital doorkeeper for your organization: it identifies who you are and shares that information with your organization’s applications and systems.

By leveraging an IdP, I can present myself to all my applications with a single identifier such as “cmarino@enterprise-knowledge.com.” For the sake of simplicity in mapping my identity within the Knowledge Portal, I’m not “cmarino” in the content management system, “marinoc” in the HR system, and “christophermarino” in the financial system.

Instead, all of those systems recognize me as “cmarino@enterprise-knowledge.com” including the Knowledge Portal. And the subsequent decision by the portal to provide or deny access to information is greatly simplified. The portal needs to know who I am in all systems to make these determinations.

 

Create Universal Groups for Access Control

Working hand in hand with an IdP, establishing a set of universally used groups for access control is a critical step toward enabling Unified Entitlements. These groups are typically created within your IdP and should reflect the common groupings needed to enforce your organization’s security model. For instance, you might choose to create groups based on a department, a project, or a business unit. Most systems provide great flexibility in how these groups are created and managed.

These groups are used for a variety of tasks, such as:

  • Associating relevant users to groups so that security decisions are based on a smaller, manageable number of groups rather than on every user in your organization.
  • Enabling access to content by mapping appropriate groups to the content.
  • Serving as the unifying factor for security decisions when developing an organization’s security model.

As an example, we developed a Knowledge Portal for a large global investment firm which used Microsoft Entra ID as their IdP. Within Entra ID, we created a set of groups based on structures like business units, departments, and organizational roles. Access permissions were applied to content via these groups whether done in the source system or an external security model that we developed. When a user logged in to the portal, we identified them and their group membership and used that in combination with the permissions of the content. Best of all, once they moved off a project or into a different department or role, a simple change to their group membership in the IdP cascaded down to their access permissions in the Knowledge Portal.

 

Respect Permissions from Source Systems

The first two principles have focused on identifying a user and their roles. However, the second key piece to the entitlements puzzle rests with the content. Most source systems natively provide the functionality to control access to content by setting access permissions. Examples are SharePoint for your organization’s sensitive documents, ServiceNow for tickets only available to a certain group, or Confluence pages only viewable by a specific project team. 

When a security model already exists within a source system, the goal of integrating that content within the Knowledge Portal is simple: respect the permissions established in the source. The key here is syncing your source systems with your IdP and then leveraging the groups managed there. When specifying access to content in the source, use the universal groups. 

Thus, when the Knowledge Portal collects information from the source system, it pulls not only the content and its applicable metadata but also the content’s security information. The permissions are stored alongside the content in the portal’s backend and used to determine whether a specific user can view specific content within the portal. The permissions become just another piece of metadata by which the content can be filtered.
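
As a simplified illustration of that filtering step (a generic sketch, not the portal’s actual implementation), the snippet below stores each document’s allowed groups as metadata and returns only the documents whose groups intersect with the requesting user’s group membership from the IdP.

```python
# Hedged sketch: trimming results by group-based entitlements.
# Documents and groups are fabricated for illustration.

indexed_documents = [
    {"title": "Q3 Deal Pipeline", "allowed_groups": {"sales-leadership"}},
    {"title": "Employee Handbook", "allowed_groups": {"all-staff"}},
    {"title": "Project Falcon Design Doc", "allowed_groups": {"project-falcon", "engineering"}},
]


def filter_by_entitlements(documents, user_groups):
    """Return only the documents the user is entitled to see."""
    return [
        document for document in documents
        if document["allowed_groups"] & user_groups  # any shared group grants access
    ]


# Group membership would normally come from the IdP (e.g., Entra ID) at login.
user_groups = {"all-staff", "engineering"}
for document in filter_by_entitlements(indexed_documents, user_groups):
    print(document["title"])
```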

 

Develop Security Model for Unsupported Systems

Occasionally, there will be source systems where access permissions are not, or cannot be, supported. In this case, you will have to rely on your own internal security model, either by developing one or by using an entitlements tool. Instead of being stored within the source system, the entitlements will be managed through this internal model.

The steps to accomplish this include:

  • Identify the tools needed to support unified entitlements;
  • Build the models for applying the security rules; and
  • Develop the integrations needed to automate security with other systems. 

The process to implement this within the Knowledge Portal would remain the same: store the access permissions with the content (mapped using groups) and use these as filters to ensure that users see only the information they should.

 

Conclusion

Getting unified entitlements right for your organization plays a large part in a successful Knowledge Portal implementation. If you need proven expertise to help manage access to your organization’s valuable information, contact us.

The “right people” in your organization will thank you.

How a Semantic Layer Transforms Engineering Research Industry Challenges
https://enterprise-knowledge.com/how-a-semantic-layer-transforms-engineering-research-industry-challenges/
Wed, 05 Mar 2025

The post How a Semantic Layer Transforms Engineering Research Industry Challenges appeared first on Enterprise Knowledge.

To drive future innovation, research organizations increasingly seek to develop advanced platforms that enhance the findability and connectivity of their knowledge, data, and content–empowering more efficient and impactful R&D efforts. However, many face challenges due to decentralized information systems, where critical data and content remain siloed, inaccessible, and opaque to users. Much of this content (e.g., publications, reports, technical drawings) is unstructured or stored in analog formats, making it difficult to discover without proper metadata and search functionality. Additionally, inefficiencies and redundancies abound when there is limited visibility into past work, expertise, and the processes for centralizing and sharing institutional knowledge. These challenges become even more pressing as experienced professionals retire, risking the loss of valuable institutional and tacit knowledge and leading to gaps in expertise and in the ability to identify it.

The semantic layer framework acts as a critical bridge between raw, unstructured data and modern knowledge platforms, enabling seamless integration, organization, and retrieval of information. By leveraging structured metadata, AI-supplemented auto-classification, and knowledge graphs, this framework enhances the discoverability and usability of dispersed content while enabling inference-based relationships to be surfaced. Beyond its technical role, it also provides a strategic foundation for knowledge management by guiding the implementation of process and governance models that ensure consistency, accessibility, and long-term sustainability. In the following section, we will explore two real-world cases that illustrate how this approach has been successfully implemented to address the business challenges outlined above.

 

Case 1

Challenge: Identifying experts in specific research areas with experience on past projects is complex and time-consuming, requiring extensive institutional knowledge and relying on informal networks and word of mouth.

Solution: Project and Expert Finder

A large, federally funded engineering research and development center relied heavily on personal networks and institutional memory to find individuals with specific skills and experience. They managed extensive data sources and content repositories, but it was still difficult to leverage organizational knowledge and onboard new employees efficiently. Critical entities within both structured and unstructured data and content–such as people, projects, roles, and materials–were not linked. This gap hindered effective project staffing, planning, and research. The issue was further exacerbated by a retiring workforce, leading to a loss of valuable tacit knowledge.

EK implemented a scalable, adaptable semantic layer framework to develop a knowledge graph that connects people, projects, engineering components, and technical topics. This institutional knowledge graph aggregates data across 40 applications, eliminating the need for discrete data connectors, reducing costs, and serving as a centralized resource. Integrating with the enterprise search system, it enhances browsability and discoverability, providing a comprehensive view of relationships within the organization. Beyond improving access to information, the unified data within the knowledge graph can also act as a powerful input to artificial intelligence algorithms, enabling predictions and discovery of previously unknown relationships. This enhanced intelligence not only supports decision-making but also drastically reduces the time required to locate critical information–from three weeks to just five minutes–accelerating research, publication processes, and internal collaboration.
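To illustrate the kind of question a Project and Expert Finder answers, here is a minimal knowledge graph sketch using the open-source rdflib library and a SPARQL query. The namespace, properties, and data are invented for the example and do not reflect the client's actual ontology.

```python
# Minimal sketch of an "expert finder" query over an institutional knowledge
# graph using rdflib and SPARQL. Namespaces, properties, and data are
# illustrative only.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://example.org/kg/")
g = Graph()
g.bind("ex", EX)

# People, projects, and topics linked as graph triples
g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.alice, RDFS.label, Literal("Alice")))
g.add((EX.alice, EX.workedOn, EX.project42))
g.add((EX.project42, EX.hasTopic, EX.cryogenics))
g.add((EX.cryogenics, RDFS.label, Literal("Cryogenics")))

# "Who has worked on a project about cryogenics?"
results = g.query("""
    PREFIX ex: <https://example.org/kg/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?personName WHERE {
        ?person ex:workedOn ?project ;
                rdfs:label ?personName .
        ?project ex:hasTopic ?topic .
        ?topic rdfs:label "Cryogenics" .
    }
""")
for row in results:
    print(row.personName)  # Alice
```

At enterprise scale, the same pattern lets a search interface answer staffing and expertise questions in seconds rather than relying on informal networks.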

 

Case 2

Challenge: Disorganized and siloed content prevents seamless searchability across repositories, hinders the ability to trace the digital thread responsibly, and complicates long-term information preservation. As a result, answering critical questions about the impact of past and current work on project decisions and deliverables becomes nearly impossible.

Solution: Internal Research Platform

A federally funded energy research and development center relies on past project outcomes and experimental data to advance scientific innovation. However, researchers struggled to find relevant reports due to unstructured metadata and ineffective search capabilities, and manual metadata entry was time-consuming, error-prone, and inconsistent across systems. Analog content, inconsistent metadata standards, and fragmented information systems created significant barriers to knowledge discovery, records management, and long-term information preservation. These barriers exacerbated information silos, leading to knowledge loss, inefficiencies, and decision-making risks from limited access to reliable research data.

To address these challenges, EK supported the development of 5 metadata and taxonomy models tailored to tag 80,000+ documents with high precision. This was achieved through a custom auto-classification pipeline, which integrates multiple gold data sources with a TOMS (Taxonomy and Ontology Management System). By enriching both analog and native digital documents with semantic metadata, the solution facilitates enhanced content and research material discovery through a knowledge graph underlying a front-end search interface. Standardizing metadata and taxonomy models not only automates classification but also digitizes analog content and integrates several systems, significantly improving searchability and accessibility across the organization. To ensure organization-wide scalability, adoption, and sustainability, governance models were developed with strategies spanning tactical, strategic, operational, and technical domains. These models address key facets necessary for a successful long-term implementation, ensuring the framework remains adaptable and effective over time.
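As a simplified illustration of the auto-classification step, the sketch below tags extracted document text against taxonomy labels with basic term matching. The real pipeline integrated a TOMS and curated gold data sources, which are not reproduced here; the taxonomy and matching logic are invented for illustration.

```python
# Simplified sketch of an auto-classification step: tag extracted document
# text with taxonomy concepts by matching their labels. The taxonomy and
# matching approach are illustrative only.
import re

TAXONOMY = {
    "materials-science": {"materials science", "metallurgy", "alloys"},
    "energy-storage": {"energy storage", "batteries", "battery"},
    "nuclear-engineering": {"nuclear engineering", "reactor", "fission"},
}

def classify(text):
    """Return the taxonomy concept IDs whose labels appear in the text."""
    lowered = text.lower()
    tags = set()
    for concept_id, labels in TAXONOMY.items():
        if any(re.search(rf"\b{re.escape(label)}\b", lowered) for label in labels):
            tags.add(concept_id)
    return tags

ocr_text = "The report evaluates reactor cooling alloys for long-duration energy storage."
print(classify(ocr_text))
# e.g. {'materials-science', 'energy-storage', 'nuclear-engineering'} (set order varies)
```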

 

Conclusion

In the research and development space, ensuring that information such as expertise, project history, and past research is easily findable is critical for driving innovation, reducing inefficiencies, and managing knowledge transfer amid retirements in a highly specialized workforce. Research organizations also benefit from using semantic layers to manage research-focused data products more consistently, leveraging graph data to find patterns and insights. Additionally, AI-driven knowledge discovery, automated metadata tagging, and interoperability across systems further enhance research efficiency and collaboration.

At EK, we specialize in delivering tailored solutions to help your organization overcome its unique data and knowledge management challenges–from strategy to implementation. Explore our Knowledge Base to learn more about our expertise and semantic layer solutions, and contact us to discuss how we can support your specific needs.

The post How a Semantic Layer Transforms Engineering Research Industry Challenges appeared first on Enterprise Knowledge.

The Minimum Requirements To Consider Something a Semantic Layer https://enterprise-knowledge.com/the-minimum-requirements-to-consider-something-a-semantic-layer/ Fri, 28 Feb 2025 17:58:58 +0000 https://enterprise-knowledge.com/?p=23250 Semantic Layers are an important design framework for connecting information across an organization in preparation for Enterprise AI and Knowledge Intelligence. But with every new technology and framework, interest in utilizing the technological advance outpaces experience in effective implementation. As … Continue reading

The post The Minimum Requirements To Consider Something a Semantic Layer appeared first on Enterprise Knowledge.


Semantic Layers are an important design framework for connecting information across an organization in preparation for Enterprise AI and Knowledge Intelligence. But with every new technology and framework, interest in utilizing the technological advance outpaces experience in effective implementation. As awareness of the importance of a semantic layer grows and the market becomes saturated with products, it is crucial to clearly distinguish between what is and is not a semantic layer. This distinction helps identify architectures that provide the full benefits of a semantic layer–such as aggregating structured and unstructured data with business context and understanding–versus more general data fabrics and semantic applications that may only provide some of those benefits.

To draw this distinction, it’s essential to understand the components that make up a semantic layer and how they connect, as well as the core capabilities and design requirements. 

What a Semantic Layer Is Not

No one application is a semantic layer; a semantic layer is a framework for design. This article will focus on summarizing the requirements of the semantic layer framework design. For a deeper exploration of the specific components and how they interact and can be implemented, see “What is a Semantic Layer? (Components and Enterprise Applications)”.

 

Requirement 1: A Semantic Layer supports more than one consuming application

A Semantic Layer is not equivalent to a model or orchestration layer developed to serve only one data product. While application-specific semantic models and integrations–such as those unifying customer information or tracking specific business health analytics via executive dashboards–can be critical to your business's tech stack, they are not enough to connect information across the organization. To do this, multiple consuming applications, such as catalogs, recommendation engines, dashboards, and semantic search engines, must operate within a design framework that enables the sharing of semantic data. A semantic layer-type framework that serves only one downstream application risks becoming too closely tied to one domain and stakeholder group, limiting its broader organizational impact.

 

Requirement 2: A Semantic Layer connects data/content from more than one source system

Similar to enabling more than one application, a semantic layer should also connect information from multiple source systems. A layer that pulls from only one source cannot meet modern needs for aggregating structured and unstructured data across silos to generate insights. Without a layer for interconnection, organizations run the risk of creating silos between data sources and applications. Additionally, organizations and semantic layer teams should develop data processing and analytics tools that are reusable across source systems as part of the semantic layer. Tying such tooling to a single source system or downstream application encourages duplication of work instead of the solution reuse that the semantic layer enables. One multinational bank that EK worked with developed a semantic layer to pull together complex risk management information from across multiple sources. The bank cut down the time spent on what used to be weeks-long efforts to aggregate data for regulators and made information from siloed, process-specific applications available in one central system for easy access and use.

 

Requirement 3: A Semantic Layer establishes a logical architecture

The semantic layer can serve as a logical connection layer between source systems

What separates a semantic layer from a well-implemented data catalog or data governance tool is its ability to serve as a connection and insight layer between multiple heterogeneous sources of information, for multiple downstream data products and applications. To serve sources of information that have different data and content structures, the semantic layer needs to be based on a logical architecture that source models can map to. This logical architecture can be managed as part of the ontology models if desired, but the important point is that it provides a necessary abstraction step so that business stakeholders can move from talking about the specific physical details of databases and documents to talking about what the data and content is actually about. Without this, the work required to ensure that the layer both aggregates and enriches information will fail to scale across multiple domains. Moreover, the layer itself may begin to fracture over time without a logical architecture to unify its approach to disparate applications and data sources.
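The sketch below illustrates this abstraction step in miniature: physical field names from two hypothetical source systems are mapped to one logical model, so downstream consumers work with shared business concepts rather than source-specific schemas. All source names and mappings are illustrative.

```python
# Sketch of a logical architecture: physical fields from heterogeneous
# sources map to one logical property, so consuming applications ask for
# "title" or "author" without knowing each source's schema. Names are
# illustrative only.

LOGICAL_MAP = {
    "sharepoint": {"Title": "title", "Author0": "author", "Modified": "last_updated"},
    "dms":        {"doc_name": "title", "created_by": "author", "mod_ts": "last_updated"},
}

def to_logical(source, record):
    """Translate a source-system record into the shared logical model."""
    mapping = LOGICAL_MAP[source]
    return {logical: record[physical] for physical, logical in mapping.items() if physical in record}

print(to_logical("dms", {"doc_name": "Test Plan", "created_by": "jsmith", "mod_ts": "2024-05-01"}))
# {'title': 'Test Plan', 'author': 'jsmith', 'last_updated': '2024-05-01'}
```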

 

Requirement 4: A Semantic Layer reflects business stakeholder context and vocabulary

A semantic layer is more than simply a means of mapping data between systems. It serves to capture business knowledge and context so that actionable insights can be pulled from structured and unstructured data. In order to do this, the semantic layer must leverage the terminology that business stakeholders use to describe and categorize information. These vocabularies then serve as a standardized source of metadata to ensure that insights from across the enterprise can be compared, contrasted, and linked for further analysis. Without reflecting the language of business stakeholders and understanding the context in which they use it to describe key information, the semantic layer will not be able to accurately and effectively enrich data with business meaning. The layer may make data more accessible, but it will fail to make that data meaningful.
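A minimal sketch of what this looks like in practice: variant terms found across source systems are normalized to the preferred labels that stakeholders actually use, so metadata stays comparable across the enterprise. The vocabulary below is invented for illustration.

```python
# Sketch of normalizing variant business terms to the preferred labels that
# stakeholders use, so metadata is comparable across the enterprise.
# The vocabulary is invented for illustration.

VOCABULARY = {
    # preferred label -> variants found across source systems and documents
    "Customer": {"client", "account holder", "customer"},
    "Trade": {"transaction", "deal", "trade"},
}

# Invert once into a fast lookup of variant -> preferred label
LOOKUP = {variant: preferred for preferred, variants in VOCABULARY.items() for variant in variants}

def normalize_term(raw):
    """Map a raw term to the business-preferred label, or None if unknown."""
    return LOOKUP.get(raw.strip().lower())

print(normalize_term("Account Holder"))  # Customer
print(normalize_term("widget"))          # None (not in the vocabulary)
```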

 

Requirement 5: A Semantic Layer leverages standards

As a semantic layer evolves, so too does a business's understanding of what tooling best fits their needs for the layer's components. Core semantic layer components such as the glossary, graph, and metadata storage should be based on widely adopted semantic web standards, such as RDF and SKOS, to keep them interchangeable and avoid vendor lock-in. No one component should become an anchor; instead, each component should function more like a Lego brick that can be swapped out as an organization's semantic ecosystem and needs evolve. Additionally, basing a semantic layer on standards opens up a world of mature libraries, design frameworks, and application integrations that can extend and enhance the functionality of the semantic layer, rather than requiring a development team to reinvent the wheel. Without a standards-based architecture, organizations risk problems with the long-term scalability and management of their semantic layer.
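For example, a vocabulary expressed with open standards can be serialized and moved between tools rather than living in one vendor's proprietary format. The sketch below uses the open-source rdflib library to model a few SKOS concepts and serialize them to Turtle; the namespace and concepts are illustrative, not a recommended model.

```python
# Sketch of keeping vocabulary components standards-based: a few business
# terms modeled as SKOS concepts with rdflib and serialized to Turtle, so
# the model can be loaded into a different tool later. The namespace and
# concepts are illustrative.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.org/vocab/")
g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

g.add((EX.Trade, RDF.type, SKOS.Concept))
g.add((EX.Trade, SKOS.prefLabel, Literal("Trade", lang="en")))
g.add((EX.Trade, SKOS.altLabel, Literal("Transaction", lang="en")))

g.add((EX.EquityTrade, RDF.type, SKOS.Concept))
g.add((EX.EquityTrade, SKOS.prefLabel, Literal("Equity Trade", lang="en")))
g.add((EX.EquityTrade, SKOS.broader, EX.Trade))

# Standard serialization keeps the vocabulary portable across tools
print(g.serialize(format="turtle"))
```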

 

Conclusion

A Semantic Layer connects information across an organization by establishing a standards-based logical architecture, informed by business context and vocabulary, that connects two or more source systems to two or more downstream applications. Efforts that do not meet these requirements will fail to realize the benefits of the semantic layer, resulting in incomplete or failed projects. The five key requirements described in this article create a baseline for what a semantic layer implementation should be. While not exhaustive, understanding and following these requirements will help organizations unlock the full benefits of the semantic layer and deliver real business value. They will ensure that your semantic layer is able to capture knowledge and embed business context across your organization to power Enterprise AI and Knowledge Intelligence. If you are interested in learning more about semantic layer development and how EK can help, check out our other blogs on the subject, or reach out to EK if you have specific questions.

The post The Minimum Requirements To Consider Something a Semantic Layer appeared first on Enterprise Knowledge.
