Architecture Articles - Enterprise Knowledge
https://enterprise-knowledge.com/tag/architecture/

Data Quality and Architecture Enrichment for Insights Visualization
https://enterprise-knowledge.com/data-quality-and-architecture-enrichment-for-insights-visualization/ (September 10, 2025)

The Challenge

A radiopharmaceutical imaging company faced challenges in monitoring patient statistics and clinical trial logistics. Limited visibility into this data hindered conversations with leadership about the status of active clinical trials, ultimately putting trial results at risk. The company needed a trusted, single location to ask relevant business questions of their data and to spot trends or anomalies across multiple clinical trials. Doing so was difficult, however, because trial data arrived from various vendors in different formats, with no standardized values across trials. To mitigate these issues, the company engaged Enterprise Knowledge (EK) to provide Semantic Data Management Advisory & Development as part of a data normalization and portfolio reporting program. The engagement’s goal was to develop data visualization dashboards that answer critical business questions with cleaned, normalized, and trustworthy patient data from four clinical trials, depicted in an easy-to-understand and actionable manner.

The Solution

To unlock data insights across trials, EK designed and developed a Power BI dashboard that visualizes data from multiple trials in one centralized location. To begin development, EK met with the client to confirm the business questions the dashboards would answer, ensuring the dashboards would visually display the patient and trial information needed to answer them. To remedy the varying data formats sent by vendors, EK mapped data values from trial reports to one another, normalizing and enriching the data with metadata and lineage. With structure and standardization added to the data, the dashboards could display robust insights into patient status, with filterable trial-specific information for the clinical imaging team.

EK also worked to transform the company’s data management environment—developing a medallion architecture structure to handle historical files and enforcing data cleaning and standardization on raw data inputs—to ensure dashboard insights were accurate and scalable to the inclusion of future trials. Implementing these data quality pre-processing steps and architecture considerations prepared the company for future applications and uses of reliable data, including the development of data products or the creation of a single view into the company-wide data landscape.
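The specifics of the client implementation are proprietary, but a minimal sketch can illustrate the medallion pattern described above. The pandas-based pipeline below stages raw vendor files (Bronze), cleans and standardizes them with lineage metadata (Silver), and combines them into an analytics-ready table (Gold); the directory layout, column names, and value mappings are hypothetical, not the client's actual model.

```python
import pandas as pd
from pathlib import Path

# Illustrative medallion-style staging: Bronze (raw), Silver (cleaned/standardized),
# Gold (analytics-ready). Directories are assumed to already exist.
BRONZE, SILVER, GOLD = Path("bronze"), Path("silver"), Path("gold")

# Example vendor-specific value mappings used to normalize trial data (hypothetical).
STATUS_MAP = {"scrnd": "Screened", "screened": "Screened",
              "enr": "Enrolled", "enrolled": "Enrolled"}

def to_silver(raw_file: Path) -> pd.DataFrame:
    """Clean and standardize one raw vendor file into the Silver layer."""
    df = pd.read_csv(raw_file)
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df["patient_status"] = df["patient_status"].str.strip().str.lower().map(STATUS_MAP)
    df["trial_id"] = raw_file.stem           # lineage: record which trial file this row came from
    df["ingested_from"] = str(raw_file)
    df.to_parquet(SILVER / f"{raw_file.stem}.parquet", index=False)
    return df

def to_gold() -> pd.DataFrame:
    """Combine all Silver files into a single Gold table for the dashboard."""
    frames = [pd.read_parquet(p) for p in SILVER.glob("*.parquet")]
    gold = pd.concat(frames, ignore_index=True)
    gold.to_parquet(GOLD / "trial_patient_status.parquet", index=False)
    return gold
```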

The EK Difference

To support the usage, maintenance, and future expansion of the data environment and data visualization tooling, EK developed knowledge transfer materials. These proprietary materials included setting up a semantic modeling foundation via a data dictionary to explain and define dashboard fields and features, a proposed future medallion architecture, and materials to socialize and expand the usage of visualization tools to additional sections of the company that could benefit from them.

Dashboard Knowledge Transfer Framework
To ensure the longevity of the dashboard, especially with the future inclusion of additional trial data, it was essential to develop materials for future dashboard users and developers. The knowledge transfer framework designed by EK outlined a repeatable process for dashboard development with enough detail that someone unfamiliar with the dashboards could understand the background, use cases, data inputs, visualization outputs, and overall purpose of the dashboarding effort. Instructions for dashboard upkeep, including how to update and add data as business needs evolve, were also provided.

Semantic Model Foundations: Data Dictionary
To semantically enhance the dashboards, all dashboard fields and features were cataloged and defined by EK experts in semantics and data analysis. In addition to definitions, the dictionary included purpose statements and calculation rules for each dashboard concept (where applicable). This data dictionary was created to prepare the client to process all trial information moving forward and serve as a reference for the data transformation process.
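As a simple illustration (not the client's actual dictionary), a data dictionary entry of this kind can pair each dashboard field with a definition, a purpose statement, and an optional calculation rule; the field names and rules below are hypothetical.

```python
# Hypothetical data dictionary entries for dashboard fields: each entry carries a
# definition, a purpose statement, and, where applicable, a calculation rule.
data_dictionary = {
    "enrollment_rate": {
        "definition": "Share of screened patients who enrolled in the trial.",
        "purpose": "Tracks recruitment progress for each active trial.",
        "calculation": "enrolled_patients / screened_patients",
    },
    "patient_status": {
        "definition": "Current stage of a patient in the trial workflow.",
        "purpose": "Supports filtering dashboard views by trial stage.",
        "calculation": None,  # direct field, no derivation required
    },
}
```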

Proposed Future Architecture
To optimize data storage going forward, EK proposed a medallion architecture strategy consisting of Bronze, Silver, and Gold layers to preserve historical data and pave the way for more mature logging techniques. At the time EK engaged the client, no formal data storage structure was in place. EK’s architecture strategy detailed storage preparation considerations for each layer, including workspace creation, file retention policies, and options for ingesting and storing data. EK leveraged technical expertise and a rich background in architecture strategies to provide expert advisory on the client’s future architecture.

Roadshow Materials
EK developed materials that summarized the mission and value of the clinical imaging dashboards. These materials included a high-level overview of the dashboard ecosystem so all audiences could comprehend the dashboard’s purpose and execution. With a KM-angled focus, the overall purpose of the materials was to gain organizational buy-in for the dashboard and build awareness of the clinical imaging team and the importance of the work they do. The roadshow materials also sought to promote dashboard adoption and future expansion of dashboarding into other areas of the company.

The Results

Before the dashboard, employees had to track down various spreadsheets for each trial, sent from different sources and stored in at least four different locations. After the engagement, the company had a functional dashboard that displayed on-demand data visualizations across four clinical trials, all pulling from a single data repository. This gave the clinical imaging team a seamless way to identify trial data and patient discrepancies early and often, preventing errors that could have resulted in unusable trial data. Having multiple trials’ information available in one streamlined view dramatically reduced the time and effort employees had previously spent tracking down and manually analyzing raw, disparate data for insights, from as high as 1–2 hours every week to as little as 15 minutes. Clinical imaging managers can now quickly determine and confidently share trusted trial insights with their leadership, enabling informed decision-making backed by the resources to explain where those insights were derived from.

In addition to the creation of the dashboard, EK helped develop a knowledge transfer framework and future architecture and data cleaning considerations, providing the company with a clear path to expand and scale usage to more clinical trials, other business units, and new business needs. In fact, the clinical imaging team identified at least four additional trials that, as a result of EK’s foundational work, can be immediately incorporated into the dashboard as the company sees fit.

Want to improve your organization’s data quality and architecture? Contact us today!

Semantic Layer Strategy for Linked Data Investigations
https://enterprise-knowledge.com/semantic-layer-strategy-for-linked-data-investigations/ (May 8, 2025)

The Challenge

A government organization sought to more effectively exploit the breadth of data generated by its investigations of criminal networks for comprehensive case building and threat trend analysis. The agency struggled to meaningfully connect structured and unstructured data from multiple siloed data sources, each with misaligned naming conventions and inconsistent data structures and formats. Users needed an existing understanding of the underlying data models and had to jump between multiple system views to answer core investigation analysis questions, such as “What other drivers have been associated with this vehicle involved in an inspection at the border?” or “How often has this person in the network traveled to a known suspect storage location in the past 6 months?”

These challenges manifested as data ambiguity across the organization, complex and resource-intensive integration workflows, and underutilized data assets lacking meaningful context, all resulting in significant cognitive load and burdensome manual efforts for users conducting intelligence analyses. The organization recognized the need to define a robust semantic layer solution grounded in data modeling, architecture frameworks, and governance controls to unify, contextualize, and operationalize data assets via a “single pane of intelligence” analysis platform.

The Solution

To address these challenges, EK engaged with the client to develop a strategy and product vision for their semantic solution, paired with foundational semantic data models for meaningful data categorization and linking, architecture designs and tool recommendations for integrating and leveraging graph data, and entitlements designs for adhering to complex security standards. With phased implementation plans for incremental delivery, these components lay the foundations for the client’s solution vision for advanced entity resolution and analytics capabilities. The overall solution will power streamlined consumption experiences and data-driven insights through the “single pane of intelligence.”  

The core components of EK’s semantic advisory and solution development included:

Product Vision and Use Case Backlog:
EK collaborated with the client to shape a product vision anchored around the solution’s purpose and long-term value for the organization. Complemented with a strategic backlog of priority use cases, EK’s guidance resulted in a compelling narrative to drive stakeholder engagement and organizational buy-in, while also establishing a clear and tangible vision for scalable solution growth.

Solution Architecture Design:
EK’s solution architects gathered technical requirements to propose a modular solution architecture consisting of multiple, self-contained technology products that will provision a comprehensive analytic ecosystem to the organization’s user base. The native graph architecture involves a graph database, entity resolution services, and a linked data analysis platform to create a unified, interactive model of all of their data assets via the “single pane of intelligence.”
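To make the “single pane of intelligence” concept concrete, the sketch below shows how one of the investigation questions from the challenge might be answered with a single query once data is unified in a graph. It uses rdflib with an illustrative namespace; the classes, properties, file name, and plate number are assumptions for illustration, not the client's actual model.

```python
from rdflib import Graph

g = Graph()
g.parse("investigations.ttl", format="turtle")   # assumed RDF export of investigation data

# "What other drivers have been associated with this vehicle involved in an
# inspection at the border?" expressed as one query against the unified graph.
# The ex: classes, properties, and plate number are illustrative placeholders.
QUERY = """
PREFIX ex: <http://example.org/investigations#>
SELECT DISTINCT ?driverName WHERE {
    ?inspection a ex:BorderInspection ;
                ex:involvedVehicle ?vehicle .
    ?vehicle ex:plateNumber "ABC-1234" .
    ?event ex:involvedVehicle ?vehicle ;
           ex:driver ?driver .
    ?driver ex:name ?driverName .
}
"""

for row in g.query(QUERY):
    print(row.driverName)
```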

Tool Selection Advisory:
EK guided the client on selecting and successfully gaining buy-in for procurement of a graph database and a data analysis and visualization platform with native graph capabilities to plug into the semantic and presentation layers of the recommended architecture design. This selection moves the organization away from a monolithic, document-centric platform to a data-centric solution for dynamic intelligence analysis in alignment with their graph and network analytics use cases. EK’s experts in unified entitlements and industry security standards also ensured the selected tooling would comply with the client’s database, role, and attribute-based access control requirements.

Taxonomy and Ontology Modeling:
In collaboration with intelligence subject matter experts, EK guided the team from a broad conceptual model to an implementable ontology and starter taxonomy designs to enable a specific use case for prioritized data sources. EK advised on mapping the ontology model to components of the Common Core Ontologies to create a standard, interoperable foundation for consistent and scalable domain expansion.

Phased Implementation Plan:
Through dedicated planning and solutioning sessions with the core client team, EK developed an iterative implementation plan to scale the foundational data model and architecture components and unlock incremental technical capabilities. EK advised on identifying and defining starter pilot activities, outlining definitions of done, necessary roles and skillsets, and required tasks and supporting tooling from the overall architecture to ensure the client could quickly start on solution implementation. EK is directly supporting the team on the short-term implementation tasks while continuing to advise and plan for the longer-term solution needs.

 

The EK Difference

Semantic Layer Solution Strategy:
EK guided the client in transforming existing experimental work in the knowledge graph space into an enterprise solution that can scale and bring tangible value to users. From strategic use case development to iterative semantic model and architecture design, EK provided the client with repeatable processes for defining, shaping, and productionalizing components of the organization’s semantic layer.

LPG Analytics with RDF Semantics:
To support the client’s complex and dynamic analytics needs, EK recommended an LPG-based solution for its flexibility and scalability. At the same time, the client’s need for consistent data classification and linkage still pointed to the value of RDF frameworks for taxonomy and ontology development. EK is advising on how to bridge these models for the translation and connectivity of data across RDF and LPG formats, ultimately enabling seamless data integration and interoperability in alignment with semantic standards.
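As a simplified illustration of this bridging idea (not the client's implementation), the sketch below walks an RDF graph with rdflib and emits labeled-property-graph-style nodes and relationships: literal-valued triples become node properties, and URI-valued triples become relationships. Real translations would also handle blank nodes, class-to-label mapping, and reification.

```python
from rdflib import Graph, Literal, URIRef

def rdf_to_lpg(ttl_path: str):
    """Translate RDF triples into simple LPG node/relationship structures.

    This is a deliberate simplification: literals become node properties,
    URI objects become relationships, and everything else is ignored.
    """
    g = Graph()
    g.parse(ttl_path, format="turtle")

    nodes, rels = {}, []
    for s, p, o in g:
        key = str(s)
        nodes.setdefault(key, {"uri": key})
        # Use the local name of the predicate as the property key / relationship type.
        local = str(p).rsplit("/", 1)[-1].rsplit("#", 1)[-1]
        if isinstance(o, Literal):
            nodes[key][local] = o.toPython()
        elif isinstance(o, URIRef):
            nodes.setdefault(str(o), {"uri": str(o)})
            rels.append((key, local, str(o)))
    return nodes, rels

# The resulting nodes/rels can then be loaded into an LPG store (e.g., via
# parameterized Cypher statements), preserving the RDF-defined vocabulary as
# relationship types and property keys.
```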

Semantic Layer Tooling:
EK has extensive experience advising on the evaluation, selection, procurement, and scalable implementation of semantic layer technologies. EK’s qualitative evaluation for the organization’s linked data analysis platforms was supplemented by a proprietary structured matrix measuring down-selected tools against 50+ functional and non-functional factors to provide a quantitative view of each tool’s ability to meet the organization’s specific needs.

Semantic Modeling and Scalable Graph Development:
Working closely with the organization’s domain experts, EK provided expert advisory in industry standards and best practices to create a semantic data model that will maximize graph benefits in the context of the client’s use cases and critical data assets. In parallel with model development, EK offered technical expertise to advise on the scalability of the resulting graph and connected data pipelines to support continued maintenance and expansion.

Unified Entitlements Design:
Especially working with a highly regulated government agency, EK understands the critical need for unified entitlements to provide a holistic definition of access rights, enabling consistent and correct privileges across every system and asset type in the organization. EK offered comprehensive entitlements design and development support to ensure access rights would be properly implemented across the client’s environment, closely tied to the architecture and data modeling frameworks.

Organizational Buy-In:
Throughout the engagement, EK worked closely with project sponsors to craft and communicate the solution product vision. EK tailored product communication components to different audiences by detailing granular technical features for tool procurement conversations and formulating business-driven, strategic value statements to engage business users and executives for organizational alignment. Gaining this buy-in early on is critical for maintaining development momentum and minimizing future roadblocks as wider user groups transition to using the productionalized solution.

The Results

With initial core semantic models, iterative solution architecture design plans, and incremental pilot modeling and engineering activities, the organization is equipped to stand up key pieces of the solution as they procure the graph analytics tooling for continued scale. The phased implementation plan provides the core team with tangible and achievable steps to transition from their current document-centric ways of working to a truly data-centric environment. The full resulting solution will facilitate investigation activities with a single pane view of multi-sourced data and comprehensive, dynamic analytics. This will streamline intelligence analysis across the organization with the enablement of advanced consumption experiences such as self-service reporting, text summarization, and geospatial network analysis, ultimately reducing the cognitive load and manual efforts users currently face in understanding and connecting data. EK’s proposed strategy has been approved for implementation, and EK will publish the results from the MVP development as a follow-up to this case study.

EK’s Joe Hilger, Lulit Tesfaye, Sara Nash, and Urmi Majumder to Speak at Data Summit 2025
https://enterprise-knowledge.com/tesfaye-and-majumder-speaking-at-data-summit-conference-2025/ (March 27, 2025)

Enterprise Knowledge’s Joe Hilger, Chief Operating Officer, and Sara Nash, Principal Consultant, will co-present a workshop, and Lulit Tesfaye, Partner and Vice President of Knowledge and Data Services, and Urmi Majumder, Principal Data Architect, will present a conference session at the Data Summit Conference in Boston. The premier data management and analytics conference will take place May 14-15 at the Hyatt Regency Boston, with pre-conference workshops on May 13, and will feature workshops, panel discussions, and provocative talks from industry leaders.

Hilger and Nash will be giving an in-person half-day workshop titled “Building the Semantic Layer of Your Data Platform,” on Tuesday, May 13. Semantic layers stand out as a key approach to solving business problems for organizations grappling with the complexities of managing and understanding the meaning of their content and data. Join Hilger and Nash to learn what a semantic layer is, how it is implemented, and how it can be used to support your Enterprise AI, search, and governance initiatives. Participants will get hands-on experience building a key component of the semantic layer, knowledge graphs, and the foundational elements required to scale it within an enterprise.

Tesfaye and Majumder’s session, “Implementing Semantic Layer Architectures,” on May 15 will focus on the real-world applications of how semantic layers enable generative AI (GenAI) to integrate organizational context, content, and domain knowledge in a machine-readable format, making them essential for enterprise-scale data transformation. Tesfaye and Majumder will highlight how enterprise AI can be realized through semantic components such as metadata, business glossaries, taxonomy/ontology, and graph solutions – uncovering the technical architectures behind successful semantic layer implementations. Key topics include federated metadata management, data catalogs, ontologies and knowledge graphs, and enterprise AI infrastructure. Attendees will learn how to establish a foundation for explainable GenAI solutions and facilitate data-driven decision-making by connecting disparate data and unstructured content using a semantic layer.

For further details and registration, please visit the conference website.

Tesfaye and Majumder Speaking at EDW 2025
https://enterprise-knowledge.com/tesfaye-and-majumder-speaking-at-edw-2025/ (March 19, 2025)

Enterprise Knowledge’s Lulit Tesfaye, Partner and Vice President of Knowledge and Data Services, and Urmi Majumder, Principal Data Architect, will deliver an in-depth tutorial on semantic layer architectures at the DGIQ West + EDW 2025 conference on May 5th.

The tutorial will delve into the components of a semantic layer and how they interconnect organizational knowledge and data assets to power AI systems such as chatbots and intelligent search functions. Tesfaye and Majumder will examine the top semantic layer architectural patterns and best practices for enabling enterprise AI. This interactive workshop will provide attendees with the opportunity to learn about semantic solutions, connect them to use cases, and architect a semantic layer from the ground up. Key topics include federated metadata management, data catalogs, ontologies and knowledge graphs, and enterprise AI infrastructure.

Register for this session at the conference website.

Enterprise AI Architecture Series: How to Extract Knowledge from Unstructured Content (Part 2)
https://enterprise-knowledge.com/enterprise-ai-architecture-series-how-to-extract-knowledge-from-unstructured-content-part-2/ (February 14, 2025)

Our CEO, Zach Wahl, recently noted in his annual KM trends blog for 2025 that Knowledge Management (KM) and Artificial Intelligence (AI) are really two sides of the same coin, detailing this idea further in his seminal blog introducing the term Knowledge Intelligence (KI). In particular, KM can play a big role in structuring unstructured content and making it more suitable for use by enterprise AI. Injecting knowledge into unstructured data using taxonomies, ontologies, and knowledge graphs will be the focus of this blog, which is Part 2 in the Knowledge Intelligence Architecture Series. I will also describe our typical approaches and experience with mining knowledge out of unstructured content to develop taxonomies and knowledge graphs. As a refresher, you can review Part 1 of this series, where I introduced the high-level technical components needed for implementing any KI architecture.

 

Role of NLP in Structuring Unstructured Content

Natural language processing (NLP) is a machine learning technique that gives computers the ability to interpret and understand human language. According to most industry estimates, 80-90% of an organization’s data is considered to be unstructured, most of which originates from emails, chat messages, documents, presentations, videos, and social media posts. Extracting meaningful insights from such unstructured content can be difficult due to its lack of predefined structure. This is where NLP techniques can be immensely useful. NLP works through the differences in dialects, metaphors, variations in sentence structure, grammatical irregularities, and usage exceptions that are common in such data and structures it effectively. A common NLP task for analyzing unstructured content and making it machine readable is content classification. This process categorizes text into predefined classes by identifying keywords that indicate the topic of the text. 

Over the past decade, we have employed numerous NLP techniques across our typical knowledge and data management engagements, focusing on unstructured content classification. With the emergence of Large Language Models (LLMs), traditional NLP tasks can now be executed with higher precision and recall while requiring significantly less development effort. The section below presents a broad, though not exhaustive, range of NLP strategies incorporating traditional ML, cutting-edge LLMs, and the inherent pattern recognition capabilities of vendor platforms for content understanding and classification. Specifically, it describes the underlying architecture of each approach, illustrating the steps involved in adding context to unstructured content using semantic data assets, along with some relevant case studies.

 

1. Transfer Learning for Content Classification

Transfer learning is a method in which a deep learning model trained on a large dataset is applied to a similar task using a different dataset. Starting with a pre-trained model that has already learned linguistic patterns and structures from a significant volume of data eliminates the need for extensive labeled datasets and reduces training time. Since the release of the BERT (Bidirectional Encoder Representations from Transformers) language model in 2018, we have extensively utilized transfer learning to analyze and categorize unstructured content using a predefined classification scheme for our clients. In fact, it is often our preferred approach for entity extraction when instantiating a knowledge graph as it supports a scalable and maintainable solution for the enterprise. 

 

Enterprise AI Architecture: Transfer Learning for Content Classification

 

As illustrated in the figure above, unstructured data in the enterprise can originate from many different systems beyond the conventional ones such as content management systems and websites. Such sources can include communication channels such as emails, instant messaging systems, and social media platforms, as well as digital asset management platforms to centrally store, organize, manage, and distribute media files such as images, video, and audio files within the organization. As machine learning can only work with textual data, depending on the type of content, the first step in implementing transfer learning is to employ appropriate text extraction and transformation algorithms to make the data suitable for use. Next, domain SMEs label a small chunk of the clean data to fine-tune the selected pretrained AI model with a predefined classification scheme (also provided by the domain SMEs). Post-training, the fine-tuned model is deployed to production and is available for content classification. At this stage, organizations can run their content through the operationalized transfer learning based content classification pipeline and store it in a centralized metadata repository such as a data catalog or even a simple object store that, in turn, can be used to power multiple enterprise use cases from data discovery to data analytics.
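A condensed sketch of this transfer learning pipeline is shown below, using the Hugging Face Transformers and Datasets libraries. The base model, labels, sample data, and hyperparameters are placeholders; a production pipeline would add evaluation, a larger SME-labeled dataset, and deployment of the fine-tuned model behind the classification service.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# SME-labeled sample: cleaned text snippets with classes drawn from the
# predefined classification scheme (labels and texts here are placeholders).
labels = ["clinical", "regulatory", "operations"]
examples = {"text": ["...", "...", "..."], "label": [0, 1, 2]}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

# Tokenize the labeled examples so the pre-trained model can be fine-tuned on them.
dataset = Dataset.from_dict(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length"),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="classifier", num_train_epochs=3),
    train_dataset=dataset,
)
trainer.train()  # fine-tune, then deploy and write predicted tags to the metadata repository
```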

Transfer learning is one of the most popular techniques we employ for entity extraction from unstructured content in our typical knowledge graph accelerator engagements. It has also served as one of our criteria when evaluating data fabric solution vendors, as it did for a multinational pharmaceutical company. This is because transfer learning can easily grow with input from domain SMEs (with respect to data labelling and classification scheme definition) to tailor machine predictions to organizational needs and sustain the classification effort without extensive machine learning training. That said, machine learning (ML) expertise is still required. For organizations that lack the internal skills to build and maintain custom ML pipelines, the following content classification approaches may be useful.

 

2. Taxonomy Manager-Driven Content Classification

Most modern Taxonomy Ontology Management Systems (TOMS) include a classification engine that supports automatic text classification based on a defined taxonomy. In our experience, organizations with access to a TOMS but without dedicated AI teams to develop and maintain custom ML models prefer using built-in classification capabilities of TOMS to categorize and structure their unstructured content. 

 

Enterprise AI Architecture: Taxonomy Manager Driven Content Classification

 

While there are variations across TOMS vendors in how they classify unstructured content using a taxonomy (such as leveraging just textual metadata or using structural relationships between taxonomy concepts to categorize content), as shown in the figure above, the high-level architecture integrating a TOMS with enterprise systems managing unstructured content and leveraging TOMS-generated metadata is generally independent of specific TOMS platforms. In this architecture, when an information architect deems a taxonomy ready for use, they publish the corresponding classification rules to the TOMS-specific classification engine. Typically, organizations configure custom change listeners for taxonomy publication. This helps them decide when to tag their unstructured content with the published rules and store these tags in a central metadata repository to power many use cases across the enterprise. Sometimes, however, TOMS platforms offer native connectors for specific CMSs, such as SharePoint or WordPress, to manage automatic tagging of delta content upon the publication of a new taxonomy version.
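Because the classification and publication APIs differ across TOMS vendors, the sketch below uses a generic FastAPI webhook listener with placeholder endpoints to show the shape of such a custom change listener: when a taxonomy version is published, it re-tags delta content and writes the tags back to a central metadata repository. The URLs, payload fields, and API contract are hypothetical.

```python
from fastapi import FastAPI, Request
import httpx

app = FastAPI()

TOMS_CLASSIFY_URL = "https://toms.example.com/api/classify"    # placeholder vendor endpoint
METADATA_STORE_URL = "https://metadata.example.com/api/tags"    # placeholder repository endpoint

@app.post("/taxonomy-published")
async def on_taxonomy_published(request: Request):
    """Custom change listener invoked when a new taxonomy version is published."""
    event = await request.json()
    async with httpx.AsyncClient() as client:
        # Re-tag only content changed since the last taxonomy version ("delta" content).
        docs = (await client.get(
            METADATA_STORE_URL,
            params={"modified_since": event.get("published_at")})).json()
        for doc in docs:
            tags = (await client.post(
                TOMS_CLASSIFY_URL,
                json={"text": doc["text"],
                      "taxonomy_version": event.get("version")})).json()
            await client.put(f"{METADATA_STORE_URL}/{doc['id']}", json={"tags": tags})
    return {"retagged": len(docs)}
```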

We work with many leading TOMS vendors in our typical taxonomy accelerator engagements, and you can learn more about specific use cases and success stories in our knowledge base regarding the application of this approach when it comes to powering content discovery – from a knowledge portal in a global investment firm to creating more personalized customer experiences using effective content assembly at a financial solutions provider organization.

 

3. LLM-Powered Content Classification

With the rise of LLMs in recent years, we have been working with various prompting techniques to effectively classify text using LLMs in our engagements. Based on our experimentation, we have found that a few-shot prompting approach, in which the language model is provided with a small set of labelled examples along with a prompt to guide the classification of unstructured content, achieves high accuracy in text classification tasks even with limited labeled data. This does not, however, deemphasize the need for designing effective prompts to improve accuracy of the in-context learning approach that is central to any prompt engineering technique.

 

Enterprise AI Architecture: LLM-Powered Content Classification

 

As illustrated in the figure above, a prompt in a few-shot learning approach to content classification includes the classification scheme and labelled examples from domain SMEs besides the raw text we need the LLM to classify. But because of the limitations of the context window for most state-of-the-art (SOTA) LLMs, the input text often needs to be chunked post-preprocessing and cleaning to abide by the length limitations of the prompt (also shown in the figure above). What is not included in the image, however, are the LLM optimization techniques we often employ to improve the classification task performance at scale. It is widely accepted that any natural language processing (NLP) task that requires interaction with a LLM, which is often hosted on a remote server, will not be performant by default. Therefore, in our typical engagements, we employ optimization techniques such as caching prior responses, batching multiple requests into one prompt, and classifying multiple chunks in parallel beyond basic prompt engineering to implement a scalable content classification solution for the enterprise.
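A minimal few-shot classification call might look like the sketch below, using the OpenAI Python client. The model name, labels, and examples are placeholders, and the caching, batching, and parallelization optimizations mentioned above are omitted for brevity.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

LABELS = ["Adverse Event", "Logistics", "Eligibility"]          # placeholder scheme
FEW_SHOT = [                                                     # placeholder SME-labeled examples
    ("Patient reported dizziness after the second dose.", "Adverse Event"),
    ("Shipment of imaging kits delayed at customs.", "Logistics"),
]

def classify(chunk: str) -> str:
    """Classify one pre-chunked text span against the predefined scheme."""
    examples = "\n".join(f'Text: "{t}"\nLabel: {l}' for t, l in FEW_SHOT)
    prompt = (
        f"Classify the text into exactly one of: {', '.join(LABELS)}.\n\n"
        f"{examples}\n\nText: \"{chunk}\"\nLabel:"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```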

Last year we used the LLM-powered content classification approach when we completed a knowledge graph accelerator project with a public safety agency in Europe, where we could not use a TOMS-driven content classification approach to instantiate a knowledge graph. This is because of the risks associated with sensitive data transfer out of Azure’s Northern European region where the solution was hosted and into the infrastructure of the hosted TOMS platform (which was outside the allowed region). In this case, a LLM-powered content classification such as a few-shot prompting approach allowed us to develop the solution by extracting entities from their unstructured content and instantiating a knowledge graph that facilitated context-based, data-driven decision making for construction site planners at the agency.

More recently, we used the LLM-powered content classification approach when we engaged with a non-profit charitable organization to analyze their healthcare product survey data to understand its adoption in a given market and demographic and ultimately inform future product development. We developed a comprehensive list of product adoption factors that are not easily identified and included in product research. We then leveraged this controlled vocabulary of product adoption factors and Azure OpenAI models to classify the free form survey responses and understand the distinct ways in which these factors influence each other, thus contributing to a more nuanced understanding of how users make decisions related to the product. This enhanced pattern detection approach enabled a holistic view of influencing factors, addressing knowledge gaps in future product development efforts at the organization.

 

4. AI-Augmented Topic Taxonomy Generation

Up until this point in the article, we have focused on using taxonomies to structure unstructured content. We will now shift to using machine learning to analyze unstructured content and propose taxonomies and create knowledge graphs using AI. We will discuss how in recent years, LLMs have simplified entity and relationship extraction, enabling more organizations to incorporate knowledge graphs into their data management.

While we generally do not advise our clients to use LLMs without a human-in-the-loop process to create production grade domain taxonomies, we have used LLMs in past engagements to augment and support our taxonomic experts in naming latent topics in semantically grouped unstructured content and therefore create a very rough draft version of a topic taxonomy.  

Elaborating on the figure below, our approach centers on four key tasks:

  1. Unsupervised clustering of the dataset, 
  2. Discovering latent themes within each cluster, 
  3. Creating a topic taxonomy based on these themes, and
  4. Engaging taxonomists and domain experts to validate and enhance the taxonomy.

Because of the token limits inherent in all SOTA embedding models, once raw text is extracted from unstructured content, preprocessed, and cleaned, it has to be chunked before numerical representations that encapsulate semantic information called embeddings can be created by the embedding generation service and stored in the vector database. The embedding generation service may optionally include quantization techniques to address the high memory requirements for managing embeddings of a large dataset. Post-embedding generation, the taxonomy generation pipeline focuses on semantic similarity calculation. While semantic similarity between the underlying content or corpora can be trivially computed as the inner product of embeddings, for scalability reasons, we typically project the embeddings from their original high-dimensional space to lower dimensions, while also preserving their local and global data structures. At this point, the content clustering service will be able to use the embeddings as input features of a clustering algorithm, enabling the identification of related categories based on embedding distances. The next step in the process of autogenerating taxonomy concepts is to infer the latent topic of each cluster using an LLM as part of the latent topic identification service. Finally, a draft taxonomy is available for validation and update by domain experts before it can be used to power enterprise use cases from data discovery to analytics. 

 

Enterprise AI Architecture: AI-Augmented Topic Taxonomy Generation
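A compressed sketch of this pipeline is shown below, using sentence-transformers for embeddings, PCA for dimensionality reduction, and k-means for clustering. These are illustrative choices (alternatives such as UMAP and HDBSCAN are common in practice), the corpus is a placeholder, and the latent topic naming step would call an LLM before handing the draft concepts to taxonomists and domain experts for validation.

```python
from collections import defaultdict

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

chunks = ["...cleaned text chunk 1...", "...chunk 2...", "...chunk 3..."]  # placeholder corpus

# 1. Embedding generation (model choice is illustrative).
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(chunks)

# 2. Project to a lower-dimensional space for scalable similarity and clustering.
reduced = PCA(n_components=min(50, len(chunks) - 1)).fit_transform(embeddings)

# 3. Unsupervised clustering to group semantically related chunks.
cluster_ids = KMeans(n_clusters=2, n_init="auto").fit_predict(reduced)

clusters = defaultdict(list)
for chunk, cid in zip(chunks, cluster_ids):
    clusters[cid].append(chunk)

# 4. Latent topic identification: ask an LLM to name each cluster, producing
#    draft taxonomy concepts for taxonomists and domain experts to refine.
for cid, members in clusters.items():
    prompt = "Suggest a short topic label for these related texts:\n" + "\n".join(members[:10])
    # draft_label = call_llm(prompt)   # e.g., via the few-shot client shown earlier
```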

 

Using this very approach to taxonomy generation, we enabled consumer-grade semantic capabilities for non-financial risk management in production at a multinational bank, collapsing an original risk dataset of 20,000 free-text risk descriptions into a streamlined set of 1,100 standardized risk taxonomy concepts.

 

5. AI-Augmented Knowledge Graph Construction

AI-assistance for extracting entities and relationships from unstructured content can utilize methods ranging from transfer learning to LLM-prompting. In our experience, incorporating the schema as part of the latter technique greatly enhances the consistency of entity and relationship labeling. Before loading the extracted entities and relationships into a knowledge graph, LLMs, as well as heuristics as defined by domain SMEs, can be used to further disambiguate those entities. 

 

Enterprise AI Architecture: AI-Augmented Knowledge Graph Construction

 

Our typical approach for leveraging AI to construct a knowledge graph is depicted in the figure above. It starts with unstructured content processing techniques to generate raw text from which entities can be extracted. Coreference resolution, where all mentions of the same real-world entity are replaced by the noun phrase, often forms the first step of the entity extraction process. In the next step, whether we can employ some of the techniques described in the taxonomy driven content classification section for entity extraction or not depends on the underlying ontology (knowledge model or data schema) and how many of the classes in this data model can be instantiated with a corresponding taxonomy. Even for non-taxonomy classes, we can use transfer learning and prompt engineering to accelerate the extraction of instances of ontological classes from the raw text. Next, we can optionally process the extracted entities through an entity resolution pipeline to identify and connect instances of the same real-world entity within and across content sources into a distilled representation. In the last step of the entity extraction process, if applicable, we can further disambiguate extracted entities by linking them to corresponding entries in a public or private knowledge base (such as Wikidata). Once entities are available, it is time to relate these entities following the ontology to complete the knowledge graph instantiation process. Similar to entity extraction, an array of machine learning techniques ranging from both traditional supervised and unsupervised learning techniques to more modern transfer learning and prompt engineering techniques can be used for relationship classification. For example, when developing a knowledge graph powered recommendation engine connecting learning content and product data, we compared the efficacy of an unsupervised learning approach (e.g., similarity index) to predicting relationships between entities with that of a supervised learning approach (e.g., link classifier) for doing the same.
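As a simplified illustration of schema-guided extraction (one of several techniques named above, not a full pipeline), the sketch below prompts an LLM for entities and relationships constrained to an assumed ontology and loads the results into an rdflib graph. The namespace, schema hint, model name, and JSON contract are hypothetical, and the coreference and entity resolution steps are omitted.

```python
import json

from openai import OpenAI
from rdflib import RDF, Graph, Namespace

EX = Namespace("http://example.org/ontology#")   # hypothetical ontology namespace
client = OpenAI()

SCHEMA_HINT = (
    "Allowed entity types: Person, Organization, Product. "
    "Allowed relationships: worksFor, produces."
)

def extract_triples(text: str) -> list[dict]:
    """Ask the LLM for schema-conformant entities and relationships as JSON."""
    prompt = (f"{SCHEMA_HINT}\nReturn JSON of the form "
              '{"triples": [{"subject": "...", "type": "...", "predicate": "...", "object": "..."}]}'
              f"\n\nText: {text}")
    resp = client.chat.completions.create(
        model="gpt-4o-mini",              # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)["triples"]

def load_graph(triples: list[dict]) -> Graph:
    """Instantiate a small RDF graph from the extracted, schema-aligned triples."""
    g = Graph()
    for t in triples:
        s = EX[t["subject"].replace(" ", "_")]
        o = EX[t["object"].replace(" ", "_")]
        g.add((s, RDF.type, EX[t["type"]]))
        g.add((s, EX[t["predicate"]], o))
    return g
```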

 

Closing

While structuring unstructured content with semantic assets has been the focus of this blog, it is clear that it can only be effective by incorporating an organization’s most valuable knowledge asset: its human expertise and all types of data. While I will delve deeper into the technical details of how to encode this expert knowledge into enterprise AI systems in a later segment of this KI architecture blog series, it is evident from the discussion above that mining knowledge from an organization’s vast amount of unstructured content will not be possible without domain expertise. As our case studies illustrate, these complementary techniques for knowledge extraction, topic modeling, and text classification, when combined with domain expertise can help organizations achieve true KI. In the next segment of this blog series, I will explore the technical approaches for providing standardized meaning and context to structured data in the enterprise using a semantic layer. In the meantime, if our case studies describing how we brought structure to our clients’ unstructured content through metadata resonate with you, contact us to help get you started with KI.

Enterprise AI Architecture Series: How to Build a Knowledge Intelligence Architecture (Part 1)
https://enterprise-knowledge.com/enterprise-ai-architecture-series-how-to-build-a-knowledge-intelligence-architecture-part-1/ (February 4, 2025)

Since the launch of ChatGPT over two years ago, we have observed that our clients are increasingly drawn to the promise of AI. They also recognize that large language models (LLMs), trained on public data sets, may not effectively solve their domain-specific problems. Consequently, it is essential to integrate domain knowledge into these AI systems to furnish them with a structured understanding of the organization. Recently, my colleague Lulit Tesfaye described three key strategies to enable such knowledge intelligence (KI) in the organization: expert knowledge capture, business context embedding, and knowledge extraction using semantic layer assets and Retrieval Augmented Generation (RAG). Incorporating a knowledge intelligence layer into enterprise architecture is no longer just a theoretical concept but a critical necessity in the age of AI. It is a practical enhancement that transforms the way organizations can inject knowledge into their AI systems to allow better interpretation of data, effective reasoning, and informed decision making.

When designing and implementing KI layers at client organizations, our goal is always to recommend an architecture that aligns closely with their existing enterprise architecture,  providing a minimally disruptive starting point.

In this article, I will describe the common architectural patterns we have utilized over the last decade to design and implement KI strategies such as automated knowledge capture, semantic layers, and RAG across a diverse set of organizations. I will describe the key components of a KI layer, outlining their relationships with organizational data sources and applications through a high-level conceptual framework. In subsequent blogs, I will delve deeper into each of the three main strategies, detailing how KI integrates institutional knowledge, business context, and human expertise to deliver on the promise of AI for the enterprise.

Enterprise AI Architecture: Knowledge Intelligence

Semantic Layer

A semantic layer provides standardized meaning and business context to aggregated data assets in an organization, allowing AI models to understand and process information more accurately and generate more relevant insights. Specifically, it can offer a more intuitive and connected representation of organizational data entities without having to physically move the data and it does so through use of metadata, business glossaries, taxonomies, ontologies and knowledge graphs

When implementing a semantic layer, we often encounter this common misconception that a semantic layer is a single product such as a graph database or a data catalog. While we have been developing the individual components of a semantic layer for almost a decade, we have only been integrating them all into a semantic layer in the last couple of years. You can learn more about the typical semantic layer architectures that we have implemented for our clients here. For implementing specific components of the semantic layer before they can all be integrated into a logical abstraction layer over enterprise data, we work with most top vendors in the space and leverage our proprietary vendor evaluation matrix to identify the appropriate tool for our client whether it is a taxonomy ontology management platform (TOMS), a graph database or a data catalog. You can read this article to learn more about our high level considerations when choosing any knowledge management platform including semantic layer tools.

Expert Knowledge Capture

This KI component programmatically encodes both implicit and explicit domain expert knowledge into a structured repository of information, allowing AI systems to incorporate an organization’s most valuable assets, its tacit knowledge and human expertise into its decision making process. While tacit knowledge is difficult to articulate, record and disseminate, using modern AI tools, it can be easily mined from recorded interactions (such as meeting transcripts, chat history) with domain experts. Explicit knowledge, although documented, is often not easily discoverable. State-of-the-art LLM models and taxonomies, however, make tagging this knowledge with meaningful metadata quite straightforward. In other words, in the age of AI while content capture may be a breeze, transforming the captured content into knowledge requires some thought. You can learn more about the best practices we often share with our clients for effective knowledge base management here. In particular, we have written extensively about improving the quality of knowledge bases using metadata and the big role taxonomy plays in it. With a taxonomy in place, it all comes down to teaching a machine learning (ML) model the domain-specific language that is used to describe the content so that it can accurately auto-classify it. See this article to learn more about our auto-tagging approach. 

Another aspect of expert knowledge capture is to engage domain experts in annotating datasets with contextual information or providing them with an embedded feedback loop to review AI outputs and provide corrections and enhancements. While annotation and feedback capabilities can be included in a vendor platform such as a data science workbench in a data management platform or taxonomy concept approval workflow in a taxonomy management system, we have implemented custom workflows to capture this domain knowledge for our clients as well. For example, you can read more about our human-in-the-loop taxonomy development process here or SME validation of taxonomy tag application process here.

Retrieval Augmented Generation

A Retrieval Augmented Generation (RAG) framework allows LLMs to access up-to-date organizational knowledge bases instead of relying solely on the LLM’s pre-trained knowledge to provide more accurate and contextually relevant outputs. An enterprise RAG application may even require reasoning based on specific relationships between knowledge fragments to collect information related to answering who/what/when/how/where questions as opposed to relying only on semantic similarity with complete knowledge base items. Thus we typically leverage two or more types of information retrieval systems when solving KI use cases through RAG for our customers. 

In its most basic form, a RAG application can be developed with an LLM, an embedding model and a vector database. You can read more about how we implemented this architecture to power semantic search in a multinational development bank here. In reality, however, RAG implementations rely on additional information retrieval systems in the enterprise such as search engines or data warehouses as well as semantic layer assets such as knowledge graphs. In addition, RAG applications require elaborate data orchestration between the available knowledge bases and the LLM; popular frameworks such as LangChain and LlamaIndex can greatly simplify this orchestration by providing abstractions for common RAG steps such as indexing, retrieval, and workflows. Finally, to take any POC implementation of a RAG application to production, we need to leverage some of the data integrations and shared services such as monitoring, security described below.
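To make the "most basic form" concrete, the sketch below wires together an embedding model, an in-memory vector index, and an LLM without an orchestration framework. The models and the tiny knowledge base are placeholders, and in practice frameworks such as LangChain or LlamaIndex, plus a real vector database, would handle the indexing, retrieval, and workflow steps at scale.

```python
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model
client = OpenAI()

docs = ["KI combines semantic layers, expert knowledge capture, and RAG.",
        "A semantic layer adds standardized meaning to enterprise data."]    # placeholder knowledge base
doc_vecs = embedder.encode(docs, normalize_embeddings=True)   # stands in for a vector database

def answer(question: str, k: int = 2) -> str:
    """Retrieve the top-k most similar chunks and ground the LLM response in them."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q_vec)[::-1][:k]                 # cosine-similarity retrieval
    context = "\n".join(docs[i] for i in top)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                                     # placeholder model name
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```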

Data Integration

Just like any data integration, aggregation and transformation layer, a KI layer depends on various tools to extract, connect, transform and unify both structured and unstructured data sources. These tools include ELT (Extract, Load and Transform) and ETL (Extract, Transform and Load) tools, like Apache Airflow, API management platforms, like MuleSoft, and data virtualization platforms, like Tibco Data Virtualization. Typically, these integration and transformation patterns are well-established within organizations; hence, we often recommend that our clients reuse proven design patterns wherever possible. Additionally, we advise our clients to leverage established data cleansing techniques before sending the data to the KI layer for further enrichment and standardization.
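As one illustration of this kind of orchestration, a skeletal Airflow 2-style DAG might sequence extraction, transformation, and hand-off to the KI layer as shown below. The task bodies, DAG id, and schedule are placeholders, and an equivalent flow could be built with whichever ELT/ETL tooling an organization already has in place.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(): ...      # pull from source systems (placeholder)
def transform(): ...    # cleanse and standardize (placeholder)
def load_to_ki(): ...   # hand off to the KI layer for enrichment (placeholder)

with DAG(dag_id="ki_ingestion", start_date=datetime(2025, 1, 1),
         schedule="@daily", catchup=False) as dag:
    # Extraction, transformation, and KI hand-off run in sequence each day.
    (PythonOperator(task_id="extract", python_callable=extract)
     >> PythonOperator(task_id="transform", python_callable=transform)
     >> PythonOperator(task_id="load_to_ki", python_callable=load_to_ki))
```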

KI Applications

While chatbots remain the most common application of KI, we have leveraged KI to power intelligent search, recommendation engines, agentic AI workflows and business intelligence applications for our clients. In our experience, KI applications range from fully custom applications such as AI agents to configurable Software-as-a-Service (SaaS) platforms such as AI search engines.

Shared Services

Services including data security management, user and system access management, logging, monitoring and other centralized IT functions within an organization will need to be integrated with the KI layer in accordance with established organizational protocols.

Case Study

While we have been implementing individual KI components at client organizations over the past decade, only recently we have begun to implement and integrate multiple KI components to enable organizations to extract maximum value from their AI efforts. For example, over the last two years we established a data center of excellence at a multinational bank to enable effective non-financial risk management by implementing and integrating two distinct KI components: semantic layer and expert knowledge capture and transfer. Using a semantic layer, we injected business context into their structured datasets by enriching it using a standardized categorization structure, contextualizing it using a domain ontology and connecting it via a knowledge graph. As a result, when instantiated and deployed to production, the graph became an authoritative source of truth and provided a solid foundation for advanced analytics and AI capabilities to improve the efficiency and accuracy of the end-to-end risk management process.  We also implemented the expert knowledge capture component of KI by programmatically encoding domain knowledge and business context into the taxonomies and ontologies we developed for this initiative. For example, we created a new risk taxonomy by mining free text risk descriptions using a ML pipeline but significantly shortened the overall development time by embedding human feedback in the pipeline. Specifically, we provided domain experts with embedded tools and processes to review model outputs and provide corrections and additional annotations that were in turn leveraged to refine the ML models and create the finalized taxonomy in an iterative fashion. In the end both KI components enabled the firm to establish a robust foundation for enhanced risk management; it powered consumer-grade AI capabilities running on the semantic layer that streamlined access to critical insights through intelligent search, linked data view and query, thereby improving regulatory reporting, and fostering a more data-driven risk management culture at the firm.

Closing

While there are a number of approaches to designing and implementing the core KI components described above, there are best practices to ensure the quality and scalability of the solution. The upcoming blogs in this series zoom into each of these components, enumerate the approaches for implementing each component, discuss how to achieve KI from a technical perspective, and detail how each component would support the development of Enterprise AI with real-life case studies. As with any technical implementation, we recommend grounding any KI implementation effort in a business case, starting small and iterating, beginning with a few source systems to lay a solid foundation for an enterprise KI layer. Once the initial KI layer has been established, it is easier to expand the KI ecosystem while enabling foundational AI models to generate meaningful content, make intelligent predictions, discover hidden insights, and drive valuable business outcomes.

Looking for technical advisory on how to get your KI layer off the ground? Contact us to get started.

Leveraging Headless CMS for Technical Cross-Functionality
https://enterprise-knowledge.com/leveraging-headless-cms-for-technical-cross-functionality/ (August 8, 2022)

Headless CMS (Content Management System) architecture is a flexible development strategy for applications that is rapidly growing in today’s industry practices. Utilizing a headless CMS architecture allows an application to deliver content authored from a single interface to multiple delivery channels. Content is processed through an API (Application Programming Interface) and distributed to multiple channels or “heads,” by means of a central service, or the “body.” One of the concerns many organizations have about pursuing headless development is that producing content for multiple channels means having a team skilled in multiple areas. However, with a thoughtful approach, this can be a powerful opportunity for an organization’s engineering team.
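A minimal sketch of the "body" in such an architecture is shown below: a small content API that serves the same structured, presentation-agnostic content to every "head," whether a website, mobile app, or another channel. FastAPI and the content model here are illustrative choices for the sketch, not any specific CMS's API.

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class Article(BaseModel):
    """Structured, presentation-agnostic content: each head renders it as needed."""
    slug: str
    title: str
    body: str
    tags: list[str] = []

# Placeholder in-memory store standing in for the CMS content repository.
CONTENT = {
    "hello-world": Article(slug="hello-world", title="Hello World",
                           body="Welcome!", tags=["intro"]),
}

@app.get("/api/content/{slug}", response_model=Article)
def get_content(slug: str):
    """Any delivery channel (web, mobile app, kiosk) consumes this same endpoint."""
    if slug not in CONTENT:
        raise HTTPException(status_code=404, detail="Content not found")
    return CONTENT[slug]
```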

The code base for a headless CMS is complex, more so than a traditional, monolithic solution. While it would be ideal to have a development team consisting of people with existing, overlapping skills in all of the pieces the headless CMS project will touch, the reality is typically quite different. Rather than viewing this as an obstacle, however, the broad scope of headless CMS projects offers an opportunity for growth in an environment of siloed development. Because a headless CMS application often houses several communicating services, it is absolutely necessary for the entire team to be in sync with where certain data lives, how content is structured, and how each delivery point communicates with other delivery channels and/or the central service. To accomplish this, it is crucial to intentionally build a well-thought-out, cross-functional headless CMS team that will naturally tear down the existing silos between team members who would otherwise work on only a specific, small portion of the application. The team can then learn areas outside of their comfort zone and ensure the development team remains in sync, all while delivering a valuable product to a customer.

Architecture of the Application

Planning Phases

Building a strong cross-functional team begins early. During the planning phase for a headless project, ensure, if possible, that the entire development team is involved in designing the architecture and selecting the technology stack. This gives engineers an opportunity to ask questions and explore learning materials on topics outside their area(s) of expertise. Adding planning time into early sprints to invest in the growth of the technical team will pay off later in the development lifecycle. Beyond improving future work on the current project, expanding the team’s abilities now naturally leads to a larger bench of engineers experienced in the industry-wide practice of headless CMS development. Having an entire team of developers fluent in an application’s full technology stack also fosters trust from both clients and the development team, and allows greater flexibility in how sprint work is allotted around a client’s availability.

Building the Codebase

In the later phases of planning, consider the importance of structure and documentation within the API(s) that extend the functionality of the central microservice and deliver content. Building in the time to create solid documentation is a clear winner, both for helping “future you” recall how a system works and for making it dramatically easier for a teammate to pick up work in a new area and quickly get up to speed. Again, this expands the bench of engineers able to work in a traditionally siloed area, increasing productivity and mitigating the risk of technical debt. Engineers who are heavily involved in planning will also feel more comfortable contributing code when development starts, since they are familiar with the architectural goals of the application. Because the ideal end state of a headless CMS can support a virtually limitless set of devices, building a codebase around structured, flexible content and clean points of communication results in a maintainable application and a well-prepared group of engineers. It also reinforces best practices across the multiple languages and technologies used in the application. As a result, engineers will know how to contribute scalable, well-commented code without the need for upskilling later in the development process.
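As a minimal sketch of what “structured, flexible content with documentation” can look like in practice, the Python example below defines a channel-agnostic content type with documented fields and two small per-channel transforms. The type and field names are illustrative assumptions, not a prescribed schema.

```python
# Minimal sketch: a documented, channel-agnostic content model plus
# per-channel transforms. Type and field names are hypothetical.
from dataclasses import dataclass, asdict


@dataclass
class ProductPage:
    """A content type stored once in the central service.

    Every field is documented so a teammate picking up a new delivery
    channel can see at a glance what the content guarantees.
    """
    slug: str          # stable identifier used by every delivery channel
    title: str         # short, display-ready heading
    description: str   # plain text; channels decide how to format it
    price_usd: float   # numeric so channels can localize currency display


def to_web_json(page: ProductPage) -> dict:
    """Transform for the web 'head': pass the structure through as-is."""
    return asdict(page)


def to_kiosk_label(page: ProductPage) -> str:
    """Transform for an in-store kiosk 'head': a compact text label."""
    return f"{page.title} - ${page.price_usd:.2f}"
```

Keeping the model and its transforms this explicit makes it far easier for any engineer on the team to add a new “head” without relearning where the content’s guarantees live.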

Development Processes

Team Code Reviews

In many respects, code reviews within the agile development process of a headless CMS work the same as on any other project. However, to continue the theme of team cross-functionality, it is important to include the entire development team in the code review process. As multiple features are added to the application within a sprint, it is crucial that each team member maintains their understanding of the codebase. When reviewing code, keep the structure of the application in mind: consider how content should be structured in storage and in delivery, and remember that the structure of that content may also be transformed as it is delivered through the APIs. For this reason, it is most efficient to have the entire development team involved in all reviews of delivered code, not just those with expertise in that area of development. With good communication and team synchronization during review, less time is needed for upskilling later, and every engineer can add features without first having to study content delivery or the points of communication between the services housed in the headless CMS application.

Consider holding synchronous code reviews when code is added that affects or extends the communication between APIs, or between any API and the central microservice. At the very least, make sure all developers have a chance to review every contribution to the application as a whole, mitigating the scope creep that avoidable technical debt and late upskilling would otherwise cause.

Version Control Workflows

Another crucial aspect of headless CMS development is the Git workflow the application follows across sprint cadences and production releases. It is surprisingly easy for a team’s Git flow to fall out of sync in the midst of building features and making changes, especially when engineering such a large application. The entire team must understand what format their feature, bugfix, or hotfix branches follow and where they should be branched from. This is especially important when building a headless CMS application, given all the possible points of failure across the communication within the technology stack, the channels of content delivery, and the structure of stored data. If a team’s workflow falls out of sync, the likelihood that portions of the application fall behind or creep ahead increases, and the resulting imbalance of technical debt can alter the development timeline of the application as a whole.
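One lightweight way to keep branch naming in sync is to check it automatically, for example in a CI step or local hook. The Python sketch below assumes a hypothetical feature/bugfix/hotfix naming convention; the pattern itself is an assumption, not a universal standard.

```python
# Minimal sketch: validate the current Git branch name against a team
# convention. The feature/bugfix/hotfix pattern shown here is hypothetical.
import re
import subprocess

BRANCH_PATTERN = re.compile(r"^(feature|bugfix|hotfix)/[a-z0-9._-]+$")


def current_branch() -> str:
    """Ask Git for the currently checked-out branch name."""
    return subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()


def check_branch_name(name: str) -> bool:
    """Return True if the branch follows the agreed convention."""
    return name in ("main", "develop") or bool(BRANCH_PATTERN.match(name))


if __name__ == "__main__":
    branch = current_branch()
    if not check_branch_name(branch):
        raise SystemExit(f"Branch '{branch}' does not follow the team convention.")
```

Automating the convention removes one more way for a large, multi-service codebase to quietly drift out of sync.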

In Summary

To ensure the most efficient delivery of a headless CMS application, it is crucial to break down the silos of the development team throughout both the planning and development of such a large application. Investing in the growth of developers and keeping a strong focus on synchronization across the whole product mitigates numerous risks to the development timeline. With the proper approach and the right mindset to leverage the growth opportunities this development practice presents, a maintainable product can be delivered efficiently. At the same time, the team building the product gains hands-on experience and more opportunities to learn contemporary practices in application development.

The post Leveraging Headless CMS for Technical Cross-Functionality appeared first on Enterprise Knowledge.

What is a Semantic Architecture and How do I Build One? https://enterprise-knowledge.com/what-is-a-semantic-architecture-and-how-do-i-build-one/ Thu, 02 Apr 2020 13:00:48 +0000 https://enterprise-knowledge.com/?p=10865

Can you access the bulk of your organization’s data through simple search or navigation using common business terms? If so, your organization may be one of the few that is reaping the benefits of a semantic data layer. A semantic layer provides the enterprise with the flexibility to capture, store, and represent simple business terms and context as a layer sitting above complex data. This is why our clients typically give this architectural layer an internal nickname, referring to it as “The Brain,” “The Hub,” “The Network,” “Our Universe,” and so forth.

As such, before delving into the architecture, it is important to align on what we mean by a semantic layer and its foundational ability to solve business and traditional data management challenges. In this article, I will share EK’s experience designing and building semantic data layers for the enterprise, call out the key considerations and potential challenges to look for, and outline effective practices to optimize, scale, and gain the utmost business value a semantic layer provides to an organization.

What is a Semantic Layer?

A semantic layer is not a single platform or application, but rather the realization of a semantic approach to solving business problems: managing data in a manner optimized for capturing business meaning and designed around the end user experience. At its core, a semantic layer comprises one or more of the following semantic approaches:

  • Ontology Model: defines the types of things that exist in your business domain and the properties that can be used to describe them. An ontology provides a flexible and standard model that organizes structured and unstructured information through entities, their properties, and the way they relate to one another.
  • Enterprise Knowledge Graph: uses an ontology as a framework to add in real data and enable a standard representation of an organization’s knowledge domain and artifacts so that it is understood by both humans and machines. It is a collection of references to your organization’s knowledge assets, content, and data that leverages a data model to describe the people, places, and things and how they are related. 

A semantic layer thus pulls these flexible semantic models together to allow your organization to map disparate data sources into a single schema, a unified data model that provides a business representation of enterprise data in a “whiteboardable” view, making large volumes of data accessible to both technical and nontechnical users. In other words, it provides a business view of complex knowledge, information, and data and their assorted relationships in a way that can be visually understood.
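To make the two building blocks above concrete, here is a minimal sketch using the open-source rdflib library: a tiny ontology (the types of things and their properties) and a small knowledge graph layered on top of it (real data described with that vocabulary). The “ex:” namespace and business terms are illustrative assumptions.

```python
# Minimal sketch with rdflib (pip install rdflib): an ontology plus a
# knowledge graph built on it. Namespace and terms are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.com/ontology/")
g = Graph()
g.bind("ex", EX)

# Ontology: the kinds of things in the business domain and how they relate.
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.Product, RDF.type, RDFS.Class))
g.add((EX.purchased, RDFS.domain, EX.Customer))
g.add((EX.purchased, RDFS.range, EX.Product))

# Knowledge graph: real data described with the ontology's vocabulary.
g.add((EX.acme, RDF.type, EX.Customer))
g.add((EX.acme, RDFS.label, Literal("Acme Corp")))
g.add((EX.widget, RDF.type, EX.Product))
g.add((EX.acme, EX.purchased, EX.widget))

print(g.serialize(format="turtle"))
```

The ontology stays stable while the knowledge graph grows with data, which is exactly what lets the layer act as a shared, “whiteboardable” view over many sources.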

How Does a Semantic Layer Provide Business Value to Your Organization?

Organizations have been successfully utilizing data lakes and data warehouses to unify enterprise data in a shared space. A semantic data layer delivers the most value for enterprises looking to support the fastest-growing consumers of big data, business users, by adding the “meaning” or “business knowledge” behind their data as an additional layer of abstraction, a bridge between complex data assets and front-end applications such as enterprise search, business analytics and BI dashboards, chatbots, natural language processing, and so on. For instance, if you ask a non-semantic chatbot “what is our profit?” and it recites the definition of “profit” from the dictionary, it does not have a semantic understanding of your business language or what you mean by “our profit.” A chatbot built on a semantic layer would instead respond with something like a list of revenue generated per year and your organization’s respective profit margins.

[Image: Visual representation of how a semantic layer draws connections between your data management and storage layers]
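As a minimal sketch of the chatbot contrast described above, the rdflib example below models a couple of fiscal years and answers the business question with a SPARQL query over the graph rather than a dictionary definition. The namespace and figures are illustrative assumptions.

```python
# Minimal sketch: answering "what is our profit?" from modeled business data.
# Namespace, property names, and figures are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.com/finance/")
g = Graph()
g.bind("ex", EX)

for year, revenue, margin in [(2018, 12.4, 0.18), (2019, 15.1, 0.21)]:
    period = EX[f"fy{year}"]
    g.add((period, RDF.type, EX.FiscalYear))
    g.add((period, EX.year, Literal(year)))
    g.add((period, EX.revenueMillionsUSD, Literal(revenue)))
    g.add((period, EX.profitMargin, Literal(margin)))

results = g.query("""
    PREFIX ex: <http://example.com/finance/>
    SELECT ?year ?revenue ?margin WHERE {
        ?p a ex:FiscalYear ;
           ex:year ?year ;
           ex:revenueMillionsUSD ?revenue ;
           ex:profitMargin ?margin .
    } ORDER BY ?year
""")
for year, revenue, margin in results:
    print(f"{year}: revenue ${revenue}M, profit margin {float(margin):.0%}")
```

Because the meaning of “profit margin” is captured in the model, any consuming application (chatbot, dashboard, or search) can reuse the same answer.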

With a semantic layer as part of an organization’s Enterprise Architecture (EA), the enterprise will be able to realize the following key business benefits: 

  • Bringing Business Users Closer to Data: business users and leadership can independently derive meaningful information and insights from large data sources without the technical skills otherwise required to query, clean up, and transform that data.
  • Data Processing: greater flexibility to quickly modify and improve data flows in a way that is aligned to business needs and the ability to support future business questions and needs that are currently unknown (by traversing your knowledge graph in real time). 
  • Data Governance: unification and interoperability of data across the enterprise minimizes the risk and cost associated with migration or duplication efforts to analyze the relationships between various data sources. 
  • Machine Learning (ML) and Artificial Intelligence (AI): serves as the source of truth for providing definitions of business data to machines, laying the foundation for deep learning and analytics that help the business answer or predict business challenges.

Building the Architecture of a Semantic Layer

A semantic layer consists of a wide array of solutions: the organizational data itself, data models that support object- or context-oriented design, semantic standards to guide machine understanding, and the tools and technologies that enable implementation and scale.

[Image: Semantic layer architecture, moving from data sources, to data modeling, transformation, unification, and standardization, to graph storage and a unified taxonomy, to the semantic layer itself and the business outcomes it supports]

The five steps we have identified as critical to building a scalable semantic layer within your enterprise architecture are:

1. Define and prioritize your business needs: In building semantic enterprise solutions, clearly defined use cases provide the key questions or business reasons your semantic architecture will answer for the organization. This in turn drives an understanding of the users and stakeholders, articulates the business value or challenge the solution will address, and enables the definition of measurable success criteria. Active SME engagement and validation, to ensure proper representation of their business knowledge and understanding of their data, are critical to success. Skipping this foundational step will result in missed opportunities for organizational alignment and return on investment (ROI).

2. Map and model your relevant data: Many organizations we work with have a data architecture based on relational databases, data warehouses, and/or a wide range of content management, cloud, or hybrid-cloud applications and systems that drive data analysis and analytics capabilities. This does not necessarily mean these organizations need to start from scratch or overhaul a working enterprise architecture to adopt semantic capabilities. For these organizations, it is more effective to increase the focus on data modeling and design by adding models and standards that capture business meaning and context (see the section below on web standards) in a manner that provides the least disruptive starting point. In such scenarios, we select the most effective approach to model data and map it from source systems, employing the relevant transformation and unification processes (Extract, Transform, Load, or ETL) as well as model-mapping best practices (think ‘virtual model’ versus a data model stored in graph storage such as graph databases or property graphs), based on the organization’s use cases, enterprise architecture capabilities, and staff skill sets, and above all providing the highest flexibility for data governance and evolving business needs.

An organization’s data typically comes in various formats and from disparate sources. Start with a small use case and plan for an upfront clean-up and transformation effort; it is a good investment that starts organizing your data and sets stakeholder expectations while demonstrating the value of your model early.
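Here is a minimal sketch of that mapping step: a small tabular extract is cleaned and then loaded into a graph model with rdflib. The source fields and target vocabulary are illustrative assumptions, not a prescribed model.

```python
# Minimal sketch: clean a small tabular extract and map it into a graph.
# Source fields and the "ex:" vocabulary are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.com/model/")

# A tiny "extract" from a relational source, with inconsistent values.
source_rows = [
    {"id": "42", "name": " Jane Doe ", "dept": "sales"},
    {"id": "43", "name": "John Roe", "dept": "Sales"},
]


def clean(row: dict) -> dict:
    """Upfront clean-up: trim whitespace and standardize department codes."""
    return {
        "id": row["id"].strip(),
        "name": row["name"].strip(),
        "dept": row["dept"].strip().upper(),
    }


g = Graph()
g.bind("ex", EX)
for row in map(clean, source_rows):
    person = EX[f"employee/{row['id']}"]
    g.add((person, RDF.type, EX.Employee))
    g.add((person, RDFS.label, Literal(row["name"])))
    g.add((person, EX.memberOf, EX[f"department/{row['dept']}"]))

print(g.serialize(format="turtle"))
```

Even at this small scale, the clean-up function is where the “no standardized values across sources” problem gets solved before the data reaches the model.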

3. Leverage semantic web standards to ensure interoperability and governance: Even as data management practices must evolve with agility, organizations need to think long term about scale and governance. Semantic web standards provide the fundamentals for adopting standard frameworks and practices when kicking off or advancing your semantic architecture. The most relevant steps for the enterprise are to:

  • Employ an established data description framework to add business context to your data, enabling human understanding and natural language meaning of data (think taxonomies, data catalogs, and metadata);
  • Use standard approaches to manage and share the data through core data representation formats and a set of rules for formalizing data, ensuring your data is both human-readable and machine-readable (examples include XML and RDF formats);
  • Apply a flexible logic or schema to map and represent relationships, knowledge, and hierarchies between your organization’s data (think ontologies/OWL);
  • Adopt a semantic query language to access and analyze the data for natural language and artificial intelligence systems (think SPARQL); and
  • Start with available existing/open-source semantic models and ecosystems to serve as a low-risk, high-value stepping stone (think Linked Open Data and Schema.org). For instance, organizations in the financial industry can start their journey with a starter ontology such as the Financial Industry Business Ontology (FIBO), while we have used the Gene Ontology as a jumping-off point for biopharma clients to enrich or tailor a model to the specific needs of their organization. A short sketch applying these standards follows this list.
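The sketch below, again assuming rdflib, leans on the open standards named above: reusing Schema.org terms for description, serializing to RDF/XML as an interchange format, and querying with SPARQL. The organization described is illustrative.

```python
# Minimal sketch: reuse Schema.org vocabulary, serialize to RDF/XML,
# and query with SPARQL. The described organization is hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

SCHEMA = Namespace("https://schema.org/")
g = Graph()
g.bind("schema", SCHEMA)

org = Namespace("http://example.com/id/")["acme"]
g.add((org, RDF.type, SCHEMA.Organization))
g.add((org, SCHEMA.name, Literal("Acme Corp")))
g.add((org, SCHEMA.numberOfEmployees, Literal(250)))

# Standard, machine-readable interchange format (RDF/XML).
print(g.serialize(format="xml"))

# Standard query language (SPARQL) over the same data.
for (name,) in g.query(
    'SELECT ?name WHERE { ?o a <https://schema.org/Organization> ; '
    '<https://schema.org/name> ?name . }'
):
    print(name)
```

Starting from an existing vocabulary like Schema.org keeps the model interoperable and saves the effort of inventing (and governing) terms that already exist.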

4. Scale with semantic tools: Semantic technology components in a more mature semantic layer include graph management applications that serve as middleware, powering the storage, processing, and retrieval of your semantic data. In most scaled enterprise implementations, the architecture for a semantic layer includes a graph database for storing the knowledge and relationships within your data (i.e., your ontology and knowledge graph), an enterprise taxonomy/ontology management or data cataloging tool for effective application and governance of your metadata across enterprise applications such as content management systems, and text analytics or extraction tools to support advanced capabilities such as machine learning (ML) or natural language processing (NLP), depending on the use cases you are working with.

5. “Plug in” your customer- and employee-facing applications: The most practical and scalable semantic architecture will successfully support upstream customer- or employee-facing applications such as enterprise search, data visualization tools, end services/consuming systems, and chatbots, to name a few. This way you can “plug” semantic components into other enterprise solutions, applications, and services. With this foundation, your organization can start taking advantage of advanced artificial intelligence (AI) capabilities such as knowledge, relationship, and text extraction tools to enable natural language processing (NLP), machine-learning-based pattern recognition to enhance the findability and usability of your content, and automated categorization of your content to augment your data governance practices.
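As a minimal sketch of the “plug in” idea, the Python function below shows a search front end calling a semantic layer’s SPARQL endpoint over the standard SPARQL 1.1 Protocol. The endpoint URL is an assumption; the request and response shapes follow the protocol supported by most graph databases.

```python
# Minimal sketch: a front-end application querying the semantic layer's
# SPARQL endpoint. The endpoint URL is hypothetical.
import requests

SPARQL_ENDPOINT = "https://graph.example.com/sparql"  # hypothetical endpoint


def search_entities(term: str) -> list[str]:
    """Return labels of entities whose label contains the search term."""
    # Note: for a real application, escape or parameterize the term safely.
    query = f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?label WHERE {{
            ?s rdfs:label ?label .
            FILTER(CONTAINS(LCASE(?label), LCASE("{term}")))
        }} LIMIT 10
    """
    resp = requests.post(
        SPARQL_ENDPOINT,
        data={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=10,
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return [b["label"]["value"] for b in bindings]
```

Because the interface is a standard protocol rather than a proprietary API, the same endpoint can back search, dashboards, and chatbots alike.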

The cornerstone of a scalable semantic layer is the capability to control and manage versions, governance, and automation. Continuous integration pipelines, including standardized APIs and automated ETL scripts, should be considered part of its DNA, ensuring consistent connections for structured input from tested and validated sources.
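One small example of the kind of automated check worth baking into such a pipeline is validating a structured input before it is allowed into the semantic layer. The required fields in this Python sketch are illustrative assumptions.

```python
# Minimal sketch: a pipeline step that validates an incoming CSV extract
# before the ETL load. Required fields are hypothetical.
import csv
import sys

REQUIRED_FIELDS = {"id", "name", "dept"}  # fields the model mapping expects


def validate_extract(path: str) -> list[str]:
    """Return a list of problems found in an incoming CSV extract."""
    problems = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_FIELDS - set(reader.fieldnames or [])
        if missing:
            problems.append(f"missing columns: {sorted(missing)}")
        for lineno, row in enumerate(reader, start=2):
            if not row.get("id"):
                problems.append(f"line {lineno}: empty id")
    return problems


if __name__ == "__main__":
    issues = validate_extract(sys.argv[1])
    if issues:
        sys.exit("\n".join(issues))  # non-zero exit fails the pipeline step
```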

Conclusion

In summary, a semantic layer works best as a natural integration framework for enabling interoperability of organizational information assets. Get started by focusing on valuable, business-centric use cases that justify the move into semantic solutions. Further, consider the semantic layer as a complement to other technologies, including relational databases, content management systems (CMS), and the front-end applications that benefit from easy access to, and an intuitive representation of, your content and data, such as enterprise search, data dashboards, and chatbots.

If you are interested in learning more, want to determine whether a semantic layer fits within your organization’s overall enterprise architecture, or are embarking on the journey to bridge organizational silos and connect diverse domains of knowledge and data to accelerate enterprise AI capabilities, read more or email us.

The post What is a Semantic Architecture and How do I Build One? appeared first on Enterprise Knowledge.
