enterprise knowledge graph Articles - Enterprise Knowledge
https://enterprise-knowledge.com/tag/enterprise-knowledge-graph/

Enterprise Knowledge Speaking at KMWorld 2025
https://enterprise-knowledge.com/enterprise-knowledge-speaking-at-kmworld-2025/ (November 12, 2025)

Enterprise Knowledge (EK) will once again have a strong presence at the upcoming KMWorld Conference in Washington, D.C. This year, EK is delivering 11 sessions throughout KMWorld and its four co-located events: Taxonomy Boot Camp, Enterprise Search & Discovery, Enterprise AI World, and the Text Analytics Forum. 

EK is offering an array of thought leadership sessions to share KM approaches and methodologies. Several of EK’s sessions include presentations with clients, where presenters jointly deliver advanced case studies on knowledge graphs, enterprise learning solutions, and AI.  



On November 17, EK-led events will include:

  • Taxonomy Principles to Support Knowledge Management at a Not-for-Profit, featuring Bonnie Griffin, co-presenting with Miriam Heard of YMCA – Learn how Heard and Griffin applied taxonomy design to tame tags, align content types, and simplify conventions, transforming the YMCA’s intranet so staff can find people faster, retrieve information reliably, and share updates with the right audiences.
  • Utilizing Taxonomies to Meet UN SDG Obligations, featuring Benjamin Kass, co-presenting with Mike Cannon of the American Speech-Language-Hearing Association (ASHA) – Discover how ASHA, a UN SDG Publishers Compact signatory, piloted automatic tagging to surface SDG-relevant articles, using taxonomies for robust metadata, analytics, and high-quality content collections.
  • Driving Knowledge Management With Taxonomy and Ontology, featuring Bonnie Griffin, co-presenting with Alexander Zichettello of Honda Development & Manufacturing of America – Explore how Zichettello and Griffin designed taxonomies and ontologies for a major automaker, unifying siloed content and terminology. Presenters will share a repeatable, standards-based process and the best practices for scalable, sustainable knowledge management with attendees.

On November 18, EK-led events will include:

  • Taxonomy From 2006 to 2045: Are We Ready for the Future?, moderated by Zach Wahl, EK’s CEO and co-founder – Celebrate 20 years of Taxonomy Boot Camp with a look back at 2006 abstracts, crowd-voted predictions for the next two decades (AI included), lively debate, and a cake-cutting send-off.

On November 19, EK-led events will include:

  • Transforming Content Operations in the Age of AI, featuring Rebecca Wyatt and Elliott Risch – Learn how Wyatt and Risch partnered to leverage an AI proof of concept to prioritize and accelerate content remediation and improve content and search experiences on a flagship Intel KM platform.
  • Tracing the Thread: Decoding the Decision-Making Process With GraphRAG, featuring Urmi Majumder and Kaleb Schultz – Learn about GraphRAG and how pairing generative AI with a standards-based knowledge graph can unify data to tackle complex questions, curb hallucinations, and deliver traceable answers.
  • The Cost of Missing Critical Connections in Data: Suspicious Behavior Detection Using Link Analysis (A Case Study), featuring Urmi Majumder and Kyle Garcia – See how graph-powered link analysis and NLP can uncover hidden connections in messy data, powering fraud detection and risk mitigation, with practical modeling choices and a real-world, enterprise-ready case study.
  • Generating Structured Outputs From Unstructured Content Using LLMs, featuring Kyle Garcia and Joseph Hilger, EK’s COO and co-founder – Discover how LLMs guided by content models break long, unstructured documents into reusable, knowledge graph–ready components, reducing hallucinations while improving search, personalization, and cross-platform reuse.

On November 20, EK-led events will include:

  • Enterprises, KM, & Agentic AI, featuring Jess DeMay, co-presenting with Rachel Teague of Emory Consulting Services – This interactive discussion looks at organizational trends as well as new technologies and processes to enhance knowledge sharing, communication, collaboration, and innovation in the enterprises of the future.
  • Making Search Less Taxing: Leveraging Semantics and Keywords in Hybrid Search, featuring Chris Marino, co-presenting with Jaime Martin of Tax Analysts – Explore how Tax Analysts, the nonpartisan nonprofit behind Tax Notes, scaled an advanced search overhaul that lets subscribers rapidly find what they need while surfacing relevant content they didn’t know to look for.
  • The Future of Enterprise Search & Discovery, a panel including EK’s COO and co-founder Joseph Hilger – Get a glimpse of what’s next in enterprise search and discovery as this panel unpacks agentic AI and emerging trends, offering near and long-term predictions for how tools, workflows, and roles will evolve. 

Come to KMWorld 2025, November 17–20 in Washington, D.C., to hear from EK experts and learn more about the growing field of knowledge management. Register here.

Women’s Health Foundation – Semantic Classification POC
https://enterprise-knowledge.com/womens-health-foundation-semantic-classification-poc/ (April 10, 2025)

The Challenge

A humanitarian foundation focusing on women’s health faced a complex problem: determining the highest impact decision points in contraception adoption for specific markets and demographics. Two strategic objectives drove the initiative—first, understanding the multifaceted factors (from product attributes to social influences) that guide women’s contraceptive choices, and second, identifying actionable insights from disparate data sources. The key challenge was integrating internal survey response data with internal investment documents to answer nuanced competency questions such as, “What are the most frequently cited factors when considering a contraceptive method?” and “Which factors most strongly influence adoption or rejection?” This required a system that could not only ingest and organize heterogeneous data but also enable executives to visualize and act upon insights derived from complex cross-document analyses.

 

The Solution

To address these challenges, the project team developed a proof-of-concept (POC) that leveraged advanced graph technology combined with AI-augmented classification techniques. 

The solution was implemented across several workstreams:

Defining System Functionality
The initial phase involved clearly articulating the use case. By mapping out the decision landscape—from strategic objectives (improving modern contraceptive prevalence rates) to granular insights from user research—the team designed a tailored taxonomy and ontology for the women’s health domain. This semantic framework was engineered to capture cultural nuances, local linguistic variations, and the diverse attributes influencing contraceptive choices.

Processing Existing Data
With the functionality defined, the next phase involved transforming internal survey responses and investment documents into a unified, structured format. An AI-augmented classification workflow was deployed to extract tacit knowledge from survey responses. This process was supported by a stakeholder-validated taxonomy and ontology, allowing raw responses to be mapped into clearly defined data classes. This robust data processing pipeline ensured that quantitative measures (like frequency of citation) and qualitative insights were captured in a cohesive base graph.

Building the Analysis Model
The core of the solution was the creation of a Product Adoption Survey Base Graph. Processed data was converted into RDF triples using a rigorous ontology model, forming the base graph designed to answer competency questions via SPARQL queries. While this model laid the foundation for revealing correlations and decision factors, the full production of the advanced analysis graph—designed to incorporate deeper inference and reasoning—remained as a future enhancement.
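To make the competency-question pattern concrete, a question like “What are the most frequently cited factors when considering a contraceptive method?” maps naturally onto a SPARQL aggregation over the base graph. The prefix, class, and property names below are invented for illustration and are not the foundation’s actual ontology:

PREFIX ex: <http://example.com/whf/>

SELECT ?factor (COUNT(?response) AS ?citations)
WHERE {
  ?response a ex:SurveyResponse ;
            ex:citesFactor ?factor .
}
GROUP BY ?factor
ORDER BY DESC(?citations)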

Handing Off Analysis Graph Production and Frontend Implementation
Due to time constraints, the production of the comprehensive analysis graph and the implementation of the interactive front end were transitioned to the client. Our team delivered the base graph and all necessary supporting documentation, providing the client with a solid foundation and a detailed roadmap for further development. This handoff ensures that the client’s in-house teams can continue productionizing the analysis graph and integrate it with their BI dashboard for end-user access.

Providing a Roadmap for Further Development
Beyond the initial POC, a clear roadmap was established. The next steps include refining the AI classification workflow, fully instantiating the analysis graph with enhanced reasoning capabilities, and developing the front end to expose these insights via a business intelligence (BI) dashboard. These tasks have been handed off to the client, along with guidance on leveraging enterprise graph database licenses and integrating the solution within existing knowledge management frameworks.

 

The EK Difference

A standout feature of this project is its novel, generalizable technical architecture:

Ontology and Taxonomy Design
A custom ontology was developed to model the women’s health domain—incorporating key decision factors, cultural influences, and local linguistic variations. This semantic backbone ensures that structured investment data and unstructured survey responses are harmonized under a common framework.

AI-Augmented Classification Pipeline
The solution leverages state-of-the-art language models to perform the initial classification of survey responses. Supported by a validated taxonomy, this pipeline automatically extracts and tags critical data points from large volumes of survey content, laying the groundwork for subsequent graph instantiation, inference, and analysis.

Graph Instantiation and Querying
Processed data is transformed into RDF triples and instantiated within a dedicated Product Adoption Survey Base Graph. This graph, queried via SPARQL through a GraphDB workbench, offers a robust mechanism for cross-document analysis. Although the full analysis graph is pending, the base graph effectively supports the core competency questions.


Guidance for BI Integration
The architecture includes a flexible API layer and clear documentation that maps graph data into SQL tables. This design is intended to support future integration with BI platforms, enabling real-time visualization and executive-level decision-making.

 

The Results

The POC delivered compelling outcomes despite time constraints:

  • Actionable Insights:
    The system generated new insights by identifying frequently cited and impactful decision factors for contraceptive adoption, directly addressing the competency questions set by the Women’s Health teams.
  • Improved Data Transparency:
    By structuring tribal knowledge and unstructured survey data into a unified graph, the solution provided an explainable view of the decision landscape. Stakeholders gained visibility into how each insight was derived, enhancing trust in the system’s outputs.
  • Scalability and Generalizability:
    The technical architecture is robust and adaptable, offering a scalable model for analyzing similar survey data across other health domains. This approach demonstrates how enterprise knowledge graphs can drive down the total cost of ownership while enhancing integration within existing data management frameworks.
  • Strategic Handoff:
    Recognizing time constraints, our team successfully handed off the production of the comprehensive analysis graph and the implementation of the front end to the client. This strategic decision ensured continuity and allowed the client to tailor further development to their unique operational needs.

Enterprise Knowledge Graphs: The Importance of Semantics
https://enterprise-knowledge.com/enterprise-knowledge-graphs-the-importance-of-semantics/ (May 23, 2024)

Heather Hedden, Senior Consultant at Enterprise Knowledge, presented “Enterprise Knowledge Graphs: The Importance of Semantics” on May 9, 2024, at the annual Data Summit in Boston. 

In her presentation, Hedden describes the components of an enterprise knowledge graph and provides further insight into the semantic layer – or knowledge model – component, which includes an ontology and controlled vocabularies, such as taxonomies, for controlled metadata. While data experts tend to focus on the graph database components (an RDF triple store or a labeled property graph), Hedden emphasizes that they should not overlook the importance of the semantic layer.

Explore the presentation to learn:

  • The definition and benefits of an enterprise knowledge graph
  • The components of a knowledge graph
  • The fundamentals of graph databases
  • The basic features of taxonomies and ontologies
  • The role of taxonomies and ontologies in knowledge graphs
  • How an enterprise knowledge graph is built

The Top 3 Ways to Implement a Semantic Layer
https://enterprise-knowledge.com/the-top-3-ways-to-implement-a-semantic-layer/ (March 12, 2024)

Over the last decade, we have seen some of the most exciting innovations emerge within the enterprise knowledge and data management spaces. Those innovations with real staying power have proven to drive business outcomes and prioritize intuitive user engagement. Among them are the semantic layer (for breaking down the silos between knowledge and data) and, of course, generative AI (a topic that is often top of mind on today’s strategic roadmaps). Both have one thing in common – they show promise in addressing the age-old challenge of unlocking business insights from organizational knowledge and data, without the complexities of expensive data, system, and content migrations.

In 2019, Gartner published research announcing the end of “a single version of the truth” for data and knowledge management, and predicting that by 2026, “active metadata” will power over 50% of BI and analytics tools and solutions, providing a structured and consistent approach to connecting, rather than consolidating, data.

By employing semantic components and standards (through metadata, business glossaries, taxonomy/ontology, and graph solutions), a semantic layer arms organizations with a framework to aggregate and connect siloed data/content, explicitly provide business context for data, and serve as the layer for explainable AI. Once connected, independent business units can use the organization’s semantic layer to locate and work with not only enterprise data, but their own, unit-specific data as well.
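To make that connective role concrete, here is a minimal sketch in Python using rdflib – all URIs and labels are invented for the example – in which a single taxonomy concept anchors both a document and a warehouse table, so one query surfaces assets from either silo:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.com/")

g = Graph()
# A shared business concept from the taxonomy/glossary
g.add((EX.CustomerChurn, SKOS.prefLabel, Literal("Customer Churn")))
# A document and a warehouse table, each tagged with the same concept
g.add((EX.RetentionPlaybook, EX.about, EX.CustomerChurn))
g.add((EX.ChurnFactTable, EX.about, EX.CustomerChurn))

# Everything connected to the concept, regardless of which silo it lives in
for asset in g.subjects(EX.about, EX.CustomerChurn):
    print(asset)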

Incorporating a semantic layer into enterprise architecture is not just a theoretical concept; it’s a practical enhancement that transforms how organizations harness their data. Over the last ten years, we’ve worked with a diverse set of organizations to design and implement the components of a semantic layer. Many of them run data architectures based on relational databases, data warehouses, and/or a wide range of content management, cloud, or hybrid-cloud applications and systems that drive their data analysis and analytics capabilities. None of this means an organization needs to start from scratch or overhaul a working enterprise architecture in order to adopt a semantic layer. To the contrary, it is more effective to shift the focus of metadata and data modeling efforts toward adding the models and standards that capture business meaning and context, which provides the least disruptive starting point.

Though we’ve been implementing the individual components for over a decade, it is only in the last couple of years that we’ve been integrating them all to form a semantic layer. The maturing of approaches, technologies, and awareness has combined with organizations’ growing needs and the AI revolution to create this opportunity now.

In this article, I will explore the three most common approaches we see for weaving this data and knowledge layer into the fabric of enterprise architecture, highlighting the applications and organizational considerations for each.

1. A Metadata-First Logical Architecture: Using Enterprise Semantic Layer Solutions

This is the most common and scalable model we see across various industries and use cases for enterprise-wide applications. 

Architecture 

Implementing a semantic layer through a metadata-first logical architecture involves creating a logical layer that abstracts the underlying data sources by focusing on metadata. This approach establishes an organizational logical layer through standardized definitions and governance at the enterprise level while allowing for additional, decentralized components and solutions to be “pushed,” “published,” or “pulled from” specific business units, use cases, and systems/applications at a set cadence. 

Semantic Layer Architecture

Pros

Using middleware solutions like a data catalog or an ontology/graph storage, organizations are able to create a metadata layer that abstracts the underlying complexities, offering a unified view of data in real time based on metadata only. This allows organizations to abstract access, ditch application-centric approaches, and analyze data without the need for physical consolidation. This model effectively leverages the capabilities of standalone systems or applications to manage semantic layer components (such as metadata, taxonomies, glossaries, etc.) while providing centralized storage for semantic components to create a shared, enterprise semantic layer. This approach ensures consistency in core or shared data definitions to be managed at the enterprise level while providing the flexibility for individual teams to manage their unique secondary and group-level semantic data requirements.
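As a rough sketch of what “metadata only” can look like in practice – every name below is invented – the logical layer records where a dataset lives and what it means, while the rows themselves stay in the source system:

from rdflib import Graph, Namespace, Literal

CAT = Namespace("http://example.com/catalog/")

catalog = Graph()
# The layer holds location, stewardship, and business definition;
# no actual records are copied out of the source system
catalog.add((CAT.SalesOrders, CAT.sourceSystem, Literal("CRM")))
catalog.add((CAT.SalesOrders, CAT.steward, Literal("Sales Operations")))
catalog.add((CAT.SalesOrders, CAT.definition,
             Literal("One record per booked customer order")))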

Cons

Implementing a semantic layer as a metadata architecture or logical layer across enterprise systems requires phased planning and incremental development to maintain cohesion and prevent fragmentation of shared metadata and semantic components across business groups and systems. Additionally, depending on how the layer synchronizes with downstream/upstream applications (push vs. pull), data orchestration and ETL pipelines will need to be designed for centralized or decentralized orchestration that ensures ongoing alignment.

Best Suited For

This is the approach we deploy most often; it is well suited for organizations that want to balance standardization with business-unit- or application-level agility in data processing and operations across different parts of the business.

2. Built-for-Purpose Architecture: Individual Tools with Semantic Capabilities

This model allows for greater flexibility and autonomy at the business unit or functional level. 

Architecture 

This architecture is a distributed model that leverages each standalone system’s or application’s capabilities to own semantic layer components – without a connected technical framework or governance structure at the enterprise level for shared semantics. With this approach, organizations typically identify establishing semantic standards as a strategic initiative, but each individual team or department (marketing, sales, product, data teams, etc.) is responsible for creating, executing, and managing its own semantic components (metadata, taxonomies, glossaries, graphs, etc.), tailored to its specific needs and requirements.

Semantic Layer Architecture

Most knowledge and data solutions such as content or document management systems (CMS/DMS), digital asset management systems (DAMs), customer relationship management systems (CRMs), and data analytics/BI dashboards (such as Tableau and PowerBI) have inherent capabilities to manage simple semantic components (although with varied maturity and feature flexibility levels). This decentralized architecture results in the implementation of multiple system-level semantic layers. Let’s take SharePoint, an enterprise document and content collaboration platform, as an example. For organizations that are in the early stages of growing their semantic capabilities, we leverage the Term Store for structuring metadata and taxonomy management within SharePoint, which allows teams to create a unified language, fostering consistency across documents, lists, and libraries. This helps with information retrieval and also enhances collaboration by ensuring a shared understanding of key metrics. On the other hand, Salesforce, a renowned CRM platform, offers semantic capabilities that enable teams across sales, marketing, and customer service to define and interpret customer data consistently across various modules.

Pros

This decentralized model promotes agility and empowers business units to leverage their existing platforms (that are built-for-purpose) as not just data/content repositories but as dynamic sources of context and alignment, driving consistent understanding of shared data and knowledge assets for specific business functions.

Cons

However, this decentralized approach typically means that users who need a cohesive view of organizational content and data must piece it together through separate interfaces. Data governance teams or content stewards are also likely to manage each system independently. This leads to data silos, “semantic drift,” and inconsistency in data definitions and governance (where duplication and data quality issues arise). It ultimately results in misalignment between business units, as they may interpret data elements differently, leading to confusion and potential inaccuracies.

Best Suited For

This approach is particularly advantageous for organizations with diverse business units or teams that operate independently. It empowers business users to have more control over their data definitions and modeling and allows for quicker adaptation to evolving business needs, enabling business units to respond swiftly to changing requirements without relying on a centralized team. 

3. A Centralized Architecture: Within an Enterprise Data Warehouse (EDW) or Data Lake (DL)

This structured environment simplifies data engineering and ensures a consistent and centralized semantic layer specifically for analytics and BI use cases.

Architecture

Organizations that are looking to create a single, unified representation of their core organizational domains develop a semantic layer architecture that serves as the authoritative source for shared data definitions and business logic within a centralized architecture – particularly within an Enterprise Data Warehouse or Data Lake. This model makes it easier to build the semantic layer since the data is already in one place, and the cloud-based data warehousing platforms behind many analytics solutions (e.g., Amazon Redshift, Google BigQuery, Snowflake, Azure Blob Storage, Databricks, etc.) can serve as a “centralized” location for semantic layer components.

Building a semantic layer within an EDW/DL involves consolidating and ingesting data from various sources into a centralized repository, identifying key data sources to be ingested, defining business terms, establishing relationships between different datasets, and mapping the semantic layer to the underlying data structures to create a unified and standardized interface for data access. 
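One lightweight way to picture the mapping step – purely illustrative, with made-up table and column names rather than any specific product’s API – is a dictionary of business terms that compiles business-friendly requests into warehouse SQL:

# Map business terms to physical warehouse columns (illustrative names)
semantic_model = {
    "revenue": "fact_sales.net_amount",
    "customer": "dim_customer.customer_name",
    "order date": "fact_sales.order_date",
}

def to_sql(terms, from_clause):
    """Translate business-friendly terms into a SELECT over mapped columns."""
    columns = ", ".join(semantic_model[term] for term in terms)
    return f"SELECT {columns} FROM {from_clause}"

print(to_sql(["customer", "revenue"],
             "fact_sales JOIN dim_customer USING (customer_id)"))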

Semantic Layer Architecture

Pros

This architecture is a common implementation approach that we support, specifically for dedicated data management, data analytics, and BI groups that are consistently ingesting data, setting the processes for changes to data structures, and enforcing business rules through dedicated pipelines (ETL/APIs) to govern enterprise data.

Cons

The core consideration here – and the one that usually suffers – is collaboration between business and data teams. That collaboration is pivotal during the implementation process: it guides investment in the right tools and solutions with semantic modeling capabilities, and it supports the creation of a semantic layer within this centralized landscape.

It is important to ensure that the semantic layer reflects the actual needs and perspectives of end users. Regular feedback loops and iterative refinements are essential for creating a model that evolves with the dynamic nature of business requirements. Adopting these solutions within this environment will enable the effective definition of business concepts, hierarchies, and relationships, allowing for translation of technical data into business-friendly terms.

Another important aspect of this type of centralized model is that it depends on data being consolidated or co-located, and it requires upfront investment of resources and time to design and implement the layer comprehensively. As such, it’s important to start small: focus on specific business use cases and a relevant scope of knowledge/data sources and foundational models that are highly visible and tied to business outcomes. This allows the organization to create a foundational model that expands incrementally across the rest of its data and knowledge assets.

Best Suited For

We have seen this approach be particularly beneficial for large enterprises with complex but shared data requirements and a need for stringent knowledge and data governance and compliance rules – specifically, organizations that produce data products and need to control the data and knowledge assets they share internally or externally on a regular basis. This includes, but is not limited to, financial institutions, healthcare organizations, bioengineering firms, and retail companies.

Closing

A well-implemented semantic layer is not merely a technical necessity but a strategic asset for organizations aiming to harness the full potential of their knowledge and data assets, as well as have the right foundations in place to make AI efforts successful. The choice of how to architect and implement a semantic layer depends on the specific needs, size, and structure of the organization. When considering this solution, the core decision really comes down to striking the right balance between standardization and flexibility, in order to ensure that your semantic layer serves as an effective enabler for knowledge-driven decision making across the organization. 

Organizations that invest in enterprise architecture through the metadata layer, and that rely on experts whose modeling experience is anchored in semantic web standards, find it the most flexible and scalable approach. They are better positioned to abstract their data away from vendor lock-in and ensure the interoperability needed to navigate the complexities of today’s technologies and future evolutions.

For many organizations embarking on a semantic layer initiative, failing to understand and plan for a solid technical architecture and a phased implementation approach leads to unplanned investments or outright failure. If you are looking to get started and learn more about how other organizations are approaching scale, read more from our case studies or contact us if you have specific questions.

Knowledge Graph Use Cases are Priceless
https://enterprise-knowledge.com/knowledge-graph-use-cases-are-priceless/ (November 30, 2022)

At Knowledge Graph Forum 2022, Lulit Tesfaye, Partner and Division Director, and Sara Nash, Senior Consultant, presented on the importance of establishing valuable and actionable use cases for knowledge graph efforts. The talk was on September 29, 2022 in New York City. 

Tesfaye and Nash drew on lessons learned from several knowledge graph development efforts to explain how to diagnose a bad use case, and outlined the impact bad use cases have on initiatives – including strained stakeholder relationships, time spent reworking priorities, and team turnover. They also shared guidance on how to navigate these scenarios and provided a checklist for assessing a strong use case.

Translating AI from Concept to Reality: Five Keys to Implementing AI for Knowledge, Content, and Data
https://enterprise-knowledge.com/translating-ai-from-concept-to-reality-five-keys-to-implementing-ai-for-knowledge-content-and-data/ (April 14, 2022)

Lulit Tesfaye, a Partner and Director for Enterprise Knowledge’s Data and Information Management Division, presented on April 07, 2022 at the data.world Spring Virtual Summit 2022 on the topic of Translating AI from Concept to Reality: Five Keys to Implementing AI for Knowledge, Content, and Data. In this presentation, Tesfaye explains how foundational knowledge management and knowledge engineering approaches can play a key role in ensuring enterprise Artificial Intelligence (AI) initiatives start right, quickly demonstrate business value, and “stick” within the organization. The presentation includes real world case studies and examples of how organizations are approaching their data and AI transformations through knowledge maturity models to translate organizational information and data into actionable and clickable solutions.

EK’s Hilger and Nash Speaking at Data Summit 2022
https://enterprise-knowledge.com/eks-hilger-and-nash-speaking-at-data-summit-2022/ (April 4, 2022)

Enterprise Knowledge Chief Operating Officer Joe Hilger and Senior Graph Consultant Sara Nash will be co-presenting at the upcoming Data Summit in Boston, MA. The premier data management and analytics conference will run from May 16th to 18th, featuring workshops, panel discussions, and provocative talks.

Hilger and Nash will lead an in-person workshop, titled “Introduction to Knowledge Graphs”. The interactive session will define what a knowledge graph is, how it is implemented, and how it can be used to increase the value of your organization’s data. Participants will get hands-on experience designing a knowledge graph and the foundational elements required to scale it within an enterprise.

Graphic for Data Summit 2022 featuring Joseph Hilger and Sara Nash.

About Data Summit

Data Summit will offer practical advice, inspiring thought leadership, and in-depth training. Participants will hear the innovative approaches the world’s leading companies are taking to solve today’s key challenges in data management. 

You can find more information and register for the conference here.

Transforming Tabular Data into Personalized, Componentized Content using Knowledge Graphs in Python
https://enterprise-knowledge.com/transforming-tabular-data-into-personalized-componentized-content-using-knowledge-graphs-in-python/ (March 22, 2022)

My colleagues Joe Hilger and Neil Quinn recently wrote blogs highlighting the benefits of leveraging a knowledge graph in tandem with a componentized content management system (CCMS) to curate personalized content for users. Hilger set the stage by explaining the business value of a personalized digital experience and the logistics of these two technologies supporting one another to create it. Quinn made these concepts more tangible by processing sample data into a knowledge graph in Python and querying the graph to find tailored information for a particular user. This post will again show the creation and querying of a knowledge graph in Python; however, this time the sample data will be sourced from external CSV files.

A Quick Note on CSVs

CSV files, or comma-separated values files, are widely used to store tabular data. If your company uses spreadsheet applications, such as Microsoft Excel or Google Sheets, or relational databases, then it is likely you have encountered CSV files before. This post will help you take the existing CSV-formatted data throughout your company, transform it into a usable knowledge graph, and resurface relevant pieces of information to users in a CCMS. Although this example uses CSV files as the tabular dataset format, the same principles apply to Excel sheets and SQL tables alike.
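For instance, pandas provides analogous readers for those formats; the file, database, and table names below are placeholders:

import sqlite3

import pandas as pd

# An Excel sheet loads much like a CSV
customer_data = pd.read_excel("Customer_Data.xlsx", nrows=2)

# A SQL table can be read straight into a DataFrame
connection = sqlite3.connect("enterprise.db")
customer_data = pd.read_sql("SELECT * FROM customers", connection)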

Aggregating Data

The diagram below is a visual model of the knowledge graph we will create from data in our example CSV files.

Diagram showing customers, products and parts

In order to populate this graph, just as in Quinn’s blog, we will begin with three sets of data about:

  • Customers and the products they own
  • Products and the parts they are composed of
  • Parts and the actions that need to be taken on them

This information is stored in three CSV files, Customer_Data.csv, Product_Data.csv and Part_Data.csv:

Customers

Customer ID   Customer Name   Owns Product
1             Stephen Smith   Product A
2             Lisa Lu         Product A

Products

Product ID   Product Name   Composed of
1            Product A      Part X
1            Product A      Part Y
1            Product A      Part Z

Parts

Part ID   Part Name   Action
1         Part X
2         Part Y
3         Part Z      Recall

To create a knowledge graph from these tables, we will need to:

  • Read the data tables from our CSV files into DataFrames (an object representing a 2-D data structure, such as a spreadsheet or table)
  • Transform the DataFrames into RDF triples and add them to the graph

To accomplish these two tasks, we will use two Python libraries: pandas, a data analysis and manipulation library, will help us read our CSV files into DataFrames, and rdflib, a library for working with RDF data, will allow us to create RDF triples from the data in our DataFrames.

Reading CSV Data

This first task is quite easy to accomplish using pandas. Pandas has a read_csv method for ingesting CSV data into a DataFrame. For this use case, we only need to provide two parameters: the CSV’s file path and the number of rows to read. To read the Customers table from our Customer_Data.csv file:

import pandas as pd

customer_data = pd.read_csv("Customer_Data.csv", nrows=2)

The value of customer_data is:

       Customer ID      Customer Name     Owns Product
0                1      Stephen Smith        Product A
1                2            Lisa Lu        Product A

We repeat this process for the Products and Parts files, altering the filepath_or_buffer and nrows parameters to reflect the respective file’s location and table size.
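Concretely, that repetition looks like the following, using the row counts from the tables above (and the variable names we will rely on later):

product_data = pd.read_csv("Product_Data.csv", nrows=3)
part_data = pd.read_csv("Part_Data.csv", nrows=3)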

Tabular to RDF

Now that we have our tabular data stored in DataFrame variables, we are going to use rdflib to create subject-predicate-object triples for each column/row entry in the three DataFrames. I would recommend reading Quinn’s blog prior to this one as I am following the methods and conventions that he explains in his post. 

Using the Namespace module gives us a shorthand for creating URIs, and the create_eg_uri function will URL-encode our data values.

from rdflib import Namespace, URIRef
from urllib.parse import quote

EG = Namespace("http://example.com/")

def create_eg_uri(name: str) -> URIRef:
    """Take a string and return a valid example.com URI"""
    quoted = quote(name.replace(" ", "_"))
    return EG[quoted]

The columns in our data tables will need to be mapped to predicates in our graph. For example, the Owns Product column in the Customers table will map to the http://example.com/owns predicate in our graph. We must define the column to predicate mappings for each of our tables before diving into the DataFrame transformations. Additionally, each mapping object contains a “uri” field which indicates the column to use when creating the unique identifier for an object.

customer_mapping = {
    "uri": "Customer Name",
    "Customer ID": create_eg_uri("customerId"),
    "Customer Name": create_eg_uri("customerName"),
    "Owns Product": create_eg_uri("owns"),
}

product_mapping = {
    "uri": "Product Name",
    "Product ID": create_eg_uri("productId"),
    "Product Name": create_eg_uri("productName"),
    "Composed of": create_eg_uri("isComposedOf"),
}

part_mapping = {
    "uri": "Part Name",
    "Part ID": create_eg_uri("partId"),
    "Part Name": create_eg_uri("partName"),
    "Action": create_eg_uri("needs"),
}

uri_objects = ["Owns Product", "Composed of", "Action"]

The uri_objects variable created above indicates which columns from the three data tables should have their values parsed as URI references rather than Literals. For example, Composed of maps to a Part object. We want the <Part> object in the triple EG:Product_A EG:isComposedOf <Part> to be a URI referencing a particular Part, not just the string name of the Part. By contrast, the Product Name column creates triples such as EG:Product_A EG:productName “name”, where “name” is simply a string, i.e. a Literal, and not a reference to another object.

Now, using all of the variables and methods declared above, we can begin the translation from DataFrame to RDF. For the purposes of this example, we create a global graph variable and a reusable translate_df_to_rdf function which we will call for each of the three DataFrames. With each call to the translate function, all triples for that particular table are added to the graph.

from rdflib import URIRef, Graph, Literal
import pandas as pd

graph = Graph()

def translate_df_to_rdf(customer_data, customer_mapping):
    num_rows = len(customer_data.index)

    # For each row in the table
    for i in range(num_rows):
        # Create the URI subject for this row's triples using the mapped "uri" column
        name = customer_data.loc[i, customer_mapping["uri"]]
        row_uri = create_eg_uri(name)

        # For each column/predicate mapping in the mapping dictionary
        for column_name, predicate in customer_mapping.items():
            # Skip the "uri" entry: it names the subject column, not a predicate
            if column_name == "uri":
                continue

            # Grab the value at this specific row/column entry
            value = customer_data.loc[i, column_name]

            # Strip extra whitespace from the value
            if isinstance(value, str):
                value = value.strip()

            # Check that the value exists
            if not pd.isnull(value):
                # Determine whether the object should be a URI or a Literal
                if column_name in uri_objects:
                    # Create a URI object and add the triple to the graph
                    uri_value = create_eg_uri(value)
                    graph.add((row_uri, predicate, uri_value))
                else:
                    # Create a Literal object and add the triple to the graph
                    graph.add((row_uri, predicate, Literal(value)))

In this case, we make three calls to translate_df_to_rdf, one per DataFrame:

translate_df_to_rdf(customer_data, customer_mapping)
translate_df_to_rdf(product_data, product_mapping)
translate_df_to_rdf(part_data, part_mapping)

Querying the Graph

Now that our graph is populated with the Customers, Products, and Parts data, we can query it for personalized content of our choosing. So, if we want to find all customers who own products that are composed of parts that need a recall, we can create and use the same query from Quinn’s previous blog:

sparql_query = """SELECT ?customer ?product
WHERE {
  ?customer eg:owns ?product .
  ?product eg:isComposedOf ?part .
  ?part eg:needs eg:Recall .
}"""

results = graph.query(sparql_query, initNs={"eg": EG})
for row in results:
    print(row)

As you would expect, the results printed in the console are two ?customer ?product pairings:

(rdflib.term.URIRef('http://example.com/Stephen_Smith'), rdflib.term.URIRef('http://example.com/Product_A'))
(rdflib.term.URIRef('http://example.com/Lisa_Lu'), rdflib.term.URIRef('http://example.com/Product_A'))

Summary

By transforming our CSV files into RDF triples, we created a centralized, connected graph of information, enabling the simple retrieval of very granular and case-specific data. In this case, we simply traversed the relationships in our graph between Customers, Products, Parts, and Actions to determine which Customers needed to be notified of a recall. In practice, these concepts can be expanded to meet any personalization needs for your organization.

Knowledge Graphs are an integral part of serving up targeted, useful information via a Componentized Content Management System, and your organization doesn’t need to start from scratch. CSVs and tabular data can easily be transformed into RDF and aggregated as the foundation for your organization’s Knowledge Graph. If you are interested in transforming your data into RDF and want help planning or implementing a transformation effort, contact us here.

Content Personalization with Knowledge Graphs in Python
https://enterprise-knowledge.com/content-personalization-with-knowledge-graphs-in-python/ (February 14, 2022)

In a recent blog post, my colleague Joe Hilger described how a knowledge graph can be used in conjunction with a componentized content management system (CCMS) to provide personalized content to customers. This post will show the example data from Hilger’s post being loaded into a knowledge graph and queried to find the content appropriate for each customer, using Python and the rdflib package. In doing so, it will help make these principles more concrete, and help you in your journey towards content personalization.

To follow along, a basic understanding of Python programming is required.

Aggregating Data

Hilger’s article shows the following visualization of a knowledge graph to illustrate how the graph connects data from many different sources and encodes the relationship between them.

Diagram showing customers, products and parts

To show this system in action, we will start out with a few sets of data about:

  • Customers and the products they own
  • Products and the parts they are composed of
  • Parts and the actions that need to be taken on them

In practice, this information would be pulled from the sales tracking, product support, and other systems it lives in via APIs or database queries, as described by Hilger.

customers_products = [
    {"customer": "Stephen Smith", "product": "Product A"},
    {"customer": "Lisa Lu", "product": "Product A"},
]

products_parts = [
    {"product": "Product A", "part": "Part X"},
    {"product": "Product A", "part": "Part Y"},
    {"product": "Product A", "part": "Part Z"},
]

parts_actions = [{"part": "Part Z", "action": "Recall"}]

We will enter this data into a graph as a series of subject-predicate-object triples, each of which represents a node (the subject) and its relationship (the predicate) to another node (the object). RDF graphs use uniform resource identifiers (URIs) to provide a unique identifier for both nodes and relationships, though an object can also be a literal value.

Unlike the traditional identifiers you may be used to in a relational database, URIs in RDF typically use a URL format (meaning they begin with http://), although a URI is not required to point to an existing website. The base part of this URI is referred to as a namespace, and it’s common to use your organization’s domain as part of this. For this tutorial we will use http://example.com as our namespace.

We also need a way to represent these relationship predicates. For most enterprise RDF knowledge graphs, we start with an ontology, which is a data model that defines the types of things in our graph, their attributes, and the relationships between them. For this example, we will use the following relationships:

Relationship                          URI
Customer’s ownership of a product     http://example.com/owns
Product being composed of a part      http://example.com/isComposedOf
Part requiring an action              http://example.com/needs

Note the use of camelCase in the name – for more best practices in ontology design, including how to incorporate open standard vocabularies like SKOS and OWL into your graph, see here.

The triple representing Stephen Smith’s ownership of Product A in rdflib would then look like this, using the URIRef class to encode each URI:

from rdflib import URIRef

triple = (
    URIRef("http://example.com/Stephen_Smith"),
    URIRef("http://example.com/owns"),
    URIRef("http://example.com/Product_A"),
)

Because typing out full URLs every time you want to add or reference a component of a graph can be cumbersome, most RDF-compliant tools and development resources provide some shorthand way to refer to these URIs. In rdflib that’s the Namespace module. Here we create our own namespace for example.com, and use it to more concisely create that triple:

from rdflib import Namespace

EG = Namespace("http://example.com/")

triple = (EG["Stephen_Smith"], EG["owns"], EG["Product_A"])

We can further simplify this process by defining a function to transform these strings into valid URIs using the quote function from the urllib.parse module:

from urllib.parse import quote

def create_eg_uri(name: str) -> URIRef:
    """Take a string and return a valid example.com URI"""
    quoted = quote(name.replace(" ", "_"))
    return EG[quoted]

Now, let’s create a new Graph object and add these relationships to it:

from rdflib import Graph

graph = Graph()

owns = create_eg_uri("owns")
for item in customers_products:
    customer = create_eg_uri(item["customer"])
    product = create_eg_uri(item["product"])
    graph.add((customer, owns, product))

is_composed_of = create_eg_uri("isComposedOf")
for item in products_parts:
    product = create_eg_uri(item["product"])
    part = create_eg_uri(item["part"])
    graph.add((product, is_composed_of, part))

needs = create_eg_uri("needs")
for item in parts_actions:
    part = create_eg_uri(item["part"])
    action = create_eg_uri(item["action"])
    graph.add((part, needs, action))

Querying the Graph

Now we are able to query the graph, in order to find all of the customers that own a product containing a part that requires a recall. To do this, we’ll construct a query in SPARQL, the query language for RDF graphs.

SPARQL has some features in common with SQL, but works quite differently. Instead of selecting from a table and joining others, we will describe a path through the graph based on the relationships each kind of node has to another:

sparql_query = """SELECT ?customer ?product
WHERE {
  ?customer eg:owns ?product .
  ?product eg:isComposedOf ?part .
  ?part eg:needs eg:Recall .
}"""

The WHERE clause asks for:

  1. Any node that has an owns relationship to another – the subject is bound to the variable ?customer and the object to ?product
  2. Any node that has an isComposedOf relationship to the ?product from the previous line, the subject of which is then bound to ?part
  3. Any node that has a needs relationship to the eg:Recall object – restricting ?part to parts that need a recall

Note that we did not at any point tell the graph which of the URIs in our graph referred to a customer. By simply looking for any node that owns something, we were able to find the customers automatically. If we had a requirement to be more explicit about typing, we could add triples to our graph describing the type of each entity using the RDF type relationship, then refer to these in the query.
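For example – sketching that alternative, where eg:Customer is an invented class URI rather than part of the original data – we could type each customer node while loading the data and then require the type in the query (a is SPARQL shorthand for rdf:type):

from rdflib import RDF

# Hypothetical class URI; not part of the original example data
customer_class = create_eg_uri("Customer")

for item in customers_products:
    graph.add((create_eg_uri(item["customer"]), RDF.type, customer_class))

typed_query = """SELECT ?customer WHERE {
  ?customer a eg:Customer .
}"""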

We can then execute this query against the graph, using the initNs argument to map the “eg:” prefixes in the query string to our example.com namespace, and print the results:

results = graph.query(sparql_query, initNs={"eg": EG})

for row in results:
    print(row)

This shows us the URIs for the affected customers and the products they own:

(rdflib.term.URIRef('http://example.com/Stephen_Smith'), rdflib.term.URIRef('http://example.com/Product_A'))
(rdflib.term.URIRef('http://example.com/Lisa_Lu'), rdflib.term.URIRef('http://example.com/Product_A'))

These fields could then be sent back to our componentized content management system, allowing it to send the appropriate recall messages to those customers!
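For instance, a minimal handoff might serialize the result rows into a JSON payload for a hypothetical CMS notification endpoint:

import json

# Each result row unpacks into (customer URI, product URI)
payload = [
    {"customer": str(customer), "product": str(product)}
    for customer, product in results
]
print(json.dumps(payload, indent=2))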

Summary

The concepts and steps described in this post are generally applicable to setting up a knowledge graph in any environment, whether in-memory using Python or Java, or with a commercial graph database product. By breaking your organization’s content down into chunks inside a componentized content management system and using the graph to aggregate this data with your other systems, you can ensure that the exact content each customer needs to see gets delivered to them at the right time. You can also use your graph to create effective enterprise search systems, among many other applications.

Interested in best-in-class personalization using a CCMS plus a knowledge graph? Contact us.

Enterprise Knowledge and data.world Partner to Make Knowledge Graphs More Accessible to the Enterprise
https://enterprise-knowledge.com/enterprise-knowledge-and-data-world-partner-to-make-knowledge-graphs-more-accessible-to-the-enterprise/ (September 23, 2021)

New Knowledge Graph Accelerator Provides Organizations the Toolset and Capabilities to Make Enterprise AI a Reality.

Enterprise Knowledge (EK), the world’s largest dedicated knowledge and information management consulting firm, announced the launch of the Knowledge Graph Accelerator today, a mechanism to establish an organization’s first knowledge graph solution in a matter of weeks. In partnership with data.world, the knowledge graph-based enterprise data catalog, organizations will be able to rapidly unlock use cases such as Employee, Product, and Customer 360, Advanced Analytics, and Natural Language Search. 

“Knowledge Graphs are a critical component necessary to achieve Enterprise AI, but most organizations need a quick and scalable way to understand and experience the value,” said Lulit Tesfaye, Practice Lead of Data and Information Management at EK. “EK, in partnership with data.world, is creating a holistic solution to make building Enterprise AI intuitive using knowledge graphs, from data modeling and storage to enrichment and governance. Having this end-to-end consistency is critical for the success of knowledge graph products and setting the foundations for Enterprise AI.”

“EK has been at the leading edge of Knowledge Graph strategy, design, and implementation since our inception,” added Zach Wahl, CEO of EK. “Our thought leadership in this field, combined with data.world’s advanced capabilities, creates an exciting opportunity for organizations to feel the impact and realize the benefits quickly and meaningfully.”

Gartner predicts that graph technologies will be leveraged in over 80% of innovations in data and analytics by 2025, but many organizations find the business and technical complexities of graph design and implementation to be daunting. The Knowledge Graph Accelerator addresses the need to develop a practical, standards-based roadmap and prototype to quickly realize the potential of knowledge graphs. 

Through the Knowledge Graph Accelerator, organizations will get the following outcomes in less than 2 months:

  • An understanding of the foundations of knowledge graphs, including graph data modeling, data mapping, and data management;
  • A first implementable version (FIV) knowledge graph that can be scaled and enhanced;
  • A pilot version of your graph solution leveraging the knowledge graph-based data management solution data.world and gra.fo; and
  • A strategy for your organization to make Enterprise AI a reality. 

“Enterprises need to understand and trust the data powering their analytics while generating meaningful insights. But supporting different data sources and use cases – while analyzing and traversing changes to metadata and automating relationships – can be challenging,” said Dr. Juan Sequeda, Principal Scientist at data.world. “Knowledge graphs are foundational for an effective and future-proof data catalog, as well as for next-generation AI and analytics.”

To learn more, explore our approach and what your organization will get through the Knowledge Graph Accelerator. Also, reach out to Enterprise Knowledge to learn how to unlock the use cases that are most valuable to your enterprise. 

On September 29th, 2021, Enterprise Knowledge will participate in the virtual data.world fall summit. Additional keynote speakers include Zhamak Dehghani, Barr Moses, Doug Laney, and Jon Loyens.

 

About Enterprise Knowledge 

Enterprise Knowledge (EK) is a services firm that integrates Knowledge Management, Information and Data Management, Information Technology, and Agile Approaches to deliver comprehensive solutions. Our mission is to form true partnerships with our clients, listening and collaborating to create tailored, practical, and results-oriented solutions that enable them to thrive and adapt to changing needs. At the heart of these services, we always focus on working alongside our clients to understand their needs, ensuring we can provide practical and achievable solutions on an iterative, ongoing basis. Visit enterprise-knowledge.com to see how optimizing your knowledge and data management will impact your organization.  

About data.world

data.world is the enterprise data catalog for the modern data stack. Our cloud-native SaaS (software-as-a-service) platform combines a consumer-grade user experience with a powerful knowledge graph to deliver enhanced data discovery, agile data governance, and actionable insights. data.world is a Certified B Corporation and public benefit corporation and home to the world’s largest collaborative open data community with more than 1.3 million members, including 2/3 of the Fortune 500. Our company has 40 patents and has been named one of Austin’s Best Places to Work six years in a row. Follow us on LinkedIn, Twitter, and Facebook, or join us.
