Data Modeling Articles - Enterprise Knowledge
http://enterprise-knowledge.com/tag/data-modeling/

Semantic Layer Strategy for Linked Data Investigations
https://enterprise-knowledge.com/semantic-layer-strategy-for-linked-data-investigations/

The Challenge

A government organization sought to more effectively exploit the breadth of data generated by its investigations of criminal networks for comprehensive case building and threat trend analysis. The agency struggled to meaningfully connect structured and unstructured data from multiple siloed data sources, each with misaligned naming conventions and inconsistent data structures and formats. Users had to have an existing understanding of the underlying data models and jump between multiple system views to answer core investigation analysis questions, such as “What other drivers have been associated with this vehicle involved in an inspection at the border?” or “How often has this person in the network traveled to a known suspect storage location in the past 6 months?”

These challenges manifested as data ambiguity across the organization, complex and resource-intensive integration workflows, and underutilized data assets lacking meaningful context, all resulting in significant cognitive load and burdensome manual effort for users conducting intelligence analyses. The organization recognized the need to define a robust semantic layer solution grounded in data modeling, architecture frameworks, and governance controls to unify, contextualize, and operationalize data assets via a “single pane of intelligence” analysis platform.

The Solution

To address these challenges, EK engaged with the client to develop a strategy and product vision for their semantic solution, paired with foundational semantic data models for meaningful data categorization and linking, architecture designs and tool recommendations for integrating and leveraging graph data, and entitlements designs for adhering to complex security standards. With phased implementation plans for incremental delivery, these components lay the foundations for the client’s solution vision for advanced entity resolution and analytics capabilities. The overall solution will power streamlined consumption experiences and data-driven insights through the “single pane of intelligence.”  

The core components of EK’s semantic advisory and solution development included:

Product Vision and Use Case Backlog:
EK collaborated with the client to shape a product vision anchored around the solution’s purpose and long-term value for the organization. Complemented with a strategic backlog of priority use cases, EK’s guidance resulted in a compelling narrative to drive stakeholder engagement and organizational buy-in, while also establishing a clear and tangible vision for scalable solution growth.

Solution Architecture Design:
EK’s solution architects gathered technical requirements to propose a modular solution architecture consisting of multiple, self-contained technology products that will provision a comprehensive analytic ecosystem to the organization’s user base. The native graph architecture involves a graph database, entity resolution services, and a linked data analysis platform to create a unified, interactive model of all of their data assets via the “single pane of intelligence.”

Tool Selection Advisory:
EK guided the client on selecting and successfully gaining buy-in for procurement of a graph database and a data analysis and visualization platform with native graph capabilities to plug into the semantic and presentation layers of the recommended architecture design. This selection moves the organization away from a monolithic, document-centric platform to a data-centric solution for dynamic intelligence analysis in alignment with their graph and network analytics use cases. EK’s experts in unified entitlements and industry security standards also ensured the selected tooling would comply with the client’s database, role, and attribute-based access control requirements.

Taxonomy and Ontology Modeling:
In collaboration with intelligence subject matter experts, EK guided the team from a broad conceptual model to an implementable ontology and starter taxonomy designs to enable a specific use case for prioritized data sources. EK advised on mapping the ontology model to components of the Common Core Ontologies to create a standard, interoperable foundation for consistent and scalable domain expansion.

Phased Implementation Plan:
Through dedicated planning and solutioning sessions with the core client team, EK developed an iterative implementation plan to scale the foundational data model and architecture components and unlock incremental technical capabilities. EK advised on identifying and defining starter pilot activities, outlining definitions of done, necessary roles and skillsets, and required tasks and supporting tooling from the overall architecture to ensure the client could quickly start on solution implementation. EK is directly supporting the team on the short-term implementation tasks while continuing to advise and plan for the longer-term solution needs.

 

The EK Difference

Semantic Layer Solution Strategy:
EK guided the client in transforming existing experimental work in the knowledge graph space into an enterprise solution that can scale and bring tangible value to users. From strategic use case development to iterative semantic model and architecture design, EK provided the client with repeatable processes for defining, shaping, and productionalizing components of the organization’s semantic layer.

LPG Analytics with RDF Semantics:
To support the client’s complex and dynamic analytics needs, EK recommended a labeled property graph (LPG)-based solution for its flexibility and scalability. At the same time, the client’s need for consistent data classification and linkage still pointed to the value of RDF frameworks for taxonomy and ontology development. EK is advising on how to bridge these models for the translation and connectivity of data across RDF and LPG formats, ultimately enabling seamless data integration and interoperability in alignment with semantic standards.

Semantic Layer Tooling:
EK has extensive experience advising on the evaluation, selection, procurement, and scalable implementation of semantic layer technologies. EK’s qualitative evaluation for the organization’s linked data analysis platforms was supplemented by a proprietary structured matrix measuring down-selected tools against 50+ functional and non-functional factors to provide a quantitative view of each tool’s ability to meet the organization’s specific needs.

Semantic Modeling and Scalable Graph Development:
Working closely with the organization’s domain experts, EK provided expert advisory in industry standards and best practices to create a semantic data model that will maximize graph benefits in the context of the client’s use cases and critical data assets. In parallel with model development, EK offered technical expertise to advise on the scalability of the resulting graph and connected data pipelines to support continued maintenance and expansion.

Unified Entitlements Design:
Especially working with a highly regulated government agency, EK understands the critical need for unified entitlements to provide a holistic definition of access rights, enabling consistent and correct privileges across every system and asset type in the organization. EK offered comprehensive entitlements design and development support to ensure access rights would be properly implemented across the client’s environment, closely tied to the architecture and data modeling frameworks.

Organizational Buy-In:
Throughout the engagement, EK worked closely with project sponsors to craft and communicate the solution product vision. EK tailored product communication components to different audiences by detailing granular technical features for tool procurement conversations and formulating business-driven, strategic value statements to engage business users and executives for organizational alignment. Gaining this buy-in early on is critical for maintaining development momentum and minimizing future roadblocks as wider user groups transition to using the productionalized solution.

The Results

With initial core semantic models, iterative solution architecture design plans, and incremental pilot modeling and engineering activities, the organization is equipped to stand up key pieces of the solution as they procure the graph analytics tooling for continued scale. The phased implementation plan provides the core team with tangible and achievable steps to transition from their current document-centric ways of working to a truly data-centric environment. The full resulting solution will facilitate investigation activities with a single pane view of multi-sourced data and comprehensive, dynamic analytics. This will streamline intelligence analysis across the organization with the enablement of advanced consumption experiences such as self-service reporting, text summarization, and geospatial network analysis, ultimately reducing the cognitive load and manual efforts users currently face in understanding and connecting data. EK’s proposed strategy has been approved for implementation, and EK will publish the results from the MVP development as a follow-up to this case study.

GovData Innovations 2023
https://enterprise-knowledge.com/govdata-innovations-2023/

GovData Innovations Summit - Capture the latest trends and best practices driving innovation in data, including knowledge graphs, semantics, and metadata hubs!

Enterprise Knowledge is excited to be hosting the upcoming GovData Innovations Summit, where we will convene industry thought leaders and government executives to discuss the latest trends and best practices driving innovation in data, including knowledge graphs, semantics, and metadata hubs.
This is an invite-only event. To express interest or to request an advance or follow-up briefing, please email or use our contact form:

govdata@enterprise-knowledge.com

Special thanks to our cosponsors for this event, three market leaders in advanced data and knowledge solutions: data.world, Neo4j, and Squirro.

This event will be held on March 16th from 6pm to 9pm at the National Press Club in Washington, DC.

Zachary Whitman

The Keynote Speaker for GovData Innovations 2023 on March 16th in Washington, DC will be Zach Whitman, Chief Data Officer of the U.S. Census Bureau.

Zachary Whitman has been leading the Census Bureau’s data transformation and fulfilling the agency’s strategic objectives for improved stakeholder engagement through more relevant data, crafting a data culture of innovation, and modernizing the government’s largest data portal to enhance user experience and data reliability.


In addition to the keynote, Anthony Zech, Data and AI Community of Excellence Director at ECS, will sit down with EK’s Chief Operating Officer Joe Hilger to discuss “Dream Data Mesh Model – Creating Solutions to Define What.”

Anthony (Tony) Zech is the Data and AI Community of Excellence Director at ECS and a Marine Reservist. Tony oversees ECS’s Community of Practice for Data and AI and supports ECS’s many customers with advanced data management solutions. Most recently, Tony presented “Building the AI-Powered Information Enterprise” at last February’s Potomac Officers Club Artificial Intelligence Summit.


Agenda:

  • 7:00 – 7:05 pm Opening Remarks by Zach Wahl (EK CEO) and Joe Hilger (EK COO)
  • 7:05 – 7:20 pm Keynote – Zach Whitman (CDO, US Census Bureau)
  • 7:25 – 7:45 pm Fireside Chat – Joe Hilger and Tony Zech (Data and AI Community of Excellence Director, ECS)
  • 7:50 – 8:10 pm Industry Panel – Joe Hilger, Michael Moore (Neo4j), Juan Sequeda (Data.World), and Tim Murphy (Esri)
  • 8:10 – 9:00 pm – Networking

 

Attendees will:

  • Learn from the experts about modern data architectures like Semantic Data Stacks, Knowledge Graphs, Data Mesh, and Data Fabrics
  • Learn how to leverage the private sector for next-generation data solutions
  • Find out about the newest trends in Data and Infrastructure to Follow or Avoid in 2023

Contact us to express interest in joining us on 3/16 for an evening of networking and insightful discussions!

To request an advance or follow-up briefing, please email:
govdata@enterprise-knowledge.com
AI Beyond a Prototype
https://enterprise-knowledge.com/beyond-ai-prototypes/

How to take an AI Project Beyond a Prototype

Before going “all in,” we often advise our clients to first understand and quickly validate the value proposition for adopting advanced Artificial Intelligence (AI) and Machine Learning (ML) solutions within their organization by starting with an AI project prototype or pilot. Conducting such targeted experiments not only provides the enterprise with a safe way to validate that AI and ML solutions will solve real problems, but also provides a design foundation for key AI elements required for their roadmap and supports long-term change management by showing immediate, incremental benefits and building interest.

Without the appropriate guidance and strategy, AI efforts may stall right after a prototype or proof of concept, regardless of how successful those initial efforts were.

Although 84% of executives see the value and agree that they need to integrate and scale AI within their business processes, only 16% of them say that they have actually moved beyond the experimentation phase.

Informed by the diverse set of organizational realities and AI projects we have delivered, below I explore the common themes I see in the potential roadblocks to moving from prototype to enterprise, and provide a selection of approaches that I have found helpful in scaling enterprise AI efforts.

1. Understand that AI projects have unique life cycles

In software delivery, Agile and DevOps continue to serve as successful frameworks for allowing iterative delivery, getting the product closer to the end user or customer and ultimately delivering immediate value. However, Enterprise AI efforts have surfaced the need to revisit Agile delivery within the context of AI and ML processes. What this means for the sponsoring organization and the teams involved is that any project management or delivery approach that is employed will need to balance the predictable nature of software programming with facilitation and ongoing education about expected machine outcomes for the end user and subject matter expert (SME), while accommodating the unpredictable number of experimental data ingestion and model refinement cycles required for AI deliverables.

Enterprise AI projects typically have a number of workstreams or task areas that need to run in parallel. These include use case definition, information architecture, data mapping and modeling, integration and pipeline development, the data science workstream where multiple Machine Learning (ML) processes are running, and, of course, the software engineering required to connect with downstream or upstream applications that will render the solution to end users. With all these variables at play, the following approaches help to build a more AI-centric delivery framework:

  • Sprints for data teams are different: While software programming or development is focused on predefined applications or features, the primary focuses for data science and machine learning tasks are analysis, modeling, cleaning, and exploration. Meaning, the data is the center of the universe and the exploration process is what determines the outcome or the features being delivered. The results from the machine and data exploration phase could result in the project having to loop back to the planning phase. As such, the data workstream doesn’t necessarily need to be within or work through the same sprint as the development team.

[Figure: AI Project Delivery Iterations – the iterative design process for an AI prototype, from discovery, design, and ideation, to data/ML exploration sprints, to testing and review]

  • Embed research or “spike” sprints to create room for understanding and data exploration: Unlike humans, machines need to go through diverse sets of data to understand the context within which they are being applied at your organization (a knowledge graph significantly helps in this process) and align it to your expected results. This process requires stages of understanding, analysis, and research to identify relevant data. Do your AI projects plan for this research?
  • Embrace testing and quality assurance (QA) from the start: Testing in AI/ML is not limited to the model itself. Ensuring that data quality stays sufficient to serve use cases and having the right entry-point checks in place to detect potential data collection errors are foundational steps before starting the model. Additionally, the QA process in AI and ML projects should take into account the ability to test integration points as well as any peripheral systems and processes that serve as inputs or outputs to the model itself. Over time, maintaining a proven integration process to continue updating and training the model is another area that will itself require automation.
  • Prepare for organizational impact: When it comes down to implementation, some projects are inherently too big. Imagine replacing legacy applications with AI technology and models, for instance. There needs to be supporting organization-wide processes in place to ensure your model and delivery is supported all the way throughout strategy, implementation, and adoption. There are more players that need to be involved in addition to the project team itself. 

2. Know what is really being delivered

For machine learning and AI, the product is the algorithm, or the model, not necessarily the accuracy of the results. Meaning, if the model is right, with the right data, it will deliver the intended results. Otherwise, garbage in, garbage out. Understanding this dynamic is key when defining acceptance criteria and your minimum viable product. Additionally, leveraging UI/UX resources and wireframing sessions facilitates the explanation of what the AI tool really is and sets expectations around what it can help stakeholders achieve before they test the tool.

    • AI scope is mostly driven by two factors, use cases and available data: Combining top-down discovery and ideation sessions (with end users and subject matter experts, or SMEs) with bottom-up mapping and review of content, data, and systems is a technique we use to narrow down AI/ML opportunities and define the initial delivery requirements. As the project progresses, there will almost always be new developments, findings, and challenges that arise. The key to successfully defining what is really being delivered is building the required flexibility into iteration cycles and update loops so that end users and SMEs can regularly review exploratory results from the data and ML workstream and provide context and domain knowledge to refine the model based on available datasets.
    • Plan for diverging stakeholder opinions: Machine learning models are better than a human at browsing through thousands of content items and finding recommendations that organizational SMEs may not have thought of. However, your current data may not necessarily capture the topics or the “aboutness” of how your data is used. Encouraging non-technical stakeholders to provide context by participating in the ideation and the acceptance criteria development process is key. You need SMEs to help create a rich semantic layer that captures key business facts and context. However, your stakeholders or SMEs may have their own tacit knowledge and memory of your organization’s content to say what’s good or bad when it comes to your project results. What if the machine uncovers better content for search results that everyone may have forgotten about? And remember, missing results are not necessarily bad because they can help identify the content or data your organization is currently missing.
    • Defining KPIs or ROI for AI projects is an iterative process: It is important to create the ability to ensure the right solution is being developed and is effective. The definition of the use case, acceptance criteria, and gold standard typically serve as introductory benchmarks to determine how to measure impact of the solution and overall success and return. However, as more training data is added, the model is continually updated and can change significantly over time. Thus, it is important to understand that the initial KPIs will usually have assumptions that are validated and updated as the solutions are incrementally developed and tested. It is also critical to have baseline data in order to successfully compare outcomes with ML/AI and without. Because setting KPIs is a journey, it really boils down to planning for and setting up the right governance and monitoring processes to support continuous re-training of the model and measure impact frequently. 

3. Plan for ancillary (potentially hidden) costs

This is one of the primary areas where AI projects encounter a reality check. If not planned for, these hidden costs can take many forms and cause significant delays or completely stall projects. The following are some of the most common items to consider when planning to scale AI efforts:

  • Size and quality of the right data: AI and ML models learn from lots of training data. The larger the dataset, the better the AI model and results will perform. This required size of data introduces challenges, including the need to aggregate and merge data from multiple sources with different security constraints and diverse formats (structured, unstructured, video files, text, images, etc.). This affects where and how your data and AI project teams spend most of their time, i.e., preparing data for analysis as opposed to building models and developing results. One of the most helpful ways to make such datasets easier to manage is to enhance them with rich, descriptive metadata (see next item) and a data knowledge graph.
  • Data preparation and labeling (taxonomies / metadata): Most organizations do not have labeled data readily available for effective execution of ML and AI projects. If not planned for or staffed properly, the majority of your resources will be spent annotating or labeling training data. Because this step requires domain knowledge and the use of standards and best practices in knowledge organization systems, organizations will have to invest in semantic experts and hybrid automation in order to maintain quality and consistency of data across sources.
  • Licenses and tools: One of the most common misconceptions in Enterprise AI implementations, and a reason why many AI projects fail, is the assumption that AI is a “Single Technology” solution. Organizations looking to “plug-and-play AI” or wanting to experiment with a variety of open source tools need to reset their expectations and plan for the requirements and actual cost of using these tools, as costs can add up quickly. AI solutions range from data management and orchestration capabilities to employing a solution for metadata storage and, depending on the use case, the ability to push ML model results to upstream or downstream applications.
  • Project team expertise (or lack thereof): Experienced data scientists are required to effectively handle most of the machine learning and AI projects, especially when it comes to defining the success criteria, final delivery, scale, and continuous improvement of the model. Overlooking this foundational need could result in even more costly outcomes, or wasted efforts after producing misleading results or results that aren’t actionable or insightful.

Closing

The approaches to enable rapid delivery and adoption of AI continue to evolve. However, the challenges with scale remain, attributable to many factors, including selecting the right project management and delivery framework, acquiring the right solutions, instituting the foundational data management and governance practices, and finding, hiring, and retaining people with the right skill sets. And ultimately, enterprise leaders need to understand how AI and Machine Learning work and what AI really delivers for the organization. The good news is that if built with the right foundations, a given AI solution can be reused for multiple use cases, connect diverse data sources, cross organizational silos, and continue to deliver on the hype.

How’s your organization tracking? Find out if your organization has the right foundations to take AI to production or email us to learn more about our experience and how we can help.

RDF*: What is it and Why do I Need it?
https://enterprise-knowledge.com/rdf-what-is-it-and-why-do-i-need-it/

RDF* (pronounced RDF star) is an extension to the Resource Description Framework (RDF) that enables RDF graphs to more intuitively represent complex interactions and attributes through the implementation of embedded triples. This allows graphs to capture relationships between more than two entities, add metadata to existing relationships, and add provenance information to all triples, reducing the burden of maintenance.

But let’s back up…before we talk about RDF*, let’s cover the basics — what is RDF, and how is RDF* different from RDF?

What is RDF?

The Resource Description Framework (RDF) is a semantic web standard used to describe and model information for web resources or knowledge management systems. RDF consists of “triples,” or statements, with a subject, predicate, and object that resemble an English sentence. 

For example, take the English sentence: “Bess Schrader is employed by Enterprise Knowledge.” This sentence has:

  • A subject: Bess Schrader
  • A predicate: is employed by 
  • An object: Enterprise Knowledge

Bess Schrader and Enterprise Knowledge are two entities that are linked by the relationship is employed by. An RDF triple representing this information would look like this:

[Figure: Visual representation of the RDF triple “Bess Schrader is employed by Enterprise Knowledge”]

(There are many ways, or serializations, to represent RDF. In this blog, I’ll be using the Turtle syntax because it’s easy to read, but this information could also be shown in RDF/XML, JSON for Linking Data, and other formats.)
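
For example, a minimal sketch of this triple in Turtle might look like the following (the ex: prefix and URIs are illustrative, not part of the original example):

    @prefix ex: <http://example.com/> .

    ex:BessSchrader ex:isEmployedBy ex:EnterpriseKnowledge .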

The World Wide Web Consortium (W3C) maintains the RDF Specification, making it easy for applications and organizations to develop RDF data in an interoperable way. This means if you create RDF data in one tool and share it with someone else using a different RDF tool, they will still be able to easily use your data. This interoperability allows you to build on what’s already been done — you can combine your enterprise knowledge graph with established, open RDF datasets like Wikidata, jump starting your analytic capabilities. This also makes data sharing and migration between internal RDF systems simple, enabling you to unify data and reducing your dependency on a single tool or vendor.

For more information on RDF and how it can be used, check out Why a Taxonomist Should Know SPARQL.

What are the limitations of RDF (Why is RDF* necessary)?

Standard RDF has many strengths:

  • Like most graph models, it more intuitively captures the way we think about the world as humans (as networks, not as tables), making it easier to design, capture, and query data.
  • As a standard supported by the W3C, it allows us to create interoperable data and systems, all using the same standard to represent and encode data.

However, it has one key weakness: because RDF is based on triples, standard RDF can only connect two objects at a time. For many use cases, this limitation isn’t a problem. Consider my example from above, where I want to represent the relationship between me and my employer:

[Figure: Visual representation of the RDF triple “Bess Schrader is employed by Enterprise Knowledge”]

Simple! However, what if I want to capture the role or position that I hold at this organization? I could add a triple denoting my position:

[Figure: An additional triple showing not only that Bess Schrader is employed by Enterprise Knowledge, but also that Bess Schrader holds the position of Consultant]

Great! But what if I decide to add in my (fictional) employment history?

[Figure: Triples attempting to add employment history, showing that Bess Schrader is employed by Enterprise Knowledge and holds the position of Consultant, but also is employed by Hogwarts and holds the position of Professor]

Now it’s unclear whether I was a consultant at Enterprise Knowledge or at Hogwarts. 

There are a variety of ways to address this problem in RDF. One of the most popular is reification or n-ary relations, in which you create an intermediary node that allows you to group more than two entities together. For example:

[Figure: Triples with the addition of intermediary nodes “Employment Event 1” and “Employment Event 2” to add the temporality that plain RDF triples do not allow for]
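
In Turtle, this n-ary pattern might be sketched as follows (the employment-event and property names are illustrative):

    ex:BessSchrader ex:hasEmployment ex:EmploymentEvent1, ex:EmploymentEvent2 .

    ex:EmploymentEvent1 ex:employer ex:EnterpriseKnowledge ;
        ex:position ex:Consultant .

    ex:EmploymentEvent2 ex:employer ex:Hogwarts ;
        ex:position ex:Professor .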

Using this technique allows you to clear up confusion and model the complexity of the world. However, adding these intermediary nodes takes away some of the simplicity of graph data — the idea of an “employment event” isn’t exactly intuitive.

There are many other methods that have been developed to handle this kind of complexity in RDF, including singleton properties and named graphs/quads. Additionally, an entirely different type of non-RDF graph model, labeled property graphs, allows users to attach properties directly to relationships. However, labeled property graphs don’t allow for interoperability at the same scale as RDF — it’s much harder to share and combine different data sets, and moving data from tool to tool isn’t as simple.

None of these solutions retain both of the strengths of RDF: the interoperable standards and the intuitive data model. This crucial limitation of RDF has limited its effectiveness in certain applications, particularly those involving temporal or transactional data.

What is RDF*?

RDF* (pronounced RDF-star) is an extension to RDF that proposes a solution to the weaknesses of RDF mentioned above. As an extension, RDF* supplements RDF but doesn’t replace it. 

The main idea behind RDF* is to treat a triple as a single entity. By “nesting” or “embedding” triples, an entire triple can become the subject of a second triple. This allows you to add metadata to triples, assign attributes to a triple, or create relationships not just between two entities in your knowledge graph, but between triples and entities, or triples and triples. Take our example from above. In standard RDF, if I want to express past employers and positions, I need to use reification:

[Figure: Triples with the addition of intermediary nodes “Employment Event 1” and “Employment Event 2” to add the temporality that plain RDF triples do not allow for]

In RDF*, I can use nested triples to simply denote the same information:

[Figure: Visual representation of a nested triple]
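
In the extended Turtle syntax proposed for RDF* (sometimes called Turtle*), these nested triples might be sketched as follows, reusing the illustrative ex: vocabulary from above:

    << ex:BessSchrader ex:isEmployedBy ex:EnterpriseKnowledge >> ex:position ex:Consultant .
    << ex:BessSchrader ex:isEmployedBy ex:Hogwarts >> ex:position ex:Professor .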

This eliminates the need for intermediary entities and makes the model easier to understand and implement. 

Just as standard RDF can be queried via the SPARQL query language, RDF* can be queried using SPARQL*, allowing users to query both standard and nested triples.
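
For instance, a SPARQL* query to retrieve each employer and the position held there might be sketched as follows (same illustrative vocabulary):

    PREFIX ex: <http://example.com/>

    SELECT ?employer ?position
    WHERE {
        << ex:BessSchrader ex:isEmployedBy ?employer >> ex:position ?position .
    }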

Currently, RDF* is under consideration by the W3C and has not yet been officially accepted as a standard. However, the specification has been formally defined in Foundations of an Alternative Approach to Reification in RDF, and many enterprise tools supporting RDF have added support for RDF* (including BlazeGraph, AnzoGraph, Stardog, and GraphDB). Hopefully this standard will be formally adopted by the W3C, allowing it to retain and build on the original strengths of RDF: its intuitive model/simplicity and interoperability.

What are the benefits of RDF*?

As you can see above, RDF* can be used to represent relationships that involve more than two entities (e.g., a person, a role, and an organization) in a more intuitive manner than standard RDF. However, RDF* has additional use cases, including:

  • Adding metadata to a relationship (For example: start dates and end dates for jobs, marriages, events, etc.)

[Figure: Start dates added to each nested triple]

  • Adding provenance information for triples: I have a triple that indicates Bess Schrader works for Enterprise Knowledge. When did I add this triple to my graph? What was the source of this information? Who added the information to the graph?

[Figure: Additional provenance metadata added to a nested triple]
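
Both patterns might be sketched in Turtle* like this (the property names, dates, and source are purely illustrative):

    @prefix ex: <http://example.com/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    << ex:BessSchrader ex:isEmployedBy ex:EnterpriseKnowledge >>
        ex:startDate "2017-01-01"^^xsd:date ;   # metadata on the relationship itself
        ex:source ex:HRSystem ;                 # where the statement came from
        ex:addedOn "2020-07-24"^^xsd:date .     # when it entered the graph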

Conclusion

On its own, RDF provides an excellent way to create, combine, and share semantic information. Extending this framework with RDF* gives knowledge engineers more flexibility to model complex interactions between multiple entities, attach attributes to relationships, and store metadata about triples, helping us more accurately model the real world while improving our ability to understand and verify where data originates.

Looking for more information on RDF* and how you can leverage it to solve your data challenges? Contact Enterprise Knowledge.

Enterprise AI Readiness Assessment
https://enterprise-knowledge.com/enterprise-ai-readiness-assessment/

Understand your organization’s priority areas before committing resources to mature your information and data management solutions. Enterprise Knowledge’s AI Readiness Assessment considers your organization’s business and technical ecosystem, and identifies specific priority and gap areas to help you make targeted investments and gain tangible value from your data and information.

A wide range of organizations have placed AI on their strategic roadmap, with C-levels commonly listing Knowledge AI amongst their biggest priorities. Yet, many are already encountering challenges as a vast majority of AI initiatives are failing to show results, meet expectations, and provide real business value. For these organizations, the setbacks typically originate from the lack of foundation on which to build AI capabilities. Enterprise AI projects too often end up as isolated endeavors, lacking the necessary foundations to support business practices and operations across the organization. So, how can your organization avoid these pitfalls? There are three key questions to ask when developing an Enterprise AI strategy: do you have clear business applications, do you understand the state of your information, and what in-house capabilities do you possess?

Enterprise AI entails leveraging advanced machine learning and cognitive capabilities to discover and deliver organizational knowledge, data, and information in a way that closely aligns with how humans look for and process information.

With our focus and expertise in knowledge, data, and information management, Enterprise Knowledge (EK) developed this proprietary Enterprise Artificial Intelligence (AI) Readiness Assessment in order to enable organizations to understand where they are and where they need to be in order to begin leveraging today’s technologies and AI capabilities for knowledge and data management. 

[Figure: Assess your organization across four factors: enterprise readiness, state of data and content, skill sets and technical capabilities, and change readiness]

Based on our experience conducting strategic assessments as well as designing and implementing Enterprise AI solutions, we have identified four key factors that serve as the most common indicators of and foundations for an organization’s ability to evaluate its current capabilities and understand what it takes to invest in advanced ones.

This assessment leverages over thirty measurements across these four Enterprise AI maturity factors, categorized under the following aspects.

1. Organizational Readiness

Does your organization have the vision, support, and drive to enable successful Enterprise AI initiatives?

The foundational requirement for any organization to undergo an Enterprise AI transformation stems from alignment on vision and the business applications and justifications for launching successful initiatives. The Organizational Readiness factor includes the assessment of appropriate organizational designs, leadership willingness, and mandates that are necessary for success. This factor evaluates topics including:

  • The need for a vision and strategy for AI and its clear application across the organization.
  • Whether AI is a strategic priority with leadership support.
  • Whether the scope of AI is clearly defined with measurable success criteria.
  • Whether there is a sense of urgency to implement AI.

With a clear picture of what your organizational needs are, the Organizational Readiness factor will allow you to determine whether your organization meets the requirements to consider AI-related initiatives, while surfacing potential risks and preparing you to better mitigate failure.

2. The State of Organizational Data and Content

Is your data and content ready to be used for Enterprise AI initiatives?

The volume and dynamism of data and content (structured and/or unstructured) is growing exponentially, and organizations need to be able to securely manage and integrate that information. Enterprise AI requires quality of, and access to, this information. This assessment factor focuses on the extent to which existing structured and unstructured data is in a machine-consumable format and the level to which it supports business operations within the enterprise. This factor considers topics including:

  • The extent to which the organization’s information ecosystems allow for quick access to data from multiple sources.
  • The scope of organizational content that is structured and in a machine-readable format.
  • The state of standardized organization of content/data, such as business taxonomies and metadata schemes, and whether they are accurately applied to content.
  • The existence of metadata for unstructured content. 
  • Access considerations including compliance or technical barriers.

AI needs to learn the human way of thinking and how an organization operates in order to provide the right solutions. Understanding the full state of your current data and content will enable you to focus on the right content/data with the highest business impact and help you develop a strategy to get your data into an organized and accessible format. Without high-quality, well-organized, and tagged data, AI applications will not deliver high-value results for your organization.

3. Skill Sets and Technical Capabilities

Does your organization have the technical infrastructure and resources in place to support AI?

With the increased focus on AI, demand has increased both for individuals who have the technical skills to engineer advanced machine learning and intelligent solutions and for business knowledge experts who can transform data into a paradigm that aligns with how users and customers communicate knowledge. Further, over the years, cloud computing capabilities, web standards, open source training models, and linked open data for a number of industries have emerged to help organizations craft customized Enterprise AI solutions for their business. This means an organization that is looking to start leveraging AI for its business no longer has to start from scratch. This assessment factor evaluates the organization’s existing capabilities to design, manage, operate, and maintain an Enterprise AI solution. Some of the factors we consider include:

  • The state of existing enterprise ontology solutions and enterprise knowledge graph capabilities that optimize information aggregation and governance. 
  • The existence of auto-classification and automation tools within the organization.
  • Whether roles and skill sets for advanced data modeling or knowledge engineering are present within the organization.
  • The availability and capacity to commit business and technical SMEs for AI efforts.

Understanding the current gaps and weaknesses in existing capabilities and defining your targets are crucial elements to developing a practical AI Roadmap. This factor also plays a foundational role in giving your organization the key considerations to ensure AI efforts kick off on the right track, such as leveraging web standards that enable interoperability, and starting with available existing/open-source semantic models and ecosystems to avoid short-term delays while establishing long-term governance and strategy. 

4. Change Threshold 

Is your organization prepared to support the operational and strategic changes that will result from AI initiatives?

The success of Enterprise AI relies heavily on the adoption of new technologies and ways of doing business. Organizations that fail to succeed with AI often struggle to understand the full scope of the change that AI will bring to their business and organizational norms. This usually manifests itself in the form of fear (either of change in job roles or of creating wrong or unethical AI results that expose the organization to higher risks). Most organizations also struggle with the understanding that AI requires a few iterations to get it “right.” As such, this assessment factor focuses on the organization’s appetite, willingness, and threshold for understanding and tackling the cultural, technical, and business challenges required to achieve the full benefits of AI. This factor evaluates topics including:

  • Business and IT interest and desire for AI.
  • Existence of resource planning for the individuals whose roles will be impacted. 
  • Education and clear communication to facilitate adoption. 

The success of any technical solution is highly dependent on the human and cultural factors in an organization, and each organization has a threshold for dealing with change. Understanding and planning for this factor will enable your organization to integrate change management that addresses negative implications, avoids unnecessary resistance or weak AI results, and provides proper navigation through issues that arise.

How it Works

This Enterprise AI readiness assessment and benchmarking leverages the four factors above, which comprise over 30 different points upon which each organization can be evaluated and scored. We apply this proprietary maturity model to help assess your Enterprise AI readiness and clearly define success criteria for your target AI initiatives. Our steps include:

  • Knowledge Gathering and Current State Assessment: We leverage a hybrid model that includes interviews and focus groups, supported by content/data and technology analysis, to understand where you are and where you need to be. This gives us a complete understanding of your current strengths and weaknesses across the four factors, allowing us to provide the right recommendations and guidance to drive success, business value, and long-term adoption.
  • Strategy Development and Roadmapping: Building on the established focus on the assessment factors, we work with you to develop a strategy and roadmap that outlines the necessary work streams and activities needed to achieve your AI goals. It combines our understanding of your organization with proven best practices and methodologies into an iterative work plan that ensures you can achieve the target state while quickly and consistently showing interim business value.
  • Business Case Development and Alignment Support: We further compile our assessment of potential project ROI based on increased revenues, cost avoidance, and risk and compliance management. We then balance those against the perceived business needs and wants by determining the areas that would have the biggest business impact with the lowest costs, and focus our discussions and explorations on the areas with the greatest need and highest interest.

Keys to Our Assessment  

Over the past several years, we have worked with diverse organizations to enable them to strategize, design, pilot, and implement scaled Enterprise AI solutions. What makes our assessment unique is that it was developed based on years of real-world experience supporting organizations in their knowledge and data management. As such, our assessment offers the following key differentiators and values for the enterprise:

  • Recognition of Unique Organizational Factors: This assessment recognizes that no Enterprise AI initiative is exactly the same. It is designed in such a way that it recognizes the unique aspects of every organization, including priorities and challenges to then help develop a tailored strategy to address those unique needs.
  • Emphasis on Business Outcomes: Successful AI efforts result in tangible business applications and outcomes. Every assessment factor is tied to specific business outcomes with corresponding steps on how the organization can use it to better achieve practical business impact.
  • A Tangible Communication and Education Tool: Because this assessment provides measurable scores and over 30 tangible criteria for assessment and success factors, it serves as an effective tool for communicating up to leadership and quickly garnering buy-in, helping organizations understand the cost and the tangible value of AI efforts.

Results

As a result of this effort, you will have a complete view of your AI readiness, gaps, and required ecosystem, with an accompanying understanding of the potential business value that could be realized once the target state is achieved. Taken as a whole, the assessment allows an organization to:

  • Understand strengths and weaknesses, and overall readiness to move forward with Enterprise AI compared to other organizations and the industry as a whole;
  • Judge where foundational gaps may exist in the organization in order to improve Enterprise AI readiness and likelihood of success; and
  • Identify and prioritize next steps in order to make immediate progress based on the organization’s current state and defined goals for AI and Machine Learning.

Taking the first step toward gaining this invaluable insight is easy:

1. Take 10-15 minutes to complete your Enterprise AI Maturity Assessment by answering a set of questions pertaining to the four factors; and
2. Submit your completed assessment survey and provide your email address to download a formal PDF report with your customized results.

Natural Language Processing and Taxonomy Design
https://enterprise-knowledge.com/natural-language-processing-and-taxonomy-design/

Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that processes and analyzes human language found in text. Some of the exciting capabilities that NLP offers include parsing out the significant entities in content through statistical analysis and identifying contextual relationships between different words. Taxonomy also aims to provide a hierarchical context for concepts and the words used to describe them. Taxonomists care about how people use language to categorize and identify concepts, in an effort to make information usable for both people and technology.

At EK, depending on the scope of a project, we incorporate NLP into the taxonomy design process in order to deliver highly detailed, relevant, and AI-ready designs that are backed by statistical processes generated from unstructured text data. One of EK’s key differentiators is our hybrid approach to designing business taxonomies. Our top-down approach leverages subject matter experts and business stakeholders to ensure a taxonomy is accurate and relevant for a domain or organization, while our bottom-up approach analyzes existing content and systems to ensure a taxonomy design is useful for the people and systems that will be using the taxonomy. Essentially, NLP in taxonomy design is a type of bottom-up process in which Named Entity Recognition (NER) collects the lowest level terms found in the content. The taxonomist can then identify broader categories for these terms. This is complemented by top-down analysis when engaging SMEs to help name and fine tune the categories, thus fulfilling EK’s hybrid methodology for taxonomy design. 

However, NLP is far from automating the human judgment that is required in taxonomy design: data scientists and taxonomists (as well as subject matter experts) need to work together to determine why and how the data generated by NLP will be incorporated into a taxonomy. Here, I outline the ways in which NLP can enhance taxonomy design, and why taxonomists should consider teaming up with data scientists.

Named Entity Recognition and Taxonomy Development

[Figure: Python is a programming language with a number of libraries, which are open source collections of functions that allow you to perform actions such as NLP and NER]

Named Entity Recognition (NER) is a branch of NLP that identifies entities in text. NER can be implemented using powerful Python libraries like spaCy, an NLP library that can be used to train an initial NER model from annotations of sample content. For specific industries, NER models will have to be trained to find entities in different domains. For example, a general scientific model can be used for a medical domain, though the model will have to be further trained to identify entities such as medications, conditions, and medical procedures.

[Figure: Word embeddings are encoded representations of words that have a similar or same meaning according to the way they are used in text]

An NER pipeline can run on a volume of content in order to identify and extract the entities found in the content. Once the NER extracts terms, a data scientist can use semantic word embeddings to cluster the entities in an unsupervised learning process; this means the algorithm makes inferences about a data set without human input or labeling. This results in clusters of terms that have a statistical relationship to each other, derived from the way the terms are used in the language of the content.
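
As a rough sketch of what this extract-and-cluster step could look like in Python (assuming spaCy with a vector-enabled model such as en_core_web_md, plus scikit-learn; the sample sentences and cluster count are illustrative, and a general model would need further domain training to reliably tag medical entities):

    from collections import defaultdict

    import spacy
    from sklearn.cluster import KMeans

    nlp = spacy.load("en_core_web_md")  # general-purpose model with word vectors

    documents = [
        "The patient was given morphine, an opioid analgesic.",
        "Naloxone is an opioid antagonist used to reverse an overdose.",
        "Atopic dermatitis often presents as an itchy rash with papules.",
        "Impetigo and urticaria are common cutaneous conditions.",
    ]

    # 1. Run the NER pipeline and collect an embedding for each unique entity
    entity_vectors = {}
    for doc in nlp.pipe(documents):
        for ent in doc.ents:
            entity_vectors[ent.text] = ent.vector

    # 2. Cluster the entity embeddings (unsupervised)
    terms = list(entity_vectors)
    kmeans = KMeans(n_clusters=2, random_state=0).fit(list(entity_vectors.values()))

    # 3. Group the terms by cluster for a taxonomist to review and name
    clusters = defaultdict(list)
    for term, cluster_id in zip(terms, kmeans.labels_):
        clusters[cluster_id].append(term)

    for cluster_id, cluster_terms in clusters.items():
        print(cluster_id, cluster_terms)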

Usually, taxonomists can derive a theme from these clusters by reviewing the types of terms that are in each cluster. Can you see a theme in the two clusters below?

Two clusters:

  • Cluster 1: morphine, opioid, opioids, cocaine, benzodiazepine, benzodiazepines, overdose, opiate, antagonist, methamphetamine, analgesic, methadone, stimulant, Benzodiazepines, buprenorphine, flumazenil, heroin, methamphetamines, naloxone, Naloxone, opioid-induced, sedative, narcotic, self-administering, self-administration
  • Cluster 2: rash, erythema, acrocyanosis, pedis, itchy, conjunctivitis, blistering, eczema, impetigo, urticaria, herpetiformis, Tinea cruris, atopic dermatitis, Erythema toxicum neonatorum, hyperpigmentation, papules, photosensitivity, Tinea corporis, cutaneous, pruritic

Since word embeddings are statistical estimations of a word’s usage in a given language, the clusters generated using word embeddings aren’t always perfect from a human perspective. They do make it easier, however, to identify similar types of words and phrases in a body of content. The clusters and word embeddings won’t be able to tell you exactly what that relationship is between the terms, but in our clusters above, a trained taxonomist can deduce that the first cluster has to do with medications, specifically opioids (and other words that are closely related to opioids, such as overdose and antagonists). The second cluster generally has to do with skin conditions.

Once you have identified the various cluster themes (this particular process resulted in several hundred clusters), you can group those themes into another level of broader categories and continue to go up the ladder of the taxonomy into Level 3, Level 2, or Level 1 concepts. For instance, if we continue with the medication example, we may have another cluster of specific antibiotic drugs, as well as antihypertensives. We now know that we need a broader Medication/Chemicals/Drugs category in order to group these themes (opioids, antibiotics, antihypertensives) together. And voila! We have a taxonomy created with the assistance of Natural Language Processing.

[Figure: An example taxonomy of medications, with parent categories such as “opioids” and “antibiotics” and corresponding medications that fall within each category, such as “morphine” and “amoxicillin” respectively]

Relevancy and Taxonomy Use Cases

Not all clusters will be relevant to a taxonomy. Sometimes the themes of a cluster will be a certain part of speech, such as adjectives or verbs that seem meaningless on their own; these usually have to be paired with other entities to create a phrase that then becomes meaningful to the domain. These entities most likely exist in other clusters, so it will be helpful to have a tool to look up these phrases in the content to see how they are paired with other entities.

Even though the NER process has found a statistical relationship to form these clusters, this doesn’t mean that we need to incorporate those clusters into our taxonomy. This is when good old human judgment and defined taxonomy use cases will help you decide what is needed from the NER results. A use case is the specific business need that a taxonomy is intended to fulfill. Use cases should always be the signal guiding your way through any taxonomy and ontology effort.

To understand your use cases, ask yourself two questions: What problem is the taxonomy/ontology trying to solve? And who will be using it?

Taxonomy and NLP Iteration

Just like taxonomy design, an NLP process should be iterative. Think of the entire process as a feedback loop with your taxonomy. A data scientist can use the fledgling taxonomy to make the NER models more accurate by manually annotating content with the new labels, which improves the quality of the clusters returned. A more accurate, repeatedly trained model will be able to find more precise and narrow concepts. For instance, certain medical conditions and medications would have to be annotated before they could be recognized as part of a conditions or medications model.

Once this has been done, you can train the model on the annotations as many times as needed to return an increasingly accurate set of terms relevant to the model. Depending on the results, this may necessitate restructuring the taxonomy; perhaps another grouping or subgrouping of medical conditions is discovered that wasn’t previously included in the initial NER analysis, or it becomes clear your taxonomy needs an ontology.
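As a rough illustration of this annotate-and-retrain loop, here is a minimal spaCy sketch. The label, example sentences, and character offsets are illustrative assumptions, not a production training recipe.

```python
# A minimal sketch of retraining an NER model on manual annotations;
# the label, texts, and offsets are illustrative assumptions.
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("MEDICATION")

# Manually annotated content: (text, {"entities": [(start, end, label)]})
TRAIN_DATA = [
    ("Patients received morphine for pain.",
     {"entities": [(18, 26, "MEDICATION")]}),
    ("Naloxone reverses opioid overdose.",
     {"entities": [(0, 8, "MEDICATION")]}),
]

optimizer = nlp.initialize()
for _ in range(20):  # repeat as many passes as the results warrant
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

# The retrained model can now be rerun over the corpus to surface
# more precise candidate terms (and, in turn, better clusters).
```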

Leverage a Gold Standard

It’s highly recommended that you create a “gold standard” (with the critical input of SMEs and business stakeholders) for the most significant types of semantic relationships needed to achieve your goals. In creating a gold standard, SMEs and other stakeholders identify the logic and patterns between concepts that best support your use cases, and then focus only on those specific patterns, at least in the first iteration.

If your use case is a recommendation engine, for example, you need to prioritize the relationships between concepts that help facilitate the appropriate recommendations. In our healthcare example, we may find that the best recommendations are facilitated by ontological relationships; perhaps we need an ontology to describe the relationship between bacterial infections and antibiotics, or the relationship between symptoms and diagnosable conditions.

An example ontology visual, showing relationships such as "symptom is a signOf medical condition" and "medication treats medical condition," where "medication," "medical condition," and "symptom" are classes within the ontology and "treats" and "signOf" are relationships
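One lightweight way to operationalize a gold standard like this is as an explicit list of approved relationship patterns, used to filter the candidate relations an NLP pipeline proposes. The sketch below is a hedged illustration; all of the pattern and tuple names are hypothetical.

```python
# A hedged sketch of using a gold standard as a scope filter: only
# candidate relations matching an SME-approved (class, relation, class)
# pattern are kept for the ontology. All names are hypothetical.
GOLD_STANDARD_PATTERNS = {
    ("Medication", "treats", "MedicalCondition"),
    ("Symptom", "signOf", "MedicalCondition"),
}

# Candidate relations proposed by an extraction pipeline:
# (subject class, relation, object class, subject term, object term)
candidates = [
    ("Medication", "treats", "MedicalCondition", "amoxicillin", "strep throat"),
    ("Symptom", "signOf", "MedicalCondition", "rash", "eczema"),
    ("Medication", "mentionedWith", "Symptom", "morphine", "itching"),  # out of scope
]

in_scope = [c for c in candidates if c[:3] in GOLD_STANDARD_PATTERNS]
for relation in in_scope:
    print(relation)
```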

If your use case is for search and findability, you could utilize user research methods such as card sorting to gain a better understanding of how users will relate the concepts to one another. This may also provide guidance on how to build an initial taxonomy with the term clusters, by allowing users to sort the terms into predefined or user-created categories. From there, an analysis of the general relationship patterns can be used as a gold standard to prioritize how the NLP data will be used.

The purpose of a gold standard is to prioritize and set a strict scope for how NLP will assist taxonomy design. The NLP process of entity extraction, clustering, labeling, annotating, and retraining is intensive and generates a lot of data, and it can be difficult and overwhelming to decide how much of it, and which parts, should be incorporated into the taxonomy. A gold standard, essentially a more detailed application of your use cases, makes it much easier to decide what is a priority and what is outside the scope of your use cases.

Conclusion

NLP is a promising field with many inherent benefits for taxonomy and ontology design. Teams of data scientists, taxonomists, and subject matter experts who utilize NLP processes, alongside a gold standard and prioritized use cases, are well positioned to create data models for advanced capabilities. The result of this process is a highly detailed and customized solution derived from an organization’s existing content and data.

If your taxonomy or ontology effort frequently misaligns with the actual content or domain you are working in, or if you have too much unstructured data and content to meaningfully derive a taxonomy that accurately models your information, an NLP-assisted taxonomy design process provides a way forward. Not only will it help your organization gain a complete sense of its information and data, it will also glean valuable insights about the unseen connections in that data, and it will prepare your organization for robust enterprise data governance and advanced artificial intelligence capabilities, including solutions such as recommendation engines and automated classification.

Interested in seeing how NLP can assist your taxonomy and ontology design? Contact Enterprise Knowledge to learn more.

The post Natural Language Processing and Taxonomy Design appeared first on Enterprise Knowledge.

What’s the Difference Between an Ontology and a Knowledge Graph? https://enterprise-knowledge.com/whats-the-difference-between-an-ontology-and-a-knowledge-graph/ Wed, 15 Jan 2020 14:00:38 +0000 https://enterprise-knowledge.com/?p=10301

As semantic applications become increasingly hot topics in the industry, clients often come to EK asking about ontologies and knowledge graphs. Specifically, they want to know the differences between the two. Are ontologies and knowledge graphs the same thing? If not, how are they different? What is the relationship between the two?

In this blog, I’ll walk you through both ontologies and knowledge graphs, describing how they’re different and how they work together to organize large amounts of data and information. 

What is an ontology?

Ontologies are semantic data models that define the types of things that exist in our domain and the properties that can be used to describe them. Ontologies are generalized data models, meaning that they only model general types of things that share certain properties, but don’t include information about specific individuals in our domain. For example, instead of describing your dog, Spot, and all of his individual characteristics, an ontology should focus on the general concept of dogs, trying to capture characteristics that most/many dogs might have. Doing this allows us to reuse the ontology to describe additional dogs in the future.

There are three main components to an ontology, which are usually described as follows:

  • Classes: the distinct types of things that exist in our data.
  • Relationships: properties that connect two classes.
  • Attributes: properties that describe an individual class. 

For example, imagine we have a set of tables with information on books, authors, and publishers, including details such as publication dates and locations.

First we want to identify our classes (the unique types of things that are in the data). This sample data captures information about books, so that’s a good candidate for a class. Specifically, it captures certain types of things about books, such as their authors and publishers. Digging a little deeper, we can see the data also captures information about those publishers and authors, such as their locations. This leaves us with four classes for this example:

  • Books
  • Authors
  • Publishers
  • Locations

Next, we need to identify relationships and attributes (for simplicity, we can consider both relationships and attributes as properties). Using the classes that we identified above, we can look at the data and start to list all of the properties we see for each class. For example, looking at the book class, some properties might be:

  • Books have authors
  • Books have publishers
  • Books are published on a date
  • Books are followed by sequels (other books)

Some of these properties are relationships that connect two of our classes. For example, the property “books have authors” is a relationship that connects our book class and our author class. Other properties, such as “books are published on a date,” are attributes, describing only one class, instead of connecting two classes together. 

It’s important to note that these properties might apply to any given book, but they don’t necessarily have to apply to every book. For example, many books don’t have sequels. That’s fine in our ontology, because we just want to make sure we capture possible properties that could apply to many, but not necessarily all, books. 

While the above list of properties is easy to read, it can be helpful to rewrite these properties to more clearly identify our classes and properties. For example, “books have authors” can be written as:

Book → has author → Author 

Although there are many more properties you could include depending on your use case, for this blog I’ve identified the following properties:

  • Book → has author → Author
  • Book → has publisher→ Publisher
  • Book → published on → Publication date
  • Book → is followed by → Book
  • Author → works with → Publisher
  • Publisher → located in → Location 
  • Location → located in → Location

Remember that our ontology is a general data model, meaning that we don’t want to include information about specific books in our ontology. Instead, we want to create a reusable framework we could use to describe additional books in the future.

When we combine our classes and relationships, we can view our ontology in a graph format:

A graph representation of our ontology model for our book data. Includes classes and their properties, such as "Book" published in year "Year," and "Book" has author "Author."
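As a rough sketch of what this model looks like in practice, the classes and properties above could be expressed in RDF with rdflib. The example.org namespace is hypothetical, and the property names simply mirror the list in the text; this is an illustration, not the only way to formalize the model.

```python
# A rough sketch of the book ontology in RDF using rdflib; the
# namespace is hypothetical and names mirror the list above.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/ontology/")
g = Graph()
g.bind("ex", EX)

# Classes: the distinct types of things that exist in our data.
for cls in ("Book", "Author", "Publisher", "Location"):
    g.add((EX[cls], RDF.type, OWL.Class))

# Relationships: properties that connect two classes (domain -> range).
# (locatedIn is also reused between Locations; the extra axioms are
# elided here to keep the sketch simple.)
relationships = [
    ("hasAuthor", "Book", "Author"),
    ("hasPublisher", "Book", "Publisher"),
    ("isFollowedBy", "Book", "Book"),
    ("worksWith", "Author", "Publisher"),
    ("locatedIn", "Publisher", "Location"),
]
for prop, domain, rng in relationships:
    g.add((EX[prop], RDF.type, OWL.ObjectProperty))
    g.add((EX[prop], RDFS.domain, EX[domain]))
    g.add((EX[prop], RDFS.range, EX[rng]))

# Attribute: a property that describes a single class.
g.add((EX.publishedOn, RDF.type, OWL.DatatypeProperty))
g.add((EX.publishedOn, RDFS.domain, EX.Book))
```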

What is a knowledge graph?

Using our ontology as a framework, we can add in real data about individual books, authors, publishers, and locations to create a knowledge graph. With the information in our tables above, as well as our ontology, we can create specific instances of each of our ontological relationships. For example, if we have the relationship Book → has author → Author in our ontology, an individual instance of this relationship looks like:

A graph representation of a piece of our knowledge graph. Specifically, a representation of an individual instance of the "Book has Author" relationship, with the example, "To Kill a Mockingbird" has author "Harper Lee."

If we add in all of the individual information that we have about one of our books, To Kill a Mockingbird, we can start to see the beginnings of our knowledge graph:

A graph representation of our knowledge graph when we apply our ontology to a subset of our data. Specifically, when we apply our ontology to all the information we know about one book, "To Kill a Mockingbird."

If we do this with all of our data, we will eventually wind up with a graph that encodes our data using our ontology. With this knowledge graph, we can view our data as a web of relationships, instead of as separate tables, drawing new connections between data points that we would otherwise be unable to understand. In particular, we can query the data using SPARQL, and with inferencing, the knowledge graph can make connections for us that weren’t previously defined.

A graph representation of our knowledge graph when we input all of our data about "books" into our ontology.
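Continuing the rdflib sketch from above, the snippet below adds a few individuals to the graph and runs a simple SPARQL query over them. The book and author come from the running example; the publisher identifier and the query itself are illustrative assumptions.

```python
# Adding individual data to the ontology yields a knowledge graph we
# can query with SPARQL; identifiers here are illustrative.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/ontology/")
g = Graph()

book = EX["ToKillAMockingbird"]
g.add((book, RDF.type, EX.Book))
g.add((book, EX.hasAuthor, EX["HarperLee"]))
g.add((book, EX.hasPublisher, EX["JBLippincott"]))

# Which authors are connected to which publishers through a book?
query = """
PREFIX ex: <http://example.org/ontology/>
SELECT ?author ?publisher WHERE {
    ?book ex:hasAuthor ?author ;
          ex:hasPublisher ?publisher .
}
"""
for row in g.query(query):
    print(row.author, row.publisher)
```

With an inferencing-capable store and the domain/range axioms sketched earlier, the graph could also derive facts that were never asserted directly, such as classifying any resource that has an author as a Book.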

So how are ontologies and knowledge graphs different?

As you can see from the example above, a knowledge graph is created when you apply an ontology (our data model) to a set of individual data points (our book, author, and publisher data). In other words:

ontology + data = knowledge graph

Ready to get started? Check out our ontology design and knowledge graph design best practices, and contact us if you need help beginning your journey with advanced semantic data models.

The post What’s the Difference Between an Ontology and a Knowledge Graph? appeared first on Enterprise Knowledge.

Lulit Tesfaye & Yanko Ivanov Speaking at Graphorum 2019 https://enterprise-knowledge.com/lulit-tesfaye-yanko-ivanov-speaking-at-graphorum-2019/ Thu, 10 Oct 2019 17:41:05 +0000 https://enterprise-knowledge.com/?p=9685

Enterprise Knowledge’s Lulit Tesfaye, Practice Lead for Data and Information Management, and Yanko Ivanov, Solutions Architect and Partnership Manager, are presenting at this year’s Graphorum 2019 to be held from October 14 – 17 in Chicago, Illinois. The conference provides an educational platform and brings together emerging disciplines around intelligent information gathering and analysis, including graph technologies, knowledge graphs, data modeling, ontologies, graph analytics, graph databases, and AI.

Tesfaye and Ivanov will be speaking on the topic of Knowledge Graphs as a Pillar to IA on October 16th under the Knowledge Graph track. They will share best practices, real-world use cases, and case studies regarding innovative and scalable graph-based approaches and solutions that serve as a foundation for advanced AI capabilities, such as Machine Learning (ML), Natural Language Processing (NLP), Predictive Analytics, and the like.

For more information, visit the event website at: https://graphorum2019.dataversity.net/index.cfm

About Enterprise Knowledge 

Enterprise Knowledge (EK) is a services firm that integrates Knowledge Management, Data and Information Management, Information Technology, and Agile Approaches to deliver comprehensive solutions. Our mission is to form true partnerships with our clients, listening and collaborating to create tailored, practical, and results-oriented solutions that enable them to thrive and adapt to changing needs.

Our core services include:

  • Knowledge Graphs, AI, and Semantic Technologies design, strategy, and implementation;
  • Taxonomy and Ontology Design;
  • Knowledge & Information Management Strategy and Implementation;
  • Change Management and Communication; and
  • Agile Transformation and Facilitation.

At the heart of these services, we always focus on working alongside our clients to understand their needs, ensuring we can provide practical and achievable solutions on an iterative, ongoing basis.

About Graphorum

Graph technology has been steadily growing over the years, but recently hit a critical mass. Knowledge graphs, graph analytics, graph databases, graphs and AI are bringing new innovation and new practical applications to the marketplace. Graphorum is designed to accommodate all levels of technical understanding. It will bring together emerging disciplines that are focused on more intelligent information gathering and analysis.

The post Lulit Tesfaye & Yanko Ivanov Speaking at Graphorum 2019 appeared first on Enterprise Knowledge.
