natural language processing Articles - Enterprise Knowledge
https://enterprise-knowledge.com/tag/natural-language-processing/

Enhancing Taxonomy Management Through Knowledge Intelligence
https://enterprise-knowledge.com/enhancing-taxonomy-management-through-knowledge-intelligence/
Wed, 30 Apr 2025

In today’s data-driven world, managing taxonomies has become increasingly complex, requiring a balance between precision and usability. The Knowledge Intelligence (KI) framework – a strategic integration of human expertise, AI capabilities, and organizational knowledge assets – offers a transformative approach to taxonomy management. This blog explores how KI can revolutionize taxonomy management while maintaining strict compliance standards.

The Evolution of Taxonomy Management

Traditional taxonomy management has long relied on Subject Matter Experts (SMEs) manually curating terms, relationships, and hierarchies. While this time-consuming approach ensures accuracy, it struggles to scale. Modern organizations generate millions of documents across multiple languages and domains, and manual curation simply cannot keep pace with the variety and velocity of organizational data while maintaining the necessary precision. Even with well-defined taxonomies, organizations must continuously analyze massive amounts of content to verify that their taxonomic structures accurately reflect and capture the concepts present in their rapidly growing data repositories.

In a scenario like this, traditional AI tools might help classify new documents, but an expert-guided recommender brings intelligence to the process.

KI-Driven Taxonomy Management

KI represents a fundamental shift from traditional AI systems, moving beyond data processing to true knowledge understanding and manipulation. As Zach Wahl explains in his blog, From Artificial Intelligence to Knowledge Intelligence, KI enhances AI’s capabilities by making systems contextually aware of an organization’s entire information ecosystem and creating dynamic knowledge systems that continuously evolve through intelligent automation and semantic understanding.

At its core, KI-driven taxonomy management works through a continuous cycle of enrichment, validation, and refinement. This approach integrates domain expertise at every stage of the process:

1. During enrichment, SMEs guide AI-powered discovery of new terms and relationships.

2. In validation, domain specialists ensure accuracy and compliance of all taxonomy modifications.

3. Through refinement, experts interpret usage patterns to continuously improve taxonomic structures.

By systematically injecting domain expertise into each stage, organizations transform static taxonomies into adaptive knowledge frameworks that continue to evolve with user needs while maintaining accuracy and compliance. This expert-guided approach ensures that AI augments rather than replaces human judgment in taxonomy development.

[Image: taxonomy management system using Knowledge Intelligence]

Enrichment: Augmenting Taxonomies with Domain Intelligence

When augmenting the taxonomy creation process with AI, SMEs begin by defining core concepts and relationships, which then serve as seeds for AI-assisted expansion. Using these expert-validated foundations, systems employ Natural Language Processing (NLP) and Generative AI to analyze organizational content and extract relevant phrases that relate to existing taxonomy terms. 

Topic modeling, a set of algorithms that discover abstract themes within collections of documents, further enhances this enrichment process. Topic modeling techniques like BERTopic, which uses transformer-based language models to create coherent topic clusters, can identify concept hierarchies within organizational content. The experts evaluate these AI-generated suggestions based on their specialized knowledge, ensuring that automated discoveries align with industry standards and organizational needs. This human-AI collaboration creates taxonomies that are both technically sound and practically useful, balancing precision with accessibility across diverse user groups.
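As a rough sketch of this enrichment step, the example below surfaces frequent document terms that are missing from an existing taxonomy as candidates for SME review. A production pipeline would use transformer-based topic modeling such as BERTopic; the term-frequency approach, the sample documents, and the `suggest_terms` helper here are simplified illustrations, not the actual implementation.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "for", "in", "on", "with"}

def tokenize(text):
    return [t for t in text.lower().split() if t not in STOPWORDS]

def suggest_terms(documents, taxonomy_terms, min_count=2):
    """Surface terms that appear in several documents but not in the
    taxonomy; candidates are suggestions for expert review, not
    automatic additions."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count documents, not tokens
    known = {t.lower() for t in taxonomy_terms}
    candidates = [(n, term) for term, n in doc_freq.items()
                  if n >= min_count and term not in known]
    return [term for n, term in sorted(candidates, key=lambda c: (-c[0], c[1]))]

docs = [
    "fall protection harness required for scaffolding work",
    "scaffolding inspection and fall protection training",
    "ladder safety and fall protection on site",
]
print(suggest_terms(docs, ["ladder", "harness"]))
# ['fall', 'protection', 'scaffolding']
```

In practice the expert would accept, reject, or rename each candidate before it enters the taxonomy, which is the human-AI collaboration described above.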

Validation: Maintaining Compliance Through Structured Governance

What sets the KI framework apart is its unique ability to maintain strict compliance while enabling taxonomy evolution. Every suggested change, whether generated through user behavior or content analysis, goes through a structured governance process that includes:

  • Automated compliance checking against established rules;
  • Human expert validation for critical decisions;
  • Documentation of change justifications; and
  • Version control with complete audit trails.
[Image: structured taxonomy governance process]
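The governance steps above can be sketched as a simple validate-then-log pipeline. Everything here, including the rule names, the `TaxonomyChange` structure, and the in-memory audit trail, is an illustrative assumption rather than a prescribed implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TaxonomyChange:
    term: str
    action: str          # e.g. "add", "relabel", "deprecate"
    justification: str
    approved_by: str = ""

audit_trail = []  # a real system would use persistent, versioned storage

def check_compliance(change, banned_terms):
    """Automated checks against established rules; any failure is routed
    to a human expert instead of being applied silently."""
    errors = []
    if change.term.lower() in banned_terms:
        errors.append("term violates naming rules")
    if not change.justification:
        errors.append("missing change justification")
    return errors

def apply_change(change, banned_terms):
    errors = check_compliance(change, banned_terms)
    status = "approved" if not errors and change.approved_by else "needs review"
    audit_trail.append({
        "term": change.term,
        "action": change.action,
        "status": status,
        "errors": errors,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return status

print(apply_change(
    TaxonomyChange("operational risk", "add", "seen in 40+ documents", "J. Smith"),
    banned_terms={"misc"}))
# approved
```

Note that nothing is ever dropped silently: every change, approved or not, lands in the audit trail with its justification and errors.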

Organizations implementing KI-driven taxonomy management see transformative results, including improved search success rates and reduced time required for taxonomy updates. More importantly, taxonomies become living knowledge frameworks that continuously adapt to organizational needs while maintaining compliance standards.

Refinement: Learning From Usage to Improve Taxonomies

By systematically analyzing how users interact with taxonomies in real-world scenarios, organizations gain invaluable insights into potential improvements. This intelligent system extends beyond simple keyword matching—it identifies emerging patterns, uncovers semantic relationships, and bridges gaps between formal terminology and practical usage. This data-driven refinement process:

  • Analyzes search patterns to identify semantic relationships;
  • Generates compliant alternative labels that match user behavior;
  • Routes suggestions through appropriate governance workflows; and
  • Maintains an audit trail of changes and justifications.
[Image: example of KI for risk analysts]

The refinement process analyzes the conceptual relationship between terms, evaluates usage contexts, and generates suggestions for terminological improvements. These suggestions—whether alternative labels, relationship modifications, or new term additions—are then routed through governance workflows where domain experts validate their accuracy and compliance alignment. Throughout this process, the system maintains a comprehensive audit trail documenting not only what changes were made but why they were necessary and who approved them. 
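One simplified way to implement this refinement loop is to compare logged search terms against existing concept labels and queue near-matches as candidate alternative labels. The taxonomy snippet, the threshold, and the `difflib`-based string similarity below are illustrative stand-ins for the semantic matching a production system would use.

```python
from difflib import SequenceMatcher

taxonomy = ["Personal Protective Equipment", "Fall Protection"]

def suggest_alt_labels(search_queries, concepts, threshold=0.6):
    """Propose user search terms as candidate alternative labels for
    similar concepts; each suggestion still goes through expert
    validation before entering the taxonomy."""
    suggestions = []
    for query in search_queries:
        for concept in concepts:
            score = SequenceMatcher(None, query.lower(), concept.lower()).ratio()
            if threshold <= score < 1.0:  # similar, but not an exact match
                suggestions.append((concept, query, round(score, 2)))
    return suggestions

for concept, label, score in suggest_alt_labels(["fall protect gear"], taxonomy):
    print(f"propose altLabel '{label}' for '{concept}' (score {score})")
```

Exact matches are skipped because they need no new label; everything else is a suggestion routed into the governance workflow, never an automatic change.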

[Image: KI-driven taxonomy evolution]

Case Study: KI in Action at a Global Investment Bank

To show the practical application of the continuous, knowledge-enhanced taxonomy management cycle, the following section describes a real-world implementation at a global investment bank.

Challenge

The bank needed to standardize risk descriptions across multiple business units, creating a consistent taxonomy that would support both regulatory compliance and effective risk management. With thousands of risk descriptions in various formats and terminology, manual standardization would have been time-consuming and inconsistent.

Solution

Phase 1: Taxonomy Enrichment

The team began by applying advanced NLP and topic modeling techniques to analyze existing risk descriptions. Risk descriptions were first standardized through careful text processing. Using the BERTopic framework and sentence transformers, the system generated vector embeddings of risk descriptions, allowing for semantic comparison rather than simple keyword matching. This AI-assisted analysis identified clusters of semantically similar risks, providing a foundation for standardization while preserving the important nuances of different risk types. Domain experts guided this process by defining the rules for risk extraction and validating the clustering approach, ensuring that the technical implementation remained aligned with risk management best practices.
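A heavily simplified sketch of this clustering phase is shown below. The real implementation used sentence-transformer embeddings and the BERTopic framework; here, bag-of-words vectors and a greedy cosine-similarity pass stand in for those components, and the risk descriptions and threshold are invented for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words stand-in for a sentence-transformer embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb)

def cluster_risks(descriptions, threshold=0.5):
    """Greedy clustering: each description joins the first cluster whose
    seed it resembles closely enough, otherwise it starts a new cluster."""
    clusters = []
    for desc in descriptions:
        vec = vectorize(desc)
        for c in clusters:
            if cosine(vec, c["seed"]) >= threshold:
                c["members"].append(desc)
                break
        else:
            clusters.append({"seed": vec, "members": [desc]})
    return [c["members"] for c in clusters]

risks = [
    "unauthorized trading activity on the equities desk",
    "unauthorized trading activity on the derivatives desk",
    "data breach exposing confidential client records",
]
print(cluster_risks(risks))
```

The payoff of semantic comparison over keyword matching is that near-duplicate risk descriptions group together while genuinely different risks stay apart, giving SMEs coherent clusters to validate.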

Phase 2: Expert Validation

SMEs then reviewed the AI-generated standardized risks, validating the accuracy of clusters and relationships. The system's transparency was critical: experts could see exactly how risks were being grouped. This human-in-the-loop approach ensured that:

  • All source risk IDs were properly accounted for;
  • Clusters maintained proper hierarchical relationships; and
  • Risk categorizations aligned with regulatory requirements.

The validation process transformed the initial AI-generated taxonomy into a production-ready, standardized risk framework, approved by domain experts.

Phase 3: Continuous Refinement

Once implemented, the system began monitoring how users actually searched for and interacted with risk information. The bank recognized that users often do not know the exact standardized terminology when searching, so the solution developed a risk recommender that displayed semantically similar risks based on both text similarity and risk dimension alignment. This approach allowed users to effectively navigate the taxonomy despite being unfamiliar with standardized terms. By analyzing search patterns, the system continuously refined the taxonomy with alternative labels reflecting actual user terminology, and created a dynamic knowledge structure that evolved based on real usage.
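A minimal sketch of such a recommender appears below: it blends a text-similarity score with a dimension-alignment score and ranks standardized risks for display. The risk records, their dimensions, and the weighting are hypothetical, and the bank's actual solution used semantic embeddings rather than `difflib` string similarity.

```python
from difflib import SequenceMatcher

# Hypothetical standardized risks with illustrative dimensions.
STANDARD_RISKS = [
    {"label": "Unauthorized Trading",
     "dimensions": {"unit": "markets", "type": "conduct"}},
    {"label": "Client Data Breach",
     "dimensions": {"unit": "operations", "type": "technology"}},
]

def recommend(query, query_dims, risks, text_weight=0.6):
    """Rank risks by a blend of text similarity and dimension alignment,
    so users can find standardized risks without knowing exact terms."""
    scored = []
    for risk in risks:
        text = SequenceMatcher(None, query.lower(), risk["label"].lower()).ratio()
        dims = risk["dimensions"]
        align = sum(query_dims.get(k) == v for k, v in dims.items()) / len(dims)
        scored.append((text_weight * text + (1 - text_weight) * align, risk["label"]))
    return [label for _, label in sorted(scored, reverse=True)]

print(recommend("rogue trading", {"unit": "markets", "type": "conduct"}, STANDARD_RISKS))
```

Because the query "rogue trading" never appears verbatim in the taxonomy, pure keyword search would fail here; the blended score still surfaces the right standardized risk first.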

This case study demonstrates the power of knowledge-enhanced taxonomy management, combining domain expertise with AI capabilities through a structured cycle of enrichment, validation, and refinement to create a living taxonomy that serves both regulatory and practical business needs.

Taxonomy Standards

For taxonomies to be truly effective and scalable in modern information environments, they must adhere to established semantic web standards and follow best practices developed by information science experts. Modern taxonomies need to support enterprise-wide knowledge initiatives, break down data silos, and enable integration with linked data and knowledge graphs. This is where standards like the Simple Knowledge Organization System (SKOS) become essential. By using universal standards like SKOS, organizations can:

  • Enable interoperability between systems and across organizational boundaries
  • Facilitate data migration between different taxonomy management tools
  • Connect taxonomies to ontologies and knowledge graphs
  • Ensure long-term sustainability as technology platforms evolve

Beyond SKOS, taxonomy professionals should be familiar with related semantic web standards such as RDF and SPARQL, especially as organizations move toward more advanced semantic technologies like ontologies and enterprise knowledge graphs. Well-designed taxonomies following these standards become the foundation upon which more advanced Knowledge Intelligence capabilities can be built. By adhering to established standards, organizations ensure their taxonomies remain both technically sound and semantically precise, capable of scaling effectively as business requirements evolve.
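To make the SKOS ideas concrete, here is a minimal in-memory sketch of concepts with `prefLabel`, `altLabel`, and `broader` links. A real taxonomy would be serialized as RDF (for example in Turtle) and managed in a dedicated tool; the concepts and helper functions here are invented purely for illustration.

```python
# SKOS-style concepts: preferred label, alternative labels, broader link.
concepts = {
    "ppe": {"prefLabel": "Personal Protective Equipment",
            "altLabel": ["PPE", "safety gear"],
            "broader": "safety-equipment"},
    "safety-equipment": {"prefLabel": "Safety Equipment",
                         "altLabel": [],
                         "broader": None},
}

def find_by_label(label, concepts):
    """Resolve a user-entered label against both pref and alt labels."""
    needle = label.lower()
    for cid, c in concepts.items():
        if needle in (l.lower() for l in [c["prefLabel"], *c["altLabel"]]):
            return cid
    return None

def broader_chain(concept_id, concepts):
    """Walk skos:broader links up to the top concept."""
    chain = []
    current = concepts[concept_id]["broader"]
    while current is not None:
        chain.append(current)
        current = concepts[current]["broader"]
    return chain

print(find_by_label("safety gear", concepts))  # ppe
print(broader_chain("ppe", concepts))          # ['safety-equipment']
```

Even this toy model shows why the standard matters: alternative labels make user vocabulary resolvable to a single concept, and broader links give systems a hierarchy to traverse, both of which survive migration between SKOS-compliant tools.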

The Future of Taxonomy Management

The future of taxonomy management lies not just in automation, but in intelligent collaboration between human expertise and AI capabilities. KI provides the framework for this collaboration, ensuring that taxonomies remain both precise and practical. 

For organizations considering this approach, the key is to start with a clear understanding of their taxonomic needs and challenges, and to ensure their taxonomy efforts are built on solid foundations of semantic web standards like SKOS. These standards are essential for taxonomies to effectively scale, support interoperability, and maintain long-term value across evolving technology landscapes. Success comes not from replacement of existing processes, but from thoughtful integration of KI capabilities into established workflows that respect these standards and best practices.

Ready to explore how KI can transform your taxonomy management? Contact our team of experts to learn more about implementing these capabilities in your organization.

 

Graph Machine Learning Recommender POC for Public Safety Agency
https://enterprise-knowledge.com/graph-machine-learning-recommender-poc-for-public-safety-agency/
Thu, 15 Feb 2024


The Challenge

A government agency responsible for regulating and enforcing occupational safety sought to build a content recommender proof-of-concept (POC) that leverages semantic technologies to model the relevant workplace safety domains. The agency aimed to optimize project planning and construction site design by centralizing information from siloed and unstructured sources and extracting a comprehensive view of potential safety risks and related regulations. Automatically connecting and surfacing this information in a single location via the recommender would serve to minimize time currently spent searching for content and limit burdensome manual efforts, ultimately improving risk awareness and facilitating data-driven decision-making for risk mitigation and regulatory adherence. 

The Solution

The agency partnered with EK to develop a knowledge graph-powered semantic recommendation engine with a custom front-end. Based on the use case we refined for construction site project planners, we redesigned the agency's applicable taxonomies and developed an ontology that defined relationships to model the recommendation journey from the user's inputs of construction site elements to the expected outputs of risks and regulations. With data loaded into the graph from taxonomy values and structured historical data, EK leveraged machine learning (ML) and natural language processing (NLP) techniques to extract information from the agency's large volume of structured data and generate risk recommendations from user input combinations. EK iterated upon these processes to enrich the data and fine-tune the risk prediction models to achieve even more accurate results. Then, based on low-fidelity wireframes collaboratively developed and validated by the client, EK's software engineers created an interactive front-end for users to view the results and provide feedback, and ultimately deployed the application on cloud infrastructure.

Lastly, in addition to the design and development of the initial POC, EK collaborated closely with the client to assess future uses for the application, as well as methods for improving performance and utility. Potential paths for improving the application include developing user feedback mechanisms, expanding the dimensions of analysis for work sites, and expanding the scope of the application to support additional use cases. EK provided the agency with clear recommendations for next steps and paths forward to build upon the POC and further optimize construction site design and planning. 

The EK Difference

EK employed its extensive experience in taxonomy design, ontology design, and data science with specific expertise in the development of recommender systems to capture and model the semantic content of the construction safety domain. Throughout the engagement, EK prioritized close collaboration with the client’s core project team and involved their subject matter experts and stakeholders in taxonomy, ontology, and wireframe design sessions, iteratively soliciting their feedback and domain knowledge to ensure the final product would properly reflect the language and subject matter for the agency’s use case. EK also provided transparency into the development of the recommender, providing thorough technical walkthroughs of the solution. This ensured the agency had all the knowledge required to make informed decisions regarding next steps to scale the solution following the end of our engagement.

The Results

The graph-powered recommender solution delivered at the end of the engagement was a compelling POC for the client to consider for long-term application and scale. The recommendation engine provided coherent recommendations in a centralized location to reduce manual efforts for end users and displayed related regulations and supporting metrics to facilitate context-based, data-driven decision-making for construction site planners at the agency. The tailored roadmap to refine and expand the solution offered clear guidance for further data and system improvements to increase the overall utility of the recommender. With this POC and the accompanying roadmap, the agency has a tangible and effective solution with a path to scale to achieve widespread buy-in from across the organization and address more complex use cases in order to maximize the value of the recommender. 

This project was an example of EK’s Knowledge Graph Accelerator offering, delivering the POC to the client in 4 months. 


Taxonomy Use Cases: How To Estimate Effort and Complexity
https://enterprise-knowledge.com/taxonomy-use-cases-how-to-estimate-effort-and-complexity/
Thu, 27 Aug 2020

When asked to define taxonomy, I like to define it as a method rather than a thing. I typically say taxonomy is a way of categorizing things hierarchically, from general to more specific. Sounds simple enough, right? After all, who hasn't been grouping together things that have something in common and slapping a name on that group since they first learned to speak? Every store, every house, every website has a way of categorizing and labeling stuff so that everything belongs in a place. Everyone does it, so it should be easy… right?

As any seasoned taxonomist, librarian, or knowledge manager will tell you: it depends. Specifically, it depends on the purpose of the taxonomy and its intended users. Even subtle differences in purpose or audience in similar environments can lead to vastly different results. Have you ever completely failed to find something in someone else’s kitchen? This is because it was not organized for you, just like you organized your kitchen with your own purposes and needs in mind. The use case, then, is intertwined with an audience or persona and a goal.

This white paper explores taxonomy use cases as an indicator of complexity, and how they can be used to determine the amount of effort that may be required for an organization to design a taxonomy. Effort refers to the amount of dedicated work and brain power that will be needed in order to design for a taxonomy’s complexity, particularly the effort to maintain the taxonomy in the long run and ensure its future success. 

Use Cases

Use cases establish the scope and purpose of a taxonomy. Defining complete and detailed use cases will make a difference in planning a taxonomy design effort. Use cases identify who will be using a taxonomy, how they will be using it, and why. These can be similar to user stories in the Agile methodology. Once defined, use cases delineate relevant scope by defining Minimum Viable Product (MVP) features and help decide the direction of a taxonomy. (An MVP is like a prototype: what are the bare minimum efforts and features we need to put into this product in order to learn the most about its impact and iteratively expand it?) There may be numerous use cases for a single taxonomy, so it will be necessary to prioritize and create a backlog of use cases that will drive future iterations of a taxonomy. Since we typically focus on First Implementable Versions (a taxonomy MVP), we want to first focus on use cases that are easily compatible with each other and attainable, recognizing that taxonomies can grow to incorporate future use cases once the foundation is built.

A use case can be broken down into three parts: the persona, action, and goal. The persona represents an archetype of user that will be interacting with the taxonomy; this could also be a specific role, such as a Sales Representative. The action includes the steps a persona is taking while using the taxonomy; this should also include a specific system in which the taxonomy will be implemented. The goal is the persona’s purpose for using the taxonomy. 

[Image: Persona - who is the user? Action - what is the user doing? Goal - what is the goal?]

An example use case can be: 

Clark the Customer (persona) needs to be able to use brand, color, and size facets on the customer shop of Shirts.com (action) so that they can find the perfect shirt for their upcoming interview (goal).

These specific details provide clear indicators of a successful taxonomy: we know that our taxonomy must describe clothing through facets (brand, color, size), including styles that are appropriate for interviews (this is a little extra detail, but it can bring a use case to life). We know that the taxonomy must be implemented in faceted search and navigation on a specific system, so we need to identify whether that system supports faceting; there may also be implicit systems (such as databases) on the back-end that we need to account for. Lastly, we will need a better understanding of how users currently use this system to achieve their goals, and what a taxonomy can do to improve the situation.
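The persona/action/goal breakdown can be captured in a simple structure for a use-case backlog. The `UseCase` class and the rendering helper below are just a sketch of one way to record these parts, using the Shirts.com example from the text.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    persona: str  # who is the user?
    action: str   # what are they doing, and in which system?
    goal: str     # why are they doing it?

shirts = UseCase(
    persona="Clark the Customer",
    action="use brand, color, and size facets on the customer shop of Shirts.com",
    goal="find the perfect shirt for their upcoming interview",
)

def as_story(uc):
    """Render the use case in the 'needs to ... so that ...' story form."""
    return f"{uc.persona} needs to be able to {uc.action} so that they can {uc.goal}."

print(as_story(shirts))
```

Keeping use cases in a structured backlog like this makes it easier to prioritize compatible ones for a First Implementable Version and defer the rest.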

Classic Taxonomy Use Cases

Classic use cases for taxonomy include: tagging and faceted search for content, basic reporting or analytics, or creating organizational or navigational structures. These use cases are typically applied in content management repositories such as intranets and learning portals, or any other front-facing interfaces such as retail websites. 

Classic use cases are people focused: a customer needs a navigational structure to be clear so that they can find what they’re looking for when they need it. An employee needs to be able to search effectively to find the relevant training on the company learning portal in order to improve at their job. A revenue team needs to be able to classify products and services in one category in order to run reports on their profitability. A data governance team similarly needs to definitively classify data entities and attributes in a single category that corresponds to a business unit, in order to identify data stewards and owners for compliance purposes (such as GDPR or CCPA).

[Image: classic use cases are people focused and have a history of repeated implementation to rely on]

Classic use cases may appear to be less complex and therefore seem easier, but this is deceptive and not always true. Classic use cases can easily multiply into several use cases if it turns out there are multiple personas involved. For instance, you may have customers, sales representatives, and third-party vendors involved in a retail search and navigation use case in which each group has different needs from the taxonomy. Perhaps third-party vendors need a way of managing product metadata, and sales representatives need to be able to track sales, while customers need clear facets to find products.

That being said, classic use cases are "classic" because they have been implemented time and time again in systems most organizations already have (unless they are adding an enterprise taxonomy tool to the mix, which will make a long-term effort smoother); taxonomists and developers have reliable previous efforts to lean on and may have a specific, reusable methodology for each use case. Classic use cases therefore tend to have a more predictable level of effort, though estimates should also account for other factors such as the complexity of the domain, the level of specificity or breadth of concepts possible, and the type of content the taxonomy will primarily organize.

Advanced Use Cases

[Image: advanced use cases rely on machine-learning processes, have a higher barrier of entry, and require more specialized efforts]

Advanced use cases tend to delve into ontologies, knowledge graphs, and artificial intelligence, but taxonomy is still a foundational aspect of these technologies. These use cases include text parsing and automated classification, predictive analytics, insight inferencing, chatbots, and recommendation engines. While people will still benefit from the end result of these use cases, the complexity of these taxonomies is amplified by the fact that they are primarily meant to be utilized by machine learning processes that humans can't effectively reproduce, on a massive volume of data. A taxonomy meant purely for text parsing and auto-classification will not be directly intuitive or usable by people, since such taxonomies tend to be significantly larger, highly specific, or repetitive as a way to disambiguate concepts, and therefore highly complex. They may also have polyhierarchy or semantic relationships that go beyond hierarchy.

Advanced use cases require a higher level of effort than classic use cases. The barrier to entry is much higher, requiring specific knowledge of machine learning and other semantic capabilities. Advanced use cases use specific technology that many organizations don't have, so unless some of these capabilities already exist, new technology may need to be purchased and added to an organization's system architecture. This is also an actively developing field within artificial intelligence; while there are of course demonstrated successes, these use cases are open to experimentation as the field develops and may face a higher degree of uncertainty (see my previous blog on NLP and Taxonomy Design to learn more about an example of an advanced use case).

System Use Case Limitations

Systems that are in scope for taxonomy implementation should be noted as part of the action of a use case, in which a persona uses a system to interact with a taxonomy. The added element of a specific in-scope system opens the possibility of certain limitations that can dictate the design of a taxonomy, and will restrict other use cases. For example, some systems do not handle hierarchical values easily. If a taxonomy informs the values of a metadata field in this kind of system, that field will not be able to fully represent the hierarchy of the taxonomy. 

This means the implementation of a taxonomy will have to get creative, but it also means the usable portion of the taxonomy may be limited to a certain level. In other words, only the lowest level in the taxonomy can be used as metadata values. The taxonomy must conform to this level across the board, and all areas of the taxonomy must go to a certain depth in order to be used. A good rule of thumb for taxonomies with classic use cases is three levels.
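As an illustration of this constraint, the sketch below flattens a three-level taxonomy down to its leaf terms, the only values a flat metadata field in such a system could hold. The retail hierarchy is invented for the example.

```python
# A small three-level hierarchy: nested dicts, with empty dicts as leaves.
taxonomy = {
    "Apparel": {
        "Shirts": {"Dress Shirts": {}, "T-Shirts": {}},
        "Shoes": {"Sneakers": {}, "Boots": {}},
    },
}

def leaf_terms(tree):
    """Collect leaf terms, since a flat metadata field cannot represent
    the hierarchy above them."""
    leaves = []
    for term, children in tree.items():
        if children:
            leaves.extend(leaf_terms(children))
        else:
            leaves.append(term)
    return leaves

print(leaf_terms(taxonomy))
# ['Dress Shirts', 'T-Shirts', 'Sneakers', 'Boots']
```

Notice what the flattening discards: the broader context ("Apparel", "Shirts") is lost, which is exactly why every branch must be designed to reach the same usable depth.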

A strict hierarchy and a strict number of levels that are both imposed by a system is great for classic use cases, but it will not be ideal for advanced use cases like text parsing. A limitation like this can make fulfilling an advanced use case exceedingly difficult, since certain levels of specificity will have to be sacrificed. This means that certain classic and advanced use cases are incompatible and may require different designs.

[Image: system limitations can restrict different types of use cases and may make them incompatible]

While system limitations don't necessarily change the level of effort of a taxonomy design, not knowing system limitations in advance risks additional effort if significant rework is needed (which can still be accounted for ahead of time if we plan for constant iteration). However, as mentioned above, system limitations will affect other use cases. As a result, the more systems that are selected to be part of a taxonomy effort, the higher the chance of system limitations that impact design decisions; this may increase the level of effort and restrict the taxonomy's ability to fulfill other types of use cases, especially if each system roughly corresponds to a classic or advanced use case.

Mixed Use Cases as Indicators of Complexity

Multiple use cases for a taxonomy can be a sign of complex business needs. Multiple use cases can be due to the fact multiple groups of users or even departments are relying on a single taxonomy to achieve their specific goals. Likewise, multiple in-scope systems can indicate multiple groups of users or departments that use their own designated system, each with its own capabilities and limitations that may need to be accounted for. 

[Image: prioritize use cases that are compatible with each other in the initial MVP effort]

Depending on the nature of these departments, even if they have the same use case, they may require different concepts or structures in the taxonomy. For example, if a global enterprise needs a taxonomy for its products and services, it is usually the case that regional offices offer unique services and products, or engage in markets/industries respective to their regions, but not others.

The implication is that this taxonomy will have parts that are not relevant to specific regions. This increases the potential for misalignment and lower adoption if not identified early on by establishing thorough use cases for each region; regions may need the ability to designate the sections of a master enterprise taxonomy that are relevant to them.

While some use cases are very compatible with each other, every distinct use case runs the risk of changing the nature or content of a taxonomy, thus potentially increasing the effort required. A taxonomy intended for search and navigation may be a different shape than a taxonomy for reporting, because these entail different users with different goals, even if the taxonomy models the same information domain. The more use cases that are introduced to a single taxonomy effort without planning accordingly, the higher the risk of failing to meet expectations and thus lowering adoption.

Conclusion

It’s important to emphasize again that we use terms like “First Implementable Version” and “Initial Design” for a reason: to set expectations that a taxonomy is necessarily iterative, and you don’t need to tackle all possible use cases at once on Day 1. Similarly, expecting to achieve all of your possible use cases within a few months’ initial design project is unrealistic. A sustained effort can be grown as value is realized with an MVP, and then more use cases, as well as the advanced use cases, can eventually be explored. Start small, prioritize the first use cases to the ones that are compatible and attainable, realize and demonstrate the value of your MVP, and grow as necessary.

Taxonomy is incredibly flexible, and it can be designed in many different ways to suit your users’ needs. Taxonomy is an elegant solution to complex, wide-ranging, yet common problems in the information world. Identifying and analyzing use cases, and considering the complexity they represent, is a practical way to estimate the effort required for an enterprise taxonomy. From there, a viable long-term roadmap can be created with realistic expectations and priorities.

Know you need a taxonomy, but unsure where to start? Contact Enterprise Knowledge’s team of expert taxonomists and KM consultants to learn more.

The post Taxonomy Use Cases: How To Estimate Effort and Complexity appeared first on Enterprise Knowledge.

Presentation: Introduction to Knowledge Graphs https://enterprise-knowledge.com/presentation-introduction-to-knowledge-graphs/ Tue, 07 Jul 2020 16:16:18 +0000 https://enterprise-knowledge.com/?p=11507 This workshop presentation from Joe Hilger, Founder and COO, and Sara Nash, Technical Analyst, was delivered on June 8, 2020 as part of the Data Summit 2020 virtual conference. The 3-hour workshop provided an interdisciplinary group of participants with a … Continue reading

The post Presentation: Introduction to Knowledge Graphs appeared first on Enterprise Knowledge.

This workshop presentation from Joe Hilger, Founder and COO, and Sara Nash, Technical Analyst, was delivered on June 8, 2020 as part of the Data Summit 2020 virtual conference. The 3-hour workshop provided an interdisciplinary group of participants with a definition of what a knowledge graph is, how it is implemented, and how it can be used to increase the value of an organization’s data. This slide deck gives an overview of the KM concepts that are necessary for the implementation of knowledge graphs as a foundation for Enterprise Artificial Intelligence (AI). Hilger and Nash also outlined four use cases for knowledge graphs, including recommendation engines and natural language query on structured data.


EK Listed on KMWorld’s AI 50 Leading Companies https://enterprise-knowledge.com/ek-listed-on-kmworlds-ai-50-leading-companies/ Tue, 07 Jul 2020 15:54:34 +0000 https://enterprise-knowledge.com/?p=11510 Enterprise Knowledge (EK) has been listed on KMWorld’s inaugural list of leaders in Artificial Intelligence, the AI 50: The Companies Empowering Intelligent Knowledge Management. KMWorld developed the list to help shine a light on innovative knowledge management vendors that are … Continue reading

The post EK Listed on KMWorld’s AI 50 Leading Companies appeared first on Enterprise Knowledge.

2020 KMWorld AI 50

Enterprise Knowledge (EK) has been listed on KMWorld’s inaugural list of leaders in Artificial Intelligence, the AI 50: The Companies Empowering Intelligent Knowledge Management. KMWorld developed the list to help shine a light on innovative knowledge management vendors that are incorporating AI and cognitive computing technologies into their offerings.

As a services provider and thought leader in Enterprise AI, Knowledge Management, and Semantic Search, EK is one of the few dedicated services organizations included on the list. EK was uniquely recognized for our leadership in this area, including our AI Readiness Benchmark and range of functional demos that harness knowledge graphs, natural language processing, ontologies, and machine learning tools.

“As the drive for digital transformation becomes an imperative for companies seeking to compete and succeed in all industry sectors, intelligent tools and services are being leveraged to enable speed, insight, and accuracy,” said Tom Hogan, Group Publisher at KMWorld. “To showcase organizations that are incorporating AI and an assortment of related technologies—including natural language processing, machine learning, and computer vision—into their offerings, KMWorld created the “AI 50: The Companies Empowering Intelligent Knowledge Management.”

Lulit Tesfaye, EK’s Practice Leader for Data and Information Management stated, “We are thrilled for this recognition and extremely proud of the cutting edge solutions we’re able to deliver for organizations looking to optimize their data and Knowledge AI initiatives. This recognition demonstrates EK’s ability to leverage our real-world experience and define the enterprise success factors for maturity and readiness for AI, bringing the focus back to business values, and the tangible applications of AI for the enterprise. Allowing organizations to go past the common AI limitations is what helps us show where we are leading.”

EK CEO Zach Wahl added, “Thanks to KMWorld for this recognition and congratulations to my amazing colleagues for their thought leadership. Alongside our recognition as one of the top 100 Companies That Matter in Knowledge Management for the sixth year in a row, this demonstrates EK’s leadership position at the nexus of KM and AI.”

About Enterprise Knowledge

Enterprise Knowledge (EK) is a services firm that integrates Knowledge Management, Information Management, Information Technology, and Agile Approaches to deliver comprehensive solutions. Our mission is to form true partnerships with our clients, listening and collaborating to create tailored, practical, and results-oriented solutions that enable them to thrive and adapt to changing needs.

About KMWorld

KMWorld is the leading information provider serving the Knowledge Management systems market and covers the latest in Content, Document and Knowledge Management, informing more than 21,000 subscribers about the components and processes – and subsequent success stories – that together offer solutions for improving business performance.

KMWorld is a publishing unit of Information Today, Inc.


Natural Language Processing and Taxonomy Design https://enterprise-knowledge.com/natural-language-processing-and-taxonomy-design/ Tue, 30 Jun 2020 16:47:35 +0000 https://enterprise-knowledge.com/?p=11468 Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that processes and analyzes human language found in text. Some of the exciting capabilities that NLP offers includes parsing out the significant entities in content through a statistical analysis … Continue reading

The post Natural Language Processing and Taxonomy Design appeared first on Enterprise Knowledge.

Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that processes and analyzes human language found in text. Some of the exciting capabilities that NLP offers include parsing out the significant entities in content through statistical analysis and identifying contextual relationships between different words. Taxonomy also aims to provide a hierarchical context for concepts and the words used to describe them. Taxonomists care about how people use language to categorize and identify concepts, in an effort to make information usable for both people and technology.

At EK, depending on the scope of a project, we incorporate NLP into the taxonomy design process in order to deliver highly detailed, relevant, and AI-ready designs that are backed by statistical processes generated from unstructured text data. One of EK’s key differentiators is our hybrid approach to designing business taxonomies. Our top-down approach leverages subject matter experts and business stakeholders to ensure a taxonomy is accurate and relevant for a domain or organization, while our bottom-up approach analyzes existing content and systems to ensure a taxonomy design is useful for the people and systems that will be using it. Essentially, NLP in taxonomy design is a type of bottom-up process in which Named Entity Recognition (NER) collects the lowest-level terms found in the content. The taxonomist can then identify broader categories for these terms. This is complemented by top-down analysis when engaging SMEs to help name and fine-tune the categories, thus fulfilling EK’s hybrid methodology for taxonomy design.

However, NLP is far from automating the human judgment that is required in taxonomy design; data scientists and taxonomists (as well as subject matter experts) need to work together to determine why and how the data generated by NLP will be incorporated into a taxonomy. Here, I outline the ways in which NLP can enhance taxonomy design, and why taxonomists should consider teaming up with data scientists.

Named Entity Recognition and Taxonomy Development

Named Entity Recognition (NER) is a branch of NLP that identifies entities in text. NER can be implemented using powerful Python libraries, such as spaCy, an NLP library that can be used to train an initial NER model from annotations of sample content. (Python libraries are open source collections of functions that allow you to perform actions such as NLP and NER.) For specific industries, NER models will have to be trained to find entities in different domains. For example, a general scientific model can be used for a medical domain, though the model will have to be further trained to identify entities such as medications, conditions, and medical procedures.
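As a small illustrative sketch (a rule-based stand-in, not a trained statistical model), spaCy v3’s `EntityRuler` component can show how recognized entities surface from text; the `MEDICATION` and `CONDITION` labels and patterns below are hypothetical examples for a medical domain:

```python
import spacy

# Start from a blank English pipeline; a real project would train or
# fine-tune a statistical NER model on SME-annotated sample content.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical domain patterns standing in for a trained medical model.
ruler.add_patterns([
    {"label": "MEDICATION", "pattern": "morphine"},
    {"label": "MEDICATION", "pattern": "naloxone"},
    {"label": "CONDITION", "pattern": "atopic dermatitis"},
])

doc = nlp("The patient with atopic dermatitis was given morphine.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

In a real pipeline, the rule-based matcher would be replaced by a model trained on annotated content, but the extraction step looks the same: run the pipeline, then collect `doc.ents` for downstream clustering.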

An NER pipeline can run on a volume of content in order to identify and extract the entities found in the content. Once the NER extracts terms, a data scientist can use semantic word embeddings (encoded representations of words that have a similar or the same meaning, according to the way they are used in text) to cluster the entities in an unsupervised learning process; this means the algorithm makes inferences about a data set without human input or labeling. This results in clusters of terms that have a statistical relationship to each other, derived from the way the terms are used in the language of the content.
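To make the clustering step concrete, the toy sketch below groups terms by cosine similarity of their embedding vectors; the 3-dimensional vectors are made up for illustration, whereas real word embeddings come from a trained model and have hundreds of dimensions:

```python
from math import sqrt

# Toy 3-d "embeddings"; in practice these vectors would come from the
# embedding model run over the organization's content.
embeddings = {
    "morphine": [0.9, 0.1, 0.0],
    "naloxone": [0.8, 0.2, 0.1],
    "rash":     [0.1, 0.9, 0.0],
    "eczema":   [0.0, 0.8, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm

# Greedy clustering: place each term in the first cluster whose seed
# term it resembles; otherwise start a new cluster.
clusters = []
for term, vec in embeddings.items():
    for cluster in clusters:
        if cosine(vec, embeddings[cluster[0]]) > 0.8:
            cluster.append(term)
            break
    else:
        clusters.append([term])

print(clusters)  # two clusters: opioid-related terms and skin-related terms
```

A production workflow would typically use an unsupervised algorithm such as k-means over the full embedding matrix, but the intuition is the same: terms used in similar contexts end up near each other and fall into the same cluster.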

Usually, taxonomists can derive a theme from these clusters by reviewing the types of terms that are in each cluster. Can you see a theme in the two clusters below?

Two Clusters
morphine, opioid, opioids, cocaine, benzodiazepine, benzodiazepines, overdose, opiate, antagonist, methamphetamine, analgesic, methadone, stimulant, Benzodiazepines, buprenorphine, flumazenil, heroin, methamphetamines, naloxone, Naloxone, opioid-induced, sedative, narcotic, self-administering, self-administration
rash, erythema, acrocyanosis, pedis, itchy, conjunctivitis, blistering, eczema, impetigo, urticaria, herpetiformis, Tinea crurisi, atopic dermatitis, Erythema toxicum neonatorum, hyperpigmentation, papules, photosensitivity, Tinea corporis, cutaneous, pruritic

Since word embeddings are statistical estimations of a word’s usage in a given language, the clusters generated using word embeddings aren’t always perfect from a human perspective. They do make it easier, however, to identify similar types of words and phrases in a body of content. The clusters and word embeddings won’t be able to tell you exactly what that relationship is between the terms, but in our clusters above, a trained taxonomist can deduce that the first cluster has to do with medications, specifically opioids (and other words that are closely related to opioids, such as overdose and antagonists). The second cluster generally has to do with skin conditions.

Once you have identified the various cluster themes (this particular process resulted in several hundred clusters), you can group those themes into another level of broader categories and continue to go up the ladder of the taxonomy into Level 3, Level 2, or Level 1 concepts. For instance, if we continue with the medication example, we may have another cluster of specific antibiotic drugs, as well as antihypertensives. We now know that we need a broader Medication/Chemicals/Drugs category in order to group these themes (opioids, antibiotics, antihypertensives) together. And voila! We have a taxonomy created with the assistance of Natural Language Processing.
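That ladder-climbing step can be sketched as a simple inversion of a theme-to-category mapping; the cluster themes and category names below are hypothetical stand-ins for what a taxonomist would assign after reviewing the clusters:

```python
# Hypothetical cluster themes produced by the review step above, each
# mapped to the broader category a taxonomist assigns it.
cluster_themes = {
    "opioids": "Medications",
    "antibiotics": "Medications",
    "antihypertensives": "Medications",
    "skin conditions": "Medical Conditions",
}

# Invert the mapping to get a two-level taxonomy: category -> themes.
taxonomy = {}
for theme, category in cluster_themes.items():
    taxonomy.setdefault(category, []).append(theme)

print(taxonomy)
```

Each additional level of the taxonomy repeats the same move: group the current level's labels under broader parents until you reach a manageable set of top concepts.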

An example taxonomy of medications, with parent categories such as "opioids" and "antibiotics" and corresponding medications that would fall within each category, such as "morphine" and "amoxicillin" respectively

Relevancy and Taxonomy Use Cases

Not all clusters will be relevant to a taxonomy. Sometimes the themes of a cluster will be a certain part of speech, such as adjectives or verbs that seem meaningless on their own; these usually have to be paired with other entities to create a phrase that then becomes meaningful to the domain. These entities most likely exist in other clusters, so it will be helpful to have a tool to look up these phrases in the content to see how they are paired with other entities.

Even though the NER process has found a statistical relationship to form these clusters, this doesn’t mean that we need to incorporate those clusters into our taxonomy. This is when good old human judgment and well-defined taxonomy use cases will help you decide what is needed from the NER results. A use case is the specific business need that a taxonomy is intended to fulfill. Use cases should always be the signal guiding your way through any taxonomy and ontology effort.

To understand your use cases, ask yourself these questions: What problem is the taxonomy/ontology trying to solve? And who will be using it?

Taxonomy and NLP Iteration

Just like taxonomy design, an NLP process should be iterative. Think of the entire process as a feedback loop with your taxonomy. A data scientist can use the fledgling taxonomy to improve the accuracy of the NER models by manually annotating content with the new labels, which improves the quality of the clusters returned. A more accurate and repeatedly trained model will be able to look for more precise and narrow concepts. For instance, certain medical conditions and medications would have to be annotated in order to be recognized as part of a conditions or medications model.

Once this has been done, you can train the model on the annotations as many times as needed in order to return an increasingly accurate set of terms relevant to the model. Depending on the results, this may necessitate a restructuring of the taxonomy; perhaps another grouping or subgrouping of medical conditions is discovered, which weren’t previously included in the initial NER analysis, or it becomes clear your taxonomy needs an ontology.
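For those curious what such a retraining loop looks like in code, here is a minimal sketch assuming spaCy v3; the single training sentence and `MEDICATION` label are illustrative only, and a real effort would feed many SME-annotated documents through this loop:

```python
import random

import spacy
from spacy.training import Example

# Blank pipeline with a trainable NER component; a real project would
# usually start from an existing model and fine-tune it instead.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("MEDICATION")

# Hypothetical annotation: character offsets 21-29 cover "morphine".
TRAIN_DATA = [
    ("The patient received morphine.", {"entities": [(21, 29, "MEDICATION")]}),
]

optimizer = nlp.initialize()
for _ in range(20):  # each pass nudges the model toward the annotations
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer, losses=losses)
print(losses)
```

The loop can be rerun whenever new annotations arrive, which is what makes the taxonomy-to-model feedback cycle described above practical.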

Leverage a Gold Standard

It’s highly suggested that you create a “gold standard” (with the critical input of SMEs and business stakeholders) for the most significant types of semantic relationships that are needed to achieve your goals. In creating a gold standard, SMEs and other stakeholders identify the logic and patterns between concepts that best support your use cases, and then focus only on these specific patterns, at least in the first iteration.

If your use case is a recommendation engine, for example, you need to prioritize the relationships between concepts that help facilitate the appropriate recommendations. In our healthcare example, we may find that the best recommendations are facilitated by ontological relationships; perhaps we need an ontology to describe the relationship between bacterial infections and antibiotics, or the relationship between symptoms and diagnosable conditions.

An example ontology visual, showing relationships such as "symptom is a signOf medical condition" and "medication treats medical condition," with "medication," "medical condition," and "symptom" all being different classes within the ontology, and "treats" and "signOf" being relationships

If your use case is for search and findability, you could utilize user research methods such as card sorting to gain a better understanding of how users will relate the concepts to one another. This may also provide guidance on how to build an initial taxonomy with the term clusters, by allowing users to sort the terms into predefined or user-created categories. From there, an analysis of the general relationship patterns can be used as a gold standard to prioritize how the NLP data will be used.

The purpose of a gold standard is to prioritize and set a strict scope on how NLP will assist taxonomy design. The NLP process of entity extraction, clustering, labeling, annotating, and retraining is an intensive process that will generate a lot of data. It can be difficult and overwhelming to decide how much and which data should be incorporated into the taxonomy. A gold standard, basically a more detailed application of use cases, will make it much easier to decide what is a priority and what is outside the scope of your use cases.

Conclusion

NLP is a promising field that has many inherent benefits for taxonomy and ontology design. Teams of data scientists, taxonomists, and subject matter experts that utilize NLP processes, alongside a gold standard and prioritized use cases, are well positioned to create data models for advanced capabilities. The result of this process will be a highly detailed and customized solution derived from an organization’s existing content and data.

If your taxonomy or ontology effort seems to frequently misalign with the actual content or domain you are working in, or if you have too much unstructured data and content to meaningfully derive and conceive a taxonomy that will accurately model your information, an NLP-assisted taxonomy design process will provide a way forward. Not only will it help your organization gain a complete sense of its information and data, it will also glean valuable insights about the unseen connections in data, as well as prepare your organization for robust enterprise data governance and advanced artificial intelligence capabilities, including solutions such as recommendation engines and automated classification.

Interested in seeing how NLP can assist your taxonomy and ontology design? Contact Enterprise Knowledge to learn more.


Natural Language Search on Big Data https://enterprise-knowledge.com/natural-language-search-on-big-data/ Tue, 12 May 2020 13:00:14 +0000 https://enterprise-knowledge.com/?p=11091 The Solution By extracting key entities or metadata fields, such as topic, place, person, customer, plant, etc. from their sample files and data sets, Enterprise Knowledge (EK) developed an ontology to describe the key questions business users were interested in … Continue reading

The post Natural Language Search on Big Data appeared first on Enterprise Knowledge.


The Challenge

One of the largest global supply chain companies needed to provide its business users and leadership with a way to directly access and glean quick insights from their large and disparate data sources while using natural language search. They also wanted to ensure that their data analysts have the tools and processes available to manage and analyze this data. The data sets were stored in a large RDBMS data warehouse with little to no context attached, making it difficult to gauge their value, understand which information to use, and determine what questions the data could answer. The organization wanted to bring meaningful information and facts together to overcome these challenges and to make more timely and informed funding and investment decisions.

The Solution

By extracting key entities or metadata fields, such as topic, place, person, customer, plant, etc. from their sample files and data sets, Enterprise Knowledge (EK) developed an ontology to describe the key questions business users were interested in and how they, and their answers, relate to each other. EK then mapped the various data sets to the ontology and a knowledge graph, and leveraged semantic Natural Language Processing (NLP) capabilities to recognize user intent, link concepts, and dynamically generate the data queries that provide the response. 
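The intent-recognition and query-generation step described above can be illustrated with a minimal, dependency-free sketch; the intent phrases, entity list, and SQL template below are hypothetical placeholders for illustration, not the semantic NLP stack EK actually deployed:

```python
# Hypothetical intent phrases mapped to the SELECT clause they imply,
# and a toy entity dictionary standing in for ontology concepts.
INTENT_SELECTS = {
    "how much": "SUM(quantity)",
    "how many": "COUNT(*)",
    "which": "*",
}
KNOWN_ENTITIES = {"product", "customer", "plant"}

def generate_query(question: str) -> str:
    """Map a natural language question to a (toy) SQL query."""
    q = question.lower()
    # Recognize user intent from a characteristic phrase.
    select = next((s for phrase, s in INTENT_SELECTS.items() if phrase in q), "*")
    # Link the question to a known entity from the ontology.
    entity = next((e for e in KNOWN_ENTITIES if e in q), None)
    if entity is None:
        raise ValueError("No known entity recognized in question")
    return f"SELECT {select} FROM {entity}_facts"

print(generate_query("How much of a given product did we deliver?"))
```

In the real solution, the ontology and knowledge graph supply the entities and relationships, and the generated queries run against the mapped data sources rather than a single hypothetical table.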

The EK Difference

Our experts worked closely with the organization’s own data subject matter experts (SMEs) throughout the endeavor. We facilitated knowledge transfers and design sessions in order to refine use cases, reach a clear definition of key information entities and their relationships to each other, and unleash the value of data context and meaning for the business. We then leveraged our data science expertise and efficient data Extract, Transform, and Load (ETL) logic to drive a rapid alignment of data elements with the natural language structure of English questions to identify user intent. Simultaneously, EK leveraged a semantic data layer, allowing for the flexible mapping of disparate data source schemas into a single, unified data model that is easily digestible and accessible to both technical and non-technical users.

The Results

By allowing the company to collect, integrate, and identify user interest and intent, ontologies and knowledge graphs provided the foundation for Artificial Intelligence (AI). They enabled the joint analysis of different entity paths, the ability to describe their connectivity from various angles, and the discovery of hidden facts and relationships through inferences in related content that would have otherwise gone unnoticed. By connecting internal data to analyze relationships and further mine disparate data sources, this supply chain and manufacturing company now has a holistic view of products and services it can leverage to influence operational decisions. The interface through which users interact with the knowledge graph enables non-technical users to uncover the answers to a variety of critical business questions, such as:

  • Which of our products or services are most profitable and perform best?
  • Which investments are successful, and when are they successful?
  • How much of a given product did we deliver in a given timeframe?
  • Who were our most profitable customers last year?
  • How can we align products and services with the right experts, locations, delivery methods, and timing?


What is the Roadmap to Enterprise AI? https://enterprise-knowledge.com/enterprise-ai-in-5-steps/ Wed, 18 Dec 2019 14:00:57 +0000 https://enterprise-knowledge.com/?p=10153 Artificial Intelligence technologies allow organizations to streamline processes, optimize logistics, drive engagement, and enhance predictability as the organizations themselves become more agile, experimental, and adaptable. To demystify the process of incorporating AI capabilities into your own enterprise, we broke it … Continue reading

The post What is the Roadmap to Enterprise AI? appeared first on Enterprise Knowledge.

Artificial Intelligence technologies allow organizations to streamline processes, optimize logistics, drive engagement, and enhance predictability as the organizations themselves become more agile, experimental, and adaptable. To demystify the process of incorporating AI capabilities into your own enterprise, we broke it down into five key steps in the infographic below.

An infographic about implementing AI (artificial intelligence) capabilities into your enterprise.

If you are exploring ways your own enterprise can benefit from implementing AI capabilities, we can help! EK has deep experience in designing and implementing solutions that optimize the way you use your knowledge, data, and information, and that produce actionable and personalized recommendations for you. Please feel free to contact us for more information.

