AI & Taxonomy: the Good and the Bad


The recent popularity of new machine learning (ML) and artificial intelligence (AI) applications has disrupted a great deal of traditional data and knowledge management understanding and tooling. At EK, we have worked with a number of clients who share the same question: how can these AI tools help with our taxonomy development and implementation efforts? Because this is a rapidly developing area, there is still more to discover about how these applications and agents can be put to work. However, from our own experience, experiments, and work with AI-literate clients, we have noticed alignment on a number of benefits, as well as a few persistent pitfalls. This article will walk you through where AI and ML can be used effectively for taxonomy work, and where they can lead to limitations and challenges. Ultimately, AI and ML should be treated as additional tools in a taxonomist's toolbelt, rather than as a replacement for human understanding and decision-making.

 

Pluses

Taxonomy Component Generation

One area of AI integration that Taxonomy Management System (TMS) vendors quickly aligned on is the usefulness of LLMs (Large Language Models) and ML for assisting in the creation of taxonomy components like Alternative Labels, Child Concepts, and Definitions. Using AI to create a list of potential labels or sub-terms that can quickly be added or discarded is a great productivity aid. Generation is especially powerful when it comes to definitions: using AI, you can draft hundreds of definitions for a taxonomy at a time, which can then be reviewed, updated, and approved. This is an immensely useful time-saver for taxonomists, especially those working solo within a larger organization. By giving an LLM instructions on how to construct definitions, you can avoid bad definitions that merely restate the term being defined (for example, Customer Satisfaction: def. When the customer is satisfied.), and save the time the taxonomist would otherwise spend looking up definitions individually. I also like using LLMs to help suggest labels for categories when I am struggling to find a descriptive term that isn't a phrase or jargon.
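To make this concrete, below is a minimal sketch of batch-drafting definitions with an LLM, using the OpenAI Python client as one possible option; the model name, prompt wording, and example terms are illustrative assumptions rather than a recommended configuration, and every draft still goes to a human reviewer.

```python
# A minimal sketch of drafting taxonomy definitions in bulk with an LLM.
# Assumes the OpenAI Python client (v1+) and an API key in the environment;
# the model name and prompt are placeholders, not recommendations.
from openai import OpenAI

client = OpenAI()

INSTRUCTIONS = (
    "You help a taxonomist draft definitions. Write one plain-language "
    "sentence per term, and never restate the term as its own definition."
)

def draft_definitions(terms: list[str], model: str = "gpt-4o-mini") -> dict[str, str]:
    """Return a term -> draft definition map for later human review."""
    drafts = {}
    for term in terms:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": INSTRUCTIONS},
                {"role": "user", "content": f"Define the taxonomy term: {term}"},
            ],
        )
        drafts[term] = response.choices[0].message.content.strip()
    return drafts

# Drafts are suggestions only; a taxonomist reviews, edits, and approves each one.
print(draft_definitions(["Customer Satisfaction", "Data Product"]))
```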

Mapping Between Vocabularies

Some taxonomists may already be familiar with this use case; I first encountered it back in 2020. LLMs, as well as applications that can perform semantic embedding and similarity analysis, are great for doing an initial pass at cross-mapping between vocabularies. Especially for application taxonomies that ingest a lot of already-tagged content and data from different sources, this can cut down on the time spent reviewing hundreds of terms across multiple taxonomies for potential mappings. One example is Learning Management Systems (LMSs). Large LMSs typically license learning content from a number of different educational vendors. In order to present users with a unified discovery and search experience, the topic categories, audiences, and experience levels that vendors assign to their learning content need to be mapped to the LMS's own taxonomies to ensure consistent tagging for findability.
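For an idea of what that initial pass can look like, here is a minimal sketch using the sentence-transformers library to score similarity between a vendor's topic labels and an LMS taxonomy; the model choice, sample terms, and the 0.6 cutoff are assumptions for illustration, and every suggested mapping would still be reviewed by a taxonomist.

```python
# A minimal sketch of suggesting cross-mappings between two vocabularies
# with sentence embeddings. Model choice, terms, and threshold are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

vendor_terms = ["Machine Learning Basics", "Leadership 101", "Intro to Python"]
lms_topics = ["Artificial Intelligence", "Management & Leadership", "Programming"]

vendor_emb = model.encode(vendor_terms, convert_to_tensor=True)
lms_emb = model.encode(lms_topics, convert_to_tensor=True)

scores = util.cos_sim(vendor_emb, lms_emb)  # vendor term x LMS topic similarity

for i, term in enumerate(vendor_terms):
    best = int(scores[i].argmax())
    score = float(scores[i][best])
    if score >= 0.6:  # arbitrary cutoff; tune against a reviewed sample
        print(f"{term} -> {lms_topics[best]} ({score:.2f})")
    else:
        print(f"{term} -> no confident match; review manually")
```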

Document Processing and Summarization

One helpful content use case for LLMs is their ability to summarize existing text, rather than creating new text from scratch. Using an LLM to create content summaries and abstracts can be a useful input for the automatic tagging of longer, technical documents. Summaries should not be the only input for auto-tagging, since hallucinations may lead to missed tags, but when they are tagged alongside the full document text, we have seen improved tagging performance.
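One way to wire this up is sketched below: generate a short abstract with an LLM, then run tag suggestion over the abstract and the full text together. The summarization call reuses the OpenAI client pattern from earlier, and the naive label-matching tagger is only a stand-in for whatever auto-tagging your TMS provides; all names and prompts are illustrative.

```python
# A minimal sketch of using an LLM-generated abstract as one input to tagging.
# The label-matching tagger is a naive stand-in for a real TMS auto-tagger.
from openai import OpenAI

client = OpenAI()

def summarize(text: str, model: str = "gpt-4o-mini") -> str:
    """Draft a short abstract; model and prompt are placeholders."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize the document in three sentences."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

def match_labels(text: str, taxonomy_labels: list[str]) -> set[str]:
    lowered = text.lower()
    return {label for label in taxonomy_labels if label.lower() in lowered}

def tag_document(doc_text: str, taxonomy_labels: list[str]) -> set[str]:
    # Tag against the abstract *and* the full text; the summary alone can
    # hallucinate or drop detail, so it is never the only input.
    abstract = summarize(doc_text)
    return match_labels(abstract, taxonomy_labels) | match_labels(doc_text, taxonomy_labels)
```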

Topic Modeling and Classification

The components that make up the BERTopic framework for topic modeling; within each category, components are interchangeable. Image of BERTopic components reprinted with permission from https://maartengr.github.io/BERTopic/algorithm/algorithm.html

Most taxonomists are familiar with using NLP (Natural Language Processing) tools to perform corpus analysis, or the automated identification of potential taxonomy terms from a set of documents. Taxonomists often use either standalone applications or TMS modules to investigate word frequency, compound phrases, and the overall relevancy of terms. These tools play an important part in taxonomy development and validation processes, and we recommend using a TMS to handle NLP analysis and tagging of documents at scale.

BERTopic is an innovative topic modeling approach that is remarkably flexible in handling various information formats and can identify hierarchical relationships with adjustable levels of detail. BERTopic uses document embedding and clustering to add additional layers of analysis and processing to the traditional NLP approach of term frequency-inverse document frequency, or TF-IDF, and can incorporate LLMs to generate topic labels and summaries for topics. For organizations with a well-developed semantic model, the BERTopic technique can also be used for supervised classification, sentiment analysis, and topic tagging. Topic modeling is a useful tool for providing another dimension with which taxonomists can review documents, and demonstrates how LLMs and ML frameworks can be used for analysis and classification. 
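For readers who want to experiment, a minimal BERTopic run looks roughly like the sketch below; the corpus loader, parameters, and the manual review step are illustrative assumptions rather than a tuned pipeline.

```python
# A minimal sketch of topic modeling a document set with BERTopic.
# load_documents() is a hypothetical helper; real runs benefit from a few
# hundred documents or more for stable clusters.
from bertopic import BERTopic

documents = load_documents()  # hypothetical: returns a list of document strings

topic_model = BERTopic(min_topic_size=15, verbose=True)
topics, probabilities = topic_model.fit_transform(documents)

# Review discovered topics and their top terms as candidate taxonomy input.
print(topic_model.get_topic_info().head(20))
print(topic_model.get_topic(0))  # top words and weights for a single topic
```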

 

Pitfalls

Taxonomy Management

One of the most requested features we have heard about from clients is the use of agentic AI to handle long-term updates to and expansion of a taxonomy. Despite the desire for a magic bullet that will allow an organization to scale up its taxonomy use without additional headcount, to date no ML or AI application or framework can replace human decision-making in this sphere. As the following pitfalls show, taxonomy management still requires human judgement to determine whether decisions are appropriate, align with organizational understanding and business objectives, support taxonomy scaling, and more.

Human Expertise and Contextual Understanding

Taxonomy management requires discussions with experts in a subject area and the explicit capture of their information categories. Many organizations struggle with knowledge capture, especially for tacit knowledge gained through experience. Taxonomies that are designed with only document inputs will fail to capture this important implicit information and language, which can lead to issues in utilization and adoption. 

These taxonomies may struggle to handle instances where common terms are used differently in a business context, along with terms whose definitions are ambiguous. For example, “Product” at an organization may refer not only to purchasable goods and services, but also to internal data products and APIs, or even not-for-profit offerings like educational materials and research. And within a single taxonomy term, such as “Data Product”, there may be competing ideas of scope and definition across the organization that need to be brought into alignment before the term can be used accurately.

Content Quality and Bias

AI taxonomy tools are dependent on the quality of the content used to train them. Content cleanup and management is a difficult task, and unfortunately many businesses lag behind in both capturing up-to-date information and deprecating or removing out-of-date information. This can lead to taxonomies that are out of step with the modern understanding of a field. Additionally, if the documents used have a bias towards a particular audience, stakeholder group, or view of a topic, then the taxonomy terms and definitions suggested by the AI will reflect that bias, even if that audience, stakeholder group, or view is not aligned with your organization. I've seen this problem come up when trying to use press releases and news to generate taxonomies: the results are too generic, vague, and public-facing, rather than expert-oriented, to be of much use.

Governance Processes and Decision Making

Similar to the pitfalls of using AI for taxonomy management, governance and decision making are another area where human judgement is required to ensure that taxonomies are aligned to an organization’s initiatives and strategic direction. Choosing whether undertagged terms should be sunsetted or updated, responding to changes in how words are used, and identifying new domain areas for taxonomy expansion are all tasks that require conversation with content owners and users, alongside careful consideration of consequences. As a result, ultimate taxonomy ownership and responsibility should lie with trained taxonomists or subject matter experts.

AI Scalability

There are two challenges to using AI alongside taxonomies. The first challenge is the shortage of individuals with the specialized expertise required to scale AI initiatives from pilot projects to full implementations. In today’s fast-evolving landscape, organizations often struggle to find taxonomists or semantic engineers who can bridge deep domain knowledge with advanced machine learning skills. Addressing this gap can take two main approaches. Upskilling existing teams is a viable strategy—it is cost-effective and builds long-term internal capability, though it typically requires significant time investment and may slow progress in the short term. Alternatively, partnering with external experts offers immediate access to specialized skills and fresh insights, but it can be expensive and sometimes misaligned with established internal processes. Ultimately, a hybrid approach—leveraging external partners to guide and accelerate the upskilling of internal teams—can balance these tradeoffs, ensuring that organizations build sustainable expertise while benefiting from immediate technical support.

The second challenge is overcoming infrastructure and performance limitations that can impede the scaling of AI solutions. Robust and scalable infrastructure is essential for keeping data latency low, preserving data integrity, and managing storage costs as the volume of content and the complexity of taxonomies grow. For example, an organization might experience significant delays in real-time content tagging when migrating a legacy database to a cloud-based system, affecting overall efficiency. Similarly, a media company processing vast amounts of news content can encounter bottlenecks in automated tagging, document summarization, and cross-mapping, resulting in slower turnaround times and reduced responsiveness. One mitigation strategy is to leverage scalable cloud architectures, which offer dynamic resource allocation that automatically adjusts computing power based on demand, directly reducing latency and enhancing performance. Additionally, implementing continuous performance monitoring to detect system bottlenecks and data integrity issues early ensures that potential problems are addressed before they impact operations.

 

Closing

Advances in AI, particularly with large language models, have opened up transformative opportunities in taxonomy development and the utilization of semantic technologies in general. Yet, like any tool, AI is most effective when its strengths are matched with human expertise and a well-thought-out strategy. When combined with the insights of domain experts, ML/AI not only streamlines productivity and uncovers new layers of content understanding but also accelerates the rollout of innovative applications.

Our experience shows that overcoming the challenges of expertise gaps and infrastructure limitations through a blend of internal upskilling and strategic external partnerships can yield lasting benefits. We’re committed to sharing these insights, so if you have any questions or would like to explore how AI can support your taxonomy initiatives, we’re here to help.

The Importance of a Semantic Layer in a Knowledge Management Technology Suite

One of the most common Knowledge Management (KM) pitfalls at any organization is the inability to find fresh, reliable information at the time of need. 

One of the most prominent causes that EK has seen recently, if not the most prominent, is that an organization possesses multiple content repositories that lack a clear intention or purpose. As a result, users are forced to visit each repository within their organization's technology landscape one at a time in order to search for the information they need. Further, this problem is often exacerbated by other KM issues, such as a lack of proper search techniques, organizational mismanagement of content, and content sprawl and duplication. In addition to a loss in productivity, these issues lead to rework, individuals making decisions on outdated information, employees losing precious working time trying to validate information, and users relying on experts for information they cannot find on their own.

Along with a solid content management and KM strategy, EK recommends that clients experiencing these types of findability issues also seek solutions at the technical level. It is critical that organizations take the opportunity to streamline the way their users access the information they need to do their jobs; doing so reduces the time and effort users spend searching for information and eases the challenges described above. This blog will explain how organizations can proactively mitigate the challenges of information siloed in different applications by instituting a unique set of technical solutions, including taxonomy management systems, metadata hubs, and enterprise search.

With the abundance and variety of content that organizations typically possess, it is often unrealistic to have one repository that houses all types of content. There are very few, if any, content management systems on the market that can optimally support the storage of every type of content an organization may have, let alone possess the search and metadata capabilities required for proper content management. Organizations can address this dilemma by having a unified, centralized search experience that is able to search all content repositories in a secure and safe manner. This is achieved through the design and implementation of a semantic layer – a combination of unique solutions that work together to provide users one place to go to for searching for content, but behind the scenes allow for the return of results from multiple locations.

In the following sections, I will illustrate the value of Taxonomy Management Systems, Enterprise Search, and Metadata Hubs that make up the semantic layer, which collectively enable a unique and highly beneficial set of solutions.

The semantic layer is made up of three main systems/solutions: a Taxonomy Management System (TMS), an Enterprise Search (ES) tool, and a Metadata Hub.

Taxonomy Management Systems

In order to pull consistent data values back from different sources and filter, sort, and facet that data, there must be a taxonomy in place that applies to all content, in all locations. This is achieved by implementing an enterprise TMS, which can be used to create, manage, and apply an enterprise-wide taxonomy to content in every system. This matters because there are likely already multiple, separate taxonomies built into various content repositories that differ from one another and therefore cannot be leveraged in one system. An enterprise-wide taxonomy allows for the design of a taxonomy that applies to all content, regardless of its type or location. An additional benefit of having an enterprise TMS is that organizations can utilize the system's auto-tagging capabilities to assist in the tagging of content in various repositories. Most, if not all, major contenders in the TMS industry provide auto-tagging capabilities, and organizations can use them to significantly reduce the burden on content authors and curators to manually apply metadata to content. Once integrated with content repositories, the TMS can automatically parse content, assign metadata based on a controlled vocabulary (stored in the enterprise taxonomy), and return those tags to a central location.
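Because most TMS products can exchange vocabularies as SKOS, it may help to see what a single enterprise concept with a preferred label, synonyms, and a parent looks like; the sketch below uses rdflib, and the URIs and labels are invented purely for illustration.

```python
# A minimal sketch of one enterprise taxonomy concept expressed in SKOS
# with rdflib. URIs and labels are invented for illustration only.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.org/taxonomy/")
g = Graph()
g.bind("skos", SKOS)

product = URIRef(EX["product"])
auto = URIRef(EX["auto-insurance"])

g.add((product, RDF.type, SKOS.Concept))
g.add((product, SKOS.prefLabel, Literal("Product", lang="en")))

g.add((auto, RDF.type, SKOS.Concept))
g.add((auto, SKOS.prefLabel, Literal("Auto Insurance", lang="en")))
g.add((auto, SKOS.altLabel, Literal("Car Insurance", lang="en")))      # synonym for search
g.add((auto, SKOS.altLabel, Literal("Vehicle Insurance", lang="en")))
g.add((auto, SKOS.broader, product))                                   # hierarchy

print(g.serialize(format="turtle"))
```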

Metadata Hub

The next piece of the semantic layer puzzle is a metadata hub. We often find that one or more content repositories in an organization's KM ecosystem lack the necessary metadata capabilities to describe and categorize content. This matters because metadata facilitates the efficient indexing and retrieval of content. A 'metadata hub' can help alleviate this dilemma by effectively giving those systems the metadata capabilities they need, as well as creating a single place to store and manage that metadata. The metadata hub, when integrated with the TMS, can apply the taxonomy, tag content from each repository, and store those tags in a single place for a search tool to index.

This metadata hub acts as a 'manage in place' solution: it points to content in its source location. The tags and metadata being generated are stored only in the metadata hub and are not 'pushed' down to the source repositories. Pushing tags down can be achieved with additional development, but it is generally avoided so as not to disrupt the integrity of content within its respective repository. The main goal is to have one place that contains metadata about all content in all repositories, with that metadata based on a shared, enterprise-wide taxonomy.
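To make 'manage in place' concrete, the sketch below shows one shape a hub record might take: a pointer to the content in its source system plus centrally stored tags, with nothing written back to the repository. The field names and source systems are invented for illustration, not a product schema.

```python
# A minimal sketch of a manage-in-place metadata hub record.
# Field names and source systems are illustrative, not a product schema.
from dataclasses import dataclass, field

@dataclass
class HubRecord:
    source_system: str                              # e.g. "SharePoint", "Salesforce"
    source_uri: str                                 # where the content actually lives
    content_type: str
    tags: list[str] = field(default_factory=list)   # enterprise taxonomy terms

# Tags live only in the hub; the source repository itself is never modified.
record = HubRecord(
    source_system="SharePoint",
    source_uri="https://intranet.example.org/sites/claims/policy-overview.docx",
    content_type="Procedure",
    tags=["Auto Insurance", "Claims Processing"],
)
```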

Enterprise Search

The final component of the semantic layer is Enterprise Search (ES). This is the piece that allows individuals to perform a single search rather than visiting multiple systems and performing multiple searches, which is far from the optimal search experience. The ES tool is what individuals will use to execute queries for content across multiple systems, and it includes the ability to filter, facet, and sort content to narrow down search results. In order for the search tool to function properly, there must be integrations between the source repositories, the metadata hub, and the TMS solution. Once these connectors are established, the search tool can query each source repository with the search criteria provided by the user, and then return metadata and additional information made available by the TMS and metadata hub. The result is a faceted search solution similar to what we are all familiar with on Amazon and other leading e-commerce websites. These three systems work together not only to alleviate the issues created by a lack of metadata functionality in source repositories, but also to give users a single place to find anything and everything that relates to their search criteria.
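As an illustration of what the search side of the semantic layer works with, the sketch below shows an Elasticsearch-style query body that combines a keyword search with taxonomy-driven facet filters and facet counts; the index fields and values are assumptions, and other search engines express the same idea with their own syntax.

```python
# A minimal sketch of a faceted query against a unified search index.
# Field names and facet values are illustrative; the structure follows the
# Elasticsearch query DSL, but any engine with facets works similarly.
query_body = {
    "query": {
        "bool": {
            "must": [{"match": {"full_text": "auto insurance claims"}}],
            "filter": [
                {"term": {"content_type": "Article"}},        # taxonomy-backed facets
                {"term": {"source_system": "SharePoint"}},
            ],
        }
    },
    "aggs": {  # facet counts returned alongside results for further narrowing
        "by_topic": {"terms": {"field": "topic"}},
        "by_source": {"terms": {"field": "source_system"}},
    },
    "size": 20,
}
```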

Bringing It All Together

The value of a semantic layer can be exemplified through a common use case:

Let's say you are trying to find out more information about a certain topic within your organization. You would love to perform a single search for everything related to this topic, but realize that you have to visit multiple systems to do so. One of your content repositories stores digital media (videos and pictures), another stores scholarly articles, and another stores information on individuals who are experts on the topic. There could be many more repositories, and you must visit and search each one separately to gather the information you need. This takes considerable time and effort and, in a best-case scenario, makes for a painstakingly long search process. In a worst-case scenario, content is missed and the research is incomplete.

With the introduction of the semantic layer, searchers only have to visit one location and perform a single search. When doing so, they see results from each individual repository all in one place. Additionally, searchers have extensive metadata on each piece of content to filter by, ensuring that they find the information they are looking for. Normally, when we build these semantic layers, the search gives users the option to narrow results by source system, content type (article, person, digital media), date created or modified, and more. Once searchers have found their desired content, a convenient link takes them directly to the content in its respective repository.

Closing

The increasingly common issue of having multiple, disparate content repositories in a KM technology stack is one that causes organizations to lose valuable time and effort, while hindering employees’ ability to efficiently find information through mature, proven metadata and search capabilities. Enterprise Knowledge (EK) specializes in the design and implementation of the exact systems mentioned above and has proven experience building out these types of technologies for clients. If your company is facing issues with the findability of your content, struggling with having to search for content in multiple places, or even finding that searching for information is a cumbersome task, we can help. Contact us with any questions you have about how we can improve the way your organization searches for and finds information within your KM environment.

Taxonomy Implementation Best Practices

Have you ever found yourself wondering how to implement a taxonomy you've just designed or updated? You might have asked yourself, “How do I make this taxonomy work in SharePoint? In Salesforce? In Oracle Knowledge Advanced?” You are not alone. Many of our clients struggle not just with how to design the right taxonomy for their content, but with how to implement it in a way that allows them to realize all those benefits we know taxonomies can bring. In my years designing and implementing taxonomies, I've come to understand that taxonomies are only as good as their application. In this blog I will talk through some of the important considerations for taxonomy implementation and how preparing for them will help ensure a smoother implementation and a long life for your taxonomy.

Taxonomy Implementation Considerations

Two examples of topic taxonomies: a single-level taxonomy made up of flat lists, and a multi-level taxonomy with hierarchical lists containing multiple parent-child relationships.

One of the most important things to keep in mind with taxonomy development or maintenance and its subsequent implementation is the primary use case(s) that the taxonomy must support. For example, the taxonomy will often serve users in one of three primary use cases: search findability, browsing, and content management. The use case should also inform, or be informed by, the method of application, meaning the system(s) within which the taxonomy will be stored, maintained, and utilized. While most taxonomies, especially business taxonomies, are designed to be system agnostic and flexible, namely so they can be used in more than one system or location, it is important to know the limitations and features of the systems that will leverage the taxonomy while it is being developed. For example, if your intended system does not support multi-level hierarchies, you may want to consider designing a taxonomy with a deconstructed hierarchy, or multiple flat facet lists instead of a deep hierarchy. Alternatively, if you have not yet selected the system, your defined taxonomy use cases can assist in evaluating the limitations of potential systems (e.g., customizable search filters, synonym dictionaries, hierarchical topic facets).

Depending on the features of your systems and your long-term goals, you may also consider a taxonomy management tool as the source system for the taxonomy. Taxonomy management tools assist in mitigating limitations within a content management system, in supporting the use of the taxonomy in more than one system without adding to the maintenance burden, and in ensuring you have the foundation for more advanced use cases including ontologies, knowledge graphs, and Enterprise Artificial Intelligence (AI).
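When a taxonomy management tool is the source system, implementation usually means pulling the published vocabulary over the tool's API and pushing it into each downstream system. The sketch below shows the general shape of that sync; the endpoints, payload fields, and CMS call are all hypothetical, since every TMS and CMS exposes its own API.

```python
# A minimal sketch of syncing a published taxonomy from a TMS into a CMS.
# Both endpoints and the payload shape are hypothetical placeholders.
import requests

TMS_EXPORT_URL = "https://tms.example.org/api/taxonomies/products/export"   # hypothetical
CMS_TERMSTORE_URL = "https://cms.example.org/api/term-sets/products"        # hypothetical

def sync_taxonomy() -> None:
    # Pull the published vocabulary from the taxonomy management tool.
    taxonomy = requests.get(TMS_EXPORT_URL, timeout=30).json()

    # Reshape it into whatever structure the target CMS expects.
    terms = [
        {"label": concept["prefLabel"], "parent": concept.get("broader")}
        for concept in taxonomy.get("concepts", [])
    ]

    # Push the terms into the CMS term store so tagging stays consistent everywhere.
    response = requests.put(CMS_TERMSTORE_URL, json={"terms": terms}, timeout=30)
    response.raise_for_status()

if __name__ == "__main__":
    sync_taxonomy()
```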

Defining Taxonomy Use Cases

First, let's define the four most common use cases, including those mentioned above. 

  1. Search Findability: Taxonomy facets support search through both synonym dictionaries and categorical facets. Synonyms accommodate the varied language of different user groups, allowing one person to search for “Auto” and another to search for “Vehicle” (a sketch of synonym-based query expansion follows this list). Facets allow users to narrow their search through defined and optimized categories that represent the different types of information about the content. For example, an insurance company might need a facet for “Product” to allow users to filter by “Auto Insurance” vs “Home Insurance”.
  2. Browsing: Similar to facets, the categories within a taxonomy can be selected and optimized to provide navigation or browsing as an option to find or explore content. This might be seen in the top header of a website, and allow people to select a Product and be taken to a landing page of some kind where all content about that product can be found.
  3. Content Management and Tagging: Taxonomies also often support the content management lifecycle through fields such as Content Type (Article, Procedure) or Status (Draft, Published, Deprecated) in addition to tagging valuable information to help manage each item.
  4. Recommendation Engines and/or Chatbots: Taxonomies provide the foundation for advanced use cases such as recommendation engines and chatbots. In these cases, a taxonomy may be larger, deeper, and more complex to assist in disambiguation and machine learning techniques, rather than assisting a user in navigating a website. 
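Below is a minimal sketch of the synonym-based query expansion referenced in the first use case above: each query word is expanded with its synonym ring from the taxonomy before the query reaches the search engine. The synonym data and the OR/AND expansion rule are simplified assumptions; most search platforms apply synonyms through their own configuration instead.

```python
# A minimal sketch of taxonomy-driven synonym expansion for search findability.
# The synonym rings are illustrative; in practice they come from the taxonomy's
# alternative labels.
SYNONYMS = {
    "auto": {"auto", "automobile", "car", "vehicle"},
    "home": {"home", "house", "residential"},
}

def expand_query(query: str) -> str:
    """Expand each query word with its synonym ring, joined with OR."""
    expanded_terms = []
    for word in query.lower().split():
        ring = SYNONYMS.get(word, {word})
        expanded_terms.append("(" + " OR ".join(sorted(ring)) + ")")
    return " AND ".join(expanded_terms)

print(expand_query("auto insurance"))
# (auto OR automobile OR car OR vehicle) AND (insurance)
```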

Defining Taxonomy Implementation Methods

Taxonomies can either be implemented directly in the content or document management system(s) of your choice, or can be implemented within a taxonomy management tool that connects to your Content Management System (CMS) via APIs. There are pros and cons to both options, but the main criteria for choosing a Taxonomy Management System include complexity of the taxonomy, use of the taxonomy in multiple, separate systems, and limitations for the taxonomy in one of the intended systems.

Common Implementation Challenges

Often, what we see as implementation challenges fit into one of three categories: 


  • System limitations: This is a consideration when one or more of the intended systems (often content or document management systems) is less than advanced in taxonomy management capabilities. This often can include a lack of features able to store or display hierarchies, inability to store synonyms for terms, inability to display multi-select lists, and inability or difficulty indicating required fields. In any of these cases, it is important to understand what tweaks should be made to the taxonomy (e.g. removing hierarchy, adding synonyms in a keywords field) to fit within the constraints, what impact those changes might have on usability or other systems, or identify the level of effort for customizing the system to address its limitations. 
  • Taxonomy limitations/updates: A second common challenge is not actually a challenge at all. It is part of the process of taxonomy design and maintenance. Implementation often illuminates needed changes or additions to a taxonomy that may not have been identified or prioritized during the initial design. This may include a missing metadata field that needs to be designed, a lack of sufficient synonyms to support search, or the need for content types to assist in flexible, custom implementation options for different types of content or different user groups. 
  • Tagging content: An important component of taxonomy implementation is tagging content with the new taxonomy. Manual and assisted tagging approaches provide a range of options. Assisted tagging can include using a text extraction tool to suggest tags, or including logic in a migration script to map and apply tags. Often, a mix of both approaches is needed to apply the full taxonomy accurately (a minimal sketch of assisted tag suggestion follows this list). For example, text extraction tools can auto-tag content with topical taxonomy terms that align well with the content's text, while migration scripts and mapping may be better suited for tagging fields that are similar in the current state. Finally, manual tagging may be needed for new, administrative fields that are not accurately covered by the first two approaches. 
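As referenced in the tagging item above, here is a minimal sketch of assisted tag suggestion: match taxonomy labels and their synonyms against the document text and leave acceptance to a human reviewer. The taxonomy structure and the matching rule are simplified assumptions; a TMS or text extraction tool handles this far more robustly with stemming, phrase detection, and relevance scoring.

```python
# A minimal sketch of assisted tag suggestion against a small taxonomy.
# Label/synonym matching is deliberately naive and purely illustrative.
TAXONOMY = {
    "Auto Insurance": ["auto insurance", "car insurance", "vehicle insurance"],
    "Home Insurance": ["home insurance", "homeowners insurance"],
    "Claims Processing": ["claims processing", "claim handling"],
}

def suggest_tags(text: str) -> list[str]:
    """Return candidate tags whose labels or synonyms appear in the text."""
    lowered = text.lower()
    return [
        term
        for term, variants in TAXONOMY.items()
        if any(variant in lowered for variant in variants)
    ]

doc = "This procedure explains claim handling for car insurance policies."
print(suggest_tags(doc))  # suggestions only; a reviewer confirms before tagging
# ['Auto Insurance', 'Claims Processing']
```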

Tips to Mitigate Implementation Challenges

Remember that a taxonomy is a living, changing thing, and taxonomy governance is of the utmost importance. Don't be afraid to make adjustments, but first ensure you understand the requirements, the options, and the impact on the taxonomy in other systems as well. Focus on your primary use case, e.g. findability, and its benefits for your users to help navigate implementation challenges such as system limitations or complex migrations. And finally, document, document, document: record your changes, the reasons for the changes, and any system specifics that you've encountered and adjusted for. This will be important for the longevity of the taxonomy and your implementation, reducing the need for rework.

Are you currently designing or working to implement a taxonomy for your organization? We would be happy to help guide you through this process and work alongside you to ensure the taxonomy implementation is optimized for your organization, use cases, and systems. Contact us to learn more.
