Capture as You Work: Embedding Knowledge Capture in Daily Work

Knowledge capture is most effective when it is embedded as part of your daily work, not an occasional task. But we know that it is easier said than done. 

Enterprise Knowledge regularly hears from our clients that: 

  • “We don’t have time for documentation with everything going on.”
  • “We’re not sure how to capture knowledge in a way that is useful to others.”
  • “People don’t know what they can or can’t share.”

These are real barriers, and this blog and accompanying infographic address them directly. It is not about doing more; it is about working smarter by embedding lightweight, effective knowledge-sharing habits into what you are already doing. Over time, these habits create durable knowledge assets that strengthen organizational memory and prepare your content and data for AI.

 

Integrate Knowledge Capture Into the Flow of Work

Small changes can make a big impact, especially when they reduce friction and feel like a natural part of the workday. Start by using familiar tools to ensure employees can document and share knowledge within the platforms they already use. This lowers barriers to participation and makes it easier to integrate knowledge sharing into the flow of work.

Standardized templates offer a simple, structured way to capture lessons learned, best practices, and key insights. The templates themselves serve as a guide, prompting employees on what details to capture and where those details belong. This reduces the cognitive load and guesswork that often gets in the way of documenting knowledge.

To reinforce the habit, build knowledge capture tasks into process and project checklists, or use workflow triggers that remind employees when it is time to reflect and share. Until knowledge-sharing practices are fully embedded, timely prompts help ensure action happens at the right moment.

Some moments naturally lend themselves to knowledge capture, such as project closeouts, after client interactions, during onboarding, or following major decisions. These are high-value opportunities where small, structured contributions can have an outsized impact. Our blog on High Value Moments of Content Capture expands on this by showing how to identify the right moments and implement simple practices to capture knowledge effectively when it matters most.

 

Automate Where You Can

Leverage automated and AI-powered processes to further enhance knowledge capture, minimizing manual effort and making information more accessible through intelligent solutions such as:

  • Automated meeting transcription and indexing capture discussions with minimal effort, converting conversations into structured content that is searchable and readily available for reference.
  • AI-powered recommendations proactively surface relevant documentation within collaboration tools, reducing the need for employees to search for critical information manually.
  • Auto-classification of content streamlines knowledge organization by automatically tagging and categorizing information, ensuring documents and insights are consistently structured and easy to retrieve.
  • AI-driven named entity recognition (NER) automatically extracts and tags key information in real-time, transforming unstructured content into easily searchable and actionable knowledge.
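As a concrete illustration of the named entity recognition item above, here is a minimal sketch using the open-source spaCy library; the model name, sample note, and output shape are illustrative assumptions rather than a prescribed toolchain:

```python
# Minimal NER-based tagging sketch using spaCy (pip install spacy, then
# python -m spacy download en_core_web_sm). Illustrative, not prescriptive.
import spacy

nlp = spacy.load("en_core_web_sm")  # small general-purpose English model

def extract_entity_tags(text: str) -> dict[str, list[str]]:
    """Group the entities spaCy recognizes in `text` by entity type."""
    doc = nlp(text)
    tags: dict[str, list[str]] = {}
    for ent in doc.ents:
        tags.setdefault(ent.label_, []).append(ent.text)
    return tags

note = "Acme Corp approved the Q3 budget with Jane Doe on June 4, 2024."
print(extract_entity_tags(note))
# e.g. {'ORG': ['Acme Corp'], 'PERSON': ['Jane Doe'], 'DATE': ['June 4, 2024']}
```

In practice, entity tags like these would be written back to the document's metadata so the captured knowledge becomes searchable as soon as it is created.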

 

Closing Thoughts

When knowledge capture is built into existing workflows, rather than treated as a separate activity, staff do not have to choose between sharing what they know and doing their job. The goal is not perfection; it is progress through building consistent, low-effort habits.

Whether your organization is just starting to explore knowledge capture or is ready to scale existing practices with automation, EK can help. Our approach is practical and tailored: we will meet you where you are and co-design right-sized solutions that fit your current capacity and goals. Contact us to learn more.

Auto-Classification for the Enterprise: When to Use AI vs. Semantic Models

Auto-classification is a valuable process for adding context to unstructured content. As a matter of terminology, some practitioners distinguish between auto-classification (placing content into pre-defined categories from a taxonomy) and auto-tagging (assigning unstructured keywords or metadata, sometimes generated without a taxonomy). In this article, I use ‘auto-classification’ in the broader sense, encompassing both approaches. While it can take many forms, its primary purpose remains the same: to automatically enrich content with metadata that improves findability, helps users immediately determine relevance, and provides crucial information on where content came from and when it was created. And while tagging content is always a recommended practice, it is not always scalable when human time and effort are required to perform it. To solve this problem, we have been helping organizations automate this process and minimize the amount of manual effort required, especially in the age of AI, where organized and well-labeled information is the key to success.

This includes designing and implementing auto-classification solutions that save time and resources, using methods such as natural language processing, machine learning, and rapidly evolving AI models such as large language models (LLMs). In this article, I will demonstrate how auto-classification processes can deliver measurable value to organizations of all sizes and industries, using real-world examples to illustrate the costs and benefits. I will then give an overview of common methods for performing auto-classification, comparing their high-level strengths and weaknesses, and conclude by discussing how incorporating semantics can significantly enhance the performance of these methods.

How Can Auto-Classification Help My Organization?

It’s a good bet that your organization possesses a large repository of unstructured information such as documents, process guides, and informational resources, either meant for internal use or for display on a public webpage. Such a collection of knowledge assets is valuable – but only as valuable as the organization’s ability to effectively access, manage, and utilize them. That’s where auto-classification can shine: by serving as an automated processor of your organization’s unstructured content and applying tags, an auto-classifier quickly adds structure that provides value in multiple ways, as outlined below.

Time Savings

First, an auto-classifier saves content creators time in two key ways. For one, manually reading through documents and applying metadata tags to each individually can be tedious, taking time away from content creators’ other responsibilities; auto-classification frees up that time for more crucial tasks. On the other end of the process, auto-classification and the use of metadata tags can improve findability, saving employees time when searching for documents. When paired with a taxonomy or set list of terms, an auto-classifier can standardize the search experience by ensuring content is consistently tagged with standard language.

Content Management and Strategy

These standard tags can also play a role in more content strategy-focused efforts, such as identifying gaps in content and content deduplication. For example, if some taxonomy terms feature no associated content, content strategists and managers may identify an organizational gap that needs to be filled via the authoring of new content. In contrast, too many content pieces identified as having similar themes can be deduplicated so that the most valuable content is prioritized for end users. These analytics-based decisions can help organizations maximize the efficacy of their content, increase content reach, and cut down on the cost of storing duplicate content. 

Ensuring Security

Finally, we have seen auto-classification play a key role in keeping sensitive content and information secure. Auto-classifiers can determine what content should be tagged with certain sensitivity classifications (for example, employee addresses being tagged as visible by HR only). One example of this is through dark data detection, where an auto-classifier parses through all organizational content to identify information that should not be visible to all end users. Assigning sensitivity classifications to content through auto-tagging can help to automatically address security concerns and ensure regulatory compliance, saving organizations from the reputational and legal costs associated with data leaks. 

Common Auto-Classification Methods

[Infographic: six common auto-classification methods: rules-based tagging, regular expression tagging, frequency-based tagging, natural language processing, machine learning-based tagging, and LLM-based tagging]

So, how do we go about tagging content automatically? Organizations can choose to employ one of a number of methods as a standalone solution, or combine them as part of a hybrid solution. Below, I will give a high-level overview of six of the most commonly used methods in auto-classification, along with some considerations for each.

1. Rules-Based Tagging: Uses deterministic rules to map content to tags. Rules can be built from dictionaries/keyword lists, proximity or co-occurrence patterns (e.g., “treatment” within 10 words of “disorder”), metadata values (author, department), or structural cues (headings, templates).

  • Considerations: Highly transparent and auditable; great for regulated/compliance use cases and domain terms with stable phrasing. However, rules can be brittle, require ongoing maintenance, and may miss implied meaning or novel phrasing unless rules are continually expanded.

2. Regular Expression (RegEx) Tagging: A specialized form of rules-based tagging that applies RegEx patterns to detect and tag structured strings (for example, SKUs, case numbers, ICD-10 codes, dates, or email addresses).

  • Considerations: Excellent precision for well-formed patterns and semi-structured content; lightweight and fast. Can produce false positives without careful validation of results. Best combined with other methods (such as frequency or NLP) for context checks.
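For illustration, a minimal RegEx tagging sketch is shown below; the patterns (a case-number format, ICD-10-style codes, email addresses) are assumptions for demonstration, and real deployments would validate matches in context:

```python
# Minimal regular-expression tagging sketch. Pattern definitions are
# illustrative assumptions, not production-grade validators.
import re

PATTERNS = {
    "case_number": re.compile(r"\bCASE-\d{6}\b"),
    "icd10_code": re.compile(r"\b[A-TV-Z]\d{2}(?:\.\d{1,4})?\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def regex_tags(text: str) -> dict[str, list[str]]:
    """Return each pattern name that matches, with the strings it found."""
    return {
        name: matches
        for name, pattern in PATTERNS.items()
        if (matches := pattern.findall(text))
    }

print(regex_tags("Contact jdoe@example.com about CASE-004217 (dx: J45.40)."))
# e.g. {'case_number': ['CASE-004217'], 'icd10_code': ['J45.40'],
#       'email': ['jdoe@example.com']}
```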

3. Frequency-Based Tagging: Frequency-based tagging considers the number of times that a certain term (or variations of that term) appears in a document, and assigns the most frequently appearing tags to the content. Early search engines, website indexers, and tag-mining software relied heavily on this approach for its simplicity and transparency; however, the frequency of a term does not always guarantee its importance.

  • Considerations: Works well with a well-structured taxonomy with ample synonyms for terms, as well as content that has key terms appear frequently. Not as strong a method when meaning is implied/terms are not explicitly used or terms are excessively repeated.
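A minimal sketch of frequency-based tagging against a small taxonomy with synonyms might look like the following; the taxonomy terms and the cutoff are illustrative assumptions:

```python
# Minimal frequency-based tagging sketch: count taxonomy-term occurrences
# (including synonyms) and keep the most frequent terms that appear at all.
from collections import Counter

TAXONOMY = {
    "Clean Energy": ["clean energy", "solar energy", "renewables"],
    "Knowledge Management": ["knowledge management", "km"],
}

def frequency_tags(text: str, top_n: int = 2) -> list[str]:
    """Return the top taxonomy terms by total occurrence count in `text`."""
    lowered = text.lower()
    counts = Counter()
    for term, synonyms in TAXONOMY.items():
        counts[term] = sum(lowered.count(s) for s in synonyms)
    return [term for term, count in counts.most_common(top_n) if count > 0]
```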

4. Natural Language Processing (NLP): Uses computational analysis of text (tokenization and basic calculations of semantic meaning) to find the best matches by meaning between two pieces of text (such as a content piece and terms in a taxonomy).

  • Considerations: Can work well for terms that are not organization/domain-specific, but struggles with acronyms/more specific terms. Better than frequency-based tagging at determining implied meaning.

5. Machine Learning-Based Tagging: Machine learning methods allow for the training of models on pre-tagged content, empowering organizations to improve models iteratively for better results. By comparing new content against patterns they have already learned/been trained on, machine learning models can infer the most relevant concepts and tags to a content piece and apply them consistently. User input can help refine the classifier to identify patterns, trends, and domain-specific terms more accurately.

  • Considerations: A stock model may initially perform at a lower-than-expected level, while a well-trained model can deliver high-grade accuracy. However, this can come at the expense of time and computing resources.
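As a rough sketch of this approach, the following trains a simple scikit-learn text classifier on a few pre-tagged examples; the training texts and labels are illustrative assumptions, and a real model needs far more training data:

```python
# Minimal machine learning-based tagging sketch: TF-IDF features feeding a
# logistic regression classifier trained on pre-tagged documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Solar and wind generation capacity grew this quarter.",
    "Grid-scale battery storage for renewables expanded.",
    "New recordkeeping rules affect audit retention schedules.",
    "Regulators issued updated compliance reporting guidance.",
]
train_tags = ["Clean Energy", "Clean Energy", "Compliance", "Compliance"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_tags)

print(model.predict(["The utility added rooftop solar incentives."]))
# e.g. ['Clean Energy']
```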

6. Large Language Model (LLM)-Based Tagging: The newest form of auto-classification, this involves providing a large language model with a tagging prompt, content to tag, and a taxonomy/list of terms if desired. As interest around generative AI and LLMs grows, this method has become increasingly popular for its ability to parse more complex content pieces and analyze meaning deeply.

  • Considerations: Tags content like a human, meaning results may vary or become inconsistent if the same corpus is tagged multiple times. While LLMs can be smart regarding implied meaning and content sensitivity, they can be inconsistent without specific model tuning and prompt engineering. Additionally, LLMs suffer from accuracy/precision issues when fed a large taxonomy.
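A minimal sketch of LLM-based tagging is shown below, assuming the OpenAI Python client as a stand-in for whichever model an organization uses; the model name and taxonomy are illustrative, and production use would add prompt tuning, retries, and validation:

```python
# Minimal LLM-based tagging sketch: constrain the model to a taxonomy
# passed in the prompt, then validate the returned tags against it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TAXONOMY = ["Clean Energy", "Knowledge Management", "Compliance", "Healthcare"]

def llm_tags(content: str) -> list[str]:
    prompt = (
        "Tag the document below with the most relevant terms.\n"
        f"Use ONLY terms from this taxonomy: {', '.join(TAXONOMY)}.\n"
        "Return a comma-separated list of tags and nothing else.\n\n"
        f"Document:\n{content}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce run-to-run tagging inconsistency
    )
    raw = response.choices[0].message.content or ""
    # Keep only tags that are actually in the taxonomy (LLMs can drift).
    return [t.strip() for t in raw.split(",") if t.strip() in TAXONOMY]
```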

Some taxonomy and ontology management systems (TOMS), such as Graphwise PoolParty or Progress Semaphore, also offer auto-classification add-ons or extensions to their platforms that make use of one or more of these methods.

The Importance of Semantics in Auto-Classification

Imagine your repository of content as a bookstore, and your auto-classifier as the diligent (but easily confused!) store manager. You have a large number of books you want to sort into different categories, such as their audience (children, teen, adult) and genre (romance, fantasy, sci-fi, nonfiction).

Now, imagine if you gave your manager no instructions on how to sort the books. They start organizing too specifically. They put four books together on one shelf that says “Nonfiction books about history in 1814.” They put another three books on a shelf that says “Romance books in a fantasy universe with dragons.” They put yet another five books on a shelf that says “Books about knowledge management.” 

Before you know it, your bookstore has 1,098 shelves, and no happy customers. 

Therein lies the danger of tagging content without a taxonomy, leading to what’s known as semantic drift. While tagging without a taxonomy and creating an initial set of tags can be useful in some circumstances, such as when trying to generate tags or topics to later organize into a hierarchy as part of a taxonomy, it has its limitations. Tags often become very specific and struggle to maintain alignment in a way that makes them useful for search or for grouping larger amounts of content together. And, as I mentioned at the beginning of this article, auto-classification without a taxonomy in place is not auto-classification in the true sense of the word; rather, such approaches are auto-tagging, and may not produce the results business leaders/decision-makers expect.

I’ve seen this in practice when testing auto-classification methods with and without a taxonomy. When an LLM was given the same content corpus of 100 documents to tag, but in one run generated its own terms and in the other was given a taxonomy, the results differed greatly. The LLM without a taxonomy generated 765 extremely domain-specific terms that often applied to only a single content piece. In contrast, the LLM given a taxonomy tagged the content with 240 terms, allowing the same tags to apply to multiple content pieces. This created topic clusters and groups of similar content that users can easily browse, search, and navigate, making discovery faster, more intuitive, and less fragmented than when every piece is labeled with unique, one-off terms.

[Bar graph: precision, recall, and accuracy of LLMs with and without semantics]

Overall, incorporating a taxonomy into LLM-based auto-classification transforms fragmented, messy one-off tags into consistent topic clusters and hierarchies that make content easier to browse, search, and discover.

This illustrates the utility of a taxonomy in auto-classification. When you give your employee a list of shelves to stock in the store, they can avoid the “overthinking” of semantic drift and place books onto more well-architected shelves (e.g., Young Adult, Sci-Fi). A well-defined taxonomy acts as the blueprint for organizing content meaningfully and consistently using an auto-tagger.

 

When Should I Use AI, Semantic Models, or Both?

[Bar graphs: accuracy, precision, and recall of different auto-classification methods]

While results may vary by use case, methods that combine AI and semantic models tend to score higher across the board in accuracy, precision, and recall. These charts show results from one specific content corpus we tested internally.

As demonstrated above, tags created by generative AI models without any semantic model in place can become unwieldy and excessive, as LLMs look to create the best tag for an individual content piece rather than a tag that can serve as an umbrella term for multiple pieces of content. However, that does not completely eliminate AI as a standalone solution for all tagging use cases. These auto-tagging models and processes can prove helpful in the early stages of creating a term list, as a method of identifying common themes across content in a corpus and forming initial topic clusters that can later bring structure to a taxonomy, either in the form of hierarchies or facets. Once again, while not true auto-classification as the industry defines it, auto-tagging with AI alone can work well for domains where topics don’t neatly fit within a hierarchy, or where domain models and knowledge evolve so quickly that a hierarchical structure would be infeasible.

On the other hand, semantic models are a great way to add the aforementioned structure to an auto-classification process, and they work very well for exact or near-exact term matching. When combined with a frequency-based, NLP, or machine learning-based auto-classifier in these situations, they tend to excel in terms of precision, applying very few incorrect tags. Additionally, these methods perform well in situations where content contains domain-specific jargon or acronyms located within semantic models, as tagging places a greater emphasis on these exact matches.

Semantic models alone can prove to be a more cost-effective option for auto-classification as well, as lighter, less compute-heavy models that do not require paid cloud hosting can tag some content corpora with a high level of accuracy. Finally, semantic models can assist greatly in cases where security and compliance are paramount, as leading AI models are generally cloud-hosted, and most methods using semantics alone can be run on-premises without introducing privacy concerns.

Nonetheless, semantic models and AI can combine as part of auto-classification solutions that are more robust and well-equipped for complex use cases. LLMs can extract meaning from complex documents where topics may be implied and compare content against a taxonomy or term list, which helps ensure content is easy to organize and consistent with an organization’s model for knowledge. However, one key consideration with this method is taxonomy size – if a taxonomy grows too large (terms in the thousands, for example), an LLM may face difficulties finding/applying the right tag in a limited context window without mitigation strategies such as retrieving tags in batches. 
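As a sketch of the batching mitigation just mentioned, the helper below splits a large taxonomy into slices so each LLM call sees a manageable subset; `tag_batch` stands in for a single-call tagging function like the earlier sketch, and the batch size is an illustrative assumption:

```python
# Minimal sketch of tagging against a large taxonomy by batching terms so
# each LLM call fits comfortably in the context window.
from typing import Callable, Iterable

def batched(terms: list[str], size: int) -> Iterable[list[str]]:
    """Yield consecutive chunks of at most `size` taxonomy terms."""
    for i in range(0, len(terms), size):
        yield terms[i:i + size]

def tag_with_large_taxonomy(
    content: str,
    taxonomy: list[str],
    tag_batch: Callable[[str, list[str]], list[str]],  # e.g., an LLM call
    batch_size: int = 200,
) -> set[str]:
    """Collect candidate tags from one LLM call per taxonomy batch."""
    candidates: set[str] = set()
    for batch in batched(taxonomy, batch_size):
        candidates.update(tag_batch(content, batch))
    return candidates
```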

In more advanced use cases, an LLM can also be paired with an ontology, which can help LLMs understand more about interrelationships between organizational topics, concepts, and terms, and apply tags to content more intelligently. For example, a knowledge base of clinical notes and guidelines could be paired with a medical ontology that maps symptoms to potential conditions, and conditions to recommended treatments. An LLM that understands this ontology could tag a physician’s notes with all three layers (symptoms, conditions, and treatments) so when a doctor searches for “persistent cough,” the system retrieves not just symptom references, but also likely diagnoses (e.g., bronchitis, asthma) and corresponding treatment protocols. This kind of ontology-guided tagging makes the knowledge base more searchable and user-friendly and helps surface actionable insights instead of isolated pieces of information.
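To make the medical example concrete, here is a deliberately tiny, hypothetical symptom-condition-treatment mapping and a tagger that walks it; a real solution would use a formal ontology rather than a hard-coded dictionary:

```python
# Minimal ontology-guided tagging sketch. The mapping below is an
# illustrative assumption, not a real clinical ontology.
ONTOLOGY = {
    "persistent cough": {"conditions": ["bronchitis", "asthma"]},
    "bronchitis": {"treatments": ["rest and fluids", "bronchodilator"]},
    "asthma": {"treatments": ["inhaled corticosteroid"]},
}

def ontology_tags(note: str) -> dict[str, list[str]]:
    """Tag a note with symptoms found in it, plus inferred conditions and treatments."""
    tags: dict[str, list[str]] = {"symptoms": [], "conditions": [], "treatments": []}
    text = note.lower()
    for symptom, links in ONTOLOGY.items():
        if "conditions" in links and symptom in text:
            tags["symptoms"].append(symptom)
            for condition in links["conditions"]:
                tags["conditions"].append(condition)
                tags["treatments"] += ONTOLOGY.get(condition, {}).get("treatments", [])
    return tags

print(ontology_tags("Patient reports a persistent cough for three weeks."))
# e.g. {'symptoms': ['persistent cough'],
#       'conditions': ['bronchitis', 'asthma'],
#       'treatments': ['rest and fluids', 'bronchodilator',
#                      'inhaled corticosteroid']}
```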

In some cases, privacy or security concerns may dictate that AI cannot be used alongside a semantic model. In others, an organization may lack a semantic model and may only have the capacity to tag content with AI as a start. However, as a whole, the majority of use cases for auto-classification benefit from a well-architected solution that combines AI’s ability to intelligently parse content with the structure and specific context that semantic models provide.

Conclusion

Auto-classification adds an important layer of automation for organizations looking to enrich their content with metadata, whether for findability, analytics, or understanding. While there are many methods to choose from when exploring an auto-classification solution, they all rely on semantics, in the form of a well-designed taxonomy, to function to the best of their ability. Once implemented and governed correctly, these automated solutions can unblock human effort and redirect it away from tedious tagging processes, allowing your organization’s experts to get back to doing what matters most.

Looking to set up an auto-classification process within your organization? Want to learn more about auto-classification best practices? Contact us!

Optimizing Historical Knowledge Retrieval: Leveraging an LLM for Content Cleanup
 

The Challenge

Enterprise Knowledge (EK) recently worked with a Federally Funded Research and Development Center (FFRDC) that was having difficulty retrieving relevant content in a large volume of archival scientific papers. Researchers were burdened with excessive search times and the potential for knowledge loss when target documents could not be found at all. To learn more about the client’s use case and EK’s initial strategy, please see the first blog in the Optimizing Historical Knowledge Retrieval series: Standardizing Metadata for Enhanced Research Access.

To make these research papers more discoverable, part of EK’s solution was to add “about-ness” tags to the document metadata through a classification process. Many of the files in this document management system (DMS) were lower quality PDF scans of older documents, such as typewritten papers and pre-digital technical reports that often included handwritten annotations. To begin classifying the content, the team first needed to transform the scanned PDFs into machine-readable text. EK utilized an Optical Character Recognition (OCR) tool, which can “read” non-text file formats for recognizable language and convert it into digital text. When processing the archival documents, even the most advanced OCR tools still introduced a significant amount of noise in the extracted text. This frequently manifested as:

  • Tables, figures, and handwriting in the document being read in as random symbols and white space.
  • Random punctuation inserted where a spot or pen mark appeared on the file, breaking up words and sentences.
  • Excessive or misplaced line breaks separating related content.
  • Other miscellaneous irregularities that made the text less comprehensible.
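For context, the text extraction step described above can be sketched with open-source tools as follows; pytesseract and pdf2image are assumptions standing in for whichever OCR tool a project uses (both require system packages, Tesseract and Poppler respectively):

```python
# Minimal OCR extraction sketch: render each PDF page to an image, then
# extract the recognized text. Illustrative, not the engagement's toolchain.
import pytesseract
from pdf2image import convert_from_path

def ocr_pdf(path: str) -> str:
    """Return the OCR text of every page in the PDF at `path`."""
    pages = convert_from_path(path, dpi=300)  # higher DPI helps older scans
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

text = ocr_pdf("archival_report_1974.pdf")  # hypothetical file name
```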

The first round of text extraction using out-of-the-box OCR capabilities resulted in many of the above issues across the output text files. This starter batch of text extracts was sent to the classification model to be tagged. The results were assessed by examining the classifier’s evidence within the document for tagging (or failing to tag) a concept. Through this inspection, the team found that there was enough clutter or inconsistency within the text extracts that some irrelevant concepts were misapplied and other, applicable concepts were being missed entirely. It was clear from the negative impact on classification performance that document comprehension needed to be enhanced.

Auto-Classification
Auto-classification (also referred to as auto-tagging) is an advanced process that automatically applies relevant terms or labels (tags) from a defined information model (such as a taxonomy) to your data.

The Solution

To address this challenge, the team explored several potential solutions for cleaning up the text extracts. However, there was concern that direct text manipulation, if applied wholesale across the entire corpus, might lead to the loss of critical information. Rather than modifying the raw text directly, the team decided to leverage a client-side Large Language Model (LLM) to generate additional text based on the extracts. The idea was that the LLM could interpret the noise from OCR processing as irrelevant and produce a refined summary of the text that could be used to improve classification.

The team tested various summarization strategies via careful prompt engineering to generate different kinds of summaries (such as abstractive vs. extractive) of varying lengths and levels of detail. The team performed a human-in-the-loop grading process to manually assess the effectiveness of these different approaches. To determine the prompt to be used in the application, graders evaluated the quality of summaries generated per trial prompt over a sample set of documents with particularly low-quality source PDFs. Evaluation metrics included the complexity of the prompt, summary generation time, human readability, errors, hallucinations, and, of course, the precision of auto-classification results.
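A minimal sketch of the summarization step is shown below; the prompt wording and the OpenAI client are illustrative assumptions standing in for the client-side LLM and the iteratively engineered prompt described above:

```python
# Minimal sketch of generating an abstractive cleanup summary from noisy
# OCR text. Model and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # stand-in for the client-side LLM used in practice

def cleanup_summary(ocr_text: str) -> str:
    prompt = (
        "The text below was extracted from a scanned document by OCR and "
        "may contain stray symbols, broken words, and misplaced line breaks. "
        "Ignore the noise and write an abstractive summary of about four "
        "complete sentences describing what the document is about.\n\n"
        f"{ocr_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content or ""
```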

The EK Difference

Through this iterative process, the team determined that the most effective summaries for this use case were abstractive summaries (summaries that paraphrase content) of around four complete sentences in length. The selected prompt generated summaries with a sufficient level of detail (for both human readers and the classifier) while maintaining brevity. To improve classification, the LLM-generated summaries are meant to supplement the full text extract, not to replace it. The team incorporated the new summaries into the classification pipeline by creating a new metadata field for the source document. The new ‘summary’ metadata field was added to the auto-classification submission along with the full text extracts to provide additional clarity and context. This required adjusting classification model configurations, such as the weights (or priority) for the new and existing fields.

Large Language Models (LLMs)
A Large Language Model is an advanced AI model designed to perform Natural Language Processing (NLP) tasks, including interpreting, translating, predicting, and generating coherent, contextually relevant text.

The Results

By including the LLM-generated summaries in the classification request, the team was able to provide more context and structure to the existing text. This additional information filled in previous gaps and allowed the classifier to better interpret the content, leading to more precise subject tags compared to using the original OCR text alone. As a bonus, the LLM-generated summaries were also added to the document metadata in the DMS, further improving the discoverability of the archived documents.

By leveraging the power of LLMs, the team was able to clean up noisy OCR output to improve auto-tagging capabilities and further enrich document metadata with content descriptions. If your organization is facing similar challenges managing and archiving older or difficult-to-parse documents, consider how Enterprise Knowledge can assist in optimizing your content findability with advanced AI techniques.

Enterprise AI Readiness Assessment

A wide range of organizations have placed AI on their strategic roadmap, with C-levels commonly listing Knowledge AI amongst their biggest priorities. Yet many are already encountering challenges, as the vast majority of AI initiatives fail to show results, meet expectations, or provide real business value. For these organizations, the setbacks typically originate from the lack of a foundation on which to build AI capabilities. Enterprise AI projects too often end up as isolated endeavors, lacking the necessary foundations to support business practices and operations across the organization. So, how can your organization avoid these pitfalls? There are three key questions to ask when developing an Enterprise AI strategy: Do you have clear business applications? Do you understand the state of your information? And what in-house capabilities do you possess?

Enterprise AI entails leveraging advanced machine learning and cognitive capabilities to discover and deliver organizational knowledge, data, and information in a way that closely aligns with how humans look for and process information.

With our focus and expertise in knowledge, data, and information management, Enterprise Knowledge (EK) developed this proprietary Enterprise Artificial Intelligence (AI) Readiness Assessment to enable organizations to understand where they are, and where they need to be, in order to begin leveraging today’s technologies and AI capabilities for knowledge and data management.

[Image: Assess your organization across four factors: enterprise readiness, state of data and content, skill sets and technical capabilities, and change readiness]

Based on our experience conducting strategic assessments, as well as designing and implementing Enterprise AI solutions, we have identified four key factors as the most common indicators and foundations organizations can use to evaluate their current capabilities and understand what it takes to invest in advanced ones.

This assessment leverages over thirty measurements across these four Enterprise AI maturity factors, categorized as follows.

1. Organizational Readiness

Does your organization have the vision, support, and drive to enable successful Enterprise AI initiatives?

The foundational requirement for any organization undergoing an Enterprise AI transformation is alignment on vision and on the business applications and justifications for launching successful initiatives. The Organizational Readiness factor includes the assessment of the appropriate organizational designs, leadership willingness, and mandates that are necessary for success. This factor evaluates topics including:

  • Whether there is a vision and strategy for AI, with clear application across the organization.
  • Whether AI is a strategic priority with leadership support.
  • Whether the scope of AI is clearly defined, with measurable success criteria.
  • Whether there is a sense of urgency to implement AI.

With a clear picture of your organizational needs, the Organizational Readiness factor will allow you to determine whether your organization meets the requirements to pursue AI-related initiatives, while surfacing potential risks so you can better mitigate failure.

2. The State of Organizational Data and Content

Is your data and content ready to be used for Enterprise AI initiatives?

The volume and dynamism of data and content (structured and/or unstructured) is growing exponentially, and organizations need to be able to securely manage and integrate that information. Enterprise AI requires quality of, and access to, this information. This assessment factor focuses on the extent to which existing structured and unstructured data is in a machine-consumable format, and the level to which it supports business operations within the enterprise. This factor considers topics including:

  • The extent to which the organization’s information ecosystems allow for quick access to data from multiple sources.
  • The scope of organizational content that is structured and in a machine-readable format.
  • The state of standardized organization of content/data, such as business taxonomies and metadata schemes, and whether they are accurately applied to content.
  • The existence of metadata for unstructured content. 
  • Access considerations including compliance or technical barriers.

AI needs to learn the human way of thinking and how an organization operates in order to provide the right solutions. Understanding the full state of your current data and content will enable you to focus on the content/data with the highest business impact and help you develop a strategy to get your data into an organized and accessible format. Without high-quality, well-organized, and well-tagged data, AI applications will not deliver high-value results for your organization.

3. Skill Sets and Technical Capabilities

Does your organization have the technical infrastructure and resources in place to support AI?

With the increased focus on AI, demand has grown both for individuals with the technical skills to engineer advanced machine learning and intelligent solutions, and for business knowledge experts who can transform data into a paradigm that aligns with how users and customers communicate knowledge. Further, over the years, cloud computing capabilities, web standards, open-source training models, and linked open data for a number of industries have emerged to help organizations craft customized Enterprise AI solutions for their business. This means an organization looking to start leveraging AI no longer has to start from scratch. This assessment factor evaluates the organization’s existing capabilities to design, manage, operate, and maintain an Enterprise AI solution. Some of the factors we consider include:

  • The state of existing enterprise ontology solutions and enterprise knowledge graph capabilities that optimize information aggregation and governance. 
  • The existence of auto-classification and automation tools within the organization.
  • Whether roles and skill sets for advanced data modeling or knowledge engineering are present within the organization.
  • The availability and capacity to commit business and technical SMEs for AI efforts.

Understanding the current gaps and weaknesses in existing capabilities and defining your targets are crucial elements to developing a practical AI Roadmap. This factor also plays a foundational role in giving your organization the key considerations to ensure AI efforts kick off on the right track, such as leveraging web standards that enable interoperability, and starting with available existing/open-source semantic models and ecosystems to avoid short-term delays while establishing long-term governance and strategy. 

4. Change Threshold 

Is your organization prepared to support the operational and strategic changes that will result from AI initiatives?

The success of Enterprise AI relies heavily on the adoption of new technologies and new ways of doing business. Organizations that fail to succeed with AI often struggle to understand the full scope of the change that AI will bring to their business and organizational norms. This usually manifests itself in the form of fear, either of changes in job roles or of creating wrong or unethical AI results that expose the organization to higher risks. Most organizations also struggle to accept that AI requires a few iterations to get it “right.” As such, this assessment factor focuses on the organization’s appetite, willingness, and threshold for understanding and tackling the cultural, technical, and business challenges involved in achieving the full benefits of AI. This factor evaluates topics including:

  • Business and IT interest and desire for AI.
  • Existence of resource planning for the individuals whose roles will be impacted. 
  • Education and clear communication to facilitate adoption. 

The success of any technical solution is highly dependent on the human and cultural factors in an organization, and each organization has a threshold for dealing with change. Understanding and planning for this factor will enable your organization to integrate change management that addresses negative implications, avoids unnecessary resistance or weak AI results, and provides proper navigation through issues that arise.

How it Works

This Enterprise AI readiness assessment and benchmarking leverages the four factors above, comprising over 30 different points upon which each organization can be evaluated and scored. We apply this proprietary maturity model to help assess your Enterprise AI readiness and clearly define success criteria for your target AI initiatives. Our steps include:

  • Knowledge Gathering and Current State Assessment: We leverage a hybrid model that includes interviews and focus groups, supported by content/data and technology analysis, to understand where you are and where you need to be. This gives us a complete understanding of your current strengths and weaknesses across the four factors, allowing us to provide the right recommendations and guidance to drive success, business value, and long-term adoption.
  • Strategy Development and Roadmapping: Building on the established focus on the assessment factors, we work with you to develop a strategy and roadmap that outlines the necessary work streams and activities needed to achieve your AI goals. It combines our understanding of your organization with proven best practices and methodologies into an iterative work plan that ensures you can achieve the target state while quickly and consistently showing interim business value.
  • Business Case Development and Alignment Support: We further compile our assessment of potential project ROI based on increased revenues, cost avoidance, and risk and compliance management. We then balance those against the perceived business needs and wants by determining the areas that would have the biggest business impact at the lowest cost, and focus our discussions and explorations on those areas with the greatest need and highest interest.

Keys to Our Assessment  

Over the past several years, we have worked with diverse organizations to enable them to strategize, design, pilot, and implement scaled Enterprise AI solutions. What makes our priority assessment unique is that it is developed based on years of real-world experience supporting organizations in their knowledge and data management. As such, our assessment offers the following key differentiators and values for the enterprise: 

  • Recognition of Unique Organizational Factors: This assessment recognizes that no Enterprise AI initiative is exactly the same. It is designed in such a way that it recognizes the unique aspects of every organization, including priorities and challenges to then help develop a tailored strategy to address those unique needs.
  • Emphasis on Business Outcomes: Successful AI efforts result in tangible business applications and outcomes. Every assessment factor is tied to specific business outcomes with corresponding steps on how the organization can use it to better achieve practical business impact.
  • A Tangible Communication and Education Tool: Because this assessment provides measurable scores across more than 30 tangible criteria and success factors, it serves as an effective tool for communicating up to leadership and quickly garnering buy-in, helping organizations understand both the cost and the tangible value of AI efforts.

Results

As a result of this effort, you will have a complete view of your AI readiness, gaps, and required ecosystem, along with an understanding of the potential business value that could be realized once the target state is achieved. Taken as a whole, the assessment allows an organization to:

  • Understand strengths and weaknesses, and overall readiness to move forward with Enterprise AI compared to other organizations and the industry as a whole;
  • Judge where foundational gaps may exist in the organization in order to improve Enterprise AI readiness and likelihood of success; and
  • Identify and prioritize next steps in order to make immediate progress based on the organization’s current state and defined goals for AI and Machine Learning.

 


Taking the first step toward gaining this invaluable insight is easy:

1. Take 10-15 minutes to complete your Enterprise AI Maturity Assessment by answering a set of questions pertaining to the four factors; and
2. Submit your completed assessment survey and provide your email address to download a formal PDF report with your customized results.

Natural Language Processing and Taxonomy Design

Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that processes and analyzes human language found in text. Some of the exciting capabilities that NLP offers include parsing out the significant entities in content through statistical analysis and identifying contextual relationships between different words. Taxonomy also aims to provide a hierarchical context for concepts and the words used to describe them. Taxonomists care about how people use language to categorize and identify concepts, in an effort to make information usable for both people and technology.

At EK, depending on the scope of a project, we incorporate NLP into the taxonomy design process in order to deliver highly detailed, relevant, and AI-ready designs that are backed by statistical processes generated from unstructured text data. One of EK’s key differentiators is our hybrid approach to designing business taxonomies. Our top-down approach leverages subject matter experts and business stakeholders to ensure a taxonomy is accurate and relevant for a domain or organization, while our bottom-up approach analyzes existing content and systems to ensure a taxonomy design is useful for the people and systems that will be using it. Essentially, NLP in taxonomy design is a type of bottom-up process in which Named Entity Recognition (NER) collects the lowest-level terms found in the content. The taxonomist can then identify broader categories for these terms. This is complemented by top-down analysis when engaging SMEs to help name and fine-tune the categories, thus fulfilling EK’s hybrid methodology for taxonomy design.

However, NLP is far from automating the human judgment that is required in taxonomy design: data scientists and taxonomists (as well as subject matter experts) need to work together to determine why and how the data generated by NLP will be incorporated into a taxonomy. Here, I outline the ways in which NLP can enhance taxonomy design, and why taxonomists should consider teaming up with data scientists.

Named Entity Recognition and Taxonomy Development

[Image: Python is a programming language with a number of libraries, which are open-source collections of functions that allow you to perform actions such as NLP and NER]

Named Entity Recognition (NER) is a branch of NLP that identifies entities in text. NER can be implemented using powerful Python libraries like spaCy, an NLP library that can be used to train an initial NER model from annotations of sample content. For specific industries, NER models have to be trained to find entities in different domains. For example, a general scientific model can be used for a medical domain, though the model will have to be further trained to identify entities such as medications, conditions, and medical procedures.

[Image: Word embeddings are encoded representations of words that have a similar or same meaning according to the way they are used in text]

An NER pipeline can run on a volume of content in order to identify and extract the entities found in the content. Once the NER extracts terms, a data scientist can use semantic word embeddings to cluster the entities in an unsupervised learning process; this means the algorithm makes inferences about a data set without human input or labeling. This results in clusters of terms that have a statistical relationship to each other, derived from the way the terms are used in the language of the content.
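A minimal sketch of this clustering step might pair spaCy’s word vectors with scikit-learn’s k-means, as below; the model, terms, and cluster count are illustrative assumptions (the medium model must be downloaded with python -m spacy download en_core_web_md):

```python
# Minimal sketch: embed extracted terms with spaCy word vectors, then
# cluster them with k-means so similar terms group together.
import numpy as np
import spacy
from sklearn.cluster import KMeans

nlp = spacy.load("en_core_web_md")  # medium model ships with word vectors

terms = ["morphine", "opioid", "naloxone", "methadone",
         "rash", "eczema", "urticaria", "dermatitis"]
vectors = np.array([nlp(term).vector for term in terms])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
for term, label in zip(terms, kmeans.labels_):
    # Terms should group roughly into medications vs. skin conditions.
    print(label, term)
```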

Usually, taxonomists can derive a theme from these clusters by reviewing the types of terms that are in each cluster. Can you see a theme in the two clusters below?

Two Clusters

Cluster 1: morphine, opioid, opioids, cocaine, benzodiazepine, benzodiazepines, overdose, opiate, antagonist, methamphetamine, analgesic, methadone, stimulant, Benzodiazepines, buprenorphine, flumazenil, heroin, methamphetamines, naloxone, Naloxone, opioid-induced, sedative, narcotic, self-administering, self-administration

Cluster 2: rash, erythema, acrocyanosis, pedis, itchy, conjunctivitis, blistering, eczema, impetigo, urticaria, herpetiformis, Tinea crurisi, atopic dermatitis, Erythema toxicum neonatorum, hyperpigmentation, papules, photosensitivity, Tinea corporis, cutaneous, pruritic

Since word embeddings are statistical estimations of a word’s usage in a given language, the clusters generated using word embeddings aren’t always perfect from a human perspective. They do make it easier, however, to identify similar types of words and phrases in a body of content. The clusters and word embeddings won’t be able to tell you exactly what that relationship is between the terms, but in our clusters above, a trained taxonomist can deduce that the first cluster has to do with medications, specifically opioids (and other words that are closely related to opioids, such as overdose and antagonists). The second cluster generally has to do with skin conditions.

Once you have identified the various cluster themes (this particular process resulted in several hundred clusters), you can group those themes into another level of broader categories and continue to go up the ladder of the taxonomy into Level 3, Level 2, or Level 1 concepts. For instance, if we continue with the medication example, we may have another cluster of specific antibiotic drugs, as well as antihypertensives. We now know that we need a broader Medication/Chemicals/Drugs category in order to group these themes (opioids, antibiotics, antihypertensives) together. And voila! We have a taxonomy created with the assistance of Natural Language Processing.

[Image: An example taxonomy of medications, with parent categories such as “opioids” and “antibiotics” and corresponding medications that fall within each category, such as “morphine” and “amoxicillin” respectively]

Relevancy and Taxonomy Use Cases

Not all clusters will be relevant to a taxonomy. Sometimes the themes of a cluster will be a certain part of speech, such as adjectives or verbs that seem meaningless on their own; these usually have to be paired with other entities to create a phrase that then becomes meaningful to the domain. These entities most likely exist in other clusters, so it will be helpful to have a tool to look up these phrases in the content to see how they are paired with other entities.

Even though the NER process has found a statistical relationship to form these clusters, this doesn’t mean that we need to incorporate those clusters into our taxonomy. This is when good old human judgement and defined taxonomy use cases will help you decide what is needed from the NER results. A use case is the specific business need that a taxonomy is intended to fulfill. Use cases should always be the signal guiding your way through any taxonomy and ontology effort. 

To understand your use cases, ask yourself these questions: what problem is the taxonomy/ontology trying to solve? And who will be using it?

Taxonomy and NLP Iteration

Just like taxonomy design, an NLP process should be iterative. Think of the entire process as a feedback loop with your taxonomy. A data scientist can use the fledgling taxonomy to make the NER models more accurate by manually annotating content with the new labels, which improves the quality of the clusters returned. A more accurate and repeatedly trained model will be able to look for more precise and narrow concepts. For instance, certain medical conditions and medications would have to be annotated in order to be recognized as part of a conditions or medications model.

Once this has been done, you can train the model on the annotations as many times as needed in order to return an increasingly accurate set of terms relevant to the model. Depending on the results, this may necessitate a restructuring of the taxonomy; perhaps another grouping or subgrouping of medical conditions is discovered, which weren’t previously included in the initial NER analysis, or it becomes clear your taxonomy needs an ontology.

Leverage a Gold Standard

It’s highly suggested that you create a “gold standard” (with the critical input of SMEs and business stakeholders) for the most significant types of semantic relationships that are needed to achieve your goals. In creating a gold standard, SMEs and other stakeholders identify the logic/patterns between concepts that best support your use cases, and then focus only on these specific patterns, at least in the first iteration.

If your use case is a recommendation engine, for example, you need to prioritize the relationships between concepts that help facilitate the appropriate recommendations. In our healthcare example, we may find that the best recommendations are facilitated by ontological relationships: perhaps we need an ontology to describe the relationship between bacterial infections and antibiotics, or the relationship between symptoms and diagnosable conditions.

[Image: An example ontology visual, showing relationships such as “symptom is a signOf medical condition” and “medication treats medical condition,” with “medication,” “medical condition,” and “symptom” all being different classes within the ontology, and “treats” and “signOf” being relationships]

If your use case is for search and findability, you could utilize user research methods such as card sorting to gain a better understanding of how users will relate the concepts to one another. This may also provide guidance on how to build an initial taxonomy with the term clusters, by allowing users to sort the terms into predefined or user-created categories. From there, an analysis of the general relationship patterns can be used as a gold standard to prioritize how the NLP data will be used.

The purpose of a gold standard is to prioritize and set a strict scope on how NLP will assist taxonomy design. The NLP process of entity extraction, clustering, labeling, annotating, and retraining is an intensive process that will generate a lot of data. It can be difficult and overwhelming to decide how much and which data should be incorporated into the taxonomy. A gold standard, basically a more detailed application of use cases, will make it much easier to decide what is a priority and what is outside the scope of your use cases.

Conclusion

NLP is a promising field that has many inherent benefits for taxonomy and ontology design. Teams of data scientists, taxonomists, and subject matter experts that utilize NLP processes, alongside a gold standard and prioritized use cases, are well positioned to create data models for advanced capabilities. The result of this process will be a highly detailed and customized solution derived from an organization’s existing content and data.

If your taxonomy or ontology effort seems to frequently misalign with the actual content or domain you are working in, or if you have too much unstructured data and content to meaningfully derive a taxonomy that will accurately model your information, an NLP-assisted taxonomy design process will provide a way forward. Not only will it help your organization gain a complete sense of its information and data, it will also glean valuable insights about the unseen connections in your data and prepare your organization for robust enterprise data governance and advanced artificial intelligence capabilities, including solutions such as recommendation engines and automated classification.

Interested in seeing how NLP can assist your taxonomy and ontology design? Contact Enterprise Knowledge to learn more.

Ivanov Featured in Image & Data Manager Magazine

An article written by EK senior consultant Yanko Ivanov has been featured in the Image & Data Manager (IDM) magazine. Ivanov’s article discusses the key steps to leveraging auto-tagging, auto-classification, and auto-categorization tools more effectively and achieving high automation and accuracy.

“EK is truly on the cutting edge of what’s happening in semantic web technologies today,” said EK CEO Zach Wahl. “I’m thrilled that our consultants are receiving recognition as leaders in this space.”

 

About Enterprise Knowledge

Enterprise Knowledge (EK) is a services firm that integrates Knowledge Management, Information Management, Information Technology, and Agile Approaches to deliver comprehensive solutions. Our mission is to form true partnerships with our clients, listening and collaborating to create tailored, practical, and results-oriented solutions that enable them to thrive and adapt to changing needs.

About Image & Data Manager Magazine

Image & Data Manager (IDM) is a dedicated magazine and website covering collaboration and information management for Australia and the Asia-Pacific region. It offers expert insight, case studies and essential updates on topics such as:

  • Imaging and workflow;
  • Email and instant messaging;
  • Enterprise content management;
  • Document & records management;
  • Network storage and archiving;
  • Knowledge management; and
  • Compliance & eDiscovery.

In every issue, there is a major feature on software and systems, analyzing contemporary approaches to how information management is being applied in different industry sectors. IDM readership is across all levels of government, banking, finance, legal, health, architecture, engineering, the media, and manufacturing.

4 Steps to Content Auto-Classification with High Accuracy

As technologies evolve, we have seen the rise of auto-tagging, auto-classification, and auto-categorization tools that attempt to take over the task of describing the content we create. These tools apply metadata tags automatically so we don’t have to.

Yet, in many cases, the accuracy of auto-tagging efforts has been underwhelming. Why is that? More often than not, it is the way the technology has been applied, rather than the technology itself. Even if we implement a machine learning algorithm, we still need to teach the machine our language and the way we describe things within our domain. A machine learning algorithm is like a toddler who first needs to learn the basics of your language. At EK, we educate these “toddlers” by applying the following methodology.

Develop Your Taxonomy/Thesaurus, i.e. vocabulary


To start with, you need to teach your toddler the basics of your domain. This is where your business taxonomy is critical. It helps describe the knowledge in your organization and provides a structure from which the machine learns that solar energy is a type of energy source and that an article containing that term may be talking about energy sources, or clean energy. We help our clients design their taxonomy so that it is intuitive for people and simultaneously understandable for a machine. Utilizing industry standards, we apply alternative labels (e.g. synonyms) for terms, as well as identify how terms are related to each other outside of a simple parent-child hierarchy.
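As a brief illustration, a machine-readable fragment of such a taxonomy can be expressed in SKOS with the rdflib library, as sketched below; the URIs and terms are illustrative assumptions:

```python
# Minimal SKOS taxonomy sketch with rdflib: a preferred label, a synonym
# (altLabel), and a parent-child (broader) relationship.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/taxonomy/")  # hypothetical namespace
g = Graph()
g.bind("skos", SKOS)

energy = URIRef(EX["energy-source"])
solar = URIRef(EX["solar-energy"])

g.add((energy, SKOS.prefLabel, Literal("Energy Source", lang="en")))
g.add((solar, SKOS.prefLabel, Literal("Solar Energy", lang="en")))
g.add((solar, SKOS.altLabel, Literal("Photovoltaic Energy", lang="en")))  # synonym
g.add((solar, SKOS.broader, energy))  # parent-child hierarchy

print(g.serialize(format="turtle"))
```

Expressing the taxonomy in a standard like SKOS is what makes it simultaneously intuitive for people and consumable by the machine learning algorithm.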


Select Your Teaching Tool, i.e. corpus

Next, we need to expose our toddler to the world, or at least to a contained playground so that it can apply what it already knows (the taxonomy/thesaurus), and learn new things. To do that, we help our clients identify a representative sample of their content that we then feed to the machine learning algorithm. This achieves two goals:

  1. confirm that the taxonomy/thesaurus we developed actually describes the content domain of the organization; and
  2. identify potentially new terms or synonyms in the content that should be included in the taxonomy to ensure comprehensive coverage.


Enhance Your Taxonomy/Thesaurus

Integral to the step above, we now need to define additional terms or concepts so that the toddler can understand what they are and how they fit in its world. In other words, through the automated content analysis and text mining in the previous step, we look through the items that the machine learning algorithm identified, and if applicable we include them in the correct place in the taxonomy, or add them as synonyms or alternative terms for items that already exist. This step helps enhance your taxonomy and increase its expressiveness.

In other words, revising and enhancing your taxonomy enriches your toddler’s vocabulary so it can identify even more things with ever greater accuracy. This step is critical for achieving highly accurate auto-tagging results. Think of it this way: the richer your vocabulary, the more eloquent you are. Additionally, once the toddler has its base vocabulary, it will need less and less help when running across new terms. It will start identifying them correctly through their relationships with terms in its vocabulary.

Achieve Accurate Auto-tagging

The last step in this process is integrating and fine-tuning your auto-tagging process. By the time we get to this step, our toddler has learned quite a bit, and we are really helping it refine its vocabulary. During this step, we apply rules to disambiguate terms that could easily be mixed up, like “share” as in stock vs. “share” as in a piece of the pie.
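A minimal sketch of such a disambiguation rule is shown below; the context-cue lists are illustrative assumptions, and production tools typically express rules like this within the tagging platform itself:

```python
# Minimal disambiguation sketch: pick the sense of an ambiguous term based
# on which context words co-occur with it. Cue lists are illustrative.
AMBIGUOUS = {
    "share": {
        "Equity Share": {"stock", "shareholder", "dividend", "market"},
        "Allocation": {"pie", "portion", "percentage", "split"},
    }
}

def disambiguate(term: str, text: str) -> str | None:
    """Return the sense whose context cues appear most often near `term`."""
    words = set(text.lower().split())
    senses = AMBIGUOUS.get(term, {})
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best and scores[best] > 0 else None

print(disambiguate("share", "The share price rose after the dividend announcement."))
# e.g. 'Equity Share'
```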

Summary

Auto-tagging is a powerful feature that helps organizations better describe their content while improving efficiency and time utilization. By teaching your toddler your language, you no longer need to take precious time away from your content creators, SMEs, and end users to ensure your content is properly tagged. The result is happier content creators and increased accuracy in content tagging, and the ultimate end result of this effort is content that is easier to find and reuse.
