Enterprise AI Architecture Series: How to Extract Knowledge from Unstructured Content (Part 2)

Our CEO, Zach Wahl, recently noted in his annual KM trends blog for 2025 that Knowledge Management (KM) and Artificial Intelligence (AI) are really two sides of the same coin, detailing this idea further in his seminal blog introducing the term Knowledge Intelligence (KI). In particular, KM can play a big role in structuring unstructured content, making it more suitable for use by enterprise AI. Injecting knowledge into unstructured data using taxonomies, ontologies, and knowledge graphs is the focus of this blog, Part 2 in the Knowledge Intelligence Architecture Series. I will also describe our typical approaches and experience with mining knowledge out of unstructured content to develop taxonomies and knowledge graphs. As a refresher, you can review Part 1 of this series, where I introduced the high-level technical components needed for implementing any KI architecture.

 

Role of NLP in Structuring Unstructured Content

Natural language processing (NLP) is a machine learning technique that gives computers the ability to interpret and understand human language. According to most industry estimates, 80-90% of an organization's data is unstructured, most of it originating from emails, chat messages, documents, presentations, videos, and social media posts. Extracting meaningful insights from such content is difficult because it lacks a predefined structure. This is where NLP techniques can be immensely useful: NLP works through the differences in dialects, metaphors, sentence structure, grammatical irregularities, and usage exceptions that are common in such data, and structures it effectively. A common NLP task for analyzing unstructured content and making it machine readable is content classification, a process that categorizes text into predefined classes by identifying keywords that indicate the topic of the text.

Over the past decade, we have employed numerous NLP techniques across our typical knowledge and data management engagements, focusing on unstructured content classification. With the emergence of Large Language Models (LLMs), traditional NLP tasks can now be executed with higher precision and recall while requiring significantly less development effort. The section below presents a broad, though not exhaustive, range of NLP strategies, incorporating traditional ML, cutting-edge LLMs, and the pattern recognition capabilities built into vendor platforms for content understanding and classification. For each approach, it describes the underlying architecture, illustrates the steps involved in adding context to unstructured content using semantic data assets, and points to relevant case studies.

 

1. Transfer Learning for Content Classification

Transfer learning is a method in which a deep learning model trained on a large dataset is applied to a similar task using a different dataset. Starting with a pre-trained model that has already learned linguistic patterns and structures from a significant volume of data eliminates the need for extensive labeled datasets and reduces training time. Since the release of the BERT (Bidirectional Encoder Representations from Transformers) language model in 2018, we have extensively utilized transfer learning to analyze and categorize unstructured content using predefined classification schemes for our clients. In fact, it is often our preferred approach for entity extraction when instantiating a knowledge graph, as it supports a scalable and maintainable solution for the enterprise.

 

Enterprise AI Architecture: Transfer Learning for Content Classification

 

As illustrated in the figure above, unstructured data in the enterprise can originate from many systems beyond conventional ones such as content management systems and websites. Such sources include communication channels such as email, instant messaging systems, and social media platforms, as well as digital asset management platforms that centrally store, organize, manage, and distribute media files such as images, video, and audio within the organization. Because these language models operate on text, the first step in implementing transfer learning is to employ appropriate text extraction and transformation algorithms, depending on the type of content, to make the data suitable for use. Next, domain SMEs label a small subset of the clean data to fine-tune the selected pretrained AI model against a predefined classification scheme (also provided by the domain SMEs). Post-training, the fine-tuned model is deployed to production and is available for content classification. At this stage, organizations can run their content through the operationalized transfer-learning-based classification pipeline and store the results in a centralized metadata repository, such as a data catalog or even a simple object store, that can in turn power multiple enterprise use cases from data discovery to data analytics.
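
For illustration, below is a minimal sketch of the fine-tuning step in this pipeline, assuming the Hugging Face transformers and datasets libraries; the model name, example texts, and the two-class scheme are illustrative placeholders, not a specific client configuration.

```python
# A minimal sketch of the fine-tuning step, assuming the Hugging Face
# transformers and datasets libraries. Model name, texts, and the
# two-class scheme below are illustrative placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Small SME-labeled subset mapped to a predefined classification scheme.
texts = ["Quarterly earnings rose 4% year over year...",
         "All new hires must complete onboarding within 30 days..."]
labels = [0, 1]  # 0 = "Finance", 1 = "HR" in the SME-defined scheme

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Tokenize the labeled examples so the pretrained encoder can consume them.
dataset = Dataset.from_dict({"text": texts, "label": labels}).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length"),
    batched=True)

# Fine-tune the pretrained model on the small labeled subset.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="classifier", num_train_epochs=3),
    train_dataset=dataset)
trainer.train()
# The fine-tuned model can then be deployed behind the classification
# pipeline and its tags stored in the central metadata repository.
```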

Transfer learning is one of the popular techniques we employ for entity extraction from unstructured content in our typical knowledge graph accelerator engagements, and it is one of our criteria when evaluating data fabric solution vendors, as was the case for a multinational pharmaceutical company. This is because transfer learning can easily grow with inputs from domain SMEs (with respect to data labeling and classification scheme definition) to tailor machine predictions to organizational needs and sustain the classification effort without extensive machine learning training. However, this does not mean that machine learning (ML) expertise is not required. For organizations that lack the internal skills to build and maintain custom ML pipelines, the following content classification approaches may be useful.

 

2. Taxonomy Manager-Driven Content Classification

Most modern Taxonomy and Ontology Management Systems (TOMS) include a classification engine that supports automatic text classification based on a defined taxonomy. In our experience, organizations with access to a TOMS but without dedicated AI teams to develop and maintain custom ML models prefer using the built-in classification capabilities of a TOMS to categorize and structure their unstructured content.

 

Enterprise AI Architecture: Taxonomy Manager Driven Content Classification

 

While TOMS vendors vary in how they classify unstructured content using a taxonomy (for example, leveraging just textual metadata or using structural relationships between taxonomy concepts to categorize content), the high-level architecture shown in the figure above, which integrates a TOMS with the enterprise systems managing unstructured content and leverages TOMS-generated metadata, is largely independent of any specific TOMS platform. In this architecture, when an information architect deems a taxonomy ready for use, they publish the corresponding classification rules to the TOMS-specific classification engine. Typically, organizations configure custom change listeners for taxonomy publication, which lets them decide when to tag their unstructured content with the published rules and store these tags in a central metadata repository to power many enterprise use cases. Sometimes, however, TOMS platforms offer native connectors for specific CMSs, such as SharePoint or WordPress, to automatically tag delta content upon the publication of a new taxonomy version.
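
As a rough illustration of the change-listener pattern described above, the sketch below assumes hypothetical TOMS webhook payloads and REST endpoints; no specific vendor exposes exactly this interface.

```python
# A hypothetical sketch of a taxonomy-publication change listener. The
# webhook payload, classification endpoint, and metadata repository API
# are illustrative stand-ins, not any specific vendor's interface.
import requests
from flask import Flask, request

app = Flask(__name__)

CLASSIFY_ENDPOINT = "https://toms.example.com/api/classify"  # assumed
METADATA_REPO = "https://metadata.example.com/api/tags"      # assumed

def fetch_delta_documents():
    # Placeholder: pull documents changed since the last tagging run from
    # the CMS; a real listener would track a high-water-mark timestamp.
    return []

@app.route("/taxonomy-published", methods=["POST"])
def on_taxonomy_published():
    version = request.json["taxonomyVersion"]  # assumed payload field
    # Re-tag only the delta content against the newly published rules.
    for doc in fetch_delta_documents():
        tags = requests.post(CLASSIFY_ENDPOINT,
                             json={"text": doc["text"],
                                   "taxonomyVersion": version}).json()
        requests.post(METADATA_REPO, json={"docId": doc["id"], "tags": tags})
    return {"status": "re-tagging triggered", "version": version}
```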

We work with many leading TOMS vendors in our typical taxonomy accelerator engagements, and you can learn more about specific use cases and success stories applying this approach to content discovery in our knowledge base, from a knowledge portal at a global investment firm to more personalized customer experiences through effective content assembly at a financial solutions provider.

 

3. LLM-Powered Content Classification

With the rise of LLMs in recent years, we have been working with various prompting techniques to effectively classify text using LLMs in our engagements. Based on our experimentation, we have found that a few-shot prompting approach, in which the language model is provided with a small set of labeled examples along with a prompt to guide the classification of unstructured content, achieves high accuracy in text classification tasks even with limited labeled data. This does not, however, diminish the need to design effective prompts to improve the accuracy of the in-context learning approach that is central to any prompt engineering technique.

 

Enterprise AI Architecture: LLM-Powered Content Classification

 

As illustrated in the figure above, a prompt in a few-shot learning approach to content classification includes the classification scheme and labeled examples from domain SMEs in addition to the raw text we need the LLM to classify. Because of the context window limitations of most state-of-the-art (SOTA) LLMs, the input text often needs to be chunked after preprocessing and cleaning to abide by the length limitations of the prompt (also shown in the figure above). What is not included in the image, however, are the LLM optimization techniques we often employ to improve classification performance at scale. Any NLP task that requires interaction with an LLM, which is often hosted on a remote server, will not be performant by default. Therefore, in our typical engagements, we go beyond basic prompt engineering and employ optimization techniques such as caching prior responses, batching multiple requests into one prompt, and classifying multiple chunks in parallel to implement a scalable content classification solution for the enterprise.
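
The sketch below illustrates few-shot prompt assembly with response caching, one of the optimization techniques mentioned above; the scheme and examples are invented for illustration, and call_llm() is a stub for whatever hosted-model client an organization uses.

```python
# A minimal sketch of few-shot classification with response caching.
# SCHEME and EXAMPLES are invented; call_llm() is a placeholder stub.
from functools import lru_cache

SCHEME = ["Policy", "Procedure", "Training", "Other"]
EXAMPLES = [  # small SME-labeled example set
    ("Employees must submit travel requests 14 days in advance.", "Policy"),
    ("Step 1: open the expense portal and select 'New claim'.", "Procedure"),
]

def call_llm(prompt: str) -> str:
    # Placeholder: swap in the hosted-model client of your choice.
    raise NotImplementedError

def build_prompt(chunk: str) -> str:
    # Few-shot prompt: classification scheme + labeled examples + raw text.
    shots = "\n\n".join(f"Text: {t}\nLabel: {l}" for t, l in EXAMPLES)
    return (f"Classify the text into one of {SCHEME}.\n\n"
            f"{shots}\n\nText: {chunk}\nLabel:")

@lru_cache(maxsize=4096)  # cache prior responses for repeated chunks
def classify_chunk(chunk: str) -> str:
    return call_llm(build_prompt(chunk)).strip()

def classify_document(text: str, max_chars: int = 4000) -> list[str]:
    # Chunk the cleaned text to respect the model's context window; in
    # production these calls are also batched and parallelized.
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    return [classify_chunk(c) for c in chunks]
```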

Last year, we used the LLM-powered content classification approach when we completed a knowledge graph accelerator project with a public safety agency in Europe, where we could not use a TOMS-driven content classification approach to instantiate a knowledge graph. This was because of the risks associated with transferring sensitive data out of Azure's Northern European region, where the solution was hosted, into the infrastructure of the hosted TOMS platform (which was outside the allowed region). In this case, an LLM-powered content classification approach such as few-shot prompting allowed us to extract entities from the agency's unstructured content and instantiate a knowledge graph that facilitated context-based, data-driven decision making for its construction site planners.

More recently, we used the LLM-powered content classification approach when we engaged with a non-profit charitable organization to analyze their healthcare product survey data to understand the product's adoption in a given market and demographic and ultimately inform future product development. We developed a comprehensive list of product adoption factors that are not easily identified and included in product research. We then leveraged this controlled vocabulary of product adoption factors and Azure OpenAI models to classify the free-form survey responses and understand the distinct ways in which these factors influence each other, contributing to a more nuanced understanding of how users make decisions related to the product. This enhanced pattern detection approach enabled a holistic view of influencing factors, addressing knowledge gaps in future product development efforts at the organization.

 

4. AI-Augmented Topic Taxonomy Generation

Up to this point, we have focused on using taxonomies to structure unstructured content. We now shift to using AI to analyze unstructured content and propose taxonomies and knowledge graphs. In recent years, LLMs have simplified entity and relationship extraction, enabling more organizations to incorporate knowledge graphs into their data management.

While we generally do not advise our clients to use LLMs without a human-in-the-loop process to create production-grade domain taxonomies, we have used LLMs in past engagements to augment and support our taxonomic experts in naming latent topics in semantically grouped unstructured content, thereby creating a rough first draft of a topic taxonomy.

Elaborating on the figure below, our approach centers on four key tasks:

  1. Unsupervised clustering of the dataset,
  2. Discovering latent themes within each cluster,
  3. Creating a topic taxonomy based on these themes, and
  4. Engaging taxonomists and domain experts to validate and enhance the taxonomy.

Because of the token limits inherent in all SOTA embedding models, once raw text is extracted from unstructured content, preprocessed, and cleaned, it has to be chunked before embeddings (numerical representations that encapsulate semantic information) can be created by the embedding generation service and stored in the vector database. The embedding generation service may optionally include quantization techniques to address the high memory requirements of managing embeddings for a large dataset. After embedding generation, the taxonomy generation pipeline focuses on semantic similarity calculation. While semantic similarity between the underlying content or corpora can be trivially computed as the inner product of embeddings, for scalability reasons we typically project the embeddings from their original high-dimensional space to lower dimensions while preserving their local and global data structures. At this point, the content clustering service can use the embeddings as input features of a clustering algorithm, enabling the identification of related categories based on embedding distances. The next step in autogenerating taxonomy concepts is to infer the latent topic of each cluster using an LLM as part of the latent topic identification service. Finally, a draft taxonomy is available for validation and update by domain experts before it can be used to power enterprise use cases from data discovery to analytics.
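
A condensed sketch of this pipeline follows, assuming the sentence-transformers, umap-learn, and hdbscan libraries; these are representative open-source choices, not the exact components of the services named above.

```python
# A condensed sketch of the taxonomy-generation pipeline, assuming the
# sentence-transformers, umap-learn, and hdbscan libraries.
import hdbscan
import umap
from sentence_transformers import SentenceTransformer

def cluster_chunks(chunks: list[str]) -> dict[int, list[str]]:
    # 1. Embed pre-chunked text (chunking already respects the embedding
    #    model's token limit).
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(chunks)
    # 2. Project to a lower-dimensional space for scalable similarity
    #    computation while preserving local and global structure.
    reduced = umap.UMAP(n_components=5, metric="cosine").fit_transform(
        embeddings)
    # 3. Cluster on embedding distances to identify related categories.
    labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(reduced)
    clusters: dict[int, list[str]] = {}
    for label, chunk in zip(labels, chunks):
        if label != -1:  # -1 marks noise points in HDBSCAN
            clusters.setdefault(label, []).append(chunk)
    return clusters

# 4. An LLM then names the latent topic of each cluster, yielding draft
#    taxonomy concepts for taxonomists and domain experts to validate.
```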

 

Enterprise AI Architecture: AI-Augmented Topic Taxonomy Generation

 

We have enabled consumer-grade semantic capabilities in production using this very approach to taxonomy generation for non-financial risk management at a multinational bank, collapsing their original risk dataset of 20,000 free-text risk descriptions into a streamlined process built on 1,100 standardized risk taxonomy concepts.

 

5. AI-Augmented Knowledge Graph Construction

AI assistance for extracting entities and relationships from unstructured content can utilize methods ranging from transfer learning to LLM prompting. In our experience, incorporating the schema into the latter technique greatly enhances the consistency of entity and relationship labeling. Before the extracted entities and relationships are loaded into a knowledge graph, LLMs, as well as heuristics defined by domain SMEs, can be used to further disambiguate those entities.

 

Enterprise AI Architecture: AI-Augmented Knowledge Graph Construction

 

Our typical approach for leveraging AI to construct a knowledge graph is depicted in the figure above. It starts with unstructured content processing techniques that generate the raw text from which entities can be extracted. Coreference resolution, where all mentions of the same real-world entity are replaced by a canonical noun phrase, often forms the first step of the entity extraction process. In the next step, whether we can employ some of the techniques described in the taxonomy-driven content classification sections for entity extraction depends on the underlying ontology (knowledge model or data schema) and how many of the classes in this data model can be instantiated with a corresponding taxonomy. Even for non-taxonomy classes, we can use transfer learning and prompt engineering to accelerate the extraction of instances of ontological classes from the raw text. Next, we can optionally process the extracted entities through an entity resolution pipeline to identify and connect instances of the same real-world entity within and across content sources into a distilled representation. In the last step of the entity extraction process, if applicable, we can further disambiguate extracted entities by linking them to corresponding entries in a public or private knowledge base (such as Wikidata). Once entities are available, we relate them following the ontology to complete the knowledge graph instantiation process. Similar to entity extraction, an array of machine learning techniques, from traditional supervised and unsupervised learning to more modern transfer learning and prompt engineering, can be used for relationship classification. For example, when developing a knowledge graph powered recommendation engine connecting learning content and product data, we compared the efficacy of an unsupervised learning approach (e.g., a similarity index) for predicting relationships between entities with that of a supervised learning approach (e.g., a link classifier).
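
To make the final instantiation step concrete, here is an illustrative sketch assuming rdflib; the namespace, the sample triple, and the extract_triples() helper are hypothetical stand-ins for the extraction pipeline described above.

```python
# An illustrative sketch of the instantiation step, assuming rdflib. The
# namespace, sample triple, and extract_triples() are hypothetical.
from rdflib import Graph, Namespace, URIRef

EX = Namespace("https://example.com/ontology#")  # assumed ontology namespace

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    # Placeholder: in practice this wraps coreference resolution, entity
    # extraction (transfer learning or prompting), entity resolution, and
    # SME-defined disambiguation heuristics.
    return [("Acme_Site_42", "locatedIn", "Dublin")]

def instantiate_graph(raw_text: str) -> Graph:
    g = Graph()
    g.bind("ex", EX)
    # Relate the extracted entities following the ontology.
    for subj, pred, obj in extract_triples(raw_text):
        g.add((URIRef(EX + subj), URIRef(EX + pred), URIRef(EX + obj)))
    return g
```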

 

Closing

While structuring unstructured content with semantic assets has been the focus of this blog, such structuring can only be effective by incorporating an organization's most valuable knowledge asset, its human expertise, alongside all types of data. While I will delve deeper into the technical details of how to encode this expert knowledge into enterprise AI systems in a later segment of this KI architecture blog series, it is evident from the discussion above that mining knowledge from an organization's vast amount of unstructured content is not possible without domain expertise. As our case studies illustrate, these complementary techniques for knowledge extraction, topic modeling, and text classification, when combined with domain expertise, can help organizations achieve true KI. In the next segment of this blog series, I will explore the technical approaches for providing standardized meaning and context to structured data in the enterprise using a semantic layer. In the meantime, if our case studies describing how we brought structure to our clients' unstructured content through metadata resonate with you, contact us to help get you started with KI.

A Global Knowledge and Information Management Solution


The Challenge

At a global biopharmaceutical company, the global analytics and marketing departments generated a great amount of data and content and frequently reused one another's work. Because information was generated quickly and in large quantities, it was consistently "lost" or underutilized, leading to recurring rework and time lost regenerating or trying to locate otherwise pre-existing institutional knowledge and data. Leadership recognized that because the organization was not maximizing all of its data and information, it risked losses in profit and research development. With the goals of streamlining cross-departmental content collaboration and data management and enhancing findability, the organization needed to put foundational infrastructure in place to adequately prepare for its global Artificial Intelligence (AI) initiatives.

The Solution

Alongside Enterprise Knowledge (EK), the organization embarked on a phased approach to develop a scalable knowledge, data, and information management strategy. EK began by designing a global content and data strategy in parallel with an enterprise search redesign effort that featured an information architecture overhaul. A taxonomy and corresponding content types were designed to support auto-tagging and the automated organization of unstructured content, while also allowing for the transformation of the organization’s content into a machine-readable format.

“People” action-oriented search result page redesign for global staff.

The second half of the approach included identifying scaled integration points across the organization’s content, allowing for advanced inter-content relationships to be utilized by recommendation engines in the future. Ontologies and knowledge graphs were introduced as a means of automating the application of these relationships while also optimizing the use and reuse of the organization’s data and information. To further support the management and scalability of the strategy and design efforts over time, an organizational model and governance plan were developed to support change management, implementation, and adoption.

The EK Difference

Because this large, global organization was seeking to successfully complete an initiative that traversed multiple departments, the effort required alignment and support from department leads, staff, and executives. EK leveraged our proven facilitation and prioritization approaches tailored specifically to information and data management strategy and led strategic discussions with the company’s executives, global program leadership, and staff to align on the “as-is” and “to-be” states of the effort. We developed relevant business impact and ROI measures by identifying prioritized success and performance factors that were evaluated and adjusted consistently throughout the effort. 

EK further leveraged our expertise in ontology and enterprise knowledge graphs to design an information architecture that defined the relationships across disparate content and built the foundation for advanced capabilities, such as automated tagging, content governance, natural language search, data analytics, and future AI and Machine Learning (ML) capabilities.

The Results

The knowledge and information management program allowed the organization to better understand and capitalize on their market insights and, as a result, discover and utilize otherwise inaccessible data. Connections between knowledge assets are now defined and the information architecture and content strategy benefit from a taxonomy and metadata design that account for both structured and unstructured data. 

EK also revamped the company's internal search experience by redesigning indexing processes and leading Design Thinking sessions to inform both UI and UX search design decisions, ultimately integrating action-oriented results across the intranet. Consequently, users found that returned results were more relevant to their queries, and a user-friendly interface personalized for the organization's staff facilitated system access and ease of use.

The KM organizational structure will ensure that stakeholders are enabled to make informed investment decisions about their data and content management systems and will better understand the relationships required to bring them all together. As AI capabilities become more advanced and accessible on a global scale, the organization will not only be operating ahead of the curve, but will be able to adapt and apply these capabilities on a regular basis.

Enterprise AI Readiness Assessment

A wide range of organizations have placed AI on their strategic roadmaps, with C-levels commonly listing Knowledge AI among their biggest priorities. Yet many are already encountering challenges, as a vast majority of AI initiatives fail to show results, meet expectations, and provide real business value. For these organizations, the setbacks typically originate from the lack of a foundation on which to build AI capabilities. Enterprise AI projects too often end up as isolated endeavors, lacking the necessary foundations to support business practices and operations across the organization. So, how can your organization avoid these pitfalls? There are three key questions to ask when developing an Enterprise AI strategy: Do you have clear business applications? Do you understand the state of your information? And what in-house capabilities do you possess?

Enterprise AI entails leveraging advanced machine learning and cognitive capabilities to discover and deliver organizational knowledge, data, and information in a way that closely aligns with how humans look for and process information.

With our focus and expertise in knowledge, data, and information management, Enterprise Knowledge (EK) developed this proprietary Enterprise Artificial Intelligence (AI) Readiness Assessment in order to enable organizations to understand where they are and where they need to be in order to begin leveraging today’s technologies and AI capabilities for knowledge and data management. 

Assess your organization across four factors: organizational readiness, state of data and content, skill sets and technical capabilities, and change readiness.

Based on our experience conducting strategic assessments as well as designing and implementing Enterprise AI solutions, we have identified four key factors as the most common indicators and foundations that organizations can use to evaluate their current capabilities and understand what it takes to invest in advanced capabilities.

This assessment leverages over thirty measurements across these four Enterprise AI Maturity factors, categorized under the following aspects.

1. Organizational Readiness

Does your organization have the vision, support, and drive to enable successful Enterprise AI initiatives?

The foundational requirement for any organization to undergo an Enterprise AI transformation stems from alignment on vision and on the business applications and justifications for launching successful initiatives. The Organizational Readiness factor includes the assessment of appropriate organizational designs, leadership willingness, and mandates that are necessary for success. This factor evaluates topics including:

  • The need for vision and strategy for AI and its clear application across the organization.
  • If AI is a strategic priority with leadership support.
  • If the scope of AI is clearly defined with measurable success criteria.
  • If there is a sense of urgency to implement AI.

With a clear picture of what your organizational needs are, the Organizational Readiness assessment factor will allow you to determine if your organization meets the requirements to consider AI-related initiatives while surfacing potential risks and preparing you to better mitigate failure.

2. The State of Organizational Data and Content

Is your data and content ready to be used for Enterprise AI initiatives?

The volume and dynamism of data and content (structured and/or unstructured) are growing exponentially, and organizations need to be able to securely manage and integrate that information. Enterprise AI requires quality of, and access to, this information. This assessment factor focuses on the extent to which existing structured and unstructured data is in a machine-consumable format and the level to which it supports business operations within the enterprise. This factor considers topics including:

  • The extent to which the organization’s information ecosystems allow for quick access to data from multiple sources.
  • The scope of organizational content that is structured and in a machine-readable format.
  • The state of standardized organization of content/data such as business taxonomy and metadata schemes and if it is accurately applied to content.
  • The existence of metadata for unstructured content. 
  • Access considerations including compliance or technical barriers.

AI needs to learn the human way of thinking and how an organization operates in order to provide the right solutions. Understanding the full state of your current data and content will enable you to focus on the content and data with the highest business impact and help you develop a strategy to get your data into an organized and accessible format. Without high-quality, well-organized, and tagged data, AI applications will not deliver high-value results for your organization.

3. Skill Sets and Technical Capabilities

Does your organization have the technical infrastructure and resources in place to support AI?

With the increased focus on AI, demand has grown both for individuals with the technical skills to engineer advanced machine learning and intelligent solutions and for business knowledge experts who can transform data into a paradigm that aligns with how users and customers communicate knowledge. Further, over the years, cloud computing capabilities, web standards, open source training models, and linked open data for a number of industries have emerged to help organizations craft customized Enterprise AI solutions for their business. This means an organization looking to start leveraging AI for its business no longer has to start from scratch. This assessment factor evaluates the organization's existing capabilities to design, manage, operate, and maintain an Enterprise AI solution. Some of the factors we consider include:

  • The state of existing enterprise ontology solutions and enterprise knowledge graph capabilities that optimize information aggregation and governance. 
  • The existence of auto-classification and automation tools within the organization.
  • Whether roles and skill sets for advanced data modeling or knowledge engineering are present within the organization.
  • The availability and capacity to commit business and technical SMEs for AI efforts.

Understanding the current gaps and weaknesses in existing capabilities and defining your targets are crucial elements to developing a practical AI Roadmap. This factor also plays a foundational role in giving your organization the key considerations to ensure AI efforts kick off on the right track, such as leveraging web standards that enable interoperability, and starting with available existing/open-source semantic models and ecosystems to avoid short-term delays while establishing long-term governance and strategy. 

4. Change Threshold 

Is your organization prepared to support the operational and strategic changes that will result from AI initiatives?

The success of Enterprise AI relies heavily on the adoption of new technologies and ways of doing business. Organizations that fail to succeed with AI often struggle to understand the full scope of the change that AI will bring to their business and organizational norms. This usually manifests itself in the form of fear (either of change in job roles or of creating wrong or unethical AI results that expose the organization to higher risks). Most organizations also struggle to accept that AI requires a few iterations to get it "right". As such, this assessment factor focuses on the organization's appetite, willingness, and threshold for understanding and tackling the cultural, technical, and business challenges required to achieve the full benefits of AI. This factor evaluates topics including:

  • Business and IT interest and desire for AI.
  • Existence of resource planning for the individuals whose roles will be impacted. 
  • Education and clear communication to facilitate adoption. 

The success of any technical solution is highly dependent on the human and cultural factors in an organization, and each organization has a threshold for dealing with change. Understanding and planning for this factor will enable your organization to integrate change management that addresses negative implications, avoids unnecessary resistance or weak AI results, and provides proper navigation through issues that arise.

How it Works

This Enterprise AI readiness assessment and benchmarking leverages the four factors described above, which comprise over 30 different points upon which each organization can be evaluated and scored. We apply this proprietary maturity model to help assess your Enterprise AI readiness and clearly define success criteria for your target AI initiatives. Our steps include:

  • Knowledge Gathering and Current State Assessment: We leverage a hybrid model that includes interviews and focus groups, supported by content/data and technology analysis, to understand where you are and where you need to be. This gives us a complete understanding of your current strengths and weaknesses across the four factors, allowing us to provide the right recommendations and guidance to drive success, business value, and long-term adoption.
  • Strategy Development and Roadmapping: Building on the assessment factors, we work with you to develop a strategy and roadmap that outlines the work streams and activities needed to achieve your AI goals. It combines our understanding of your organization with proven best practices and methodologies into an iterative work plan that ensures you can achieve the target state while quickly and consistently showing interim business value.
  • Business Case Development and Alignment Support: We further compile our assessment of potential project ROI based on increased revenues, cost avoidance, and risk and compliance management. We then balance those against the perceived business needs and wants by determining the areas that would have the biggest business impact at the lowest cost, and focus our discussions and explorations on the areas with the greatest need and highest interest.

Keys to Our Assessment  

Over the past several years, we have worked with diverse organizations to enable them to strategize, design, pilot, and implement scaled Enterprise AI solutions. What makes our priority assessment unique is that it is developed based on years of real-world experience supporting organizations in their knowledge and data management. As such, our assessment offers the following key differentiators and values for the enterprise: 

  • Recognition of Unique Organizational Factors: This assessment recognizes that no Enterprise AI initiative is exactly the same. It is designed in such a way that it recognizes the unique aspects of every organization, including priorities and challenges to then help develop a tailored strategy to address those unique needs.
  • Emphasis on Business Outcomes: Successful AI efforts result in tangible business applications and outcomes. Every assessment factor is tied to specific business outcomes with corresponding steps on how the organization can use it to better achieve practical business impact.
  • A Tangible Communication and Education Tool: Because this assessment provides measurable scores and over 30 tangible criteria for assessment and success factors, it serves as an effective tool for communicating up to leadership and quickly garnering buy-in, helping organizations understand both the cost and the tangible value of AI efforts.

Results

As a result of this effort, you will have a complete view of your AI readiness, gaps, and required ecosystem, along with an understanding of the potential business value that could be realized once the target state is achieved. Taken as a whole, the assessment allows an organization to:

  • Understand strengths and weaknesses, and overall readiness to move forward with Enterprise AI compared to other organizations and the industry as a whole;
  • Judge where foundational gaps may exist in the organization in order to improve Enterprise AI readiness and likelihood of success; and
  • Identify and prioritize next steps in order to make immediate progress based on the organization’s current state and defined goals for AI and Machine Learning.

 


Taking the first step toward gaining this invaluable insight is easy:

1. Take 10-15 minutes to complete your Enterprise AI Maturity Assessment by answering a set of questions pertaining to the four factors; and
2. Submit your completed assessment survey and provide your email address to download a formal PDF report with your customized results.

Metadata Use Case: IMDB in Amazon Prime Video

Have you been catching up on your favorite TV shows lately? If so, while watching a series or movie from home, it is very likely you might have asked yourself the following questions:

  • “The narrator’s voice sounds familiar, who is it?”
  • “What is that actor’s name? I think I might have seen him in another movie.”
  • “Isn’t she the actress from this other show I watched some years ago?”

A few years ago, these questions might have gone unanswered if neither you nor any of the people with you knew the answer, or you might have had to wait until the credits appeared. Now, however, all it takes is a simple Google search to find the answers. The information you might find on the internet about a series includes the cast, number of seasons, number of episodes per season, airing dates, episode summaries, episode length, and production details, among others. This relationship between the TV series and the information you found about it on the internet brings us to the concept of metadata.

An example of metadata from the movie La La Land, including fields such as "title," "description," and "director."

Metadata

As you might notice from the example above, metadata is simply data about data. In this particular case, it is the data on the internet about the videos you watched. The primary use of metadata is to provide context and information about data, as well as to enhance findability and describe data, all of which is especially helpful when dealing with unstructured data.

  • Structured data: These data follow a defined framework with a set number of fields. Think of a well-formatted spreadsheet where every column contains one specific type of data. An example of this would be a table with personal information, such as name, address, telephone number, and age of multiple people.  
  • Unstructured data: This data cannot be stored in a traditional column-row database or spreadsheet. Think of photos, videos, audio, text documents, and websites. Unstructured data is also the most common type of data, and because of its unstructured nature, its metadata is particularly useful to help us find it and make sense of it. How would you be able to find a movie without being able to search by its title, who stars in it, or what it is about?
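
To make the distinction concrete, here is a toy illustration (in Python, with invented field values) of metadata describing a single film:

```python
# A toy illustration of "data about data": descriptive fields attached to a
# single film. The values are illustrative.
movie_metadata = {
    "title": "La La Land",
    "type": "Movie",
    "director": "Damien Chazelle",
    "release_year": 2016,
    "genres": ["Comedy", "Drama", "Music"],
    "description": "An aspiring actress and a jazz musician pursue their "
                   "dreams in Los Angeles.",
}
# None of these fields are the film itself; each one describes it, which is
# what makes an otherwise unstructured asset findable and sortable.
```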

Amazon Prime Video Meets IMDB

IMDB is a database designed to provide TV watchers and cinephiles information about millions of TV shows and films, including cast biographies and reviews. Amazon bought IMDB back in 1998, acquiring its lucrative user base; years later, this strategic acquisition would give the Amazon Prime Video streaming service a marketing push, allowing Amazon to promote the service to an already targeted user base.

An example of the X-Ray feature in Amazon Prime Video, which presents facts such as general trivia or the cast of the show you are watching; here, general trivia about the show Jack Ryan.

Amazon Prime Video and IMDB kept growing in both content and active users. The streaming service not only got its marketing push, but also integrated its user-generated data (such as user behavior and preferences) with IMDB’s database to boost its recommendation systems across platforms. So, how could these two successful products be further integrated? Fast forward about a decade, and Amazon Prime Video added a new feature called X-Ray.

Remember those questions that many people have while watching a video? Well, X-Ray takes care of that. Now, when pausing your favorite show, X-Ray will display information about what you are currently viewing. This includes cast information, filmographies, facts, trivia, character backstories, photo galleries, bonus video content, and music. By leveraging metadata from IMDB, Amazon Prime Video can add structure to unstructured video content, enabling users to answer those nagging questions.

Building More Informed Recommendation Systems

Recently, due to the COVID-19 crisis, video streaming services have been experiencing a surge in demand. As people catch up on their favorite series, streaming service firms need to keep their customers engaged by recommending related material that would keep them active.

At Enterprise Knowledge, we had the opportunity to work with a prominent client in the telecommunications industry, improving their recommendation systems by leveraging the power of metadata on their unstructured content. The enhanced recommendation system takes the viewer's input based on the specific scene the viewer is currently watching. The engine ingests information pulled from the closed captioning file and from internal and external databases containing information about the TV series, episode, and scene. The resulting recommendation system works not only on general information about the TV series, such as genre, recurring cast, summaries, and network, but also on specific details about the scene, such as sentiment inferred from the subtitles, non-recurring cast appearances, and particular music in the scene, improving the recommendations provided to the viewer.

Beyond the media or telecommunications industries, metadata has an equally crucial role in making unstructured data usable and accessible. It allows enterprise applications to link unstructured content based on assigned attributes included in the metadata. As another example, in the pharmaceutical industry, a recommendation system would take research papers, formulations, and experiment reports and link them based on related chemical compounds, illnesses, or authors. These links in the data power up recommendation systems and enterprise search engines that provide content at the users’ point of need. The resulting enterprise applications are as powerful as the quality and completeness of the metadata used to derive the results.
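
As a simplified sketch of this idea, the snippet below ranks catalog items by how many metadata attribute values they share with the item currently being viewed; the fields and scoring are illustrative, far simpler than the engines described above.

```python
# A simplified sketch of metadata-based linking: items that share attribute
# values (cast, genre, music, topics, ...) are treated as related. The
# fields and scoring are illustrative.
def shared_attribute_score(meta_a: dict, meta_b: dict) -> int:
    score = 0
    for field in ("cast", "genre", "music", "topics"):
        score += len(set(meta_a.get(field, [])) & set(meta_b.get(field, [])))
    return score

def recommend(current: dict, catalog: list[dict], top_n: int = 5) -> list[dict]:
    # Rank the catalog by metadata overlap with the item being viewed.
    ranked = sorted(catalog,
                    key=lambda item: shared_attribute_score(current, item),
                    reverse=True)
    return ranked[:top_n]
```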

The benefits of including metadata as an integral part of an organization’s strategy include:

  • Content findability, reuse, and sharing: Metadata ensures that complex content is easily understood and processed by people other than the content creator. Hence, it allows anyone in the organization to find the content they need to do their jobs regardless of content type, knowledge of its existence, who owns it, or where it is located. This results in increased productivity and higher quality of work. 
  • Data Governance: Metadata can also serve as an annotation tool that denotes content ownership and temporality since some data may be deemed irrelevant after a specific timeframe. This also makes it easier to identify who is responsible for the timeliness and the quality of the content. Furthermore, it can be used to trigger workflows that ensure the content is accurate and up to date, if necessary. As a result, organizations have greater control over their content and data, ensuring the right people are finding and acting on the right information.
  • Innovation and Service: When employees spend less time asking coworkers for content, looking for information, recreating information, and waiting for answers, they have more time for innovation and customer support. This, in turn, results in greater employee and customer satisfaction, which leads to higher employee and customer retention.

Conclusion

In conclusion, metadata provides structure to unstructured content, making it machine-readable and ready to work with machine learning and artificial intelligence applications. In the example above, Enterprise Knowledge enhanced the client's unstructured video content using internally and externally sourced data to provide a metadata-rich environment. This environment gave the recommendation system access to new information on which to base its decisions, culminating in better recommendations that keep TV watchers engaged. Similarly, we can help your organization connect your data, content, and people in ways that enhance your corporate knowledge, resulting in the benefits discussed above.

Does your organization need assistance in leveraging metadata to enhance its unstructured content? Feel free to reach out to us for help!

Structuring Unstructured Content: The Power of Knowledge Graphs and Content Deconstruction

Unstructured content is ubiquitous in today's business environment. In fact, IDC estimates that 80% of the world's data will be unstructured by 2025, and many organizations are already at that volume. Every organization possesses libraries, shared drives, and content management systems full of unstructured data contained in Word documents, PowerPoint presentations, PDFs, and more. Documents like these often contain pieces of information that are critical to business operations, but these "nuggets" of information can be difficult to find when they're buried within lengthy volumes of text. For example, legal teams may need information that is hidden in process and policy documents, and call center employees might require fast access to information in product guides. Users search for and use the information found in unstructured content all the time, but its management and retrieval can be quite challenging when content is long, text heavy, and has few descriptive attributes (metadata) associated with it.

What is Unstructured Content? 

Unstructured content (also called unstructured data, and used here interchangeably) is content that does not have any data model or infrastructure applied to it. This makes it difficult for information systems to ingest and manage, rendering search applications less accurate than they could be. Unstructured content is typically textual in nature, but can also include names, dates, and other data.

Common Unstructured Content Dilemmas 

At EK we see two common dilemmas that users often encounter when dealing with unstructured content. 

The “Search Again” Dilemma

Imagine that you’re trying to find your organization’s process for submitting a travel request. After searching through a shared drive of company information, you find an HR PDF called “Employee Handbook.” You open the file and see that it is 40 pages long, so you use Ctrl + F (or Edit -> Find) to search for the phrase “travel request.” This takes you to the portion of the handbook that you needed. In this scenario, you had to search twice: once for the document, and again for the actual information you needed.  If you were using a system that was underpinned by deconstructed content, your initial search for “travel request” would have rendered what you needed, saving you time and effort. This is because the content pieces would each be tagged with metadata and indexed by a search application, as opposed to just the bigger, longer document. When this happens search is able to treat each content chunk as one search result, surfacing more specific answers. 

The “I Didn’t Know I Was Looking for That” Dilemma

Users often embark on their search for content with a certain document or type of content in mind. They may think “I need the manual for this procedure,” or “I need this specific form.” However, users don’t always have access to, or awareness of, the full breadth of content an organization has stored in its systems. For example, imagine that you’re a lawyer looking for an example of a Licensing Contract that you can use for the project you’re working on. You may enter your company’s intranet and search for “Licensing Contract,” which returns dozens of contracts that you can scroll through. Further down in the search results you see a document titled “Licensing Contract Template” and realize this is what you need, as opposed to a completed example. In this instance, you had to scroll through search results only to realize that what you were looking for was actually a template. When data is unstructured, search results become unstructured. Systems cannot derive meaning from volumes of text, so they cannot reflect search results back in a meaningful way. 

These scenarios should be familiar to almost anyone working in an organization with lots of unstructured data. Many users become accustomed to stumbling through virtual stacks of documents until they strike the right piece of information they need. However, this doesn’t need to be the status quo. 

Creating Structure

There are two different practices that, when combined, result in a robust, efficient system for managing and searching for unstructured content. Before I talk about them, though, I should mention a critical part of information architecture: taxonomy. A precursor to more complex content management efforts should be the design of a user-centric business taxonomy that satisfactorily encompasses the range of information being stored in a system. Taxonomy terms will be the glue that holds together the solutions I talk about moving forward. Once a taxonomy is in place, content deconstruction and a knowledge graph can be used to create a sophisticated content management solution.

Content deconstruction, which is explained in more depth in this blog, breaks longer documents into smaller chunks to apply more pointed metadata relevant to each section. This creates more relevant search results, consumable by both systems and users alike. In the context of the “Search Again” problem, a deconstructed approach would eliminate the need to dig through longer documents to get to the right piece of information. Applying a knowledge graph to content “chunks” results in an even more sophisticated solution in which these chunks can be related to each other.
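
To illustrate, here is a minimal sketch of content deconstruction, assuming documents whose sections are delimited by markdown-style headings and a placeholder auto-tagging service:

```python
# A minimal sketch of content deconstruction, assuming documents whose
# sections are delimited by markdown-style headings; tag_with_taxonomy()
# is a placeholder for an auto-tagging service.
import re

def tag_with_taxonomy(text: str) -> list[str]:
    # Placeholder: a real implementation would call a TOMS classifier or
    # an ML model to assign taxonomy terms to each chunk.
    return []

def deconstruct(document: str) -> list[dict]:
    """Split a long document into heading-delimited chunks, each carrying
    its own metadata, so search can index chunks rather than documents."""
    chunks = []
    for section in re.split(r"\n(?=#{1,3} )", document):
        heading, _, body = section.partition("\n")
        chunks.append({
            "title": heading.lstrip("# ").strip(),
            "text": body.strip(),
            "tags": tag_with_taxonomy(body),
        })
    return chunks
```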

Knowledge graphs create and manage meaningful relationships between content, breaking the constraints of keyword search and generating advanced discovery. Creating knowledge graphs is a complex endeavor, one which my colleagues at EK have written about extensively here. For this particular use case, knowledge graphs can relate structured content and data associated with content (like author, business area, and topic), so that relevant information can be quickly surfaced in search results. This drives users to discover content they may not have been aware of, preventing the second dilemma I discussed above and applying significantly more value to an organization’s content.  

Putting it All Together 

To give an example of how these two solutions can work together to create a seamless content consumption experience for users, take a project I worked on for an international grocery store chain. This organization had an intranet that stored all employee handbooks and HR policies, amounting to long lists of links to download even longer PDFs on topics like Time Off, Dress Code, Pay, and Travel. If employees wanted to find information about what uniform they were required to wear, they would first have to search the intranet using the term "uniform," which would return, among other things, a 30-page PDF titled "Employee Dress Code." Then they would have to download that PDF and take the extra step of scrolling or using Ctrl + F to find information specifically about uniforms. This should sound familiar, as it is an example of the "Search Again" dilemma.

What we did in this scenario was take each of the long policy documents and “chunk” them, breaking each into segments that addressed one topic or subject. A taxonomy was designed so that each segment was tagged with topical and departmental information. For the “Employee Dress Code” document, there were segments like “Store Uniform,” “Office Uniform,” and “Warehouse Uniform,” each with specific rules and expectations around these policies. Now, when an employee searches for the keyword “uniform,” they will be able to quickly assess which segment they need based on its content and tags. 

To take this one step further, the use of a knowledge graph surfaces content related to the segment a user is viewing. For example, if the user searched for uniform and clicked on the “Store Uniform” segment, they might be shown the related content: “Uniform Order Form” and “Dress Code Violations.” In the course of finding this information, the user may realize that they need to place an order, and be able to efficiently do so because the order form link is readily available to them. This demonstrates a solution to the “I Didn’t Know I Was Looking for That” Dilemma.
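
As a brief sketch of how such relationships might be represented, the snippet below (assuming rdflib, with an illustrative namespace and predicate names) relates the "Store Uniform" chunk to the order form so search can surface it as related content:

```python
# A brief sketch, assuming rdflib, of relating tagged chunks in a
# knowledge graph; the namespace and predicate names are illustrative.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("https://example.com/content#")  # assumed namespace
g = Graph()

store_uniform = URIRef(EX + "StoreUniform")
order_form = URIRef(EX + "UniformOrderForm")

g.add((store_uniform, EX.hasTopic, Literal("Dress Code")))
g.add((store_uniform, EX.relatedTo, order_form))

# A result page for the "Store Uniform" chunk can now traverse
# EX.relatedTo to surface the order form, even if the user never
# searched for it.
```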

Parting Thoughts

Deconstructing content and creating a knowledge graph for that content is no small feat, but it is a realistic and achievable approach to content management. At the end of the day, the goal is to build a system that stores and manages deconstructed “chunks” of tagged content that are related using a knowledge graph. If you would like guidance on where to begin, here is how Enterprise Knowledge experts can help. 

Using Facets to Find Unstructured Content

What does ‘faceted navigation’ mean to you? For web-savvy individuals, it’s a search experience similar to the one you would find on Amazon. Facets primarily allow an individual to quickly sort through large amounts of information to locate a single entity or a few entities. The infographic below provides a visual overview of what facets are, where they come from, and what they can allow you to do.

https://enterprise-knowledge.com/wp-content/uploads/2020/01/Facets.png

This infographic is a visual introduction to how facets can improve item, document, and content findability, regardless of the form and structure of that content. Other factors, like customized action-oriented results and an enterprise-wide taxonomy, allow for an even more advanced search experience. EK has experience in designing and implementing solutions that optimize the way you use your knowledge, data, and information, and can produce actionable and personalized recommendations for you. If this is something you’d like to speak with the experts at EK about, reach out to info@enterprise-knowledge.com.

What is the Roadmap to Enterprise AI?

Artificial Intelligence technologies allow organizations to streamline processes, optimize logistics, drive engagement, and enhance predictability as the organizations themselves become more agile, experimental, and adaptable. To demystify the process of incorporating AI capabilities into your own enterprise, we broke it down into five key steps in the infographic below.

An infographic about implementing AI (artificial intelligence) capabilities into your enterprise.

If you are exploring ways your own enterprise can benefit from implementing AI capabilities, we can help! EK has deep experience in designing and implementing solutions that optimize the way you use your knowledge, data, and information, and can produce actionable and personalized recommendations for you. Please feel free to contact us for more information.
