The key to effective use of GenAI and LLMs: metadata
With the increasing prevalence of generative AI technologies (GenAI) and large language models (LLMs) such as OpenAI's GPT and Meta's LLaMA family, companies now have powerful tools at their disposal to manage extensive and complex documentation more efficiently. However, despite the immense potential of these technologies, there are several challenges that must be overcome for their safe and effective use in technical documentation. In this blog post, we highlight these challenges and introduce Retrieval Augmented Generation (RAG) technology, which offers a promising solution through the clever use of metadata.
The challenges of using LLMs
Precision and reliability are essential in technical documentation. However, unmodified LLMs face several challenges in this regard:
1. Integration of company knowledge
A key challenge is that although LLMs have been trained on huge amounts of data, they do not contain any specific company knowledge, as this is not usually publicly available. This means that an LLM cannot simply answer questions about the operation or maintenance of company-specific machines. The LLM must first acquire this knowledge. However, retraining an LLM often requires more data than is available in a company. At the same time, the answers provided by an LLM-based chatbot should be based exclusively on internal company knowledge and not come from the LLM's general training data pool.
2. Accuracy of results
LLMs tend to produce “hallucinations”: responses that sound plausible and are linguistically coherent but are factually incorrect. In technical documentation, where accurate information is crucial for the safety and operation of machines, such “invented” responses can have serious consequences. Hallucinations are especially likely when the LLM has no precise information on the question at hand.
3. Traceability of information sources
Most LLMs function as “black boxes” whose decisions are based on complex neural networks and billions of parameters. The exact source of the information on which an answer is based is often not transparent. However, if an LLM-based chatbot is to provide safe and reliable information on the operation of a machine, it is crucial that the source of the answer is clearly traceable, especially given the aforementioned tendency to hallucinate.
4. Precision of answers
Since LLMs are based on static training data, their answers may be outdated or vague, especially when specialized knowledge is required. LLMs often have difficulty distinguishing between similar product variants, which is of great importance in technical documentation, where even the smallest details can be crucial. Technical documents often contain a high degree of redundancy. For example, the operating instructions for two product variants are often nearly identical in wording but differ in essential details such as technical data or special features. LLMs frequently overlook such differences. If an LLM accidentally provides the manual for “T3-H1” when asked for repair instructions for the “T3-B” fan, this can have serious consequences, such as the use of incorrect spare parts or tools.
These challenges make it clear that LLMs cannot simply be applied to technical documentation as is. There are solutions, however: with a RAG architecture, the general linguistic capabilities of an LLM can be applied to, and restricted to, specific company knowledge. Metadata ensures that documents are clearly structured and available to the LLM in a form it can interpret, which makes precise, traceable, and correct results achievable, especially for documents with high redundancy, as is common in technical documentation.
Retrieval Augmented Generation – How RAG works
RAG technology offers a powerful solution to the challenges mentioned above by enriching LLMs with specifically retrieved, relevant data. In simple terms, it consists of three central components:

1. Provision of information from specified external sources (“retrieval”)
In RAG architecture, relevant external data sources are searched in a targeted manner to provide the LLM with the appropriate context for the user query. These can be text documents, structured databases, knowledge graphs, and much more. From the LLM's perspective, this “external data source” provides company- and product-specific expertise. Techniques such as vector search make it possible to identify the most relevant documents or data, which are then integrated into the prompt, i.e., the query to the LLM.
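The retrieval step can be sketched in a few lines of Python. This is a minimal, self-contained illustration rather than a production setup: it assumes a toy bag-of-words vector in place of a real embedding model, and the chunk IDs and texts (including the hex-key sizes) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: chunk ID -> text. The two repair chunks are
# almost identical, as is typical for manuals of similar product variants.
CHUNKS = {
    "t3b-repair-01": "Replace the motor of the T3-B fan using a 4 mm hex key.",
    "t3h1-repair-01": "Replace the motor of the T3-H1 fan using a 5 mm hex key.",
    "t3b-ops-01": "Select one of the three speed settings on the T3-B fan.",
}


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the IDs and scores of the k chunks most similar to the query."""
    q = embed(query)
    scored = [(chunk_id, cosine(q, embed(text))) for chunk_id, text in CHUNKS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


for chunk_id, score in retrieve("How do I replace the motor of the T3-B fan?"):
    print(f"{chunk_id}: {score:.2f}")
```

The T3-B repair chunk ranks first, but the T3-H1 chunk scores almost as high because the two texts differ only in the variant name and one detail. This is exactly the redundancy problem described in Challenge 4.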
2. Enriching the prompt with information from the source as context (“augmentation”)
The retrieved information is used as additional context to enrich the LLM's input prompt. The information is prepared in such a way that it can be seamlessly integrated into the existing prompt. This enables the LLM to draw on a broader knowledge base and thus generate more informed and accurate responses.
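A minimal sketch of the augmentation step, assuming retrieval has already returned (chunk ID, text) pairs. The prompt wording and the `build_prompt` helper are illustrative assumptions, not a prescribed template. The instruction to answer only from the supplied context reflects the requirement from Challenge 1 that answers be based exclusively on company knowledge, and keeping the chunk IDs in the prompt lets the model cite its sources.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Augment a user question with retrieved chunks as labelled context.

    Each chunk is a (chunk_id, text) pair; the ID stays in the prompt so the
    model can cite it and the answer remains traceable.
    """
    context = "\n".join(f"[{chunk_id}] {text}" for chunk_id, text in chunks)
    return (
        "Answer the question using only the context below. "
        "Cite the IDs of the chunks you used in square brackets. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical retrieval result for a T3-B repair question.
retrieved = [
    ("t3b-repair-01", "Replace the motor of the T3-B fan using a 4 mm hex key."),
]
print(build_prompt("How do I replace the motor of the T3-B fan?", retrieved))
```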
3. Generation of the response by the LLM (“generation”)
By using the extended context, the model can better understand the user's query and deliver more relevant results. The LLM uses its broad, general knowledge of language to transform the data fragments it finds into a well-formed, natural language response. The responses are not only more accurate, but are also based on clearly traceable data sources that can be verified if necessary.
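Generation itself requires an actual language model, so the sketch below takes the model as a parameter. It is a minimal illustration under stated assumptions: `llm` is any hypothetical function that maps a prompt to a text answer, citations are assumed to appear in the answer as `[chunk-id]`, and `fake_llm` is a canned stand-in for a real model. It shows one simple way to keep answers traceable: every cited source is checked against the chunks that were actually retrieved.

```python
import re
from typing import Callable


def generate_with_sources(
    prompt: str, context_ids: set[str], llm: Callable[[str], str]
) -> dict[str, object]:
    """Call a language model and check which cited sources were actually retrieved.

    `llm` is any function mapping a prompt to a completion; in practice it
    would wrap a call to a hosted or local model.
    """
    answer = llm(prompt)
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return {
        "answer": answer,
        "sources": sorted(cited & context_ids),     # verifiable: retrieved chunks
        "unverified": sorted(cited - context_ids),  # cited, but never retrieved
    }


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model, returning a canned answer."""
    return "Use a 4 mm hex key to replace the motor [t3b-repair-01]."


result = generate_with_sources("...", {"t3b-repair-01", "t3b-ops-01"}, fake_llm)
print(result["sources"])
```

A non-empty `unverified` list is a signal that the answer refers to something outside the retrieved context and should not be shown without review.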
RAG-enhanced LLM chatbots offer considerable advantages. They deliver fast, well-founded answers in natural language and give users and technicians direct conversational access to the company's entire product knowledge. The RAG architecture bridges the gap between the limited scope of prompts that can be given to a language model for answering questions (prompt engineering) and the complex, time-consuming, and resource-intensive retraining of the language model (fine-tuning).
The RAG architecture thus offers a practical solution for expanding LLMs with company-specific information. However, in order to exploit the full potential of this technology, some crucial preparatory work must be done. It is not enough to simply “dump” the entire data treasure trove into the system in unstructured form.
Valid, high-quality results require a prepared and structured knowledge base that serves as a sound source of information for the LLM – and this is where metadata plays a key role.
The crucial role of metadata
Metadata is data that contains information about other data, such as document titles, product references, serial numbers, or technical specifications. It structures and organizes large amounts of data and enables LLMs to find the right information quickly and accurately. In a RAG architecture, metadata ensures that the right context is provided for generating answers – particularly important in technical documentation, where similar content is often finely nuanced.
By using knowledge graphs, metadata can be linked and information can be structured intelligently. Careful maintenance and management of metadata thus forms the basis for accurate, up-to-date, and context-sensitive responses, which are essential for the safe and effective use of LLMs in technical documentation.
Let's take the example from Challenge 4 above: the two repair manuals for the similar products T3-B and T3-H1 contain a high proportion of redundant information. The LLM does not know that the relevant difference between the two documents is the product variant they refer to. Enriching the repair manuals with metadata classifies each document unambiguously in the knowledge space: both the type of information (repair manual) and the product it applies to (T3-B or T3-H1) are now available to the LLM as essential information.
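The T3-B/T3-H1 example can be sketched as a metadata pre-filter. In this minimal illustration, the `Chunk` fields `doc_type` and `product` stand in for the metadata described above; the field names, the chunk texts, and the `filter_by_metadata` helper are hypothetical. Filtering on metadata before any similarity ranking removes the wrong variant entirely instead of relying on slight differences in wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A text chunk enriched with metadata."""
    chunk_id: str
    text: str
    doc_type: str  # type of information, e.g. "repair manual"
    product: str   # product variant the content applies to, e.g. "T3-B"


# Hypothetical knowledge base: the two repair chunks differ in one detail only.
KNOWLEDGE_BASE = [
    Chunk("t3b-repair-01", "Replace the motor using a 4 mm hex key.", "repair manual", "T3-B"),
    Chunk("t3h1-repair-01", "Replace the motor using a 5 mm hex key.", "repair manual", "T3-H1"),
    Chunk("t3b-ops-01", "Select one of the three speed settings.", "operating manual", "T3-B"),
]


def filter_by_metadata(chunks: list[Chunk], **criteria: str) -> list[Chunk]:
    """Keep only chunks whose metadata fields match every given criterion."""
    return [
        chunk for chunk in chunks
        if all(getattr(chunk, field) == value for field, value in criteria.items())
    ]


# Restrict the search space before any similarity ranking takes place.
candidates = filter_by_metadata(KNOWLEDGE_BASE, doc_type="repair manual", product="T3-B")
print([chunk.chunk_id for chunk in candidates])
```

Many vector databases support a comparable filter on metadata stored alongside each vector; how that filter is expressed depends on the product.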
If the metadata is also linked in a knowledge graph, additional knowledge is created that can be included in the LLM prompt: for example, that T3-B denotes the basic version of the table fan with three speed settings. The knowledge graph stores this knowledge in a performant, machine-searchable form.
If a user now asks for operating information for their basic model table fan, the LLM can identify the product variant T3-B and find suitable content that is tagged with this metadata.
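The graph lookup and the metadata match can be combined in a short sketch. It assumes, hypothetically, that the knowledge graph is a plain list of (subject, predicate, object) triples and that the user's wording ("basic model", "table fan") has already been mapped to the graph's vocabulary, for example by the LLM or an entity-extraction step. The predicate names, chunk IDs, and helper functions are illustrative. `facts_about` produces the additional knowledge that could be added to the prompt, and `find_content` shows how the resolved variant narrows retrieval to T3-B content.

```python
# Hypothetical knowledge graph stored as (subject, predicate, object) triples.
TRIPLES = [
    ("T3-B", "is_variant_of", "table fan"),
    ("T3-B", "edition", "basic"),
    ("T3-B", "speed_settings", "3"),
    ("T3-H1", "is_variant_of", "table fan"),
]

# Hypothetical chunks tagged with product and document-type metadata.
CHUNKS = [
    {"id": "t3b-ops-01", "product": "T3-B", "doc_type": "operating manual"},
    {"id": "t3h1-ops-01", "product": "T3-H1", "doc_type": "operating manual"},
    {"id": "t3b-repair-01", "product": "T3-B", "doc_type": "repair manual"},
]


def facts_about(entity: str) -> list[str]:
    """Render all triples about an entity as short statements for the prompt."""
    return [f"{s} {p.replace('_', ' ')}: {o}" for s, p, o in TRIPLES if s == entity]


def resolve_variants(family: str, edition: str) -> set[str]:
    """Find the product variants of a family that have the given edition."""
    members = {s for s, p, o in TRIPLES if p == "is_variant_of" and o == family}
    return {s for s, p, o in TRIPLES if s in members and p == "edition" and o == edition}


def find_content(family: str, edition: str, doc_type: str) -> list[str]:
    """Resolve the variant via the graph, then select chunks by their metadata."""
    variants = resolve_variants(family, edition)
    return [c["id"] for c in CHUNKS if c["product"] in variants and c["doc_type"] == doc_type]


# "Operating information for my basic model table fan":
print(find_content("table fan", "basic", "operating manual"))
print(facts_about("T3-B"))
```

A production knowledge graph would typically live in a dedicated graph store and be queried with a language such as SPARQL or Cypher; the list of triples here only illustrates the idea.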
plusmeta – The enabler for GenAI in technical documentation
The potential of GenAI and LLMs for technical documentation is enormous and will open up further exciting use cases in the future. However, this potential can only be exploited if the underlying information sources are comprehensively structured and tagged with metadata. This is the perfect basis for technologies such as RAG, which enable the application of LLMs to corporate knowledge.
The plusmeta platform offers a comprehensive set of intuitive workflows for automated metadata assignment and the preparation of existing documents. This creates the necessary basis for the secure and effective use of RAG and GenAI technologies.
Would you like to learn more about the many ways plusmeta can help you prepare your content? Then get in touch with us today and arrange a no-obligation demo appointment. We would be happy to personally introduce you to plusmeta and its many possible applications.