Kingsoft Office Zhu Yi'e: From

# Source: Wall Street Insights
On December 20, at the **Alpha Summit** co-hosted by Wall Street Insights and China Europe International Business School (CEIBS), Zhu Yi'e, Assistant President and Senior Technical Expert of Kingsoft Office, delivered a speech titled *WPS AI: Toward Higher-Quality Knowledge-Augmented Generation*.
He stated that the core challenge for current AI applications has shifted from competing on model capabilities to making efficient use of enterprise private domain data. As model capabilities converge, the models themselves can hardly provide a monopolistic advantage.
He emphasized that the key factor that truly determines the value of AI applications lies in transforming a large amount of complex, unstructured document data within enterprises into high-quality knowledge assets that can be understood by models. Traditional Retrieval-Augmented Generation (RAG) faces fundamental limitations: "documents do not equal knowledge" and "semantic similarity does not equate to logical relevance". Therefore, it is imperative to upgrade the technical paradigm from **model-centric** to **data/knowledge-centric**.
He stressed that the future path lies in developing **Knowledge-Augmented Generation (KAG)**. This requires enterprises to systematically govern, model, and apply knowledge in the same way they manage data. Specifically, it is necessary to integrate multi-modal and multi-structured knowledge through technologies such as Vision-Language Models (VLM) and knowledge graphs, and build an architecture that emphasizes both "data lakes" and "knowledge lakes". The ultimate goal is to enable AI to truly **master** rather than merely "access" enterprise knowledge, thereby delivering reliable value in scenarios such as professional domain Q&A, intelligent writing, and compliant content creation, and completing the critical leap from digitalization to intelligence.
## Key Takeaways from the Speech
1. Enterprise AI applications are shifting from a "model-centric" approach to a "data-centric" one. Data quality has become the key determinant of AI application effectiveness. With the goal of Knowledge-Augmented Generation, WPS AI helps large models truly "master" enterprises' knowledge assets.
2. Manage knowledge the way you manage data. Converting data and knowledge into AI-ready assets is the cornerstone for enterprises to move from digitalization to intelligence. In the era of DATA 2.0, enterprises need to manage knowledge as rigorously as they manage data. Through knowledge modeling, knowledge governance, and multi-modal integration, WPS 365 builds exclusive "enterprise brains" for businesses.
3. High-quality output must start with high-quality input. If the input consists of messy, conflicting raw data, the output will be unreliable no matter how powerful the model is. Therefore, knowledge governance is the cornerstone of AI implementation in professional fields, and its importance will surpass that of algorithm optimization itself.
4. The professional application of AI is a "knowledge engineering" endeavor, not a simple technology integration. From drafting compliant reports to extracting precise information, the essence lies in the process of systematizing and structuring professional domain knowledge. The first enterprises to upgrade their knowledge assets will establish a genuine competitive edge in the AI era.
5. True intelligence is not about "seeing" documents, but about "understanding" logic. Current mainstream AI applications (such as RAG) encounter bottlenecks due to the limitation that "semantic similarity does not equal logical relevance". The real breakthrough lies in integrating multi-source knowledge such as knowledge graphs and business rules, enabling AI to perform logical reasoning and provide accurate answers, thereby unlocking value in professional scenarios.
## Core Summary Compiled by Wall Street Insights
### What is the Real Bottleneck After Large Models?
A key consensus has emerged: the comprehensive intelligence of cutting-edge large models has surpassed that of average employees in terms of knowledge reserve and logical understanding. Moreover, as model capabilities converge, monopolies are becoming increasingly difficult to achieve. This shifts the core question to: **how can large models deliver real value in practical applications?**
Our answer is that they must be deeply integrated with external data, especially enterprise private domain data. However, data in the form of "documents" does not equate to "knowledge": enterprises' massive volumes of documents (text files, spreadsheets, PDFs, etc.) suffer from inherent flaws such as complex formats, disorganized structures, and conflicting content. For example, one document may stipulate that unused annual leave is compensated at a 200% rate, while another sets it at 300%; one regulation may require data storage for six months, while another mandates retaining only essential data. Unless these conflicts are resolved, AI output will be unreliable.
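The annual leave example can be made concrete with a minimal sketch. It assumes that statements have already been extracted from documents into (source, topic, value) records; the `Fact` type, the file names, and the non-conflicting password entry are hypothetical and are not part of WPS AI.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One extracted statement: which document says what about which topic."""
    source: str  # hypothetical document name
    topic: str   # normalized key for the subject of the statement
    value: str   # the stated value, kept as text


def find_conflicts(facts):
    """Return {topic: {value: [sources]}} for topics with more than one distinct value."""
    by_topic = defaultdict(lambda: defaultdict(list))
    for fact in facts:
        by_topic[fact.topic][fact.value].append(fact.source)
    return {
        topic: dict(values)
        for topic, values in by_topic.items()
        if len(values) > 1
    }


# Illustrative records only; the 200% / 300% figures mirror the example in the text.
facts = [
    Fact("hr_policy_a.docx", "unused_annual_leave_compensation_rate", "200%"),
    Fact("hr_policy_b.pdf", "unused_annual_leave_compensation_rate", "300%"),
    Fact("it_handbook.docx", "password_rotation_days", "90"),
]

conflicts = find_conflicts(facts)
for topic, values in conflicts.items():
    print(topic, values)
```

Only the annual leave topic is flagged, because it is the only one with two different values. The hard part in practice is the extraction and normalization step that this sketch takes for granted.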
An even more profound challenge lies in the mainstream technical paradigm. The widely adopted RAG (Retrieval-Augmented Generation) technology relies on "vector similarity retrieval", which brings a fundamental limitation: **semantic similarity does not equal logical relevance**. For instance, if a user asks, "What should I do if my laptop won’t turn on?", the system may retrieve a document detailing the specifications of the "MacBook Pro 14-inch" (due to semantic similarity) while missing the actual troubleshooting guide that solves the problem but does not mention the word "laptop" (logically relevant). This leads to a common issue where many AI applications "impress in demos but struggle in production".
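The laptop example can be reproduced in miniature. The sketch below uses bag-of-words cosine similarity as a deliberately simple stand-in for embedding similarity (real RAG systems use learned vectors, which behave differently in detail); the two document texts are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


query = "What should I do if my laptop won't turn on?"

# Hypothetical documents: the guide solves the problem but never says "laptop".
docs = {
    "macbook_pro_specs": "MacBook Pro 14-inch laptop: technical specifications, "
                         "display, battery, ports",
    "power_troubleshooting": "Device does not power up: hold the power button "
                             "for ten seconds, then check the charger",
}

ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
for name in ranked:
    print(name, round(cosine(query, docs[name]), 3))
```

The specification sheet ranks first because it shares surface vocabulary with the question, while the guide that actually answers it scores zero. Similarity of wording and usefulness for the task are different properties.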
### From RAG to KAG: Building a New Paradigm of "Knowledge-Augmented Generation"
To break through these bottlenecks, we propose that a paradigm shift from RAG to **KAG (Knowledge-Augmented Generation)** is essential. This is not a simple optimization, but a fundamental paradigm transformation. Its core ideas are twofold:
1. High-quality input is the prerequisite for high-quality output. Knowledge must first undergo governance to resolve conflicts, fill gaps, and establish structured frameworks.
2. It is imperative to systematically integrate multi-modal and multi-structured knowledge assets. Instead of merely retrieving documents, enterprises should also incorporate existing knowledge graphs, structured tags, and process SOPs to provide high-quality input for AI generation.
Based on this, we have designed a two-layer architecture:
- The **lower layer is the Knowledge Governance Layer**, responsible for document parsing, knowledge extraction, knowledge graph construction, and quality monitoring.
- The **upper layer is the Knowledge Application Layer**, which integrates multi-source retrieval engines, dynamic ranking modules, and context engineering systems as core components to build a knowledge base that empowers various professional scenarios.
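The division of labor between the two layers might be sketched as follows. This is a toy under stated assumptions, not the WPS AI implementation: `KnowledgeStore`, `govern`, and `build_context` are hypothetical names, governance is reduced to whitespace normalization and exact-duplicate removal, and ranking is reduced to word overlap plus a fixed bonus for graph facts.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeStore:
    """Output of the governance layer: cleaned passages plus graph triples."""
    passages: dict = field(default_factory=dict)  # passage_id -> text
    triples: list = field(default_factory=list)   # (subject, relation, object)


def govern(raw_documents, triples):
    """Governance layer (toy): normalize whitespace and drop exact duplicates."""
    store = KnowledgeStore(triples=list(triples))
    seen = set()
    for doc_id, text in raw_documents.items():
        normalized = " ".join(text.split())
        if normalized.lower() in seen:
            continue  # duplicate content is kept only once
        seen.add(normalized.lower())
        store.passages[doc_id] = normalized
    return store


def build_context(store, query, limit=3):
    """Application layer (toy): multi-source retrieval, ranking, context assembly."""
    terms = set(query.lower().split())
    candidates = []
    for doc_id, text in store.passages.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            candidates.append((overlap, f"[doc:{doc_id}] {text}"))
    for subject, relation, obj in store.triples:
        if subject.lower() in terms or obj.lower() in terms:
            # Assumption: curated graph facts get a fixed bonus so they rank first.
            candidates.append((10, f"[graph] {subject} {relation} {obj}"))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in candidates[:limit]]


# Illustrative inputs; documents "a" and "b" differ only in case and spacing.
raw_documents = {
    "a": "Printer  setup guide for the office",
    "b": "printer setup guide for the office",
    "c": "Annual leave policy",
}
store = govern(raw_documents, [("printer", "requires", "driver")])
context = build_context(store, "printer driver")
print(context)
```

The point of the split is that the application layer only ever sees governed knowledge: the duplicate passage is gone before retrieval starts, and graph facts and document passages are ranked in one list before the context is assembled.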
### Implementing KAG in Four Key Scenarios
Based on the KAG architecture, we have developed an intelligent document repository product and focused on four core application scenarios:
1. **Knowledge Governance**
Through automated knowledge extraction and knowledge graph construction, we help clients identify duplicate content, logical conflicts, and knowledge gaps in their document repositories. For example, the system can automatically flag conflicting versions of annual leave compensation rates or point out that an "IT support" knowledge base lacks critical chapters on "printer driver installation", assisting administrators in making decisions and optimizations.
2. **Professional Intelligent Q&A**
By integrating private domain document graphs with structured knowledge such as industry regulations and SOPs, our Q&A system can handle complex professional queries. For instance, a user can ask: "In Zhejiang Province, can ingredient X be used in the production of a specific particle-size active pharmaceutical ingredient? Please reference only the 2025 regulations." The system can accurately parse multiple constraints including location, ingredient, and year, and deliver a precise answer.
3. **Intelligent Extraction from Complex Documents**
We have made targeted optimizations for complex tables, checkboxes, and handwritten text commonly found in medical reports, contracts, and invoices. A pharmaceutical client leveraged this feature to automatically parse email attachments of adverse drug reaction reports, extract key fields, and populate them into the client’s drug management system—reducing manual processing time from hours to minutes.
4. **Intelligent Writing for Professional Fields**
Unlike writing a leave application, this scenario involves drafting industry reports with strict formatting and precise data citation requirements, such as Clinical Study Reports (CSRs). We enable collaboration between two AI agents. The first generates an "intelligent template", including outlines and data requirements, based on examples and regulations. The second follows the template to locate and extract the required data and tables from massive experimental datasets. The result is a compliant, data-accurate professional report, and the drafting cycle shrinks from weeks to days.
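The two-agent split in the writing scenario can be illustrated with a small sketch. Both "agents" are plain functions here rather than model calls, and the section titles, field names, and numbers are hypothetical; the only design choice carried over from the text is that the template states its data requirements up front and the second step fills them strictly from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    required_fields: list  # names of data items the section must cite


def build_template(example_outline):
    """Stand-in for the first agent: turn an example outline into a template."""
    return [Section(title, list(fields)) for title, fields in example_outline]


def fill_template(template, dataset):
    """Stand-in for the second agent: pull each required field from the dataset."""
    lines, missing = [], []
    for section in template:
        lines.append(section.title)
        for name in section.required_fields:
            if name in dataset:
                lines.append(f"  {name}: {dataset[name]}")
            else:
                missing.append(name)  # never invent a value; report the gap instead
    return "\n".join(lines), missing


# Hypothetical outline and data, for illustration only.
outline = [
    ("Study Population", ["enrolled_subjects"]),
    ("Safety Results", ["adverse_events", "serious_adverse_events"]),
]
dataset = {"enrolled_subjects": 120, "adverse_events": 14}

template = build_template(outline)
report, missing = fill_template(template, dataset)
print(report)
print("missing:", missing)
```

Returning the list of missing fields, rather than leaving blanks or guessing, is what keeps the draft auditable: a reviewer can see exactly which required data the source datasets did not supply.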
### Manage Knowledge the Way You Manage Data
In conclusion, the evolution from RAG to Graph RAG and then to KAG represents an upgrade from "enabling large models to see documents" to "understanding the logic between documents" and finally to "truly mastering enterprise knowledge assets".
We believe that in the intelligent era, enterprises need to build a new architecture that emphasizes both **data lakes and knowledge lakes**. In the future, enterprises must not only accumulate raw data but also systematically conduct knowledge operation, knowledge modeling, and knowledge governance—just as they have managed data in the past. This will be the critical cornerstone for enterprises to move from digitalization to intelligence, and the inevitable path for AI to deliver genuine efficiency gains in professional fields.
## Risk Warning and Disclaimer
The market is inherently risky, and all investments should be made with caution. This document does not constitute personal investment advice and does not take into account the specific investment objectives, financial conditions, or needs of individual users. Users should carefully consider whether any opinions, viewpoints, or conclusions contained in this document are suitable for their specific circumstances. Any investment made based on this document shall be at the user’s own risk.