V0.6 & V0.7 Development Plan
I. Timeline & Milestones
- V0.6: 2025/09/01 - 2025/11/30
- V0.7: 2025/12/01 - 2026/02/28
II. Feature Modules
2.1 RAG
2.1.1 Engineering Capabilities
1. Integrate LazyRAG Capabilities into LazyLLM (V0.6)
Background & Objectives
LazyRAG is a key module of LazyLLM in the Retrieval-Augmented Generation (RAG) direction. It has been validated for feasibility and performance improvement in experimental projects and achieved the best results in the internal RAG evaluation within the group.
However, its capabilities are currently scattered across prototype code and experimental scripts, and some of them conflict with the LazyLLM framework. The goal of this migration is to integrate LazyRAG's core capabilities into the main LazyLLM framework, remove dead code from LazyRAG, substantially enrich the RAG components available in LazyLLM, and provide user-friendly auxiliary capabilities for RAG scenarios (such as reference citations).
Acceptance Criteria
- Unit tests for migrated modules
- Code review for LazyRAG
Timeline
- [ ] Planned completion of migration by August 31st
2. Multi-Knowledge Base Macro Q&A (V0.6)
Background & Objectives
Currently, LazyLLM's macro Q&A capability only supports single knowledge base scenarios. Users need to switch between different knowledge bases when working across projects or departments, which reduces query efficiency and easily leads to information omission or contradictions. In enterprise applications, there is often a need for unified querying and fusion across multiple knowledge bases, such as comprehensive Q&A combining "product manuals + technical solution libraries + FAQ". The goal is to build a unified access and management mechanism for multiple knowledge bases, allowing users to ask questions globally while the system automatically selects relevant knowledge bases and fuses answers, ensuring the architecture has good scalability and consistency to support parallel use of dozens or even hundreds of knowledge bases in the future.
Technical Solution Design
- Relational Database as Knowledge Base Index: Add a relational database (such as PostgreSQL) to store metadata and SQL-Call information for multiple knowledge bases.
- Dynamic Keyword Mechanism: Each Document object can set additional keywords through a function, with the system maintaining independent keyword sets for each kb-id.
- Database Sharding: Each knowledge base corresponds to an independent table in the database, storing document metadata and additional keyword fields to avoid data interference during cross-database queries.
- Keyword Modification Issues: When keywords for a knowledge base are modified, it's necessary to evaluate whether all documents already in the database must be re-parsed. Two strategies are designed:
- Delayed Update: Only use new keywords for newly added documents, keeping historical versions for old documents
- Batch Reprocessing: Re-extract information from old documents through background asynchronous tasks and update indexes
- Fusion & Routing: At query time, the system extracts keywords from the user question, matches them against the relational database to find the relevant knowledge bases, runs deep queries against the vector databases or other retrieval engines of those knowledge bases, and finally merges the results in the fusion module.
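The routing step of this design can be sketched as follows. This is a minimal illustration, not the planned implementation: the `kb_keywords` table, the `route_query` function, and the sample kb-ids are all hypothetical, and sqlite3 stands in for PostgreSQL only so the sketch runs without a server. The deep query and fusion steps are left out.

```python
import re
import sqlite3

# Hypothetical schema: one row per (kb_id, keyword). A production index
# would live in PostgreSQL; sqlite3 only keeps this sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kb_keywords (kb_id TEXT, keyword TEXT)")
conn.executemany(
    "INSERT INTO kb_keywords VALUES (?, ?)",
    [
        ("product_manual", "installation"),
        ("product_manual", "warranty"),
        ("tech_solutions", "architecture"),
        ("faq", "warranty"),
        ("faq", "refund"),
    ],
)


def route_query(question: str, limit: int = 3) -> list[str]:
    """Return kb-ids ranked by how many of their keywords appear in the question."""
    tokens = sorted(set(re.findall(r"[a-z0-9]+", question.lower())))
    if not tokens:
        return []
    placeholders = ",".join("?" for _ in tokens)
    rows = conn.execute(
        f"SELECT kb_id, COUNT(*) AS hits FROM kb_keywords "
        f"WHERE keyword IN ({placeholders}) "
        f"GROUP BY kb_id ORDER BY hits DESC, kb_id LIMIT ?",
        (*tokens, limit),
    ).fetchall()
    return [kb_id for kb_id, _ in rows]


print(route_query("How do I claim warranty after installation?"))
```

Only the knowledge bases returned here would then be queried in their vector stores, which keeps the expensive retrieval step proportional to the number of relevant knowledge bases rather than the total.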
Acceptance Criteria
- Functional Correctness: The system can accurately identify the scope of knowledge bases involved in user questions in multi-knowledge base environments and call corresponding retrieval engines to return results.
- Performance & Response: In scenarios supporting parallel queries of 10 or more knowledge bases, system query latency should remain within acceptable ranges (e.g., P95 < 2 seconds).
- Maintainability & Scalability: The system supports dynamic addition, deletion of knowledge bases, and keyword adjustments without affecting existing queries. Keyword modification strategies (delayed update or batch reprocessing) should work as expected and be clearly recorded in update logs. Unit test coverage should reach 90% or above, including multi-knowledge base combination queries, keyword modifications, exception handling, and other scenarios.
- Reliability & Consistency: In cases of node failures, database restarts, or retrieval engine exceptions, the system should still ensure correct routing and recovery of query requests, maintaining result consistency. Cross-database fusion modules need to support idempotent operations, avoiding duplicate calculations or information conflicts.
Timeline
- [ ] Planned completion of development by October 15th
3. RAG Horizontal Scaling (Multi-Machine Collaboration) (V0.6)
Background & Objectives
LazyRAG currently performs well in single-machine environments, but enterprise RAG workloads are often characterized by high concurrency, long-running tasks, and cross-regional calls, and a single-machine deployment cannot meet these production-level requirements. Some LazyLLM modules can already scale horizontally (for example, vector retrieval services can be scaled through sharding), but the RAG module still has two clear deficiencies. First, ServerModule's automatic startup mechanism does not yet support load balancing or dynamic scaling in cluster environments, so multi-machine deployments currently require manual intervention. Second, the Processor component in RAG keeps its state in process memory, so tasks cannot be split or shared across nodes, which limits multi-machine collaboration. The goal is to refactor ServerModule's service management logic and Processor's state isolation and persistence so that the RAG module can scale automatically and schedule tasks elastically in multi-machine environments, achieving the "multi-machine collaboration" needed for large-scale knowledge retrieval by internal and external customers.
Technical Solution Design
Based on the existing ServerModule, introduce Ray cluster as the basic execution framework. Using Ray's Actor and Ray Serve functions, encapsulate the RAG module as horizontally scalable service instances, with each Actor able to run on any node in the cluster. Ray Serve provides built-in routing and load balancing capabilities, automatically distributing requests to healthy instances while supporting version control and rolling updates. To achieve elastic scaling, combine with Ray Autoscaler to dynamically increase or decrease Actor numbers based on system load metrics (such as QPS, latency, CPU/GPU utilization), achieving seamless scaling. Additionally, Ray's built-in global state management and task scheduling mechanisms allow each Actor to avoid explicit dependence on external service registration and discovery components, with cluster management and scheduling controlled by Ray, thus simplifying multi-machine collaboration logic and achieving high-availability, low-latency distributed RAG services.
To remove the RAG Processor's dependence on process memory, introduce shared storage and caching. Short-term data is stored in a distributed cache (Redis/Memcached), while long-term data and task state are persisted to a distributed database (PostgreSQL or TiDB). Unique task IDs provide idempotency, so the same task is not computed twice on different nodes. In addition, retrieval result caches are distributed across instances with consistent hashing to reduce hotspots and cross-machine data transfer. The overall architecture keeps nodes stateless, so any request can be assigned to any instance, which is what makes multi-machine collaboration possible.
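The two mechanisms named above, consistent hashing for cache placement and task-ID-based idempotency, can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `ConsistentHashRing` and `IdempotentExecutor` are hypothetical names, and a plain dict stands in for the shared database that would hold task results in a real deployment.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Maps cache keys to instances; removing a node only moves that node's keys."""

    def __init__(self, nodes, replicas: int = 64):
        # Each node gets `replicas` virtual points to spread load evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First virtual point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


class IdempotentExecutor:
    """Runs each task ID at most once; the dict stands in for a shared database."""

    def __init__(self):
        self._results = {}

    def run(self, task_id: str, func, *args):
        if task_id not in self._results:
            self._results[task_id] = func(*args)
        return self._results[task_id]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("query:what is LazyRAG"))
```

Because the ring position of a key depends only on its hash, every instance computes the same owner for a cache entry without coordinating with the others.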
Acceptance Criteria
- Functional Level: ① ServerModule can deploy tasks in multi-machine environments and support dynamic scaling; ② When nodes crash or go offline, the system can automatically remove them and restore services at the routing layer; ③ Processor can correctly store state information in the shared layer, ensuring tasks can be correctly recovered when migrating between different nodes.
- Performance Level: Run stress tests in environments with three or more nodes; throughput should scale close to linearly, reaching more than 2.5 times single-machine QPS, with P95 latency not exceeding 1.2 times the single-machine value.
- Reliability Aspect: During dynamic scaling processes, user requests should not be interrupted or receive error responses; in random node failure injection tests, overall system availability should be ≥ 99.9%.
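The performance criterion above can be written as a small check for a stress-test report. This is a sketch only: the function names are hypothetical, and the nearest-rank method used for P95 is one common convention chosen here as an assumption, since the plan does not specify how the percentile is computed.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def meets_scaling_criteria(single_qps, cluster_qps, single_latencies, cluster_latencies):
    """True if QPS exceeds 2.5x single-machine and P95 stays within 1.2x."""
    qps_ok = cluster_qps > 2.5 * single_qps
    latency_ok = p95(cluster_latencies) <= 1.2 * p95(single_latencies)
    return qps_ok and latency_ok


print(meets_scaling_criteria(100, 260, [100] * 20, [110] * 20))
```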
Timeline
- [ ] Planned completion of development by September 19th
4. Knowledge Graph Integration with At Least 1 Open Source Framework (V0.6)
Background & Objectives
Currently, LazyLLM has no built-in knowledge graph. Its knowledge graph functionality is scattered, has not formed a unified system, and is not usable enough to support enterprise-level knowledge fusion and intelligent retrieval. Because a self-developed knowledge graph would be costly, slow to build, and complex to maintain, the goal is to integrate an existing open source knowledge graph framework (such as LightRAG or a similar framework), prioritizing frameworks with a single function and lightweight dependencies to reduce integration complexity and operational cost.
After integration, the knowledge graph should work closely with the existing RAG modules to improve retrieval and generation quality, and should support dynamic updates (adding entities and relationships, modifying attributes) so that answers stay accurate and traceable as enterprise knowledge changes. This approach makes the knowledge graph practical to use while avoiding reinventing the wheel, which shortens development time.
Technical Solution Design
The technical solution includes four parts: framework selection, integration, data updates, and fusion retrieval.
First, conduct open source framework selection, prioritizing lightweight knowledge graph libraries like LightRAG, requiring API interfaces that support Python calls and embedding into existing RAG processes.
During integration, deploy the knowledge graph as an independent service node with one-click deployment, exposing knowledge query and update interfaces through RPC or HTTP APIs. For dynamic updates, design asynchronous task queues to implement entity and relationship extraction for new documents and synchronously update the knowledge graph, while maintaining version history to support rollback.
For fusion retrieval, user queries first go through the RAG retrieval module to extract candidate documents, then combine with the knowledge graph for relationship reasoning or path expansion, finally returning unified results to ensure accuracy and richness of retrieval results.
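The path expansion step of fusion retrieval can be illustrated with a toy graph. This is a sketch under assumptions, not the API of LightRAG or any other framework: the `GRAPH` dictionary and `expand_entities` function are hypothetical, and the seed entities are assumed to have been extracted from the candidate documents already.

```python
from collections import deque

# Hypothetical toy graph: entity -> list of (relation, neighbor entity).
GRAPH = {
    "LazyRAG": [("part_of", "LazyLLM")],
    "LazyLLM": [("provides", "ServerModule")],
    "ServerModule": [],
}


def expand_entities(seeds, max_depth: int = 2):
    """Breadth-first path expansion from entities found in candidate documents."""
    seen = set(seeds)
    triples = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the configured depth
        for relation, neighbor in GRAPH.get(entity, []):
            triples.append((entity, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return triples


print(expand_entities(["LazyRAG"]))
```

The returned triples can be appended to the retrieved passages as extra context, and each triple doubles as a traceable source for the final answer.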
Acceptance Criteria
- Functionality: The knowledge graph can correctly integrate with the RAG module, supporting dynamic addition, modification, and deletion of entities and relationships, with updated knowledge immediately retrievable.
- Fusion Effects: RAG queries combined with knowledge graphs should improve accuracy or recall compared to pure vector retrieval and return clear, traceable knowledge sources.
Timeline
- [ ] Planned completion of development by November 15th
5. Data Segmentation Strategies ≥20 Types (V0.6)
Background & Objectives
Current LazyLLM's data segmentation strategies are relatively simple, mainly including sentence-based splitting (sentence splitter) and using large models to extract keywords, generate summaries, or build QA pairs. These methods can meet basic needs in simple text scenarios, but for semantically complex documents, structured tables, code files, multimodal content, or long reports, existing strategies have limited effectiveness, easily leading to information loss, imprecise retrieval, or incoherent generated context. Therefore, the goal is to expand to at least 20 data segmentation and transformation strategies, covering high-frequency RAG usage scenarios, including: semantic natural paragraph splitting, topic splitting, code splitting by function/class, table splitting by row/column, multimodal content splitting, summary-enhanced splitting, QA pair generation splitting, etc. Ensure new strategies can be flexibly combined, supporting adaptive splitting for different document types, improving RAG retrieval granularity and precision, enhancing system applicability and stability in multi-scenario contexts.
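One way to keep many strategies combinable is a registry keyed by strategy name, with a dispatcher that picks a strategy per document type. The sketch below is illustrative only: `SPLITTERS`, `register`, and `split` are hypothetical names rather than LazyLLM APIs, and only two of the listed strategies (natural paragraphs and code splitting by function/class) are shown.

```python
import ast

SPLITTERS = {}  # hypothetical registry: strategy name -> splitter function


def register(name):
    def decorator(func):
        SPLITTERS[name] = func
        return func
    return decorator


@register("paragraph")
def split_paragraphs(text: str):
    """Split on blank lines, keeping each natural paragraph as one chunk."""
    return [
        {"text": p.strip(), "meta": {"strategy": "paragraph"}}
        for p in text.split("\n\n")
        if p.strip()
    ]


@register("python_code")
def split_python(source: str):
    """Split Python source into one chunk per top-level function or class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "text": ast.get_source_segment(source, node),
                "meta": {"strategy": "python_code", "name": node.name},
            })
    return chunks


def split(doc_type: str, content: str):
    """Pick a strategy by document type (the mapping is illustrative)."""
    strategy = {"py": "python_code"}.get(doc_type, "paragraph")
    return SPLITTERS[strategy](content)


code = "def a():\n    return 1\n\nclass B:\n    pass\n"
print([chunk["meta"]["name"] for chunk in split("py", code)])
```

Each chunk carries its own metadata, which is what the acceptance criteria below ask for when they require fragments plus corresponding metadata.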
Acceptance Criteria
Implement at least 20 segmentation strategies with flexible combination support for different document types; strategy execution should correctly generate segmentation fragments and corresponding metadata, ensuring information integrity. In terms of coverage, strategies should cover high-frequency RAG scenarios including text, code, tables, PDFs, and multimodal content.
Timeline
- [ ] Planned completion of development by September 19th
6. RAG RootNode Refactoring (V0.6)
Background & Objectives
Currently, LazyLLM's RootNode is inconsistent and depends heavily on the Reader, so RootNode granularity differs from document to document. Some very long documents have only 1 RootNode, while some short documents have many RootNodes.
The goal is to reorganize RootNode so that each document has only one RootNode.
Technical Solution
Turn RootNode into a data structure similar to a "MixDocNode" that stores text, images, tables, and other content in their original order and can be rendered as markdown.
Also consider making the root node hierarchical: a top-level root with node groups below it for blocks (the default group), text, images, tables, and so on, so that the Reader's output is stored in the corresponding node group.
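A minimal sketch of such a structure is shown below. The class and method names (`MixDocNode`, `Block`, `group`, `to_markdown`) are hypothetical and only illustrate the idea of one root per document that keeps mixed content in reading order; the actual design may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str        # "text", "image", or "table"
    content: object  # str for text, path/URL for image, list of rows for table


@dataclass
class MixDocNode:
    """Hypothetical single root node holding a document's blocks in reading order."""
    doc_id: str
    blocks: list = field(default_factory=list)

    def add(self, kind: str, content) -> None:
        self.blocks.append(Block(kind, content))

    def group(self, kind: str):
        """Return the blocks of one kind, mirroring a per-kind node group."""
        return [b for b in self.blocks if b.kind == kind]

    def to_markdown(self) -> str:
        parts = []
        for b in self.blocks:
            if b.kind == "text":
                parts.append(b.content)
            elif b.kind == "image":
                parts.append(f"![image]({b.content})")
            elif b.kind == "table":
                header, *rows = b.content
                lines = [
                    "| " + " | ".join(header) + " |",
                    "| " + " | ".join("---" for _ in header) + " |",
                ]
                lines += ["| " + " | ".join(r) + " |" for r in rows]
                parts.append("\n".join(lines))
        return "\n\n".join(parts)


root = MixDocNode("doc-1")
root.add("text", "Intro")
root.add("table", [["a", "b"], ["1", "2"]])
print(root.to_markdown())
```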
Acceptance Criteria
After document parsing completion, only one RootNode is generated per document, containing full document information.
Timeline
- [ ] Development completion by September 20th
2.1.2 Data Capabilities
1. Table Parsing (V0.6 & V0.7)
Background & Objectives
Currently, LazyLLM's data parsing capabilities focus on text and basic documents and lack parsing and structuring capabilities for tabular data. In real business scenarios, a large amount of knowledge exists as Excel or CSV files or as tables embedded in Word/PDF documents. If these tables cannot be parsed efficiently and correctly, that knowledge is lost or becomes hard to retrieve. The goal is to provide unified table parsing that extracts table content from different sources (Office documents, PDFs, image recognition) into structured JSON/database formats for subsequent indexing, retrieval, and generation tasks.
Technical Solution Design
- Input Format Support:
- Excel/CSV → Direct reading (pandas/openpyxl)
- Word tables (docx) → python-docx extraction
- PDF tables → MinerU reading
- Image tables → OCR (paddleocr, tesseract) + table structure recognition (TableMaster, DeepDeSRT)
- Table Structuring:
- Standardized output to JSON or DataFrame structured formats
- Support for merged cells, multi-headers, multi-sheet parsing
- Preserve row-column coordinates and original format information (e.g., colors, bold for emphasis)
- Semantic Enhancement:
- Large model-based table semantic completion (e.g., identifying hidden headers, aligning context)
- Association between tables and contextual text (table titles, containing paragraphs)
- Brief descriptions of table structures and feature information extraction
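The structuring step can be sketched with the standard library alone. This is an illustration under assumptions: `parse_table` is a hypothetical helper, and forward-filling empty cells in selected columns is only a simple stand-in for restoring vertically merged cells; real Excel or PDF sources would need format-specific handling.

```python
import csv
import io
import json


def parse_table(csv_text: str, fill_merged_columns=(0,)):
    """Turn CSV text into cell records that keep row/column coordinates.

    Empty cells in `fill_merged_columns` inherit the value above them, a simple
    stand-in for vertically merged cells (an assumption of this sketch).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    last_seen = {}
    cells = []
    for r, row in enumerate(body, start=1):
        for c, value in enumerate(row):
            if c in fill_merged_columns:
                if value == "":
                    value = last_seen.get(c, "")
                else:
                    last_seen[c] = value
            cells.append({"row": r, "col": c, "header": header[c], "value": value})
    return {"header": header, "cells": cells}


sample = "region,product,sales\nEast,A,10\n,B,20\nWest,A,5\n"
print(json.dumps(parse_table(sample)["cells"][3], ensure_ascii=False))
```

Keeping the coordinates on every cell lets later stages cite the exact row and column an answer came from.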
Acceptance Criteria
- Support parsing of at least 4 table sources: Excel, CSV, Word, PDF
- Support restoration of merged cells, multi-headers, multi-sheets
- Support semantic binding between tables and context
Timeline
- V0.6: Implement Excel/CSV/Word/PDF parsing, complete structured extraction
- V0.7: Implement OCR table parsing; enhance semantic completion, Text-to-SQL, and Code Interpreter
2. CAD Image Parsing (V0.7)
Background & Objectives
Industries such as engineering, manufacturing, and construction extensively use CAD drawings as primary knowledge carriers. Existing LazyLLM lacks parsing capabilities for CAD images (such as .dwg, .dxf, exported PDF/PNG), limiting RAG applications in these fields. The goal is to provide CAD drawing parsing capabilities, including text recognition, symbol recognition, structural relationship extraction, and conversion to indexable knowledge fragments supporting queries and Q&A.
Technical Solution Design
- Input Format Support:
- CAD native formats (DWG, DXF) → Parse DXF through libraries like ezdxf; DWG files need to be converted to DXF first, since ezdxf itself reads DXF only
- CAD exported PDF/images → OCR + structure recognition
- Content Extraction:
- Text annotations and dimension information in drawings → OCR extraction
- Graphic elements (lines, circles, block references, etc.) → CAD API parsing
- Symbol libraries (electrical symbols, architectural symbols, etc.) → Rules + model recognition
- Semantic Enhancement:
- Large model-based drawing semantic summarization (e.g., extracting "This drawing is a floor plan of a certain level, containing xx rooms, with pipeline routing as xx")
- Establish relationship graphs between graphic elements (similar to mini-knowledge graph)
- RAG Integration:
- Convert CAD drawing content to text/knowledge graph nodes
- Support natural language-based retrieval (e.g., "find all air duct diameters on basement level 2")
- Support dynamic CAD knowledge base updates (incremental import)
Acceptance Criteria
- Support at least three CAD input types: DWG/DXF/PDF
- Support extraction of text annotations, dimensions, and symbols from drawings
- Support structured result output (JSON/Graph)
- Support Q&A based on CAD content in RAG
Timeline
- Planned to start support in V0.7
2.1.3 Algorithm Capabilities
1. Structured Text (CSV etc.) Q&A (V0.6)
Background & Objectives
Currently, LazyLLM supports retrieval-augmented Q&A for unstructured text (documents, PDFs, etc.), but structured data (such as CSV, Excel, database tables) can only be processed through the same RAG pipeline, so users cannot freely analyze and query the tables themselves.
In actual user scenarios (financial reports, user behavior logs, experimental result tables, sales records, etc.), structured data Q&A demand is extremely high.
Goal: Support users uploading CSV/Excel for natural language Q&A, returning results (including aggregation calculations, filtering, sorting, visualization).
Technical Solution Design
No design yet
Acceptance Criteria
- Users can upload CSV/Excel and perform Q&A
- System can automatically identify fields and generate correct answers based on user questions (accuracy ≥ 70%)
- Support multi-turn follow-up questions (e.g., "take the top 10 from the previous result")
- Return results can choose table/chart formats
Timeline
- V0.6: Implement CSV Q&A (Polars + NL2SQL)
- V0.7: Support complex aggregation, cross-table joins, visualization
2. Multi-Hop Retrieval (V0.6)
Background & Objectives
In existing RAG systems, most retrieval processes are single-hop: user question → retrieve similar documents → generate answer. But in practical applications, many questions require spanning multiple content segments for complete answers, such as:
- Starting from a certain passage, finding its references, then finding related evidence from references, finally assembling a complete answer
- In technical documentation, API descriptions may reference another module, then further point to implementation details, requiring layer-by-layer navigation
- In academic paper reviews, need to trace from original conclusions to experimental data, then combine with methodology sections for comprehensive explanation
The goal is to build a multi-hop retrieval mechanism that doesn't depend on knowledge graphs, automatically expanding retrieval scope along "reference/association clues" until integrating sufficient context, supporting complex Q&A and reasoning.
Technical Solution Design
[This solution was drafted by a large model; the strategy actually implemented takes precedence.]
- Hop Retrieval Framework
- Based on existing retrieval (BM25 / dense retriever), extract possible association clues (such as references, context hints, keywords, external links) from initially retrieved passages
- Combine with LLM to determine whether secondary retrieval is needed
- Support recursive retrieval until reaching maximum hops or confidence convergence
- Clue Extraction Mechanism
- Reference detection: Detect referential statements like "as shown in figure", "see table X", "reference [12]"
- Entity association: Use Named Entity Recognition (NER) or semantic clustering to discover entities strongly related to original questions but not covered
- Causal/upstream-downstream hints: Identify logical hints like "based on", "therefore", "furthermore" to expand context
- Jumping Strategies
- Forward expansion: Starting from passages, follow references or links to target passages
- Backward tracing: Trace original sources based on reference numbers or keywords
- Lateral expansion: Retrieve similar entities or same-topic documents to complete knowledge
- Result Integration
- Use chain-of-thought summarization to merge multi-hop retrieval results into coherent answers
- Introduce answer traceability: List source passages for each retrieval step in answers, enhancing interpretability
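The forward-expansion loop with a hop limit can be sketched as follows. This is a simplified illustration: the corpus, the `see [id]` reference pattern, and the `multi_hop` function are hypothetical, the initial hits are assumed to come from the existing retriever, and the LLM decision and confidence check are replaced by a fixed maximum hop count.

```python
import re

# Hypothetical corpus keyed by passage id; "see [id]" marks a reference clue.
CORPUS = {
    "api": "The upload API delegates to the storage module, see [storage].",
    "storage": "Storage writes chunks via the block layer, see [block].",
    "block": "The block layer flushes to disk every 5 seconds.",
}
REFERENCE = re.compile(r"see \[(\w+)\]")


def multi_hop(start_ids, max_hops: int = 2):
    """Follow reference clues from the initial hits, recording the hop of each passage."""
    trace = {pid: 0 for pid in start_ids}
    frontier = list(start_ids)
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for pid in frontier:
            for target in REFERENCE.findall(CORPUS[pid]):
                if target in CORPUS and target not in trace:
                    trace[target] = hop
                    next_frontier.append(target)
        if not next_frontier:
            break  # no new clues: retrieval has converged
        frontier = next_frontier
    return trace


print(multi_hop(["api"]))
```

The returned mapping from passage id to hop number is the raw material for answer traceability, since it records how each passage was reached.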
Acceptance Criteria
- Significantly improve correctness of complex Q&A in scenarios like academic reviews, technical documentation, FAQ knowledge bases
- Retrieval process configurable with maximum hops and confidence thresholds to avoid unbounded expansion
- Output answers include multi-source content integration with traceability (indicating association chains)
3. Information Conflict Handling (V0.7)
Background & Objectives
Different knowledge bases or data sources may contain conflicting information (such as multiple contract versions, statistical data from different sources, laws and regulations from different times and administrative levels). Current RAG systems usually directly concatenate retrieval results, potentially causing models to "fabricate" answers.
Goal: Introduce information conflict detection and handling mechanisms, enabling systems to clearly indicate conflicts and provide alternative answers.
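A minimal sketch of the detection step is given below. It assumes that retrieved passages have already been reduced to structured facts, which is itself a hard problem not covered here; the `detect_conflicts` function and the fact fields (`subject`, `attribute`, `value`, `source`) are hypothetical.

```python
from collections import defaultdict


def detect_conflicts(facts):
    """Group retrieved facts by (subject, attribute) and report disagreeing values.

    Each fact is a hypothetical dict with subject, attribute, value, and source.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for fact in facts:
        key = (fact["subject"], fact["attribute"])
        grouped[key][fact["value"]].append(fact["source"])
    # Keep only the keys for which more than one distinct value was seen.
    return {key: dict(values) for key, values in grouped.items() if len(values) > 1}


facts = [
    {"subject": "contract", "attribute": "term", "value": "12 months", "source": "kb_a"},
    {"subject": "contract", "attribute": "term", "value": "24 months", "source": "kb_b"},
    {"subject": "contract", "attribute": "party", "value": "ACME", "source": "kb_a"},
]
print(detect_conflicts(facts))
```

Returning every competing value with its sources lets the answer present the alternatives explicitly instead of silently merging them.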
Acceptance Criteria
System can identify and correctly handle conflicts rather than defaulting to merging.
Timeline
- V0.7 development completion
4. AgenticRL & Code Problem-Solving Toolchain Support (V0.7)
Background & Objectives
In code Q&A and algorithmic problem solving, single-pass generation often has a high error rate. AgenticRL (reinforcement learning-driven agents) is introduced to improve solutions through multiple attempts and environmental feedback.
Goal: Support automatic code problem solving (such as LeetCode/Codeforces subsets) with execution and repair in environments.
Acceptance Criteria
Support solving the first 100 problems in LeetCode
Timeline
- V0.7 development completion
2.2 Functional Modules
1. Memory Capabilities (V0.6)
Background & Objectives
Long-term memory for large models should be handled by an independent backend system; the algorithm framework itself does not provide built-in storage or retrieval. In practice, however, the framework can provide auxiliary tools and strategies that help algorithm modules generate, update, and manage memory in the backend, for example pluggable memory interfaces, caching strategies, mechanisms that automatically trigger memory updates, and cross-session context backtracking.
The goal is to provide "lightweight memory enhancement" for developers without changing core architectural concepts, enabling applications to maintain consistency in conversations, understand historical behavior, and assist users in completing long-term tasks.
Testing Plan
- Construct multi-turn long conversation scenarios to verify memory backtracking consistency
- Stress test memory generation and update latency under high concurrency
- Verify memory integrity and correct recovery after power outages/process restarts
- Set up simulated task chains to observe whether models can correctly rely on historical memory to complete goals
Timeline
- [ ] Completion before November 15th
2. Distributed Launcher (V0.7)
Background & Objectives
Currently, LazyLLM's startup process information is only stored in memory, with strong binding between parent and child processes. Once the parent process exits, child processes are also killed. This mode limits horizontal scaling capabilities and makes stable operation in distributed scenarios difficult.
The goal is to persist process information to a database and decouple parent and child processes, so that parent processes can scale elastically and be scheduled horizontally. A reclamation mechanism must also be designed so that child processes are cleaned up, rather than left behind as zombie processes, when all main processes exit, keeping the overall system stable.
Testing Plan
- In a distributed startup, simulate the parent process that launched a child process exiting first, and verify that the child process is correctly retained
- In the same scenario, simulate all main processes exiting after that parent has exited, and verify that the child process is correctly cleaned up
Timeline
- Planned V0.7 completion
3. Database Globals Support (V0.6)
Background & Objectives
Existing Globals modules completely rely on memory storage, unable to meet distributed environment scaling needs. As application complexity increases, Globals needs to flexibly choose backends, including memory, Redis, SQL, and other different storage options. This not only adapts to different scales and deployment environments but also enhances system disaster recovery capabilities.
The goal is to implement a unified Globals abstraction layer, enabling developers to seamlessly switch between different storage solutions while ensuring data consistency and high availability.
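A unified abstraction layer of this kind might look like the sketch below. The class names (`GlobalsBackend`, `MemoryBackend`, `SqlBackend`) and the `make_backend` factory are hypothetical, not existing LazyLLM APIs; sqlite3 stands in for a production SQL database, and a Redis backend would implement the same two methods.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class GlobalsBackend(ABC):
    @abstractmethod
    def get(self, key, default=None): ...

    @abstractmethod
    def set(self, key, value): ...


class MemoryBackend(GlobalsBackend):
    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class SqlBackend(GlobalsBackend):
    """Values are JSON-encoded; sqlite3 stands in for a production SQL database."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS globals (key TEXT PRIMARY KEY, value TEXT)"
        )

    def get(self, key, default=None):
        row = self._conn.execute(
            "SELECT value FROM globals WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else json.loads(row[0])

    def set(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO globals (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._conn.commit()


def make_backend(kind: str) -> GlobalsBackend:
    return {"memory": MemoryBackend, "sql": SqlBackend}[kind]()


backend = make_backend("sql")
backend.set("session", {"user": "u1"})
print(backend.get("session"))
```

Application code depends only on `get` and `set`, so switching storage is a configuration change rather than a code change.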
Testing Plan
Verify correctness of memory, Redis, and SQL modes in single-machine mode.
Timeline
- [ ] Planned completion of development before October 30th
4. ServerModule → MCP Service (V0.7)
Background & Objectives
Currently, ServerModule relies on FastAPI to start services, but in distributed scenarios, it lacks fault tolerance and multi-instance load balancing capabilities. The goal is to transform it into MCP (Model Context Protocol) services to unify protocols, lower access barriers, and provide natural multi-instance scheduling and disaster recovery capabilities. Combined with achievements from RAG horizontal scaling (multi-machine collaboration) to support fault tolerance and load balancing, this can significantly improve framework stability and maintainability in production environments.
Testing Plan
Test whether arbitrary functions can be packaged into MCP services and directly called by agents
Timeline
- V0.7 version development
5. Mini Sandbox & Online Sandbox Service Integration (V0.7)
Background & Objectives
During reinforcement learning (RL) or task execution, user-written or model-generated code often needs to be executed dynamically. Relying solely on local sandboxes makes it difficult to meet security and scalability requirements.
Goals:
- Add Mini sandbox capabilities, allowing users to deploy small sandbox services locally
- Integrate online sandbox services, enabling code execution through remote isolated environments
This avoids contamination or security risks to main processes while supporting resource limits, timeout controls, and log tracking for better monitoring and optimization of model training and inference processes.
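The timeout control and process separation mentioned above can be illustrated with a very small local runner. This is only a sketch: `run_in_sandbox` is a hypothetical helper, and a separate Python process protects the main process from crashes and hangs but is not a security boundary, so a real Mini sandbox would add stronger isolation (containers, resource limits, or a remote service).

```python
import subprocess
import sys


def run_in_sandbox(code: str, timeout_s: float = 5.0) -> dict:
    """Run code in a separate Python process with a wall-clock timeout.

    This isolates crashes and hangs from the main process, but it is not a
    security boundary on its own.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}


print(run_in_sandbox("print(1 + 1)"))
```

The captured stdout and stderr are the hook for log tracking, and the status field gives RL training loops a simple signal for reward or retry logic.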
Testing Plan
For Mini sandbox and online sandbox services, verify:
- Basic correctness: Execute simple code and compare output consistency
- Security: Simulate malicious code, verify sandbox isolation effects
- Concurrency: Simultaneously initiate large-scale execution requests, verify stability
- Monitoring: Check whether logs and metrics are completely collected
Timeline
- Planned V0.7 completion
2.3 Model Training & Inference
1. OpenAI Interface Deployment and Inference Support (V0.6)
Background & Objectives
In LazyLLM's initial design, the inference framework mainly referenced vLLM or TGI (Text Generation Inference) interface formats, which met most open source large model inference needs in early stages. However, with industry evolution, OpenAI interfaces have gradually become de facto standards, with many ecosystem tools and applications built around OpenAI's API format. Current LazyLLM already has capabilities to use OpenAI format.
Goal: Allow users to choose flexibly between TGI and OpenAI interfaces at deployment time, and ensure that services deployed with the OpenAI interface connect seamlessly to the online model modules. This keeps compatibility with open source frameworks while letting users migrate existing applications easily. Users get a consistent calling experience regardless of which inference interface they choose.
Testing Plan
Unit tests to verify that Trainable services deployed in OpenAI format can connect using Trainable and OnlineChatModule
Timeline
- [ ] Completion by October 30th
2. Prompt Repository Integration (2-3 repositories) (V0.6)
Background & Objectives
Currently, LazyLLM has only a handful of hand-written prompts in a few core modules. This approach is flexible but lacks systematicity and reusability, and it often leaves users to design and maintain prompts themselves, which raises onboarding costs. As prompt engineering matures, multiple open source prompt repositories have emerged in the community, accumulating many verified high-quality templates. LazyLLM therefore needs to integrate 2-3 mainstream prompt repositories into the framework, so that users can call prompts for common scenarios directly through convenient interfaces and avoid repetitive work.
The goal is to provide ready-to-use prompt modules while ensuring flexibility, reducing user design costs and improving the system's "out-of-the-box" experience.
Testing Plan
Unit tests to verify that selected prompts can be correctly used for model inference
Timeline
- V0.6 development to be completed by September 30th
3. Intelligent Model Type Detection + auto-finetune Refactoring (V0.6)
Background & Objectives
Currently, LazyLLM's model type detection relies on simple matching of model names, which lacks intelligence, has unclear rules, and scales poorly. For example, models with similar names but different capabilities are easily misclassified. Meanwhile, the auto-finetune module's framework selection logic is overly complex: the early solution chose a framework by pre-calculating the performance of each candidate, which, while accurate, was too cumbersome.
We hope to refactor in the new version: model type detection needs to be more intelligent, such as automatic identification based on model weight information or structure, with users also able to explicitly specify categories; auto-finetune should simplify logic, changing to priority sorting (framework priority + card allocation strategy), thus ensuring higher availability and scalability.
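Structure-based detection with an explicit override could look roughly like the sketch below. The suffix rules and category names are illustrative assumptions, not LazyLLM's actual rules; the only external convention relied on is that Hugging Face style `config.json` files list model classes under `architectures`.

```python
def detect_model_type(config: dict, explicit_type=None) -> str:
    """Guess a model category from a Hugging Face style config dict.

    The suffix rules below are illustrative assumptions, not LazyLLM's rules.
    An explicitly passed type always wins.
    """
    if explicit_type is not None:
        return explicit_type
    architectures = config.get("architectures") or []
    for arch in architectures:
        if arch.endswith("ForCausalLM"):
            return "llm"
        if arch.endswith("ForSequenceClassification"):
            return "rerank"
        if arch.endswith("ForConditionalGeneration") and "vision_config" in config:
            return "vlm"
    # Bare encoder classes (e.g. "BertModel") are treated as embedding models here.
    return "embed" if architectures else "unknown"


print(detect_model_type({"architectures": ["LlamaForCausalLM"]}))
```

Checking the declared architecture avoids the name-matching failure mode described above, where two models with similar names but different capabilities get the same category.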
Testing Plan
- Use various models (large models, Embedding, Rerank, VL, etc.) to verify accuracy of intelligent detection
- Test whether the system can correctly override default logic when users explicitly pass in model categories
- Test whether auto-finetune can reasonably allocate memory and cards under different hardware resources
- Verify whether finetune framework priority mechanisms can cover common usage scenarios
Timeline
- V0.6 development to be completed by September 30th
4. Unified Fine-tuning and Inference Prompts with Fine-tuning Examples (V0.7)
Background & Objectives
In the current version, LazyLLM's fine-tuning and inference stages use different prompt systems, causing fine-tuned models to be unable to directly apply to inference, requiring additional format adaptation and increasing users' engineering burden. To solve this problem, we need to unify fine-tuning and inference prompts, establishing a universal prompt system. This can ensure fine-tuned models are ready to use out of the box while reducing development and maintenance costs, improving overall framework scalability and consistency. The ultimate goal is: users only need to maintain one set of prompts to simultaneously cover fine-tuning and inference stages, improving full-process coherence.
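The "one set of prompts" idea can be shown with a single template that produces both the inference prompt and the fine-tuning sample. This is a sketch under assumptions: the template text, the function names, and the `</s>` end-of-sequence marker are hypothetical placeholders rather than LazyLLM's prompt format.

```python
# Hypothetical shared template; the exact wording is a placeholder.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def render_inference(instruction: str) -> str:
    """Prompt sent to the model at inference time."""
    return TEMPLATE.format(instruction=instruction)


def render_training(instruction: str, answer: str, eos: str = "</s>") -> dict:
    """Fine-tuning sample built from the same template as inference."""
    return {"prompt": render_inference(instruction), "completion": answer + eos}


sample = render_training("Translate 'hello' to French.", "bonjour")
print(sample["prompt"] == render_inference("Translate 'hello' to French."))
```

Because the training prompt is produced by calling the inference renderer, the two stages cannot drift apart when the template changes.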
Testing Plan
Unit tests to verify whether TrainableModule's fine-tuning and inference can uniformly use the same set of prompts
Timeline
- V0.7 version development completion
5. GRPO Full Pipeline Support (V0.7)
Background & Objectives
GRPO (Group Relative Policy Optimization) is an emerging reinforcement learning training paradigm that performs well in large model alignment and adaptive optimization. To keep LazyLLM at the forefront, the framework needs to provide full-pipeline GRPO support, covering key modules such as Reward, Policy, and Value, and allowing user-defined functions. This lets users get started with GRPO experiments quickly and also customize optimization for specific domains. For example, research users can customize Reward functions to evaluate whether generated results meet academic standards, while industrial users can optimize generation speed and cost through custom Policies. The ultimate goal is for LazyLLM to provide an end-to-end GRPO training and inference solution that is flexible while remaining practical to engineer.
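The core of GRPO is that each sampled response is scored relative to the other responses for the same prompt, so the baseline comes from the group rather than from a separately trained value model. A minimal sketch of that step with a user-defined reward follows; the function names are hypothetical, and the use of the population standard deviation with a small epsilon is an assumption, since implementations differ on this detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps: float = 1e-8):
    """Normalize each sampled response's reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def exact_match_reward(response: str, reference: str) -> float:
    """Hypothetical user-defined reward: 1.0 for an exact match, else 0.0."""
    return 1.0 if response.strip() == reference.strip() else 0.0


responses = ["42", "41", "42 ", "7"]
rewards = [exact_match_reward(r, "42") for r in responses]
print(group_relative_advantages(rewards))
```

Responses that beat the group average get positive advantages and the rest get negative ones; when every response in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.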
Testing Plan
Verify process correctness, training convergence, and performance improvement on small-scale models.
Dependencies
Sandbox
Timeline
- V0.7 version development completion
2.4 Documentation
- API documentation improvement, all non-private interfaces and classes have Google-style API documentation (V0.6)
- CookBook documentation improvement (50 cases + comparison with mainstream open source frameworks) (V0.6)
- Environment documentation improvement (installation methods + package strategies) (V0.6)
- Learn documentation improvement (learning path for LazyLLM) (V0.6)
2.5 Quality
CI Time Optimization (V0.6)
- In routine test runs, mock model deployment and model outputs to reduce CI time to 10 minutes
- Set up daily and weekly builds: move time-consuming tests to the daily build and run the full test suite in the weekly build
2.6 Development, Deployment & Release
- Environment isolation and automatic construction (V0.6)
- Debug optimization (V0.7)
- Process monitoring (output + performance) (V0.7)
2.7 Ecosystem
- LazyCraft open source (V0.6)
- LazyRAG open source (V0.6)