
CLI

LazyLLM provides a command-line interface (CLI) for deploying models, installing dependencies, and running services. This document introduces the core commands of the lazyllm.cli module, with usage examples.


lazyllm deploy

Deploys a model or starts an MCP (Model Context Protocol) server, depending on the arguments passed.

Function 1: Start MCP Server

When the first argument is mcp_server, the command launches an MCP server and supports environment variable injection, SSE server port configuration, and other options.

# Start MCP service using uvx and mcp-server-fetch
lazyllm deploy mcp_server uvx mcp-server-fetch

Args:

  • mcp_server: triggers MCP server deployment mode.
  • uvx: the command used to launch the MCP server (e.g., node, npm, npx, or any other executable).
  • mcp-server-fetch: the package or module that provides the MCP server.

# Start MCP server with environment variables and SSE port configured
lazyllm deploy mcp_server -e GITHUB_TOKEN your_token --sse-port 8080 npx -- -y @modelcontextprotocol/server-github

Args:

  • mcp_server: run in MCP server mode.
  • -e GITHUB_TOKEN your_token: sets an environment variable for the MCP server (can be repeated); here it sets GITHUB_TOKEN.
  • --sse-port 8080: makes the SSE server listen on port 8080.
  • --: arguments after it are passed to the external command (here, npx) instead of being parsed by lazyllm.
  • -y @modelcontextprotocol/server-github: arguments forwarded to npx; -y auto-confirms npx's package-install prompt, and @modelcontextprotocol/server-github is the MCP server package to run.

Optional parameters:

  • --sse-host: listening address of the SSE server; defaults to 127.0.0.1.
  • --allow-origin: origins allowed for CORS; multiple values can be given.
  • --pass-environment: whether to pass all local environment variables to the MCP server (default: false).
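When the command is launched from a Python script, the argument order matters: lazyllm's own options come first, then the external command, then `--` and the arguments meant for that command. The sketch below assembles such an argv list using only options that appear above (`-e`, `--sse-port`, and `--sse-host`, assumed to be spelled like `--sse-port`). The helper `build_mcp_deploy_argv` is hypothetical, and inserting `--` only when a forwarded argument starts with a dash is an assumption inferred from the two examples, not documented behavior.

```python
import shlex
from typing import Dict, List, Optional, Sequence


def build_mcp_deploy_argv(
    command: str,
    command_args: Sequence[str] = (),
    env: Optional[Dict[str, str]] = None,
    sse_port: Optional[int] = None,
    sse_host: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build argv for `lazyllm deploy mcp_server ...`."""
    argv = ["lazyllm", "deploy", "mcp_server"]
    # Each environment variable becomes its own `-e KEY VALUE` triple.
    for key, value in (env or {}).items():
        argv += ["-e", key, value]
    if sse_port is not None:
        argv += ["--sse-port", str(sse_port)]
    if sse_host is not None:
        argv += ["--sse-host", sse_host]
    # The external command (uvx, npx, ...) follows lazyllm's own options.
    argv.append(command)
    # Assumption: add `--` only when a forwarded argument looks like an option,
    # so lazyllm does not try to parse it (mirrors `npx -- -y ...` above).
    if any(arg.startswith("-") for arg in command_args):
        argv.append("--")
    argv.extend(command_args)
    return argv


if __name__ == "__main__":
    github = build_mcp_deploy_argv(
        "npx",
        ["-y", "@modelcontextprotocol/server-github"],
        env={"GITHUB_TOKEN": "your_token"},
        sse_port=8080,
    )
    # shlex.join quotes any token that needs quoting before printing.
    print(shlex.join(github))
    # lazyllm deploy mcp_server -e GITHUB_TOKEN your_token --sse-port 8080 npx -- -y @modelcontextprotocol/server-github
    # To launch it for real: subprocess.run(github, check=True)
```

Returning a list rather than a single string lets the result go straight to `subprocess.run` without shell quoting issues.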

Function 2: Model Deployment

If the first argument is not mcp_server, the command runs in model deployment mode, which supports multiple frameworks (e.g., vllm, lightllm) and an optional web chat interface.

# Deploy LLaMA3 model using vllm and enable chat mode
lazyllm deploy llama3-chat --framework vllm --chat=true --top_p=0.9 --max_tokens=2048

Args:

  • llama3-chat: the name of the model to deploy.
  • --framework vllm: the deployment framework. Supported values:
    • vllm: high-performance inference engine.
    • lightllm: lightweight model deployment.
    • lmdeploy, infinity, embedding, mindie: other specialized frameworks.
    • auto: automatically detect and select a recommended framework.
  • --chat=true: enable the web chat service. --chat=1 and --chat=on are equivalent.
  • --top_p=0.9: nucleus-sampling (top-p) probability threshold used during inference.
  • --max_tokens=2048: maximum number of tokens to generate.

Additional notes:

  • Other inference options supported by the chosen framework can be passed in the same --key=value form.
  • If --chat=true is not set, the model is deployed as a background service without the web chat interface.
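The `--key=value` handling in the example above can be pictured with a small parser. This is a hypothetical sketch, not LazyLLM's implementation. It covers only `--key=value` tokens (not space-separated options such as `--framework vllm`), treats `chat` as the only boolean key, accepts the true/1/on spellings listed above (the matching false/0/off spellings are an assumption), and converts other values to int or float when possible.

```python
from typing import Dict, FrozenSet, List, Union

Value = Union[bool, int, float, str]

# Spellings listed in the text for chat=true; the "false" set is an assumption.
TRUE_WORDS = {"true", "1", "on"}
FALSE_WORDS = {"false", "0", "off"}


def parse_bool(raw: str) -> bool:
    lowered = raw.strip().lower()
    if lowered in TRUE_WORDS:
        return True
    if lowered in FALSE_WORDS:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")


def parse_scalar(raw: str) -> Value:
    # Try int first so "2048" stays an int, then float, else keep the string.
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    return raw


def parse_extra_args(tokens: List[str],
                     bool_keys: FrozenSet[str] = frozenset({"chat"})) -> Dict[str, Value]:
    """Hypothetical: turn ['--top_p=0.9', ...] into {'top_p': 0.9, ...}."""
    options: Dict[str, Value] = {}
    for token in tokens:
        if not token.startswith("--") or "=" not in token:
            raise ValueError(f"expected --key=value, got {token!r}")
        key, _, raw = token[2:].partition("=")
        options[key] = parse_bool(raw) if key in bool_keys else parse_scalar(raw)
    return options


if __name__ == "__main__":
    print(parse_extra_args(["--chat=on", "--top_p=0.9", "--max_tokens=2048"]))
    # {'chat': True, 'top_p': 0.9, 'max_tokens': 2048}
```

Restricting boolean parsing to known keys avoids turning a numeric value such as `--max_tokens=1` into `True`.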

lazyllm install

Installs optional feature groups (extras) or individual third-party Python packages.

You can install:

  • Predefined component groups (e.g., embedding, chat, finetune)
  • Specific Python packages (e.g., openai, transformers)

Installation automatically handles version dependencies and compatibility, for example by adapting the flash-attn installation to the installed PyTorch version.

Function 1: Install Component Groups

# Install embedding and chat component groups
lazyllm install embedding chat

Args:

  • embedding, chat: predefined feature groups for embedding models and chat-related functions.

Function 2: Install Third-party Python Packages

# Install specific third-party Python packages
lazyllm install openai sentence-transformers

Args:

  • openai, sentence-transformers: ordinary Python package names; here, the client for calling OpenAI APIs and a library for loading embedding (vector) models.
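Because the same command accepts both group names and package names, one way to picture the dispatch is to split the arguments into two lists. In this hypothetical sketch the group set holds only the names mentioned above (embedding, chat, finetune), not the full list LazyLLM defines, and treating every unknown name as a third-party package is an assumption rather than documented logic. The helper `split_install_targets` is illustrative.

```python
from typing import Iterable, List, Tuple

# Illustrative subset: only the group names mentioned in the text above.
KNOWN_GROUPS = frozenset({"embedding", "chat", "finetune"})


def split_install_targets(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: separate extras-group names from package names."""
    groups: List[str] = []
    packages: List[str] = []
    for name in names:
        if name in KNOWN_GROUPS:
            groups.append(name)
        else:
            # Assumption: any name that is not a known group is a package.
            packages.append(name)
    return groups, packages


if __name__ == "__main__":
    print(split_install_targets(["embedding", "chat"]))
    # (['embedding', 'chat'], [])
    print(split_install_targets(["openai", "sentence-transformers"]))
    # ([], ['openai', 'sentence-transformers'])
```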

lazyllm run

Starts a service or runs a workflow, depending on the subcommand.

Function 1: Start Chatbot Service

lazyllm run chatbot --model chatglm3-6b --framework vllm

Args:

  • chatbot: starts the chatbot service.
  • --model: name of the model to use, e.g., chatglm3-6b.
  • --framework: backend inference framework; one of lightllm, vllm, or lmdeploy.

Function 2: Start RAG QA Service

lazyllm run rag --model bge-base --framework lightllm --documents /path/to/docs

Args:

  • rag: starts a retrieval-augmented generation (RAG) question answering system.
  • --model: name of the model to use, e.g., bge-base.
  • --framework: backend inference framework.
  • --documents: required; absolute path to the knowledge documents.

Function 3: Run JSON-based Workflow

lazyllm run workflow.json

Args:

  • workflow.json: path to a JSON file that defines the workflow (computational graph) to run.

Function 4: Start Training Service

lazyllm run training_service

Args:

  • training_service: starts the model training service; no additional parameters required.

Function 5: Start Inference Service

lazyllm run infer_service

Args:

  • infer_service: starts the model inference service; no additional parameters required.

❗ Note: For chatbot and rag, the --source and --framework options are mutually exclusive, and each must be one of the predefined values. Invalid commands or parameters produce an error message.
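The rules for chatbot and rag can be expressed with Python's argparse, which rejects conflicting or invalid options with an error message and exit status 2. This is a hypothetical sketch, not LazyLLM's parser. `--source` is given no fixed choices because its predefined values are not listed in this document, rag is assumed to accept the same frameworks listed for chatbot, and the absolute-path check follows the stated requirement for `--documents`.

```python
import argparse
from pathlib import Path
from typing import List

# Frameworks listed for `lazyllm run chatbot`; assumed to apply to rag as well.
FRAMEWORKS = ("lightllm", "vllm", "lmdeploy")


def build_run_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="lazyllm run")
    commands = parser.add_subparsers(dest="command", required=True)
    for name in ("chatbot", "rag"):
        sub = commands.add_parser(name)
        sub.add_argument("--model")
        # --source and --framework may not be combined.
        backend = sub.add_mutually_exclusive_group()
        backend.add_argument("--source")  # hypothetical: allowed values not listed here
        backend.add_argument("--framework", choices=FRAMEWORKS)
        if name == "rag":
            sub.add_argument("--documents", required=True)
    return parser


def parse_run_args(argv: List[str]) -> argparse.Namespace:
    parser = build_run_parser()
    args = parser.parse_args(argv)
    if args.command == "rag" and not Path(args.documents).is_absolute():
        # parser.error prints usage and exits with status 2, like other argparse errors.
        parser.error("--documents must be an absolute path")
    return args


if __name__ == "__main__":
    print(parse_run_args(["rag", "--model", "bge-base", "--framework", "lightllm",
                          "--documents", "/path/to/docs"]))
    try:
        # Conflicting options are rejected before any service would start.
        parse_run_args(["chatbot", "--source", "some_source", "--framework", "vllm"])
    except SystemExit as exc:
        print("rejected, exit status", exc.code)
```

Declaring the constraint with `add_mutually_exclusive_group` keeps the conflict check and its error message in one place instead of scattering manual `if` checks.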