docs: update README for LLaMA-Factory training example

2026-06-17 01:55:50 +00:00 · 2025-06-30 20:21:50 +08:00 · 2025-06-30 20:21:50 +08:00 · 6553713e41
commit 6553713e41
parent 41bd69aae0
1 changed files with 104 additions and 58 deletions
--- a/train/llama_factory_llm4decompile/README.md
+++ b/train/llama_factory_llm4decompile/README.md
@@ -1,72 +1,118 @@
 # LLaMA-Factory Training Example

-## Environment Setup
+This guide explains how to train a model with LLaMA-Factory.

-### Install dependencies:
+## 1. Environment Setup
+
+First, ensure your environment is set up correctly.

 ```bash
+# Clone the example repository
 git clone https://github.com/7Sageer/llama-factory-example
-git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
+cd llama-factory-example
+
+# Clone LLaMA-Factory as a dependency
+# We place it in the parent directory to keep the project structure clean
+git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git ../LLaMA-Factory
+
+# Install the required packages
 pip install -r requirements.txt
 ```

-### Prepare dataset:
+## 2. Prepare Data and Models

-Place your dataset files in the `data` directory, or specify the dataset path using environment variables.
+  - **Dataset**: Place your dataset files (e.g., `.json` files) and `dataset_info.json` into the `data` directory.
+  - **Pre-trained Model**: Download or place your pre-trained model files into the `models` directory.

-## Configuration
+The file `data/llm4binary_v1_example.json` is a sample dataset you can use as a reference for the expected format.

-All configurations can be set through environment variables. Here's a summary of available parameters:
+## 3. Run Training

-### Parameter Tables
+Training runs LLaMA-Factory's `train.py` script through the `deepspeed` launcher. Adjust the parameters in the command below to match your setup.

-#### Path Configuration
+### Training Command Template

-| Variable | Description | Default Value |
-|----------|-------------|---------------|
-| `ROOT_DIR` | Project root directory | Current directory |
-| `LLAMA_FACTORY_DIR` | LLaMA-Factory directory | `${ROOT_DIR}/../LLaMA-Factory` |
-| `MODEL_PATH` | Pre-trained model path | `${ROOT_DIR}/models/llm4decompile-1.3b-v1.5` |
-| `DATASET_DIR` | Dataset directory. This directory should contain your `dataset_info.json` and the actual dataset files. | `${ROOT_DIR}/data` |
-| `OUTPUT_DIR` | Output model directory | `${ROOT_DIR}/output_models/${exp_id}` |
-| `WANDB_PROJECT` | WandB project name | `LLM4Binary` |
-| `WANDB_MODE` | WandB mode | `online` |
-| `WANDB_API_KEY` | WandB API key | `your_api_key_here` **Replace with your actual API key** |
-| `WANDB_BASE_URL` | WandB base URL | `https://api.wandb.ai` |
-| `WANDB_DISABLED` | Disable WandB | `false` (enabled) |
-| `DEEPSPEED_PORT` | DeepSpeed port | `11000` |
-| `HOSTFILE` | Host file path for distributed training | Not set |
-| `EXP_ID` | Experiment ID | `deepseek-1.3b-llm4decompile-v15-llm4binary-v2` |
-| `CUTOFF_LEN` | Sequence truncation length | `4096` |
-| `MAX_GRAD_NORM` | Gradient clipping value | `1.0` |
-| `NUM_WORKERS` | Preprocessing worker threads | `256` |
-| `BATCH_SIZE` | Per-device training batch size | `16` |
-| `GRAD_ACCUM_STEPS` | Gradient accumulation steps | `16` |
-| `LEARNING_RATE` | Learning rate | `5e-6` |
-| `LR_SCHEDULER` | Learning rate scheduler type | `cosine` |
-| `LOGGING_STEPS` | Logging frequency (steps) | `1` |
-| `WARMUP_RATIO` | Warmup ratio | `0.025` |
-| `SAVE_STEPS` | Model saving frequency (steps) | `20` |
-| `SAVE_TOTAL_LIMIT` | Maximum number of saved models | `10` |
-| `FLASH_ATTN` | Flash Attention type | `fa2` |
-| `MAX_SAMPLES` | Maximum number of samples | `20000000` |
-| `NUM_EPOCHS` | Number of training epochs | `1.0` |
-| `BF16` | Enable BF16 precision | Not set (disabled) |
-
-## Running Training
+The template below is a complete training command. Edit the parameter values **directly**, then run it in your terminal.

 ```bash
-# Run with default configuration
-bash run_training.sh
+# Before executing, first create the model output directory manually
+# For example: mkdir -p output_models/my-first-experiment

-# Or customize configuration
-export MODEL_PATH="/path/to/your/model"
-export DATASET_DIR="/path/to/your/data_directory"
-export BATCH_SIZE=8
-export NUM_EPOCHS=3
-bash run_training.sh
+deepspeed --master_port=11000 \
+    ../LLaMA-Factory/src/train.py \
+    --deepspeed ../LLaMA-Factory/examples/deepspeed/ds_z3_config.json \
+    --stage sft \
+    --do_train \
+    --model_name_or_path models/llm4decompile-1.3b-v1.5 \
+    --dataset llm4binary_v1 \
+    --dataset_dir data \
+    --template empty \
+    --finetuning_type full \
+    --output_dir output_models/my-first-experiment \
+    --overwrite_cache \
+    --overwrite_output_dir \
+    --cutoff_len 4096 \
+    --preprocessing_num_workers 256 \
+    --per_device_train_batch_size 16 \
+    --gradient_accumulation_steps 16 \
+    --learning_rate 5e-6 \
+    --lr_scheduler_type "cosine" \
+    --max_grad_norm 1.0 \
+    --logging_steps 10 \
+    --save_steps 100 \
+    --warmup_ratio 0.025 \
+    --run_name "my-first-experiment" \
+    --save_total_limit 10 \
+    --gradient_checkpointing \
+    --flash_attn "fa2" \
+    --num_train_epochs 1.0 \
+    --plot_loss \
+    2> output_models/my-first-experiment/train.err \
+    | tee output_models/my-first-experiment/train.log
+
+# Optional flags. A comment cannot sit inside a backslash-continued command, so add them as real lines:
+#   --bf16 \                             (before --plot_loss) if your hardware supports BF16
+#   --hostfile /path/to/your/hostfile \  (right after `deepspeed`, before the script path) for multi-node training
 ```
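+
+Before launching, it can help to confirm that every path the template refers to actually exists. The following is a minimal sketch, assuming the directory layout described above; the `check_paths` helper is hypothetical and not part of LLaMA-Factory.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical pre-flight helper: report which of the given paths are missing.
+# Returns 0 when every path exists, 1 otherwise.
+check_paths() {
+    local missing=0
+    local p
+    for p in "$@"; do
+        if [ ! -e "$p" ]; then
+            echo "missing: $p" >&2
+            missing=1
+        fi
+    done
+    return "$missing"
+}
+
+# Paths taken from the command template; adjust them if you changed the template.
+if check_paths \
+    ../LLaMA-Factory/src/train.py \
+    ../LLaMA-Factory/examples/deepspeed/ds_z3_config.json \
+    models/llm4decompile-1.3b-v1.5 \
+    data/dataset_info.json \
+    output_models/my-first-experiment
+then
+    echo "all paths present"
+fi
+```
+
+Run it from the `llama-factory-example` directory, since the paths in the template are relative to it.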

-## Example Data
+### Parameter Descriptions

-The `data/llm4binary_v1_example.json` file contains example data.
+You can modify most of the parameters in the command above. The table below provides detailed descriptions of common parameters for your reference.
+
+| Argument (`--argument`)       | Description                                                 | Default in Template                     |
+| :---------------------------- | :---------------------------------------------------------- | :-------------------------------------- |
+| `model_name_or_path`          | Path to a local pre-trained model or a HuggingFace model ID | `models/llm4decompile-1.3b-v1.5`        |
+| `dataset`                     | Name of the dataset to use, must be defined in `dataset_info.json` | `llm4binary_v1`                         |
+| `dataset_dir`                 | Directory where the dataset is located                      | `data`                                  |
+| `output_dir`                  | The output directory for the model                          | `output_models/my-first-experiment`     |
+| `run_name`                    | Experiment name displayed in monitoring tools like WandB    | `my-first-experiment`                   |
+| `cutoff_len`                  | Maximum sequence truncation length                          | `4096`                                  |
+| `per_device_train_batch_size` | Training batch size per GPU                                 | `16`                                    |
+| `gradient_accumulation_steps` | Number of gradient accumulation steps                       | `16`                                    |
+| `learning_rate`               | The learning rate                                           | `5e-6`                                  |
+| `num_train_epochs`            | Total number of training epochs                             | `1.0`                                   |
+| `bf16`                        | **(Flag)** Enables BF16 mixed-precision training            | Commented out                           |
+| `logging_steps`               | Log every N steps                                           | `10`                                    |
+| `save_steps`                  | Save a model checkpoint every N steps                       | `100`                                   |
+| `save_total_limit`            | Maximum number of model checkpoints to save                 | `10`                                    |
+| `flash_attn`                  | Version of Flash Attention to use (e.g., `fa2`)             | `"fa2"`                                 |
+| `hostfile`                    | **(Multi-node)** Path to the DeepSpeed hostfile. This is an option of the `deepspeed` launcher, so it goes before the script path | Commented out                           |
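+
+The number of sequences that contribute to each optimizer step is the product of the per-device batch size, the gradient accumulation steps, and the number of GPUs. A quick sketch of that arithmetic with the template values; `NUM_GPUS=8` is an assumed example value, not something the template specifies.
+
+```bash
+PER_DEVICE_BATCH=16   # --per_device_train_batch_size
+GRAD_ACCUM=16         # --gradient_accumulation_steps
+NUM_GPUS=8            # assumption: set this to the number of GPUs you train on
+
+EFFECTIVE_BATCH=$((PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS))
+echo "effective batch size: ${EFFECTIVE_BATCH}"
+```
+
+If you halve `per_device_train_batch_size` to fit in memory, doubling `gradient_accumulation_steps` keeps this product unchanged.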
+
+### Execution Example
+
+Suppose you want to run a new experiment with a smaller batch size and 3 training epochs.
+
+**Step 1: Create the output directory**
+
+```bash
+mkdir -p output_models/deepseek-3-epochs
+```
+
+**Step 2: Modify and execute the command**
+
+Copy the **Training Command Template** from above and modify the following lines:
+
+  - `--per_device_train_batch_size 8`
+  - `--num_train_epochs 3.0`
+  - `--output_dir output_models/deepseek-3-epochs`
+  - `--run_name "deepseek-3-epochs"`
+  - Also, change the log paths after `tee` and `2>` to `output_models/deepseek-3-epochs/train.log` and `.../train.err`.
+
+Then, execute your modified complete command in the terminal.
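+
+Because the experiment name appears in `--output_dir`, `--run_name`, and both log paths, one way to avoid editing four places is to derive them all from a single shell variable. A sketch, where `EXP_NAME` and the other variable names are illustrative and not part of the template:
+
+```bash
+EXP_NAME="deepseek-3-epochs"            # illustrative variable name
+OUTPUT_DIR="output_models/${EXP_NAME}"
+LOG_FILE="${OUTPUT_DIR}/train.log"
+ERR_FILE="${OUTPUT_DIR}/train.err"
+
+# Create the output directory so the log files can be written there.
+mkdir -p "${OUTPUT_DIR}"
+
+# Then reference the variables in the template, for example:
+#   --output_dir "${OUTPUT_DIR}" --run_name "${EXP_NAME}"
+# and point the log redirections at "${LOG_FILE}" and "${ERR_FILE}".
+echo "logs: ${LOG_FILE} ${ERR_FILE}"
+```
+
+Starting another experiment then only requires changing `EXP_NAME`.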