Fine-Tuning YOLOv8: A Practical Guide

1. Introduction

“When it comes to training custom object detectors, YOLOv8 makes the process feel deceptively simple—but fine-tuning it properly is where things get interesting.”

In this guide, I’ll walk you through how I personally fine-tuned YOLOv8 on a custom industrial inspection dataset—something with tiny defects, overlapping parts, and inconsistent lighting. These weren’t textbook-perfect images, and that’s exactly why I had to get hands-on with every part of the pipeline.

If you’re working with a custom dataset—whether it’s traffic surveillance, wildlife monitoring, or product quality control—this guide is meant to get you from “it runs” to “it performs”.

I’m not going to waste your time on the basics. This is all based on what I’ve done, what’s worked, and where I’ve hit walls. From setting up the environment properly to avoiding silent failures mid-training, I’ll share everything that matters—and nothing that doesn’t.


2. Setup & Environment (With No Room for Errors)

2.1 Python Environment + Packages

Let me start by saying: if your setup isn’t airtight, your training will break in ways that make zero sense.

Personally, I prefer using a virtual environment (either venv or conda) just to keep dependencies isolated. Here’s the exact setup I’ve used:

pip install ultralytics==8.1.0 torch==2.0.1 opencv-python==4.8.0.76

This combo has worked reliably for me on both local GPU machines and cloud-based setups (like Paperspace or Colab Pro). If you're on CUDA 11.7 or 11.8, this torch version is solid. Just be careful mixing torch and CUDA versions—if they don't align, you'll end up with errors that look unrelated.

Also, watch your torchvision version: the ultralytics package pulls it in as a dependency, and I've seen weird conflicts on machines where it didn't match the installed torch build. Keep the pair aligned.
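
Right after installing, I print the versions that actually landed in the environment so any mismatch is obvious before the first training run. A quick sketch:

import torch, cv2, ultralytics

print("ultralytics:", ultralytics.__version__)
print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("opencv:", cv2.__version__)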

2.2 GPU Check and Compatibility

Before anything else, make sure your GPU is being picked up by PyTorch. This sounds obvious, but I’ve personally wasted hours debugging slow training speeds only to realize PyTorch fell back to CPU because CUDA wasn’t set up correctly.

Here’s a simple sanity check:

import torch

if torch.cuda.is_available():
    print(f"CUDA is available. Device: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA NOT available. You're training on CPU.")

If you see CPU here and you were sure CUDA was installed, check your driver version. I’ve had cases where torch.cuda silently failed due to a mismatch between the driver and the installed CUDA toolkit.

2.3 Folder Structure for Custom Projects

This might sound like a small thing, but a clean folder structure can save your sanity later when you start experimenting with multiple models, datasets, and training configs.

Here’s the project structure I personally stick to:

project_root/
│
├── data/
│   ├── images/
│   │   ├── train/
│   │   └── val/
│   ├── labels/
│   │   ├── train/
│   │   └── val/
│   └── data.yaml
│
├── models/
│   └── yolov8_custom.yaml
│
├── weights/
│   └── best.pt  # best checkpoint from training
│
├── scripts/
│   ├── convert_labels.py
│   ├── visualize_annotations.py
│   └── infer_custom.py

I’ve made the mistake early on of dumping everything into one folder—logs, weights, scripts, you name it. It worked for a day or two… until I had to rerun experiments or retrain from scratch. Now, I treat my folders like code repos: modular, clean, and reproducible.


3. Dataset Preparation

3.1 Label Format (And Why Getting This Wrong Wastes Hours)

Let me be real—this part is where I’ve personally made the most mistakes early on. Even now, I triple-check the format before training because YOLO doesn’t throw loud errors for wrong annotations; it just silently learns garbage.

YOLOv8 expects annotations in .txt files, one per image, with each line in this format:

<class_id> <x_center> <y_center> <width> <height>

All coordinates are normalized (i.e., values between 0 and 1 relative to image size). Here’s a real example from my dataset:

2 0.512 0.348 0.210 0.160
0 0.752 0.602 0.188 0.294

In this case:

  • 2 and 0 are the class IDs
  • The rest are the bounding boxes: [x_center, y_center, width, height]

🔧 Tip from experience: If your dataset comes with bounding boxes in pixel format (which many do), you must convert them—YOLO will not do it for you.
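
To make that concrete, here's a minimal sketch of the pixel-to-YOLO conversion I mean; the function name is mine, and it assumes boxes arrive as (xmin, ymin, xmax, ymax) in pixels:

def pixel_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space box to normalized YOLO (x_center, y_center, width, height)."""
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return round(x_c, 6), round(y_c, 6), round(w, 6), round(h, 6)

# e.g. a 100x80 px box with its top-left corner at (200, 150) in a 1280x720 image
print(pixel_to_yolo(200, 150, 300, 230, 1280, 720))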

3.2 Dataset Conversion Scripts (No One Talks About These Edge Cases)

Now here’s the deal—most datasets aren’t in YOLO format. I’ve had to convert from COCO JSON, Pascal VOC XML, and even raw CSVs. The trick is to not just write a converter—but write a reliable one that handles odd cases like:

  • Negative coordinates
  • Rotated or flipped boxes
  • Class name mismatches

Here’s a minimal example I’ve used to convert VOC XML to YOLO format:

import os
import xml.etree.ElementTree as ET
from PIL import Image

classes = ["cat", "dog", "person"]  # must match your dataset.yaml

def convert_bbox(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)

def convert_annotation(xml_path, output_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    # Assumes the XML stores a usable image path in <path>; swap in <filename> plus your image dir if yours doesn't
    image_path = root.find("path").text
    img = Image.open(image_path)
    w, h = img.size

    with open(output_path, 'w') as out_file:
        for obj in root.iter("object"):
            cls = obj.find("name").text
            if cls not in classes:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find("bndbox")
            b = (
                float(xmlbox.find("xmin").text),
                float(xmlbox.find("xmax").text),
                float(xmlbox.find("ymin").text),
                float(xmlbox.find("ymax").text)
            )
            bb = convert_bbox((w, h), b)
            # Write with 6-decimal precision to avoid floating-point noise in the labels
            out_file.write(f"{cls_id} {' '.join(f'{v:.6f}' for v in bb)}\n")

Pro tip: Always round YOLO coords to 6 decimals to avoid floating-point issues. And keep a few samples manually reviewed to sanity check your script output.
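
For completeness, here's roughly how I drive that converter across a folder of XMLs; the directory names below are placeholders for your own layout:

import os

xml_dir = "raw_annotations"      # folder of Pascal VOC .xml files (placeholder path)
out_dir = "data/labels/train"    # destination for YOLO .txt labels (placeholder path)
os.makedirs(out_dir, exist_ok=True)

for xml_file in os.listdir(xml_dir):
    if xml_file.endswith(".xml"):
        txt_name = os.path.splitext(xml_file)[0] + ".txt"
        convert_annotation(os.path.join(xml_dir, xml_file),
                           os.path.join(out_dir, txt_name))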

3.3 Directory Layout for YOLOv8 (This One’s Non-Negotiable)

YOLOv8 expects a very specific folder layout. If it’s even slightly off, you’ll get silent failures—or worse, training runs but results are nonsense.

Here’s the structure I stick to:

dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── data.yaml

And here’s a quick Python snippet I wrote to auto-structure my raw dataset:

import shutil, os, random

def organize_yolo_format(raw_img_dir, raw_label_dir, dest_dir, split_ratio=0.8):
    os.makedirs(f"{dest_dir}/images/train", exist_ok=True)
    os.makedirs(f"{dest_dir}/images/val", exist_ok=True)
    os.makedirs(f"{dest_dir}/labels/train", exist_ok=True)
    os.makedirs(f"{dest_dir}/labels/val", exist_ok=True)

    images = [f for f in os.listdir(raw_img_dir) if f.endswith(".jpg")]
    random.shuffle(images)
    split = int(len(images) * split_ratio)

    for i, img_name in enumerate(images):
        base = os.path.splitext(img_name)[0]
        label_name = base + ".txt"
        set_type = "train" if i < split else "val"
        if not os.path.exists(f"{raw_label_dir}/{label_name}"):
            print(f"Skipping {img_name}: no matching label file")
            continue
        shutil.copy(f"{raw_img_dir}/{img_name}", f"{dest_dir}/images/{set_type}/{img_name}")
        shutil.copy(f"{raw_label_dir}/{label_name}", f"{dest_dir}/labels/{set_type}/{label_name}")

Heads up: Make sure your label files have the exact same filename (minus extension) as the image. YOLO won’t warn you if a label is missing—it’ll just skip that image.
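
Because of that silent-skip behavior, I run a quick pairing check right after organizing the folders; a small sketch assuming the dataset/ layout above:

import os

def check_pairs(split, root="dataset"):
    imgs = {os.path.splitext(f)[0] for f in os.listdir(f"{root}/images/{split}")
            if f.lower().endswith((".jpg", ".jpeg", ".png"))}
    lbls = {os.path.splitext(f)[0] for f in os.listdir(f"{root}/labels/{split}")
            if f.endswith(".txt")}
    print(f"[{split}] images missing labels:", sorted(imgs - lbls)[:10])
    print(f"[{split}] labels missing images:", sorted(lbls - imgs)[:10])

check_pairs("train")
check_pairs("val")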

3.4 Verifying Annotations Visually (Trust, Don’t Assume)

If there’s one thing you should never skip, it’s verifying the labels visually. I’ve run entire experiments with misaligned bounding boxes just because I skipped this step.

Here’s a simple visualization script I use with OpenCV:

import cv2

def draw_yolo_label(image_path, label_path, class_names):
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    
    with open(label_path, "r") as file:
        for line in file:
            cls_id, x_c, y_c, bw, bh = map(float, line.strip().split())
            x1 = int((x_c - bw / 2) * w)
            y1 = int((y_c - bh / 2) * h)
            x2 = int((x_c + bw / 2) * w)
            y2 = int((y_c + bh / 2) * h)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0,255,0), 2)
            cv2.putText(img, class_names[int(cls_id)], (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,255,0), 2)
    
    cv2.imshow("Annotated", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

My habit: I sample 20–30 random images from both train and val and run them through this script before touching the training step. It’s saved me more than once from training on trash labels.
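
The sampling itself is nothing fancy; a sketch that plugs into draw_yolo_label above (class_names should match whatever you put in data.yaml):

import os, random

img_dir, lbl_dir = "dataset/images/train", "dataset/labels/train"
class_names = ["cat", "dog", "person"]

for img_name in random.sample(os.listdir(img_dir), 20):
    label_path = os.path.join(lbl_dir, os.path.splitext(img_name)[0] + ".txt")
    if os.path.exists(label_path):
        draw_yolo_label(os.path.join(img_dir, img_name), label_path, class_names)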


4. Model Configuration

4.1 Selecting the Right YOLOv8 Variant (Don’t Just Go for the Biggest One)

Here’s the deal: when I first got started with YOLOv8, I was like, “Why not just go with yolov8x? Bigger = better, right?” Big mistake. Not because it’s a bad model—it’s a beast—but unless you’ve got a top-tier GPU and a dataset with thousands of high-quality images, it’ll just bottleneck your workflow.

So here’s how I choose now—based on actual use:

Variant    Use Case
yolov8n    Fast testing, embedded devices, very small datasets
yolov8s    Great starting point for small-to-medium custom datasets
yolov8m    Balanced for most real-world projects, my personal go-to
yolov8l    Use when you've got at least 16GB+ GPU VRAM and high-resolution inputs
yolov8x    Only worth it for large-scale, production-grade datasets and top-tier GPUs

My take: I usually start with yolov8s just to validate that the pipeline is working. Once I’m confident, I move to yolov8m for most production-quality experiments. If I can’t train a full epoch in under 5 minutes, I downsize.

Command example:

yolo task=detect mode=train model=yolov8m.pt data=dataset/data.yaml epochs=100 imgsz=640

4.2 Customizing data.yaml (This Tiny File Can Break Everything)

This might surprise you, but 99% of the time I see someone stuck in training, it’s because of this one file. It’s deceptively simple, but if the paths or class definitions are even slightly off, YOLOv8 either crashes or trains garbage silently.

Here’s a working example I’ve used:

path: /home/user/datasets/my_project/
train: images/train
val: images/val

nc: 3
names: ["cat", "dog", "person"]

Breakdown:

  • path is the root path to your dataset folder
  • train and val are relative to path
  • nc is the number of classes
  • names must be in index order matching your labels

Real mistake I made once: I accidentally set nc: 2 but had 3 class names in names:. YOLO didn’t complain—it just skipped training for the third class. Took me hours to catch.

Here’s a quick sanity check in Python:

from ultralytics import YOLO

model = YOLO("yolov8m.pt")
model.train(data="dataset/data.yaml", epochs=1, imgsz=640)

If this runs fine for one epoch, you’re structurally good to go.
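
To catch the nc/names mismatch I mentioned before training even starts, a tiny consistency check like this helps; a sketch, assuming PyYAML is installed and your data.yaml uses the path/train/val keys from the example above:

import os
import yaml

with open("dataset/data.yaml") as f:
    cfg = yaml.safe_load(f)

# nc must match the number of class names, and both splits must exist on disk
assert cfg["nc"] == len(cfg["names"]), f"nc={cfg['nc']} but {len(cfg['names'])} names listed"
for split in ("train", "val"):
    split_dir = os.path.join(cfg["path"], cfg[split])
    assert os.path.isdir(split_dir), f"Missing directory: {split_dir}"
print("data.yaml looks consistent.")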

4.3 Hyperparameter Tuning Strategy (This Is Where Performance Lives)

Let me be blunt: if you’re using default hyperparameters, you’re leaving performance on the table. I’ve seen mAP improvements of 5–10% just from tuning LR, augmentation, and image size.

Here’s the approach I follow:

Start Simple:

Start with the command below to train on your config:

yolo task=detect mode=train model=yolov8m.pt data=dataset/data.yaml epochs=100 imgsz=640

Then tweak based on what your dataset is doing.

Adjust the following first:

lr0      # Initial learning rate
epochs   # More if your loss plateaus late
imgsz    # Larger = more detail, but more GPU needed
batch    # Adjust based on your VRAM

Example:

yolo task=detect mode=train model=yolov8m.pt data=dataset/data.yaml \
epochs=200 imgsz=768 lr0=0.005 batch=16

Want more control? Use a custom hyp YAML:

lr0: 0.005
lrf: 0.1
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
box: 0.05
cls: 0.5
cls_pw: 1.0
obj: 1.0
obj_pw: 1.0
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
flipud: 0.0
fliplr: 0.5
mosaic: 1.0
mixup: 0.2
copy_paste: 0.0

Run it by pointing the train command's cfg argument at the file. A rough sketch, assuming you saved it as hyp_custom.yaml (note that keys YOLOv8 doesn't recognize, like the legacy obj/cls_pw names above, may be ignored or rejected):
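
yolo task=detect mode=train model=yolov8m.pt data=dataset/data.yaml cfg=hyp_custom.yaml epochs=200 imgsz=768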

From my own tuning cycles: Lowering mixup and copy_paste helped me stabilize training on a noisy dataset. I also found that using imgsz=768 gave sharper boxes, but required slicing my batch size in half.


5. Fine-Tuning the Model

5.1 Training Command + Full Explanation (No Guesswork Here)

Let’s not sugarcoat it: I’ve lost days trying to debug training runs just because I forgot to double-check a flag or misconfigured a name. Over time, I’ve settled into a reliable training command that just works.

Here’s my go-to command:

yolo task=detect mode=train \
  model=yolov8m.pt \
  data=dataset/data.yaml \
  epochs=100 \
  imgsz=640 \
  batch=16 \
  name=wildlife_detector

Let me break it down based on how I actually use it:

  • task=detect – You’re doing object detection. Obvious, but I once left this out when running a segmentation task, and YOLOv8 defaulted weirdly.
  • mode=train – This kicks off training.
  • model=yolov8m.pt – Start from a pretrained model. I often use yolov8s.pt for quick iterations, then bump up to m or l if it’s worth it.
  • data=... – The YAML file you prepped earlier.
  • epochs=100 – I tend to start with 100, but I rarely go with round numbers blindly. If loss is still improving, I just extend.
  • imgsz=640 – Go with 768 or 896 if your GPU allows—it often gives better localization.
  • batch=16 – You’ll want to experiment here based on VRAM. I once had silent crashing on batch 32, fixed by cutting it to 12.
  • name=wildlife_detector – This is crucial. Every run gets its own folder in runs/detect/, and naming saves you from that dreaded exp2, exp3, exp4 chaos.
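
If you prefer launching from Python (I do this in notebooks), the same run looks roughly like this; the argument names mirror the CLI flags above:

from ultralytics import YOLO

model = YOLO("yolov8m.pt")          # start from pretrained weights
model.train(
    data="dataset/data.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    name="wildlife_detector",       # run folder under runs/detect/
)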

5.2 Checkpoints and Resume Training (Don’t Start from Scratch)

This might surprise you, but I’ve seen people rerun full training from scratch after their laptop rebooted. You don’t have to live that life.

Here’s how I resume:

Using the best checkpoint:

yolo task=detect mode=train \
  model=runs/detect/wildlife_detector/weights/best.pt \
  data=dataset/data.yaml \
  epochs=200 \
  imgsz=640 \
  name=wildlife_detector_v2

This continues training with the best-performing weights from the previous run. I usually do this when I want to push performance without starting from zero.

Pro tip: If you want to resume exactly where training left off (including optimizer state), point model at last.pt and set resume=True:

yolo task=detect mode=train model=runs/detect/wildlife_detector/weights/last.pt resume=True

YOLOv8 handles this pretty well, and it saves me when my colab runtime times out halfway through.
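
The Python-side equivalent, for when I'm driving training from a notebook; a short sketch:

from ultralytics import YOLO

# Load the interrupted run's last checkpoint and continue with its optimizer state
model = YOLO("runs/detect/wildlife_detector/weights/last.pt")
model.train(resume=True)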

5.3 Dealing with Overfitting or Underfitting (Read the Signs)

Here’s the deal: training logs and the results.png graph aren’t just pretty charts—they’re diagnostics. I learned to read them the hard way.

Let’s break it down like I do when reviewing a run:

Signs of overfitting (Been there, seen it):

  • Training loss keeps going down, but validation mAP stagnates or drops
  • Big gap between box_loss and val/box_loss
  • Precision improves, recall tanks

Fix it with:

  • Stronger augmentations (mosaic, hsv_h, fliplr)
  • Early stopping or reduce epochs
  • Reduce model size (yes, that helps)
  • Use dropout or decrease obj/cls weight in hyp file

Signs of underfitting (Had this too):

  • Both train and val losses stay high
  • mAP is flatlined near zero
  • Training is painfully slow in learning

Tweak this:

  • Raise lr0 slightly (e.g., 0.01 → 0.015)
  • Increase epochs (maybe 200–300)
  • Improve label quality (seriously, check 10 random samples manually)
  • Train longer with cosine LR decay (set cos_lr=True; the default schedule is linear), as in the sketch below
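
Two of those knobs are built straight into the trainer: patience for early stopping and cos_lr for the cosine schedule. A minimal sketch of how I set them:

from ultralytics import YOLO

model = YOLO("yolov8m.pt")
model.train(
    data="dataset/data.yaml",
    epochs=300,
    lr0=0.015,      # nudged up from the default for a stubbornly underfitting run
    patience=30,    # stop early if val metrics don't improve for 30 epochs
    cos_lr=True,    # cosine LR decay instead of the default linear schedule
)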

Example: Reading results.png

Let me show you what I look at:

from matplotlib import pyplot as plt
import cv2

img = cv2.imread("runs/detect/wildlife_detector/results.png")
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.title("Training Metrics Overview")
plt.axis("off")
plt.show()

If I see val/box_loss diverging from train/box_loss—I know something’s cooking.


6. Evaluating the Model

6.1 Metrics That Actually Matter

Let’s cut through the noise. I’ve seen plenty of people obsess over mAP without understanding which mAP they’re quoting. Personally, I focus on both mAP@0.5 and mAP@0.5:0.95—but for different reasons.

  • mAP@0.5: Great for a quick sanity check. Think of it as the low-hanging fruit.
  • mAP@0.5:0.95: This is what separates average from state-of-the-art. It’s stricter and gives a more honest view of how well your model localizes across IoU thresholds.

I’ve had models scoring 90+ on mAP@0.5 but barely crossing 60 at mAP@0.5:0.95. Trust me, the latter exposes weaknesses—especially on edge cases.

Precision-Recall Curves

This might surprise you, but I find more insight from PR curves than a single mAP score. Look for sharp drop-offs—that usually signals inconsistency in how confidently your model predicts certain classes.

When I saw a PR curve flatten out early for a specific class, it led me to realize the model wasn’t confident enough. Turns out, I had class imbalance issues in my dataset.

6.2 Using yolo val Like You Mean It

Here’s how I validate:

yolo task=detect mode=val model=best.pt data=dataset/data.yaml

And here’s what you get out of it:

  • mAP@0.5, mAP@0.5:0.95
  • Precision, Recall
  • Confusion matrix
  • Per-class performance

But don’t just look at numbers—tune your confidence threshold. By default, YOLOv8 uses 0.25. I usually sweep from 0.1 to 0.5 depending on the use case. For safety-critical domains, I’ve used 0.6+ just to avoid false positives.

6.3 Confusion Matrix & Class-Level Deep Dives

This is where it gets real. That matrix isn’t just decoration.

For one wildlife project, I noticed the model kept confusing “bobcat” with “lynx.” The confusion matrix made it painfully obvious. Both had similar patterns, but one class was underrepresented.

Here’s what I did:

  • Boosted samples of “lynx” using synthetic augmentation
  • Added a new background class to reduce noise
  • Increased image size from 640 to 896 for better feature capture

That one change? Took my per-class mAP from 61 → 79 in a single run.


7. Inference & Deployment

7.1 Inference on Single or Batch Images

Once I’ve got my best.pt, I always validate inference on sample images before even thinking about deployment.

from ultralytics import YOLO

model = YOLO("runs/detect/wildlife_detector/weights/best.pt")

# Single image inference
results = model("test_images/lion.jpg")
results[0].show()

For batch inference:

import os

image_folder = "test_images/"
os.makedirs("inferred", exist_ok=True)  # make sure the output folder exists before saving

for file in os.listdir(image_folder):
    if file.endswith(".jpg"):
        path = os.path.join(image_folder, file)
        results = model(path)
        results[0].save(filename=f"inferred/{file}")

7.2 Performance Benchmarking (Know Before You Deploy)

You might be wondering: Is this model fast enough for real-time deployment?

Here’s how I quickly benchmark on different devices:

import time
import torch
from ultralytics import YOLO

model = YOLO("best.pt")

# Warm up once so model load / CUDA init doesn't skew the numbers
_ = model("test_images/lion.jpg")

runs = 20
start = time.time()
for _ in range(runs):
    _ = model("test_images/lion.jpg")
end = time.time()

print("Avg Inference Time (Single Image):", round((end - start) / runs, 3), "seconds")
print("Running on:", "GPU" if torch.cuda.is_available() else "CPU")

On my RTX 3060, I get ~23ms per frame at 640×640. On CPU? Closer to 300ms. That’s your bottleneck if deploying to edge.

7.3 Exporting the Model (The Gotchas)

Exporting your model for deployment isn’t always a one-liner. But YOLOv8 makes it as painless as it gets.

yolo export model=best.pt format=onnx

YOLO also supports:

  • torchscript
  • openvino
  • coreml
  • engine (TensorRT)

I personally prefer ONNX for cloud deployment. But be warned—ONNX opset mismatches can silently kill performance or throw cryptic errors.

Here’s one I hit:

RuntimeError: Exporting the operator 'aten::meshgrid' to ONNX opset version 11 is not supported.

Fix? Just bump the opset:

yolo export model=best.pt format=onnx opset=12
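
After exporting, I like to confirm the ONNX file actually loads and runs before shipping it. A quick sketch, assuming onnxruntime is installed and the export landed next to best.pt as best.onnx:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
print("Input shape:", session.get_inputs()[0].shape)  # typically [1, 3, 640, 640]

# Run a dummy forward pass just to confirm the graph executes end to end
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print("ONNX output shapes:", [o.shape for o in outputs])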

8. Advanced Tips (Hard-earned Lessons)

“In theory, there is no difference between theory and practice. In practice, there is.”
— Yogi Berra

That quote hits home when you start pushing YOLOv8 to its limits. What I’m about to share here are the kinds of things you learn the hard way—after multiple experiments, a few failed models, and way too many TensorBoard sessions.

Custom Anchor Boxes: When You Actually Need Them

YOLOv8 is anchor-free by default, so most folks ignore anchors entirely. But here's the catch: if you're still running a legacy anchor-based version (like YOLOv5), or your project depends on custom anchor behavior, you can't afford to overlook this.

I had a dataset once with mostly tiny objects—like 12×12 pixels in 640×640 images. Default anchors were useless.

What worked for me:

# Inside a YOLOv5 checkout (this step applies to anchor-based models, not YOLOv8 itself)
from utils.autoanchor import kmean_anchors
kmean_anchors(dataset="data.yaml", n=9, img_size=640)

This generates optimized anchor boxes based on your dataset. After that, retraining made a huge difference in recall—especially for small-object detection.

If your objects vary wildly in size, don't even bother with custom anchors. Let YOLOv8's anchor-free detection head handle it.

Mosaic and HSV Augmentation—But Not Blindly

This might surprise you: overusing Mosaic can mess up localization on tightly packed objects. I once trained a traffic detection model and noticed weird bounding box jitters during inference. Turned out, Mosaic augmentation was distorting context too much.

Here’s a more surgical approach:

mosaic: 0.7   # default is 1.0
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4

My rule of thumb: dial Mosaic down to 0.6–0.8 if your dataset has dense or overlapping objects. For HSV, I tweak saturation more than hue—it tends to generalize better.

Model Ensembling: When One YOLO Isn’t Enough

There was a time I had two models—yolov8m trained on clean, curated data and yolov8s trained on noisy, diverse real-world samples. Neither was perfect. But together? That’s where ensembling came in.

I used Weighted Boxes Fusion (WBF) from the ensemble-boxes package to merge the post-processed predictions from both models:

from ensemble_boxes import weighted_boxes_fusion

# Combine predictions from two models
# Note: WBF expects box coordinates normalized to [0, 1], not pixel values
boxes_list = [boxes_model1, boxes_model2]
scores_list = [scores_model1, scores_model2]
labels_list = [labels_model1, labels_model2]

boxes, scores, labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list, iou_thr=0.55, skip_box_thr=0.4
)

It boosted mAP by 3 points in production. Not massive, but when you’re squeezing out every last bit of performance, it matters.

TTA (Test Time Augmentation)

I don’t use TTA on every project—but when I do, it’s for final eval or deployment edge-cases. It’s like giving your model a second (or third) opinion on the same image.

With YOLOv8, you can hack in TTA by flipping, resizing, or rotating inputs during inference and averaging the predictions.

Here’s a rough structure I use:

flipped = cv2.flip(image, 1)                 # horizontal flip
resized = cv2.resize(image, (720, 720))      # alternate input scale
original = image.copy()

results = []
for img in [original, flipped, resized]:
    r = model(img)
    results.append(r[0].boxes.xyxy.cpu().numpy())

# Before merging, map the flipped/resized boxes back into the original
# image's coordinate space, then combine everything with a final NMS pass

Pro tip: If you’re using TTA, make sure your NMS thresholds are tuned accordingly. You’ll get way more overlapping boxes.


Final Thoughts

What I’ve Learned (The Hard Way)

When I first started working with YOLOv8, I thought it would just work out of the box. It did—for basic stuff. But the moment I moved into real-world data (imperfect, noisy, imbalanced), I hit a wall.

What changed the game for me was:

  • Understanding how to balance augmentation vs. overfitting
  • Reading PR curves instead of just chasing mAP
  • Learning when to stop training early (thanks, TensorBoard)

The biggest shift? I stopped treating it like a plug-and-play tool and started treating it like a system I had to understand.

When Not to Use YOLOv8

You might be wondering: Is YOLOv8 always the right call?

Here’s the deal:

  • Not great for extreme precision: If you’re in medical imaging where false positives are unacceptable, YOLO’s confidence thresholding can be too relaxed.
  • Not ideal for ultra-tiny objects in huge images: You’ll be better off with DETR or hybrid transformer backbones that preserve spatial resolution.
  • Long training cycles: Large-scale datasets might need more efficient distributed training than YOLOv8 currently supports natively.

Bonus: My GitHub & Colab

I always like to leave something tangible. If you want to dive deeper or test a few of these ideas in your own project, here’s a sample Colab and GitHub I put together:
