在准备数据集时,darknet并不要求我们预先对图片resize到固定的size. darknet自动帮我们做了图像的resize.
darknet训练前处理
本文所指的darknet版本:https://github.com/AlexeyAB/darknet
./darknet detector train data/trafficlights.data yolov3-tiny_trafficlights.cfg yolov3-tiny.conv.15
main函数位于darknet.c
训练时的入口函数为detector.c里
void train_detector(char *datacfg, char *cfgfile, char *weightfile, int *gpus, int ngpus, int clear, int dont_show, int calc_map, int mjpeg_port, int show_imgs)
{
load_args args = { 0 };
args.type = DETECTION_DATA;
args.letter_box = net.letter_box;
load_thread = load_data(args);
loss = train_network(net, train);
}
函数太长,只贴了几句关键的.注意args.type = DETECTION_DATA;
data.c中
void *load_thread(void *ptr)
{
//srand(time(0));
//printf("Loading data: %d\n", random_gen());
load_args a = *(struct load_args*)ptr;
if(a.exposure == 0) a.exposure = 1;
if(a.saturation == 0) a.saturation = 1;
if(a.aspect == 0) a.aspect = 1;
if (a.type == OLD_CLASSIFICATION_DATA){
*a.d = load_data_old(a.paths, a.n, a.m, a.labels, a.classes, a.w, a.h);
} else if (a.type == CLASSIFICATION_DATA){
*a.d = load_data_augment(a.paths, a.n, a.m, a.labels, a.classes, a.hierarchy, a.flip, a.min, a.max, a.size, a.angle, a.aspect, a.hue, a.saturation, a.exposure);
} else if (a.type == SUPER_DATA){
*a.d = load_data_super(a.paths, a.n, a.m, a.w, a.h, a.scale);
} else if (a.type == WRITING_DATA){
*a.d = load_data_writing(a.paths, a.n, a.m, a.w, a.h, a.out_w, a.out_h);
} else if (a.type == REGION_DATA){
*a.d = load_data_region(a.n, a.paths, a.m, a.w, a.h, a.num_boxes, a.classes, a.jitter, a.hue, a.saturation, a.exposure);
} else if (a.type == DETECTION_DATA){
*a.d = load_data_detection(a.n, a.paths, a.m, a.w, a.h, a.c, a.num_boxes, a.classes, a.flip, a.blur, a.mixup, a.jitter,
a.hue, a.saturation, a.exposure, a.mini_batch, a.track, a.augment_speed, a.letter_box, a.show_imgs);
} else if (a.type == SWAG_DATA){
*a.d = load_data_swag(a.paths, a.n, a.classes, a.jitter);
} else if (a.type == COMPARE_DATA){
*a.d = load_data_compare(a.n, a.paths, a.m, a.classes, a.w, a.h);
} else if (a.type == IMAGE_DATA){
*(a.im) = load_image(a.path, 0, 0, a.c);
*(a.resized) = resize_image(*(a.im), a.w, a.h);
}else if (a.type == LETTERBOX_DATA) {
*(a.im) = load_image(a.path, 0, 0, a.c);
*(a.resized) = letterbox_image(*(a.im), a.w, a.h);
} else if (a.type == TAG_DATA){
*a.d = load_data_tag(a.paths, a.n, a.m, a.classes, a.flip, a.min, a.max, a.size, a.angle, a.aspect, a.hue, a.saturation, a.exposure);
}
free(ptr);
return 0;
}
根据a.type不同,有不同的加载逻辑.在训练时,args.type = DETECTION_DATA,接着去看load_data_detection().
load_data_detection()有两套实现,用宏#ifdef OPENCV区别开来.我们看opencv版本
load_data_detection()
{
src = load_image_mat_cv(filename, flag);
image ai = image_data_augmentation(src, w, h, pleft, ptop, swidth, sheight, flip, jitter, dhue, dsat, dexp);
}
注意load_image_mat_cv()中imread读入的是bgr顺序的,用cv::cvtColor做了bgr-->rgb的转换.
if (mat.channels() == 3) cv::cvtColor(mat, mat, cv::COLOR_RGB2BGR);
这里有个让人困惑的地方,为什么是cv::COLOR_RGB2BGR而不是cv::COLOR_BGR2RGB,实际上这两个enum值是一样的,都是4.
见https://docs.opencv.org/3.1.0/d7/d1b/group__imgproc__misc.html
image_data_argmentation()的主要逻辑
cv::Mat cropped(src_rect.size(), img.type());
//cropped.setTo(cv::Scalar::all(0));
cropped.setTo(cv::mean(img));
img(new_src_rect).copyTo(cropped(dst_rect));
// resize
cv::resize(cropped, sized, cv::Size(w, h), 0, 0, cv::INTER_LINEAR);
其实主要就是cv::resize. 这里cropped的img是在原图上随机截取出来的一块区域(当然是有范围的).
在load_data_detection()中有这样一段逻辑,生成pleft,pright,ptop,pbot. 这些参数被传递给image_data_argmentation(),用以截取出cropped image.
int oh = get_height_mat(src);
int ow = get_width_mat(src);
int dw = (ow*jitter);
int dh = (oh*jitter);
if(!augmentation_calculated || !track)
{
augmentation_calculated = 1;
r1 = random_float();
r2 = random_float();
r3 = random_float();
r4 = random_float();
dhue = rand_uniform_strong(-hue, hue);
dsat = rand_scale(saturation);
dexp = rand_scale(exposure);
flip = use_flip ? random_gen() % 2 : 0;
}
int pleft = rand_precalc_random(-dw, dw, r1);
int pright = rand_precalc_random(-dw, dw, r2);
int ptop = rand_precalc_random(-dh, dh, r3);
int pbot = rand_precalc_random(-dh, dh, r4);
int swidth = ow - pleft - pright;
int sheight = oh - ptop - pbot;
float sx = (float)swidth / ow;
float sy = (float)sheight / oh;
float dx = ((float)pleft/ow)/sx;
float dy = ((float)ptop /oh)/sy;
这么做的目的是,参考作者AlexeyAB大神的回复:
https://github.com/AlexeyAB/darknet/issues/3703
但是之前,我在issue里有看到有人训练出来的模型效果并不好,改变了image的preprocess以后,效果就好了.这一点还有待研究.
原始的darknet里图像的preprocess用的是letterbox_image(),AlexeyAB的版本里用的是resize.据作者说这一改变使得对小目标的检测效果更好.
参考https://github.com/AlexeyAB/darknet/issues/1907 https://github.com/AlexeyAB/darknet/issues/232#issuecomment-336955485
resize()并不会保持宽高比,letterbox_image()会保持宽高比.作者认为如果你的dataset的train和test中图像分辨率一致的话,是没有必要保持宽高比的.
darknet 推导前处理
detector.c中
void test_detector(char *datacfg, char *cfgfile, char *weightfile, char *filename, float thresh,
float hier_thresh, int dont_show, int ext_output, int save_labels, char *outfile)
{
image im = load_image(input, 0, 0, net.c);
image sized = resize_image(im, net.w, net.h);
}
这里的resize_image是用C实现的,和cv::resize功能相同