This article covers the tf object detection api - how to extract a feature vector for each detected bbox. The question and the recommended answer follow.

Problem Description

I'm using the Tensorflow Object Detection API and working with a pretrained ssd-mobilenet model. Is there a way to extract the last global pooling of the mobilenet for each bbox as a feature vector? I can't find the name of the operation holding this info.
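One general way to hunt for that tensor name in a frozen TF1 graph is to enumerate the graph's operations and filter by name. A minimal debugging sketch (not from the original question), assuming detection_graph is the tf.Graph loaded from the frozen model:

# Hypothetical debugging sketch: list ops whose names mention pooling,
# to locate a candidate feature tensor in the loaded frozen graph.
for op in detection_graph.get_operations():
    if 'pool' in op.name.lower():
        print(op.name, [out.get_shape() for out in op.outputs])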

I've been able to extract detection labels and bboxes based on the example on github:

image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represents the level of confidence for each of the objects.
# The score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
# TODO: add also the feature vector output

# Actual detection.
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})

Recommended Answer

As Steve said, the feature vectors in Faster RCNN in the object detection api seem to get dropped after the SecondStageBoxPredictor. I was able to thread them through the network by modifying core/box_predictor.py and meta_architectures/faster_rcnn_meta_arch.py.

The crux of it is that the non-max suppression code actually has a parameter for additional_fields (see core/post_processing.py:176 on master). You can pass a dict of tensors which have the same shape in the first two dimensions as the boxes and scores, and the function will return them filtered the same way the boxes and scores have been. Here's a diff against master of the changes I made:

https://gist.github.com/donniet/c95d19e00ff9abeb786415b3a9348e62
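To make the mechanism concrete, here is a minimal sketch in plain TF1 (not the object detection api's internal code): non-max suppression selects indices into the box list, and any extra per-box tensor, such as feature vectors, can be gathered with those same indices, which is exactly what additional_fields enables. The 1024-dim feature size is an arbitrary placeholder.

import tensorflow as tf

# Minimal sketch of the additional_fields idea: NMS picks indices, and any
# per-box tensor can be filtered with the same indices as the boxes/scores.
boxes = tf.placeholder(tf.float32, shape=(None, 4))        # [num_boxes, 4]
scores = tf.placeholder(tf.float32, shape=(None,))         # [num_boxes]
features = tf.placeholder(tf.float32, shape=(None, 1024))  # [num_boxes, depth]

selected = tf.image.non_max_suppression(
    boxes, scores, max_output_size=100, iou_threshold=0.6)
kept_boxes = tf.gather(boxes, selected)
kept_scores = tf.gather(scores, selected)
kept_features = tf.gather(features, selected)  # filtered like boxes and scores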

Then, instead of loading a frozen graph, I had to rebuild the network and load the variables from a checkpoint like this (note: I downloaded the checkpoint for Faster RCNN from http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz):

import sys
import os
import numpy as np

from object_detection.builders import model_builder
from object_detection.protos import pipeline_pb2

from google.protobuf import text_format
import tensorflow as tf

# load the pipeline structure from the config file
with open('object_detection/samples/configs/faster_rcnn_resnet101_coco.config', 'r') as content_file:
    content = content_file.read()

# build the model with model_builder
pipeline_proto = pipeline_pb2.TrainEvalPipelineConfig()
text_format.Merge(content, pipeline_proto)
model = model_builder.build(pipeline_proto.model, is_training=False)

# construct a network using the model
image_placeholder = tf.placeholder(shape=(None,None,3), dtype=tf.uint8, name='input')
original_image = tf.expand_dims(image_placeholder, 0)
preprocessed_image, true_image_shapes = model.preprocess(tf.to_float(original_image))
prediction_dict = model.predict(preprocessed_image, true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)

# create an input network to read a file
filename_placeholder = tf.placeholder(name='file_name', dtype=tf.string)
image_file = tf.read_file(filename_placeholder)
image_data = tf.image.decode_image(image_file)

# load the variables from a checkpoint
init_saver = tf.train.Saver()
sess = tf.Session()
# the path must match the directory the downloaded checkpoint was extracted to
init_saver.restore(sess, 'object_detection/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt')

# get the image data
blob = sess.run(image_data, feed_dict={filename_placeholder:'image.jpeg'})
# process the inference
output = sess.run(detections, feed_dict={image_placeholder:blob})

# get the shape of the image_features (this key only exists once the gist's
# modifications have been applied to the object detection code)
print(output['image_features'].shape)
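Once the features are threaded through, pairing each surviving detection with its feature vector is a matter of indexing. A hedged sketch, assuming output['image_features'] comes back with shape (1, max_detections, depth) aligned with detection_boxes (verify against the printed shape):

# Hedged usage sketch: walk the kept detections and grab the matching feature.
# Assumes output['image_features'] is aligned with output['detection_boxes'].
num = int(output['num_detections'][0])
for i in range(num):
    box = output['detection_boxes'][0][i]      # [ymin, xmin, ymax, xmax], normalized
    score = output['detection_scores'][0][i]
    feature = output['image_features'][0][i]   # per-box feature vector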

Caveat: I didn't run the tensorflow unit tests against the changes I made, so consider them for demo purposes only; more testing should be done to make sure they don't break something else in the object detection api.
