Problem description
I am trying to do transfer learning with the TensorFlow Object Detection API, using the CenterNet Resnet50 V1 FPN 512x512 model from the Model Zoo.
I am running TensorFlow in a Docker environment based on tensorflow/tensorflow:2.5.0-gpu-jupyter, with a recent checkout of https://github.com/tensorflow/models.git at commit eb6687ac.
I have set up the directory structure and downloaded the pre-trained model:
mkdir -p /workspace/pre-trained-models/downloads/ && cd /workspace/pre-trained-models/downloads/
wget http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8.tar.gz
tar -zxvf centernet_resnet50_v1_fpn_512x512_coco17_tpu-8.tar.gz -C /workspace/pre-trained-models/
mkdir -p /workspace/models/my_centernet_resnet50_v1_fpn
cp /workspace/pre-trained-models/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/pipeline.config /workspace/models/my_centernet_resnet50_v1_fpn/
My pipeline.config is shown below. Note that I am using use_bfloat16: true, as I believe the RTX 3090 supports this. I get the same error without this line.
# CenterNet meta-architecture from the "Objects as Points" [1] paper
# with the ResNet-v2-101 backbone. The ResNet backbone has a few differences
# as compared to the one mentioned in the paper, hence the performance is
# slightly worse. This config is TPU compatible.
# [1]: https://arxiv.org/abs/1904.07850
#
model {
  center_net {
    num_classes: 1
    feature_extractor {
      type: "resnet_v1_50_fpn"
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    object_detection_task {
      task_loss_weight: 1.0
      offset_loss_weight: 1.0
      scale_loss_weight: 0.1
      localization_loss {
        l1_localization_loss {
        }
      }
    }
    object_center_params {
      object_center_loss_weight: 1.0
      min_box_overlap_iou: 0.7
      max_box_predictions: 100
      classification_loss {
        penalty_reduced_logistic_focal_loss {
          alpha: 2.0
          beta: 4.0
        }
      }
    }
  }
}
train_config: {
  batch_size: 32
  num_steps: 250000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  optimizer {
    adam_optimizer: {
      epsilon: 1e-7 # Match tf.keras.optimizers.Adam's default.
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 1e-3
          total_steps: 250000
          warmup_learning_rate: 2.5e-4
          warmup_steps: 5000
        }
      }
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "/workspace/pre-trained-models/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  use_bfloat16: true
}
train_input_reader: {
  label_map_path: "/workspace/image-data/oli-fish/training_data/train.pbtxt"
  tf_record_input_reader {
    input_path: "/workspace/image-data/oli-fish/training_data/train.tfrecord"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}
eval_input_reader: {
  label_map_path: "/workspace/image-data/oli-fish/test_data/test.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/workspace/image-data/oli-fish/test_data/test.tfrecord"
  }
}
My training data consists of a single class.
I run the training with the following command:
python object_detection/model_main_tf2.py --model_dir=/workspace/models/my_centernet_resnet50_v1_fpn/ --pipeline_config_path=/workspace/models/my_centernet_resnet50_v1_fpn/pipeline.config
I get the following error:
/home/tensorflow/.local/lib/python3.6/site-packages/tensorflow_addons/utils/ensure_tf_install.py:67: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.3.0 and strictly below 2.5.0 (nightly versions are not supported).
The versions of TensorFlow you are currently using is 2.5.0 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
UserWarning,
WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
W0517 15:55:17.669740 140455981164352 mirrored_strategy.py:379] Collective ops is not configured at program startup. Some performance features may not be enabled.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0517 15:55:17.837340 140455981164352 mirrored_strategy.py:369] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0517 15:55:17.839460 140455981164352 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0517 15:55:17.839519 140455981164352 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.870279 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.871679 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.873107 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.873559 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.876967 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.878880 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.891481 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.891972 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.892784 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.893235 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py:546: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0517 15:55:19.319097 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py:546: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
I0517 15:55:19.320800 140455981164352 dataset_builder.py:163] Reading unweighted datasets: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
I0517 15:55:19.320896 140455981164352 dataset_builder.py:80] Reading record datasets for input file: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0517 15:55:19.320939 140455981164352 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0517 15:55:19.320975 140455981164352 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0517 15:55:19.322137 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0517 15:55:19.335709 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:206: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0517 15:55:24.661983 140455981164352 deprecation.py:336] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:206: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:464: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0517 15:55:26.951461 140455981164352 deprecation.py:336] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:464: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py:435: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
Traceback (most recent call last):
File "object_detection/model_main_tf2.py", line 113, in <module>
tf.compat.v1.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "object_detection/model_main_tf2.py", line 110, in main
record_summaries=FLAGS.record_summaries)
File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 597, in train_loop
train_input, unpad_groundtruth_tensors)
File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 395, in load_fine_tune_checkpoint
fine_tune_checkpoint_type=checkpoint_type)
File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/meta_architectures/center_net_meta_arch.py", line 4155, in restore_from_objects
supported_types))
ValueError: Checkpoint type "detection" not supported for CenterNetResnetV1FpnFeatureExtractor. Supported types are ['classification', 'fine_tune']
According to another question I asked here, setting fine_tune_checkpoint_type to detection should work, but it does not. The error says: Checkpoint type "detection" not supported for CenterNetResnetV1FpnFeatureExtractor. What am I doing wrong?
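The mismatch between the config and the error can be spotted before launching a training run. The following is only a rough sketch, not part of the Object Detection API: the function names are made up for illustration, the config is parsed with a plain regular expression rather than the protobuf parser, and the list of supported values is copied from the ValueError above, so it describes the checkout that raised the error and nothing more.

```python
import re
from typing import Optional, Sequence

# Copied from the ValueError in the traceback above. This describes the
# checkout that raised the error, not every version of the API.
SUPPORTED_BY_THIS_CHECKOUT = ("classification", "fine_tune")

# Matches a line such as:  fine_tune_checkpoint_type: "detection"
_TYPE_PATTERN = re.compile(
    r'^\s*fine_tune_checkpoint_type\s*:\s*"([^"]*)"', re.MULTILINE
)


def read_checkpoint_type(config_text: str) -> Optional[str]:
    """Return the fine_tune_checkpoint_type found in the config text, or None."""
    match = _TYPE_PATTERN.search(config_text)
    return match.group(1) if match else None


def unsupported_checkpoint_type(
    config_text: str,
    supported: Sequence[str] = SUPPORTED_BY_THIS_CHECKOUT,
) -> Optional[str]:
    """Return the configured type if it is not in `supported`, otherwise None."""
    value = read_checkpoint_type(config_text)
    if value is not None and value not in supported:
        return value
    return None


# Hypothetical, shortened config used only to exercise the helpers.
example = 'train_config: {\n  fine_tune_checkpoint_type: "detection"\n}\n'
print(unsupported_checkpoint_type(example))  # prints: detection
```

A non-None result only says that the installed code and the config disagree. It does not say which of the two should change.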
Recommended answer
OK, this appears to work now. I was using an older checkout of https://github.com/tensorflow/models.git, although I thought it was the latest (a git submodule issue). It appears they fixed this, or at least changed the related code, a few weeks ago.
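One way to catch a stale copy like this is to check which files Python actually imports. In the traceback above, model_lib_v2.py is loaded from /home/tensorflow/.local/lib/python3.6/site-packages/object_detection, that is, from an installed package rather than from a working tree. The sketch below uses only the standard library. The helper names and the checkout path are assumptions for illustration, so replace the path with wherever the repository actually lives.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_location(name: str) -> Optional[Path]:
    """Return the path a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    if spec.origin and spec.origin not in ("built-in", "frozen"):
        return Path(spec.origin).resolve()
    # Namespace packages have no origin; fall back to their first search path.
    locations = list(spec.submodule_search_locations or [])
    return Path(locations[0]).resolve() if locations else None


def is_under(path: Path, root: Path) -> bool:
    """Return True if `path` lies inside the directory `root`."""
    try:
        path.resolve().relative_to(root.resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # Hypothetical checkout location; not taken from the original post.
    checkout = Path("/path/to/tensorflow-models/research")
    location = module_location("object_detection")
    if location is None:
        print("object_detection is not importable in this environment")
    elif is_under(location, checkout):
        print("object_detection is imported from the checkout:", location)
    else:
        print("object_detection is imported from somewhere else:", location)
```

If the reported location is outside the checkout, updating the repository alone will not change the code that runs, and the installed package would need to be reinstalled from the updated tree.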