背景
在开发过程中,偶然发现了spinand驱动的一个bug,满怀欣喜地往社区提补丁。这是怎么样的一个bug呢?
static int spinand_mtd_read(struct mtd_info *mtd, loff_t from,
struct mtd_oob_ops *ops)
{
......
nanddev_io_for_each_page(nand, from, ops, &iter) {
......
ret = spinand_read_page(spinand, &iter.req, enable_ecc);
if (ret < 0 && ret != -EBADMSG) /* 读取数据出错 */
break;
if (ret == -EBADMSG) {
/* -EBADMSG 返回表示坏块 */
ecc_failed = true;
mtd->ecc_stats.failed++;
ret = 0;
} else {
/* 出现位翻转或者读取正常,则记录历史位翻转最大值 */
mtd->ecc_stats.corrected += ret;
max_bitflips = max_t(unsigned int, max_bitflips, ret);
}
ops->retlen += iter.req.datalen;
ops->oobretlen += iter.req.ooblen;
}
if (ecc_failed && !ret)
ret = -EBADMSG;
return ret ? ret : max_bitflips;
}
代码逻辑如下:
- 遍历读取每一个page
- 如果读出错则直接返回
- 如果出现坏块,则置位ecc_failed,在函数最后会检查此标志
- 如果出现位翻转,则暂存最大位翻转的bit位数量
- 全部读取完后,如果有置位ecc_failed,则返回坏块错误码;如果出现位翻转,则返回最大位翻转;否则返回0,表示正常
问题出在于,如果刚好最后一次读取出现位翻转,此时ret != 0
就直接退出循环,此时会导致坏块标识无效,且返回最后的位翻转量而非历史位翻转最大值。这是代码不严谨的地方。
第一次提交
修改补丁如下,补丁逻辑不再解释。
In function spinand_mtd_read, if the last page to read occurs bitflip,
this function will return error value because veriable ret not equal to 0.
Signed-off-by: liaoweixiong <[email protected]>
---
drivers/mtd/nand/spi/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 556bfdb..6b9388d 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -511,12 +511,12 @@ static int spinand_mtd_read(struct mtd_info *mtd, loff_t from,
if (ret == -EBADMSG) {
ecc_failed = true;
mtd->ecc_stats.failed++;
- ret = 0;
} else {
mtd->ecc_stats.corrected += ret;
max_bitflips = max_t(unsigned int, max_bitflips, ret);
}
+ ret = 0;
ops->retlen += iter.req.datalen;
ops->oobretlen += iter.req.ooblen;
}
21:13分发出的邮件,21:45分陆续收到两个回复:
<maintainer A>:
Actually, that's exactly what the MTD core expects (see [1]), so you're
the one introducing a regression here.
<maintainer B>:
To me it looks like the patch description is somewhat incorrect, but the
fix itself looks okay, unless I'm getting it wrong.
In case of the last page containing bitflips (ret > 0),
spinand_mtd_read() will return that number of bitflips for the last
page. But to me it looks like it should instead return max_bitflips like
it does when the last page read returns with 0.
以及隔天回复
<maintainer A>:
Oh, you're right. liaoweixiong, can you adjust the commit message
accordingly?
好吧,问题出在与我没把问题描述清楚,改改再提交
第二次提交
只改了comment和补丁标题:
Subject: [PATCH v2] mtd: spinand: read return badly if the last page has bitflips
In case of the last page containing bitflips (ret > 0),
spinand_mtd_read() will return that number of bitflips for the last
page. But to me it looks like it should instead return max_bitflips like
it does when the last page read returns with 0.
然后哗啦啦收到两个Reviewed-by
,附带一个建议:
Reviewed-by: <maintainer B>
This should probably be resent with the following tags:
Cc: [email protected]
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI
NANDs")
得,再提交一次吧
第三次提交
此时的我提交补丁到社区经验并不多,Maintainer让我resend,我就忐忑开始胡思乱想了:
版本号需要累加么?该怎么标记是重新发送?有两个maintainer已经"认可"了我的补丁(reviewed-by),我改怎么体现到新的邮件中?
仔细想想内容并没改,因此不需要累加版本号;查询前人提交,在邮件标题可以加上RESEND字样;搜索含RESEND字样的前人邮件,刚好找到一个在maintainer reviewed后resend为acked,写在signed-off-by区。
OK,确定下来就重新发吧
Subject: [RESEND PATCH v2] mtd: spinand: read return badly if the last page has bitflips
......
Signed-off-by: liaoweixiong <[email protected]>
Acked-by: <maintainer A>
Acked-by: <maintainer B>
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
很快,就挨批了...
第四次提交
晚上10点多,收到回复:
<maintainer B>
Why did you change our Reviewed-by tags to Acked-by tags?
额...我也是看别人这么做我才这么做的,大佬生气了!赶紧补救
......
Reviewed-by: <maintainer A>
Reviewed-by: <maintainer B>
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
第五次提交
埋下的坑终究是要踩的,很快,再次挨批了
<maintainer C>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
<maintainer A>
FYI, you should not send the patch to [email protected], but
instead, as I said in my other reply, add the tag "Cc:
[email protected]". See "Option 1" in the document Greg referred to.
小白赶紧狠补基础操作规范...
第六次提交
......
Reviewed-by: <maintainer A>
Reviewed-by: <maintainer B>
Cc: [email protected]
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
总结
哎,我只是挪了一行代码的位置而已啊,Maintainer严审下,我竟然提交了6次!6次!突然感觉心好累。
累归累,问题总结还是需要的
- 新手不具备提交代码的一些常识,包括a) 提交中各个tag的含义,在什么时候加这些tag,例如Reviewed-by和Acked-by的差别b) 提交补丁到stable的注意事项
- 对补丁的问题描述不够仔细清楚,导致无法理解,幸好帮我澄清了
解决方法:
- linux提交有规范文档的,抽时间撸一遍,并翻译发博客
- 在发补丁之前,让身边的人帮忙看一下补丁说明是否足够清晰
希望我的经历能帮助到正在或者准备向Linux内核开源社区的小伙伴