How does a Java runtime targeting pre-SSE2 processors implement basic floating-point operations?

Problem description

How does (or did) a Java runtime targeting an Intel processor without SSE2 deal with floating-point denormals when strictfp is set?

Even when the 387 FPU is set for 53-bit precision, it keeps an oversized exponent range that:

  1. forces underflow/overflow to be detected at each intermediate result, and
  2. makes it difficult to avoid double-rounding of denormals.
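Point 2 can be reproduced with a small Java model. The sketch below assumes that a 387 register under 53-bit precision control rounds a product to 53 significant bits with an effectively unbounded exponent, and that the later store to a 64-bit slot rounds a second time onto the double subnormal grid of 2^-1074. DoubleRoundingDemo and x87MulThenStore are made-up names, and the operands were picked by hand so that the exact product is (12.5 + 2^-50) × 2^-1074.

```java
import java.math.BigInteger;

// Hypothetical model of the plain x87 path (FMUL with 53-bit precision
// control, then FSTP to a 64-bit slot). Assumes finite, positive,
// non-overflowing inputs.
final class DoubleRoundingDemo {
    static final int DOUBLE_LSB_MIN = -1074; // Double.MIN_VALUE == 2^-1074

    // Returns m / 2^shift rounded to nearest, ties to even (shift > 0).
    static BigInteger roundHalfEven(BigInteger m, int shift) {
        BigInteger q = m.shiftRight(shift);
        BigInteger rem = m.subtract(q.shiftLeft(shift));
        int cmp = rem.compareTo(BigInteger.ONE.shiftLeft(shift - 1));
        if (cmp > 0 || (cmp == 0 && q.testBit(0))) {
            q = q.add(BigInteger.ONE);
        }
        return q;
    }

    static int biasedExponent(double d) {
        return (int) (Double.doubleToRawLongBits(d) >>> 52) & 0x7ff;
    }

    // Integer significand and exponent such that d == significand(d) * 2^exponent(d).
    static long significand(double d) {
        long frac = Double.doubleToRawLongBits(d) & ((1L << 52) - 1);
        return biasedExponent(d) == 0 ? frac : frac | (1L << 52);
    }

    static int exponent(double d) {
        int biased = biasedExponent(d);
        return biased == 0 ? DOUBLE_LSB_MIN : biased - 1075;
    }

    static double x87MulThenStore(double a, double b) {
        // Exact product: at most 106 significant bits.
        BigInteger m = BigInteger.valueOf(significand(a))
                .multiply(BigInteger.valueOf(significand(b)));
        int e = exponent(a) + exponent(b);
        // First rounding, in the register: 53 significant bits, exponent unbounded.
        int extra = m.bitLength() - 53;
        if (extra > 0) {
            m = roundHalfEven(m, extra);
            e += extra;
        }
        // Second rounding, on the store: snap onto the subnormal grid 2^-1074.
        if (e < DOUBLE_LSB_MIN) {
            m = roundHalfEven(m, DOUBLE_LSB_MIN - e);
            e = DOUBLE_LSB_MIN;
        }
        return Math.scalb(m.doubleValue(), e); // exact: m now fits the double grid
    }

    public static void main(String[] args) {
        double a = Math.scalb(3.0, -60);
        double b = Math.scalb((double) ((1L << 52) + ((1L << 49) + 1) / 3), -1064);
        // Java's own * rounds once (guaranteed under strictfp, and always since Java 17).
        System.out.println("single rounding: " + (a * b) / Double.MIN_VALUE + " x 2^-1074");
        System.out.println("double rounding: " + x87MulThenStore(a, b) / Double.MIN_VALUE + " x 2^-1074");
    }
}
```

Under those assumptions it prints 13.0 for Java's single rounding and 12.0 for the modelled two-step path: the first rounding lands exactly on the halfway point 12.5, and ties-to-even then goes down.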

Strategies include re-computing, with emulated floating-point, any operation whose result is a denormal, or a permanent exponent offset (along the lines of a technique for giving OCaml 63-bit floats that borrows a bit from the exponent) in order to avoid double-rounding.

In any case, I see no way to avoid at least one conditional branch for each floating-point computation, unless the operation can statically be determined not to underflow/overflow. How exceptional (overflow/underflow) cases are dealt with is part of my question, but this cannot be separated from the question of the representation (the permanent exponent offset strategy seems to mean that only overflows need to be checked for, for instance).

Solution

It looks to me, from a very trivial test case, like the JVM round-trips every double computation through memory to get the rounding it wants. It also seems to do something weird with a couple of magic constants. Here's what it did for me for a simple "compute 2^n naively" program:

0xb1e444b0: fld1
0xb1e444b2: jmp    0xb1e444dd         ;*iload
                                      ; - fptest::calc@9 (line 6)
0xb1e444b7: nop
0xb1e444b8: fldt   0xb523a2c8         ;   {external_word}
0xb1e444be: fmulp  %st,%st(1)
0xb1e444c0: fmull  0xb1e44490         ;   {section_word}
0xb1e444c6: fldt   0xb523a2bc         ;   {external_word}
0xb1e444cc: fmulp  %st,%st(1)
0xb1e444ce: fstpl  0x10(%esp)
0xb1e444d2: inc    %esi               ; OopMap{off=51}
                                      ;*goto
                                      ; - fptest::calc@22 (line 6)
0xb1e444d3: test   %eax,0xb3f8d100    ;   {poll}
0xb1e444d9: fldl   0x10(%esp)         ;*goto
                                      ; - fptest::calc@22 (line 6)
0xb1e444dd: cmp    %ecx,%esi
0xb1e444df: jl     0xb1e444b8         ;*if_icmpge
                                      ; - fptest::calc@12 (line 6)

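I believe 0xb523a2c8 and 0xb523a2bc are _fpu_subnormal_bias1 and _fpu_subnormal_bias2 from the HotSpot source code. _fpu_subnormal_bias1 looks to be 0x03ff8000000000000000, which as an 80-bit long double is 2^-15360, and _fpu_subnormal_bias2 looks to be 0x7bff8000000000000000, which is 2^15360. _fpu_subnormal_bias1 has the effect of scaling the smallest normal double to the smallest normal long double; if the FPU rounds to 53 bits, the "right thing" will happen: a product that would be subnormal as a double is also subnormal in the extended format, so it is rounded only once, at the right bit, and multiplying by _fpu_subnormal_bias2 afterwards undoes the scaling exactly.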
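The two constants can be checked mechanically. This hypothetical helper decodes them as 80-bit values (16 bits of sign and exponent with bias 16383, then a 64-bit significand with an explicit integer bit) and compares exponents. The second check assumes that, with 53-bit precision control, an extended subnormal keeps bits down to 2^(-16382 - 52), which is what would make the scaled double grid line up.

```java
// Hypothetical helper, not HotSpot code: decodes the two 80-bit constants
// and checks how the scaled double range lines up with the x87 extended format.
final class SubnormalBias {
    static final int EXT_BIAS = 16383;                  // x87 extended exponent bias
    static final int EXT_MIN_NORMAL_EXP = 1 - EXT_BIAS; // -16382

    // Returns k such that the 20-hex-digit constant equals 2^k; rejects anything else.
    static int powerOfTwoExponent(String hex80) {
        if (hex80.length() != 20) {
            throw new IllegalArgumentException("expected 20 hex digits: " + hex80);
        }
        int signAndExponent = Integer.parseInt(hex80.substring(0, 4), 16);
        long significand = Long.parseUnsignedLong(hex80.substring(4), 16);
        // A positive power of two has a clear sign bit and significand 1.000...0.
        if ((signAndExponent & 0x8000) != 0 || significand != 0x8000000000000000L) {
            throw new IllegalArgumentException("not a positive power of two: " + hex80);
        }
        return (signAndExponent & 0x7fff) - EXT_BIAS;
    }

    public static void main(String[] args) {
        int bias1 = powerOfTwoExponent("03ff8000000000000000");
        int bias2 = powerOfTwoExponent("7bff8000000000000000");
        System.out.println("bias1 = 2^" + bias1 + ", bias2 = 2^" + bias2);

        // The smallest normal double, 2^-1022, lands on the smallest normal long double.
        System.out.println(Double.MIN_EXPONENT + bias1 == EXT_MIN_NORMAL_EXP);
        // The last bit of a double subnormal (2^-1074), once scaled, sits where a
        // 53-bit significand ends inside an extended subnormal (assumed behaviour).
        System.out.println(-1074 + bias1 == EXT_MIN_NORMAL_EXP - 52);
        // bias2 undoes bias1 exactly.
        System.out.println(bias1 + bias2 == 0);
    }
}
```

It reports bias1 = 2^-15360 and bias2 = 2^15360, and each of the three checks prints true.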

I'd speculate that the seemingly pointless test instruction (annotated {poll}) is a safepoint poll: it is there so that the thread can be interrupted by marking that page unreadable when, for example, a GC is necessary.

Here's the Java code:

import java.io.*;
public strictfp class fptest {
 public static double calc(int k) {
  double a = 2.0;
  double b = 1.0;
  for (int i = 0; i < k; i++) {
   b *= a;
  }
  return b;
 }
 public static double intest() {
  double d = 0;
  for (int i = 0; i < 4100; i++) d += calc(i);
  return d;
 }
 public static void main(String[] args) throws Exception {
  for (int i = 0; i < 100; i++)
   System.out.println(intest());
 }
}

Digging further, the code for these operations is in plain sight in the OpenJDK code in hotspot/src/cpu/x86/vm/x86_32.ad. Relevant snippets:

instruct strictfp_mulD_reg(regDPR1 dst, regnotDPR1 src) %{
  predicate( UseSSE<=1 && Compile::current()->has_method() && Compile::current()->method()->is_strict() );
  match(Set dst (MulD dst src));
  ins_cost(1);   // Select this instruction for all strict FP double multiplies

  format %{ "FLD    StubRoutines::_fpu_subnormal_bias1\n\t"
            "DMULp  $dst,ST\n\t"
            "FLD    $src\n\t"
            "DMULp  $dst,ST\n\t"
            "FLD    StubRoutines::_fpu_subnormal_bias2\n\t"
            "DMULp  $dst,ST\n\t" %}
  opcode(0xDE, 0x1); /* DE C8+i or DE /1*/
  ins_encode( strictfp_bias1(dst),
              Push_Reg_D(src),
              OpcP, RegOpc(dst),
              strictfp_bias2(dst) );
  ins_pipe( fpu_reg_reg );
%}
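The sequence in strictfp_mulD_reg can be modelled the same way. This is a sketch under stated assumptions, not HotSpot code: Fp, round53 and biasedMul are invented names, the register is assumed to round to 53 bits with subnormals ending at 2^(-16382 - 52), and inputs are assumed finite and non-negative. Scaling dst by 2^-15360 first means a product that belongs in the double subnormal range is rounded once, directly on the scaled 2^-1074 grid; multiplying by 2^15360 and storing are then exact. The hand-picked operands give an exact product of (12.5 + 2^-50) × 2^-1074.

```java
import java.math.BigInteger;

// Hypothetical model of: FLD bias1; DMULp; FLD src; DMULp; FLD bias2; DMULp; store.
final class StrictfpMulModel {
    static final int X87_LSB_MIN = -16382 - 52; // assumed last-bit weight of PC=53 denormals
    static final int DOUBLE_LSB_MIN = -1074;    // Double.MIN_VALUE == 2^-1074
    static final int BIAS = 15360;              // bias1 = 2^-15360, bias2 = 2^15360

    // An exact value m * 2^e.
    record Fp(BigInteger m, int e) {
        static Fp of(double d) {
            long bits = Double.doubleToRawLongBits(d);
            int biased = (int) (bits >>> 52) & 0x7ff;
            long frac = bits & ((1L << 52) - 1);
            return biased == 0
                    ? new Fp(BigInteger.valueOf(frac), DOUBLE_LSB_MIN)
                    : new Fp(BigInteger.valueOf(frac | (1L << 52)), biased - 1075);
        }

        Fp times(Fp o) { return new Fp(m.multiply(o.m), e + o.e); }

        Fp scaledBy(int k) { return new Fp(m, e + k); } // exact multiply by 2^k

        double toDouble() { return Math.scalb(m.doubleValue(), e); }
    }

    // Rounds to 53 significant bits, ties to even, but never keeps a bit
    // smaller than 2^lsbMin (gradual underflow).
    static Fp round53(Fp x, int lsbMin) {
        if (x.m().signum() == 0) return x;
        int lsb = Math.max(x.e() + x.m().bitLength() - 53, lsbMin);
        int shift = lsb - x.e();
        if (shift <= 0) return x; // already representable on this grid
        BigInteger q = x.m().shiftRight(shift);
        BigInteger rem = x.m().subtract(q.shiftLeft(shift));
        int cmp = rem.compareTo(BigInteger.ONE.shiftLeft(shift - 1));
        if (cmp > 0 || (cmp == 0 && q.testBit(0))) q = q.add(BigInteger.ONE);
        return new Fp(q, lsb);
    }

    static double biasedMul(double dst, double src) {
        Fp reg = round53(Fp.of(dst).scaledBy(-BIAS), X87_LSB_MIN); // FLD bias1; DMULp
        reg = round53(reg.times(Fp.of(src)), X87_LSB_MIN);          // FLD src; DMULp
        reg = round53(reg.scaledBy(BIAS), X87_LSB_MIN);             // FLD bias2; DMULp
        return round53(reg, DOUBLE_LSB_MIN).toDouble();             // store to a double slot
    }

    public static void main(String[] args) {
        double a = Math.scalb(3.0, -60);
        double b = Math.scalb((double) ((1L << 52) + ((1L << 49) + 1) / 3), -1064);
        System.out.println(biasedMul(a, b) == a * b);
        System.out.println(biasedMul(Double.MIN_VALUE, 0.75) == Double.MIN_VALUE * 0.75);
        System.out.println(biasedMul(1.0 / 3, 0.1) == (1.0 / 3) * 0.1);
    }
}
```

Each line prints true; for the first pair the model returns 13 × 2^-1074, the correctly rounded product, because the only inexact rounding happens directly on the scaled subnormal grid.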

instruct strictfp_divD_reg(regDPR1 dst, regnotDPR1 src) %{
  predicate (UseSSE<=1);
  match(Set dst (DivD dst src));
  predicate( UseSSE<=1 && Compile::current()->has_method() && Compile::current()->method()->is_strict() );
  ins_cost(01);

  format %{ "FLD    StubRoutines::_fpu_subnormal_bias1\n\t"
            "DMULp  $dst,ST\n\t"
            "FLD    $src\n\t"
            "FDIVp  $dst,ST\n\t"
            "FLD    StubRoutines::_fpu_subnormal_bias2\n\t"
            "DMULp  $dst,ST\n\t" %}
  opcode(0xDE, 0x7); /* DE F8+i or DE /7*/
  ins_encode( strictfp_bias1(dst),
              Push_Reg_D(src),
              OpcP, RegOpc(dst),
              strictfp_bias2(dst) );
  ins_pipe( fpu_reg_reg );
%}

I see nothing for addition and subtraction, but I'd bet they just do an add/subtract with the FPU in 53-bit mode and then round-trip the result through memory. I'm a little curious whether there's a tricky overflow case that they get wrong, but I'm not curious enough to find out.
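A plausible reason no biasing is needed for addition and subtraction: every double is a multiple of 2^-1074, so if the exact sum or difference of two doubles falls below Double.MIN_NORMAL (2^-1022), it is a multiple of 2^-1074 smaller than 2^-1022 and therefore exactly representable; neither the 53-bit register rounding nor the store changes it. The Java sketch below is a randomized spot check of that property, not a proof, and it says nothing about the overflow case; SubnormalSumCheck and its methods are made-up names.

```java
import java.math.BigDecimal;
import java.util.Random;

// Hypothetical spot check: whenever a double addition lands in the
// subnormal range, the rounded result equals the exact sum.
final class SubnormalSumCheck {
    // True if the double sum x + y equals the exact (BigDecimal) sum.
    static boolean isExact(double x, double y) {
        BigDecimal exact = new BigDecimal(x).add(new BigDecimal(y));
        return exact.compareTo(new BigDecimal(x + y)) == 0;
    }

    // Returns how many random pairs produced a subnormal sum; throws if any was inexact.
    static int checkRandomPairs(long seed, int trials) {
        Random rnd = new Random(seed);
        int subnormalSums = 0;
        for (int i = 0; i < trials; i++) {
            // Nearby normal doubles of opposite sign, so x + y is often subnormal.
            double x = Math.scalb(1.0 + rnd.nextDouble(), -1021);
            double y = -Math.scalb(1.0 + rnd.nextDouble(), -1021);
            if (Math.abs(x + y) < Double.MIN_NORMAL) {
                if (!isExact(x, y)) {
                    throw new AssertionError(x + " + " + y + " was rounded");
                }
                subnormalSums++;
            }
        }
        return subnormalSums;
    }

    public static void main(String[] args) {
        System.out.println("subnormal sums checked: " + checkRandomPairs(42L, 100_000));
        // For contrast, a sum in the normal range can be inexact.
        System.out.println(isExact(1.0, 0x1p-60));
    }
}
```

The first line reports how many of the random pairs had a subnormal sum; the second prints false, since 1 + 2^-60 rounds back to 1.0.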

08-29 07:13