Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Enable CSE for VectorX.Create #50644

Merged
merged 4 commits into from
Apr 7, 2021
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Apr 2, 2021

A low hanging fruit a noticed while I was triaging #13811, but it'd be also nice to allow CSE for GT_LIST-type of args in future.

This PR enables CSE (and hoisting) for Vector128.Create (and other functions with variable amount of args where the actual amount of them is zero or one).

Example:

void Test()
{
    for (int i = 0; i < 100; i++)
    {
        var x = Sse2.MoveMask(Vector128.Create(42).AsByte());
        Console.WriteLine(x);
    }
}

codegen diff: https://www.diffchecker.com/LMZuC6QS

jit-diff (-f):

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 53048870
Total bytes of diff: 53048870
Total bytes of delta: 0 (0.00% of base)


0 total files with Code Size differences (0 improved, 0 regressed), 271 unchanged.

Top method regressions (bytes):
           8 ( 3.01% of base) : System.Private.CoreLib.dasm - HexConverter:EncodeToUtf16_Ssse3(ReadOnlySpan`1,Span`1,int)

Top method improvements (bytes):
          -4 (-1.41% of base) : System.Private.CoreLib.dasm - ASCIIUtility:GetIndexOfFirstNonAsciiChar_Default(long,long):long
          -4 (-1.40% of base) : System.Private.CoreLib.dasm - Latin1Utility:GetIndexOfFirstNonLatin1Char_Default(long,long):long

Top method regressions (percentages):
           8 ( 3.01% of base) : System.Private.CoreLib.dasm - HexConverter:EncodeToUtf16_Ssse3(ReadOnlySpan`1,Span`1,int)

Top method improvements (percentages):
          -4 (-1.41% of base) : System.Private.CoreLib.dasm - ASCIIUtility:GetIndexOfFirstNonAsciiChar_Default(long,long):long
          -4 (-1.40% of base) : System.Private.CoreLib.dasm - Latin1Utility:GetIndexOfFirstNonLatin1Char_Default(long,long):long

3 total methods with Code Size differences (2 improved, 1 regressed), 268544 unchanged.

The regression is actually a perf improvement: https://www.diffchecker.com/54f2vVgR

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 2, 2021
@EgorBo
Copy link
Member Author

EgorBo commented Apr 2, 2021

PTAL @dotnet/jit-contrib @tannergooding

if ((HWIntrinsicInfo::lookupNumArgs(hwIntrinsicNode->gtHWIntrinsicId) == -1) ||
const bool isVariableArgs = HWIntrinsicInfo::lookupNumArgs(hwIntrinsicNode->gtHWIntrinsicId) == -1;

// In case of functions with variable number of args we only support single-arg ones.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, why 0 or 1 and not 0, 1, or 2?

Copy link
Member

@tannergooding tannergooding Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable length args only impact Vector.Create calls for ARM, but also impact cases like Sse.ReciprocalScalar for x86, where the upper limit is 2 args (and therefore op1 will not be a GT_LIST, but NumArgs will be -1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I wanted to add support for args >= 2 cases later but since you noticed this case I've added it as part of this pr.

Repro:

static int Test()
{
    return Sse2.MoveMask(Sse2.Add(
                Vector128.Create(1, 1), 
                Vector128.Create(1, 1)).AsByte());
}

New codegen:

; Method Program:Test():int
G_M58954_IG01:
       vzeroupper 
						;; bbWeight=1    PerfScore 1.00

G_M58954_IG02:
       vmovupd  xmm0, xmmword ptr [reloc @RWD00]
       vpaddq   xmm0, xmm0, xmm0
       vpmovmskb eax, xmm0
						;; bbWeight=1    PerfScore 4.33

G_M58954_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00
RWD00  	dq	0000000000000001h, 0000000000000001h
; Total bytes of code: 20

@EgorBo EgorBo merged commit 89ea4c4 into dotnet:main Apr 7, 2021
thaystg added a commit to thaystg/runtime that referenced this pull request Apr 7, 2021
* upstream/main: (568 commits)
  [wasm] Set __DistroRid on Windows to browser-wasm (dotnet#50842)
  [wasm] Fix order of include paths, to have the obj dir first (dotnet#50303)
  [wasm] Fix debug build of AOT cross compiler (dotnet#50418)
  Fix outdated comment (dotnet#50834)
  [wasm][tests] Add properties to allow passing args to xharness (dotnet#50678)
  Vectorized common String.Split() paths (dotnet#38001)
  Fix binplacing symbol files. (dotnet#50819)
  Move type check to after the null ref branch in out marshalling of blittable classes. (dotnet#50735)
  Remove extraneous CMake version requirement. (dotnet#50805)
  [wasm] Remove unncessary condition for EMSDK (dotnet#50810)
  Add loop alignment stats to JitLogCsv (dotnet#50624)
  Resolve ILLink warnings in System.Diagnostics.DiagnosticSource (dotnet#50265)
  Avoid unnecessary closures/delegates in Process (dotnet#50496)
  Fix for field layout verification across version bubble boundary (dotnet#50364)
  JIT: Enable CSE for VectorX.Create (dotnet#50644)
  [main] Update dependencies from mono/linker (dotnet#50779)
  [mono] More domain cleanup (dotnet#50771)
  Race condition in Mock reference tracker runtime with GC. (dotnet#50804)
  Remove IAssemblyName (and various fusion remnants) (dotnet#50755)
  Disable failing test for GCStress. (dotnet#50828)
  ...
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Apr 15, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants