From c0239b58050a3769ad0565093fd9e69b3c9b909f Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Tue, 17 Dec 2024 16:37:02 -0700
Subject: [PATCH 01/11] Rename spec to reflect its focus on long vectors

Since we'll be creating a separate DXIL spec to document native vectors in DXIL, this spec will be a little more constrained to deal with HLSL long vectors. This commit is to isolate the meaningful content changes that come later.
---
 ...0026-hlsl-vector-type.md => 0026-hlsl-long-vector-type.md} | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
 rename proposals/{0026-hlsl-vector-type.md => 0026-hlsl-long-vector-type.md} (98%)

diff --git a/proposals/0026-hlsl-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
similarity index 98%
rename from proposals/0026-hlsl-vector-type.md
rename to proposals/0026-hlsl-long-vector-type.md
index 7f52585b..30a1c46e 100644
--- a/proposals/0026-hlsl-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -1,11 +1,11 @@
 <!-- {% raw %} -->
 
-* Proposal: [0026-HLSL-Vectors](0026-hlsl-vector-type.md)
+* Proposal: [0026-HLSL-Vectors](0026-hlsl-long-vector-type.md)
 * Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra)
 * Sponsor: [Damyan Pepper](https://github.com/damyanp)
 * Status: **Under Consideration**
 
-# HLSL Vectors
+# HLSL Long Vectors
 
 ## Introduction
 

From 343f6b3db134a0d03c8dcebc1594e7a10bc6664b Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Mon, 2 Dec 2024 12:49:42 -1000
Subject: [PATCH 02/11] initial rewording/formatting

Make md lint happy, eliminate Load/StoreN, revise some wording
---
 proposals/0026-hlsl-long-vector-type.md | 121 +++++++++++++++---------
 1 file changed, 75 insertions(+), 46 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index 30a1c46e..d82f40fd 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -1,65 +1,81 @@
 <!-- {% raw %} -->
 
-* Proposal: [0026-HLSL-Vectors](0026-hlsl-long-vector-type.md)
-* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra)
-* Sponsor: [Damyan Pepper](https://github.com/damyanp)
-* Status: **Under Consideration**
-
 # HLSL Long Vectors
 
+* Proposal: [0026-HLSL-Vectors](0026-hlsl-vector-type.md)
+* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra), [Greg Roth](https://github.com/pow2clk)
+* Sponsor: [Greg Roth](https://github.com/pow2clk)
+* Status: **Under Consideration**
+
 ## Introduction
 
-HLSL has supported vectors in a limited capacity (int3, float4, etc.), and these are scalarized in DXIL; small vectors while useful in a traditional graphics context do not scale well with the evolution on HLSL as a more general purpose language targetting Graphics and Compute. Notably, with the ubiquitous adoption of machine learning techniques which often get expressed as vector-matrix operations, there is a need for supporting larger vector sizes in HLSL and preserving these vector objects at the DXIL level to take advantage of specialized hardware that can accelerate vector operations.
+HLSL has supported vectors in a limited capacity (int3, float4, etc.).
+These are scalarized in DXIL.
+While they are useful in a traditional graphics context,
+ small vectors do not scale well with the evolution of HLSL as a more general purpose language targeting Graphics and Compute.
+Notably, the adoption of machine learning techniques expressed as vector-matrix operations require larger vector sizes to be representable in HLSL.
+To take advantage of specialized hardware that can accelerate vector operations,
+ these and other vector objects need to be preserved at the DXIL level.
 
 ## Proposed solution
 
-Enable vectors of longer length in HLSL and preserve the vector type in DXIL.
+Enable vectors of length greater than 4 in HLSL using existing template-based vector declarations.
+Preserve the vector type in DXIL.
 
 ## Detailed design
 
-### HLSL vectors `vector<T, N>`
+### HLSL vectors
 
-Currently HLSL allows `vector<T, N> name;` where `T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type and `N`, number of
-components, is a positive integer less than or equal to 4. See current definition [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector). 
-This proposal extends this support to longer vectors (beyond 4). 
+Currently HLSL allows declaring vectors using a templated representation:
+
+```hlsl
+vector<T, N> name;
+```
+
+`T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type.
+`N` is the number of components and must be an integer between 1 and 4 inclusive.
+See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details.
+This proposal adds support for vectors of length greater than 4.
 
 The default behavior of HLSL vectors is preserved for backward compatibility, meaning, skipping the last parameter `N`
 defaults to 4-component vectors and the use `vector name;` declares a 4-component float vector, etc. More examples
 [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector).
+Declarations of vectors longer than 4 require the use of the template declaration.
+Unlike vector sizes between 1 and 4, no shorthand declarations are provided.
 
 The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
-uniformity requirements, but implementations may specify best practices in certain uses for optimal performance. 
+uniformity requirements, but implementations may specify best practices in certain uses for optimal performance.
 
-**Restrictions on the uses of vectors with N > 4** 
+Restrictions on the uses of vectors with N > 4:
 
 * Vectors with length greater than 4 are not permitted inside a `struct`.
 * Vectors with length greater than 4 are not permitted as shader input/output parameters.
 
-**Constructing vectors**
+#### Constructing vectors
 
-HLSL vectors can be constructed through initializer lists and constructor syntax initializing or by assignment.
+HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment.
 
 Examples:
 
-``` 
-vector<uint, 5> vecA = {1, 2, 3, 4, 5}; 
+``` hlsl
+vector<uint, 5> vecA = {1, 2, 3, 4, 5};
 vector<uint, 6> vecB = vector<uint, 6>(6, 7, 8, 9, 0, 0);
 uint4 initval = {0, 0, 0, 0};
 vector<uint, 8> vecC = {uint2(coord.xy), vecB};
 vector<uint, 6> vecD = vecB;
 ```
 
-**Load and Store vectors from Buffers/Arrays**
+#### Load and Store vectors from Buffers/Arrays
 
-For loading and storing N-dimensional vectors from ByteAddressBuffers we use the `LoadN` and `StoreN` methods, extending
-the existing Load/Store, Load2/Store2, Load3/Store3 and Load4/Store4 methods.
+For loading and storing N-dimensional vectors from ByteAddressBuffers we use the templated load and store methods
+by providing a vector type of the required size as the template parameter.
 
-``` 
+```hlsl
 // Load/Store from [RW]ByteAddressBuffers
 RWByteAddressBuffer myBuffer;
 
-vector<uint, N> val = myBuffer.LoadN(uint StartOffsetInBytes); 
-myBuffer.StoreN<T>(uint StartoffsetInBytes, vector<T, N> stVec);
+vector<T, N> val = myBuffer.Load< vector<T, N> >(uint StartOffsetInBytes);
+myBuffer.Store< vector<T, N> >(uint StartoffsetInBytes, vector<T, N> stVec);
 
 // Load/Store from groupshared arrays
 groupshared T inputArray[512];
@@ -69,26 +85,33 @@ Load(vector<T,N> ldVec, groupshared inputArray, uint offsetInBytes);
 Store(vector<T,N> stVec, groupshared outputArray, uint offsetInBytes);
 ```
 
-**Operations on vectors** 
+#### Operations on vectors
 
-Support all HLSL intrinsics that are important as activation functions: fma, exp, log, tanh, atan, min, max, clamp, and
-step. Eventually support all HLSL operators and math intrinsics that are currently enabled for vectors.
+Support all HLSL intrinsics that are important as activation functions:
 
-Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions).
+* fma
+* exp
+* log
+* tanh
+* atan
+* min
+* max
+* clamp
+* step
 
-Note: Additionally any mathematical operations missing from the above list but needed as activation functions for neural
-network computations will be added.
+Eventually support all HLSL operators and math intrinsics that are currently enabled for vectors.
+
+Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions).
 
 ### Debug Support
-First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. Open Issue: Handle DXIL scalarized and vector paths. 
 
+First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. Open Issue: Handle DXIL scalarized and vector paths.
 
 ### Diagnostic Changes
 
 * Additional error messages for illegal or unsupported use of arbitrary length vectors.
 * Remove current bound checks (N <= 4) for vector size in supported cases, both HLSL and DXIL.
 
-
 ### Validation Changes
 
 * What additional validation failures does this introduce?
@@ -106,7 +129,6 @@ Open Issue: Can implementations support vector DXIL?
 
 ### Minimum Support Set
 
-
 ## Testing
 
 * How will correct codegen for DXIL/SPIRV be tested?
@@ -117,30 +139,37 @@ Open Issue: Can implementations support vector DXIL?
 * How will the execution results be tested?
 * A: *HLK tests*
 
-
 ## Alternatives considered
 
-Our original proposal introduced an opaque Cooperative Vector type to HLSL to limit the scope of the feature to small
-neural network evaluation and also contain the scope for testing. But aligning with the long term roadmap of HLSL to
-enable generic vectors, it makes sense to not introduce a new datatype but use HLSL vectors, even if the initial
-implementation only exposes partial functionality.
+The original proposal introduced an opaque type to HLSL that could represent longer vectors.
+This would have been used only for cooperative vector operations.
+This would have limited the scope of the feature to small neural network evaluation and also contain the scope for testing some.
+
+Representing vectors used in neural networks as LLVM vectors also allows leveraging existing optimizations.
+This direction also aligns with the long term roadmap of HLSL to enable generic vectors.
+Since the new data type would have required extensive testing as well,
+the testing burden saved may not have been substantial.
+Since these vectors are to be added eventually anyway, the testing serves multiple purposes.
+It makes sense to not introduce a new datatype but use HLSL vectors,
+even if the initial implementation only exposes partial functionality.
 
 ## Open Issues
+
 * Q: Is there a limit on the Number of Components in a vector?
-* A: Chose a number based on precedents set by other languages. Support atleast 128.
+  * A: Chose a number based on precedents set by other languages. Support atleast 128.
 * Q: Usage restrictions
-* A: *General vectors (N > 4) are not permitted inside structs.*
+  * A: General vectors (N > 4) are not permitted inside structs.
 * Q: Does this have implications for existing HLSL source code compatibility?
-* A: *No, existing HLSL code is unaffected by this change.*
-* A: *Change the default N = 4 for vectors? Will affect existing shaders.*
+  * A: No, existing HLSL code is unaffected by this change.
+* Q: Should this change the default N = 4 for vectors?
+  * A: No. While the default size of 4 is less intuitive in a world of larger vectors, existing code depends on this default, so it remains unchanged.
 * Q: How will SPIRV be supported?
-* A: 
-* Q: When do HLSL vectors remain as vectors and when do they get scalarized in DXIL?
-* A: 
+  * A: TBD
+* Q: Under what conditions do HLSL vectors remain as vectors and when do they get scalarized in DXIL?
+  * A: UNRESOLVED
 * Q: Can all implementations support vector DXIL?
-* A: Feature check?
+  * A: Feature check?
 
 ## Acknowledgments
 
-
 <!-- {% endraw %} -->
\ No newline at end of file

From 7ffe4d4820bbe587ba33a2d9f836f79282eb9e89 Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Tue, 17 Dec 2024 16:36:03 -0700
Subject: [PATCH 03/11] Rework hlsl-vector-type

This splits the spec into two. dxil-vectors concerns the addition of vectors to DXIL only.
hlsl-long-vector-type relates to the addition of long vectors in the HLSL language and also
for select DXIL intrinsics.

Throughout, this adds additional details concerning testing and support.
It makes a few alterations to the originally proposed behavior particularly concerning
the loading and storing of long vectors whether from/to raw buffers or groupshared variables.
The latter intrinsics were dropped entirely in favor of existing assignment operations being
lowered to appropriate operations.
Long vectors are allowed in structs and non-entry function signatures and disallowed
in shader signatures, cbuffers/tbuffers, and as elements of non-raw buffers.

Note that the use of 6.9 is a placeholder for the release vehicle for this feature.
---
 proposals/0026-hlsl-long-vector-type.md | 217 +++++++++++++++++-------
 proposals/NNNN-dxil-vectors.md          | 183 ++++++++++++++++++++
 2 files changed, 343 insertions(+), 57 deletions(-)
 create mode 100644 proposals/NNNN-dxil-vectors.md

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index d82f40fd..7c7d70c9 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -9,18 +9,27 @@
 
 ## Introduction
 
-HLSL has supported vectors in a limited capacity (int3, float4, etc.).
-These are scalarized in DXIL.
-While they are useful in a traditional graphics context,
- small vectors do not scale well with the evolution of HLSL as a more general purpose language targeting Graphics and Compute.
-Notably, the adoption of machine learning techniques expressed as vector-matrix operations require larger vector sizes to be representable in HLSL.
+HLSL has always supported vectors of as many as four elements of different element types (int3, float4, etc.).
+These are useful in a traditional graphics context for representation and manipulation of
+ geometry and color information.
+The evolution of HLSL as a more general purpose language targeting Graphics and Compute
+ greatly benefits from longer vectors to fully represent these operations rather than to try to
+ break them down into smaller constituent vectors.
+This feature adds the ability to declare and use native HLSL vectors longer than four elements.
+
+## Motivation
+
+The adoption of machine learning techniques expressed as vector-matrix operations
+ requires larger vector sizes to be representable in HLSL.
 To take advantage of specialized hardware that can accelerate vector operations,
  these and other vector objects need to be preserved at the DXIL level.
 
 ## Proposed solution
 
-Enable vectors of length greater than 4 in HLSL using existing template-based vector declarations.
-Preserve the vector type in DXIL.
+Enable vectors of length between 5 and 128 inclusive in HLSL using existing template-based vector declarations.
+Such vectors will hereafter be referred to as "long vectors".
+These will be supported for all elementwise intrinsics that take variable-length vector parameters.
+For certain operations, these vectors will be represented as native vectors using [DXIL vectors](NNNN-dxil-vectors.md).
 
 ## Detailed design
 
@@ -35,21 +44,24 @@ vector<T, N> name;
 `T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type.
 `N` is the number of components and must be an integer between 1 and 4 inclusive.
 See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details.
-This proposal adds support for vectors of length greater than 4.
+This proposal adds support for long vectors of length greater than 4 by
+ allowing `N` to be greater than 4 where previously such a declaration would produce an error.
 
 The default behavior of HLSL vectors is preserved for backward compatibility, meaning, skipping the last parameter `N`
 defaults to 4-component vectors and the use `vector name;` declares a 4-component float vector, etc. More examples
 [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector).
-Declarations of vectors longer than 4 require the use of the template declaration.
-Unlike vector sizes between 1 and 4, no shorthand declarations are provided.
+Declarations of long vectors require the use of the template declaration.
+Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate
+ the element type and number of elements (e.g. float2, double4) are allowed for long vectors.
 
 The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
 uniformity requirements, but implementations may specify best practices in certain uses for optimal performance.
 
-Restrictions on the uses of vectors with N > 4:
+Long vectors are not permitted in:
 
-* Vectors with length greater than 4 are not permitted inside a `struct`.
-* Vectors with length greater than 4 are not permitted as shader input/output parameters.
+* Resource types other than ByteAddressBuffer or StructuredBuffer.
+* Any element of the shader's signature including entry function parameters and return types.
+* Cbuffers or tbuffers.
 
 #### Constructing vectors
 
@@ -65,29 +77,62 @@ vector<uint, 8> vecC = {uint2(coord.xy), vecB};
 vector<uint, 6> vecD = vecB;
 ```
 
-#### Load and Store vectors from Buffers/Arrays
+#### Vectors in Raw Buffers
 
-For loading and storing N-dimensional vectors from ByteAddressBuffers we use the templated load and store methods
-by providing a vector type of the required size as the template parameter.
+N-element vectors are loaded and stored from ByteAddressBuffers using the templated load and store methods
+with a vector type of the required size as the template parameter and byte offset parameters.
 
 ```hlsl
-// Load/Store from [RW]ByteAddressBuffers
 RWByteAddressBuffer myBuffer;
 
-vector<T, N> val = myBuffer.Load< vector<T, N> >(uint StartOffsetInBytes);
-myBuffer.Store< vector<T, N> >(uint StartoffsetInBytes, vector<T, N> stVec);
+vector<T, N> val = myBuffer.Load< vector<T, N> >(StartOffsetInBytes);
+myBuffer.Store< vector<T, N> >(StartOffsetInBytes + 100, val);
 
-// Load/Store from groupshared arrays
-groupshared T inputArray[512];
-groupshared T outputArray[512];
+```
+
+StructuredBuffers with N-element vectors are declared using the template syntax
+ with a long vector type as the template parameter.
+N-element vectors are loaded and stored from StructuredBuffers using the load and store methods
+with the element index parameters.
+
+```hlsl
+RWStructuredBuffer< vector<T, N> > myBuffer;
+
+vector<T, N> val = myBuffer.Load(elementIndex);
+myBuffer.Store(elementIndex, val);
 
-Load(vector<T,N> ldVec, groupshared inputArray, uint offsetInBytes);
-Store(vector<T,N> stVec, groupshared outputArray, uint offsetInBytes);
 ```
 
-#### Operations on vectors
+#### Accessing elements of long vectors
+
+Long vectors support the existing vector subscript operators to return the scalar element values.
+Long vectors do not support swizzle operations, since swizzle members can address only the first four elements.
 
-Support all HLSL intrinsics that are important as activation functions:
+#### Operations on long vectors
+
+Support all HLSL intrinsics that perform [elementwise calculations](NNNN-dxil-vectors.md#elementwise-intrinsics)
+ that take parameters that could be long vectors and whose function doesn't limit them to shorter vectors.
+These are operations that perform the same operation on an element regardless of its position in the vector
+ except that the position indicates which element(s) of other vector parameters might be used in that calculation.
+
+Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions).
+
+#### Allowed elementwise vector intrinsics
+
+* Trigonometry: acos, asin, atan, atan2, cos, cosh, degrees, radians, sin, sinh, tan, tanh
+* Math: abs, ceil, clamp, exp, exp2, floor, fma, fmod, frac, frexp, ldexp, lerp, log, log10, log2, mad, max, min, pow, rcp, round, rsqrt, sign, smoothstep, sqrt, step, trunc
+* Float Ops: f16tof32, f32tof16, isfinite, isinf, isnan, modf, saturate
+* Bitwise Ops: reversebits, countbits, firstbithigh, firstbitlow
+* Logic Ops: and, or, select
+* Reductions: all, any, clamp, dot
+* Quad Ops: ddx, ddx_coarse, ddx_fine, ddy, ddy_coarse, ddy_fine, fwidth, QuadReadLaneAt, QuadReadLaneAcrossX, QuadReadLaneAcrossY, QuadReadLaneAcrossDiagonal
+* Wave Ops: WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveProduct, WaveActiveSum, WaveActiveMin, WaveActiveMax, WaveMultiPrefixBitAnd, WaveMultiPrefixBitOr, WaveMultiPrefixBitXor, WaveMultiPrefixProduct, WaveMultiPrefixSum, WavePrefixSum, WavePrefixProduct, WaveReadLaneAt, WaveReadLaneFirst
+* Wave Reductions: WaveActiveAllEqual, WaveMatch
+
+#### Native vector intrinsics
+
+Of the above list, the following will produce the appropriate unary, binary, or ternary
+ DXIL intrinsics that take native vector parameters:
 
 * fma
 * exp
@@ -99,9 +144,10 @@ Support all HLSL intrinsics that are important as activation functions:
 * clamp
 * step
 
-Eventually support all HLSL operators and math intrinsics that are currently enabled for vectors.
+#### Disallowed vector intrinsics
 
-Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions).
+* Only applicable to shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
+* Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex
 
 ### Debug Support
 
@@ -109,40 +155,92 @@ First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.db
 
 ### Diagnostic Changes
 
-* Additional error messages for illegal or unsupported use of arbitrary length vectors.
-* Remove current bound checks (N <= 4) for vector size in supported cases, both HLSL and DXIL.
+Error messages should be produced for use of long vectors in unsupported interfaces:
+
+* The shader signature.
+* A cbuffer/tbuffer.
+* A work graph record.
+* A mesh or ray tracing payload.
+
+Errors should also be produced when long vectors are used as parameters to intrinsics that accept
+ variable-length vector parameters but are listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
+Attempting to use any swizzle member-style accessors on long vectors should produce an error.
+Declaring vectors of length longer than 128 should produce an error.
 
 ### Validation Changes
 
-* What additional validation failures does this introduce?
-*Illegal uses of vectors should produce errors*
-* What existing validation failures does this remove?
-*Allow legal uses of vectors with number of components greater than 4*
+Validation should produce errors when a long vector is found in:
 
-## D3D12 API Additions
+* The shader signature.
+* A cbuffer/tbuffer.
+* A work graph record.
+* A mesh or ray tracing payload.
 
-TODO: Possible checks for DXIL vector support and tiered support.
+Use of long vectors in unsupported intrinsics should produce validation errors.
 
-## Check Feature Support
+## Runtime Additions
 
-Open Issue: Can implementations support vector DXIL?
+Support for long vectors requires DXIL vector support as defined in [the specification](NNNN-dxil-vectors.md).
 
-### Minimum Support Set
+Use of long vectors in a shader should be indicated in DXIL with the corresponding
+ shader model version and shader feature flag.
 
 ## Testing
 
-* How will correct codegen for DXIL/SPIRV be tested?
-* How will the diagnostics be tested?
-* How will validation errors be tested?
-* How will validation of new DXIL elements be tested?
-* A: *unit tests in dxc*
-* How will the execution results be tested?
-* A: *HLK tests*
+### Compilation Testing
+
+#### Correct output testing
+
+Verify that long vectors can be declared in all appropriate contexts:
+
+* local variables
+* non-entry parameters
+* non-entry return types
+* StructuredBuffer elements
+* Templated Load/Store methods on ByteAddressBuffers
+* As members of arrays and structs in any of the above contexts
+
+Verify that long vectors in supported intrinsics produce appropriate outputs.
+For the intrinsic functions listed in [Native vector intrinsics](#native-vector-intrinsics),
+ the generated DXIL intrinsic calls will have long vector parameters.
+For other elementwise vector intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics),
+ the generated DXIL should scalarize the parameters and produce scalar calls to the corresponding DXIL intrinsics.
+
+Verify that long vector elements can be accessed using the subscript operation.
+
+#### Invalid usage testing
+
+Verify that compilation errors are produced for long vectors used in:
+
+* Entry function parameters
+* Entry function returns
+* Typed buffer declarations
+* Cbuffer blocks
+* Cbuffer global variables
+* Work graph records
+* Mesh and ray tracing payloads
+* Any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics)
+* All swizzle operations (e.g. `lvec.x`, `lvec.rg`, `lvec.wzyx`)
+
+### Validation Testing
+
+Verify that Validation produces errors for any DXIL intrinsic that corresponds to the
+ HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
+Verify that Validation produces errors for any DXIL intrinsic with native vector parameters
+ that corresponds to the [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
+ and is not listed in [native vector intrinsics](#native-vector-intrinsics).
+
+### Execution Testing
+
+Correct behavior for all of the intrinsics listed in [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
+ will be verified with execution tests that perform the operations on long vectors and confirm correct results
+ for the given test values.
+Where possible, these tests will be variations on existing tests for these intrinsics.
 
 ## Alternatives considered
 
 The original proposal introduced an opaque type to HLSL that could represent longer vectors.
-This would have been used only for cooperative vector operations.
+This would have been used only for native vector operations.
 This would have limited the scope of the feature to small neural network evaluation and also contain the scope for testing some.
 
 Representing vectors used in neural networks as LLVM vectors also allows leveraging existing optimizations.
@@ -156,20 +254,25 @@ even if the initial implementation only exposes partial functionality.
 ## Open Issues
 
 * Q: Is there a limit on the Number of Components in a vector?
-  * A: Chose a number based on precedents set by other languages. Support atleast 128.
+  * A: 128. It's big enough for some known uses.
+       There aren't concrete reasons to restrict the vector length.
+       Having a limit facilitates testing and sets expectations for both hardware and software developers.
+
 * Q: Usage restrictions
-  * A: General vectors (N > 4) are not permitted inside structs.
+  * A: Long vectors may not form part of the shader signature.
+       There are many restrictions on signature elements including bit fields that determine if they are fully written.
+       By definition, these involve more interfaces that would require additional changes and testing.
 * Q: Does this have implications for existing HLSL source code compatibility?
-  * A: No, existing HLSL code is unaffected by this change.
+  * A: Existing HLSL code that makes no use of long vectors will have no semantic changes.
 * Q: Should this change the default N = 4 for vectors?
   * A: No. While the default size of 4 is less intuitive in a world of larger vectors, existing code depends on this default, so it remains unchanged.
-* Q: How will SPIRV be supported?
+* Q: How will SPIR-V be supported?
   * A: TBD
-* Q: Under what conditions do HLSL vectors remain as vectors and when do they get scalarized in DXIL?
-  * A: UNRESOLVED
-* Q: Can all implementations support vector DXIL?
-  * A: Feature check?
-
-## Acknowledgments
+* Q: Should swizzle accessors be allowed for long vectors?
+  * A: No. It doesn't make sense since they can't be used to access all elements
+       and there's no way to create enough swizzle members to accommodate the longest allowed vector.
+* Q: How should scalar groupshared arrays be loaded/stored into/out of long vectors?
+  * A: After some consideration, we opted not to include explicit Load/Store operations for this function.
+       There are at least a couple of ways this could be resolved, and the preferred solution is outside the scope of this proposal.
 
 <!-- {% endraw %} -->
\ No newline at end of file
diff --git a/proposals/NNNN-dxil-vectors.md b/proposals/NNNN-dxil-vectors.md
new file mode 100644
index 00000000..4edae3a7
--- /dev/null
+++ b/proposals/NNNN-dxil-vectors.md
@@ -0,0 +1,183 @@
+<!-- {% raw %} -->
+
+# DXIL Vectors
+
+---
+
+* Proposal: [NNNN](NNNN-dxil-vectors.md)
+* Author(s): [Greg Roth](https://github.com/pow2clk)
+* Sponsor: [Greg Roth](https://github.com/pow2clk)
+* Status: **Under Consideration**
+* Planned Version: Shader Model 6.9
+
+## Introduction
+
+While DXIL is intended and able to support language vectors,
+ those vectors must be broken up into individual scalars to be valid DXIL.
+This feature introduces the ability to represent native vectors in DXIL for some uses.
+
+## Motivation
+
+While the original shape of the vectors may be reconstructed from their scalarized form,
+ doing so requires additional work from the DXIL consumer and results in larger DXIL binary sizes.
+Although it has never been allowed in DXIL, the LLVM IR that DXIL is based on can represent native vectors.
+By allowing these native vector types in DXIL, the size of generated DXIL can be reduced and
+ new opportunities for expanding vector capabilities in DXIL are introduced.
+
+## Proposed solution
+
+Native vectors are allowed in DXIL version 1.9 or greater.
+These can be stored in allocas, static globals, and groupshared variables.
+They can be loaded from or stored to raw buffers and used as arguments to a selection
+ of element-wise intrinsic functions as well as the standard math operators.
+They cannot be used in shader signatures, constant buffers, typed buffers, or texture types.
+
+## Detailed design
+
+### Vectors in memory representations
+
+In their alloca and variable representations, vectors in DXIL will always be represented as vectors.
+Previously individual vectors would get scalarized into scalar arrays and arrays of vectors would be flattened
+ into a one-dimensional scalar array with indexing to reflect the original intents.
+Individual vectors will now be represented as a single native vector and arrays of vectors will remain
+ as arrays of native vectors, though multi-dimensional arrays will still be flattened to one dimension.
+
+Scalarization of these vectors will continue to be done for uses that don't support native vectors,
+ but it will be done using extractelement instructions from the native vectors
+ instead of loads from the scalarized array representation.
+
+Single-element vectors are not valid in DXIL.
+At the language level, they may be supported for corresponding intrinsic overloads,
+  but such vectors should be represented as scalars in the final DXIL output.
+
+Although matrices are represented as vectors in some contexts such as unlinked library shaders,
+ their final DXIL representation will continue to be as arrays of scalars.
+This is consistent with both their past and future intended representation.
+
+### Changes to DXIL Intrinsics
+
+A new form of rawBufferLoad allows loading of full vectors instead of four scalars.
+The status integer for tiled resource access is loaded just as before.
+The returned vector value and the status indicator are grouped into a new `ResRet` helper structure type
+ that the load intrinsic returns.
+
+```asm
+  declare %dx.types.ResRet.v[NUM][TY] @dx.op.rawBufferLoad.v[NUM][TY](
+      i32,                  ; opcode
+      %dx.types.Handle,     ; resource handle
+      i32,                  ; coordinate c0 (index)
+      i32,                  ; coordinate c1 (elementOffset)
+      i8,                   ; mask
+      i32                   ; alignment
+  )
+```
+
+The return struct contains a single vector and a single integer representing mapped tile status.
+
+```asm
+  %dx.types.ResRet.v[NUM][TY] = type { vector<TYPE, NUM>, i32 }
+```
+
+Here and hereafter, `NUM` is the number of elements in the loaded vector, `TYPE` is the element type name,
+ and `TY` is the corresponding abbreviated type name (e.g. `i64`, `f32`).
+
+#### Elementwise intrinsics
+
+A selection of elementwise intrinsics are given additional native vector forms.
+Elementwise intrinsics are those that perform their calculations irrespective of the location of the element
+ in the vector or matrix arguments except insofar as that position corresponds to those of the other elements
+ that might be used in the individual element calculations.
+An elementwise intrinsic `foo` that takes scalar or vector arguments could theoretically implement its vector version using a simple loop and the scalar intrinsic variant.
+
+```c++
+vec<TYPE, NUM> foo(vec<TYPE, NUM> a, vec<TYPE, NUM> b) {
+  vec<TYPE, NUM> ret;
+  for (int i = 0; i < NUM; i++)
+    ret[i] = foo(a[i], b[i]);
+}
+```
+  
+For example, `fma` is an elementwise intrinsic because it multiplies and adds corresponding elements of its argument vectors,
+ but `cross` is not because it performs an operation on the vectors as units,
+ pulling elements from different locations as the operation requires.
+
+The elementwise intrinsics that have native vector variants represent the
+ unary, binary, and tertiary generic operations:
+
+```asm
+ <[NUM] x [TYPE]> @dx.op.unary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1)
+ <[NUM] x [TYPE]> @dx.op.binary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1, <[NUM] x [TYPE]> operand2)
+ <[NUM] x [TYPE]> @dx.op.tertiary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1, <[NUM] x [TYPE]> operand2, <[NUM] x [TYPE]> operand3)
+```
+
+The only opcodes allowed with vector variants are:
+
+* Unary
+  * Exp
+  * Htan
+  * Atan
+  * Log
+* Binary
+  * FMin
+  * FMax
+* Tertiary
+  * Fma
+
+Unsupported DXIL intrinsics will continue to operate on scalarized representations even if those scalars
+ are extracted from native vectors.
+
+### Potential Changes to DXIL Consumers
+
+As this removes no existing DXIL features, the former representation of vectors is still valid.
+However, DXIL consumers may expect native vectors where they are supported and may misinterpret
+ vectors scalarized into arrays as being native arrays.
+This is unlikely to produce any faulty results, but may miss some optimizations.
+
+As DXIL with native vectors might be linked to create a DXIL shader without that support,
+ some additional scalarization might be necessary when linking in such cases.
+
+This feature involves no changes to previous shader models and any DXIL produced for earlier versions
+  should continue to behave exactly as before.
+
+#### Validation Changes
+
+Validation errors for use of native vectors in DXIL are removed.
+Any errors for using vectors in unsupported intrinsics or operations are maintained,
+ but made more specific to the operations or locations that don't allow native vector types.
+More specific errors will be generated for usage of native vectors in any unsupported intrinsics.
+New errors will be generated for any use of native vectors in shader signatures or cbuffer locations.
+
+A validation error should be produced for any representation of a single element vector.
+Such vectors should be represented as scalars.
+
+### Runtime Additions
+
+#### Runtime information
+
+When native vectors are present, a DXIL unit will signal a dependency on Shader Model 6.9.
+
+#### Device Capability
+
+Devices that support Shader Model 6.9 will be required to support native vectors in rawbuffer resources,
+ allocas, and groupshared memory.
+These native vectors must be supported for the above indicated DXIL intrinsics.
+
+## Testing
+
+A compiler targeting shader model 6.9 should be able to represent vectors in the supported memory spaces
+ in their native form and generate native calls for supported intrinsics
+ and scalarized versions for unsupported intrinsics.
+
+The DXIL 6.9 validator should allow native vectors in the supported memory and intrinsic uses.
+It should produce errors for uses in signatures, cbuffers, and type buffers and any uses in unsupported intrinsics.
+Any representation of a single element vector should produce a validation error.
+These shouldn't be directly producible with a compatible compiler and will require custom DXIL generation.
+
+Full runtime execution should be tested by using the native vector intrinsics on different types of memory
+ and confirming that the calculations produce the correct results in all cases for an assortment of vector sizes.
+
+## Acknowledgments
+
+* [Anupama Chandrasekhar](https://github.com/anupamachandra) and [Tex Riddell](https://github.com/tex3d) for foundational contributions to the design.
+
+<!-- {% endraw %} -->

From edcb7e1d96e3895318e7ca423703d19827d331e4 Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Thu, 9 Jan 2025 13:44:11 -0700
Subject: [PATCH 04/11] respond to feedback and add a few other details

---
 proposals/0026-hlsl-long-vector-type.md | 17 ++++++++++++++---
 proposals/NNNN-dxil-vectors.md          | 14 +++++++++++---
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index 7c7d70c9..aab8ff4b 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -9,13 +9,13 @@
 
 ## Introduction
 
-HLSL has always supported vectors of as many as four elements of different element types (int3, float4, etc.).
+HLSL has previously supported vectors of as many as four elements (int3, float4, etc.).
 These are useful in a traditional graphics context for representation and manipulation of
  geometry and color information.
 The evolution of HLSL as a more general purpose language targeting Graphics and Compute
  greatly benefit from longer vectors to fully represent these operations rather than to try to
  break them down into smaller constituent vectors.
-This feature adds the ability to declare and use native HLSL vectors longer than four elements.
+This feature adds the ability to load, store, and perform select operations on HLSL vectors longer than four elements.
 
 ## Motivation
 
@@ -57,6 +57,13 @@ Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate
 The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
 uniformity requirements, but implementations may specify best practices in certain uses for optimal performance.
 
+Long vectors can be:
+
+* Elements of arrays, structs, StructuredBuffers, and ByteAddressBuffers.
+* Parameters and return types of non-etry functions.
+* Stored in groupshared memory.
+* Static global varaibles.
+
 Long vectors are not permitted in:
 
 * Resource types other than ByteAddressBuffer or StructuredBuffer.
@@ -200,14 +207,18 @@ Verify that long vectors can be declared in all appropriate contexts:
 * Templated Load/Store methods on ByteAddressBuffers
 * As members of arrays and structs in any of the above contexts
 
+Verify that long vectors can be correctly initialized in all the forms listed in [Constructing vectors](constructing-vectors).
+
 Verify that long vectors in supported intrinsics produce appropriate outputs.
 For the intrinsic functions listed in [Native vector intrinsics](#native-vector-intrinsics),
  the generated DXIL intrinsic calls will have long vector parameters.
 For other elementwise vector intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics),
  the generated DXIL should scalarize the parameters and produce scalar calls to the corresponding DXIL intrinsics.
-
 Verify that long vector elements can be accessed using the subscript operation.
 
+Verify that long vectors of different sizes will reference different overloads of user and built-in functions.
+Verify that template instantiation using long vectors correctly creates variants for the right sizes.
+
 #### Invalid usage testing
 
 Verify that compilation errors are produced for long vectors used in:
diff --git a/proposals/NNNN-dxil-vectors.md b/proposals/NNNN-dxil-vectors.md
index 4edae3a7..d467a728 100644
--- a/proposals/NNNN-dxil-vectors.md
+++ b/proposals/NNNN-dxil-vectors.md
@@ -48,7 +48,11 @@ Scalarization of these vectors will continue to be done for uses that don't supp
 
 Single-element vectors are not valid in DXIL.
 At the language level, they may be supported for corresponding intrinsic overloads,
-  but such vectors should be represented as scalars in the final DXIL output.
+ but such vectors should be represented as scalars in the final DXIL output.
+Since they only contain a single scalar, single-element vectors are
+ informationally equivalent to actual scalars.
+Rather than include conversions to and from scalars and single-element vectors,
+ it is cleaner and functionally equivalent to represent these as scalars in DXIL.
 
 Although matrices are represented as vectors in some contexts such as unlinked library shaders,
  their final DXIL representation will continue to be as arrays of scalars.
@@ -62,6 +66,8 @@ The returned vector value and the status indicator are grouped into a new `ResRe
  that the load intrinsic returns.
 
 ```asm
+  ; overloads: SM6.9: f16|f32|i16|i32
+  ; returns: vector, status
   declare %dx.types.ResRet.v[NUM][TY] @dx.op.rawBufferLoad.v[NUM][TY](
       i32,                  ; opcode
       %dx.types.Handle,     ; resource handle
@@ -90,8 +96,8 @@ Elementwise intrinsics are those that perform their calculations irrespective of
 An elementwise intrinsic `foo` that takes scalar or vector arguments could theoretically implement its vector version using a simple loop and the scalar intrinsic variant.
 
 ```c++
-vec<TYPE, NUM> foo(vec<TYPE, NUM> a, vec<TYPE, NUM> b) {
-  vec<TYPE, NUM> ret;
+vector<TYPE, NUM> foo(vector<TYPE, NUM> a, vector<TYPE, NUM> b) {
+  vector<TYPE, NUM> ret;
   for (int i = 0; i < NUM; i++)
     ret[i] = foo(a[i], b[i]);
 }
@@ -168,6 +174,8 @@ A compiler targeting shader model 6.9 should be able to represent vectors in the
  in their native form and generate native calls for supported intrinsics
  and scalarized versions for unsupported intrinsics.
 
+Verify that supported intrinsics and operations will retain vector types.
+
 The DXIL 6.9 validator should allow native vectors in the supported memory and intrinsic uses.
 It should produce errors for uses in signatures, cbuffers, and type buffers and any uses in unsupported intrinsics.
 Any representation of a single element vector should produce a validation error.

From e7b1442b30a0b1295e5ce078b938067aaa4ce247 Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Thu, 9 Jan 2025 14:02:50 -0700
Subject: [PATCH 05/11] fix two simple typos

---
 proposals/0026-hlsl-long-vector-type.md | 2 +-
 proposals/NNNN-dxil-vectors.md          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index aab8ff4b..7deb8eb9 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -153,7 +153,7 @@ Of the above list, the following will produce the appropriate unary, binary, or
 
 #### Disallowed vector intrinsics
 
-* Only applicable to for shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
+* Only applicable to shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
 * Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex
 
 ### Debug Support
diff --git a/proposals/NNNN-dxil-vectors.md b/proposals/NNNN-dxil-vectors.md
index d467a728..35b7d157 100644
--- a/proposals/NNNN-dxil-vectors.md
+++ b/proposals/NNNN-dxil-vectors.md
@@ -22,7 +22,7 @@ While the original shape of the vectors may be reconstructed from their scalariz
  it requires additional work of the DXIL consumer and results in larger DXIL binary sizes.
 Although it has never been allowed in DXIL, the LLVM IR that DXIL is based on can represent native vectors.
 By allowing these native vector types in DXIL, the size of generated DXIL can be reduced and
- new opportunities for expanding vector capabilities in DXIL ar introduced.
+ new opportunities for expanding vector capabilities in DXIL are introduced.
 
 ## Proposed solution
 

From 35dccde7d310cda727ca0f3ab85d8d7193ba7c22 Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Tue, 14 Jan 2025 10:52:36 -0700
Subject: [PATCH 06/11] Respond to a lot of feedback

The biggest changes are removing most references to scalarized implementation of certain intrinsics. This has the effect of removing any hard dependencies between the specs. This further strengthens my opinion that the specs should be divided along feature lines rather than the DXIL/language barrier.

A lot of rewording and specifics added where vague statements were before.
---
 proposals/0026-hlsl-long-vector-type.md | 149 ++++++++++++++----------
 proposals/NNNN-dxil-vectors.md          | 128 ++++++++++----------
 2 files changed, 152 insertions(+), 125 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index 7deb8eb9..c2dde075 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -21,15 +21,16 @@ This feature adds the ability to load, store, and perform select operations on H
 
 The adoption of machine learning techniques expressed as vector-matrix operations
  require larger vector sizes to be representable in HLSL.
-To take advantage of specialized hardware that can accelerate vector operations,
- these and other vector objects need to be preserved at the DXIL level.
+To take advantage of specialized hardware that can accelerate longer vector operations,
+ these vectors need to be preserved in the exchange format as well.
 
 ## Proposed solution
 
 Enable vectors of length between 4 and 128 inclusive in HLSL using existing template-based vector declarations.
 Such vectors will hereafter be referred to as "long vectors".
 These will be supported for all elementwise intrinsics that take variable-length vector parameters.
-For certain operations, these vectors will be represented as native vectors using [dxil vectors](NNNN-dxil-vectors.md).
+For certain operations, these vectors will be represented as native vectors using
+ [Dxil vectors](NNNN-dxil-vectors.md) and equivalent SPIR-V representations.
 
 ## Detailed design
 
@@ -54,36 +55,52 @@ Declarations of long vectors require the use of the template declaration.
 Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate
  the element type and number of elements (e.g. float2, double4) are allowed for long vectors.
 
+#### Allowed Usage
+
 The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
 uniformity requirements, but implementations may specify best practices in certain uses for optimal performance.
 
 Long vectors can be:
 
 * Elements of arrays, structs, StructuredBuffers, and ByteAddressBuffers.
-* Parameters and return types of non-etry functions.
+* Parameters and return types of non-entry functions.
 * Stored in groupshared memory.
 * Static global varaibles.
 
 Long vectors are not permitted in:
 
 * Resource types other than ByteAddressBuffer or StructuredBuffer.
-* Any element of the shader's signature including entry function parameters and return types.
+* Any part of the shader's signature including entry function parameters and return types.
 * Cbuffers or tbuffers.
+* A mesh/amplification `Payload` entry parameter structure.
+* A ray tracing `Parameter`, `Attributes`, or `Payload` parameter structure.
+* A work graph record.
 
 #### Constructing vectors
 
 HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment.
+Vectors can be initialized and assigned from various casting operations including scalars and arrays.
+Long vectors will maintain equivalent casting abilities.
 
 Examples:
 
-``` hlsl
-vector<uint, 5> vecA = {1, 2, 3, 4, 5};
-vector<uint, 6> vecB = vector<uint, 6>(6, 7, 8, 9, 0, 0);
+```hlsl
+vector<uint, 5> InitList = {1, 2, 3, 4, 5};
+vector<uint, 6> Construct = vector<uint, 6>(6, 7, 8, 9, 0, 0);
 uint4 initval = {0, 0, 0, 0};
-vector<uint, 8> vecC = {uint2(coord.xy), vecB};
-vector<uint, 6> vecD = vecB;
+vector<uint, 8> VecVec = {uint2(coord.xy), Construct};
+vector<uint, 6> Assigned = Construct;
+float arr[5];
+vector<float, 5> CastArr = (vector<float, 5>)arr;
+vector<float, 6> ArrScal = {arr, 7.9};
+vector<float, 10> ArrArr = {arr, arr};
+vector<float, 15> Scal = 4.2;
 ```
 
+float4 main(uint size: S) : SV_Target {
+   return (float4)arr;
+vector<uint, 8> vecC = {uint2(coord.xy), vecB};
+
 #### Vectors in Raw Buffers
 
 N-element vectors are loaded and stored from ByteAddressBuffers using the templated load and store methods
@@ -112,8 +129,9 @@ myBuffer.Store(elementIndex, val);
 
 #### Accessing elements of long vectors
 
-Long vectors support the existing vector subscript operators to return the scalar element values.
-They do not support swizzle operations as they are limited to only the first four elements.
+Long vectors support the existing vector subscript operators `[]` to access the scalar element values.
+They do not support any swizzle operations.
+Swizzle operations are limited to the first four elements and the accessors are named according to the graphics domain.
 
 #### Operations on long vectors
 
@@ -136,43 +154,41 @@ Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.micro
 * Wave Ops: WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveProduct, WaveActiveSum, WaveActiveMin, WaveActiveMax, WaveMultiPrefixBitAnd, WaveMultiPrefixBitOr, WaveMultiPrefixBitXor, WaveMultiPrefixProduct, WaveMultiPrefixSum, WavePrefixSum, WavePrefixProduct, WaveReadLaneAt, WaveReadLaneFirst
 * Wave Reductions: WaveActiveAllEqual, WaveMatch
 
-#### Native vector intrinsics
-
-Of the above list, the following will produce the appropriate unary, binary, or tertiary
- DXIL intrinsic that take native vector parameters:
-
-* fma
-* exp
-* log
-* tanh
-* atan
-* min
-* max
-* clamp
-* step
-
 #### Disallowed vector intrinsics
 
 * Only applicable to shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
 * Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex
 
+### Interchange Format Additions
+
+Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors.
+Representation of native vectors in DXIL depends on [dxil vectors](NNNN-dxil-vectors.md).
+
 ### Debug Support
 
-First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. Open Issue: Handle DXIL scalarized and vector paths.
+First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience.
+These should enable tracking vectors through their scalarized and native vector usages.
 
 ### Diagnostic Changes
 
 Error messages should be produced for use of long vectors in unsupported interfaces.
 
-* The shader signature.
-* A cbuffer/tbuffer.
-* A work graph record.
-* A mesh or ray tracing payload.
+* Typed buffer element types.
+* Parameters to the entry function.
+* Return types from the entry function.
+* Cbuffer blocks.
+* Cbuffer global variables.
+* Tbuffers.
+* Work graph records.
+* Mesh/amplification payload entry parameter structures.
+* Ray tracing `Payload` parameter structures used in `TraceRay` and `anyhit`/`closesthit`/`miss` entry functions.
+* Ray tracing `Parameter` parameter structures used in `CallShader` and `callable` entry functions.
+* Ray tracing `Attributes` parameter structures used in `ReportHit` and `closesthit` entry functions.
 
 Errors should also be produced when long vectors are used as parameters to intrinsics
  with vector parameters of variable length, but aren't permitted as listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics)
 Attempting to use any swizzle member-style accessors on long vectors should produce an error.
-Declaring vectors of length longer than 128 should produce an error.
+Declaring vectors of length longer than 1024 should produce an error.
 
 ### Validation Changes
 
@@ -180,17 +196,18 @@ Validation should produce errors when a long vector is found in:
 
 * The shader signature.
 * A cbuffer/tbuffer.
-* A work graph record.
-* A mesh or ray tracing payload.
+* Work graph records.
+* Mesh/amplification payload entry parameter structures.
+* Ray tracing `Payload` parameter structures used in `TraceRay` and `anyhit`/`closesthit`/`miss` entry functions.
+* Ray tracing `Parameter` parameter structures used in `CallShader` and `callable` entry functions.
+* Ray tracing `Attributes` parameter structures used in `ReportHit` and `closesthit` entry functions.
+* Metadata
 
 Use of long vectors in unsupported intrinsics should produce validation errors.
 
-## Runtime Additions
+### Device Capability
 
-Support for Long vectors requires dxil vector support as defined in [the specification](NNNN-dxil-vectors.md).
-
-Use of long vectors in a shader should be indicated in DXIL with the corresponding
- shader model version and shader feature flag.
+Devices that support Shader Model 6.9 will be required to fully support this feature.
 
 ## Testing
 
@@ -200,42 +217,46 @@ Use of long vectors in a shader should be indicated in DXIL with the correspondi
 
 Verify that long vectors can be declared in all appropriate contexts:
 
-* local variables
-* non-entry parameters
-* non-entry return types
-* StructuredBuffer elements
-* Templated Load/Store methods on ByteAddressBuffers
-* As members of arrays and structs in any of the above contexts
+* Local variables.
+* Static global variables.
+* Non-entry parameters.
+* Non-entry return types.
+* StructuredBuffer elements.
+* Templated Load/Store methods on ByteAddressBuffers.
+* As members of arrays and structs in any of the above contexts.
 
 Verify that long vectors can be correctly initialized in all the forms listed in [Constructing vectors](constructing-vectors).
 
 Verify that long vectors in supported intrinsics produce appropriate outputs.
-For the intrinsic functions listed in [Native vector intrinsics](#native-vector-intrinsics),
- the generated DXIL intrinsic calls will have long vector parameters.
-For other elementwise vector intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics),
- the generated DXIL should scalarize the parameters and produce scalar calls to the corresponding DXIL intrinsics.
-Verify that long vector elements can be accessed using the subscript operation.
+Supported intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
+ may produce intrinsic calls with native vector parameters where available
+ or scalarized parameters with individual scalar calls to the corresponding interchange format intrinsics.
+
+Verify that long vector elements can be accessed using the subscript operation with static or dynamic indices.
 
 Verify that long vectors of different sizes will reference different overloads of user and built-in functions.
 Verify that template instantiation using long vectors correctly creates variants for the right sizes.
 
+Verification of correct interchange format output depends on the implementation and representation.
+Native vector DXIL intrinsics might be checked for as described in [Dxil vectors](NNNN-dxil-vectors.md)
+ if native DXIL vector output is supported.
+SPIR-V equivalent output should be checked as well.
+Scalarized representations are also possible depending on the compilation implementation.
+
 #### Invalid usage testing
 
-Verify that compilation errors are produced for long vectors used in:
+Verify that long vectors produce compilation errors when:
 
-* Entry function parameters
-* Entry function returns
-* Type buffer declarations
-* Cbuffer blocks
-* Cbuffer global variables
-* Work graph records
-* Mesh and ray tracing payloads
-* Any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics)
+* Declared in interfaces listed in [Diagnostic changes](#diagnostic-changes).
+* Passed as parameters to any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics)
 * All swizzle operations (e.g. `lvec.x`, `lvec.rg`, `lvec.wzyx`)
+* Declaring a vector over the maximum size in any of the allowed contexts listed in [Allowed usage](#allowed-usage).
 
 ### Validation Testing
 
-Verify that Validation produces errors for any DXIL intrinsic that corresponds to the
+Verify that long vectors produce validation errors when:
+
+* Verify that Validation produces errors for any DXIL intrinsic that corresponds to the
  HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
 Verify that Validation produces errors for any DXIL intrinsic with native vector parameters
  that corresponds to the [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
@@ -262,6 +283,12 @@ Since these vectors are to be added eventually anyway, the testing serves multip
 It makes sense to not introduce a new datatype but use HLSL vectors,
 even if the initial implementation only exposes partial functionality.
 
+The restrictions outlined in [Allowed Usage](#allowed-usage) were chosen because the restricted uses weren't
+ needed for the targeted applications, but are not inherently impossible.
+They were omitted because of unclear utility and to simplify the design.
+There's nothing about those use cases that is inherently incompatible with long vectors
+ and future work might consider relaxing those restrictions.
+
 ## Open Issues
 
 * Q: Is there a limit on the Number of Components in a vector?
diff --git a/proposals/NNNN-dxil-vectors.md b/proposals/NNNN-dxil-vectors.md
index 35b7d157..821f9cbf 100644
--- a/proposals/NNNN-dxil-vectors.md
+++ b/proposals/NNNN-dxil-vectors.md
@@ -18,16 +18,27 @@ This feature introduces the ability to represent native vectors in DXIL for some
 
 ## Motivation
 
-While the original shape of the vectors may be reconstructed from their scalarized form,
- it requires additional work of the DXIL consumer and results in larger DXIL binary sizes.
-Although it has never been allowed in DXIL, the LLVM IR that DXIL is based on can represent native vectors.
-By allowing these native vector types in DXIL, the size of generated DXIL can be reduced and
- new opportunities for expanding vector capabilities in DXIL are introduced.
+Although many GPUs support vector operations, DXIL has been unable to directly leverage those capabilities.
+Instead, it has scalarized all vector operations, losing their original representation.
+To restore those vector representations, platforms have had to rely on auto-vectorization to
+ rematerialize vectors late in the compilation.
+Scalarization is a trivial compiler transformation that never fails,
+ but auto-vectorization is a notoriously difficult compiler optimization that frequently generates sub-optimal code.
+Allowing DXIL to retain vectors as they appeared in source allows hardware that can utilize
+ vector optimizations to do so more easily without penalizing hardware that requires scalarization.
+
+Native vector support can also help with the size of compiled DXIL programs.
+Vector operations can express in a single instruction operations that would have taken N instructions in scalar DXIL.
+This may allow reduced file sizes for compiled DXIL programs that utilize vectors.
+
+DXIL is based on LLVM 3.7 which already supports native vectors.
+These could only be used to a limited degree in DXIL library targets, and never for DXIL operations.
+This innate support is expected to make adding them a relatively low-impact change to DXIL tools.
 
 ## Proposed solution
 
 Native vectors are allowed in DXIL version 1.9 or greater.
-These can be stored in allocas, static globals, and groupshared variables.
+These can be stored in allocas, static globals, groupshared variables, and SSA values.
 They can be loaded from or stored to raw buffers and used as arguments to a selection
  of element-wise intrinsic functions as well as the standard math operators.
 They cannot be used in shader signatures, constant buffers, typed buffer, or texture types.
@@ -42,21 +53,15 @@ Previously individual vectors would get scalarized into scalar arrays and arrays
 Individual vectors will now be represented as a single native vector and arrays of vectors will remain
  as arrays of native vectors, though multi-dimensional arrays will still be flattened to one dimension.
 
-Scalarization of these vectors will continue to be done for uses that don't support native vectors,
- but it will be done using extractelement instructions from the native vectors
- instead of loads from the scalarized array representation.
-
-Single-element vectors are not valid in DXIL.
+Single-element vectors are generally not valid in DXIL.
 At the language level, they may be supported for corresponding intrinsic overloads,
  but such vectors should be represented as scalars in the final DXIL output.
 Since they only contain a single scalar, single-element vectors are
  informationally equivalent to actual scalars.
 Rather than include conversions to and from scalars and single-element vectors,
  it is cleaner and functionally equivalent to represent these as scalars in DXIL.
-
-Although matrices are represented as vectors in some contexts such as unlinked library shaders,
- their final DXIL representation will continue to be as arrays of scalars.
-This is consistent with both their past and future intended representation.
+The exception is in exported library functions, which need to maintain vector representations
+ to correctly match overloads when linking.
 
 ### Changes to DXIL Intrinsics
 
@@ -87,6 +92,12 @@ The return struct contains a single vector and a single integer representing map
 Here and hereafter, `NUM` is the number of elements in the loaded vector, `TYPE` is the element type name,
  and `TY` is the corresponding abbreviated type name (e.g. `i64`, `f32`).
 
+#### Vector access
+
+Dynamic access to vectors were previously converted to array accesses.
+Native vectors can be accessed using `extractelement`, `insertelement`, or `getelementptr` operations.
+Previously usage of `extractelement` and `insertelement` in DXIL didn't allow dynamic index parameters.
+
 #### Elementwise intrinsics
 
 A selection of elementwise intrinsics are given additional native vector forms.
@@ -116,73 +127,62 @@ The elementwise intrinsics that have native vector variants represent the
  <[NUM] x [TYPE]> @dx.op.tertiary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1, <[NUM] x [TYPE]> operand2, <[NUM] x [TYPE]> operand3)
 ```
 
-The only opcodes allowed with vector variants are:
-
-* Unary
-  * Exp
-  * Htan
-  * Atan
-  * Log
-* Binary
-  * FMin
-  * FMax
-* Tertiary
-  * Fma
+The scalarized variants of these DXIL intrinsics will remain unchanged and can be used in conjunction
+ with the vector variants.
+This means that the same language-level vector could be used in scalarized operations and native vector operations
+ within the same shader by being scalarized as needed even within the same shader.
 
-Unsupported DXIL intrinsics will continue to operate on scalarized representations even if those scalars
- are extracted from native vectors.
+### Validation Changes
 
-### Potential Changes to DXIL Consumers
+Blanket validation errors for use of native vectors in DXIL are removed.
+Specific disallowed usages of native vector types will be determined by
+ examining arguments to operations and intrinsics and producing errors where appropriate.
+Aggregate types will be recursed into to identify any native vector components.
 
-As this removes no existing DXIL features, the former representation of vectors is still valid.
-However, DXIL consumers may expect native vectors where they are supported and may misinterpret
- vectors scalarized into arrays as being native arrays.
-This is unlikely to produce any faulty results, but may miss some optimizations.
+Native vectors should produce validation errors when:
 
-As DXIL with native vectors might be linked to create a DXIL shader without that support,
- some additional scalarization might be necessary when linking in such cases.
+* Used in cbuffers.
+* Used in unsupported intrinsics or operations as before, but made more specific to the operations.
+* Used in shaders targeting previous shader models, apart from exported library functions.
 
-This feature involves no changes to previous shader models and any DXIL produced for earlier versions
-  should continue to behave exactly as before.
+Errors should be produced for any representation of a single-element vector outside of
+ exported library functions.
 
-#### Validation Changes
+Specific errors might be generated for invalid overloads of `LoadInput` and `StoreOutput`
+ as they represent usage of vectors in entry point signatures.
 
-Validation errors for use of native vectors in DXIL are removed.
-Any errors for using vectors in unsupported intrinsics or operations are maintained,
- but made more specific to the operations or locations that don't allow native vector types.
-More specific errors will be generated for usage of native vectors in any unsupported intrinsics.
-New errors will be generated for any use of native vectors in shader signatures or cbuffer locations.
+### Device Capability
 
-A validation error should be produced for any representation of a single element vector.
-Such vectors should be represented as scalars.
+Devices that support Shader Model 6.9 will be required to fully support this feature.
 
-### Runtime Additions
+## Testing
 
-#### Runtime information
+### Compilation Testing
 
-When native vectors are present, a DXIL unit will signal a dependency on Shader Model 6.9.
+A compiler targeting shader model 6.9 should be able to represent vectors in the supported memory spaces
+ in their native form and generate native calls for supported intrinsics.
 
-#### Device Capability
+Test that appropriate output is produced for:
 
-Devices that support Shader Model 6.9 will be required to support native vectors in rawbuffer resources,
- allocas, and groupshared memory.
-These native vectors must be supported for the above indicated DXIL intrinsics.
+* Supported intrinsics and operations, which should retain vector types.
+* Dynamic indexing of vectors, which should produce the correct `extractelement` and `insertelement`
+ operations with dynamic index parameters.
 
-## Testing
+### Validation testing
 
-A compiler targeting shader model 6.9 should be able to represent vectors in the supported memory spaces
- in their native form and generate native calls for supported intrinsics
- and scalarized versions for unsupported intrinsics.
+The DXIL 6.9 validator should allow native vectors in the supported memory and intrinsic uses.
+It should produce errors for uses in unsupported intrinsics, cbuffers, and typed buffers.
 
-Verify that supported intrinsics and operations will retain vector types.
+Single-element vectors are allowed only as interfaces to library shaders.
+Other usages of a single element vector should produce a validation error.
 
-The DXIL 6.9 validator should allow native vectors in the supported memory and intrinsic uses.
-It should produce errors for uses in signatures, cbuffers, and type buffers and any uses in unsupported intrinsics.
-Any representation of a single element vector should produce a validation error.
-These shouldn't be directlty produceable with a compatible compiler and will require custom DXIL generation.
+### Execution testing
 
-Full runtime execution should be tested by using the native vector intrinsics on different types of memory
- and confirming that the calculations produce the correct results in all cases for an assortment of vector sizes.
+Full runtime execution should be tested by using the native vector intrinsics on
+ groupshared and non-groupshared memory.
+Calculations should produce the correct results in all cases for a range of vector sizes.
+In practice, this testing will largely consist of verifying correct intrinsic output
+ with the new shader model.
 
 ## Acknowledgments
 

From 3f8c22c78fc51546fef0050e31259a41b8cfba36 Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Tue, 14 Jan 2025 17:00:51 -0700
Subject: [PATCH 07/11] respond to some offline discussions and missed bits

Move asXXX intrinsics to the approved list.

Finish reworking validation errors and testing in long vectors spec.

Simplify some listing of allowed locations given that some of them fall under entry function parameters by nature. I left work graphs as explicit since their parameters are not directly user-defined structs, but templates.
---
 proposals/0026-hlsl-long-vector-type.md | 46 ++++++++++++++-----------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index c2dde075..a9c36045 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -15,7 +15,8 @@ These are useful in a traditional graphics context for representation and manipu
 The evolution of HLSL as a more general purpose language targeting Graphics and Compute
  greatly benefit from longer vectors to fully represent these operations rather than to try to
  break them down into smaller constituent vectors.
-This feature adds the ability to load, store, and perform select operations on HLSL vectors longer than four elements.
+This feature adds the ability to load, store, and perform elementwise operations on HLSL
+ vectors longer than four elements.
 
 ## Motivation
 
@@ -70,10 +71,10 @@ Long vectors can be:
 Long vectors are not permitted in:
 
 * Resource types other than ByteAddressBuffer or StructuredBuffer.
-* Any part of the shader's signature including entry function parameters and return types.
+* Any part of the shader's signature including entry function parameters and return types or
+  user-defined struct parameters.
 * Cbuffers or tbuffers.
-* A mesh/amplification `Payload` entry parameter structure.
-* A ray tracing `Parameter`, `Attributes`, or `Payload` parameter structure.
+* A ray tracing `Parameter`, `Attributes`, or `Payload` parameter structures.
 * A work graph record.
 
 #### Constructing vectors
@@ -153,10 +154,11 @@ Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.micro
 * Quad Ops: ddx, ddx_coarse, ddx_fine, ddy, ddy_coarse, ddy_fine, fwidth, QuadReadLaneAt, QuadReadLaneAcrossX, QuadReadLaneAcrossY, QuadReadLaneAcrossDiagonal
 * Wave Ops: WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveProduct, WaveActiveSum, WaveActiveMin, WaveActiveMax, WaveMultiPrefixBitAnd, WaveMultiPrefixBitOr, WaveMultiPrefixBitXor, WaveMultiPrefixProduct, WaveMultiPrefixSum, WavePrefixSum, WavePrefixProduct, WaveReadLaneAt, WaveReadLaneFirst
 * Wave Reductions: WaveActiveAllEqual, WaveMatch
+* Type Conversions: asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16
 
 #### Disallowed vector intrinsics
 
-* Only applicable to shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
+* Only applicable to shorter vectors: AddUint64, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
 * Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex
 
 ### Interchange Format Additions
@@ -171,7 +173,7 @@ These should enable tracking vectors through their scalarized and native vector
 
 ### Diagnostic Changes
 
-Error messages should be produced for use of long vectors in unsupported interfaces.
+Error messages should be produced for use of long vectors in unsupported interfaces:
 
 * Typed buffer element types.
 * Parameters to the entry function.
@@ -181,9 +183,8 @@ Error messages should be produced for use of long vectors in unsupported interfa
 * Tbuffers.
 * Work graph records.
 * Mesh/amplification payload entry parameter structures.
-* Ray tracing `Payload` parameter structures used in `TraceRay` and `anyhit`/`closesthit`/`miss` entry functions.
-* Ray tracing `Parameter` parameter structures used in `CallShader` and `callable` entry functions.
-* Ray tracing `Attributes` parameter structures used in `ReportHit` and `closesthit` entry functions.
+* `Payload`, `Parameter`, and `Attributes` parameter user-defined structs used in
+  `TraceRay()`, `CallShader()`, and `ReportHit()` ray tracing intrinsics.
 
 Errors should also be produced when long vectors are used as parameters to intrinsics
  with vector parameters of variable length, but aren't permitted as listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics)
@@ -197,12 +198,13 @@ Validation should produce errors when a long vector is found in:
 * The shader signature.
 * A cbuffer/tbuffer.
 * Work graph records.
-* Mesh/amplification payload entry parameter structures.
-* Ray tracing `Payload` parameter structures used in `TraceRay` and `anyhit`/`closesthit`/`miss` entry functions.
-* Ray tracing `Parameter` parameter structures used in `CallShader` and `callable` entry functions.
-* Ray tracing `Attributes` parameter structures used in `ReportHit` and `closesthit` entry functions.
+* `Payload`, `Parameter`, and `Attributes` parameter user-defined structs used in
+  `TraceRay()`, `CallShader()`, and `ReportHit()` ray tracing intrinsics.
 * Metadata
 
+Note that disallowing long vectors in entry function signatures also covers any user-defined structs
+ used in mesh and ray tracing shaders.
+
 Use of long vectors in unsupported intrinsics should produce validation errors.
 
 ### Device Capability
@@ -254,13 +256,17 @@ Verify that long vectors produce compilation errors when:
 
 ### Validation Testing
 
-Verify that long vectors produce validation errors when:
-
-* Verify that Validation produces errors for any DXIL intrinsic that corresponds to the
- HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
-Verify that Validation produces errors for any DXIL intrinsic with native vector parameters
- that corresponds to the [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
- and are not listed in [native vector intrinsics](#native-vector-intrinsics).
+Verify that long vectors produce validation errors in:
+
+* Each element of the shader signature.
+* A cbuffer block struct.
+* Work graph record structs.
+* The mesh/amplification entry `Payload` parameter struct.
+* Each of the `Payload`, `Parameter`, and `Attributes` parameter structs used in
+  `TraceRay()`, `CallShader()`, and `ReportHit()`,
+  and `anyhit`, `closesthit`, `miss`, and `callable` entry functions.
+* Any DXIL intrinsic that corresponds to the HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
+* Any metadata type.
 
 ### Execution Testing
 

From 6a2334779e7063fd31c9ff05ecf7af12dbe5520e Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Wed, 15 Jan 2025 13:09:09 -0700
Subject: [PATCH 08/11] Respond to Damyan's latest feedback

---
 proposals/0026-hlsl-long-vector-type.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index a9c36045..77d12664 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -66,7 +66,7 @@ Long vectors can be:
 * Elements of arrays, structs, StructuredBuffers, and ByteAddressBuffers.
 * Parameters and return types of non-entry functions.
 * Stored in groupshared memory.
-* Static global varaibles.
+* Static global variables.
 
 Long vectors are not permitted in:
 
@@ -98,10 +98,6 @@ vector<float, 10> ArrArr = {arr, arr};
 vector<float, 15> Scal = 4.2;
 ```
 
-float4 main(uint size: S) : SV_Target {
-   return (float4)arr;
-vector<uint, 8> vecC = {uint2(coord.xy), vecB};
-
 #### Vectors in Raw Buffers
 
 N-element vectors are loaded and stored from ByteAddressBuffers using the templated load and store methods
@@ -132,7 +128,6 @@ myBuffer.Store(elementIndex, val);
 
 Long vectors support the existing vector subscript operators `[]` to access the scalar element values.
 They do not support any swizzle operations.
-Swizzle operations are limited to the first four elements and the accessors are named according to the graphics domain.
 
 #### Operations on long vectors
 
@@ -295,6 +290,12 @@ They omitted out of unclear utility and to simplify the design.
 There's nothing about those use cases that is inherently incompatible with long vectors
  and future work might consider relaxing those restrictions.
 
+Swizzle operations were not supported because they are limited to the first four elements.
+The accessors (xyzw or rgba) are named according to the expected content of
+ those vectors in a graphics context.
+Since that interpretation does not apply to longer vectors, it could be confusing.
+The subscript access is flexible and generic and makes other accessors redundant.
+
 ## Open Issues
 
 * Q: Is there a limit on the Number of Components in a vector?

From e932faddaeef973503503ca30dfb8365658b34c3 Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Tue, 21 Jan 2025 17:50:12 -0700
Subject: [PATCH 09/11] Clarify platform guidance and revise native vector load

Anupama had some feedback on the description of where long vectors can be used. This attempts to add language that is more useful.

Removed the mask param and made the load function independent per discussions with Tex and others.
---
 proposals/0026-hlsl-long-vector-type.md | 6 ++++--
 proposals/NNNN-dxil-vectors.md          | 9 ++++-----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index 77d12664..75423baa 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -58,8 +58,7 @@ Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate
 
 #### Allowed Usage
 
-The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
-uniformity requirements, but implementations may specify best practices in certain uses for optimal performance.
+The new vectors will be supported in all shader stages including Node shaders.
 
 Long vectors can be:
 
@@ -77,6 +76,9 @@ Long vectors are not permitted in:
 * A ray tracing `Parameter`, `Attributes`, or `Payload` parameter structures.
 * A work graph record.
 
+While this describes where long vectors can be used and later sections will describe how,
+implementations may specify best practices in certain uses for optimal performance.
+
 #### Constructing vectors
 
 HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment.
diff --git a/proposals/NNNN-dxil-vectors.md b/proposals/NNNN-dxil-vectors.md
index 821f9cbf..c19cd933 100644
--- a/proposals/NNNN-dxil-vectors.md
+++ b/proposals/NNNN-dxil-vectors.md
@@ -73,16 +73,15 @@ The returned vector value and the status indicator are grouped into a new `ResRe
 ```asm
   ; overloads: SM6.9: f16|f32|i16|i32
   ; returns: status, vector
-  declare %dx.types.ResRet.v[NUM][TY] @dx.op.rawBufferLoad.v[NUM][TY](
+  declare %dx.types.ResRet.v[NUM][TY] @dx.op.rawBufferVectorLoad.v[NUM][TY](
       i32,                  ; opcode
       %dx.types.Handle,     ; resource handle
-      i32,                  ; coordinate c0 (index)
+      i32,                  ; coordinate c0 (byteOffset)
       i32,                  ; coordinate c1 (elementOffset)
-      i8,                   ; mask
-      i32,                  ; alignment
-  )
+      i32)                  ; alignment
 ```
 
+
 The return struct contains a single vector and a single integer representing mapped tile status.
 
 ```asm

From bbeff68983760b82be184cf7558da863a0250ac7 Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Mon, 27 Jan 2025 13:20:27 -0700
Subject: [PATCH 10/11] Respond to dneto and anupama's feedback

clarify range of long vectors

Add text that constant integer expression for length requirement is maintained

explicitly mention allowing local function scoped long vectors

Elaborate on vector initialization from shorter vectors and initialization lists.

Add potential SPIR-V solutions to issues.

Explicitly state that native vectors can be dynamically accessed.

Clean up language about intrinsics allowing scalars and vectors.
---
 proposals/0026-hlsl-long-vector-type.md | 16 ++++++++--------
 proposals/NNNN-dxil-vectors.md          |  8 ++++----
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index 75423baa..163dd08d 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -27,7 +27,7 @@ To take advantage of specialized hardware that can accelerate longer vector oper
 
 ## Proposed solution
 
-Enable vectors of length between 4 and 128 inclusive in HLSL using existing template-based vector declarations.
+Enable vectors of length between 5 and 128 inclusive in HLSL using existing template-based vector declarations.
 Such vectors will hereafter be referred to as "long vectors".
 These will be supported for all elementwise intrinsics that take variable-length vector parameters.
 For certain operations, these vectors will be represented as native vectors using
@@ -44,10 +44,11 @@ vector<T, N> name;
 ```
 
 `T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type.
-`N` is the number of components and must be an integer between 1 and 4 inclusive.
+`N` is the number of components and must be a constant integer expression between 1 and 4 inclusive.
 See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details.
 This proposal adds support for long vectors of length greater than 4 by
- allowing `N` to be greater than 4 where previously such a declaration would produce an error.
+ allowing `N` to be a constant integer expression greater than 4
+ where previously such a declaration would produce an error.
 
 The default behavior of HLSL vectors is preserved for backward compatibility, meaning, skipping the last parameter `N`
 defaults to 4-component vectors and the use `vector name;` declares a 4-component float vector, etc. More examples
@@ -66,6 +67,7 @@ Long vectors can be:
 * Parameters and return types of non-entry functions.
 * Stored in groupshared memory.
 * Static global variables.
+* Local function scoped variables.
 
 Long vectors are not permitted in:
 
@@ -82,7 +84,8 @@ implementations may specify best practices in certain uses for optimal performan
 #### Constructing vectors
 
 HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment.
-Vectors can be initialized and assigned from various casting operations including scalars and arrays.
+Vectors can be initialized and assigned from various casting operations including scalars, arrays, and initialization lists.
+Initialization of vectors from vectors or initialization lists with fewer elements than the assigned vector is not allowed.
 Long vectors will maintain equivalent casting abilities.
 
 Examples:
@@ -314,12 +317,9 @@ Having a limit facilitates testing and sets expectations for both hardware and s
 * Q: Should this change the default N = 4 for vectors?
   * A: No. While the default size of 4 is less intuitive in a world of larger vectors, existing code depends on this default, so it remains unchanged.
 * Q: How will SPIR-V be supported?
-  * A: TBD
+  * A: TBD. In SPIR-V, long vectors could be represented as an array of elements, scalarized into individual scalars, or expressed with a new vector type.
 * Q: should swizzle accessors be allowed for long vectors?
   * A: No. It doesn't make sense since they can't be used to access all elements
        and there's no way to create enough swizzle members to accommodate the longest allowed vector.
-* Q: How should scalar groupshared arrays be loaded/stored into/out of long vectors.
-  * A: After some consideration, we opted not to include explicit Load/Store operations for this function.
-       There are at least a couple ways this could be resolved, and the preferred solution is outside the scope.
 
 <!-- {% endraw %} -->
\ No newline at end of file
diff --git a/proposals/NNNN-dxil-vectors.md b/proposals/NNNN-dxil-vectors.md
index c19cd933..3c5e5154 100644
--- a/proposals/NNNN-dxil-vectors.md
+++ b/proposals/NNNN-dxil-vectors.md
@@ -94,7 +94,7 @@ Here and hereafter, `NUM` is the number of elements in the loaded vector, `TYPE`
 #### Vector access
 
 Dynamic access to vectors were previously converted to array accesses.
-Native vectors can be accessed using `extractelement`, `insertelement`, or `getelementptr` operations.
+Native vectors can be dynamically accessed using `extractelement`, `insertelement`, or `getelementptr` operations.
 Previously usage of `extractelement` and `insertelement` in DXIL didn't allow dynamic index parameters.
 
 #### Elementwise intrinsics
@@ -126,10 +126,10 @@ The elementwise intrinsics that have native vector variants represent the
  <[NUM] x [TYPE]> @dx.op.tertiary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1, <[NUM] x [TYPE]> operand2, <[NUM] x [TYPE]> operand3)
 ```
 
-The scalarized variants of these DXIL intrinsics will remain unchanged and can be used in conjunction
+The scalar variants of these DXIL intrinsics will remain unchanged and can be used in conjunction
  with the vector variants.
-This means that the same language-level vector could be used in scalarized operations and native vector operations
- within the same shader by being scalarized as needed even within the same shader.
+This means that the same language-level vector (of any length) could be used
+ in scalarized operations and native vector operations even within the same shader.
 
 ### Validation Changes
 

From 5345b717ba5dee4a60bb763dee2bdc393f68bcf8 Mon Sep 17 00:00:00 2001
From: Greg Roth <grroth@microsoft.com>
Date: Fri, 31 Jan 2025 11:26:09 -0700
Subject: [PATCH 11/11] update limits to 1024 in all places

missed a couple
---
 proposals/0026-hlsl-long-vector-type.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
index 163dd08d..fc5afb2a 100644
--- a/proposals/0026-hlsl-long-vector-type.md
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -27,7 +27,7 @@ To take advantage of specialized hardware that can accelerate longer vector oper
 
 ## Proposed solution
 
-Enable vectors of length between 5 and 128 inclusive in HLSL using existing template-based vector declarations.
+Enable vectors of length between 5 and 1024 inclusive in HLSL using existing template-based vector declarations.
 Such vectors will hereafter be referred to as "long vectors".
 These will be supported for all elementwise intrinsics that take variable-length vector parameters.
 For certain operations, these vectors will be represented as native vectors using
@@ -304,7 +304,7 @@ The subscript access is flexible and generic and makes other accessors redundant
 ## Open Issues
 
 * Q: Is there a limit on the Number of Components in a vector?
-  * A: 128. It's big enough for some known uses.
+  * A: 1024. It's big enough for known uses.
 There aren't concrete reasons to restrict the vector length.
 Having a limit facilitates testing and sets expectations for both hardware and software developers.