Consider the following loop that adds a constant to a vector. There’s quite a lot of overhead associated with the solitary SIMD instruction. Suppose you were designing a new ISA that implemented operations like paddb. How would you make the code more efficient?
        movq  mm1, c        ;load constant into mm1 (8 copies)
        mov   ecx, 3        ;set up loop counter for three trips 8 × 3 = 24 (loop counts in ecx in 32-bit code)
        mov   esi, 0        ;set pointer to 0 (use as index into vector)
Next:   movq  mm0, x[esi]   ;Repeat: load 8 bytes into mm0 using indexed addressing
        paddb mm0, mm1      ;  now do 8 bytes of the vector addition
        movq  x[esi], mm0   ;  store 8 bytes of result in x
        add   esi, 8        ;  increment index by 8
        loop  Next          ;Until all done
This problem goes to the heart of ISA design: how do we make operations more efficient? Here the parallel
addition, paddb, is first buried in a loop and then sandwiched between a load and a store. Of the five
instructions in the loop body, only paddb does useful arithmetic: add and loop are loop overhead, and the
load and store can introduce wait states if the data is not cached.
We could improve things by unrolling the loop: fetch more data from memory on each trip and then do
two parallel additions per trip. That would halve the loop overhead per byte processed.
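The unrolling idea can be sketched in portable C. This is only a model under stated assumptions, not MMX code: the names paddb_chunk and add_const_unrolled2 are hypothetical, and each call to paddb_chunk stands in for one 8-byte load, paddb, and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CHUNK = 8 };  /* bytes handled by one paddb on a 64-bit MMX register */

/* Model of load + paddb + store: add c to each of 8 bytes, wrapping modulo 256. */
static void paddb_chunk(uint8_t *p, uint8_t c)
{
    for (int i = 0; i < CHUNK; i++)
        p[i] = (uint8_t)(p[i] + c);
}

/* Add c to every byte of x (nchunks * 8 bytes), two chunks per loop trip.
   Returns the number of loop-control steps taken, i.e. the overhead
   that unrolling is meant to reduce. */
static size_t add_const_unrolled2(uint8_t *x, size_t nchunks, uint8_t c)
{
    assert(x != NULL);
    size_t trips = 0;
    size_t i = 0;
    for (; i + 2 <= nchunks; i += 2) {   /* one index update per 16 bytes */
        paddb_chunk(x + i * CHUNK, c);
        paddb_chunk(x + (i + 1) * CHUNK, c);
        trips++;
    }
    if (i < nchunks) {                   /* odd chunk left over */
        paddb_chunk(x + i * CHUNK, c);
        trips++;
    }
    return trips;
}
```

For the 24-byte vector in the question (three chunks), this model takes two loop-control steps instead of three. The saving approaches one half as the vector grows.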
Another solution is to use even longer wordlengths (128, 256 or greater) in order to reduce both the number
of loop iterations and the number of memory accesses (assuming that wide memory accesses are possible).
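The effect of register width on loop count can be estimated with a little arithmetic. The following C sketch uses hypothetical helper names (loop_trips, memory_accesses) and assumes one load and one store per trip, as in the loop above. It also assumes that a partial final chunk is handled by padding or masking.

```c
#include <assert.h>
#include <stddef.h>

/* Loop trips needed to cover n bytes with registers `width` bytes wide. */
static size_t loop_trips(size_t n, size_t width)
{
    assert(width > 0);
    return (n + width - 1) / width;   /* ceiling division */
}

/* Assumption: each trip performs exactly one load and one store. */
static size_t memory_accesses(size_t n, size_t width)
{
    return 2 * loop_trips(n, width);
}
```

For the 24-byte vector, 64-bit registers (width 8) need three trips and six memory accesses. 128-bit registers (width 16) need two trips, and 256-bit registers (width 32) need a single trip with two accesses.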
A complex instruction could also be designed that performs the addition, increments a source pointer and a
destination pointer, and decrements a loop counter. With such a hypothetical operation, lpaddb, the loop in
the code above would become:
        mov    esi, 0        ;set index to 0
Next:   movq   mm0, x[esi]   ;load 8 bytes into mm0
        lpaddb mm0, mm1      ;addition, pointer updates, loop count
        movq   x[esi], mm0   ;store result and loop
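In this code the special parallel add and loop instruction would have to perform a delayed branch: the branch
back to Next takes effect only after the next instruction (the store result operation) has been executed.
The index update must be deferred in the same way, otherwise the store would write to the wrong location.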
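The required ordering can be made concrete with a small C model. This is a sketch of one possible reading of lpaddb, not a defined instruction: the function name lpaddb_loop and the variables next_esi and take_branch are hypothetical, and the loop runs at least once, like the original loop instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK = 8 };  /* bytes per 64-bit MMX register */

/* Model of the three-instruction loop: movq load, lpaddb, movq store. */
static void lpaddb_loop(uint8_t *x, size_t trips, uint8_t c)
{
    assert(x != NULL && trips > 0);
    size_t esi = 0;          /* index register */
    size_t count = trips;    /* loop counter */
    int take_branch;
    do {
        uint8_t mm0[CHUNK];
        memcpy(mm0, x + esi, CHUNK);          /* movq mm0,x[esi] */

        /* lpaddb mm0,mm1: add now, but only record the index update
           and the branch decision; neither takes effect yet. */
        for (int i = 0; i < CHUNK; i++)
            mm0[i] = (uint8_t)(mm0[i] + c);
        size_t next_esi = esi + CHUNK;
        count--;
        take_branch = (count != 0);

        memcpy(x + esi, mm0, CHUNK);          /* movq x[esi],mm0 (delay slot, old index) */

        esi = next_esi;                       /* deferred index update commits */
    } while (take_branch);                    /* delayed branch to Next */
}
```

If the index were updated inside lpaddb before the store, each result would land 8 bytes too far along, which is why the model commits next_esi only after the store.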