Document
SPARC Technology
Business
Preliminary
STP1030
May 1995
DATA SHEET
UltraSPARC-I
High-Performance 64-Bit RISC Processor
INTRODUCTION
The STP1030, UltraSPARC-I, is a high-performance, highly-integrated superscalar processor implementing the SPARC V9 64-bit RISC architecture. The STP1030 is capable of sustaining the execution of up to four instructions per cycle even in the presence of conditional branches and cache misses. This sustained performance is supported by a decoupled Prefetch and Dispatch Unit with Instruction Buffer to feed the Execution Unit. On the output side of the Execution Unit, Load and Store buffers completely decouple pipeline execution from data cache misses. Instructions predicted to be executed are issued in program order to multiple functional units, execute in parallel and can complete out of order. In order to further increase the number of instructions executed per cycle, instructions from different blocks (e.g. instructions before and after a conditional branch) can be issued in the same group.
The STP1030 supports 2D and 3D graphics, image processing, video compression and decompression, and video effects through the sophisticated VISual Instruction Set. This instruction set supports high levels of multimedia performance, including real-time H.261 video compression/decompression and two streams of MPEG-2 decompression at full broadcast quality, with no additional hardware support.
Features:
• SPARC V9 Architecture Compliant
• Binary Compatible with all SPARC Application code
• VISual (Multimedia Capable) Instruction Set
• Multi-Processing Support
  - Glueless 4-processor connection with minimum latency
  - Snooping or Directory Based Protocol Support
• 4-way SuperScalar Design with 9 execution units
  - 4 Integer Execution Units
  - 3 Floating-point Execution Units
  - 2 Graphics Execution Units
• Selectable Little or Big Endian Byte Ordering
• 64-Bit Address Pointers
• 16 KByte Non-blocking Data Cache
• 16 KByte Instruction Cache
  - In-Cache 2-bit Branch Prediction
  - Single Cycle Branch Following
• Integrated 2nd Level Cache Controller
  - Supports 0.5-4 MByte Cache Sizes
  - Sustained throughput of 1 load/cycle
  - 2.6 GByte/sec Processor-Cache bandwidth
• Block Load/Store Instructions
  - 1.3 GByte/sec processor-memory bandwidth
  - 600 MByte/sec Sustained Processor-Memory Transfers
• Ease of Use
  - JTAG Boundary scan
  - Performance Instrumentation
• Technology/packaging
  - 0.5 µm 4-layer metal CMOS process
  - Operates at 3.3 V
  - 521 pin plastic Ball Grid Array (BGA)
• Power management
BLOCK DIAGRAM
Figure 1. Functional Block Diagram
[Blocks shown: Prefetch and Dispatch Unit (PDU); Instruction Cache and Buffer; Grouping Logic; Integer Reg and Annex; Integer Execution Unit (IEU); Memory Management Unit (MMU); Load Store Unit (LSU); Data Cache; Load Queue; Store Queue; FP Reg; Floating Point Unit (FPU) with FP multiply, FP add and FP divide; Graphics Unit (GRU); External Cache Unit (ECU); Memory Interface Unit (MIU); UltraSPARC-I Bus; External Cache RAM]
ULTRASPARC-I COMPONENT OVERVIEW
In a single-chip implementation, the UltraSPARC-I processor integrates the following components (see Figure 1):
• A prefetch, branch prediction and dispatch unit
• A 16 Kbyte instruction cache
• An MMU composed of a 64-entry iTLB and a 64-entry dTLB
• An integer execution unit with two ALUs
• One load/store unit with a separate address generation adder
• A load buffer and a store buffer decoupling data accesses from the pipeline
• A 16 Kbyte data cache
• A floating-point unit with independent add, multiply and divide/square root sub-units
• A graphics unit composed of two independent execution pipelines
• A unit controlling accesses to the external cache
• A unit responsible for main memory and I/O accesses
Prefetch and Dispatch Unit
The prefetch and dispatch unit fetches instructions ahead of time (before they are actually needed in the pipeline) so that the execution units do not starve for instructions. Instructions can be prefetched from all levels of the memory hierarchy, i.e. the instruction cache, the external cache and main memory. In order to prefetch across conditional branches, a dynamic branch prediction scheme is implemented in
Sun Microsystems, Inc
hardware. The predicted outcome of a branch is based on a two-bit history of that branch. A “next field” associated with every four instructions in the instruction cache (I-cache) points to the next I-cache line to be fetched. The next field makes it possible to follow taken branches and provides essentially the same instruction bandwidth as is achieved when running sequential code. Prefetched instructions are stored in the instruction buffer until they are sent to the rest of the pipeline. Up to 12 instructions can be buffered.
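The two-bit history mentioned above can be illustrated with a small software model. This data sheet does not give the exact state machine, so the sketch below assumes the classic two-bit saturating counter. The state encoding, the reset state, and the names `TwoBitPredictor` and `count_mispredictions` are hypothetical and chosen only for illustration; this is not a description of the STP1030 hardware.

```python
from dataclasses import dataclass

# Assumed state encoding (not taken from the data sheet):
# 0 = strongly not taken, 1 = weakly not taken,
# 2 = weakly taken,       3 = strongly taken
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)


@dataclass
class TwoBitPredictor:
    """Two-bit saturating counter tracking the history of one branch."""

    state: int = WEAK_NOT_TAKEN  # assumed reset state

    def predict(self) -> bool:
        """Predict 'taken' when the counter is in one of the two taken states."""
        return self.state >= WEAK_TAKEN

    def update(self, taken: bool) -> None:
        """Move one step toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(self.state + 1, STRONG_TAKEN)
        else:
            self.state = max(self.state - 1, STRONG_NOT_TAKEN)


def count_mispredictions(outcomes, predictor=None):
    """Replay a sequence of branch outcomes and count wrong predictions."""
    if predictor is None:
        predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop branch taken nine times and then falling through, run twice.
    loop = [True] * 9 + [False]
    print(count_mispredictions(loop + loop))
```

Under these assumptions, the two passes through the loop produce three mispredictions: one while the counter warms up and one at each loop exit. The single not-taken outcome at the exit only moves the counter from strongly taken to weakly taken, so the branch is still predicted taken when the loop is entered again, which is the usual motivation for keeping two bits of history rather than one.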
Instruction Cache
The instruction cache is a 16 Kbyte two-way set associative cache with 32 byte blocks. The cache is physically indexed and contains physi.