February 2012 Altera Corporation
AIB-01020-1.1 Advance Information Brief
SoC FPGA ARM Cortex-A9 MPCore Processor Advance Information Brief
© 2012 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS,
QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark
Office and in other countries. All other words and logos identified as trademarks or service marks are the property of their
respective holders as described at www.altera.com/common/legal.html. Altera warrants performance of its semiconductor
products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any
products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use
of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are
advised to obtain the latest version of device specifications before relying on any published information and before placing orders
for products or services.
101 Innovation Drive
San Jose, CA 95134
www.altera.com
ISO 9001:2008 Registered
This document describes the dual-core ARM® Cortex™-A9 MPCore™ processor
integrated in the hard processor system (HPS) of the Altera Cyclone® V and Arria® V
system on a chip (SoC) FPGAs. This innovative HPS contains a microprocessor unit
(MPU) with a dual-core ARM Cortex-A9 MPCore 32-bit application-class processor,
memory controllers, and a rich set of system peripherals, hardened in Altera's most
advanced 28-nm FPGA fabric. These SoC FPGAs provide the performance, power,
and cost savings of hard logic, with the flexibility and time-to-market benefits of
programmable logic.
ARM Cortex-A9 MPCore Processor Architecture
ARM processors are the standard in embedded systems. Altera SoC FPGAs leverage
one of ARM's latest, high-performance Cortex-A9 processor architectures. The
Cortex-A9 architecture provides industry-leading performance and the latest ARM
features and capabilities, and it is widely deployed in products ranging from wireless
handsets to tablet computers. Figure 1 shows the progression of ARM processor
performance and capability.
Figure 1. ARM Processor Family Lineup
(Chart of performance and functionality against capability. Microcontroller (Cortex-M series): Cortex-M0, Cortex-M1, Cortex-M3, Cortex-M4. Real-time control (Cortex-R series): Cortex-R4, Cortex-R5, Cortex-R7. Application processors (Cortex-A series): Cortex-A5, Cortex-A8, Cortex-A9, Cortex-A15. Also shown: ARM7, ARM9, ARM11.)
The dual-core ARM Cortex-A9 MPCore processor in Altera SoC FPGAs is designed
for maximum performance and power efficiency, implementing the widely supported
ARMv7 instruction set architecture to address a broad range of industrial,
automotive, and wireless applications. The Cortex-A9 MPCore processor architecture
includes the following features:
■ Dual-core multiprocessing, supporting both symmetric multiprocessing (SMP)
and asymmetric multiprocessing (AMP)
■ Multi-issue superscalar, out-of-order, speculative execution 8-stage pipeline
delivering 2.5 DMIPS/MHz per CPU
■ Advanced branch prediction
■ Single- and double-precision IEEE standard 754-1985 floating-point mathematical
operations
■ ARM NEON™ 128-bit single instruction multiple data (SIMD) media processing
engine
■ Jazelle® byte-code dynamic compiler support
■ TrustZone® architecture for enhanced system security
For more details about ARM Cortex-A9 processors, refer to the ARM Cortex-A9
Processors white paper.
As shown in Figure 2, the Cortex-A9 processor architecture supports the ARM
performance-optimized instruction set and the latest memory-optimized Thumb®-2
mixed instruction set.
The Thumb-2 instruction set optimizes processing performance in systems with
narrow memory data paths and improves energy efficiency. On average, Thumb-2
code has a 31% smaller memory footprint and runs 38% faster than the original
Thumb instruction set.
Figure 2. ARM Processor Family Architectural Features
(Chart of architectural features against processor architecture, in order of increasing capability, for microcontroller-oriented processors and for application and real-time processors, including ARM7TDMI, ARM926, ARM946, ARM968, ARM1136J, ARM1156T2, ARM1176JZ, ARM11MP, SC000, SC100, SC300, Cortex-M0, Cortex-M1, Cortex-M3, Cortex-M4, Cortex-R4, Cortex-R5, Cortex-R7, Cortex-A5, Cortex-A8, Cortex-A9, and Cortex-A15. Features: ARM performance-optimized 32-bit instruction set; Thumb 16-bit instruction set; Thumb-2 memory-optimized instruction set; IEEE-754 floating-point arithmetic; Jazelle bytecode dynamic compiler support; nested vectored interrupts; wake-up interrupt controller; TrustZone system security; NEON media engine for single instruction multiple data (SIMD) processing; virtualization.)
Figure 3 provides a detailed block diagram of the MPU subsystem. The MPU
subsystem includes two Cortex-A9 processor cores, the level 2 (L2) cache and memory
subsystem, Snoop Control Unit (SCU), Accelerator Coherency Port (ACP), and debug
functions.
Cache Memory
Cache memory improves the performance of a processor-based system and helps
reduce power consumption. Cache memory that is tightly integrated with an
associated processor core is called level 1 (L1) cache. Each Cortex-A9 CPU has two
independent 32-kilobyte (KB) L1 caches—one for instructions and one for data—
allowing simultaneous instruction fetches and data access. Each L1 cache is 4-way set
associative and has an eight-word line length.
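With 32-bit ARM words, an eight-word line is 32 bytes, so each 32-KB, 4-way L1 cache has 32 KB / (4 × 32 bytes) = 256 sets. The following C sketch shows how an address splits into a 5-bit byte offset, an 8-bit set index, and a tag under that geometry. The helper name `l1_split` is hypothetical, and the sketch ignores whether a particular cache is indexed with virtual or physical addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* L1 geometry from the text: 32 KB, 4-way set associative, 8 x 32-bit words per line. */
#define L1_SIZE_BYTES  (32u * 1024u)
#define L1_WAYS        4u
#define L1_LINE_BYTES  (8u * 4u)                                    /* 32 bytes */
#define L1_SETS        (L1_SIZE_BYTES / (L1_WAYS * L1_LINE_BYTES))  /* 256 sets */
#define L1_OFFSET_BITS 5u   /* log2(32): byte within the line */
#define L1_INDEX_BITS  8u   /* log2(256): set number */

/* Hypothetical helper: split an address into tag, set index, and byte offset. */
static void l1_split(uint32_t addr, uint32_t *tag, uint32_t *set, uint32_t *offset)
{
    assert(tag != NULL && set != NULL && offset != NULL);
    *offset = addr & (L1_LINE_BYTES - 1u);
    *set    = (addr >> L1_OFFSET_BITS) & (L1_SETS - 1u);
    *tag    = addr >> (L1_OFFSET_BITS + L1_INDEX_BITS);
}
```

Addresses 8 KB apart (32 KB divided by 4 ways) fall in the same set, so at most four such lines can be cached at once before one must be evicted.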
Figure 3. MPU Subsystem
(Block diagram. Processor cores and L1 cache: two 800 MHz ARM Cortex-A9 32-bit dual-issue superscalar RISC CPUs, each with a 32-KB L1 instruction cache, a 32-KB L1 data cache, a NEON DSP/media SIMD processing engine, an IEEE 754 floating-point unit (single and double precision), Jazelle bytecode dynamic compiler support, a private MMU with TrustZone security, a private watchdog timer, and a private 32-bit timer. L2 cache: SCU, ACP, interrupt controller, and a 512-KB L2 unified cache with ECC and cache lockdown support. Debugging: CoreSight multicore debug and trace, with performance monitors, program trace, event trace, and cross triggering.)
The HPS also includes a 512-KB L2 shared, unified cache memory (instruction and
data for both Cortex-A9 cores). The L2 cache is 8-way set-associative with
programmable locking by line, way, and master. The L2 cache includes error
correction code (ECC) reporting.
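Because each of the eight L2 ways holds 512 KB / 8 = 64 KB, locking ways trades general-purpose capacity in 64-KB steps. This C sketch assumes a hypothetical 8-bit mask in which bit n set means way n is locked against allocation for a given master; it does not model the actual lockdown registers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L2_SIZE_BYTES (512u * 1024u)
#define L2_WAYS       8u
#define L2_WAY_BYTES  (L2_SIZE_BYTES / L2_WAYS)   /* 64 KB per way */

/* Count the locked ways in a hypothetical way-lock mask (bit n = way n). */
static unsigned locked_ways(unsigned lock_mask)
{
    assert(lock_mask <= 0xFFu);                  /* only eight ways exist */
    unsigned count = 0;
    for (unsigned way = 0; way < L2_WAYS; way++)
        count += (lock_mask >> way) & 1u;
    return count;
}

/* Bytes of L2 into which the master can still allocate new lines. */
static uint32_t l2_allocatable_bytes(unsigned lock_mask)
{
    return (L2_WAYS - locked_ways(lock_mask)) * L2_WAY_BYTES;
}
```

For example, locking ways 0 and 1 (mask 0x03) leaves 6 × 64 KB = 384 KB available for new allocations by that master.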
Snoop Control Unit (SCU)
The snoop control unit (SCU) is an integral part of the cache memory systems and
manages data traffic for the two Cortex-A9 CPUs and the memory system, including
the L2 cache. In a multiprocessor system, each CPU may operate on shared data. The
SCU ensures that each processor core operates on the most up-to-date copy of data,
maintaining cache coherency. Figure 4 shows the coherent memory, SCU, and ACP.
The SCU maintains bidirectional coherency among the L1 data caches, ensuring that both
CPUs access the most recent data. When a CPU writes to any coherent memory
location, the SCU ensures that the relevant data is coherent (updated, tagged, or
invalidated). Similarly, the SCU monitors read operations from a
coherent memory location. If the required data is already stored in either L1 cache, the
data is returned directly to the requesting CPU. If the data is not in an L1 cache, the
L2 cache is checked before the data is finally retrieved from main memory.
The SCU also manages accesses from the ACP and arbitrates between the Cortex-A9
CPUs if both attempt simultaneous access to the L2 cache.
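The read and write behavior described above can be illustrated with a deliberately simplified model of one coherent memory word. In this C sketch, a write invalidates the other CPU's L1 copy, and a read is satisfied by the requester's own L1, the other CPU's L1, the L2 cache, or main memory, in that order. The real SCU tracks more line states and handles write-backs that this model omits; all type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where a read was satisfied in this simplified model. */
typedef enum { SRC_OWN_L1, SRC_OTHER_L1, SRC_L2, SRC_MAIN_MEMORY } read_source;

typedef struct {
    bool     valid;
    uint32_t value;
} l1_copy;

/* One coherent memory word as seen by two CPUs, the L2 cache, and memory. */
typedef struct {
    l1_copy  l1[2];      /* each CPU's L1 data cache copy */
    bool     l2_valid;
    uint32_t l2_value;
    uint32_t memory;     /* value in main memory */
} coherent_word;

/* Write by `cpu`: update its own L1 copy and invalidate the other CPU's copy. */
static void scu_write(coherent_word *w, int cpu, uint32_t value)
{
    assert(cpu == 0 || cpu == 1);
    w->l1[cpu].valid = true;
    w->l1[cpu].value = value;
    w->l1[1 - cpu].valid = false;
}

/* Read by `cpu`: own L1, then the other CPU's L1, then L2, then main memory. */
static uint32_t scu_read(coherent_word *w, int cpu, read_source *src)
{
    assert(cpu == 0 || cpu == 1);
    if (w->l1[cpu].valid) {
        *src = SRC_OWN_L1;
    } else if (w->l1[1 - cpu].valid) {
        *src = SRC_OTHER_L1;                 /* supplied directly from the other L1 */
        w->l1[cpu] = w->l1[1 - cpu];
    } else if (w->l2_valid) {
        *src = SRC_L2;
        w->l1[cpu] = (l1_copy){ true, w->l2_value };
    } else {
        *src = SRC_MAIN_MEMORY;
        w->l1[cpu] = (l1_copy){ true, w->memory };
    }
    return w->l1[cpu].value;
}
```

After CPU 1 writes a new value, CPU 0's stale copy is invalid, so CPU 0's next read is served from CPU 1's L1 rather than from stale memory.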
Accelerator Coherency Port
The ACP allows level 3 (L3) interconnect masters—such as Ethernet media access
controller (EMAC), direct memory access (DMA), and FPGA-to-HPS bridge—to share
data coherently with the MPU subsystem. With the ACP, read accesses to coherent
memory regions always return the most current data, whether in L1 cache, L2 cache,
or main memory. Similarly, write operations to coherent memory regions cause the
SCU to force coherence before the data is forwarded to the memory system.
Figure 4. Coherent Memory, SCU, and ACP
(Block diagram: two ARM Cortex-A9 32-bit dual-issue superscalar RISC CPUs, each with L1 data and instruction caches, connect to the SCU, which maintains bidirectional coherency between the L1 data caches and connects to the L2 unified cache. HPS peripherals and the FPGA fabric reach the ACP through the L3 interconnect and the ACP ID mapper. The coherent memory path is highlighted.)
The ACP ID mapper is located between the L3 interconnect and the ACP. The ARM
ACP is designed to support up to eight unique transactions concurrently, identified by
eight unique transaction IDs. However, the FPGA fabric can have any number of
masters requesting coherent transactions. The ACP ID mapper dynamically allocates
the eight available transaction IDs to the requesting masters to ensure that all
masters have access to coherent memory regions.
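The brief does not describe the mapper's allocation policy. The following C sketch assumes a simple one: a master keeps a single ID while it has transactions outstanding, a new master takes the lowest free ID, and a request that finds all eight IDs held by other masters must wait (reported as -1). The names `acp_id_mapper`, `mapper_acquire`, and `mapper_release` are hypothetical.

```c
#include <assert.h>

#define ACP_IDS 8  /* the ACP supports eight unique transaction IDs */

/* Hypothetical ID table: owning master per ID and its outstanding transactions. */
typedef struct {
    int      owner[ACP_IDS];        /* master number, or -1 if the ID is free */
    unsigned outstanding[ACP_IDS];
} acp_id_mapper;

static void mapper_init(acp_id_mapper *m)
{
    for (int i = 0; i < ACP_IDS; i++) {
        m->owner[i] = -1;
        m->outstanding[i] = 0;
    }
}

/* Returns the ACP ID for a new transaction from `master`, or -1 if all eight
 * IDs are held by other masters and the request must wait. */
static int mapper_acquire(acp_id_mapper *m, int master)
{
    assert(master >= 0);
    int free_id = -1;
    for (int i = 0; i < ACP_IDS; i++) {
        if (m->owner[i] == master) {         /* reuse the master's current ID */
            m->outstanding[i]++;
            return i;
        }
        if (m->owner[i] == -1 && free_id == -1)
            free_id = i;                     /* remember the lowest free ID */
    }
    if (free_id != -1) {
        m->owner[free_id] = master;
        m->outstanding[free_id] = 1;
    }
    return free_id;
}

/* Completes one transaction on `id`; the ID is freed when none remain. */
static void mapper_release(acp_id_mapper *m, int id)
{
    assert(id >= 0 && id < ACP_IDS && m->outstanding[id] > 0);
    if (--m->outstanding[id] == 0)
        m->owner[id] = -1;
}
```

Keeping one ID per master while it has transactions in flight is one way to respect AXI ordering, which returns responses for transactions with the same ID in order; a real mapper may use a different policy.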
IEEE standard 754-1985 Floating Point Unit
Both ARM Cortex-A9 processor cores include full support for IEEE standard 754-1985
floating-point operations—important for imaging, signal processing, scientific
computing, and graphics.
The floating-point unit (FPU) fully supports single- and double-precision add,
subtract, multiply, divide, multiply/accumulate, and square-root operations. The FPU
also converts between floating-point data formats and integers, including special
operations to round-towards-zero required by high-level languages.
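C is one of the high-level languages that need round-towards-zero conversion: converting a floating-point value to an integer discards the fractional part. On ARM VFP hardware, compilers typically implement such casts with the VCVT instruction, which rounds towards zero by default instead of using the rounding mode in the FPSCR. The function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* C defines float-to-integer conversion as discarding the fractional part,
 * that is, rounding toward zero. Out-of-range values and NaN are undefined
 * behavior in C, so they are rejected first. */
static int32_t truncate_to_int32(float x)
{
    assert(x >= -2147483648.0f && x < 2147483648.0f);  /* [-2^31, 2^31) */
    return (int32_t)x;
}
```

Both 2.7 and -2.7 move toward zero (to 2 and -2), unlike round-to-nearest, which would give 3 and -3.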
The FPU greatly increases performance for applications that heavily rely on
floating-point arithmetic operations such as advanced control algorithms, imaging
(scaling, 3D transforms), fast Fourier transforms (FFT), and digital filtering in
graphics.
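Digital filtering shows why hardware floating-point multiply-accumulate matters: a direct-form finite impulse response (FIR) filter performs one multiply-accumulate per tap for every output sample. The following portable C sketch (function name hypothetical) treats samples before the start of the input as zero.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR filter: y[i] = sum over k of h[k] * x[i - k].
 * Each tap is one floating-point multiply-accumulate; samples before
 * x[0] are treated as zero. */
static void fir_filter(const float *x, size_t n, const float *h, size_t taps, float *y)
{
    assert(x != NULL && h != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps && k <= i; k++)
            acc += h[k] * x[i - k];
        y[i] = acc;
    }
}
```

A two-tap moving average with coefficients {0.5, 0.5} applied to {1, 2, 3, 4} produces {0.5, 1.5, 2.5, 3.5}.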
NEON Media Processing Engine
Both of the ARM Cortex-A9 processor cores include an ARM NEON media
processing engine (MPE) that supports simultaneous operations on multiple data,
also called SIMD processing, shown in Figure 5. The NEON processing engine
accelerates multimedia and signal processing algorithms, such as video
encoding/decoding, 2D/3D graphics, audio and speech processing, image
processing, telephony, and sound synthesis. The SIMD architecture completes some
signal processing algorithms up to eight times faster than a scalar processor.
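Figure 5. SIMD Processing
(Single Instruction Multiple Data (SIMD): two source registers are divided into lanes; one instruction applies the same operation (Op) to each pair of lanes in parallel and writes each result to the corresponding lane of the destination register.)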
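The following portable C sketch spells out what a single NEON instruction does across the lanes of a 128-bit register: sixteen independent 8-bit unsigned additions, each saturating at 255 instead of wrapping, a common operation in image processing. The function name is hypothetical; with an ARM toolchain, the `vqaddq_u8` intrinsic from `<arm_neon.h>` performs the same 16-lane operation in one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define Q_LANES 16  /* a 128-bit NEON register holds sixteen 8-bit lanes */

/* Reference model of a 16-lane unsigned saturating add: the same operation
 * is applied to every lane, and sums above 255 clamp to 255. */
static void add_sat_u8x16(const uint8_t a[Q_LANES], const uint8_t b[Q_LANES],
                          uint8_t out[Q_LANES])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (int lane = 0; lane < Q_LANES; lane++) {
        unsigned sum = (unsigned)a[lane] + b[lane];
        out[lane] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

A scalar processor executes this loop body sixteen times; the SIMD engine performs all sixteen lane operations with one instruction.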
The Cortex-A9 NEON MPE performs operations on the following data types:
■ SIMD and scalar single-precision floating-point computations
■ Scalar double-precision floating-point computation
■ SIMD and scalar half-precision floating-point conversion
■ 8-, 16-, 32-, and 64-bit signed and unsigned integer SIMD computation
■ 8- or 16-bit polynomial computation for single-bit coefficients
The available operations include the following functionality:
■ Addition and subtraction
■ Multiplication with optional accumulation
■ Maximum or minimum value-driven lane selection operations
■ Inverse square-root approximation
■ Comprehensive data-structure load instructions, including
register-bank-resident table lookup
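NEON's inverse square-root approximation is typically used as a starting point for Newton-Raphson refinement: VRSQRTE produces a coarse estimate of 1/√x, and each VRSQRTS step computes (3 − a·b)/2, so one iteration is y ← y · (3 − x·y²)/2. This scalar C sketch shows the arithmetic of that refinement; the function names are hypothetical.

```c
#include <assert.h>

/* One Newton-Raphson step toward 1/sqrt(x): y' = y * (3 - x*y*y) / 2.
 * On NEON, VRSQRTS computes the (3 - a*b) / 2 factor. */
static float rsqrt_step(float x, float y)
{
    return y * (3.0f - x * y * y) * 0.5f;
}

/* Refine an initial estimate `y0` of 1/sqrt(x) with `steps` iterations. */
static float rsqrt_refine(float x, float y0, int steps)
{
    assert(x > 0.0f && y0 > 0.0f && steps >= 0);
    float y = y0;
    for (int i = 0; i < steps; i++)
        y = rsqrt_step(x, y);
    return y;
}
```

Starting from an estimate of 0.4 for x = 4, four iterations converge to within 1e-6 of the exact value 0.5; each step roughly doubles the number of correct bits.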
For more details on the ARM NEON processing engine, including application
benchmarks, refer to the NEON Technology Introduction presentation.
Jazelle Dynamic Byte-Code Compiler Support
Each Cortex-A9 CPU includes support for ARM Jazelle technology, a combined
hardware and software solution. ARM Jazelle software is a full-featured,
multi-tasking Java Virtual Machine (JVM), highly optimized to exploit Jazelle
architecture extensions.
The ARM Jazelle Direct Bytecode eXecution (DBX) technology supports direct
execution of Java bytecodes. This flexibility allows a Cortex-A9 processor to efficiently
run an established operating system, middleware, and Java applications.
The ARM Jazelle Runtime Compilation Target (RCT) technology supports efficient
ahead-of-time (AOT) and just-in-time (JIT) compilation with Java and other execution
environments.
TrustZone System-Wide Security
ARM's TrustZone technology lets you create secure subsystem extensions throughout
the HPS and FPGA fabric. This system-wide approach enables secure transactions
between the processors, peripherals, and memory, ensuring that errant or malicious
software cannot interact with or record data traffic between devices in the secure
domain.
The processor, DMA controller, and FPGA-to-HPS bridge support security on a per-
transaction basis. The SDRAM controller subsystem supports secure and non-secure
regions for each port of the SDRAM multiport front end. At boot time the processors
and slave ports in the system are set to a secure state. The security settings for most of
the slave ports can be modified under software control.
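The per-region security of the SDRAM controller can be sketched as an access check. In this simplified C model, a secure transaction may access any region, a non-secure transaction may access only regions marked non-secure, and addresses outside every listed region are treated as secure, matching the secure-by-default state at boot. The region table and function are hypothetical and do not reflect the controller's actual register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor: [base, base + size) is secure or non-secure. */
typedef struct {
    uint32_t base;
    uint32_t size;
    bool     secure;
} mem_region;

/* Simplified TrustZone-style check for one transaction. */
static bool access_allowed(const mem_region *regions, size_t count,
                           uint32_t addr, bool transaction_is_secure)
{
    assert(regions != NULL || count == 0);
    if (transaction_is_secure)
        return true;                                   /* secure may access all */
    for (size_t i = 0; i < count; i++) {
        if (addr - regions[i].base < regions[i].size)  /* unsigned range check */
            return !regions[i].secure;
    }
    return false;   /* unlisted addresses default to secure */
}
```

A non-secure master that addresses a secure region is refused, so errant or malicious non-secure software cannot read or modify secure data.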
Generic Interrupt Controller
The Generic Interrupt Controller (GIC) is shared by both Cortex-A9 CPUs, as shown
in Figure 6. There are over 130 unique interrupt sources, including the dedicated HPS
peripherals and functions implemented in the FPGA fabric. There are up to 64 unique
interrupts originating from IP in the FPGA fabric. The FPGA configuration manager
and the timeout signals from the FPGA-to-HPS and HPS-to-FPGA bus bridges are
also potential interrupt sources.
For some peripherals, such as the USB controllers, multiple interrupt sources are
combined into a single interrupt to the processors. Each USB controller supports up to
32 interrupts from individual USB endpoints, all of which are combined into a single
interrupt to the processor.
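Because the USB endpoint interrupts share one processor interrupt, the interrupt handler must determine which endpoints need service. This C sketch assumes a hypothetical 32-bit pending mask with one bit per endpoint; the actual USB controller status registers are not described in this brief.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending mask: bit n set means USB endpoint n needs service.
 * The combined processor interrupt is asserted if any bit is set. */
static bool usb_combined_irq(uint32_t ep_pending)
{
    return ep_pending != 0;
}

/* Demultiplex in the shared handler: list pending endpoints, lowest first,
 * into `eps` and return how many there are. */
static int usb_pending_endpoints(uint32_t ep_pending, int eps[32])
{
    assert(eps != NULL);
    int n = 0;
    for (int ep = 0; ep < 32; ep++) {
        if (ep_pending & (UINT32_C(1) << ep))
            eps[n++] = ep;
    }
    return n;
}
```

One interrupt line therefore serves up to 32 sources, at the cost of a scan in software to find which endpoints raised it.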
Private CPU Timers
As shown in Figure 3, each Cortex-A9 processor has its own private timers; the two
processors also share a global timer. Private timers are accessible only from the associated
processor core. Each CPU has a private 32-bit interval timer and a private 32-bit watchdog timer.
The 32-bit interval timer has a programmable 8-bit prescaler, supports single-shot or
auto-reload modes, and has an optional processor interrupt.
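The ARM Cortex-A9 MPCore Technical Reference Manual gives the private timer interval as ((prescaler + 1) × (load + 1)) / PERIPHCLK. This C sketch inverts that formula to find the load value for a desired period. The brief does not state the timer clock frequency, so the 200-MHz clock in the example after the code is an assumption, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Load value for an interval of `period_ns`, using
 * interval = (prescaler + 1) * (load + 1) / clock_hz.
 * Returns -1 if the period cannot be represented with this prescaler. */
static int64_t timer_load_for_period(uint64_t period_ns, unsigned prescaler, uint64_t clock_hz)
{
    assert(prescaler <= 255u);                               /* 8-bit prescaler */
    assert(clock_hz != 0 && period_ns <= UINT64_MAX / clock_hz);
    uint64_t ticks = period_ns * clock_hz / 1000000000u / (prescaler + 1u);
    if (ticks == 0 || ticks - 1u > UINT32_MAX)               /* 32-bit load value */
        return -1;
    return (int64_t)(ticks - 1u);
}
```

With an assumed 200-MHz clock and a prescaler of 0, a 1-ms period needs a load value of 199,999; a prescaler of 3 reduces it to 49,999. Periods too long for the 32-bit counter need a larger prescaler.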
Figure 6. Interrupt Sources to Interrupt Controller
(The generic interrupt controller serves ARM Cortex-A9 CPU 0 and CPU 1. Interrupt sources: DMA controller, Ethernet 1:0, USB 1:0, CAN 1:0, MMC/SD, NAND flash, quad SPI flash, SPI 3:0, I2C 3:0, UART 1:0, GPIO 2:0, timer 3:0, watchdog 1:0, PLL lock, FPGA manager, FPGA-based IP, and the FPGA2SoC and SoC2FPGA bridge timeouts. The diagram also shows the CPU0 and CPU1 caches, the SCU, the L2 cache, and DDR SDRAM.)
The 32-bit watchdog timer is primarily designed to react to errant programs by
resetting the CPU. When the watchdog timer is enabled, application software must
periodically reload the counter. A counter that reaches zero implies that the
application software is stuck in an infinite loop or is otherwise locked, so the system
resets the CPU. When not used as a watchdog, the timer can serve as a second
interval timer.
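The watchdog behavior described above can be modeled in a few lines of C: the counter counts down, each kick from application software restores the reload value, and reaching zero marks the CPU for reset. This is a hypothetical software model with hypothetical names, not a driver for the watchdog registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a down-counting watchdog. */
typedef struct {
    uint32_t reload;   /* value restored on each kick */
    uint32_t count;    /* current counter value */
    bool     expired;  /* set when the counter reaches zero (CPU reset) */
} watchdog;

static void wdt_start(watchdog *w, uint32_t reload)
{
    assert(reload > 0);
    w->reload = reload;
    w->count = reload;
    w->expired = false;
}

/* Application software calls this periodically to show it is still running. */
static void wdt_kick(watchdog *w) { w->count = w->reload; }

/* Advance the counter by `ticks`; returns true once the watchdog has fired. */
static bool wdt_tick(watchdog *w, uint32_t ticks)
{
    if (ticks >= w->count) {
        w->count = 0;
        w->expired = true;
    } else {
        w->count -= ticks;
    }
    return w->expired;
}
```

With a reload value of 100, software that kicks at least every 99 ticks never triggers a reset; a program stuck in a loop stops kicking, and the counter reaches zero.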
Shared Global Timer
Both CPUs share a global 64-bit auto-incrementing timer, which is primarily used by
the operating system. Each CPU has a private 64-bit compare value that generates
private interrupts when the counter reaches the specified value.
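Because the CPUs are 32-bit, software typically reads the 64-bit global counter as two 32-bit halves. If the lower half wraps between the two reads, the halves are inconsistent, so a common technique is to read the upper half, the lower half, and the upper half again, retrying until both upper reads match. In this C sketch, the register accessor is passed in as a function, and `sim_counter` is a hypothetical stand-in for the hardware that advances one tick per read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads one 32-bit half of the counter: nonzero `upper` selects bits [63:32]. */
typedef uint32_t (*read_half_fn)(void *ctx, int upper);

/* Read upper, lower, upper again; retry until both upper reads agree. */
static uint64_t read_global_timer(read_half_fn read_half, void *ctx)
{
    assert(read_half != NULL);
    uint32_t hi, lo, hi_again;
    do {
        hi = read_half(ctx, 1);
        lo = read_half(ctx, 0);
        hi_again = read_half(ctx, 1);
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical stand-in for the hardware: advances one tick after every read. */
typedef struct { uint64_t value; } sim_counter;

static uint32_t sim_read_half(void *ctx, int upper)
{
    sim_counter *c = ctx;
    uint32_t half = upper ? (uint32_t)(c->value >> 32) : (uint32_t)c->value;
    c->value++;
    return half;
}
```

Starting one tick before the lower half wraps, the first pass reads an upper half of 0 and a lower half of 0, which would give a time of 0; the retry detects the changed upper half and returns a consistent value instead.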
HPS Boot Options
The HPS of the Altera SoC FPGA can operate independently from the FPGA fabric.
Booting the processor does not depend on the FPGA fabric unless the fabric is selected as the boot source.
After a power-on reset or processor reset, one processor begins executing boot code from on-chip
ROM while the second processor is
held in reset. The boot processor reads the state of three boot select pins that specify
where the initial software image is stored, typically in flash memory. In the case of a
boot from flash memory, the ROM code initializes the boot source interface, copies the
initial software image to the internal SRAM, and then executes the program. The
initial software image includes the entry point to the user's application software
image. In the case of a boot from the FPGA fabric, the CPU waits until the FPGA
fabric reports that it is configured and has entered user mode before the initial
software image is copied from the FPGA fabric to internal SRAM and program
execution begins.
The processor can boot from any of the following sources:
■ External quad serial peripheral interface (quad SPI) flash memory (NOR)
■ External NAND flash memory (Open NAND Flash Interface (ONFI)
1.0-compliant)
■ External MultiMediaCard (MMC)/Secure Digital (SD) flash memory
■ Via the FPGA fabric—the CPU waits until the FPGA fabric reports that it is
configured and has entered user mode
The initial software image also contains various elements to protect the image against
accidental or intentional modification. If the processor cannot successfully boot from
the specified location, the HPS automatically reverts to the location of the last image
loaded. If there are no previous images loaded, then the system attempts to boot from
the FPGA fabric.
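The fallback rules above can be written as a small decision function. In this C sketch, the boot source names are hypothetical, and a bit mask stands in for whether an image at each source loads and passes its integrity checks. The brief does not say what happens if the fallback location also fails, so the sketch simply reports that no source is bootable.

```c
#include <assert.h>

/* Hypothetical names for the boot sources listed above. */
typedef enum { BOOT_NONE, BOOT_QSPI, BOOT_NAND, BOOT_SDMMC, BOOT_FPGA } boot_source;

/* `bootable`: bit (1u << src) set means the image at `src` loads and passes
 * its integrity checks. `last_loaded` is BOOT_NONE if no image was loaded. */
static boot_source choose_boot_source(boot_source selected, boot_source last_loaded,
                                      unsigned bootable)
{
    assert(selected != BOOT_NONE);
    if (bootable & (1u << selected))
        return selected;                     /* normal case: boot select pins */
    if (last_loaded != BOOT_NONE)            /* revert to the last image's location */
        return (bootable & (1u << last_loaded)) ? last_loaded : BOOT_NONE;
    /* no previous image: try the FPGA fabric */
    return (bootable & (1u << BOOT_FPGA)) ? BOOT_FPGA : BOOT_NONE;
}
```

For example, if NAND is selected but its image is corrupt and the last image came from quad SPI flash, the sketch boots from quad SPI flash.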
System Interconnect
The remainder of the HPS is located outside of the MPU subsystem, as shown in
Figure 7. The processor accesses the rest of the HPS through a pair of 64-bit Advanced
Microcontroller Bus Architecture (AMBA®) Advanced eXtensible Interface (AXI™)
masters. The high-bandwidth peripherals, including the FPGA data ports, connect to
the L3 interconnect structure. The L3 interconnect is further partitioned into three
major sub-switches. The L3 interconnect uses a multilayer, non-blocking architecture
that supports multiple, simultaneous transactions between peripherals, sub-switches,
SDRAM, and the MPU subsystem. Each L3 bus master has programmable priority.
SoC FPGAs use the 32-bit AMBA High-performance Bus (AHB™) as a low-power bus
to low-power peripherals, and high-performance 64-bit AXI to high-performance
peripherals. The lower-bandwidth peripherals reside on the level 4 (L4) bus,
implemented as a 32-bit ARM Advanced Peripheral Bus (APB™). Some peripherals,
like the DMA controller, have low-bandwidth control connections on the L4
interconnect and high-bandwidth data transfer connections on the L3 interconnect.
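The brief states that each L3 bus master has programmable priority but does not describe the arbitration policy. Assuming a simple fixed-priority scheme, an arbiter grants the requesting master with the highest programmed priority. This C sketch (hypothetical names) breaks ties in favor of the lowest-numbered master; the actual interconnect arbitration may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Grant the requesting master with the highest programmed priority; ties go
 * to the lowest master number. Returns -1 if no master is requesting. */
static int arbitrate(uint32_t requests, const uint8_t priority[], int num_masters)
{
    assert(num_masters >= 0 && num_masters <= 32);
    int winner = -1;
    for (int m = 0; m < num_masters; m++) {
        if (!(requests & (UINT32_C(1) << m)))
            continue;                        /* master m is not requesting */
        if (winner == -1 || priority[m] > priority[winner])
            winner = m;
    }
    return winner;
}
```

Raising a master's programmed priority lets it win contention for a shared slave, such as SDRAM, over lower-priority masters.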
Figure 7. SoC FPGA Hard Processor System Connections
(Block diagram: the MPU subsystem, an ARM Cortex-A9 MPCore with CPU0, CPU1, the SCU, and the L2 cache, connects to the L3 interconnect (NIC-301) and the L3 slave peripherals.)