aib-01020-soc-fpga-cortex-a9-processor


SoC FPGA ARM Cortex-A9 MPCore Processor Advance Information Brief

February 2012, Altera Corporation, AIB-01020-1.1

© 2012 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark Office and in other countries. All other words and logos identified as trademarks or service marks are the property of their respective holders as described at www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services. 101 Innovation Drive, San Jose, CA 95134, www.altera.com. ISO 9001:2008 Registered.

This document describes the dual-core ARM® Cortex™-A9 MPCore™ processor integrated in the hard processor system (HPS) of the Altera Cyclone® V and Arria® V system on a chip (SoC) FPGAs. This innovative HPS contains a microprocessor unit (MPU) with a dual-core ARM Cortex-A9 MPCore 32-bit application-class processor, memory controllers, and a rich set of system peripherals, hardened in Altera's most advanced 28-nm FPGA fabric. These SoC FPGAs provide the performance, power, and cost savings of hard logic, with the flexibility and time-to-market benefits of programmable logic.

ARM Cortex-A9 MPCore Processor Architecture

ARM processors are the standard in embedded systems.
Altera SoC FPGAs use the Cortex-A9, one of ARM's latest high-performance processor architectures. The Cortex-A9 architecture provides industry-leading performance and the latest ARM features and capabilities, and is widely deployed in products ranging from wireless handsets to tablet computers. Figure 1 shows the progression of ARM processor performance and capability.

Figure 1. ARM Processor Family Lineup (chart of performance and functionality against capability. Processors shown: ARM7, ARM9, ARM11, Cortex-M0, Cortex-M1, Cortex-M3, Cortex-M4, Cortex-R4, Cortex-R5, Cortex-R7, Cortex-A5, Cortex-A8, Cortex-A9, Cortex-A15. Groupings: Microcontroller, Real-Time Control (Cortex-R Series), Application Processors (Cortex-A Series).)

The dual-core ARM Cortex-A9 MPCore processor in Altera SoC FPGAs is designed for maximum performance and power efficiency. It implements the widely supported ARMv7 instruction set architecture to address a broad range of industrial, automotive, and wireless applications. The Cortex-A9 MPCore processor architecture includes the following features:

■ Dual-core multiprocessing, supporting both symmetric multiprocessing (SMP) and asymmetric multiprocessing (AMP)
■ Multi-issue superscalar, out-of-order, speculative execution 8-stage pipeline delivering 2.5 DMIPS/MHz per CPU
■ Advanced branch prediction
■ Single- and double-precision IEEE standard 754-1985 floating-point mathematical operations
■ ARM NEON™ 128-bit single instruction multiple data (SIMD) media processing engine
■ Jazelle® byte-code dynamic compiler support
■ TrustZone® architecture for enhanced system security

Note: For more details about ARM Cortex-A9 processors, refer to the ARM Cortex-A9 Processors white paper.

As shown in Figure 2, the Cortex-A9 processor architecture supports the ARM performance-optimized instruction set and the latest memory-optimized Thumb®-2 mixed instruction set.
The Thumb-2 instruction set optimizes processing performance in systems with narrow memory data paths and improves energy efficiency. On average, Thumb-2 code has a 31% smaller memory footprint than equivalent 32-bit ARM code and runs 38% faster than code written for the original Thumb instruction set.

Figure 2. ARM Processor Family Architectural Features (chart of architectural features against processor architecture, in order of increasing capability, for microcontroller-oriented processors and for application and real-time processors. Features shown: ARM performance-optimized 32-bit instruction set, Thumb 16-bit instruction set, Thumb-2 memory-optimized instruction set, IEEE-754 floating-point arithmetic, Jazelle bytecode dynamic compiler support, nested vectored interrupts, wake-up interrupt controller, TrustZone system security, SIMD processing, NEON media engine, virtualization.)

Figure 3 provides a detailed block diagram of the MPU subsystem. The MPU subsystem includes two Cortex-A9 processor cores, the level 2 (L2) cache and memory subsystem, Snoop Control Unit (SCU), Accelerator Coherency Port (ACP), and debug functions.

Cache Memory

Cache memory improves the performance of a processor-based system and helps reduce power consumption. Cache memory that is tightly integrated with an associated processor core is called level 1 (L1) cache. Each Cortex-A9 CPU has two independent 32-kilobyte (KB) L1 caches, one for instructions and one for data, allowing simultaneous instruction fetches and data accesses.
Each L1 cache is 4-way set associative and has an eight-word line length.

Figure 3. MPU Subsystem (block diagram. Processor cores and L1 cache: two 800 MHz ARM Cortex-A9 32-bit dual-issue superscalar RISC CPUs, each with a 32-KB L1 data cache, a 32-KB instruction cache, a NEON DSP/media SIMD processing engine, IEEE 754 floating point (single- and double-precision), Jazelle bytecode dynamic compiler support, a private 32-bit timer, a private watchdog timer, and a private MMU with TrustZone security. L2 cache and shared blocks: SCU, ACP, interrupt controller, 512-KB L2 unified cache with ECC, cache lockdown support. Debugging: CoreSight multicore debug and trace, with a performance monitor and program trace for each CPU, event trace, and cross triggering.)

The HPS also includes a 512-KB shared, unified L2 cache that holds instructions and data for both Cortex-A9 cores. The L2 cache is 8-way set associative with programmable locking by line, way, and master. The L2 cache includes error correction code (ECC) reporting.

Snoop Control Unit (SCU)

The snoop control unit (SCU) is an integral part of the cache memory system and manages data traffic between the two Cortex-A9 CPUs and the memory system, including the L2 cache. In a multiprocessor system, each CPU may operate on shared data. The SCU ensures that each processor core operates on the most up-to-date copy of data, maintaining cache coherency. Figure 4 shows the coherent memory, SCU, and ACP. The SCU maintains bidirectional coherency among the L1 data caches, ensuring that both CPUs access the most recent data.
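As a rough worked example, the cache parameters quoted in the Cache Memory section determine how many sets each cache has and how an address splits into offset and index bits. The Python sketch below assumes 4-byte words, so that an eight-word line is 32 bytes. This brief does not give the L2 line length, so the sketch assumes the same 32-byte line for the L2 cache. The function name `cache_geometry` is illustrative only.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder or sets == 0 or sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("expected a power-of-two set count and line size")
    offset_bits = line_bytes.bit_length() - 1  # log2(line size in bytes)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


WORD_BYTES = 4                   # assumption: 32-bit (4-byte) words
L1_LINE_BYTES = 8 * WORD_BYTES   # eight-word line length from the text
L2_LINE_BYTES = 32               # assumption: not stated in this brief

l1 = cache_geometry(32 * 1024, ways=4, line_bytes=L1_LINE_BYTES)
l2 = cache_geometry(512 * 1024, ways=8, line_bytes=L2_LINE_BYTES)
print("L1 (sets, offset bits, index bits):", l1)
print("L2 (sets, offset bits, index bits):", l2)
```

Under these assumptions each L1 cache has 256 sets and the L2 cache has 2,048 sets. A different L2 line length would change the L2 numbers accordingly.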
When a CPU writes to any coherent memory location, the SCU ensures that the relevant data is coherent (updated, tagged, or invalidated). Similarly, the SCU monitors read operations from coherent memory locations. If the required data is already stored in an L1 cache, the data is returned directly to the requesting CPU. If the data is not in an L1 cache, the L2 cache is checked before the data is finally retrieved from main memory. The SCU also manages accesses from the ACP and arbitrates between the Cortex-A9 CPUs if both attempt simultaneous access to the L2 cache.

Accelerator Coherency Port

The ACP allows level 3 (L3) interconnect masters, such as the Ethernet media access controller (EMAC), the direct memory access (DMA) controller, and the FPGA-to-HPS bridge, to share data coherently with the MPU subsystem. With the ACP, read accesses to coherent memory regions always return the most current data, whether it is in L1 cache, L2 cache, or main memory. Similarly, write operations to coherent memory regions cause the SCU to force coherence before the data is forwarded to the memory system.

Figure 4. Coherent Memory, SCU, and ACP (block diagram. Labels: two ARM Cortex-A9 32-bit dual-issue superscalar RISC CPUs, each with an L1 data cache and an instruction cache; SCU; L2 unified cache; ACP; ACP ID mapper; L3 interconnect; HPS peripherals; FPGA fabric; bidirectional coherency; coherent memory.)

The ACP ID mapper is located between the L3 interconnect and the ACP. The ACP is designed to support up to eight concurrent transactions, each with a unique transaction ID. However, the FPGA fabric can have any number of masters requesting coherent transactions.
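One way to picture how a fixed pool of eight transaction IDs can serve an arbitrary number of masters is a small allocator. The Python sketch below is a toy software model only: the class name `AcpIdPool`, the first-come, first-served queueing, and the immediate hand-over of a released ID are all assumptions made for illustration, since this brief does not describe the allocation policy of the hardware.

```python
from collections import deque


class AcpIdPool:
    """Toy model: share a fixed set of transaction IDs among many masters."""

    def __init__(self, num_ids=8):          # eight IDs, as stated in the text
        self.free = deque(range(num_ids))   # IDs not currently in use
        self.in_flight = {}                 # transaction ID -> owning master
        self.waiting = deque()              # masters waiting for an ID (assumed FIFO)

    def request(self, master):
        """Grant a free ID, or queue the master and return None."""
        if self.free:
            tid = self.free.popleft()
            self.in_flight[tid] = master
            return tid
        self.waiting.append(master)
        return None

    def complete(self, tid):
        """Release an ID; hand it to the next waiting master, if any."""
        del self.in_flight[tid]
        if self.waiting:
            next_master = self.waiting.popleft()
            self.in_flight[tid] = next_master
            return next_master
        self.free.append(tid)
        return None


pool = AcpIdPool()
granted = [pool.request(f"master{i}") for i in range(10)]  # ten masters, eight IDs
handed_over = pool.complete(3)  # ID 3 finishes and passes to the first waiter
print(granted, handed_over)
```

In this model the first eight masters receive IDs 0 through 7, the remaining two wait, and a waiting master proceeds as soon as any transaction completes.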
The ACP ID mapper dynamically allocates the eight available transaction IDs to the requesting masters so that all masters have access to coherent memory regions.

IEEE Standard 754-1985 Floating-Point Unit

Both ARM Cortex-A9 processor cores include full support for IEEE standard 754-1985 floating-point operations, which are important for imaging, signal processing, scientific computing, and graphics. The floating-point unit (FPU) fully supports single- and double-precision add, subtract, multiply, divide, multiply/accumulate, and square-root operations. The FPU also converts between floating-point data formats and integers, including the special round-toward-zero operations required by high-level languages. The FPU greatly increases performance for applications that rely heavily on floating-point arithmetic, such as advanced control algorithms, imaging (scaling, 3D transforms), fast Fourier transforms (FFT), and digital filtering in graphics.

NEON Media Processing Engine

Both ARM Cortex-A9 processor cores include an ARM NEON media processing engine (MPE) that supports simultaneous operations on multiple data, also called SIMD processing, as shown in Figure 5. The NEON processing engine accelerates multimedia and signal processing algorithms, such as video encoding and decoding, 2D/3D graphics, audio and speech processing, image processing, telephony, and sound synthesis. The SIMD architecture completes some signal processing algorithms up to eight times faster than a scalar processor.

Figure 5. SIMD Processing (diagram: a single instruction applies the same operation, Op, to four lanes of two source registers and writes the results to a destination register.)

The Cortex-A9 NEON MPE performs operations on the following data types:

■ SIMD and scalar single-precision floating-point computations
■ Scalar double-precision floating-point computation
■ SIMD and scalar half-precision floating-point conversion
■ 8-, 16-, 32-, and 64-bit signed and unsigned integer SIMD computation
■ 8- or 16-bit polynomial computation for single-bit coefficients

The available operations include the following functionality:

■ Addition and subtraction
■ Multiplication with optional accumulation
■ Maximum or minimum value-driven lane selection operations
■ Inverse square-root approximation
■ Comprehensive data-structure load instructions, including register-bank-resident table lookup

Note: For more details on the ARM NEON processing engine, including application benchmarks, refer to the NEON Technology Introduction presentation.

Jazelle Dynamic Byte-Code Compiler Support

Each Cortex-A9 CPU includes support for ARM Jazelle technology, a combined hardware and software solution. ARM Jazelle software is a full-featured, multitasking Java Virtual Machine (JVM), highly optimized to exploit the Jazelle architecture extensions. The ARM Jazelle Direct Bytecode eXecution (DBX) technology supports direct execution of Java bytecodes. This flexibility allows a Cortex-A9 processor to efficiently run an established operating system, middleware, and Java applications. The ARM Jazelle Runtime Compilation Target (RCT) technology supports efficient ahead-of-time (AOT) and just-in-time (JIT) compilation with Java and other execution environments.

TrustZone System-Wide Security

ARM's TrustZone technology lets you create secure subsystem extensions throughout the HPS and FPGA fabric.
This system-wide approach enables secure transactions between the processors, peripherals, and memory, ensuring that errant or malicious software cannot interact with or record data traffic between devices in the secure domain. The processor, DMA controller, and FPGA-to-HPS bridge support security on a per-transaction basis. The SDRAM controller subsystem supports secure and non-secure regions for each port of the SDRAM multiport front end. At boot time, the processors and slave ports in the system are set to a secure state. The security settings for most of the slave ports can be modified under software control.

Generic Interrupt Controller

The Generic Interrupt Controller (GIC) is shared by both Cortex-A9 CPUs, as shown in Figure 6. There are over 130 unique interrupt sources, including the dedicated HPS peripherals and functions implemented in the FPGA fabric. Up to 64 unique interrupts can originate from IP in the FPGA fabric. The FPGA configuration manager and the timeout signals from the FPGA-to-HPS and HPS-to-FPGA bus bridges are also potential interrupt sources. For some peripherals, such as the USB controllers, multiple interrupt sources are combined into a single interrupt to the processors. Each USB controller supports up to 32 interrupts from individual USB endpoints, all of which are combined into a single interrupt to the processor.

Figure 6. Interrupt Sources to Interrupt Controller (diagram: the interrupt sources feed the Generic Interrupt Controller, which serves ARM Cortex-A9 CPU 0 and CPU 1. Sources shown: ARM Cortex-A9 CPU0/caches, ARM Cortex-A9 CPU1/caches, SCU, L2 cache, DDR SDRAM, DMA controller, Ethernet 1:0, USB 1:0, CAN 1:0, MMC/SD, NAND flash, quad SPI flash, SPI 3:0, I2C 3:0, UART 1:0, GPIO 2:0, Timer 3:0, Watchdog 1:0, PLL lock, FPGA manager, FPGA-based IP, FPGA2SoC bridge timeout, SoC2FPGA bridge timeout.)

Private CPU Timers

As shown in Figure 3, each Cortex-A9 processor has its own private timers and shares a global timer with the other processor. Private timers are accessible only from the associated processor core. Each CPU has a private 32-bit interval timer and a private 32-bit watchdog timer. The 32-bit interval timer has a programmable 8-bit prescaler, supports single-shot or auto-reload modes, and has an optional processor interrupt.

The 32-bit watchdog timer is primarily designed to react to errant programs by resetting the CPU. When the watchdog timer is enabled, application software must periodically reset the counter. If the counter reaches zero, the application software is presumed to be stuck in an infinite loop or otherwise locked up, and the system resets the CPU. When it is not used as a watchdog, the timer can serve as a second interval timer.

Shared Global Timer

Both CPUs share a global 64-bit auto-incrementing timer, which is primarily used by the operating system. Each CPU has a private 64-bit compare value that generates a private interrupt when the counter reaches the specified value.

HPS Boot Options

The HPS of the Altera SoC FPGA can operate independently from the FPGA fabric. Processor booting does not depend on the FPGA fabric unless desired. After a power-on reset or processor reset, one processor begins executing from an on-chip ROM containing the primary bootloader program while the second processor is held in reset. The boot processor reads the state of three boot select pins that specify where the initial software image is stored, typically in flash memory. In the case of a boot from flash memory, the ROM code initializes the boot source interface, copies the initial software image to the internal SRAM, and then executes the program.
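The flash boot sequence just described can be sketched as a toy Python model. This brief does not give the encoding of the boot select pins, the SRAM size, or the entry address, so the sketch only packs the three pin levels into a number and treats the 64-KB SRAM size and the entry offset of zero as assumptions. The function names `read_boot_select` and `boot_from_flash` are hypothetical.

```python
SRAM_BYTES = 64 * 1024  # assumption: the SRAM size is not stated in this brief


def read_boot_select(pin_levels):
    """Pack three boot select pin levels (most significant first) into 0..7."""
    if len(pin_levels) != 3:
        raise ValueError("expected exactly three boot select pins")
    value = 0
    for level in pin_levels:
        value = (value << 1) | int(bool(level))
    return value


def boot_from_flash(flash_image, sram):
    """Copy the initial software image into SRAM and return its entry offset."""
    if len(flash_image) > len(sram):
        raise ValueError("initial software image does not fit in SRAM")
    sram[:len(flash_image)] = flash_image  # models the ROM code copying the image
    return 0                               # assumption: entry at the image start


sram = bytearray(SRAM_BYTES)
bsel = read_boot_select([1, 0, 1])                   # example pin levels only
entry = boot_from_flash(b"\x01\x02\x03\x04", sram)   # stand-in for a flash image
print(bsel, entry, bytes(sram[:4]))
```

The size check stands in for whatever validation the real ROM code performs; the brief says only that the image carries elements that protect it against modification.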
The initial software image includes the entry point to the user's application software image. In the case of a boot from the FPGA fabric, the CPU waits until the FPGA fabric reports that it is configured and has entered user mode before the initial software image is copied from the FPGA fabric to internal SRAM and program execution begins.

The processor can boot from any of the following sources:

■ External quad Serial Peripheral Interface (SPI) flash memory (NOR)
■ External NAND flash memory (Open NAND Flash Interface (ONFI) 1.0 compliant)
■ External MultiMediaCard (MMC)/Secure Digital (SD) flash memory
■ The FPGA fabric, once it reports that it is configured and has entered user mode

The initial software image also contains various elements to protect the image against accidental or intentional modification. If the processor cannot successfully boot from the specified location, the HPS automatically reverts to the location of the last image loaded. If no previous image has been loaded, the system attempts to boot from the FPGA fabric.

System Interconnect

The remainder of the HPS is located outside of the MPU subsystem, as shown in Figure 7. The processor accesses the rest of the HPS through a pair of 64-bit Advanced Microcontroller Bus Architecture (AMBA®) Advanced eXtensible Interface (AXI™) masters. The high-bandwidth peripherals, including the FPGA data ports, connect to the L3 interconnect structure. The L3 interconnect is further partitioned into three major sub-switches. The L3 interconnect uses a multilayer, non-blocking architecture that supports multiple, simultaneous transactions between peripherals, sub-switches, SDRAM, and the MPU subsystem. Each L3 bus master has programmable priority.
SoC FPGAs use the 32-bit AMBA Advanced High-performance Bus (AHB™) as a low-power bus for low-power peripherals, and the 64-bit AXI for high-performance peripherals. The lower-bandwidth peripherals reside on the level 4 (L4) bus, implemented as a 32-bit ARM Advanced Peripheral Bus (APB™). Some peripherals, such as the DMA controller, have low-bandwidth control connections on the L4 interconnect and high-bandwidth data transfer connections on the L3 interconnect.

Figure 7. SoC FPGA Hard Processor System Connections (block diagram; the extracted labels are truncated. Labels recovered: CPU0, CPU1, SCU, L2 cache, ARM Cortex-A9 MPCore, MPU subsystem, L3 interconnect (NIC-301), L3 Slave Peripheral S...)
