VEGAS: Soft Vector Processor with Scratchpad Memory

Christopher Han-Yu Chou, Aaron Severance, Alex D. Brant, Zhiduo Liu, Saurabh Sant, Guy Lemieux
University of British Columbia

Motivation
- Embedded processing on FPGAs
  - High performance, computationally intensive
  - Soft processors, e.g. Nios/MicroBlaze, too slow
- How to deliver high performance?
  - Multiprocessor on FPGA
  - Custom hardware accelerators (Verilog RTL)
  - Synthesized accelerators (C to FPGA)

Motivation
- Soft vector processor to the rescue
- Previous works have demonstrated the soft vector processor as a viable option to provide:
  - Scalable performance and area
  - Purely software-based
  - Decouples hardware/software development
- Key performance bottlenecks
  - Memory access latency
  - On-chip data storage efficiency

Contribution
- VEGAS architecture key features
  - Cacheless scratchpad memory
  - Fracturable ALUs
  - Concurrent memory access via DMA
- Advantages
  - Eliminates on-chip data replication
    - Also: huge # of vectors, long vector lengths
  - More parallel ALUs
  - Fewer memory loads/stores

VEGAS Architecture
- Scalar core: Nios II/f @ 200MHz
- DMA engine & external DDR2
- Vector core: VEGAS @ 120MHz
- Concurrent execution, FIFO synchronized

Scratchpad Memory in Action
[Figure: vector scratchpad memory feeding Vector Lanes 0-3, with srcA, srcB, and Dest operands]

Scratchpad Advantage
- Performance
  - Huge working set (256kB++)
  - Explicitly managed by software
  - Async load/store via concurrent DMA
- Efficient data storage
  - Double-clocked memory (traditional RF: 2x copies)
  - 8b data stays as 8b (traditional RF: 4x copies)
  - No cache (traditional RF: +1 copy)

Scratchpad Advantage
- Accessed by address register
  - Huge # of vectors in scratchpad
  - VEGAS uses only 8 vector address registers (V0..V7)
  - Modify content to access different vectors
  - Auto-increment lessens need to change V0..V7
- Long vector lengths
  - Fill entire scratchpad

Scratchpad Advantage: Median Filter
- Vector address registers easier than unrolling
- Traditional vector median filter

    For J = 0..12
      For I = J..24
        V1 = vector[I]            (vector load)
        V2 = vector[J]            (vector load)
        CompareAndSwap(V1, V2)
        vector[J] = V2            (vector store)
        vector[I] = V1            (vector store)

- Optimize away 1 vector load + 1 vector store using temp
- Total of 222 loads and 222 stores

Fracturable ALUs
- Multiplier: uses 4 x 16b multipliers
  - Multiplier also does shifts + rotate
- Adder: uses 4 x 8b adders

Fracturable ALUs Advantage
- Increased processing power
  - 4-lane VEGAS
    - 4 x 32b operations/cycle
    - 8 x 16b operations/cycle
    - 16 x 8b operations/cycle
- Median filter example
  - 32b data: 184 cycles/pixel
  - 16b data: 93 cycles/pixel
  - 8b data: 47 cycles/pixel

Area and Frequency (VEGAS)

    Num. Lanes    ALM      DSP    M9K    Fmax (MHz)
    1             3831     8      40     131
    2             4881     12     40     131
    4             6976     20     40     130
    8             11824    36     40     125
    16            19843    68     40     122
    32            36611    132    40     116

[Figure: ALM Usage]

Performance

    Benchmark    Nios II/f    VEGAS V1    VEGAS V32    Nios II / V32 speedup
    fir          509919       85549       4693         108x
    motest       1668869      82515       24717        67x
    median       1388         185         7            208x
    autocor      124338       45027       2822         44x
    conven       48988        3462        1897         25x
    imgblend     1231172      175890      35485        34x
    filt3x3      6556592      813471      75349        87x

Area-Delay Product
- Area*Delay measures "throughput per mm2"
- Compared to earlier vector processors, VEGAS offers 2-3x better throughput per unit area

Integer Matrix Multiply
- 4096 x 4096 integers (64MB data set)
- Intel Core 2 (65nm), 2.5GHz, 16GB DDR2
  - Vanilla IJK: 474s
  - Vanilla KIJ: 134s
  - Tiled IJK: 93s
  - Tiled KIJ: 68s
- VEGAS (65nm Altera Stratix III)
  - Vector: 44s (Nios only: 5407s)
  - 256kB scratchpad, 32 lanes (about 50% of chip)
  - 200MHz Nios, 100MHz vector, 1GB DDR2 SODIMM

Conclusions
- Vector processor
  - Purely software-based acceleration
  - No hardware design/RTL recompile needed; just program
- Faster chip design
  - Can build vector processor before software algorithms are finalized
  - Simple programming model
- Maps well to FPGA
  - Many small memories, multiplier blocks
- Should map well to ASIC

Conclusions
- Key features
  - Scratchpad memory
    - Enhances performance with fewer loads/stores
    - No on-chip data replication; efficient storage
    - Double-clocked to hide memory latency
  - Fracturable ALUs
    - Operate on 8b, 16b, 32b data efficiently
    - Single vector core accelerates many applications
- Result
  - 2-3x better Area-Delay product than VIPERS/VESPA
  - Outperforms Intel Core 2 at Integer Matrix Multiply

Issues/Future Work
- No floating-point yet
  - Adding "complex function" support, to include floating-point or similar operations
- Algorithms with only short vectors
  - Split vector processor into 2, 4, 8 pieces
  - Run multiple instances of algorithm
- Multiple vector processors
  - Connecting them to work cooperatively
  - Goals: increase throughput, exploit task-level parallelism (i.e., chaining or pipelining)
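The median filter slide describes a half-sort over 25 vectors built from a vector compare-and-swap. As a minimal sketch of that idea in plain Python (not VEGAS code: `compare_and_swap` and `vector_median` are hypothetical names, Python lists stand in for vectors in the scratchpad, and the inner loop starts at J+1 because comparing a vector with itself changes nothing):

```python
def compare_and_swap(a, b):
    """Elementwise compare-and-swap: return (minimums, maximums)."""
    lo = [min(x, y) for x, y in zip(a, b)]
    hi = [max(x, y) for x, y in zip(a, b)]
    return lo, hi


def vector_median(vectors):
    """Elementwise median across an odd number of equal-length vectors.

    After pass j, v[j] holds the j-th smallest value at every element
    position, so only the first half of the passes is needed.
    """
    v = [list(row) for row in vectors]   # work on a copy
    n = len(v)
    half = n // 2                        # 12 for a 25-vector (5x5) window
    for j in range(half + 1):            # J = 0..12 on the slide
        for i in range(j + 1, n):
            v[j], v[i] = compare_and_swap(v[j], v[i])
    return v[half]
```

Each element position here plays the role of one output pixel, so a longer vector processes more pixels per compare-and-swap.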
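The fracturable ALU slides say that one 32-bit lane can act as one 32b, two 16b, or four 8b units. A rough software model of a fractured adder, assuming wrap-around arithmetic and carries that stop at each sub-word boundary (`fractured_add` and `ops_per_cycle` are illustrative names, not part of any VEGAS API):

```python
def fractured_add(a, b, width):
    """Add two 32-bit words as independent sub-words of `width` bits.

    Carries are not propagated across sub-word boundaries, which is
    the behaviour assumed here for a fractured adder.
    """
    assert width in (8, 16, 32)
    mask = (1 << width) - 1
    out = 0
    for shift in range(0, 32, width):
        s = ((a >> shift) & mask) + ((b >> shift) & mask)
        out |= (s & mask) << shift       # drop the carry out of the sub-word
    return out


def ops_per_cycle(lanes, width, lane_bits=32):
    """Peak sub-word operations per cycle for a given lane count."""
    return lanes * (lane_bits // width)
```

With 4 lanes, `ops_per_cycle` gives 4, 8, and 16 for 32b, 16b, and 8b data, matching the figures on the slide.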
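The matrix multiply slide compares "IJK" and "KIJ" loop orders. The sketch below shows what those two orders usually mean for a square matrix product; it is an illustration in plain Python, not the benchmark code from the talk, and the function names are made up for this example. In the KIJ order the innermost loop is a scalar-times-row accumulate over contiguous data, which is the kind of operation a vector unit handles well.

```python
def matmul_ijk(a, b, n):
    """C = A * B with the i-j-k loop order (dot product per output element)."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += a[i][k] * b[k][j]   # walks down a column of b
            c[i][j] = acc
    return c


def matmul_kij(a, b, n):
    """C = A * B with the k-i-j loop order (row of C += scalar * row of B)."""
    c = [[0] * n for _ in range(n)]
    for k in range(n):
        row_b = b[k]
        for i in range(n):
            aik = a[i][k]
            row_c = c[i]
            for j in range(n):
                row_c[j] += aik * row_b[j]  # contiguous access to both rows
    return c
```

Both functions compute the same product; only the memory access pattern differs.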