Tuesday, April 11, 2017
The efficiency of memory continues to lag, according to one startup that believes it has an agnostic answer.
Performance-IP recently introduced its Memory Request Optimizer, a block of IP able to improve memory efficiency and increase the performance of a system on chip (SoC) by reducing latency between the memory subsystem and the SoC client. In a telephone interview with EE Times, Chief Technology Officer Gregg Recupero said the embedded IP manages widely divergent request streams to create a virtual locality of reference that makes requests appear more linear.
This improves memory bandwidth, he said, as most memory subsystems operate at less than 80 percent efficiency. This inefficiency slows the pipelined communication between the SoC clients and memory. “At the end of the day you need to keep the pipeline busy, and not just the CPU,” said Recupero. Other parts of the subsystem need better efficiency as well, such as the graphics processing unit (GPU), codec or video processor.
The company's benchmarks show its Memory Request Optimizer reduces read latency by between 71 percent and 78 percent. Unlike a memory scheduler, the IP is a memory prefetch engine that works with memory schedulers by grouping similar requests together. Recupero said it analyzes multiple concurrent request streams from clients and determines which requests should be optimized, or prefetched, and which should not. The result is high hit rates with ultra-low false fetch rates.
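Performance-IP has not published how its trackers decide which requests to prefetch, but the described behavior (watching each client's stream and prefetching only patterns that repeat, to keep false fetches low) resembles a classic per-client stride detector. The sketch below is purely illustrative; the class name, confidence threshold, and all details are assumptions, not the company's design.

```python
# Illustrative sketch only -- not Performance-IP's algorithm.
# A minimal per-client stride detector of the kind prefetch engines
# commonly use: it prefetches only after a stride repeats, which keeps
# the false-fetch rate low, as the article describes.

class StrideTracker:
    """Tracks one client's read stream and predicts the next address."""

    def __init__(self):
        self.last_addr = None
        self.stride = None
        self.confidence = 0  # consecutive matching strides observed

    def observe(self, addr):
        """Record a read request; return an address to prefetch, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride:
                self.confidence += 1
            else:
                self.stride = stride
                self.confidence = 1
            # Prefetch only once the pattern has repeated.
            if self.confidence >= 2 and self.stride != 0:
                prediction = addr + self.stride
        self.last_addr = addr
        return prediction

tracker = StrideTracker()
for addr in (0x1000, 0x1040, 0x1080, 0x10C0):
    hint = tracker.observe(addr)
# After repeated 0x40 strides, the tracker predicts the next address, 0x1100.
```

A real hardware tracker would be far more elaborate (finite tables, burst lengths, bandwidth throttling), but the core trade-off is the same: requiring a repeated pattern before prefetching trades a little coverage for a much lower false-fetch rate.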
“As new memory standards evolve, you need even more efficiency out of the memory subsystem,” Recupero said. “We like to say we recover lost system performance.”
When a client request has been optimized, it is stored in a request optimization buffer, a small micro-cache that holds optimized client requests until they are needed. Recupero said a multi-client interface that supports both the AXI and OCP protocols can manage up to 16 clients, with the number specified by the designer when configuring the technology.
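The article describes the request optimization buffer as a small micro-cache: prefetched responses are held there and served directly when a client asks for them, while misses fall through to the memory subsystem. A minimal sketch of that lookup behavior is below; the class name, capacity, and LRU replacement policy are assumptions for illustration, not details disclosed by Performance-IP.

```python
# Hypothetical sketch of a request optimization buffer: a small,
# fully associative micro-cache keyed by address, with LRU eviction.
# Sizes and policy are assumptions, not Performance-IP's implementation.
from collections import OrderedDict

class RequestOptimizationBuffer:
    def __init__(self, entries=4):
        self.entries = entries
        self.buf = OrderedDict()  # addr -> prefetched data, in LRU order

    def fill(self, addr, data):
        """Store a prefetched response, evicting the oldest if full."""
        if addr in self.buf:
            self.buf.move_to_end(addr)
        elif len(self.buf) >= self.entries:
            self.buf.popitem(last=False)  # drop least recently used
        self.buf[addr] = data

    def lookup(self, addr):
        """Serve a client read from the buffer if present (a hit)."""
        if addr in self.buf:
            self.buf.move_to_end(addr)
            return self.buf[addr]
        return None  # miss: the request goes on to the memory subsystem

rob = RequestOptimizationBuffer(entries=4)
rob.fill(0x2000, b"line0")          # prefetch engine fills the buffer
hit = rob.lookup(0x2000)            # later client read is a hit
miss = rob.lookup(0x3000)           # unprefetched address falls through
```

Because a hit is served from this small local buffer rather than from DRAM, the client sees far lower latency, which is the mechanism behind the latency reductions the company reports.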
The configuration tool automatically builds the specified number of client interfaces, each functioning independently and able to support concurrent operation. This allows the IP to service multiple concurrent client requests from the request optimization buffers. Consequently, the IP can supply a higher peak burst bandwidth than the underlying memory subsystem provides.
The IP can be implemented anywhere in the memory hierarchy, said Recupero. “It could be sitting right in front of your DDR controller. It reaps the benefits to any client that is trying to get to the memory subsystem.” As requests pass through the Memory Request Optimizer, they are analyzed by trackers, which determine which requests to prefetch into the request optimization buffers and which to leave alone. “It allows you to dynamically tune the power/performance profile you'd like to operate at.”
By: DocMemory Copyright © 2023 CST, Inc. All Rights Reserved