New Techniques Boost Performance of Non-Volatile Memory Systems
Researchers have developed new software and hardware designs that should limit programming errors and improve system performance in devices that use non-volatile memory (NVM) technologies.
October 17, 2017 Staff
Computer engineering researchers at North Carolina State University have developed new software and hardware designs that should limit programming errors and improve system performance in devices that use non-volatile memory (NVM) technologies.
“Currently, computers rely on dynamic random access memory (DRAM) for their operations,” says James Tuck, an associate professor of electrical and computer engineering at NC State and co-author of two papers on the work. “But DRAM has significant limitations, making it difficult to scale up to deal with next-generation systems.”
“As a result, next-generation computer systems will likely rely on emerging NVM technologies for both operations and data storage. Our work here is focused on addressing some of the programming and performance challenges inherent in shifting from a DRAM computing paradigm to NVM,” says Yan Solihin, a professor of electrical and computer engineering at NC State and co-author of the papers.
One challenge with NVM systems is determining how to log, or save, a chunk of memory before making changes to it. These logs allow users to restore memory to its earlier state if a system failure corrupts the memory that is being modified.
At present, logging in an NVM system would require programmers to incorporate additional code into their programs, which slows performance and increases the number of operations that write over memory. Memory reliability suffers if it is written over too often.
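As a rough illustration of the software logging described above, the sketch below models undo logging in Python. It is a toy model, not code from the papers: the dictionary standing in for NVM, the class name and the write counter are all assumptions made for this example. It shows why each logged update costs two writes, one for the saved original value and one for the new data.

```python
# Toy model of software undo logging. All names are hypothetical;
# a plain dict stands in for non-volatile memory.

_MISSING = object()  # marks "address had no value before the transaction"


class UndoLoggedMemory:
    def __init__(self):
        self.nvm = {}        # address -> value
        self.log = []        # undo log entries: (address, original value)
        self.nvm_writes = 0  # writes that would reach NVM (recovery not counted)

    def begin(self):
        self.log.clear()

    def store(self, addr, value):
        # Save the original value first so it can be restored after a failure.
        self.log.append((addr, self.nvm.get(addr, _MISSING)))
        self.nvm_writes += 1  # the log entry is itself a write to NVM
        self.nvm[addr] = value
        self.nvm_writes += 1  # the actual data update

    def commit(self):
        # Once the transaction is durable, the saved values are not needed.
        self.log.clear()

    def recover(self):
        # Roll back an unfinished transaction, newest entry first.
        for addr, original in reversed(self.log):
            if original is _MISSING:
                self.nvm.pop(addr, None)
            else:
                self.nvm[addr] = original
        self.log.clear()


mem = UndoLoggedMemory()
mem.begin()
mem.store("balance", 100)
mem.commit()

mem.begin()
mem.store("balance", 40)
mem.recover()  # simulated failure before commit: the update is rolled back

print(mem.nvm["balance"], mem.nvm_writes)  # 100 4
```

Two updates produced four counted writes in this model, which is the kind of extra write traffic the logging discussion above refers to.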
To address this, researchers have developed a system called Proteus, which includes a software model and complementary hardware.
Because NVM computers are at present largely theoretical, the researchers compared the performance of Proteus against other techniques in a detailed simulator.
Other techniques wrote to memory two to six times more than Proteus, meaning Proteus was much better at preserving the long-term reliability of memory.
“Compared to existing techniques, Proteus was able to log memory almost for free, in terms of writing to memory,” Solihin says.
Proteus also performed better than other techniques in terms of run speed, though the advantage there was more modest – a 9 percent to 11 percent improvement over the best existing techniques.
A second challenge with NVM systems has to do with how a system gives data an address so that it can be retrieved. Some programs require those addresses to be changed, for security and other reasons – but this can complicate programming and reduce performance in NVM systems.
To address this problem, researchers developed a hardware-driven technique that effectively creates permanent addresses for data, but allows programs to give pseudonyms to those addresses as needed.
“The programming still needs to account for the hardware, but it allows programmers to use the virtual memory approaches they’re used to,” Tuck says. “In simulations, our approach operated at least 1.5 times faster than previous techniques.”
Papers on both new techniques will be presented at the Annual IEEE/ACM International Symposium on Microarchitecture, being held Oct. 14-18 in Boston, Massachusetts.
Lead author of the first paper, “Proteus: A Flexible and Fast Software Supported Hardware Logging Approach for NVM,” is Seunghee Shin, a Ph.D. student at NC State. The paper was co-authored by Satish Kumar Tirukkovalluri, a Ph.D. student at NC State, along with Tuck and Solihin.
Lead author of the second paper, “Hardware Supported Persistent Object Address Translation,” is Tiancong Wang, a Ph.D. student at NC State. The paper was co-authored by Sakthikumaran Sambasivam, a Ph.D. student at NC State, along with Solihin and Tuck.
The work was supported in part by the National Science Foundation.
-shipman-
Note to Editors: The study abstracts follow.
“Proteus: A Flexible and Fast Software Supported Hardware Logging Approach for NVM”
Authors: Seunghee Shin, Satish Kumar Tirukkovalluri, James Tuck and Yan Solihin, North Carolina State University
Presented: Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 14-18, Boston, Mass.
Abstract: Emerging non-volatile memory (NVM) technologies, such as phase-change memory, spin-transfer torque magnetic memory, memristor, and 3D XPoint, are encouraging the development of new architectures that support the challenges of persistent programming. An important remaining challenge is dealing with the high logging overheads introduced by durable transactions.
In this paper, we propose a new logging approach for durable transactions, Proteus, that achieves the favorable characteristics of both prior software and hardware approaches. Like software, it has no hardware constraint limiting the number of transactions or logs available to it, and like hardware, it has very low overhead. Our approach introduces two new instructions: one that creates a log entry by loading the original data, and a log-flush instruction that writes the log entry to the log. We add hardware support, primarily within the core, to manage the execution of these instructions and critical ordering requirements between logging operations and updates to data. We also propose a novel optimization at the memory controller that is enabled by a persistent write pending queue in the memory controller. We drop log updates that have not yet been written back to non-volatile main memory (NVMM) by the time a transaction is considered durable.
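The write-dropping optimization described in this paragraph can be pictured with a small Python model. This is a sketch under stated assumptions, not the paper's hardware design: the class, its method names and the entry layout are hypothetical, and the ordering rules handled by the real hardware are not modeled. It only shows the counting effect: log entries still waiting in the queue when a transaction becomes durable never reach memory.

```python
from collections import deque

# Toy model of a persistent write pending queue. All names are hypothetical.


class WritePendingQueue:
    def __init__(self):
        self.pending = deque()  # entries: (kind, tx_id, addr, value)
        self.nvm_log_writes = 0
        self.nvm_data_writes = 0

    def log_flush(self, tx_id, addr, original_value):
        # Queue a log entry holding the original data.
        self.pending.append(("log", tx_id, addr, original_value))

    def store(self, tx_id, addr, new_value):
        # Queue the data update itself.
        self.pending.append(("data", tx_id, addr, new_value))

    def drain(self, count):
        # Write back up to `count` of the oldest pending entries.
        for _ in range(min(count, len(self.pending))):
            entry = self.pending.popleft()
            if entry[0] == "log":
                self.nvm_log_writes += 1
            else:
                self.nvm_data_writes += 1

    def commit(self, tx_id):
        # The transaction is durable, so its log entries that are still
        # queued are no longer needed and are dropped without being written.
        before = len(self.pending)
        self.pending = deque(
            e for e in self.pending if not (e[0] == "log" and e[1] == tx_id)
        )
        return before - len(self.pending)


queue = WritePendingQueue()
for addr in range(4):
    queue.log_flush(tx_id=1, addr=addr, original_value=0)
    queue.store(tx_id=1, addr=addr, new_value=1)

queue.drain(2)             # one log entry and one data entry reach memory
dropped = queue.commit(1)  # the three remaining log entries are dropped
queue.drain(10)            # the three remaining data entries reach memory

print(dropped, queue.nvm_log_writes, queue.nvm_data_writes)  # 3 1 4
```

In this run, only one of the four log entries is ever written to memory, while all four data updates are.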
We implemented our design on a cycle-accurate simulator, MarssX86, and compared it against state-of-the-art hardware logging, ATOM [19], and a software-only approach. Our experiments show that, on average, Proteus improves performance by 1.44-1.47×, depending on configuration, compared to a system without hardware logging, and is 9-11% faster than ATOM. A significant advantage of our approach is dropping writes to the log when they are not needed. On average, ATOM makes 3.4× more writes to memory than our design.
“Hardware Supported Persistent Object Address Translation”
Authors: Tiancong Wang, Sakthikumaran Sambasivam, Yan Solihin and James Tuck, North Carolina State University
Presented: Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 14-18, Boston, Mass.
Abstract: Emerging non-volatile main memory technologies create a new opportunity for writing programs with a large, byte-addressable persistent storage that can be accessed through regular memory instructions. These new memory-as-storage technologies impose significant challenges to current programming models. In particular, some emerging persistent programming frameworks, like the NVM Library (NVML), implement relocatable persistent objects that can be mapped anywhere in the virtual address space. To make this work, persistent objects are referenced using object identifiers (ObjectID), rather than pointers, that need to be translated to an address before the object can be read or written. Frequent translation from ObjectID to address incurs significant overhead.
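To make the translation step concrete, here is a minimal Python sketch of software ObjectID translation. It assumes an ObjectID is a pool identifier plus an offset within that pool, and the names `ObjectID`, `PoolTable`, `map_pool` and `translate` are hypothetical; NVML's actual data structures and calls are not reproduced here.

```python
from typing import NamedTuple

# Illustrative model of software ObjectID translation. All names are
# hypothetical; an ObjectID is assumed to be (pool identifier, offset).


class ObjectID(NamedTuple):
    pool_id: int
    offset: int


class PoolTable:
    def __init__(self):
        self.base_of = {}  # pool_id -> base virtual address for this run
        self.lookups = 0   # every translation costs a table lookup

    def map_pool(self, pool_id, base_address):
        # A pool may be mapped at a different address each time it is opened.
        self.base_of[pool_id] = base_address

    def translate(self, oid):
        # Base address of the pool plus the object's offset inside it.
        self.lookups += 1
        return self.base_of[oid.pool_id] + oid.offset


table = PoolTable()
oid = ObjectID(pool_id=7, offset=0x40)

table.map_pool(7, 0x7000_0000)
first = table.translate(oid)

table.map_pool(7, 0x5000_0000)  # pool relocated; the ObjectID is unchanged
second = table.translate(oid)

print(hex(first), hex(second), table.lookups)  # 0x70000040 0x50000040 2
```

The same ObjectID stays valid after the pool moves, but every access pays for a lookup, which is the overhead the abstract refers to.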
We propose treating ObjectIDs as a new persistent memory address space and provide hardware support for efficiently translating ObjectIDs to virtual addresses. With our design, a program can use load and store instructions to directly access persistent data using ObjectIDs, and these new instructions can reduce the programming complexity of this system. We also describe several possible microarchitectural designs and evaluate them.
We evaluate our design on Sniper, modeling both in-order and out-of-order processors, with 6 microbenchmarks and the TPC-C application. The results show our design can give significant speedup over the baseline system using software translation. We demonstrate for the Pipelined implementation that our design has an average speedup of 1.96× and 1.58× on an in-order and out-of-order processor, respectively, over the baseline system on microbenchmarks that place persistent data randomly into persistent pools. For the same in-order and out-of-order microarchitectures, we measure a speedup of 1.17× and 1.12×, respectively, on the TPC-C application when B+Trees are put in different pools and rewritten to use our new hardware.