Correcting Soft Errors Online in Fast Fourier Transform

dc.contributor.author: Liang, Xin
dc.contributor.author: Chen, Jieyang
dc.contributor.author: Tao, Dingwen
dc.contributor.author: Li, Sihuan
dc.contributor.author: Wu, Panruo
dc.contributor.author: Li, Hongbo
dc.contributor.author: Ouyang, Kaiming
dc.contributor.author: Liu, Yuanlai
dc.contributor.author: Song, Fengguang
dc.contributor.author: Chen, Zizhong
dc.contributor.department: Computer and Information Science, School of Science [en_US]
dc.date.accessioned: 2018-09-14T17:41:57Z
dc.date.available: 2018-09-14T17:41:57Z
dc.date.issued: 2017
dc.description.abstract: While many algorithm-based fault tolerance (ABFT) schemes have been proposed to detect soft errors offline in the fast Fourier transform (FFT) after the computation finishes, none of the existing ABFT schemes detects soft errors online, before the computation finishes. This paper presents an online ABFT scheme for FFT so that soft errors can be detected online and the corrupted computation can be terminated much sooner. We also extend our scheme to tolerate both arithmetic errors and memory errors, develop strategies to reduce its fault tolerance overhead and improve its numerical stability and fault coverage, and finally incorporate it into the widely used FFTW library, one of today's fastest FFT software implementations. Experimental results demonstrate that (1) the proposed online ABFT scheme introduces much lower overhead than the existing offline ABFT schemes; (2) it detects errors much sooner; and (3) it has higher numerical stability and better fault coverage. [en_US]
dc.eprint.version: Author's manuscript [en_US]
dc.identifier.citation: Liang, X., Chen, J., Tao, D., Li, S., Wu, P., Li, H., … Chen, Z. (2017). Correcting Soft Errors Online in Fast Fourier Transform. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 30:1–30:12). New York, NY, USA: ACM. https://doi.org/10.1145/3126908.3126915 [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/17315
dc.language.iso: en [en_US]
dc.publisher: ACM [en_US]
dc.relation.isversionof: 10.1145/3126908.3126915 [en_US]
dc.relation.journal: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis [en_US]
dc.rights: Publisher Policy [en_US]
dc.source: Author [en_US]
dc.subject: algorithm-based fault tolerance [en_US]
dc.subject: soft errors [en_US]
dc.subject: DFT [en_US]
dc.title: Correcting Soft Errors Online in Fast Fourier Transform [en_US]
dc.type: Conference proceedings [en_US]
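
Illustrative note: the abstract refers to checksum-based (ABFT) detection of soft errors in the FFT. The sketch below is not the paper's online scheme and does not use FFTW. It only shows the simplest offline checksum idea, assuming an all-ones checksum vector: for any exact DFT y of an input x of length n, the outputs satisfy sum(y) = n * x[0]. The function names, the naive DFT, and the tolerance are assumptions chosen for illustration.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def checksum_ok(x, y, tol=1e-9):
    """Offline check of the identity sum(y) == n * x[0].

    The tolerance is scaled by the input magnitude so that ordinary
    floating-point round-off is not reported as an error (assumed heuristic).
    """
    n = len(x)
    scale = max(1.0, sum(abs(v) for v in x))
    return abs(sum(y) - n * x[0]) <= tol * n * scale

# Hypothetical input signal.
x = [complex(i + 1, -i) for i in range(8)]
y = dft(x)
clean = checksum_ok(x, y)          # fault-free output passes the check

y_bad = list(y)
y_bad[3] += 1e-3                   # simulated soft error in one output element
corrupted = checksum_ok(x, y_bad)  # corrupted output fails the check
```

A check of this kind runs only after the whole transform has finished, which is the offline setting the abstract contrasts with its online scheme. It also cannot locate or correct the faulty element.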
Files
Original bundle
Name:
Liang-2017-Correcting.pdf
Size:
897.03 KB
Format:
Adobe Portable Document Format