List of publications in reverse chronological order.
2023
ISMB/ECCB
RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes
Can Firtina, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, and Onur Mutlu
In Proceedings of the 31st Annual Conference on Intelligent Systems for Molecular Biology (ISMB) and the 22nd European Conference on Computational Biology (ECCB), Jul 2023
Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either 1) require powerful computational resources that may not be available for portable sequencers or 2) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to the same DNA content lead to the same hash value, regardless of the slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: 1) read mapping, 2) relative abundance estimation, and 3) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real-time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides 1) 25.8x and 3.4x better average throughput and 2) an average speedup of 32.1x and 2.1x in the mapping time, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash.
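As a rough illustration of the quantization-and-hashing idea described above, the Python sketch below maps a window of normalized signal events to coarse buckets and hashes the bucket sequence, so that slightly different signals for the same DNA content collide on the same hash value. The bucket width, window length, and function names are illustrative assumptions, not RawHash's actual parameters or interface.

import hashlib

def quantize(event_means, bucket_width=0.35):
    # Map each normalized event mean to a coarse bucket so that small
    # signal variations fall into the same bucket (illustrative width).
    return tuple(int(round(e / bucket_width)) for e in event_means)

def hash_events(event_means, bucket_width=0.35):
    # Hash a window of quantized events; similar raw signals yield the
    # same hash value and can therefore be matched with one hash lookup.
    buckets = quantize(event_means, bucket_width)
    return hashlib.blake2b(repr(buckets).encode(), digest_size=8).hexdigest()

# Two slightly different signal windows for the same DNA content quantize
# to the same buckets and thus produce the same hash value.
ref_window  = [0.91, -0.42, 1.28, 0.07, -1.10]
read_window = [0.91, -0.40, 1.30, 0.05, -1.12]
print(hash_events(ref_window) == hash_events(read_window))  # True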
@inproceedings{firtina_rawhash_2023,booktitle={Proceedings of the 31st Annual Conference on Intelligent Systems for Molecular Biology (ISMB) and the 22nd European Conference on Computational Biology (ECCB)},title={{RawHash}: {Enabling} {Fast} and {Accurate} {Real}-{Time} {Analysis} of {Raw} {Nanopore} {Signals} for {Large} {Genomes}},doi={10.1093/bioinformatics/btad272},author={Firtina, Can and Ghiasi, Nika Mansouri and Lindegger, Joel and Singh, Gagandeep and Cavlak, Meryem Banu and Mao, Haiyu and Mutlu, Onur},month=jul,year={2023},}
DAC
Accelerating Genome Analysis via Algorithm-Architecture Co-Design
Onur Mutlu and Can Firtina
In Proceedings of the 60th Design Automation Conference (DAC), Jul 2023
High-throughput sequencing (HTS) technologies have revolutionized the field of genomics, enabling rapid and cost-effective genome analysis for various applications. However, the increasing volume of genomic data generated by HTS technologies presents significant challenges for computational techniques to effectively analyze genomes. To address these challenges, several algorithm-architecture co-design works have been proposed, targeting different steps of the genome analysis pipeline. These works explore emerging technologies to provide fast, accurate, and low-power genome analysis.
This paper provides a brief review of the recent advancements in accelerating genome analysis, covering the opportunities and challenges associated with the acceleration of the key steps of the genome analysis pipeline. Our analysis highlights the importance of integrating multiple steps of genome analysis using suitable architectures to unlock significant performance improvements and reduce data movement and energy consumption. We conclude by emphasizing the need for novel strategies and techniques to address the growing demands of genomic data generation and analysis.
@inproceedings{mutlu_accelerating_2023,booktitle={Proceedings of the 60th Design Automation Conference (DAC)},title={Accelerating Genome Analysis via Algorithm-Architecture Co-Design},author={Mutlu, Onur and Firtina, Can},year={2023},month=jul,}
APBC
AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes
Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Nastaran Hajinazar, Mohammed Alser, Can Alkan, and Onur Mutlu
In The 21st Asia Pacific Bioinformatics Conference (APBC), Apr 2023
As genome sequencing tools and techniques improve, researchers are able to incrementally assemble more accurate reference genomes, which enable more sensitive read mapping and downstream analysis such as variant calling. A more sensitive downstream analysis is critical for a better understanding of the genome donor (e.g., health characteristics). Therefore, read sets from sequenced samples should ideally be mapped to the latest available reference genome that represents the most relevant population. Unfortunately, the increasingly large amount of available genomic data makes it prohibitively expensive to fully re-map each read set to its respective reference genome every time the reference is updated. There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i.e., remapping) by 1) identifying regions that appear similarly between two references and 2) updating the mapping location of reads that map to any of the identified regions in the old reference to the corresponding similar region in the new reference. The main drawback of existing approaches is that if a read maps to a region in the old reference that does not appear with a reasonable degree of similarity in the new reference, the read cannot be remapped. We find that, as a result of this drawback, a significant portion of annotations (i.e., coding regions in a genome) are lost when using state-of-the-art remapping tools. To address this major limitation in existing tools, we propose AirLift, a fast and comprehensive technique for remapping alignments from one genome to another. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces 1) the number of reads (out of the entire read set) that need to be fully mapped to the new reference by up to 99.99% and 2) the overall execution time to remap read sets between two reference genome versions by 6.7×, 6.6×, and 2.8× for large (human), medium (C. elegans), and small (yeast) reference genomes, respectively. We validate our remapping results with GATK and find that AirLift provides similar accuracy in identifying ground truth SNP and INDEL variants as the baseline of fully mapping a read set. AirLift source code and a README describing how to reproduce our results are available at https://github.com/CMU-SAFARI/AirLift.
@inproceedings{kim_airlift_2023,booktitle={The 21st Asia Pacific Bioinformatics Conference (APBC)},address={Changsha, Hunan, China},title={{AirLift}: {A} {Fast} and {Comprehensive} {Technique} for {Remapping} {Alignments} between {Reference} {Genomes}},doi={10.1101/2021.02.16.431517},author={Kim, Jeremie S. and Firtina, Can and Cavlak, Meryem Banu and Cali, Damla Senol and Hajinazar, Nastaran and Alser, Mohammed and Alkan, Can and Mutlu, Onur},month=apr,year={2023},}
APBC
TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering
Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joel Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, and Onur Mutlu
In The 21st Asia Pacific Bioinformatics Conference (APBC), Apr 2023
Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first fast and widely-applicable pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall’s key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall filters out all off-target reads before basecalling, and the highly accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance of the state-of-the-art basecaller by 3.31x while maintaining high (98.88%) sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, sensitivity, and generality compared to prior works. We freely open-source TargetCall to aid future research in pre-basecalling filtering at https://github.com/CMU-SAFARI/TargetCall.
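The control-flow sketch below illustrates the pre-basecalling filtering idea in Python: a cheap, noisy basecaller labels each read as on-target or off-target, and only on-target raw signals reach the slow, accurate basecaller. All callables are placeholders supplied by the caller; this is not TargetCall's actual API.

def pre_basecalling_filter(raw_signals, target_reference,
                           light_basecall, is_similar, accurate_basecall):
    # A fast, noisy basecaller labels each read; only raw signals whose
    # noisy reads look on-target reach the slow, accurate basecaller.
    on_target_signals = []
    for signal in raw_signals:
        noisy_read = light_basecall(signal)            # cheap, noisy basecall
        if is_similar(noisy_read, target_reference):   # "Similarity Check"-style step
            on_target_signals.append(signal)
    return [accurate_basecall(signal) for signal in on_target_signals]

# Toy demo with stand-in callables (exact substring check instead of real mapping).
reads = pre_basecalling_filter(
    raw_signals=["sig1", "sig2"],
    target_reference="ACGTACGT",
    light_basecall=lambda s: "ACGT" if s == "sig1" else "TTTT",
    is_similar=lambda read, ref: read in ref,
    accurate_basecall=lambda s: "ACGT",
)
print(reads)  # ['ACGT']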
@inproceedings{cavlak_targetcall_2023,booktitle={The 21st Asia Pacific Bioinformatics Conference (APBC)},address={Changsha, Hunan, China},title={TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering},author={Cavlak, Meryem Banu and Singh, Gagandeep and Alser, Mohammed and Firtina, Can and Lindegger, Joel and Sadrosadati, Mohammad and Ghiasi, Nika Mansouri and Alkan, Can and Mutlu, Onur},year={2023},month=apr,}
BLEND: A Fast, Memory-Efficient and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis
Can Firtina, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, Taha Shahroodi, Nika Mansouri Ghiasi, Gagandeep Singh, Konstantinos Kanellopoulos, Can Alkan, and Onur Mutlu
In NAR Genomics and Bioinformatics, Mar 2023
Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only exact-matching seeds causes either (i) increasing the use of the costly sequence alignment or (ii) limited sensitivity. We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches. BLEND (i) utilizes a technique called SimHash, which can generate the same hash value for similar sets, and (ii) provides the proper mechanisms for using seeds as sets with the SimHash technique to find fuzzy seed matches efficiently. We show the benefits of BLEND when used in read overlapping and read mapping. For read overlapping, BLEND is faster by 2.4×–83.9× (on average 19.3×), has a lower memory footprint by 0.9×–14.1× (on average 3.8×), and finds higher quality overlaps leading to more accurate de novo assemblies than the state-of-the-art tool, minimap2. For read mapping, BLEND is faster by 0.8×–4.1× (on average 1.7×) than minimap2. Source code is available at https://github.com/CMU-SAFARI/BLEND.
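A minimal Python sketch of the SimHash-based fuzzy seed matching idea: a seed is treated as a set of items (here, its overlapping k-mers, an illustrative choice), and similar sets produce identical or near-identical fingerprints, so fuzzy matches can be found with a hash lookup. The bit width and the seed-to-set decomposition are assumptions, not BLEND's exact design.

import hashlib

def simhash(items, bits=32):
    # Classic SimHash: hash each item, accumulate +1/-1 per bit position,
    # and keep the sign. Similar sets yield identical or nearby fingerprints.
    counts = [0] * bits
    for item in items:
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def seed_items(seed, k=4):
    # Represent a seed as the set of its overlapping k-mers (an assumption).
    return {seed[i:i + k] for i in range(len(seed) - k + 1)}

a = simhash(seed_items("ACGTTGCAACGT"))
b = simhash(seed_items("ACGTTGCATCGT"))  # one base differs
print(a == simhash(seed_items("ACGTTGCAACGT")))  # identical seeds: same value
print(bin(a ^ b).count("1"))  # similar seeds: small Hamming distance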
@article{firtina_blend_2023,title={{BLEND}: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis},volume={5},url={https://academic.oup.com/nargab/article/5/1/lqad004/6993940},doi={10.1093/nargab/lqad004},number={1},journal={NAR Genomics and Bioinformatics},author={Firtina, Can and Park, Jisung and Alser, Mohammed and Kim, Jeremie S and Cali, Damla Senol and Shahroodi, Taha and Ghiasi, Nika Mansouri and Singh, Gagandeep and Kanellopoulos, Konstantinos and Alkan, Can and Mutlu, Onur},month=mar,year={2023},pages={lqad004},}
2022
Utopia: Efficient Address Translation using Hybrid Virtual-to-Physical Address Mapping
The conventional virtual-to-physical address mapping scheme enables a virtual address to flexibly map to any physical address. This flexibility necessitates large data structures to store virtual-to-physical mappings, which incurs significantly high address translation latency and translation-induced interference in the memory hierarchy, especially in data-intensive workloads. Restricting the address mapping so that a virtual address can map to only a specific set of physical addresses can significantly reduce the overheads associated with the conventional address translation by making use of compact and more efficient translation structures. However, restricting the address mapping flexibility across the entire main memory severely limits data sharing across different processes and increases memory under-utilization. In this work, we propose Utopia, a new hybrid virtual-to-physical address mapping scheme that allows both flexible and restrictive hash-based address mapping schemes to co-exist in a system. The key idea of Utopia is to manage the physical memory using two types of physical memory segments: restrictive segments and flexible segments. A restrictive segment uses a restrictive, hash-based address mapping scheme to map the virtual addresses to only a specific set of physical addresses and enable faster address translation using compact and efficient translation structures. A flexible segment is similar to the conventional address mapping scheme and provides full virtual-to-physical address mapping flexibility. By mapping data to a restrictive segment, Utopia enables faster address translation with lower translation-induced interference whenever a flexible address mapping is not necessary. Our evaluation using 11 data-intensive workloads shows that Utopia improves performance by 32% on average in single-core workloads over the baseline four-level radix-tree page table design.
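A toy Python model of the hybrid mapping idea: a restrictive segment maps a virtual page to one of a few hash-determined physical frames and needs only a compact tag check, while a flexible segment falls back to a conventional page table. The segment size, hash function, and set associativity are illustrative assumptions, not Utopia's actual design.

class HybridTranslator:
    # Toy model: a restrictive segment maps a virtual page number (vpn) to
    # one of a few hash-determined frames (compact tag check), while a
    # flexible segment keeps a conventional vpn -> pfn table.
    def __init__(self, restrictive_frames=1024, ways=4):
        self.ways = ways
        self.sets = restrictive_frames // ways
        self.restrictive = [[None] * ways for _ in range(self.sets)]
        self.flexible = {}

    def map_restrictive(self, vpn):
        s = hash(vpn) % self.sets
        for w in range(self.ways):
            if self.restrictive[s][w] is None:
                self.restrictive[s][w] = vpn
                return s * self.ways + w   # frame number fixed by the hash
        return None                        # set full: caller falls back to flexible

    def translate(self, vpn):
        s = hash(vpn) % self.sets          # cheap, page-walk-free lookup
        for w in range(self.ways):
            if self.restrictive[s][w] == vpn:
                return s * self.ways + w
        return self.flexible.get(vpn)      # conventional (flexible) mapping

t = HybridTranslator()
pfn = t.map_restrictive(42)
print(t.translate(42) == pfn)  # True: translated without a page table walk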
@article{kanellopoulos_utopia_2022,title={Utopia: Efficient Address Translation using Hybrid Virtual-to-Physical Address Mapping},journal={arXiv},author={Kanellopoulos, Konstantinos and Bera, Rahul and Stojiljkovic, Kosta and Firtina, Can and Ausavarungnirun, Rachata and Hajinazar, Nastaran and Park, Jisung and Vijaykumar, Nandita and Mutlu, Onur},year={2022},month=nov,doi={10.48550/ARXIV.2211.12205},}
A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers
Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases (i.e., A, C, G, T) using a computational step called basecalling. The accuracy and speed of basecalling have critical implications for every subsequent step in genome analysis. Currently, basecallers are developed mainly based on deep learning techniques to provide high sequencing accuracy without considering the compute demands of such tools. We observe that state-of-the-art basecallers are slow, inefficient, and memory-hungry as researchers have adapted deep learning models from other domains without specialization to the basecalling purpose. Our goal is to make basecalling highly efficient and fast by building the first framework for specializing and optimizing machine learning-based basecallers. We introduce RUBICON, a framework to develop hardware-optimized basecallers. RUBICON consists of two novel machine-learning techniques that are specifically designed for basecalling. First, we introduce the first quantization-aware basecalling neural architecture search (QABAS) framework to specialize the basecalling neural network architecture for a given hardware acceleration platform while jointly exploring and finding the best bit-width precision for each neural network layer. Second, we develop SkipClip, the first technique to remove the skip connections present in modern basecallers to greatly reduce resource and storage requirements without any loss in basecalling accuracy. We demonstrate the benefits of QABAS and SkipClip by developing RUBICALL, the first hardware-optimized basecaller that performs fast and accurate basecalling. We show that QABAS and SkipClip can help researchers develop hardware-optimized basecallers that are superior to expert-designed models.
@article{singh_aframework_2022,title={A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers},journal={arXiv},author={Singh, Gagandeep and Alser, Mohammed and Khodamoradi, Alireza and Denolf, Kristof and Firtina, Can and Cavlak, Meryem Banu and Corporaal, Henk and Mutlu, Onur},year={2022},month=nov,doi={10.48550/ARXIV.2211.03079},}
GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping
Nanopore sequencing is a widely-used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally-costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate execution of the two most time-consuming steps inherently leads to (1) significant data movement and (2) redundant computations on the data, slowing down the genome analysis pipeline.
This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory fine-grained collaborative execution of the major genome analysis steps in parallel; (2) a new technique for early-rejection of low-quality and unmapped reads to timely stop the execution of genome analysis for such reads, reducing inefficient computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with negligible accuracy loss compared to the state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings.
@inproceedings{mao_genpip_2022,booktitle={Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)},title={GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping},author={Mao, Haiyu and Alser, Mohammed and Sadrosadati, Mohammad and Firtina, Can and Baranwal, Akanksha and Cali, Damla Senol and Manglik, Aditya and Alserr, Nour Almadhoun and Mutlu, Onur},year={2022},month=oct,url={https://ieeexplore.ieee.org/document/9923847/},pages={710-726},doi={10.1109/MICRO56248.2022.00056},}
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis
Profile hidden Markov models (pHMMs) are widely used in many bioinformatics applications to accurately identify similarities between biological sequences (e.g., DNA or protein sequences). pHMMs use a commonly adopted and highly accurate method, called the Baum-Welch algorithm, to calculate these similarities. However, the Baum-Welch algorithm is computationally expensive, and existing works provide either software- or hardware-only solutions for a fixed pHMM design. When we analyze the state-of-the-art works, we find that there is a pressing need for a flexible, high-performance, and energy-efficient hardware-software co-design to efficiently and effectively solve all the major inefficiencies in the Baum-Welch algorithm for pHMMs.
We propose ApHMM, the first flexible acceleration framework that can significantly reduce computational and energy overheads of the Baum-Welch algorithm for pHMMs. ApHMM leverages hardware-software co-design to solve the major inefficiencies in the Baum-Welch algorithm by 1) designing a flexible hardware to support different pHMMs designs, 2) exploiting the predictable data dependency pattern in an on-chip memory with memoization techniques, 3) quickly eliminating negligible computations with a hardware-based filter, and 4) minimizing the redundant computations. We implement our 1) hardware-software optimizations on a specialized hardware and 2) software optimizations for GPUs to provide the first flexible Baum-Welch accelerator for pHMMs. ApHMM provides significant speedups of 15.55x-260.03x, 1.83x-5.34x, and 27.97x compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms the state-of-the-art CPU implementations of three important bioinformatics applications, 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x-59.94x, 1.03x-1.75x, and 1.03x-1.95x, respectively.
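For reference, the sketch below shows the textbook forward pass that sits at the core of the Baum-Welch algorithm, extended with a simple value-based filter that zeroes negligible probabilities, loosely mirroring the idea of skipping negligible computations. The threshold and the tiny example HMM are illustrative; this is not ApHMM's hardware filter or memoization scheme.

import numpy as np

def forward_filtered(obs, trans, emit, init, threshold=1e-8):
    # Forward pass of an HMM with a simple value-based filter: forward
    # probabilities below the threshold are zeroed and effectively skipped.
    alpha = init * emit[:, obs[0]]
    for t in range(1, len(obs)):
        alpha = np.where(alpha < threshold, 0.0, alpha)  # filter step
        alpha = (alpha @ trans) * emit[:, obs[t]]
    return alpha.sum()  # likelihood of the observation sequence

# Tiny two-state HMM over a binary alphabet (illustrative numbers).
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit  = np.array([[0.7, 0.3], [0.1, 0.9]])
init  = np.array([0.5, 0.5])
print(forward_filtered([0, 1, 1, 0], trans, emit, init))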
@article{firtina_aphmm_2022,title={ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis},journal={arXiv},author={Firtina, Can and Pillai, Kamlesh and Kalsi, Gurpreet S. and Suresh, Bharathwaj and Cali, Damla Senol and Kim, Jeremie and Shahroodi, Taha and Cavlak, Meryem Banu and Lindegger, Joel and Alser, Mohammed and Luna, Juan Gómez and Subramoney, Sreenivas and Mutlu, Onur},year={2022},month=jul,doi={10.48550/ARXIV.2207.09765},}
FastRemap: A Tool for Quickly Remapping Reads between Genome Assemblies
@article{kim_fastremap_2022,title={{FastRemap}: {A} {Tool} for {Quickly} {Remapping} {Reads} between {Genome} {Assemblies}},issn={1367-4803},url={https://doi.org/10.1093/bioinformatics/btac554},doi={10.1093/bioinformatics/btac554},journal={Bioinformatics},author={Kim, Jeremie S and Firtina, Can and Cavlak, Meryem Banu and Senol Cali, Damla and Alkan, Can and Mutlu, Onur},year={2022},pages={btac554},month=aug,}
From Molecules to Genomic Variations: Accelerating Genome Analysis via Intelligent Algorithms and Architectures
@article{alser_molecules_2022,title={From {Molecules} to {Genomic} {Variations}: {Accelerating} {Genome} {Analysis} via {Intelligent} {Algorithms} and {Architectures}},issn={2001-0370},url={https://doi.org/10.1016/j.csbj.2022.08.019},doi={10.1016/j.csbj.2022.08.019},journal={Computational and Structural Biotechnology Journal},author={Alser, Mohammed and Lindegger, Joel and Firtina, Can and Almadhoun, Nour and Mao, Haiyu and Singh, Gagandeep and Gomez-Luna, Juan and Mutlu, Onur},month=aug,year={2022},}
Demeter: A Fast and Energy-Efficient Food Profiler Using Hyperdimensional Computing in Memory
@article{shahroodi_demeter_2022,title={Demeter: {A} {Fast} and {Energy}-{Efficient} {Food} {Profiler} {Using} {Hyperdimensional} {Computing} in {Memory}},volume={10},doi={10.48550/ARXIV.2206.01932},url={https://doi.org/10.1109/ACCESS.2022.3195878},journal={IEEE Access},author={Shahroodi, Taha and Zahedi, Mahdi and Firtina, Can and Alser, Mohammed and Wong, Stephan and Mutlu, Onur and Hamdioui, Said},year={2022},pages={82493--82510},month=aug,}
SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping
A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in a population. Mapping reads to the graph-based reference genome (i.e., sequence-to-graph mapping) results in notable quality improvements in genome analysis. Unfortunately, while sequence-to-sequence mapping is well studied with many available tools and accelerators, sequence-to-graph mapping is a more difficult computational problem, with a much smaller number of practical software tools currently available. We analyze two state-of-the-art sequence-to-graph mapping tools and reveal four key issues. We find that there is a pressing need to have a specialized, high-performance, scalable, and low-cost algorithm/hardware co-design that alleviates bottlenecks in both the seeding and alignment steps of sequence-to-graph mapping. Since sequence-to-sequence mapping can be treated as a special case of sequence-to-graph mapping, we aim to design an accelerator that is efficient for both linear and graph-based read mapping. To this end, we propose SeGraM, a universal algorithm/hardware co-designed genomic mapping accelerator that can effectively and efficiently support both sequence-to-graph mapping and sequence-to-sequence mapping, for both short and long reads. To our knowledge, SeGraM is the first algorithm/hardware co-design for accelerating sequence-to-graph mapping. SeGraM consists of two main components: (1) MinSeed, the first minimizer-based seeding accelerator, which finds the candidate locations in a given genome graph; and (2) BitAlign, the first bitvector-based sequence-to-graph alignment accelerator, which performs alignment between a given read and the subgraph identified by MinSeed. We couple SeGraM with high-bandwidth memory to exploit low latency and highly-parallel memory access, which alleviates the memory bottleneck. We demonstrate that SeGraM provides significant improvements for multiple steps of the sequence-to-graph (i.e., S2G) and sequence-to-sequence (i.e., S2S) mapping pipelines. First, SeGraM outperforms state-of-the-art S2G mapping tools by 5.9×/3.9× and 106×/742× for long and short reads, respectively, while reducing power consumption by 4.1×/4.4× and 3.0×/3.2×. Second, BitAlign outperforms a state-of-the-art S2G alignment tool by 41×-539× and three S2S alignment accelerators by 1.2×-4.8×. We conclude that SeGraM is a high-performance and low-cost universal genomics mapping accelerator that efficiently supports both sequence-to-graph and sequence-to-sequence mapping pipelines.
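The snippet below shows the generic minimizer-selection scheme that minimizer-based seeding (as in MinSeed) builds on: within every window of w consecutive k-mers, the smallest k-mer is kept as a seed. The values of k and w and the lexicographic ordering are illustrative choices, not SeGraM's parameters.

def minimizers(seq, k=5, w=4):
    # Keep, for every window of w consecutive k-mers, the lexicographically
    # smallest one; the selected (position, k-mer) pairs serve as seeds.
    kmers = [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        selected.add(min(kmers[start:start + w], key=lambda x: x[1]))
    return sorted(selected)

print(minimizers("ACGTACGTTGCAACGTA"))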
@inproceedings{cali_segram_2022,title={{SeGraM}: {A} {Universal} {Hardware} {Accelerator} for {Genomic} {Sequence}-to-{Graph} and {Sequence}-to-{Sequence} {Mapping}},url={https://dl.acm.org/doi/10.1145/3470496.3527436},doi={10.1145/3470496.3527436},booktitle={Proceedings of the 49th {Annual} {International} {Symposium} on {Computer} {Architecture} (ISCA)},author={Cali, Damla Senol and Kanellopoulos, Konstantinos and Lindegger, Joël and Bingöl, Zülal and Kalsi, Gurpreet S. and Zuo, Ziyi and Firtina, Can and Cavlak, Meryem Banu and Kim, Jeremie and Ghiasi, Nika Mansouri and Singh, Gagandeep and Gómez-Luna, Juan and Alserr, Nour Almadhoun and Alser, Mohammed and Subramoney, Sreenivas and Alkan, Can and Ghose, Saugata and Mutlu, Onur},year={2022},month=jun,pages={638--655},}
Packaging, Containerization, and Virtualization of Computational Omics Methods: Advances, Challenges, and Opportunities
Mohammed Alser, Sharon Waymost, Ram Ayyala, Brendan Lawlor, Richard J. Abdill, Neha Rajkumar, Nathan LaPierre, Jaqueline Brito, Andre M. Ribeiro-dos-Santos, Can Firtina, Nour Almadhoun Alserr, Varuni Sarwal, Eleazar Eskin, Qiyang Hu, Derek Strong, Byoung-Do Kim, Malak S. Abedalthagafi, Onur Mutlu, and Serghei Mangul
Omics software tools have reshaped the landscape of modern biology and become an essential component of biomedical research. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging, virtualization, and containerization are different approaches to satisfy this need by wrapping omics tools in additional software that makes the omics tools easier to install and use. Here, we systematically review practices across prominent packaging, virtualization, and containerization platforms. We outline the challenges, advantages, and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers, and system administrators. We also propose principles to make packaging, virtualization, and containerization of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.
@article{alser_packaging_2022,title={Packaging, containerization, and virtualization of computational omics methods: {Advances}, challenges, and opportunities},journal={arXiv},author={Alser, Mohammed and Waymost, Sharon and Ayyala, Ram and Lawlor, Brendan and Abdill, Richard J. and Rajkumar, Neha and LaPierre, Nathan and Brito, Jaqueline and Ribeiro-dos-Santos, Andre M. and Firtina, Can and Almadhoun Alserr, Nour and Sarwal, Varuni and Eskin, Eleazar and Hu, Qiyang and Strong, Derek and Kim, Byoung-Do and Abedalthagafi, Malak S. and Mutlu, Onur and Mangul, Serghei},year={2022},month=mar,doi={10.48550/ARXIV.2203.16261},}
GenStore: A High-Performance In-Storage Processing System for Genome Sequence Analysis
Read mapping is a fundamental step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). Read mapping is costly because it needs to perform approximate string matching (ASM) on large amounts of data. To address the computational challenges in genome analysis, many prior works propose various approaches such as accurate filters that select the reads within a dataset of genomic reads (called a read set) that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the amount of expensive computation, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different properties such as read lengths and error rates, which highly depend on the sequencing technology, and 2) different degrees of genetic variation compared to the reference genome, which highly depends on the genomes that are being compared. Through rigorous analysis of read mapping processes of reads with different properties and degrees of genetic variation, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based solid-state drive (SSD). Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern NAND flash-based SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05× (1.52-3.32×) for read sets with high similarity to the reference genome and 1.45-33.63× (2.70-19.2×) for read sets with low similarity to the reference genome.
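A host-side Python sketch of the filtering split described above: reads that match the reference exactly (here by simple substring lookup, a deliberate simplification) can skip expensive approximate mapping, and only the remaining reads are passed on. A real in-storage design performs this inside the SSD with hardware accelerators; this standalone function only illustrates the idea.

def filter_reads(reads, reference):
    # Reads that occur verbatim in the reference skip expensive mapping;
    # only the remaining reads are forwarded to the full mapper.
    exact, needs_mapping = [], []
    for read in reads:
        (exact if read in reference else needs_mapping).append(read)
    return exact, needs_mapping

reference = "ACGTACGTTGCAACGTAGGT"
exact, needs_mapping = filter_reads(["GTTGCAAC", "ACGTTTTT"], reference)
print(len(exact), len(needs_mapping))  # 1 1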
@inproceedings{mansouri_ghiasi_genstore_2022,title={{GenStore}: {A} {High}-{Performance} in-{Storage} {Processing} {System} for {Genome} {Sequence} {Analysis}},url={https://dl.acm.org/doi/10.1145/3503222.3507702},doi={10.1145/3503222.3507702},booktitle={Proceedings of the 27th {ACM} {International} {Conference} on {Architectural} {Support} for {Programming} {Languages} and {Operating} {Systems} (ASPLOS)},publisher={Association for Computing Machinery},author={Mansouri Ghiasi, Nika and Park, Jisung and Mustafa, Harun and Kim, Jeremie and Olgun, Ataberk and Gollwitzer, Arvid and Senol Cali, Damla and Firtina, Can and Mao, Haiyu and Almadhoun Alserr, Nour and Ausavarungnirun, Rachata and Vijaykumar, Nandita and Alser, Mohammed and Mutlu, Onur},year={2022},month=mar,pages={635--654},}
2021
Modeling Economic Activities and Random Catastrophic Failures of Financial Networks via Gibbs Random Fields
The complicated economic behavior of entities in a population can be modeled as a Gibbs random field (GRF). Even with simple GRF models, which restrict direct statistical interactions to a small number of neighbors of an entity, real-life economic and financial activities may be effectively described. A computer simulator is developed to run empirical experiments to assess different coupling structures and parameters of the presented model, making it possible to test many economic and financial models and policies in terms of their transient and steady-state consequences.
@article{onural_modeling_2021,title={Modeling {Economic} {Activities} and {Random} {Catastrophic} {Failures} of {Financial} {Networks} via {Gibbs} {Random} {Fields}},volume={58},issn={1572-9974},url={https://link.springer.com/article/10.1007/s10614-020-10023-3},doi={10.1007/s10614-020-10023-3},number={2},journal={Computational Economics},author={Onural, Levent and Pınar, Mustafa Çelebi and Firtina, Can},month=aug,year={2021},pages={203--232},}
2020
GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis
Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. To perform genome sequencing, devices extract small random fragments of an organism’s DNA sequence (known as reads). The first step of genome sequence analysis is a computational process known as read mapping. In read mapping, each fragment is matched to its potential location in the reference genome with the goal of identifying the original location of each read in the genome. Unfortunately, rapid genome sequencing is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. A major contributor to this bottleneck is approximate string matching (ASM), which is used at multiple points during the mapping process. ASM enables read mapping to account for sequencing errors and genetic variations in the reads. We propose GenASM, the first ASM acceleration framework for genome sequence analysis. GenASM performs bitvector-based ASM, which can efficiently accelerate multiple steps of genome sequence analysis. We modify the underlying ASM algorithm (Bitap) to significantly increase its parallelism and reduce its memory footprint. Using this modified algorithm, we design the first hardware accelerator for Bitap. Our hardware accelerator consists of specialized systolic-array-based compute units and on-chip SRAMs that are designed to match the rate of computation with memory capacity and bandwidth, resulting in an efficient design whose performance scales linearly as we increase the number of compute units working in parallel. We demonstrate that GenASM provides significant performance and power benefits for three different use cases in genome sequence analysis. First, GenASM accelerates read alignment for both long reads and short reads. For long reads, GenASM outperforms state-of-the-art software and hardware accelerators by 116× and 3.9×, respectively, while reducing power consumption by 37× and 2.7×. For short reads, GenASM outperforms state-of-the-art software and hardware accelerators by 111× and 1.9×. Second, GenASM accelerates pre-alignment filtering for short reads, with 3.7× the performance of a state-of-the-art pre-alignment filter, while reducing power consumption by 1.7× and significantly improving the filtering accuracy. Third, GenASM accelerates edit distance calculation, with 22-12501× and 9.3-400× speedups over the state-of-the-art software library and FPGA-based accelerator, respectively, while reducing power consumption by 548-582× and 67×. We conclude that GenASM is a flexible, high-performance, and low-power framework, and we briefly discuss four other use cases that can benefit from GenASM.
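For context, the snippet below is the textbook bit-parallel Bitap (Shift-And) algorithm for exact matching, the algorithm family that GenASM's modified Bitap builds on; GenASM extends it to approximate matching with far more parallelism and a smaller memory footprint. This is the classic formulation, not GenASM's modified algorithm.

def bitap_exact(text, pattern):
    # Bit-parallel Shift-And: bit i of `state` is set iff pattern[:i+1]
    # matches the text ending at the current position.
    m = len(pattern)
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    hits, state = [], 0
    for j, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)  # extend all partial matches
        if state & (1 << (m - 1)):
            hits.append(j - m + 1)                     # full match ends here
    return hits

print(bitap_exact("ACGTACGTTGCAACGT", "ACGT"))  # [0, 4, 12]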
@inproceedings{cali_genasm_2020,address={Virtual},title={{GenASM}: {A} {High}-{Performance}, {Low}-{Power} {Approximate} {String} {Matching} {Acceleration} {Framework} for {Genome} {Sequence} {Analysis}},url={https://ieeexplore.ieee.org/document/9251930},doi={10.1109/MICRO50266.2020.00081},booktitle={Proceedings of the 53rd {International} {Symposium} on {Microarchitecture}},author={Senol Cali, Damla and Kalsi, Gurpreet S. and Bingöl, Zülal and Firtina, Can and Subramanian, Lavanya and Kim, Jeremie S. and Ausavarungnirun, Rachata and Alser, Mohammed and G{\'o}mez-Luna, Juan and Boroumand, Amirali and Nori, Anant and Scibisz, Allison and Subramoney, Sreenivas and Alkan, Can and Ghose, Saugata and Mutlu, Onur},year={2020},month=oct,pages={951--966},}
Apollo: A Sequencing-Technology-Independent, Scalable and Accurate Assembly Polishing Algorithm
Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Source code is available at https://github.com/CMU-SAFARI/Apollo.
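As a reference point for the decoding step mentioned above, the sketch below is a generic dense-matrix Viterbi decoder for a small HMM. Apollo's pHMM has a specific assembly-derived structure and is trained with the Forward-Backward algorithm first; the matrices and observations here are illustrative only.

import numpy as np

def viterbi(obs, trans, emit, init):
    # Standard log-space Viterbi decoding for a small, dense HMM.
    n_states, T = trans.shape[0], len(obs)
    delta = np.zeros((T, n_states))
    back = np.zeros((T, n_states), dtype=int)
    delta[0] = np.log(init) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(trans)  # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit  = np.array([[0.7, 0.3], [0.1, 0.9]])
init  = np.array([0.5, 0.5])
print(viterbi([0, 0, 1, 1, 1], trans, emit, init))  # most likely state path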
@article{firtina_apollo_2020,title={Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm},volume={36},issn={1367-4803},url={https://academic.oup.com/bioinformatics/article/36/12/3669/5804978},doi={10.1093/bioinformatics/btaa179},number={12},journal={Bioinformatics},author={Firtina, Can and Kim, Jeremie S. and Alser, Mohammed and Senol Cali, Damla and Cicek, A Ercument and Alkan, Can and Mutlu, Onur},month=jun,year={2020},pages={3669--3679},}
2018
Hercules: A Profile HMM-Based Hybrid Error Correction Algorithm for Long Reads
Choosing whether to use second- or third-generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases, researchers often combine both technologies, and the erroneous long reads are corrected using the short reads. Current approaches rely on various graph- or alignment-based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile hidden Markov model with respect to the underlying platform’s error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms, and among those, it achieves the highest accuracy.
@article{firtina_hercules_2018,title={Hercules: a profile {HMM}-based hybrid error correction algorithm for long reads},volume={46},issn={0305-1048},url={https://academic.oup.com/nar/article/46/21/e125/5075030},doi={10.1093/nar/gky724},number={21},journal={Nucleic Acids Research},author={Firtina, Can and Bar-Joseph, Ziv and Alkan, Can and Cicek, A. Ercument},month=nov,year={2018},pages={e125--e125},}
2017
GLANET: Genomic Loci Annotation and Enrichment Tool
Genomic studies identify genomic loci representing genetic variations, transcription factor (TF) occupancy, or histone modification through next-generation sequencing (NGS) technologies. Interpreting these loci requires evaluating them with known genomic and epigenomic annotations. We present GLANET as a comprehensive annotation and enrichment analysis tool, which implements a sampling-based enrichment test that accounts for GC content and/or mappability biases, jointly or separately. GLANET annotates and performs enrichment analysis on these loci with a rich library. We introduce and perform novel data-driven computational experiments for assessing the power and Type-I error of its enrichment procedure, which show that GLANET has attained high statistical power and a well-controlled Type-I error rate. As a key feature, users can easily extend its library with new gene sets and genomic intervals. Other key features include assessment of the impact of single nucleotide variants (SNPs) on TF binding sites and regulation-based pathway enrichment analysis. GLANET can be run using its GUI or on the command line. GLANET’s source code is available at https://github.com/burcakotlu/GLANET. Tutorials are provided at https://glanet.readthedocs.org.
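A toy Python version of a sampling-based enrichment test in the spirit of the procedure described above: the observed overlap between query loci and an annotation is compared against overlaps of randomly placed, length-matched interval sets to obtain an empirical p-value. A real implementation would also match GC content and mappability; the uniform sampling here is a simplification.

import random

def overlap_count(queries, annotation):
    # Number of query intervals overlapping any annotation interval.
    return sum(any(q[0] < a[1] and a[0] < q[1] for a in annotation) for q in queries)

def sampling_enrichment_pvalue(queries, annotation, genome_len, n_samples=1000, seed=0):
    # Compare the observed overlap against overlaps of randomly placed,
    # length-matched interval sets (uniform sampling; no GC/mappability matching).
    rng = random.Random(seed)
    observed = overlap_count(queries, annotation)
    hits = 0
    for _ in range(n_samples):
        sampled = []
        for start, end in queries:
            length = end - start
            s = rng.randrange(genome_len - length)
            sampled.append((s, s + length))
        if overlap_count(sampled, annotation) >= observed:
            hits += 1
    return (hits + 1) / (n_samples + 1)  # empirical, add-one-corrected p-value

queries = [(100, 150), (480, 530), (900, 950)]
annotation = [(90, 200), (470, 560)]
print(sampling_enrichment_pvalue(queries, annotation, genome_len=10_000))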
@article{otlu_glanet_2017,title={{GLANET}: genomic loci annotation and enrichment tool},volume={33},issn={1367-4803},url={https://academic.oup.com/bioinformatics/article/33/18/2818/3852077},doi={10.1093/bioinformatics/btx326},number={18},journal={Bioinformatics},author={Otlu, Burçak and Firtina, Can and Keleş, Sündüz and Tastan, Oznur},month=sep,year={2017},pages={2818--2828},}
2016
On Genomic Repeats and Reproducibility
Here, we present a comprehensive analysis of the reproducibility of computational characterization of genomic variants using high-throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters, where we only altered the order of reads in the input (i.e., FASTQ file). Reshuffling caused the reads from repetitive regions to be mapped to different locations in the second alignment, and we observed similar results when we only applied a scatter/gather approach for read mapping, without prior shuffling. Our results show that some of the most common variation discovery algorithms do not handle the ambiguous read mappings accurately when random locations are selected. In addition, we observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, which we pinpoint to the variant filtration step. We conclude that algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings in a deterministic fashion to ensure full replication of results.
@article{firtina_genomic_2016,title={On genomic repeats and reproducibility},volume={32},issn={1367-4803},url={https://academic.oup.com/bioinformatics/article/32/15/2243/1743552},doi={10.1093/bioinformatics/btw139},number={15},journal={Bioinformatics},author={Firtina, Can and Alkan, Can},month=aug,year={2016},pages={2243--2247},}