Publication:
Albertlast: A bidirectional encoder representation of a transformer's approach for the estimation of Line-1 content
dc.contributor.advisor | Seguel Campodónico, Juan Jaime | |
dc.contributor.author | Chamorro Parejo, Andrés David | |
dc.contributor.college | College of Engineering | |
dc.contributor.committee | Ramos, Kenneth | |
dc.contributor.committee | Schütz Schmuck, Marko | |
dc.contributor.committee | Rivera Gallego, Wilson | |
dc.contributor.committee | Arzuaga Cruz, Emmanuel | |
dc.contributor.department | Department of Computer Science and Engineering | |
dc.contributor.representative | Rodríguez Román, Daniel | |
dc.date.accessioned | 2023-05-17T19:07:52Z | |
dc.date.available | 2023-05-17T19:07:52Z | |
dc.date.issued | 2023-05-12 | |
dc.description.abstract | Technological breakthroughs in high-throughput sequencing platforms have triggered a revolution in genomics. This revolution has significantly increased both the number and the size of genomic datasets. Every increase in the amount of data challenges the ability to process it. For certain bioinformatics tasks, it is no longer possible, or desirable, to rely exclusively on classical alignment and mapping methods. This is the case, for example, for methods that identify LINE-1 in the genome, which have difficulty accurately identifying the variations associated with the insertions in a sample. This dissertation developed a masking model using the Bidirectional Encoder Representations from Transformers (BERT) technique and used it to develop a transformer classification model. The final product is an innovative alignment-free system that detects and analyzes polymorphic LINE-1 insertions and estimates LINE-1 content in a sample. | |
dc.description.abstract | Technological advances in high-throughput sequencing platforms have triggered a revolution in genomics. This revolution has considerably increased the number of genomic datasets and their size. Every increase in the amount of data poses challenges to the ability to process it. For certain bioinformatics tasks, it is no longer possible, or desirable, to rely exclusively on classical alignment and mapping methods. This is the case, for example, for methods that identify LINE-1 in the genome, which pose challenges in accurately identifying the variations associated with the insertions in a sample. In this thesis, a masking model was developed using the Bidirectional Encoder Representations from Transformers (BERT) technique and then used to develop a classification model. The final product is an innovative alignment-free system that detects and analyzes polymorphic LINE-1 insertions and estimates content in a sample. | |
dc.description.graduationSemester | Spring | |
dc.description.graduationYear | 2023 | |
dc.description.note | Center for Genomic and Precision Medicine | |
dc.description.sponsorship | Mathematics Department and SAFERSIM for their generous financial support | |
dc.identifier.uri | https://hdl.handle.net/20.500.11801/3501 | |
dc.language.iso | en | |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.holder | (c) 2023 Andrés David Chamorro Parejo | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | LINE-1 | |
dc.subject | Machine Learning | |
dc.subject | Precision Medicine | |
dc.subject | Transformers | |
dc.subject.lcsh | Genomics - Models | |
dc.subject.lcsh | Data sets | |
dc.subject.lcsh | Sequence alignment (Bioinformatics) | |
dc.subject.lcsh | Classification - Mathematical models | |
dc.subject.lcsh | Treebanks (Linguistics) | |
dc.subject.lcsh | Machine learning | |
dc.title | Albertlast: A bidirectional encoder representation of a transformer's approach for the estimation of Line-1 content | |
dc.type | Dissertation | |
dspace.entity.type | Publication | |
thesis.degree.discipline | Computing and Information Sciences and Engineering | |
thesis.degree.level | Ph.D. |