This repository has been archived on 2026-04-05. You can view files and clone it, but you cannot make any changes to its state, such as pushing and creating new issues, pull requests or comments.

Harrison Deng 157580af62 Added "biocViews" section in description file		2023-12-10 00:21:10 +00:00
.vscode	Fixed automatic ensembl case-control analysis function	2023-12-09 23:17:33 +00:00
data	Added some unit tests.	2023-11-15 00:57:23 +00:00
inst/extdata	Changed dataset and implemented variance analysis with Ensembl as control.	2023-11-14 17:34:00 +00:00
man	Changed to using import in DataLoading.R.	2023-12-10 00:04:54 +00:00
R	Added imports for rest of R files	2023-12-10 00:07:31 +00:00
tests	Created snapshot test for automatic case-control with Ensembl	2023-12-09 23:34:18 +00:00
vignettes	Fixed vignette mapping example	2023-12-10 00:20:52 +00:00
.gitignore	Finalized project for submission	2023-11-15 01:35:56 +00:00
.Rbuildignore	Initial commit	2023-11-13 09:36:05 +00:00
DESCRIPTION	Added "biocViews" section in description file	2023-12-10 00:21:10 +00:00
LICENSE	Initial commit	2023-11-13 09:36:05 +00:00
LICENSE.md	Initial commit	2023-11-13 09:36:05 +00:00
NAMESPACE	Added imports for rest of R files	2023-12-10 00:07:31 +00:00
PhenoGenRLib.Rproj	Initial commit	2023-11-13 09:36:05 +00:00
README.md	Readme updated.	2023-11-15 01:52:15 +00:00
README.Rmd	Readme updated.	2023-11-15 01:52:15 +00:00

PhenoGenRLib

The goal of PhenoGenRLib is to simplify nucleotide variant analysis.

As next-generation sequencing (NGS) takes off, more and more data is readily available. Arguably, there is an overabundance of data that has yet to be used to its fullest potential. PhenoGenRLib aims to provide simple ways of loading VCFs, associating them with sample metadata, and running association studies using that metadata.

Installation

You can install the development version of PhenoGenRLib like so:

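```r
# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("RealYHD/PhenoGenRLib", build_vignettes = TRUE)
library("PhenoGenRLib")
```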

Getting Started

To get started, have a datasheet ready in the form of a CSV. At the very least, this datasheet should contain one column in which each row holds the filename of a VCF, including the .vcf extension. For the following example, we assume that such a file is called huntingtons_datasheet_shortened.csv, that it is located at ./inst/extdata/huntingtons_datasheet_shortened.csv, and that the column containing the VCF filenames is named vcfs. We also need the location of the VCFs; assume they are in the same directory as the metadata CSV, ./inst/extdata/. Then:

```r
library(PhenoGenRLib)
variants <- PhenoGenRLib::linkVariantsWithMetadata(
  metadata = "inst/extdata/huntingtons_datasheet_shortened.csv",
  vcfDir = "inst/extdata/",
  vcfColName = "vcfs"
)
#> Rows: 8 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): vcfs, dummy_pheno
#> dbl (1): chromosome
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> READING VCF
#>  * checking if file exists... PASS
#>  * Reading vcf header...
#>    Done
#>  * Reading vcf body...
#>    Done
#>  * Parse vcf header...
#>    Done
#>  * Split info...
#>  * Done
#>  * Split samples...
#>    Done
# The READING VCF block above is printed once per VCF in the datasheet (8 times in total here).
```
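The datasheet format described above can be sketched with base R alone. The snippet below is a minimal sketch and is not part of PhenoGenRLib: it writes a toy datasheet with a `vcfs` column, creates empty placeholder VCF files in a temporary directory, and uses a hypothetical helper, `check_vcfs()`, to confirm that every listed file exists before anything is handed to `linkVariantsWithMetadata()`. The file names, the `dummy_pheno` values, and the helper itself are assumptions for illustration; its arguments simply mirror the `metadata`, `vcfDir`, and `vcfColName` parameters used in the example.

```r
# Minimal sketch (not part of PhenoGenRLib). All file names, the dummy_pheno
# values, and check_vcfs() are hypothetical, for illustration only.

# Work in a temporary directory so the package's own files are untouched
vcf_dir <- file.path(tempdir(), "phenogen_example")
dir.create(vcf_dir, showWarnings = FALSE)

# Hypothetical VCF file names; each includes the .vcf extension
vcf_files <- c("sample_01.vcf", "sample_02.vcf")

# Empty placeholders so the existence check has something to find
# (real VCFs would contain a header and variant records)
invisible(file.create(file.path(vcf_dir, vcf_files)))

# Only the `vcfs` column is required; extra columns carry sample metadata
datasheet <- data.frame(
  vcfs = vcf_files,
  dummy_pheno = c("case", "control")
)
datasheet_path <- file.path(vcf_dir, "example_datasheet.csv")
write.csv(datasheet, datasheet_path, row.names = FALSE)

# Hypothetical pre-flight check: read the datasheet back and stop with one
# message listing every VCF that is missing from vcfDir
check_vcfs <- function(metadata, vcfDir, vcfColName) {
  sheet <- read.csv(metadata, stringsAsFactors = FALSE)
  if (!vcfColName %in% names(sheet)) {
    stop("Column '", vcfColName, "' not found in ", metadata)
  }
  paths <- file.path(vcfDir, sheet[[vcfColName]])
  missing <- sheet[[vcfColName]][!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing VCF(s): ", paste(missing, collapse = ", "))
  }
  invisible(paths)
}

checked <- check_vcfs(datasheet_path, vcf_dir, "vcfs")
```

With real VCFs in place of the empty placeholders, the same `datasheet_path` and `vcf_dir` could then be passed to `linkVariantsWithMetadata()`. Checking up front means a typo in a filename is reported in a single message that lists every missing file.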

Check out the documentation and vignettes for where to go from here!

Contributions

PhenoGenRLib stands on the shoulders of giants, and it would be a disservice not to name them:

Thank you to Syed Haider et al. for providing bedr; it greatly simplified the data-loading features.
Thanks to the entire BioMart team for providing an awesome, easy-to-use interface to large public databases!
ggplot2 was very helpful in generating figures. Thanks to Wickham et al.!
This entire project wouldn’t have been possible without the help of the Tidyverse team. Although not every Tidyverse package was used, much of the development and diagnostic work relied on their tools.
The tibble package helped simplify data storage and access. Thanks to Müller et al.!

No generative AI was used directly in this project; however, ChatGPT helped with learning how R works and how some of its syntax differs from other languages.

This was a BCB410H1 UofT Bioinformatics project by Harrison Deng.

Citations

Müller K, Wickham H (2023). _tibble: Simple Data Frames_. R package
  version 3.2.1, <https://CRAN.R-project.org/package=tibble>.

Haider S, Waggott D, Boutros PC (2019). _bedr: Genomic Region
  Processing using Tools Such as 'BEDTools', 'BEDOPS' and 'Tabix'_. R
  package version 1.0.7, <https://CRAN.R-project.org/package=bedr>.

Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W
  (2005). BioMart and Bioconductor: a powerful link between biological
  databases and microarray data analysis. _Bioinformatics_, 21,
  3439-3440.

Wickham H (2016). _ggplot2: Elegant Graphics for Data Analysis_.
  Springer-Verlag New York.

Wickham H, Hester J, Bryan J (2023). _readr: Read Rectangular Text
  Data_. R package version 2.1.4,
  <https://CRAN.R-project.org/package=readr>.

Acknowledgements

This package was developed as part of an assessment for the 2023 BCB410H: Applied Bioinformatics course at the University of Toronto, Toronto, Canada. PhenoGenRLib welcomes issues, enhancement requests, and other contributions. To submit an issue, please use GitHub Issues.
