Deep Unfolded Convolutional Dictionary Learning for Motif Discovery

What is UnfoldCDL?

UnfoldCDL (Unfolded Convolutional Dictionary Learning) is a DNA sequence motif discovery method. We first formulate a convolutional dictionary learning problem and then "unfold" its optimization algorithm into a neural network. The resulting network is fully interpretable, fast to train, and outputs a sparse representation of the dataset. The sparse representation allows us to infer the motifs in the dataset efficiently.
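To make the "unfolding" idea concrete, here is a minimal sketch of a generic unrolled ISTA network for nonnegative convolutional sparse coding of one-hot DNA. It is not UnfoldCDL's implementation. Each layer performs one proximal-gradient (ISTA) step, so T iterations become T layers, each with its own step size and threshold. In a trained network those per-layer parameters and the dictionary `D` would be learned by backpropagation; here they are fixed by hand. All function names (`onehot`, `reconstruct`, `correlate`, `unrolled_ista`) and the 4 × w × K dictionary layout are hypothetical. UnfoldCDL's actual architecture, loss, and training procedure are described in the preprint.

```julia
using LinearAlgebra  # for dot

# One-hot encode a DNA string as a 4 x L matrix (rows A, C, G, T); other letters stay zero.
function onehot(seq::AbstractString)
    alphabet = "ACGT"
    X = zeros(4, length(seq))
    for (j, c) in enumerate(seq)
        i = findfirst(==(c), alphabet)
        i === nothing || (X[i, j] = 1.0)
    end
    return X
end

# D*Z: add filter k (the 4 x w slice D[:, :, k]) at position p, scaled by its code Z[k, p].
function reconstruct(D::AbstractArray{<:Real,3}, Z::AbstractMatrix, L::Int)
    _, w, K = size(D)
    R = zeros(4, L)
    for k in 1:K, p in axes(Z, 2)
        Z[k, p] == 0 && continue
        R[:, p:p+w-1] .+= Z[k, p] .* view(D, :, :, k)
    end
    return R
end

# D'*R: correlate the 4 x L residual with every filter at every valid position.
function correlate(D::AbstractArray{<:Real,3}, R::AbstractMatrix)
    _, w, K = size(D)
    P = size(R, 2) - w + 1
    G = zeros(K, P)
    for k in 1:K, p in 1:P
        G[k, p] = dot(view(R, :, p:p+w-1), view(D, :, :, k))
    end
    return G
end

# Forward pass of a T-layer unrolled ISTA network (T = length(eta)):
# layer t takes a gradient step on 0.5*||D*Z - X||^2 with step eta[t],
# then applies the nonnegative soft threshold max(z - eta[t]*lambda[t], 0).
function unrolled_ista(X::AbstractMatrix, D::AbstractArray{<:Real,3}, eta, lambda)
    L = size(X, 2)
    _, w, K = size(D)
    Z = zeros(K, L - w + 1)
    for t in eachindex(eta, lambda)
        grad = correlate(D, reconstruct(D, Z, L) .- X)
        Z = max.(Z .- eta[t] .* grad .- eta[t] * lambda[t], 0.0)
    end
    return Z
end

# Toy example: one "TATA" filter and a sequence containing TATA starting at position 3.
X = onehot("GGTATAGG")
D = reshape(onehot("TATA"), 4, 4, 1)
Z = unrolled_ista(X, D, fill(0.1, 50), fill(1.0, 50))
```

On this toy input the largest code entry is at position 3, where the filter matches the sequence exactly. The step size 0.1 is below 1/8, the inverse of an upper bound on the Lipschitz constant for this particular filter, so every layer is a valid ISTA step.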

Why use UnfoldCDL for motif discovery?

Many methods can find statistically significant motifs, but they may recover the motifs in a dataset only partially, because proteins can bind to DNA in complicated ways. For example, a motif may have multiple modes: the modes may share similar patterns (multimeric binding, alternate structural conformations), consist of distinct "parts" (variable spacing), or look entirely dissimilar to each other (multiple DNA-binding domains). Some traditional motif discovery methods handle these scenarios with heuristics such as substring masking, which makes discovery sequential and can leave some secondary motifs masked and unreported. With other, black-box deep learning approaches, inferring the motifs themselves remains challenging.

The sparse representation obtained from UnfoldCDL reveals where the enriched patterns occur in the dataset, and we use this representation to discover all the motifs simultaneously.
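As a simplified illustration of how a sparse code can point to motif sites (not UnfoldCDL's own motif-inference procedure), the sketch below aligns the sites where one filter is active and counts them into a position frequency matrix. It assumes each sequence's code is a K × (L - w + 1) matrix whose entry (k, p) is the activation of filter k at position p. The names `count_matrix` and `ppm`, the activation threshold, and the toy codes are hypothetical.

```julia
# Position frequency matrix for filter k: take the width-w site at every position p
# where that filter's code Z[k, p] exceeds `thresh`, then count bases column by column.
function count_matrix(seqs::AbstractVector{<:AbstractString},
                      codes::AbstractVector{<:AbstractMatrix}, k::Integer, w::Integer;
                      thresh::Real = 0.0)
    alphabet = "ACGT"
    counts = zeros(Int, 4, w)  # rows A, C, G, T; columns are positions within the site
    for (seq, Z) in zip(seqs, codes)
        for p in axes(Z, 2)
            Z[k, p] > thresh || continue
            for (j, c) in enumerate(seq[p:p+w-1])
                i = findfirst(==(c), alphabet)
                i === nothing || (counts[i, j] += 1)  # skip ambiguous letters such as N
            end
        end
    end
    return counts
end

# Normalize counts into a position probability matrix; a pseudocount avoids zero columns.
ppm(counts; pseudocount = 0.0) = (counts .+ pseudocount) ./ sum(counts .+ pseudocount; dims = 1)

# Toy example: two sequences whose (hypothetical) codes activate filter 1 at a TATA site.
seqs = ["GGTATAGG", "CTATAC"]
codes = [[0 0 0.75 0 0], [0 0.8 0]]
counts = count_matrix(seqs, codes, 1, 4)
probs = ppm(counts)
```

Because the code already says where each occurrence starts, no substring masking or sequential re-scanning is needed in this toy setting: every filter's sites can be collected in the same pass over the codes.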

We found many unreported motifs in the JASPAR datasets; see the Results section of our preprint for details.

How to Install

We are currently adding this package to the Julia registry. Once it is registered, you can install it with Julia's package manager:

pkg> add UnfoldCDL

Software requirements

This package requires WebLogo. Install Python 3, then install WebLogo with the following command:

pip3 install weblogo
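A quick way to check these requirements from Julia is to look the programs up on the PATH with `Sys.which`. This sketch assumes UnfoldCDL calls the `weblogo` command-line program that `pip3 install weblogo` installs; whether the package invokes that command or the Python module directly is an assumption here.

```julia
# Look up each required program on the PATH; Sys.which returns `nothing` when it is absent.
required = ("python3", "weblogo")
found = Dict(prog => Sys.which(prog) for prog in required)
for prog in required
    if found[prog] === nothing
        @warn "$prog not found on PATH (install WebLogo with `pip3 install weblogo`)"
    else
        println(prog, " found at ", found[prog])
    end
end
```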

Hardware requirements

UnfoldCDL currently requires an NVIDIA GPU; we plan to implement a CPU version in the future.

How to Use

In Julia, import the UnfoldCDL package first:

using UnfoldCDL

To perform motif discovery on a single FASTA file, run

find_motif(<fasta-path>, <output-folder>)

This performs motif discovery on the FASTA file <fasta-path> and writes the results to the specified folder <output-folder>.

To perform motif discovery on a batch of FASTA files, run

find_motif_fasta_folder(<fasta-folder-path>, <output-folder>)

This performs motif discovery on every FASTA file in <fasta-folder-path> and writes the results for each file to the specified folder <output-folder>.

Citation

The UnfoldCDL preprint is available at https://www.biorxiv.org/content/10.1101/2022.11.06.515322v3. It can be cited using the following BibTeX entry:

@article {Chu2022.11.06.515322,
	author = {Chu, Shane Kuei-Hsien and Stormo, Gary D},
	title = {Deep unfolded convolutional dictionary learning for motif discovery},
	elocation-id = {2022.11.06.515322},
	year = {2022},
	doi = {10.1101/2022.11.06.515322},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2022/11/10/2022.11.06.515322},
	eprint = {https://www.biorxiv.org/content/early/2022/11/10/2022.11.06.515322.full.pdf},
	journal = {bioRxiv}
}
