DNA Background
Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid
Cell diagram | Subcellular structures |
---|---|
Components of a typical animal cell: 1.Nucleolus, 2.Nucleus, 3.Ribosome, 4.Vesicle, 5.Rough endoplasmic reticulum, 6.Golgi apparatus, 7.Cytoskeleton, 8.Smooth endoplasmic reticulum, 9.Mitochondrion, 10.Vacuole, 11.Cytosol, 12.Lysosome, 13.Centrosome, 14.Cell membrane |
- Wiki
Protein synthesis process | Protein 3D structure |
---|---|
DNA and RNA codon tables
Transcription
ref: Chromatin plasticity: A versatile landscape that underlies cell fate and identity. 2018
ref: Developmental enhancers and chromosome topology. 2018
ref: The dynamics of chromatin architecture in brain development and function. 2021
Reference genomes
- NCBI-Genome
- Ensembl
- GENCODE
- UCSC
- REPBASE (repetitive sequences)
Genome annotation statistics:
Human (Homo sapiens): Human assembly and gene annotation
Mouse (Mus musculus): Mouse assembly and gene annotation
Data representations
Character form
- fasta
- fastq
- sequence features
DiProDB: a database for dinucleotide properties; Seq2Feature: a comprehensive web-based feature extraction tool
Properties I | Properties II (details) | Description |
---|---|---|
Physicochemical properties | Stacking energy, Enthalpy, Entropy, Flexibility shift, Flexibility slide, Free energy, Melting temperature, Mobility to bend towards major groove, Mobility to bend towards minor groove, Probability contacting nucleosome core, Rise stiffness, Roll stiffness, Shift stiffness, Slide stiffness, Tilt stiffness, Twist stiffness | Physical and chemical properties |
Conformational properties | Bend, Rise, Roll, Inclination, Major Groove Depth, Major Groove Distance, Major Groove Size, Major Groove Width, Minor Groove Depth, Minor Groove Distance, Minor Groove Size, Minor Groove Width, Shift, Propeller Twist, Slide, Tilt, Tip, Twist | Structural conformation |
Nucleotide content | Adenine content, Cytosine content, GC content, Guanine content, Keto (GT) content, Purine (AG) content, Thymine content, Pyrimidine (CT) content | Base composition |
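The nucleotide-content features in the last row are simple base fractions, so they can be sketched in a few lines of Python. This is only an illustration: the function name `nucleotide_content` and the dictionary keys are hypothetical, and normalizing by sequence length is an assumption rather than the documented behavior of Seq2Feature.

```python
from collections import Counter


def nucleotide_content(seq):
    """Return base-composition fractions for a DNA sequence.

    Hypothetical helper: fractions are counts divided by the full
    sequence length (an assumption, not Seq2Feature's definition).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)

    def frac(bases):
        # Fraction of positions occupied by any base in `bases`.
        return sum(counts[b] for b in bases) / n

    return {
        "A": frac("A"),
        "C": frac("C"),
        "G": frac("G"),
        "T": frac("T"),
        "GC": frac("GC"),
        "keto_GT": frac("GT"),
        "purine_AG": frac("AG"),
        "pyrimidine_CT": frac("CT"),
    }


features = nucleotide_content("ATGCGC")
print(features["GC"])         # 4 of 6 bases are G or C
print(features["purine_AG"])  # 3 of 6 bases are A or G
```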
- gene features
Eukaryotic gene structure
Prokaryotic gene structure
- toolkits
bio.tools
Sequence Manipulation Suite
Seq2Feature
Numeric form
- One-Hot
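One-hot encoding represents a DNA sequence directly as a 0-1 matrix in which each column marks the base at one position with a single 1, as shown in the figure:
The matrix has shape [4, n], where n is the sequence length. Representative models include DeepSEA, DeepBind, and Basset.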
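The one-hot scheme described above can be sketched in plain Python. The row order `ACGT` and the choice to leave unknown bases such as `N` as an all-zero column are assumptions made for this example; the models named above may each use a different convention.

```python
BASES = "ACGT"  # row order is an assumption; conventions differ between models


def one_hot(seq):
    """Encode a DNA sequence as a 4 x n matrix (a list of 4 rows)."""
    seq = seq.upper()
    matrix = [[0] * len(seq) for _ in BASES]
    for col, base in enumerate(seq):
        row = BASES.find(base)
        if row >= 0:  # bases outside ACGT (e.g. N) stay as an all-zero column
            matrix[row][col] = 1
    return matrix


encoded = one_hot("ACGTN")
for base, row in zip(BASES, encoded):
    print(base, row)
# A [1, 0, 0, 0, 0]
# C [0, 1, 0, 0, 0]
# G [0, 0, 1, 0, 0]
# T [0, 0, 0, 1, 0]
```

In practice the nested list would usually be converted to an array or tensor before being fed to a convolutional model.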
- Embedding
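Natural language processing architectures such as word2vec and Transformer all use word-vector (embedding) representations; see the linked blog post for background. Borrowing this idea from NLP, the DNA sequence is first split into k-mers of a fixed length. Each k-mer is then treated as a word and a stretch of sequence as a sentence, and a word-embedding model is trained on them. The resulting vocabulary contains 4^k entries.
Models such as dna2vec, kmer2vec, and DNABERT represent genomic sequences this way.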
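The tokenization step that precedes embedding training can be sketched as follows. The helper names `kmer_tokens` and `kmer_vocabulary` are hypothetical, and the overlapping window with `stride=1` is an assumption for this example; individual models choose their own k and windowing. The embedding training itself is not shown.

```python
from itertools import product


def kmer_tokens(seq, k, stride=1):
    """Split a sequence into k-mers ("words") with a sliding window."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_vocabulary(k):
    """Map each of the 4**k possible k-mers to an integer id."""
    return {"".join(p): idx for idx, p in enumerate(product("ACGT", repeat=k))}


tokens = kmer_tokens("ATGCGT", k=3)
vocab = kmer_vocabulary(3)
token_ids = [vocab[t] for t in tokens]

print(tokens)      # ['ATG', 'TGC', 'GCG', 'CGT']
print(len(vocab))  # 64, i.e. 4**3
print(token_ids)   # [14, 57, 38, 27]
```

The integer ids are what an embedding layer or a word2vec-style trainer would consume; each id indexes one row of the learned embedding matrix.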
The real voyage of discovery consists not in seeking new lands but seeing with new eyes.
–Marcel Proust, 1923, La Prisonnière