We recommend that users consult the Tool showcase before reading this detailed tutorial. To demonstrate the toolbox applications and facilitate understanding of the methods, we built example datasets for SMRT and ONT analyses into the MeMoRe app.

1. Analytical principle

In Bacteria and Archaea, DNA methylation events (6mA, 4mC, 5mC) are motif-driven, meaning that nearly all occurrences of the same sequence motif(s) will be modified. This property can be used to refine the motifs discovered from SMRTPortal/SMRTLink Base Modification Analysis or nanodisco pipelines.

For each methylation motif discovered de novo, we identify all occurrences in the provided reference genome and aggregate the methylation signal to provide a simple visual representation for motif sequence validation. The same procedure is repeated for all related motifs with one substitution to confirm that the methylation is precisely represented by the motif of interest. For example, if GATC is discovered de novo, we also extract the methylation signal for:

  • 1st base substitution: AATC, CATC, TATC.

  • 2nd base substitution: GCTC, GGTC, GTTC.

  • 3rd base substitution: GAAC, GACC, GAGC.

  • 4th base substitution: GATA, GATG, GATT.
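The enumeration of related motifs listed above can be sketched in a few lines of Python. This is only an illustration of the principle, not MeMoRe's actual implementation: the function name `single_substitution_variants` is hypothetical, and the sketch assumes that degenerate positions (such as N) are left untouched.

```python
BASES = "ACGT"

def single_substitution_variants(motif):
    """Map each 1-based position to the motifs obtained by substituting that base."""
    variants = {}
    for i, ref_base in enumerate(motif):
        if ref_base not in BASES:
            # Assumption: degenerate positions (e.g. N) are not substituted.
            continue
        variants[i + 1] = [
            motif[:i] + alt + motif[i + 1:]
            for alt in BASES
            if alt != ref_base
        ]
    return variants

# Reproduces the GATC example from the list above.
for position, related in single_substitution_variants("GATC").items():
    print(position, related)
```

For a fully specified motif of length n, this yields 3 × n related motifs (12 for GATC), each of which gets its own aggregated signal in the visualization.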

2. Analysis of SMRT results

In SMRT sequencing, DNA methylation affects the kinetics of the polymerases used to sequence the SMRTbell templates. The changes in polymerase kinetics are observed through the Inter-Pulse Duration (IPD) metric, which is compared at each genomic position to the prediction from an in silico model. The resulting metric is called the IPD ratio (IPD native/IPD in silico). For 6mA and 4mC DNA modifications, the IPD ratio increases at the methylated positions, while an IPD ratio of 1 means no kinetic change. However, 5mC does not typically produce a detectable signal and cannot be reliably found from SMRT data.
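To make the IPD ratio concrete, here is a minimal sketch that collects the ratio at the modified base of every motif occurrence. It is not taken from MeMoRe or from the SMRT pipelines. The per-position lists `native_ipd` and `in_silico_ipd`, the function names, and the toy values are all assumptions made for illustration, and only the forward strand is considered.

```python
import re
from statistics import mean

# IUPAC nucleotide codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def motif_to_regex(motif):
    """Compile a motif into a regex; the lookahead also reports overlapping hits."""
    body = "".join("[" + IUPAC[code] + "]" for code in motif)
    return re.compile("(?=(" + body + "))")

def ipd_ratios_at_motif(sequence, native_ipd, in_silico_ipd, motif, mod_offset):
    """IPD ratio (native / in silico) at the modified base of each occurrence."""
    ratios = []
    for match in motif_to_regex(motif).finditer(sequence):
        pos = match.start() + mod_offset  # 0-based position of the modified base
        ratios.append(native_ipd[pos] / in_silico_ipd[pos])
    return ratios

# Toy data (hypothetical values): two GACAT occurrences, modified A at offset 3.
sequence = "TTGACATCCGACATG"
native_ipd = [1.0] * len(sequence)
in_silico_ipd = [1.0] * len(sequence)
native_ipd[5], native_ipd[12] = 4.0, 6.0
in_silico_ipd[12] = 1.5

ratios = ipd_ratios_at_motif(sequence, native_ipd, in_silico_ipd, "GACAT", 3)
print(ratios, mean(ratios))
```

A distribution of such ratios centered well above 1 suggests methylation at that position, whereas a distribution centered around 1 corresponds to the background level.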

The following figures showcase typical situations that can be resolved with MeMoRe analysis: a de novo discovered motif is “too general” (e.g. CCNGG instead of CCWGG), or a de novo discovered motif is “incomplete” (e.g. CCAGG instead of CCWGG). They were generated from an illustrative de novo methylation motif analysis resulting in the following set of motifs:

  • TTT6mACNNNNNGTG (has error)

  • GAC6mAT (has error)

  • GTAT6mAC

  • C6mACNNNNNRTAAA

  • WGG4mCCW

  • GGW5mCC (not detectable with SMRT data)

  • GAT5mC (not detectable with SMRT data)

  • 5mCCGG (not detectable with SMRT data)

2.1. A motif is too general

In this example, the putative motif reported by the analytical pipeline is GAC6mAT. We run MeMoRe on the dataset, and the visualization shows only partially elevated IPD ratios for GAC6mAT (i.e. a dense part of the IPD ratio distribution remains at background level, around one), while the other related motifs (with one substitution) have IPD ratios at background level (see Figure 1). This indicates that the putative motif is too general and that the actual methylation motif must be more specific.

C. perfringens's GAC6mAT methylation motif results

Figure 1: MeMoRe results for SMRT dataset of C. perfringens’s GAC6mAT methylation motif.

To refine the motif of interest, we can use the “Motif summary” panel to extend the motif evaluation space by adding “NN” as prefix and suffix so that many more motif compositions are considered (e.g. AGACAT, TNGACAT, GACATNC, etc.). The resulting analysis is displayed in Figure 2 below.

C. perfringens's NNGAC6mATNN methylation motif results

Figure 2: MeMoRe results for SMRT dataset of C. perfringens’s NNGAC6mATNN methylation motif.
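The “NN” extension can be thought of as testing each flanking position with each base in turn. The sketch below follows one plausible reading of the examples given above (AGACAT, TNGACAT, GACATNC): a single flanking position is fixed to a concrete base while the others stay N, and outer Ns are trimmed. The function name `flank_variants` is hypothetical, and MeMoRe may enumerate the compositions differently.

```python
BASES = "ACGT"

def flank_variants(core, n_flank=2):
    """Fix one flanking position at a time to a concrete base; trim outer Ns."""
    template = list("N" * n_flank + core + "N" * n_flank)
    flank_positions = list(range(n_flank)) + list(
        range(n_flank + len(core), len(template))
    )
    variants = []
    for pos in flank_positions:
        for base in BASES:
            candidate = template.copy()
            candidate[pos] = base
            # Leading and trailing Ns carry no information, so drop them.
            variants.append("".join(candidate).strip("N"))
    return variants

variants = flank_variants("GACAT")
print(len(variants))
print(variants)
```

With two flanking positions on each side, this gives 4 positions × 4 bases = 16 extended motifs to compare against the original one.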

This indicates that the actual methylation motif is VGAC6mAT (V = A, C, or G). The resulting motif can be added to the “Motif summary” panel and the associated plot can be generated (see Figure 3 below).

C. perfringens's VGAC6mAT methylation motif results

Figure 3: MeMoRe results for SMRT dataset of C. perfringens’s VGAC6mAT methylation motif.

2.2. A motif is incomplete

In this example, the putative motif reported by the analytical pipeline is TTT6mACNNNNNGTG. We run MeMoRe on the dataset, and the visualization shows high IPD ratios for TTTACNNNNNGTG and TTTATNNNNNGTG, while the other related motifs (with one substitution) have IPD ratios at background level (see Figure 4). This indicates that the putative motif is incomplete and that the actual methylation motif is TTT6mAYNNNNNGTG (Y = C or T).

C. perfringens's TTT6mACNNNNNGTG methylation motif results

Figure 4: MeMoRe results for SMRT dataset of C. perfringens’s TTT6mACNNNNNGTG methylation motif.
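Combining the motifs that show signal into a single degenerate motif amounts to a lookup in the IUPAC nucleotide code table. The sketch below illustrates that step under the assumption that the input motifs have the same length and differ at one position only. The function name `merge_motifs` is hypothetical and is not part of MeMoRe.

```python
# IUPAC nucleotide codes and the bases they stand for.
CODE_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in CODE_TO_BASES.items()}

def merge_motifs(motifs):
    """Collapse same-length motifs into one motif, position by position."""
    if len({len(motif) for motif in motifs}) != 1:
        raise ValueError("all motifs must have the same length")
    merged = []
    for column in zip(*motifs):
        # Union of the bases allowed by each motif at this position.
        bases = frozenset().union(*(CODE_TO_BASES[code] for code in column))
        merged.append(BASES_TO_CODE[bases])
    return "".join(merged)

print(merge_motifs(["TTTACNNNNNGTG", "TTTATNNNNNGTG"]))
print(merge_motifs(["AGACAT", "CGACAT", "GGACAT"]))
```

Merging motifs that differ at several positions at once would over-generalize, since the merged motif would then also match combinations that were never observed with signal.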

We can use the “Motif summary” panel to add the complete motif and generate the associated plot (see Figure 5 below).

C. perfringens's TTT6mAYNNNNNGTG methylation motif results

Figure 5: MeMoRe results for SMRT dataset of C. perfringens’s TTT6mAYNNNNNGTG methylation motif.

3. Analysis of ONT results

In ONT sequencing, DNA methylation affects the electric current measured while the DNA molecules translocate through the nanopores. Using nanodisco, current differences between the native and the Whole Genome Amplified samples are computed at each genomic position, and this metric represents the methylation signal for ONT datasets. The further from 0 the current difference is, the more likely the genomic position is modified. Contrary to SMRT sequencing, the signal is broadly distributed and not restricted to the modified base, meaning that the signal at multiple genomic positions needs to be monitored.
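Because the ONT signal spreads over several positions, a natural summary is a per-offset profile around the motif rather than a single value. The following sketch shows one way such a profile could be computed. It is not nanodisco's or MeMoRe's implementation: the function name `motif_window_signal`, the `flank` size, the use of the mean absolute difference, and the toy values are all assumptions.

```python
from statistics import mean

def motif_window_signal(current_diff, motif_starts, motif_length, flank=2):
    """Mean absolute current difference at each offset relative to the motif start."""
    profile = {}
    for offset in range(-flank, motif_length + flank):
        values = [
            abs(current_diff[start + offset])
            for start in motif_starts
            if 0 <= start + offset < len(current_diff)
        ]
        # None marks offsets that fall outside the data for every occurrence.
        profile[offset] = mean(values) if values else None
    return profile

# Toy data (hypothetical values): two occurrences of a 4-base motif.
current_diff = [0.0, 0.0, -2.0, 3.0, 0.0, 0.0, 0.0, 2.0, -3.0, 0.0, 0.0, 0.0]
profile = motif_window_signal(current_diff, [2, 7], motif_length=4, flank=1)
for offset, value in profile.items():
    print(offset, value)
```

Offsets whose average stays close to 0 behave like background, while offsets with a large average indicate positions where the modification disturbs the current.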

The following figures showcase typical situations that can be resolved with MeMoRe analysis: a de novo discovered motif is “too general” (e.g. CCNGG instead of CCWGG), a de novo discovered motif is “incomplete” (e.g. CCAGG instead of CCWGG), and de novo discovered motifs partially overlap (e.g. 5mCCGG and 5mCCWGG). They were generated from an illustrative de novo methylation motif analysis resulting in the following set of motifs:

  • GAC6mAT (has error)

  • GGT5mCC (has error)

  • GAT5mC

  • 5mCCGG

  • GTAT6mAC

  • TTT6mAYNNNNNGTG

  • C6mACNNNNNRTAAA

  • WGG4mCCW

3.1. A motif is too general

In this example, the putative motif reported by the analytical pipeline is GAC6mAT. We run MeMoRe on the dataset, and the visualization shows only partially disturbed current differences for GAC6mAT (i.e. a dense part of the current difference distribution remains at background level, around zero), while the other related motifs (with one substitution) have current differences at background level (see Figure 6). This indicates that the putative motif is too general and that the actual methylation motif must be more specific.

C. perfringens's GAC6mAT methylation motif results

Figure 6: MeMoRe results for ONT dataset of C. perfringens’s GAC6mAT methylation motif.

To refine the motif of interest, we can use the “Motif summary” panel to extend the motif evaluation space by adding “NN” as prefix and suffix so that many more motif compositions are considered (e.g. AGACAT, TNGACAT, GACATNC, etc.). The resulting analysis is displayed in Figure 7 below.

C. perfringens's NNGAC6mATNN methylation motif results

Figure 7: MeMoRe results for ONT dataset of C. perfringens’s NNGAC6mATNN methylation motif.

This indicates that the actual methylation motif is VGAC6mAT (V = A, C, or G). The resulting motif can be added to the “Motif summary” panel and the associated plot can be generated (see Figure 8 below). The figure also shows a weak signal for VGACCT, which is explained by partial overlap with GGWCC (i.e. GGACCt, see Overlapping motifs).

C. perfringens's VGAC6mAT methylation motif results

Figure 8: MeMoRe results for ONT dataset of C. perfringens’s VGAC6mAT methylation motif.

3.2. A motif is incomplete

In this example, the putative motif reported by the analytical pipeline is GGT5mCC. We run MeMoRe on the dataset, and the visualization shows disturbed current differences for GGTCC, GGACC, and GATCC, while the other related motifs (with one substitution) have current differences at background level (see Figure 9). GATCC fully overlaps with GATC and is therefore not new (see Overlapping motifs). This indicates that the putative motif is incomplete and that the actual methylation motif is GGW5mCC (W = A or T).

C. perfringens's GGT5mCC methylation motif results

Figure 9: MeMoRe results for ONT dataset of C. perfringens’s GGT5mCC methylation motif.

We can use the “Motif summary” panel to add the complete motif and generate the associated plot (see Figure 10 below). We also observe signal for two additional related motifs because GGWCC overlaps with other motifs (i.e. GGWTC and GGWCA, which correspond to GATC and GACAT, respectively; see Overlapping motifs).

C. perfringens's GGW5mCC methylation motif results

Figure 10: MeMoRe results for ONT dataset of C. perfringens’s GGW5mCC methylation motif.

3.3. Overlapping motifs

In this example, the motif reported by the analytical pipeline is GAT5mC. We run MeMoRe on the dataset, and the visualization shows disturbed current differences for GATC as well as for GGTC and GACC, while the other related motifs (with one substitution) have current differences at background level (see Figure 11). GGTC and GACC partially overlap with GGWCC and therefore should not be considered new independent motifs. This indicates that all the additional methylation signal can be explained by GGW5mCC; GATC and GGWCC together therefore explain all the signal visualized.

C. perfringens's GAT5mC methylation motif results

Figure 11: MeMoRe results for ONT dataset of C. perfringens’s GAT5mC methylation motif.

This can be visually confirmed by generating the refined plot for HGATCD (H = A, C, or T; D = A, G, or T), which explicitly excludes overlaps with GGW5mCC.

C. perfringens's HGAT5mCD methylation motif results

Figure 12: MeMoRe results for ONT dataset of C. perfringens’s HGAT5mCD methylation motif.
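The effect of the H and D flanks can be checked at the sequence level. The sketch below expands a flanked related motif into all its concrete sequences and reports the fraction that contain an occurrence of a known motif. It only illustrates the reasoning and is not how MeMoRe handles overlaps. The function names are hypothetical, and only occurrences fully contained in the flanked window, on the forward strand, are detected.

```python
import re
from itertools import product

# IUPAC nucleotide codes and the bases they stand for.
CODE_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(motif):
    """List every concrete sequence matched by a degenerate motif."""
    return ["".join(p) for p in product(*(CODE_TO_BASES[c] for c in motif))]

def to_regex(motif):
    """Compile a degenerate motif into a regular expression."""
    return re.compile("".join("[" + CODE_TO_BASES[c] + "]" for c in motif))

def overlapping_fraction(candidate, known):
    """Fraction of the candidate's concrete sequences that contain the known motif."""
    pattern = to_regex(known)
    sequences = expand(candidate)
    hits = sum(1 for sequence in sequences if pattern.search(sequence))
    return hits / len(sequences)

# GGTC and GACC with unrestricted flanks (N) versus restricted flanks (H, D).
for candidate in ["NGGTCN", "HGGTCD", "NGACCN", "HGACCD"]:
    print(candidate, overlapping_fraction(candidate, "GGWCC"))
```

With N flanks, a quarter of the GGTC and GACC contexts contain GGWCC, which is consistent with the partial signal seen for these related motifs. With the H and D flanks, none do, so any remaining signal could not be attributed to GGW5mCC.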