Epigenetics applications with Omni-C¶
In this section you will learn how to:
Identify Topologically Associated Domains (TADs)¶
The .hic file that you generated can be used for identifying Topologically Associated Domains (TADs) in the contact map using the arrowhead
utility of Juicer Tools
Parameter |
Function |
---|---|
-Xmx |
The flag Xmx specifies the maximum memory allocation pool for a Java virtual machine, from our experience 48000m works well when processing human data sets. If you are not sure how much memory your system has, run the command |
Djava.awt.headless=true |
Java is ran in a headless mode when the application does not interact with a user (if not specified, the default is Djava.awt.headless=false) |
arrowhead |
The arrowhead command is used to identify contact domains along the diagonal |
--threads |
Specifies the numbers of threads to be used (integer number) |
-k |
Normalization to be used. KR (Knight-Ruiz) balancing is recommended (other avilable normalization methods are VC and VC_SQRT or NONE, (this is not recommended) |
-m |
Size of the sliding window along the diagonal in which contact domains will be found. Must be an even number as (m/2) is used as the increment for the sliding window. Recommended value: 2000 (this is also the default) |
-r |
Resolution for which Arrowhead will be run. We recommend running with various resolutions, e.g.: 5kB (5000), 10kB (10000), 25kb (25000) and 50kb (50000). The quality of the results at the various resolutions is highly dependent on the depth of sequencing |
*.hic |
hic input file, containing the contact matrix |
output directory |
a path to the directory in which a bedpe file containing the contact domains will be saved (the name of the file is composed of the resolution followed by _blocks.bedpe). The bedpe file, is a text file with 16 fields in each row. For more details on the file format, see the table below |
Command:
java -jar -Xmx<memory> -Djava.awt.headless=true -jar <path_to_juicer_tools.jar> arrowhead --threads <no_of_threads> -k <normalization_type> -m <sliding_window> -r <resolution> <*.hic> <output_path>
Example:
java -jar -Xmx48000m -Djava.awt.headless=true -jar ./Omni-C/juicer_tools.jar arrowhead --threads 16 -k KR -m 2000 -r 10000 contact_map.hic TAD_calls
The above command will result in a new file “10000_blocks.bedpe” inside the “TAD_calls” directory.
The bedpe
output file will contain a list of contact domains with the following 16-field header:
chr1 x1 x2 chr2 y1 y2 name score strand1 strand2 color score uVarScore lVarScore upSign loSign
Field |
Description |
---|---|
chr1/chr2 |
The chromosome that the domain is located on |
x1,x2/y1,y2 |
The interval spanned by the domain (contact domains manifest as squares on the diagonal of a Hi-C matrix and as such: x1=y1, x2=y2) |
name |
typically this field is empty (contains “.”) |
score |
typically this field is empty (contains “.”) |
strand1 |
typically this field is empty (contains “.”) |
strand2 |
typically this field is empty (contains “.”) |
color |
The color that the feature will be rendered as if loaded in Juicebox |
score |
The corner score, a score indicating the likelihood that a pixel is at the corner of a contact domain. Higher values indicate a greater likelihood of being at the corner of a domain |
uVarScore |
The variance of the upper triangle |
lVarScore |
The variance of the lower triangle |
upSign |
-1*(sum of the sign of the entries in the upper triangle) |
loSign |
Sum of the sign of the entries in the lower triangle |
Principle component of the contact matrix¶
The contact matrix .hic can also be used for calculating the Pearson’s correlation matrix of the Observed/Expected for intra-chromosomal contacts. The eigenvector is the first principal component of the Pearson’s matrix and it implies on the compartmentalization of the DNA at high level. The sign of the eigenvector typically indicates the compartment type A/B (active/inactive).
The Principle component of the contact matrix can be calculated using the eigenvector
utility of Juicer Tools
Parameter |
Function |
---|---|
-Xmx |
The flag Xmx specifies the maximum memory allocation pool for a Java virtual machine, from our experience 48000m works well when processing human data sets. If you are not sure how much memory your system has, run the command |
Djava.awt.headless=true |
Java is ran in a headless mode when the application does not interact with a user (if not specified, the default is Djava.awt.headless=false) |
eigenvector |
The eigenvector command is used to calculate the first principal component of the Pearson’s Observed/Expected matrix |
-p |
Optional, force with curren parameters. When running with resolution < 500kb this may take very long time and by default the run will be canceled (output file will be empty). To force the run with any resolution < 500kb use the -p flag |
VC/VC_SQRT/KR |
Normalization to be used. KR (Knight-Ruiz) balancing is recommended. |
*.hic |
hic input file, containing the contact matrix |
chr |
The chromosome for which the principle component will be calculated (make sure that same format as in the reference is used, e.g. chrX or x, depending on the reference) |
output directory |
a path to the directory in which a bedpe file containing the contact domains will be saved (the name of the file is composed of the resolution followed by _blocks.bedpe). The bedpe file, is a text file with 16 fields in each row. For more details on the file format, see the table below |
BP |
resolution type (FRAG is another resolution type, this option is not covered in this documentation) |
Resolution |
Which resolution to be used for the calculation (integer). Since the goal here is to identify high level compartments 1M (100000) or 500k (500000) are typically sufficient |
Output file |
Name of text file for recording the calculated values. The output is a list of eigenvalues recorede for each bin, sorted by order on the chromosome. E.g. if the chromosome lenght 43Mb and a bin size of 1000000 was used, the output will be a list of 43 eigenvalues in text format. |
Command:
java -jar -Xmx<memory> -Djava.awt.headless=true -jar <path_to_juicer_tools.jar> eigenvector <normalization_type> <*.hic> <chromosome> <resolution type> <resolution> <output file>
Example 1 (resolution >= 500kb):
java -jar -Xmx48000m -Djava.awt.headless=true -jar ./Omni-C/juicer_tools.jar eigenvector KR contact_map.hic chr1 BP 1000000 PC_chr1.txt
Example 2 (resolution <= 500kb):
java -jar -Xmx48000m -Djava.awt.headless=true -jar ./Omni-C/juicer_tools.jar eigenvector -p KR contact_map.hic chr1 BP 100000 PC_chr1.txt
As described in the table above, the output is a one columnn list of the eigenvalues as calculated for each one of the bins. For most of the common applications this format is not useful and we will guide you on how to convert it to more useful formats:
bedGraph format¶
To generate a bedgraph which details each bin interval followed by the associated eigenvalue you will generate a list of the bins intervals using the bedtools makewindows
command and combine it with the output file of the previous step (juicer tools eigenvector
).
Specify the chromosome of interest, first bp of the chromosome, last bp of the chromosome (you can find this information in the .genome file) in a tab delmitied format (as in a bed file format) and feed it to bedtools makewindows
alongside the window size (same as the resolution used for juicer tools eigenvector
). Finally, combine the two files using the paste
command:
Command:
echo -e '<chromosome>\t<start position>\t<end position>'|bedtools makewindows -b stdin -w <window>|paste - <eigenvector output file> > <output.bedgraph>
Example:
echo -e 'chr1\t0\t248956422'|bedtools makewindows -b stdin -w 1000000|paste - PC_chr1.txt > PC_chr1.bedgraph
A/B compartments¶
Active regions in the genome were observed to have similar eigenvalues, the same is true for inactive regions. To identify the distinct compartments all the neighboring bins with same sign (>0 or <0) are merged into one region. In each chromosome all the regions with the same sign are classified as the same compartment type (A/B). Please note that the signs of PC values are arbitrary, to accurately assign compartment type to signs of PC values, you need knowledge beyond Omni-C data such as RNA-Seq, Methylation states, etc`
We will use the bedGraph
file for generating a bedpe
file containing the borders of the calculated A/B compartments:
bedtools
allows merging of overlapping or neighboring (-d
flag) intervals based on strandness (-s
flag). We will use this utility by assigning +
to all intervals with PC1 values >0 and -
to all intervals with PC1 values <0 (A/B compartments has nothing to do with strandness, the +
and -
assignments are only from practical reasons for easy manupulation of the bed file). The strandness option assumes the strand information is on the 6th column, we will use the awk
command to record +
or -
on the 6th column based on the PC values.
Adjacent intervals with the same sign are merged together using bedtools merge
Following merging, we will use the awk
command to genetrate the final bedpe format.
Command:
awk '{if ($4 > 0) print $1"\t"$2"\t"$3"\t"".""\t"".""\t""+"; else if ($4 < 0) print $1"\t"$2"\t"$3"\t"".""\t"".""\t""-"}' <*.bedgraph>|bedtools merge -s -d 10 -c 6 -o distinct|awk '{print $1"\t"$2"\t"$3"\t"$1"\t"$2"\t"$3"\t"$4}'> <output.bedpe>
Example:
awk '{if ($4 > 0) print $1"\t"$2"\t"$3"\t"".""\t"".""\t""+"; else if ($4 < 0) print $1"\t"$2"\t"$3"\t"".""\t"".""\t""-"}' PC_chr1.bedgraph|bedtools merge -s -d 10 -c 6 -o distinct|awk '{print $1"\t"$2"\t"$3"\t"$1"\t"$2"\t"$3"\t"$4}'>chr1_AB.bedpe
The figure below shows the observed/expected matrix of chr1 with the A/B compartments (black boxes) as generated with the commands above (visualized using Juicebox
)