ENC TF Binding Uniform TFBS Track Settings

Transcription Factor ChIP-seq Uniform Peaks from ENCODE/Analysis

Track collection: ENCODE Transcription Factor Binding

Subtracks by cell line and factor

Cell lines (ENCODE tier shown where assigned): GM12878 (Tier 1), H1-hESC (Tier 1), K562 (Tier 1), HeLa-S3 (Tier 2), HepG2 (Tier 2), HUVEC (Tier 2), IMR90 (Tier 2*), A549 (Tier 2*), MCF-7 (Tier 2*), SK-N-SH (Tier 2*), AG04449, AG04450, AG09309, AG09319, AG10803, AoAF, BE2 C, BJ, Caco-2, Dnd41, ECC-1, Fibrobl, Gliobla, GM06990, GM08714, GM10847, GM12801, GM12864, GM12865, GM12872, GM12873, GM12874, GM12875, GM12891, GM12892, GM15510, GM18505, GM18526, GM18951, GM19099, GM19193, GM19238, GM19239, GM19240, HAc, HA-sp, HBMEC, HCFaa, HCM, HCPEpiC, HCT-116, HEEpiC, HEK293, HEK293-T-REx, HFF, HFF-Myc, HL-60, HMEC, HMF, HPAF, HPF, HRE, HRPEpiC, HSMM, HSMMtube, HVMF, MCF10A-Er-Src, NB4, NH-A, NHDF-Ad, NHDF-neo, NHEK, NHLF, NT2-D1, Osteobl, PANC-1, PBDE, PBDEFetal, PFSK-1, ProgFib, Raji, RPTEC, SAEC, SH-SY5Y, SK-N-MC, SK-N-SH RA, T-47D, U2OS, U87, WERI-Rb-1, WI-38

Factors: ARID3A, ATF1, ATF2, ATF3, BACH1, BATF, BCL11A, BCL3, BCLAF1, BDP1, BHLHE40, BRCA1, BRF1, BRF2, CBX3, CCNT2, CEBPB, CEBPD, CHD1, CHD2, CREB1, CTBP2, CTCF, CTCFL, E2F1, E2F4, E2F6, EBF1, EGR1, ELF1, ELK1, ELK4, EP300, ESR1, ESRRA, ETS1, EZH2, FAM48A, FOS, FOSL1, FOSL2, FOXA1, FOXA2, FOXM1, FOXP2, GABPA, GATA1, GATA2, GATA3, GRp20, GTF2B, GTF2F1, GTF3C2, HDAC1, HDAC2, HDAC6, HDAC8, HMGN3, HNF4A, HNF4G, HSF1, IKZF1, IRF1, IRF3, IRF4, JUN, JUNB, JUND, KAP1, KDM5A, KDM5B, MAFF, MAFK, MAX, MAZ, MBD4, MEF2A, MEF2C, MTA3, MXI1, MYBL2, MYC, NANOG, NFATC1, NFE2, NFIC, NFYA, NFYB, NR2C2, NR2F2, NR3C1, NRF1, PAX5, PBX3, PHF8, PML, POLR2A, POLR3G, POU2F2, POU5F1, PPARGC1A, PRDM1, RAD21, RBBP5, RCOR1, RDBP, RELA, REST, RFX5, RPC155, RUNX3, RXRA, SAP30, SETDB1, SIN3A, SIN3AK20, SIRT6, SIX5, SMARCA4, SMARCB1, SMARCC1, SMARCC2, SMC3, SP1, SP2, SP4, SPI1, SREBP1, SRF, STAT1, STAT2, STAT3, STAT5A, SUZ12, TAF1, TAF7, TAL1, TBL1XR1, TBP, TCF12, TCF3, TCF7L2, TEAD4, TFAP2A, TFAP2C, THAP1, TRIM28, UBTF, USF1, USF2, WRNIP1, YY1, ZBTB33, ZBTB7A, ZEB1, ZKSCAN1, ZNF143, ZNF217, ZNF263, ZNF274, ZZZ3

Source data version: ENCODE March 2012 Freeze
Assembly: Human Feb. 2009 (GRCh37/hg19)

Description

This track represents a comprehensive set of human transcription factor binding sites based on ChIP-seq experiments generated by production groups in the ENCODE Consortium from the inception of the project in September 2007 through the March 2012 internal data freeze. The track shows peak calls (regions of enrichment) generated by the ENCODE Analysis Working Group (AWG) using a uniform processing pipeline developed for the ENCODE Integrative Analysis effort and published in a set of coordinated papers in September 2012. Peak calls from that effort (based on datasets from the January 2011 ENCODE data freeze) are available at the ENCODE Analysis Data Hub. This track is an update that includes newer data and slightly modified peak-calling methods.

This track contains 690 ChIP-seq datasets representing 161 unique regulatory factors (generic and sequence-specific factors). The datasets span 91 human cell types, some under various treatment conditions. These datasets were generated by the five ENCODE TFBS ChIP-seq production groups: Broad, Stanford/Yale/UC-Davis/Harvard, HudsonAlpha Institute, University of Texas-Austin and University of Washington, and University of Chicago. The University of Chicago ChIP-seq experiments were performed with an alternative epitope-tagged ChIP-seq methodology. The primary and lab-processed data (along with methods descriptions, credits and references) on which this track is based are available in the following ENCODE tracks: HAIB TFBS, SYDH TFBS, UChicago TFBS, UTA TFBS, UW CTCF Binding. These tracks are accessible from the ENC TF Binding Super-track.

Display and File Conventions and Configuration

The display for this track shows each site's location, with the point-source of the peak marked by a colored vertical bar and the level of enrichment at the site indicated by the darkness of the item. The display can be filtered to higher-valued items using the Score range: configuration item. The score values were computed at UCSC based on signal values assigned by the ENCODE uniform analysis pipeline. The input signal values were multiplied by a normalization factor calculated as the ratio of the maximum score value (1000) to the signal value at 1 standard deviation from the mean, with values exceeding 1000 capped at 1000. This has the effect of distributing signal values up to the mean plus 1 standard deviation across the score range, and assigning all higher values the maximum score.
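The score normalization described above can be illustrated with a minimal Python sketch. It is not the UCSC code: the function name `normalize_scores`, the use of the population standard deviation, and rounding to the nearest integer are all assumptions made for illustration.

```python
import statistics

MAX_SCORE = 1000  # maximum score value stated in the track description


def normalize_scores(signal_values):
    """Scale signal values so that (mean + 1 std) maps to MAX_SCORE.

    Assumptions (not confirmed by the source): population standard
    deviation, and rounding to the nearest integer.
    """
    mean = statistics.mean(signal_values)
    std = statistics.pstdev(signal_values)
    reference = mean + std  # signal value at 1 standard deviation above the mean
    if reference <= 0:
        raise ValueError("mean + std must be positive to compute a scale factor")
    factor = MAX_SCORE / reference
    # Values above the reference would exceed MAX_SCORE, so they are capped.
    return [min(MAX_SCORE, round(v * factor)) for v in signal_values]


if __name__ == "__main__":
    # Invented signal values, for illustration only.
    print(normalize_scores([10.0, 20.0, 30.0, 200.0]))
```

Under this reading, every peak whose signal is at or above the mean plus one standard deviation receives a score of 1000, so the score separates weak and moderate peaks but does not distinguish among the strongest ones.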

This track is a composite annotation track containing multiple subtracks, one for each cell type. The display mode and filtering of each subtrack can be individually controlled. For more information about track configuration, see Configuring Multi-View Tracks. Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks. The UCSC Accession listed in the metadata can be used with the File Search tool to retrieve primary data files underlying datasets of interest, by selecting UCSC Accession from the "ENCODE terms" drop down menu option.

In the subtrack selection list, the ENCODE tier (priority) is listed for each cell type. Tier 1 and Tier 2 represent categories with cell types designated for intensive study by the ENCODE investigators. After the January 2011 data freeze, an additional set of cell types was promoted from Tier 3 to Tier 2 to broaden the list of intensively studied cell types. These cell types are listed as Tier 2* in the subtrack list here (and are described as 'newly promoted to tier 2: not in 2011 analysis' on the ENCODE Common Cell Types page).

Download files for this track are in ENCODE NarrowPeak format.
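As a rough guide to working with the download files, the sketch below parses ENCODE NarrowPeak records, which have ten whitespace-separated columns (chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue, peak). The class and function names are hypothetical, and the sample records are invented for illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based start of the enriched region
    end: int             # end of the enriched region (exclusive)
    name: str
    score: int           # 0 to 1000, used for display shading and filtering
    strand: str
    signal_value: float
    p_value: float
    q_value: float
    peak_offset: int     # point-source offset from start, -1 if not called


def parse_narrowpeak(lines: Iterable[str]) -> Iterator[NarrowPeak]:
    """Yield one NarrowPeak per data line, skipping blanks and header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        f = line.split()
        yield NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                         float(f[6]), float(f[7]), float(f[8]), int(f[9]))


def point_source(peak: NarrowPeak) -> Optional[int]:
    """Genomic coordinate of the peak point-source, or None if not called."""
    return peak.start + peak.peak_offset if peak.peak_offset >= 0 else None


if __name__ == "__main__":
    # Invented records, for illustration only.
    sample = [
        "chr1\t1000\t1200\t.\t850\t.\t45.2\t-1\t-1\t90",
        "chr1\t5000\t5300\t.\t300\t.\t12.0\t-1\t-1\t150",
    ]
    strong = [p for p in parse_narrowpeak(sample) if p.score >= 500]
    for p in strong:
        print(p.chrom, point_source(p), p.score)
```

Filtering on the `score` field mirrors what the Score range setting does in the browser display.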

Methods

All ChIP-seq experiments were performed at least in duplicate, and were scored against an appropriate control designated by the production groups (either input DNA or DNA obtained from a control immunoprecipitation).

Short Read Mapping

For each dataset, mapped reads in the form of BAM files were downloaded from the ENCODE UCSC DCC. These BAM files were generated by the ENCODE data production labs (using different mappers and mapping parameters), but all used a standardized version of the GRCh37 (hg19) reference human genome sequence with the following modifications:

Mitochondrial sequence was included.
Alternate sequences were excluded.
Random contigs were excluded.
The female version of the genome was represented by the autosomes and chrX, whereas the male genome was represented by the autosomes, chrX, and chrY with the PAR regions masked.

In order to standardize the mapping protocol, custom unique-mappability tracks were used to only retain unique mapping reads, i.e. reads that map to exactly one location in the genome. Positional and PCR duplicates were also filtered out.
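The two filtering steps can be sketched as follows. This is a simplified illustration rather than the pipeline's code: reads are reduced to (chrom, position, strand) tuples, the mappability track is represented as a hypothetical set of uniquely mappable (chrom, position) pairs, and a duplicate is assumed to mean a read with the same chromosome, position and strand as one already kept.

```python
def filter_reads(reads, unique_positions):
    """Keep uniquely mappable reads and drop positional duplicates.

    reads: iterable of (chrom, position, strand) tuples.
    unique_positions: set of (chrom, position) pairs flagged as uniquely
        mappable (a stand-in for a unique-mappability track).
    """
    seen = set()
    kept = []
    for chrom, pos, strand in reads:
        # Step 1: discard reads starting at positions that are not uniquely mappable.
        if (chrom, pos) not in unique_positions:
            continue
        # Step 2: keep only the first read at each (chrom, position, strand).
        key = (chrom, pos, strand)
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept


if __name__ == "__main__":
    # Invented reads, for illustration only.
    reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "-"), ("chr1", 250, "+")]
    mappable = {("chr1", 100)}
    print(filter_reads(reads, mappable))
```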

Quality Control

A number of quality metrics for individual replicates, including measures of library complexity and signal enrichment, were calculated and are available for review on the ENCODE portal Quality Metrics page (Landt et al., 2012; Kundaje et al., 2013a). The Integrated Quality Flag from this quality assessment was used to assign the quality metadata term for each dataset (e.g., Good vs. Caution). Datasets that did not pass the minimum quality control thresholds are not included in this track.

Peak Calling

Since every ENCODE dataset is represented by at least two biological replicate experiments, a novel measure of consistency and reproducibility of peak calling results between replicates, known as the Irreproducible Discovery Rate (IDR), was used to determine an optimal number of reproducible peaks (Li et al., 2011; Kundaje et al., 2013b). Code and detailed step-by-step instructions to call peaks using the IDR method are available. In brief, the SPP peak caller (Kharchenko et al., 2008) was used with a relaxed peak calling threshold (FDR = 0.9) to obtain a large number of peaks (maximum of 300K) that span true signal as well as noise (false identifications).

The IDR method analyzes a pair of replicates, and considers peaks that are present in both replicates to belong to one of two populations: a reproducible signal group or an irreproducible noise group. Peaks from the reproducible group are expected to show relatively higher ranks (ranked based on signal scores) and stronger rank-consistency across the replicates, relative to peaks in the irreproducible group. Based on these assumptions, a two-component probabilistic copula-mixture model is used to fit the bivariate peak rank distributions from the pairs of replicates. The method adaptively learns the degree of peak-rank consistency in the signal component and the proportion of peaks belonging to each component. The model can then be used to infer an IDR score for every peak that is found in both replicates. The IDR score of a peak represents the expected probability that the peak belongs to the noise component, and is based on its ranks in the two replicates. Hence, low IDR scores represent high-confidence peaks.

An IDR score threshold of 0.02 (2%) was used to obtain an optimal peak rank threshold on the replicate peak sets (cross-replicate threshold). If a dataset had more than two replicates, all pairs of replicates were analyzed using the IDR method, and the maximum peak rank threshold across all pairwise analyses was used as the final cross-replicate peak rank threshold. Reads from replicate datasets were then pooled and SPP was once again used to call peaks on the pooled data with a relaxed FDR of 0.9. Pooled-data peaks were once again ranked by signal score. The cross-replicate rank threshold learned from the replicates was used to threshold the ranked set of pooled-data peaks.
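The thresholding logic, though not the copula-mixture model itself, can be sketched in a few lines of Python. The sketch assumes that IDR scores for each replicate pair have already been computed by an IDR implementation, and that the rank threshold for a pair is the number of peaks with an IDR score at or below the cutoff. All function names and the sample data are hypothetical.

```python
IDR_CUTOFF = 0.02  # 2% threshold stated in the text


def rank_threshold(idr_scores, cutoff=IDR_CUTOFF):
    """Number of peaks in one replicate pair with IDR score <= cutoff."""
    return sum(1 for score in idr_scores if score <= cutoff)


def cross_replicate_threshold(idr_scores_by_pair, cutoff=IDR_CUTOFF):
    """Maximum rank threshold across all pairwise replicate comparisons."""
    return max(rank_threshold(scores, cutoff)
               for scores in idr_scores_by_pair.values())


def threshold_pooled_peaks(pooled_peaks, n_peaks):
    """Rank pooled-data peaks by signal (descending) and keep the top n_peaks.

    pooled_peaks: list of (peak_id, signal_score) tuples.
    """
    ranked = sorted(pooled_peaks, key=lambda peak: peak[1], reverse=True)
    return ranked[:n_peaks]


if __name__ == "__main__":
    # Invented IDR scores and pooled peaks, for illustration only.
    idr_by_pair = {
        ("rep1", "rep2"): [0.001, 0.01, 0.05, 0.3],
        ("rep1", "rep3"): [0.001, 0.015, 0.019, 0.2],
    }
    pooled = [("p1", 50.0), ("p2", 10.0), ("p3", 80.0), ("p4", 30.0), ("p5", 5.0)]
    n = cross_replicate_threshold(idr_by_pair)
    print(n, threshold_pooled_peaks(pooled, n))
```

Note that the threshold learned from the replicates is a count of peaks, not a signal cutoff, which is why it can be carried over to the separately called pooled-data peaks.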

Any thresholds based on reproducibility of peak calling between biological replicates are bounded by the quality and enrichment of the worst replicate. Valuable signal is lost when a dataset has one replicate that is significantly worse in data quality than another. A rescue pipeline was used in such cases to balance data quality across the set of replicates. Mapped reads were pooled across all replicates of a dataset, and then randomly sampled (without replacement) to generate two pseudo-replicates with equal numbers of reads. This sampling strategy tends to transfer signal from stronger replicates to weaker ones, thereby balancing cross-replicate data quality and sequencing depth. These pseudo-replicates were then processed using the IDR method in order to learn a rescue threshold. For datasets with comparable replicates (based on independent measures of data quality), the rescue and cross-replicate thresholds were found to be very similar. However, for datasets with replicates of differing data quality, the rescue thresholds were often higher than the cross-replicate thresholds, and were able to capture true peaks that showed statistically significant and visually compelling ChIP-seq signal in one replicate but not in the other. Ultimately, for each dataset, the better of the cross-replicate and rescue thresholds was used to obtain a final consolidated optimal set of peaks.
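The pseudo-replicate construction can be illustrated with the sketch below, which shuffles the pooled reads and splits them into two equal halves (equivalent to sampling without replacement). The function names are hypothetical, and `final_threshold` assumes that the preferred threshold is the larger of the two, which the text suggests but does not state outright.

```python
import random


def make_pseudo_replicates(pooled_reads, seed=0):
    """Split pooled reads at random into two equal-sized pseudo-replicates.

    If the number of reads is odd, one read is left out so that both
    pseudo-replicates have the same size.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(pooled_reads)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]


def final_threshold(cross_replicate, rescue):
    """Assumption: the preferred threshold is the larger peak-rank threshold."""
    return max(cross_replicate, rescue)


if __name__ == "__main__":
    # Integers stand in for mapped reads pooled from all replicates.
    pooled = list(range(11))
    pseudo1, pseudo2 = make_pseudo_replicates(pooled)
    print(len(pseudo1), len(pseudo2))
    print(final_threshold(cross_replicate=100, rescue=140))
```

Each pseudo-replicate would then be run through the same peak calling and IDR analysis as a true replicate pair to obtain the rescue threshold.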

All peak sets were then screened against a specially curated empirical blacklist of regions in the human genome (wgEncodeDacMapabilityConsensusExcludable.bed.gz) and peaks overlapping the blacklisted regions were discarded (Kundaje et al., 2013b). Briefly, these artifact regions typically show the following characteristics:

Unstructured and extreme artifactual high signal in sequenced input-DNA and control datasets, as well as open chromatin datasets irrespective of cell type identity.
An extreme ratio of multi-mapping to unique mapping reads from sequencing experiments.
Overlap with pathological repeat regions such as centromeric, telomeric and satellite repeats that often have few unique mappable locations interspersed in repeats.
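The blacklist screening step amounts to an interval-overlap filter. A minimal sketch follows, assuming BED-style half-open coordinates and treating any overlap of at least one base as grounds for discarding a peak; the function names and sample intervals are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def remove_blacklisted(peaks, blacklist):
    """Discard peaks that overlap any blacklisted region on the same chromosome.

    peaks, blacklist: lists of (chrom, start, end) tuples.
    """
    regions_by_chrom = {}
    for chrom, start, end in blacklist:
        regions_by_chrom.setdefault(chrom, []).append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in peaks
        if not any(overlaps(start, end, b_start, b_end)
                   for b_start, b_end in regions_by_chrom.get(chrom, []))
    ]


if __name__ == "__main__":
    # Invented intervals, for illustration only.
    peaks = [("chr1", 100, 200), ("chr1", 950, 1050), ("chr2", 950, 1050)]
    blacklist = [("chr1", 1000, 2000)]
    print(remove_blacklisted(peaks, blacklist))
```

The linear scan per peak keeps the sketch short; for genome-scale peak sets, a sorted sweep or an interval tree would be the more usual choice.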

Differences from the January 2011 freeze pipeline

The January 2011 uniform processing was performed as part of the ENCODE Integrative Analysis reported in coordinated publications in September 2012. The results from this effort are available from the ENCODE Analysis Hub at the EBI.

For the March 2012 freeze, only the SPP peak caller was used. SPP and PeakSeq were used for the January 2011 freeze.
For March 2012, an extra step was performed in the read mapping phase to remove all positional duplicates. This was done to avoid low library complexity issues. In January 2011, positional duplicates were retained.
For March 2012, an IDR threshold of 2% was used for comparing and thresholding the true replicates and the pooled pseudo-replicates. In January 2011, the IDR threshold was set to 1% for the true replicates and 0.25% for the pooled pseudo-replicates. These thresholds were determined to be too stringent.

Credits

The processed data for this track were generated by Anshul Kundaje on behalf of the ENCODE Analysis Working Group. Credits for the primary data underlying this track are included in track description pages listed in the Description section above.

Contact: Anshul Kundaje

References

ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011 Apr;9(4):e1001046. PMID: 21526222; PMCID: PMC3079585

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. PMID: 22955616; PMCID: PMC3439153

Kharchenko PV, Tolstorukov MY, Park PJ. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol. 2008 Dec;26(12):1351-9. PMID: 19029915; PMCID: PMC2597701

Kundaje A, Jung L, Kharchenko PV, Sidow A, Batzoglou S, Park PJ. Assessment of ChIP-seq data quality using strand cross-correlation analysis. (submitted), 2012a.

Kundaje A, Li Q, Brown JB, Rozowsky J, Harmanci A, Wilder SP, Batzoglou S, Dunham I, Gerstein M, Birney E, et al. Reproducibility measures for automatic threshold selection and quality control in ChIP-seq datasets. (submitted), 2012b.

Li QH, Brown JB, Huang HY, Bickel PJ. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 2011; 5(3):1752-1779.

Data Release Policy

While primary ENCODE data is subject to a restriction period as described in the ENCODE data release policy, this restriction does not apply to the integrative analysis results. The data in this track are freely available.