OCELOT: Overlapped Cell on Tissue Dataset for Histopathology



The dataset is now available on Zenodo! Before downloading the dataset, please make sure to carefully read and agree to the Terms and Conditions.


The OCELOT2023 challenge has been accepted as a MICCAI 2023 Challenge! See details here.


OCELOT has been accepted to CVPR 2023!


Cell detection is a fundamental task in computational pathology that can be used to extract high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, little effort has been made to reflect this behavior in cell detection models, mainly because few datasets contain both cell and tissue annotations over overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. OCELOT provides overlapping cell and tissue annotations on images acquired from multiple organs. Within this setting, we also propose multi-task learning approaches that benefit from learning the cell and tissue tasks simultaneously. Compared with a model trained only for cell detection, our proposed approaches improve cell detection performance on three datasets: the proposed OCELOT, the public TIGER, and the internal CARP datasets. On the OCELOT test set in particular, we show an improvement of up to 6.79 in F1-score. We believe the contributions of this paper, including the release of the OCELOT dataset, are a crucial starting point toward the important research direction of incorporating cell-tissue relationships in computational pathology.
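The F1-score above is computed over detected cell points. The exact matching protocol is not described on this page, so the Python sketch below assumes a common scheme: predictions are visited in descending confidence order, and each is greedily matched to the nearest unmatched ground-truth point of the same class within a fixed pixel radius. The function name `point_f1`, the tuple layouts, and the radius are illustrative assumptions, not the official OCELOT evaluation code.

```python
import math

def point_f1(pred, gt, radius):
    """Hypothetical point-based F1 with class-aware greedy matching.

    pred: list of (x, y, cls, score); gt: list of (x, y, cls).
    A prediction is a true positive if an unmatched ground-truth point of the
    same class lies within `radius` pixels of it.
    """
    matched = set()
    tp = 0
    # Higher-confidence predictions get first pick of ground-truth points.
    for px, py, pcls, _score in sorted(pred, key=lambda p: -p[3]):
        best, best_d = None, radius
        for j, (gx, gy, gcls) in enumerate(gt):
            if j in matched or gcls != pcls:
                continue
            d = math.hypot(px - gx, py - gy)
            if d <= best_d:  # keep the closest eligible point
                best, best_d = j, d
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(pred) - tp  # unmatched predictions
    fn = len(gt) - tp    # missed ground-truth cells
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

This pools all classes into one score; a per-class F1 averaged over classes is an equally plausible reading of "F1-score" and would only require filtering `pred` and `gt` by class before calling the function.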


Each sample of the OCELOT dataset is composed of six components, $$\mathcal{D} = \{\left(x_{s}, y_s^{c}, x_l, y_l^{t}, c_x, c_y\right)_{i}\}_{i=1}^{N}$$ where $x_s, x_l$ are the small and large field-of-view (FoV) patches extracted from the whole-slide image (WSI), $y_s^{c}, y_l^{t}$ are the corresponding cell and tissue annotations, respectively, and $c_x, c_y$ are the relative coordinates of the center of $x_s$ within $x_l$. The figure below visualizes a sample.
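As a minimal sketch of this structure, the Python snippet below mirrors the six components in a dataclass and converts $(c_x, c_y)$ into the pixel box that $x_s$ occupies inside $x_l$. It assumes that $c_x, c_y$ are normalized to $[0, 1]$ and that the footprint of $x_s$ in $x_l$ is set by the ratio of the two patches' resolutions in microns per pixel (`mpp_small`, `mpp_large`). These parameters, the class name `OcelotSample`, and the values in the usage line are hypothetical and not specified on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class OcelotSample:
    """Hypothetical container for one sample (x_s, y_s^c, x_l, y_l^t, c_x, c_y)."""
    x_s: list                          # small-FoV patch (cell task), rows of pixels
    y_s_c: List[Tuple[int, int, int]]  # cell points as (x, y, class) in x_s pixels
    x_l: list                          # large-FoV patch (tissue task), rows of pixels
    y_l_t: list                        # tissue segmentation mask aligned with x_l
    c_x: float                         # relative x of the x_s center in x_l, assumed in [0, 1]
    c_y: float                         # relative y of the x_s center in x_l, assumed in [0, 1]

def small_fov_box_in_large(c_x, c_y, large_size, small_size, mpp_small, mpp_large):
    """Return (x0, y0, x1, y1), the region covered by x_s, in x_l pixel coordinates.

    large_size and small_size are (width, height) in pixels; mpp_* are the
    assumed resolutions of each patch in microns per pixel.
    """
    scale = mpp_small / mpp_large  # one x_s pixel expressed in x_l pixels
    w = small_size[0] * scale      # footprint width of x_s inside x_l
    h = small_size[1] * scale      # footprint height of x_s inside x_l
    cx = c_x * large_size[0]       # absolute center of x_s inside x_l
    cy = c_y * large_size[1]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Illustrative values only: 1024 x 1024 patches at 0.2 and 0.8 microns per pixel.
print(small_fov_box_in_large(0.5, 0.5, (1024, 1024), (1024, 1024), 0.2, 0.8))
```

With these illustrative values, a centered $x_s$ maps to the box `(384.0, 384.0, 640.0, 640.0)`, i.e. a 256 x 256 region in the middle of $x_l$.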

A sample from the OCELOT dataset. Each sample consists of two input patches and their corresponding annotations. Left: the large FoV patch $x_{l}$ with tissue segmentation annotation $y_{l}^{t}$, where green denotes the cancer area. Right: the small FoV patch $x_{s}$ with cell point annotation $y_{s}^{c}$, where blue and yellow dots denote tumor and background cells, respectively. The red box indicates the size and location of $x_{s}$ within $x_{l}$.
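The red box is what links the two annotations: a cell point in $x_s$ can be mapped into $x_l$ to read off the tissue label underneath it. The Python sketch below does this under stated assumptions: $c_x, c_y$ are normalized to $[0, 1]$, and the scale between the two patches is the ratio of their resolutions, passed as hypothetical `mpp_small` and `mpp_large` parameters. The function name and the integer label values in the mask are illustrative only.

```python
import math

def tissue_label_at_cells(cells, tissue_mask, c_x, c_y, small_size, mpp_small, mpp_large):
    """Look up the tissue label under each cell point (hypothetical helper).

    cells: list of (x, y, cls) in x_s pixel coordinates.
    tissue_mask: y_l^t as a list of rows, indexed tissue_mask[row][col].
    small_size: (width, height) of x_s in pixels.
    """
    scale = mpp_small / mpp_large  # x_s pixel -> x_l pixel
    large_h, large_w = len(tissue_mask), len(tissue_mask[0])
    # Top-left corner of the x_s footprint inside x_l.
    x0 = c_x * large_w - small_size[0] * scale / 2
    y0 = c_y * large_h - small_size[1] * scale / 2
    labels = []
    for x, y, _cls in cells:
        col = math.floor(x0 + x * scale)
        row = math.floor(y0 + y * scale)
        # Clamp so points on the border of x_l stay inside the mask.
        col = min(max(col, 0), large_w - 1)
        row = min(max(row, 0), large_h - 1)
        labels.append(tissue_mask[row][col])
    return labels
```

For example, with an 8 x 8 mask whose left half is labeled 1 and right half 0, a centered 8 x 8 cell patch at half the tissue patch's resolution (`mpp_small=0.2`, `mpp_large=0.4`) maps cell `(0, 0)` to mask pixel `(2, 2)` and cell `(7, 7)` to `(5, 5)`.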

For more details, please check this page.

Cell Detection Results

In addition to releasing the dataset, we introduce simple yet effective approaches that leverage tissue annotations to improve cell detection. Below, we compare the predictions of two models: a cell-only model, denoted as “Cell-only”, and a cell model enhanced with tissue annotations, denoted as “Cell w/ Tissue”.

blue: tumor cells, yellow: background cells, green: cancer area

In general, compared with background cells, tumor cells tend to be large and irregularly shaped. However, cancer is heterogeneous, and this is not always the case. Indeed, in the figure above, most of the cells are small and have a regular, round shape. Judged on appearance alone, without the larger context, such cells are easily misclassified as background cells, which is what happens with the Cell-only model. In contrast, the Cell w/ Tissue model predicts more accurately because it correctly identifies the cancer area in the large FoV region. This suggests that the Cell w/ Tissue model considers both cell morphology and tissue context, while the Cell-only model relies on cell morphology alone. We present a few more examples below.
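To make this reasoning concrete, here is a deliberately naive late-fusion heuristic in Python: it blends a cell's tumor probability from morphology alone with the cancer probability of the tissue under it. This is not the multi-task approach used by the Cell w/ Tissue model; the function, its `weight` parameter, and the 0.5 threshold are hypothetical and only illustrate how tissue context can flip the label of a small, round tumor cell.

```python
def fuse_cell_and_tissue(p_tumor_cell, p_cancer_tissue, weight=0.5, threshold=0.5):
    """Toy late fusion of morphology-only and tissue-context evidence (hypothetical).

    p_tumor_cell: tumor probability from the cell's appearance alone.
    p_cancer_tissue: cancer probability of the tissue at the cell's location.
    weight: trust placed in tissue context (0 = morphology only, like Cell-only).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    # Convex combination of the two probabilities, then a fixed decision threshold.
    p = (1.0 - weight) * p_tumor_cell + weight * p_cancer_tissue
    return "tumor" if p >= threshold else "background"
```

With `weight=0.0` (morphology only), a cell with `p_tumor_cell=0.3` is labeled background; with `weight=0.5` and `p_cancer_tissue=0.9`, the blended score is 0.6 and the label becomes tumor.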

blue: tumor cells, yellow: background cells, green: cancer area


  author={Jeongun Ryu and Aaron Valero Puche and JaeWoong Shin and Seonwook Park and Biagio Brattoli and Jinhee Lee and Wonkyung Jung and Soo Ick Cho and Kyunghyun Paeng and Chan-Young Ock and Donggeun Yoo and Sérgio Pereira},
  title={OCELOT: Overlapped Cell on Tissue Dataset for Histopathology},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You may use any of the material in your own research, provided that you acknowledge the source by citing the title and authors of our paper.

This license is intended for non-commercial research purposes only. Any use of the material for commercial purposes is strictly prohibited.