

Univ.-Prof. Dr.-Ing. Horst-Michael Groß

Head of department

Phone +49 3677 692858



German Asphalt Pavement Distress Dataset - GAPs


The German Asphalt Pavement Distress (GAPs) dataset addresses the issue of comparability in the pavement distress domain by providing a large, standardized, high-quality dataset. This not only enables researchers to compare their work with other approaches, but also allows them to analyze algorithms on real-world road surface evaluation data.

GAPs v2

The extended version of the GAPs dataset offers several improvements over the first version:

  • More data: The GAPs v2 dataset comprises a total of 2 468 gray valued images (8 bit), partitioned into 1 417 training images, 51 validation images, 500 validation-test images, and 500 test images
  • Refined annotations: The images have been annotated manually by multiple trained operators at a high-resolution scale
  • More context: While GAPs v1 offered only patches of size 64×64 extracted within the annotated regions and the intact surface regions, GAPs v2 offers several patch sizes showing more context
  • 50k subset available: Since deep learning has benefited most from small real-world datasets, we also created a smaller subset for fast experiments

Download (GAPs v1/v2)

1) Install the Python download script using pip:

  pip install gaps-dataset

2) Send the completed form by email to acquire a login (for academic use only).

3) Download the dataset (v2) and use it in Python (note: examples for downloading patches and images of the different versions of the dataset are available in the examples folder of the Python package):

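  >>> from gaps_dataset import gaps
  >>> # download the dataset; arguments other than output_dir (e.g. the login)
  >>> # are elided here, see the examples folder of the Python package
  >>> gaps.download(...,
  ...               output_dir='desired folder')
  >>> # load the first chunk of the dataset
  >>> x_train0, y_train0 = gaps.load_chunk(chunk_id=0,
  ...                                      datadir='desired folder (same folder used in download function)')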

Caution: The download may take a while and by default the progress is not displayed. If you would like to be informed about progress, please use the option debug_outputs=True:

  >>> gaps.download(..., debug_outputs=True)

The dataset is split into chunks of 500 MB each. The size of the different datasets depends on the version and the chunk size and ranges from 1 GB to 91 GB.


GAPs v1

Eisenbach, M., Stricker, R., Seichter, D., Amende, K., Debes, K., Sesselmann, M., Ebersbach, D., Stöckert, U.,  Gross, H.-M.
How to Get Pavement Distress Detection Ready for Deep Learning? A Systematic Approach.
in:  Int. Joint Conf. on Neural Networks (IJCNN), Anchorage, USA, pp. 2039-2047, IEEE 2017

  title={How to Get Pavement Distress Detection Ready for Deep Learning? A Systematic Approach.},
  author={Eisenbach, Markus and Stricker, Ronny and Seichter, Daniel and Amende, Karl and Debes, Klaus
     and Sesselmann, Maximilian and Ebersbach, Dirk and Stoeckert, Ulrike
     and Gross, Horst-Michael},
  booktitle={International Joint Conference on Neural Networks (IJCNN)},

GAPs v2

Stricker, R., Eisenbach, M., Sesselmann, M., Debes, K., Gross, H.-M.
Improving Visual Road Condition Assessment by Extensive Experiments on the Extended GAPs Dataset.
in:  Int. Joint Conf. on Neural Networks (IJCNN), Budapest, Hungary, pp. 1-8, IEEE 2019

  title={Improving Visual Road Condition Assessment by Extensive Experiments on the Extended GAPs Dataset.},
  author={Stricker, Ronny and Eisenbach, Markus and Sesselmann, Maximilian and Debes, Klaus and Gross, Horst-Michael},
  booktitle={International Joint Conference on Neural Networks (IJCNN)},

Acquisition Process

Accurate measurement data on the current condition of roads are crucial for planning maintenance or expansion projects and for reliable cost estimation. For this reason, the German Road and Transportation Research Association (FGSV) developed a specific approach for collecting road condition data, the so-called Road Monitoring and Assessment (RMA) [1].
The RMA process standardizes data acquisition on a systematic basis and provides nationwide uniform parameters to ensure objective analyses of surface conditions as well as a high degree of quality. The key aspects are longitudinal and transversal evenness, skid resistance and surface distresses. Mobile mapping systems, equipped with high-resolution cameras and laser-based sensors, are the state of the art in the RMA context.

Certification and quality standards

The German Federal Highway Research Institute (BASt) not only participates in developing and optimizing such mobile mapping systems, but also acts as the approving authority for measurement systems and analysis processes that are deployed in the field of RMA. In Germany, providers of RMA services require an annual BASt certification to run RMA campaigns. This certification process includes static tests, such as general technical checks of the measurement platform and its components, tests of the camera system using special test images, and tests of the laser sensors using a granite test specimen. In addition, there are dynamic tests that include comparative measurements with the BASt reference measurement vehicles on a special proving ground (test against a "golden device"). Apart from the data acquisition, the BASt certification process also includes strict reviews of the data analysis procedures.

Measurement vehicle


       Mobile mapping system S.T.I.E.R of LEHMANN + PARTNER GmbH

The data were captured by the mobile mapping system S.T.I.E.R. This measuring vehicle is manufactured and operated by the German engineering company LEHMANN + PARTNER GmbH.
S.T.I.E.R was designed for large-scale pavement condition surveys and has been certified annually by BASt since 2012. It therefore complies with the high German quality standards in the field of RMA. The main components of S.T.I.E.R are an inertial navigation system, laser sensors for evenness and texture measurements, a 2D laser range finder, and different camera systems for capturing both the vehicle's environment and the pavement surface. The relevant data source for this dataset is the surface camera system. It consists of two photogrammetrically calibrated JAI Pulnix TM2030 monochrome cameras. Each one features the Kodak KAI-2093 1" progressive scan CCD imager with 7.4 µm square pixels, a frame rate of 32 fps, and a resolution of 1920x1080 pixels. The surface camera system is synchronized with a high-performance lighting unit. This allows continuous capturing of road surface images even at high velocities (approx. 80 km/h) and independent of the natural lighting situation. The cameras are mounted on the left and right at the rear of S.T.I.E.R's roof rack, pointing at a right angle towards the road.
As each camera image covers a pavement patch of 2.84 m x 1.0 m, both images combined cover the entire driven lane.

RMA-specified labeling

        Left: Labeling as expected by German FGSV-regulation.
        Right: fine labeling of different distress types using bounding boxes.

Within the scope of the conventional RMA workflow, a sequence of left and right surface camera images is stitched together in the driving direction. The result is a continuous sequence of surface images, each representing 10 meters of the entire driven traffic lane. According to the FGSV regulation, the surface damage detection and analysis process is based on these images.

For this, an inspection grid is applied to each 10-meter image (see image above). A single grid cell has a longitudinal length of 1 m and a transversal length of 1/3 of the lane width. If a grid cell contains relevant surface damage, the whole cell is assigned to this damage type. Once damage detection and classification are done, the measured raw data are used to calculate condition variables and finally condition grades ranging from "very good" to "very poor" using a weighting scheme defined by the FGSV. This conventional labeling approach is sufficient for indicating the level of safety and comfort for road users, but because it lacks precise damage locations in terms of pixel coordinates, it is not appropriate for training a classifier. Stitched images are also problematic, since artificial edges at the stitched image borders may complicate the learning process.
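The grid assignment described above can be sketched in a few lines of Python. This is only an illustration, not the FGSV procedure itself: the function name, the coordinate convention (x along the driving direction, y across the lane, both in meters), and the lane width passed in are assumptions.

```python
import math


def damaged_grid_cells(box, lane_width_m, image_length_m=10.0):
    """Return the (row, column) grid cells touched by one damage bounding box.

    box = (x_min, x_max, y_min, y_max) in meters. x runs along the driving
    direction and y across the lane (assumed convention, not from the source).
    """
    x_min, x_max, y_min, y_max = box
    cell_length = 1.0                # longitudinal cell size: 1 m
    cell_width = lane_width_m / 3.0  # transversal cell size: 1/3 of lane width
    n_rows = int(round(image_length_m / cell_length))

    def index_range(lo, hi, size, count):
        first = max(0, int(math.floor(lo / size)))
        # A box ending exactly on a cell border does not reach the next cell.
        last = min(count - 1, int(math.ceil(hi / size)) - 1)
        return range(first, last + 1)

    return {(row, col)
            for row in index_range(x_min, x_max, cell_length, n_rows)
            for col in index_range(y_min, y_max, cell_width, 3)}
```

For a lane width of 3 m, a box spanning 0.5 m to 1.5 m along the road and 0 m to 1 m across it touches the first two rows of the leftmost column. Every touched cell would be assigned to the damage type as a whole, which is why this labeling loses the pixel-level location.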

Dataset for neural network training

To provide a high-quality training dataset, we use the HD images from the left and right surface cameras instead of stitched images. The GAPs dataset includes a total of 1,969 gray-valued images (8 bit), partitioned into 1,418 training images, 51 validation images, and 500 test images. The image resolution is 1920x1080 pixels with a per-pixel resolution of 1.2 mm x 1.2 mm. The images show pavement from three different German federal roads. Images of two of these roads are used for training, and another section of one of these two roads is used for validation. Both roads are in relatively poor pavement condition. The third German federal road is used exclusively for testing. Its condition is better, so the ratio of intact to defective road surface differs significantly from that of the other two roads.
The data acquisition took place in summer 2015, so the measuring conditions were dry and warm.

The images have been annotated manually by trained operators at a high-resolution scale such that each actual damage is enclosed by a bounding box and the non-damaged area within a bounding box is smaller than 64x64 pixels. The relevant damage classes are cracks, potholes, inlaid patches, applied patches, open joints, and bleeding (see figure below). Cracks are the dominant damage class. This class comprises all sorts of cracks, such as single/multiple cracking, longitudinal/transversal cracking, alligator cracking, and sealed/filled cracks.

       Surface defect classes as defined by FGSV-regulation from left to right:
       Crack, Pothole, Inlaid patch, Applied patch, Open joint, Bleeding (not present in acquired images of GAPs dataset)

[1] Forschungsgesellschaft für Straßen- und Verkehrswesen, ZTV ZEB-StB - Zusätzliche Technische Vertragsbedingungen und Richtlinien zur Zustandserfassung und -bewertung von Straßen [FGSV-Nr. 489]. FGSV Verlag, 2006


Questions regarding the GAPs dataset:

Q1: It looks like the arrays I get for training, validation, and testing are all of size (32,000x1x64x64). Is this the actual size of the data that you are making available? The paper has 4.9M for training.

A1: The training, validation, and test sets come in several chunks of 32,000 samples (500 MB). You can select the chunk by the first parameter of gaps.load_chunk(...).
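Because the data arrive chunk by chunk, it can help to wrap the loader in a small generator so that only one chunk is held in memory at a time. The sketch below is an assumption-laden illustration: `iter_chunks`, `count_labels`, and the `load_fn` argument are hypothetical names, and `load_fn` merely stands in for a call such as gaps.load_chunk.

```python
def iter_chunks(load_fn, num_chunks):
    """Yield (chunk_id, x, y) for every chunk, loading one chunk at a time.

    load_fn is a hypothetical stand-in for a loader such as gaps.load_chunk;
    it takes a chunk id and returns a (samples, labels) pair.
    """
    for chunk_id in range(num_chunks):
        x, y = load_fn(chunk_id)
        yield chunk_id, x, y


def count_labels(load_fn, num_chunks):
    """Count how often each label occurs across all chunks."""
    counts = {}
    for _, _, labels in iter_chunks(load_fn, num_chunks):
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

With the real package, `load_fn` might be something like `lambda i: gaps.load_chunk(chunk_id=i, datadir='desired folder')`, mirroring the call shown in the download section. Any further arguments the loader needs are not covered here.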

Q2: What is the difference between x_train0 and x_train1?

A2: You need all 154 chunks for training; x_train0 and x_train1 are only the first two of 154 chunks.
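The numbers in A1 and A2 can be cross-checked with a little arithmetic. The sketch below assumes that each patch value occupies 4 bytes (for example float32); that is an inference from the stated chunk size, not something stated on this page.

```python
samples_per_chunk = 32_000
patch_values = 1 * 64 * 64  # one channel, 64x64 pixels
bytes_per_value = 4         # assumption: 4-byte values such as float32
num_train_chunks = 154

chunk_bytes = samples_per_chunk * patch_values * bytes_per_value
train_samples = num_train_chunks * samples_per_chunk

print(chunk_bytes / 2**20)  # chunk size in MiB -> 500.0
print(train_samples)        # training patches -> 4928000
```

Under that assumption one chunk is exactly 500 MiB, and 154 chunks give about 4.9 million training patches, which matches the figure quoted from the paper in Q1.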

Q3: The labels in y_train, y_test, and y_valid seem to be 0 and 1. Does this mean the 6 detailed labels (CRACK, POTHO, etc.) are not available?

A3: In this first version of the GAPs dataset, there are only two labels: intact road (label 0) and distress (label 1). Distress contains all surface defect classes like CRACK, POTHO, ... (see the paper for details).

There will be an update of the GAPs dataset soon. Version 2 will contain detailed labels that distinguish all these defect classes. Other features, such as different patch sizes, will also be available.

If you need these detailed labels right now, you might have a look at an example script in the Python package. To find out where pip installed the Python package, type "pip show gaps-dataset" in your command prompt.

Q4: It seems like you only provide the 64x64 patches. How can I download the full HD images?

A4: Please have a look at the example script in the python package.

Q5: I use the HD image dataset which is around 2000 images. But how can I use the big 90 GB dataset? What is the relation with the HD images?

A5: In the gaps-dataset package there is a file that explains how the millions of 64x64 patches are extracted from the approx. 2000 HD images. Overall, these patches have a size of 90 GB when stored as NumPy chunks.

Q6: Is there a possibility to get larger patch sizes?

A6: The current version of the GAPs dataset only provides patches of size 64x64. There will be an update of the GAPs dataset soon. Version 2 will make different patch sizes available.

If you need larger patch sizes right now, you might have a look at an example script in the Python package. The HD images and the patch positions are available. Using this information, you can extract any patch size. In this case you should ensure that the 64x64 patch is placed in the center of the larger patch. To find out where pip installed the Python package, type "pip show gaps-dataset" in your command prompt.
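As a rough sketch of the centering step, the function below computes the crop window of a larger patch around a given 64x64 patch. The function name is hypothetical, and it assumes that a patch position is the top-left pixel (x, y) of the 64x64 patch. Check the convention used by the patch positions in the package before relying on it.

```python
def centered_crop_bounds(x, y, target_size, base_size=64,
                         image_width=1920, image_height=1080):
    """Return (left, top, right, bottom) of a larger patch centered on a base patch.

    (x, y) is assumed to be the top-left pixel of the 64x64 patch.
    Returns None if the larger patch would extend beyond the image.
    """
    if target_size < base_size or (target_size - base_size) % 2:
        raise ValueError("target_size must be >= base_size and differ by an even number")
    margin = (target_size - base_size) // 2
    left, top = x - margin, y - margin
    right, bottom = left + target_size, top + target_size
    if left < 0 or top < 0 or right > image_width or bottom > image_height:
        return None
    return left, top, right, bottom
```

With an image stored as a NumPy array, the larger patch would then be `image[top:bottom, left:right]`. Patches too close to the image border are skipped here; padding would be an alternative design choice.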

Q7: I cannot download the data when using python 3.

A7: This problem is fixed. The latest version of the gaps-dataset package supports python 2 and python 3.

Q8: I cannot download the data when using Windows.

A8: Currently we only support Linux. On Windows the download fails due to erroneously calculated checksums.

Q9: We need labels for image segmentation. Where can I get this kind of annotation?

A9: F. Yang et al. re-annotated some of the images of the GAPs dataset in their paper
Yang, F., Zhang, L., Yu, S., Prokhorov, D., Mei, X., & Ling, H. (2019). Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection. arXiv preprint arXiv:1901.06340.

The paper can be found here:

The annotations can be found here:

Q10: Who uses this dataset?

A10: Currently 125 research teams are using this dataset. Please have a look at this map:

Questions regarding the implemented Convolutional Neural Network:

Q10: What loss functions did you use?

A10: We used cross entropy loss as usual for the softmax layer.
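For readers unfamiliar with the term, the following is a generic, minimal sketch of cross-entropy loss on top of a softmax output for a single sample. It illustrates the standard definition only and is not the authors' training code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy loss for one sample: -log(softmax(logits)[target])."""
    return -math.log(softmax(logits)[target])
```

For the two-class case of GAPs v1 (intact road versus distress), equal logits give a loss of log(2), and the loss approaches zero as the logit of the correct class dominates.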

Q11: For how many epochs did you train this model?

A11: The peak performance (reported in the paper) on the validation set was reached after 82 epochs. Training was stopped at approx. 200 epochs, after the performance had not improved for a long time.

Q12: What is the position of the dropout layers in the network?

A12: Dropout is applied directly before each Conv2d and Dense/Linear layer respectively.

Q13: What training algorithm did you use? Which parameters did you use?

A13: We used standard SGD with batch size = 256, learning rate = 0.01, and momentum = 0.7. A momentum of 0.9 gives similar results. Other learning rates and batch sizes decrease the performance.
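To make the role of the two hyperparameters concrete, here is a minimal sketch of one common formulation of SGD with momentum, using the learning rate and momentum quoted above. The answer does not say which momentum variant was used, so the update rule below is an assumption, and the function name is hypothetical. Batching is omitted.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.7):
    """One SGD-with-momentum update; returns the new (weights, velocity)."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy example: minimize f(w) = w^2, whose gradient is 2w.
w, v = [1.0], [0.0]
for _ in range(200):
    w, v = sgd_momentum_step(w, [2.0 * w[0]], v)
```

In a real training loop the gradients would come from the cross-entropy loss averaged over a batch of 256 patches.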