Our goals

We are looking for an algorithm that will allow us to automatically extract and organize documents such as blueprints and plans. These are our main goals:

  • Automatic detection of various headers and folders
  • Header extraction and transfer into data structures
  • Portability of the algorithm to unknown headers
Facts

Deutsche Bahn has a multitude of plans and sketches depicting our infrastructure, i.e. all of our buildings and technical installations. The goal is to digitize these plans and to automate their transfer into our database.

Requirements

Our collection includes a variety of document types. Deutsche Bahn wants to catalogue all existing digital plans, sketches, etc. and transfer them into a joint database. To reduce manual labor during this transfer, we need a system that can automatically identify and extract the content of plans and headers. The collected data should then be prepared for import into the database.

The current challenge is to develop an algorithm that identifies the relevant data contained in the headers and imports it into a predetermined structure. Because it is unknown how many different headers and text blocks exist, the approach must be able to handle documents it has not seen before, so the algorithm must be easily extendable. In addition, the algorithm should be able to learn from unknown documents so that it can later be applied to previously unidentified headers.
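One way to read the "easily extendable" requirement is as a registry to which parsers for new header types can be added without touching existing code. The following is a minimal sketch under that assumption; `HeaderRegistry`, `parse_key_value_lines`, and the "Label: value" header layout are hypothetical and not taken from the challenge material.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical type: a parser turns raw header text into field/value pairs.
HeaderParser = Callable[[str], Dict[str, str]]


@dataclass
class HeaderRegistry:
    """Maps a header-type name to its parser; new types can be added at runtime."""
    parsers: Dict[str, HeaderParser] = field(default_factory=dict)

    def register(self, header_type: str, parser: HeaderParser) -> None:
        self.parsers[header_type] = parser

    def parse(self, header_type: str, text: str) -> Optional[Dict[str, str]]:
        # Returns None when no parser is registered for this header type.
        parser = self.parsers.get(header_type)
        return parser(text) if parser is not None else None


def parse_key_value_lines(text: str) -> Dict[str, str]:
    """Illustrative parser for headers laid out as 'Label: value' lines (an assumption)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)  # split on the first colon only
            fields[key.strip()] = value.strip()
    return fields


registry = HeaderRegistry()
registry.register("key_value", parse_key_value_lines)
```

Supporting a newly discovered header layout then only requires one more `register` call with a parser for that layout.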

Tasks
  1. Automated identification of different headers and sorting of them into three stacks: known, unknown, indecipherable.
  2. Extraction of information from known headers and text blocks and subsequent transfer into an organized data structure.
  3. Portability of the algorithm to unknown headers.
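The sorting step in task 1 could be sketched as follows. This is only an illustration: the OCR confidence value, both thresholds, the `TEMPLATES` table, and the label-overlap heuristic are assumptions made for the example, not part of the challenge specification.

```python
from enum import Enum
from typing import Dict, Set


class Stack(Enum):
    KNOWN = "known"
    UNKNOWN = "unknown"
    INDECIPHERABLE = "indecipherable"


# Hypothetical templates: header type -> labels expected in that header.
TEMPLATES: Dict[str, Set[str]] = {
    "building_plan": {"title", "scale", "date", "drawing no"},
}


def triage(ocr_text: str, ocr_confidence: float,
           min_confidence: float = 0.5, min_overlap: float = 0.75) -> Stack:
    """Sort one header into a stack based on its recognized text."""
    # Unreadable scans go to the indecipherable stack.
    if ocr_confidence < min_confidence or not ocr_text.strip():
        return Stack.INDECIPHERABLE
    # Collect the labels found in 'Label: value' lines.
    labels = {line.split(":", 1)[0].strip().lower()
              for line in ocr_text.splitlines() if ":" in line}
    # A header is "known" if enough of a template's labels are present.
    for expected in TEMPLATES.values():
        if len(labels & expected) / len(expected) >= min_overlap:
            return Stack.KNOWN
    return Stack.UNKNOWN
```

Headers that land on the unknown stack are the natural input for task 3, since they are readable but match no existing template.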

The final evaluation will weigh the identification of headers/text blocks (20%), the extraction and structuring of the contained information (50%), and the portability to other headers (30%). At the end of April, an expert jury consisting of DB representatives and external experts will announce the winners and award the prizes.
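The weighting above can be written as a simple weighted sum. How the jury measures each criterion is not stated, so this sketch assumes every sub-score is already normalized to the range 0 to 1; the names `WEIGHTS` and `overall_score` are illustrative.

```python
from typing import Dict

# Weights as stated in the evaluation criteria.
WEIGHTS: Dict[str, float] = {
    "identification": 0.20,
    "extraction": 0.50,
    "portability": 0.30,
}


def overall_score(sub_scores: Dict[str, float]) -> float:
    """Combine per-criterion scores (assumed to lie in [0, 1]) into one score."""
    missing = set(WEIGHTS) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
```

Because extraction carries half of the total weight, a submission that identifies headers perfectly but extracts nothing cannot score above 0.5 under this reading.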

Prizes

We will award cash prizes totalling 14,000 €:

  • The winner will be awarded 8,000 €
  • 2nd place will be awarded 4,000 €
  • 3rd place will be awarded 2,000 €

Timeline

  • Application start: 27.02.2016
  • Provision of training data: 13.03.2016
  • Presentation: 17.03.2016
  • End of training phase: 24.04.2016
  • Selection of winners and award ceremony: 27.04.2016
