Autodetection of various headers and folders
We are looking for an algorithm that allows us to automatically extract and organize the content of documents such as blueprints and plans. These are our main goals:
- Header extraction and transfer into data structures
- Portability of the algorithm onto unknown headers
Deutsche Bahn has a multitude of plans and sketches depicting our infrastructure, i.e. all of our buildings and technical installations. The goal is to digitize these plans and automate their transfer into our database.
Our collection includes a variety of document types. Deutsche Bahn wants to catalogue all existing digital plans, sketches etc. and transfer them into a joint database. To reduce manual labor during this transfer, we need a system that can automatically identify and extract the content of plans and headers. The extracted data should then be prepared for import into the database.
The current challenge is to develop an algorithm that identifies the relevant data contained in the headers and imports it into a predetermined structure. Since it is unknown how many different headers and text blocks exist, the approach must be able to handle unidentified documents and must therefore be easily extendable. In addition, the algorithm should be able to learn from unknown documents so that it can be applied to previously unidentified headers.
- Automated identification of different headers and sorting them into three stacks: known, unknown, indecipherable.
- Extraction of information from known headers and text blocks and subsequent transfer into an organized data structure.
- Portability of the algorithm onto unknown headers.
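The three tasks above can be illustrated with a minimal Python sketch. It assumes that the text of a header has already been obtained (for example by OCR) and is available as a string. The template name, field names, regular expressions and legibility thresholds are hypothetical and serve only as an illustration; real headers will look different, and a learning-based approach may replace the hand-written patterns.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HeaderTemplate:
    """Hypothetical description of one known header layout."""
    name: str
    field_patterns: Dict[str, str]  # field name -> regex with one capture group


@dataclass
class HeaderResult:
    stack: str                      # "known", "unknown" or "indecipherable"
    template: Optional[str] = None
    fields: Dict[str, str] = field(default_factory=dict)


# Illustrative template only; not taken from real DB plan headers.
TEMPLATES: List[HeaderTemplate] = [
    HeaderTemplate(
        name="drawing_header_v1",
        field_patterns={
            "drawing_no": r"Drawing No\.?\s*:\s*(\S+)",
            "scale": r"Scale\s*:\s*(1:\d+)",
            "date": r"Date\s*:\s*(\d{4}-\d{2}-\d{2})",
        },
    ),
]


def is_legible(text: str, min_chars: int = 10, min_ratio: float = 0.5) -> bool:
    """Crude check (assumed thresholds): long enough and mostly letters, digits or spaces."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False
    readable = sum(ch.isalnum() or ch.isspace() for ch in stripped)
    return readable / len(stripped) >= min_ratio


def process_header(text: str, templates: Optional[List[HeaderTemplate]] = None) -> HeaderResult:
    """Sort a header into a stack and, if it is known, extract its fields."""
    if templates is None:
        templates = TEMPLATES
    if not is_legible(text):
        return HeaderResult(stack="indecipherable")
    for template in templates:
        extracted = {}
        for name, pattern in template.field_patterns.items():
            match = re.search(pattern, text)
            if match is None:
                break  # this template does not fit; try the next one
            extracted[name] = match.group(1)
        else:
            # Every field pattern matched: treat the header as known.
            return HeaderResult("known", template.name, extracted)
    return HeaderResult(stack="unknown")


if __name__ == "__main__":
    print(process_header("Drawing No.: A-1234 Scale: 1:100 Date: 2019-03-01"))
    print(process_header("Plan title: Station building, floor 2"))
    print(process_header("#~%&/ ..,,;; ##"))
```

In this sketch, supporting a new header type only requires adding another `HeaderTemplate` to the list, which is one simple way to keep the approach extendable. Headers that land on the "unknown" stack are the natural input for deriving such new templates.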
The final evaluation of the results will cover the identification of headers/text blocks (20%), the extraction and structuring of the contained information (50%), and the portability onto other headers (30%). At the end of April, an expert jury consisting of DB representatives and external experts will announce the winners and award the prizes.
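To make the weighting concrete, the following Python sketch combines three per-criterion scores into one total. Only the weights (20%, 50%, 30%) come from the criteria above. The criterion names and the assumption that each criterion is scored between 0 and 1 are hypothetical, since the jury's actual scoring scale is not specified here.

```python
from typing import Dict

# Weights from the evaluation criteria; the key names are illustrative.
WEIGHTS: Dict[str, float] = {
    "identification": 0.20,
    "extraction": 0.50,
    "portability": 0.30,
}


def total_score(scores: Dict[str, float]) -> float:
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"identification": 1.0, "extraction": 0.5, "portability": 0.0}
    print(total_score(example))
```

Because extraction and structuring carry half of the weight, a submission that identifies every header perfectly but extracts only half of the content and does not generalize would reach less than half of the maximum total under this assumed scale.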
We will award cash prizes totalling 14,000 €:
- The winner will be awarded 8,000 €
- 2nd place will be awarded 4,000 €
- 3rd place will be awarded 2,000 €