The safety event databases clearly contain valuable data. To make these data more useful, SNCF has decided to combine semantic analysis [i.e., determining the exact meaning of a given set of words, so that matches with the same or a similar meaning can then be found, even when they are expressed in different words] with Natural Language Processing (NLP) [using computing techniques to analyse text efficiently and to determine semantic similarities during automatic, comprehensive parsing].
The resulting ‘machine learning software’ can easily be linked to any data store. Once keywords are provided, the search engine parses all the information and behaves as if the quality of the information had been improved, without actually modifying it: unstructured data thus becomes as useful as if it were structured. Safety reports could also be generated in semi-automatic mode.
It was initially designed for railway safety purposes, but it could be used in any other railway domain, and even in sectors other than railway.
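
To illustrate the idea of a keyword search that matches free-text reports by meaning rather than by exact wording, here is a minimal sketch in Python. It is not SNCF's software, whose internals are not described here. The `CONCEPTS` table, the sample `REPORTS` and the function names are all hypothetical, and a small hand-written concept table with cosine similarity stands in for whatever semantic model a real system would use.

```python
import math
import re
from collections import Counter

# Hypothetical concept table: maps different surface words to one shared label.
# A real system would rely on a trained language model or a domain ontology.
CONCEPTS = {
    "derailment": "derail", "derailed": "derail", "derails": "derail",
    "collision": "collide", "collided": "collide", "struck": "collide",
}


def to_concepts(text):
    """Lowercase the text, split it into words, and count concept labels."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS.get(word, word) for word in words)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(keywords, reports):
    """Return (score, report) pairs with a non-zero score, best match first.

    The reports themselves are only read, never modified.
    """
    query = to_concepts(keywords)
    scored = [(cosine(query, to_concepts(report)), report) for report in reports]
    matches = [pair for pair in scored if pair[0] > 0]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Invented free-text reports, for illustration only.
REPORTS = [
    "Freight wagon derailed on the siding during shunting",
    "Passenger train struck a fallen tree near the tunnel",
    "Driver reported a faulty door on arrival",
]

for score, report in search("derailment", REPORTS):
    print(f"{score:.2f}  {report}")
```

In this sketch the query "derailment" retrieves the first report even though that report only contains the word "derailed", and the stored text is left untouched, which mirrors the claim that unstructured data can be searched as if it were structured.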