The emergence of AI technologies for analysing and generating text has staff wondering what effect they will have on their jobs and how this technology might be assessed for its usefulness, accuracy and dangers.
The manner in which AI-NLP should be assessed has not yet emerged as a conversation between the various providers and their clients, and a set of guidelines might be useful to initiate those conversations.
The essence of AI-NLP is to identify semantic entities of interest in a target report. The technology used for this task is more broadly known as “machine learning” (ML). The ML process requires selecting an ML algorithm suited to the type of data to be analysed, in our case pathology reports, and the relevant values that need to be identified in each report for a given task.
The algorithm is trained with a set of reports (a corpus) and their respective values – this is the training corpus/set – and it produces a language model, that is, a model of the language used in pathology reports. The trained algorithm performs classification when fed an unclassified report: it finds the training report that most closely matches the new report and adopts the matching report’s values for it. As simple as the process sounds, there are many issues that affect the quality of the results and therefore the acceptability of a particular ML implementation.
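As a minimal sketch of this match-and-adopt behaviour, assuming a Python/scikit-learn setting (the report texts, label values and pipeline choices below are invented for illustration, not the classifier actually built), a one-nearest-neighbour classifier over TF-IDF vectors captures the idea:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical training corpus: report texts paired with the value to be
# identified (here a single reportability flag; real tasks label many values).
train_reports = [
    "Invasive ductal carcinoma identified in the left breast core biopsy.",
    "Benign fibroadenoma; no evidence of malignancy.",
    "Adenocarcinoma of the sigmoid colon, moderately differentiated.",
    "Chronic inflammation; no dysplasia or carcinoma seen.",
]
train_values = ["reportable", "not_reportable", "reportable", "not_reportable"]

# Vectorise the text, then classify by adopting the values of the single
# closest training report (n_neighbors=1) - the match-and-adopt step.
classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    KNeighborsClassifier(n_neighbors=1, metric="cosine"),
)
classifier.fit(train_reports, train_values)

# An unclassified report receives the values of its nearest training match.
print(classifier.predict(["Core biopsy shows invasive carcinoma of the breast."]))
```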
The major issues to consider are:
- The characterisation of the modelling task.
- The pre-processing algorithms applied to the training reports before ingestion into the ML algorithm (see the sketch following this list).
- The variables investigated in selecting the training and test corpora.
- The source of the training corpus used to create the model.
- The variables selected to assess the accuracy of the model.
- The test corpora selected to represent the variables.
- The characteristics of the variables used in the assessment of accuracy.
- The methods for improving the model for particular clients.
- The methods available for updating the model for changes in standards.
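For the pre-processing item above, a hedged sketch of the kinds of transformations that might be applied before ingestion; the specific steps, their order, and the sample text are assumptions, and a real implementation would choose and document them deliberately:

```python
import re

def preprocess(report: str) -> str:
    """Illustrative pre-processing of a pathology report before ingestion.
    These steps are assumptions for illustration, not a prescribed recipe."""
    text = report.lower()                                      # case-fold
    text = re.sub(r"\b\d{2}/\d{2}/\d{4}\b", "<date>", text)    # mask dates
    text = re.sub(r"[^a-z0-9<>\s]", " ", text)                 # drop punctuation
    text = re.sub(r"\s+", " ", text).strip()                   # normalise whitespace
    return text

print(preprocess("SPECIMEN: Left breast, core biopsy. Received 01/02/2024."))
# -> "specimen left breast core biopsy received <date>"
```

Each such choice (case-folding, masking, tokenisation) changes what the model can and cannot distinguish, which is why the pre-processing pipeline belongs on the assessment checklist.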
These issues might be resolved in different ways by different registries, and it is an open question how well a model trained for one jurisdiction might or might not be applicable in another. If disease epidemiology varies across jurisdictions, then the differences between models might well be more important than currently considered. Nevertheless, the resolution of these issues has a material effect on the scope, accuracy and relevance of a particular classifier for a given task.
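As a sketch of how such an assessment might be run, assuming a labelled test corpus from a second jurisdiction is available (the gold-standard values and model predictions below are invented), per-value precision and recall would expose where the model degrades away from its home jurisdiction:

```python
from sklearn.metrics import classification_report

# Hypothetical gold-standard values and model predictions for a test corpus
# drawn from a different jurisdiction than the training corpus.
gold = ["reportable", "not_reportable", "reportable", "reportable"]
pred = ["reportable", "not_reportable", "not_reportable", "reportable"]

# Per-value precision/recall on the foreign corpus; the same report run on a
# held-out home-jurisdiction corpus would serve as the comparison baseline.
print(classification_report(gold, pred, zero_division=0))
```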
This approach was used to build a case-identification classifier for the California Cancer Registry and is applicable to all AI-NLP developments.