Natural Language Processing

Natural Language Processing (NLP) can be broadly characterised as intelligent computer-based processing of human language, for instance to find documents about a certain topic or to extract predetermined types of information from textual data.

Recent developments in intelligent computer processing of human language have been used to improve the availability, interpretation and utility of free text. Building on these developments, the NLP workstream is developing methods for identifying relevant data embedded in the free text of clinical records, and is exploring how these data affect estimates of disease prevalence and of time of first presentation.

We are exploring how NLP can be used to enrich parts of the record data so that they are more usable for secondary researchers. We are also developing algorithms that search archived records for recognisable information which is currently hard for secondary researchers to access because it is not coded. Our tools transform this information into additional, data-derived pseudo-codes, enhancing the record for researchers to use.

The findings of this work are being integrated into the investigations of the statistical comparisons workstream, contributing to automated methods for extracting and enhancing information from the records.

For further information, see the NLP publications.

Workstream Participants:
John Carroll (workstream leader); Rob Koeling; Donia Scott; Martin Gleize; Shivani Padmanabhan

Relevant Links:
Cognitive and Language Processing Systems
Department of Informatics, University of Sussex