Here we outline the launch of a new crowdsourcing Challenge that seeks a tool to aid data extraction from textual material, with a focus on Named Entity Recognition or similar approaches, in the context of literature on plant health and animal health.
About the project
EFSA is exploring innovative ways of increasing risk assessment capacity without compromising quality, such as via crowdsourcing and cognitive computing (i.e. self-learning systems that use data mining, pattern recognition and natural language processing to mimic the way the human brain works).
In 2016, EFSA commissioned ADAS to carry out a project titled “OC/EFSA/AMU/2015/03: Crowdsourcing: engaging communities effectively in food and feed risk assessment”. ADAS is exploring the risks and opportunities of using crowdsourcing within EFSA to assess how the methodology could add value to the organisation. You can find out more about the project in a previous article here.
As part of the ongoing project, three Challenges have been launched on the InnoCentive crowdsourcing platform to assess the feasibility and effectiveness of running competitive Challenges to find solutions for real-life problems in science.
- The first Challenge launched in 2017 and sought ideas about how to visualise scientific uncertainty and the limitations in available knowledge. This Ideation Challenge attracted many proposed ideas for possible solutions, with three participants (based in Australia and the US) sharing the prize fund of US $5,000.
- The second Challenge launched in 2018 and sought a general algorithm for automated data extraction from text, graphs, and images in electronic documents. Despite relatively good engagement from Solvers, no solutions were submitted in response to the Challenge request. The primary outcome of that Challenge for EFSA was the realisation that general automated data extraction is an extremely difficult task, far beyond the scope of a single Challenge.
- The third Challenge launched on 4 May 2019 and looks to obtain a tool or algorithm to aid data extraction from textual material, within the context of EFSA risk assessment activities (specifically animal health and plant health). A prize fund of US $60,000 is being made available to the winning Solver(s).
EFSA Challenge: Automated Named Entity Recognition
Data extraction from text is a fundamental human action and is key to gathering information in any field. With the exponential increase in material being published on any given subject, it is becoming increasingly hard for humans to read and extract data from every relevant source of information. Automating data extraction would save a tremendous amount of human resources and could result in more accurate and more extensive extraction.
Data extraction relies on the proper identification of objects and concepts in documents. A very important sub-task is therefore to find and classify names in text, also known as Named Entity Recognition (NER). A number of tools to perform NER already exist (e.g. Neji, SIA, BeCAS and TaggerOne). However, most of these tools rely on supervised machine learning, so their performance is limited by the training set, which is usually built for a specific domain. These domains generally do not overlap with EFSA’s areas of interest or responsibility.
EFSA provides scientific advice in several areas of responsibility, including food and feed safety, nutrition, animal health and welfare, and plant protection and health. Despite the existence of several tools in the biomedical domain, none is able to identify and classify the specific entities of interest to EFSA, such as plant pests, plant species, pathogenic organisms and their target animal species.
This Challenge looks to identify a solution to this problem through the design of a tool to perform NER for a specific predefined set of bioconcepts related to animal health and plant health.
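To make the task concrete, the following is a minimal sketch of what tagging such bioconcepts in a sentence could look like. It uses a simple dictionary lookup rather than the machine-learning approaches mentioned above, and it is not the Challenge's specification: the lexicon entries, the label names (PLANT_PEST, PLANT_SPECIES, PATHOGEN, ANIMAL_SPECIES) and the function name tag_entities are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon mapping entity names to bioconcept labels.
# A real system would need curated vocabularies or a trained model.
LEXICON = {
    "Xylella fastidiosa": "PLANT_PEST",
    "olive": "PLANT_SPECIES",
    "African swine fever virus": "PATHOGEN",
    "pigs": "ANIMAL_SPECIES",
}


def tag_entities(text):
    """Return (start, end, surface form, label) for every lexicon match."""
    # Put longer terms first so multi-word names win over shorter overlaps.
    terms = sorted(LEXICON, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Map lower-cased surface forms back to the lexicon's canonical keys.
    canonical = {t.lower(): t for t in LEXICON}
    results = []
    for match in pattern.finditer(text):
        label = LEXICON[canonical[match.group(0).lower()]]
        results.append((match.start(), match.end(), match.group(0), label))
    return results


sentence = (
    "Xylella fastidiosa infects olive trees, "
    "while African swine fever virus affects pigs."
)
for start, end, surface, label in tag_entities(sentence):
    print(start, end, surface, label)
```

A lookup like this only finds names it already knows, which illustrates why existing tools trained on other domains fall short: recognising unseen pests, hosts and pathogens requires vocabularies or training data built for plant health and animal health.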
Interested in finding out more or submitting a solution?
ADAS is running the current Challenge on the InnoCentive platform on behalf of EFSA as part of this project. The deadline for all submissions is 2 September 2019 at 23:59 EDT.
For more information on any of the above, or to understand more about the risks and opportunities of crowdsourcing for your business, please contact email@example.com.