Artificial intelligence is set to revolutionise many aspects of cancer diagnosis and treatment – but an AI is only as good as the data used to train it. Here we find out how the OPTIMAM project’s mammogram collection is heralding a new dawn of machine-learning-based cancer detection.
Artificial intelligence (AI) is on the brink of profoundly changing cancer diagnostics. Algorithms now exist that can detect abnormalities in X-rays, PET scans and histological samples as well as an expert human can. Such software holds the potential to speed processing times, cut medical costs and reduce human errors.
And if algorithms emerge that can detect cancers earlier than these malignancies are perceptible to human observers, they could make screening efforts more effective.
Whilst the computer science behind these innovations is impressive when it comes to medical AI, getting hold of the data needed to develop such algorithms can often be a major challenge.
In 2008, Cancer Research UK funded a project – OPTIMAM – aiming to improve the accuracy and efficiency of the UK’s already very successful NHS Breast Screening Programme (NHSBSP).
Over the initial five-year grant period, the OPTIMAM team, led by Professor Kenneth C. Young, collated a library of X-ray images with which to run studies that would establish the best protocols for acquiring mammograms and to identify some of the pitfalls in their interpretation. All of this to help the performance of human radiologists.
“AI wasn’t on the cards at all,” says Professor Mark Halling-Brown, Head of Scientific Computing at the Royal Surrey NHS Foundation Trust, who now jointly manages the OPTIMAM database with Young.
As this work proceeded, however, it became increasingly clear that the future of breast cancer screening – and of diagnostics more broadly – would be transformed by AI technologies. Hence, Halling-Brown explains, when the OPTIMAM team applied for further support from CRUK in 2013, their application centred on creating a large database of mammogram images with which they could facilitate the development of such technologies.
Fast forward eight years and the OPTIMAM database is used by over 40 academic and commercial collaborators, including many of the world’s leading medical AI companies.
“What makes OPTIMAM great is the team behind it,” says Sarah Kerruish, Chief Strategy Officer of London-based Kheiron Medical Technologies. “It’s a very strong team of people who are really familiar with the nuances of what it takes to develop AI.” The OPTIMAM team has also greatly benefited from the participation of leading UK breast radiologists who ensure the accuracy of the clinical data and its interpretation.
In fact, the database has proved so successful, OPTIMAM has inspired and guided the creation of a national database of chest X-rays relating to COVID-19 and will likely shape the development of further nationwide databases.
All of which highlights that researchers and clinicians should appreciate the value of the data they generate – for while the science of AI is astounding, it works best when it’s built from well-designed data sets. Joe Day, business development executive for big data and AI at CRUK, is keen that the research community not only recognise that, but begin to think about how they might act on it too. “Many cancer researchers – and indeed those in the wider life science fields – will be collecting vast amounts of data,” he says. “But it’s often the case that teams generating the data won’t realise just how useful it could be to other research efforts, both academic and commercial. The OPTIMAM project is a perfect example of how such data can be utilised by others, and I’d urge any researchers who think their data could be useful to talk to CRUK’s Commercial Partnerships team about how we can help maximise the impact of their research data.”
Why breast screening?
Early detection of breast cancers by screening is proven to reduce overall breast cancer mortality. For example, the NHSBSP – which invites all UK-based women aged 50-70 to attend for screening every three years – is estimated to save one woman’s life for every 400 screened, cutting breast cancer deaths by 25%.
Such screening programmes represent an obvious platform for AI applications. Mammograms – ie the algorithm’s input data – have a fairly standardised format. And the initial decision a radiologist makes – ie the output an algorithm must replicate, and potentially improve upon – is a relatively straightforward binary call: does this person need to undergo further assessment?
Moreover, programmes such as the NHSBSP generate huge volumes of data – currently, mammograms from over 2 million women, attending roughly 100 local screening sites, annually. And large volumes of data are what researchers need to train, then test, machine learning systems.
Finally, to execute a large-scale screen as effectively as the NHS does demands significant time, money and resources. “Breast cancer detection is one of the hardest detection tasks,” says Kerruish, “because breast tissue has many features that are confounding to the human eye, in terms of what maybe looks like a tumour but actually isn’t. Only when you have a lot of experience as a radiologist or radiographer, would you be able to interpret that image.”
In the UK, to achieve maximal accuracy, every mammogram is assessed by two expert radiologists or specially trained clinical staff. Only if they agree is their decision actioned; if they don’t, the mammogram is sent to two further radiologists for arbitration. There is clear potential, then, for automated systems to cut costs and processing times by standing in for human radiologists.
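For readers who think in code, the double-reading rule described above can be sketched roughly as follows. This is a minimal illustration only: the names `Decision`, `double_read` and `cautious_arbitration` are hypothetical, and the arbitration policy shown is a placeholder assumption, not the NHSBSP’s actual procedure.

```python
from enum import Enum


class Decision(Enum):
    """The binary call a reader makes on one mammogram."""
    ROUTINE = "return to routine screening"
    ASSESS = "refer for further assessment"


def double_read(first, second, arbitrate):
    """Return the decision to action for one mammogram.

    first, second: Decision values from two independent readers.
    arbitrate: callable consulted only when the readers disagree
               (hypothetical stand-in for the arbitration step).
    """
    if first == second:
        # Agreement: the shared decision is actioned directly.
        return first
    # Disagreement: hand the case over to arbitration.
    return arbitrate(first, second)


def cautious_arbitration(first, second):
    # Placeholder policy, assumed for illustration only:
    # any disagreement results in a referral.
    return Decision.ASSESS


agreed = double_read(Decision.ROUTINE, Decision.ROUTINE, cautious_arbitration)
disputed = double_read(Decision.ROUTINE, Decision.ASSESS, cautious_arbitration)
```

In a scheme like this, an automated reader would take the place of one of the two independent reads, with disagreements still going to human arbitration.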
Such technology is exactly what Kheiron is developing. In the UK, Kerruish says, “it could automate up to 45% of that workflow. It’s a combination of human plus machine intelligence that we think is going to deliver the best solution for patients. It’s never about replacing radiologists; it’s about giving them better tools.”
In addition, new computational tools may aid screening programmes in other ways. For instance, New Zealand-based Volpara Health has developed software that processes mammograms to compute a volumetric breast density score. With breast density being a major risk factor for breast cancer, further Volpara algorithms will incorporate this score with other types of patient information – such as age and family history – in models to estimate an individual’s breast cancer risk.
Volpara’s breast density scoring software was pivotal to the DENSE trial – a ten-year randomised controlled study done by researchers in the Netherlands – which showed that if women with high-density breast tissue were referred for further MRI scans, breast cancer detection rates increased significantly.
“As we’ve gone along,” says CEO and founder Ralph Highnam, “we’ve realised the impact of mammography quality. If you can put good images in, you get good density scores out.”
His team are now working to develop automated methods for ensuring high quality mammogram acquisition processes and real-time methods for assessing if individual mammograms are of sufficiently high quality or whether they should be re-taken. They’re also seeking to develop better systems for tracking women’s breasts over time, in order to determine if this allows better prediction of cancer risk. And for both projects they collaborate with OPTIMAM.
From project to database
All of these types of software were merely distant pipe dreams when the OPTIMAM project was instigated in 2008 by Professor Kenneth Young, a medical physicist and Head of Research at the National Coordinating Centre for the Physics of Mammography, based at The Royal Surrey. “It was more about the fundamental physics of the image formation process,” says Halling-Brown about the original plan. “Things like image processing, image display, lighting conditions, reading conditions, different manufacturers, different detectors, how X ray dose affects the image formation and how these factors make different types of cancers appear differently – and whether these affect long term outcomes for women.”
Doing this was necessary because, despite the NHSBSP being a national programme, each screening centre has certain freedoms to decide exactly how it operates. Getting to grips with how these factors affected human radiologists’ ability to detect cancers helped refine screening methods – but it also critically informed the process of pivoting OPTIMAM from being a research project to being a database that could underpin the development of AI image analysis technologies.
To create an algorithm that can detect cancerous growths on X-rays of breast tissue, developers need mammograms for two purposes. First, their machine learning algorithms must be exposed to many, many examples of the data they are to analyse – here, a mixture of mammograms with and without cancers. Second, once developers have honed their software, it needs to be tested on a set of mammograms it’s never seen before. Its success rate – in terms of both false-positives and false-negatives – can then be compared to radiologists’ performances.
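The testing step can be made concrete with a small sketch. Assuming each held-out mammogram carries a ground-truth label (cancer or not) and the algorithm returns a yes/no flag, the two error rates mentioned above could be computed like this. The function name and the toy data are hypothetical.

```python
def error_rates(truth, predicted):
    """Compute false-negative and false-positive rates.

    truth: list of bools, True if the mammogram shows a cancer.
    predicted: list of bools, True if the algorithm flagged it.
    """
    tp = fp = tn = fn = 0
    for has_cancer, flagged in zip(truth, predicted):
        if has_cancer and flagged:
            tp += 1  # cancer correctly flagged
        elif has_cancer and not flagged:
            fn += 1  # cancer missed
        elif not has_cancer and flagged:
            fp += 1  # healthy case flagged unnecessarily
        else:
            tn += 1  # healthy case correctly passed
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("test set needs both cancer and cancer-free cases")
    return {
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Toy held-out set: two cancers, three cancer-free cases.
truth = [True, True, False, False, False]
predicted = [True, False, False, True, False]
rates = error_rates(truth, predicted)
```

The same calculation applied to radiologists’ recorded decisions on the same cases would give the human baseline to compare against.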
The database Young, Halling-Brown and team envisaged was a high-quality repository of images from multiple screening sites, with each mammogram annotated as fully as possible with clinical and technical details. The person from whom each mammogram came would be completely de-identified at the site of acquisition. And it would be a live database – new images being constantly added, including repeat mammograms for the same individual, whether from the next routine screening visit or owing to that individual developing breast cancer in the interim.
Another key feature was that OPTIMAM would make its data available to anyone who had a sound reason for using them. “We have a ‘mini-grant application’ data access process,” says Halling-Brown, with interested parties outlining their credentials and what they will do with the data. Academic collaborators pay a small administrative fee, while commercial parties pay what’s deemed a fair price – a process that’s overseen by CRUK’s legal and commercialisation team. A proportion of this income is shared with the centres which provide data.
The way OPTIMAM designed their database made them an attractive proposition to work with. The way they shared it gave OPTIMAM a unique – and important – position in the Wild West of medical data sourcing by AI companies.
An enviable dataset
At first, OPTIMAM gathered data from three centres, meaning it has always incorporated data acquired according to the nuances of those three sites. But it has been expanding and diversifying across the UK since 2019. It now collects data from five sites, with four new sites currently joining and four more invited to do so. At the last publicly disclosed count, the OPTIMAM data set contained over 3.3 million images taken from nearly 170,000 clients. Over 150,000 of those clients were cancer free, roughly 8,500 had malignant tumours, nearly 5,000 had benign growths and about 2,000 developed cancers between screens.
But it’s not just the size that makes the database so attractive. “If you look at the other data sets people are publishing on in mammography,” says Lester Litchfield, a Data and Science Manager at Volpara, “they’re often: one, not as large, and, two, quite old and quite limited in their availability of annotation.”
“The OPTIMAM data set is nicely curated and controlled, and they have a lot of the clinical data recorded along with the images,” adds Volpara’s Highnam. He also praises the fact that OPTIMAM collects the raw imaging data and not just processed images, which can vary significantly according to choices made by device manufacturers and radiologists.
For Volpara’s goal of developing technologies that track breast health over time, Litchfield stresses the value of OPTIMAM’s continuous collection of images, which lets them gather more and more evolving cases – including ones where a woman develops cancer during the time she’s been screened. “Most academic data sets are frozen snapshots in time,” he says.
Kheiron’s Kerruish also applauds the structure of the OPTIMAM dataset but emphasises too its diversity.
Diversity is essential for the ultimate effectiveness of the end product – small, homogeneous data sets can present serious problems because they can result in algorithms that won’t function across the general population.
Automated readers, Halling-Brown says, are considerably more prone to being tripped up by variations between sets of data than human readers are. “And an algorithm doesn’t necessarily know when it doesn’t know,” he says.
Consequently, he explains: “If you rely on AI vendors going out and trying to find their own data from individual hospitals, all they will do is produce an algorithm which is really good at detecting cancers in that hospital’s population.”
The challenge of generalisability affects all of AI research but in medical diagnostics it will risk lives. Systematic differences likely to bias a screening algorithm in fundamental ways can result from all those technical factors that the initial phase of the OPTIMAM project investigated – X-ray machine manufacturer, radiation dose, image processing, etc. But they can also arise from different hospitals serving different demographics.
It’s now well-recognised that ethnic groups vary in terms of breast size, density and other subtle features which influence algorithm performance. Consequently, an algorithm trained mainly on one group may perform poorly in another.
This is why Kerruish is happy that OPTIMAM includes data from different types of hardware and from the different patient populations screened by its various contributors. “Those are the characteristics we look for in any data set for machine learning,” she says.
This generalisation challenge isn’t, however, the only issue that arises from AI vendors seeking their own data.
Levelling the playing field
When AI companies acquire medical data by forging commercial relationships with individual hospitals, they typically favour deals that give them exclusive access to patient data. This can be problematic on two fronts, both of which OPTIMAM has been mindful of.
First, the public are often justifiably uncomfortable with companies gaining direct access to patient data – especially if there is a risk of that data not being securely anonymised. OPTIMAM serves as a go-between, meaning companies do not directly interact with hospitals. And vitally, OPTIMAM has worked extensively to ensure rigorous anonymisation procedures, so that mammograms are de-identified before they even leave the screening centre.
The second issue that can result from companies sourcing data concerns transparency. “In machine learning, we have a fairly severe reproducibility issue,” says Litchfield. Often, when companies or academic groups publish new methods based on proprietary data sets, he explains, it prevents others in the field from attempting to replicate their work.
“It’s a real problem for people who want to enter the market with new products,” Litchfield says, “because there’s no way to validate that your performance is better than somebody else and to show that it’s verifiably good.”
OPTIMAM’s data set being available to anyone who wants to put it to good use immediately by-passes some of these issues.
Yet the OPTIMAM team also goes one step further, often acting as a semi-independent quality controller. If companies come to them looking for training data, OPTIMAM will supply them with a subset of its database, holding back a fraction of the data that they can then supply for algorithm testing. By keeping a tight rein on the data, it ensures that end users are held to the highest community standards.
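As a rough illustration of holding data back, the sketch below splits a set of images into a training portion and a held-back test portion. It assumes, purely for illustration, that the split is made per client rather than per image, so that repeat mammograms from one individual never appear on both sides; the article does not describe how OPTIMAM actually partitions its data, and `split_by_client` is a hypothetical name.

```python
import random


def split_by_client(client_ids, held_back_fraction=0.2, seed=0):
    """Split image indices into training and held-back sets.

    client_ids: one de-identified client ID per image.
    held_back_fraction: share of clients reserved for testing.
    Returns (train_indices, test_indices).
    """
    unique_clients = sorted(set(client_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_clients)

    n_held = max(1, round(len(unique_clients) * held_back_fraction))
    held_back = set(unique_clients[:n_held])

    # Assign every image according to its client's group.
    train = [i for i, c in enumerate(client_ids) if c not in held_back]
    test = [i for i, c in enumerate(client_ids) if c in held_back]
    return train, test


# Toy example: seven images from five clients, some with repeat visits.
ids = ["a", "a", "b", "c", "c", "d", "e"]
train_idx, test_idx = split_by_client(ids, held_back_fraction=0.4)
```

Grouping by client is one common way to make sure the test images are genuinely unseen, since an algorithm that has trained on one visit from a person may find that person’s later images easier.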
In fact, Halling-Brown believes there are genuine problems with the present level of validation required for AI systems to get marketing approval. Currently, a vendor can market products that have only been shown to work on a small, company-selected data set. “And that’s just 100% not enough,” he says. “You can buy AI tools for breast screening that in independent validations are not doing very well because of these massive generalisable problems.”
Young, Halling-Brown and team are working with health authorities to explore how to create standardised validation for any product seeking to contribute to national screening programmes in the UK.
On the horizon
CRUK now solely supports the upkeep of the mature database, rather than funding its further development. The arrangement works well, says Halling-Brown, with CRUK’s legal and administrative expertise smoothly arranging new partnerships and collaborations that not only allow the data to be used for new purposes, but also help grow and diversify the dataset. This approach builds on an initial collaboration with Google Health and Imperial College London in 2017, which enabled the OPTIMAM group to collect many additional cancer-free mammogram images to help make the dataset more representative of the general population.
As OPTIMAM expands by welcoming new sites, Halling-Brown says that this process is now relatively straightforward. This is thanks to the development of a SMART box, which is installed at each new site to automate the collection and annotation of images, the de-identification of participants and the secure uploading of data to the cloud.
“That box has taken us eight years to perfect: a huge amount of knowledge has gone into that thing, and it is a very intelligent piece of software,” he says. And that know-how, he says, will be very useful in other contexts where health systems are looking to set up secure, large scale databases.
Meanwhile, Kheiron have just completed a major study of their latest breast cancer detection algorithms in which they performed as well as human readers. This follows a study by Google Health, published in January 2020, using the OPTIMAM data, that made similar claims.
Kerruish says the software – christened Mia – annotates images so that human colleagues can see what features of the breast tissue it is responding to. The aim, of course, being to increase the speed and robustness of human decisions. “We all know very well the difference in prognosis if a woman’s breast cancer is detected early versus late,” says Kerruish.
This, along with the many other exciting AI approaches being trained on the OPTIMAM dataset, illustrates just what is at stake.
“You think about it in terms of an image,” says Kerruish, “but this is a woman’s life.”
Liam Drew is a writer and journalist covering biology and medicine. In 2020, he received the Association of British Science Writers’ Award for Best Engineering and Technology Reporting.