More and more developers are considering applying deep learning to real-world problems. Deep learning has skyrocketed in popularity in recent years, solving issues in various fields including medicine. In particular, neural networks can be used to classify types of skin lesions. These networks are trained on medical datasets and continually learn.
Despite constant improvements in medicine and quality of life, skin cancer remains an issue. According to statistics by the Skin Cancer Foundation, one in five Americans will develop skin cancer by age 70.
When it comes to skin cancer, time is an enemy: the earlier doctors detect it, the better. Modern magnification techniques allow specialists to conduct a detailed examination of suspicious skin lesions before proceeding to the next steps, such as biopsy, to make a precise diagnosis. Still, there’s no guarantee that a doctor’s diagnosis is accurate.
In this article, we explain how AI can enhance current diagnostic technologies, list the major challenges along the way, and approach the problem of classifying seven types of skin cancer lesions using deep learning. This article will be helpful for those considering applying deep learning to healthcare applications.
Deep learning — in the form of image classification and semantic segmentation — is being used to solve various computer vision problems. Deep learning algorithms demonstrate astonishingly accurate results (greater than 95% accuracy) when it comes to classifying cats and dogs or everyday objects like cars and chairs.
The reason for these great results is that a lot of huge datasets (Pascal VOC 2007, ImageNet, COCO) have already been designed — by compiling freely available images of pets and everyday objects — to help neural networks learn how to classify these things.
Medicine is one of the most important fields where AI can be applied. AI can solve various issues related to diagnosis and detection by:
- Processing gigabytes of images and data in a short amount of time
- Constantly training and learning to improve the accuracy of its results
- Exceeding a human’s reading capacity
- Detecting patterns and details better than the naked human eye
- Speeding up diagnosis and treatment
- Enabling precision medicine, whereby doctors use genetic changes in a patient’s tumor to determine an adequate treatment plan
- Eliminating the subjective judgment of individual physicians
Deep learning technologies are well suited to classifying various types of skin cancer because they can distinguish the specific graphic patterns of skin lesions. However, developers still need to overcome several challenges in order to create reliable applications.
To accomplish a complicated task like detecting and fighting diseases, a deep learning model needs to be trained on thousands of samples. Unfortunately, developers often don’t have enough images to properly train neural networks, so such tasks are usually performed under a severe lack of data, time, and computational power.
Lack of data — Deep learning requires large amounts of data to train neural networks and constantly improve the accuracy of their results. Unfortunately, it’s not that easy to find a ready-to-go database with thousands of precisely the images you need, especially for less prevalent diseases. And gathering data manually is often not an option, as it can take too much time. To overcome this issue, developers may use tools that slightly alter existing images in a training set in order to provide more data for training. In the best-case scenario, medical institutions will gather more images and data for future datasets.
Issues of data access and integrity — Medical data is often siloed or slightly obfuscated by healthcare providers in order to protect patient health information, guard against medical malpractice claims, and compete with other medical institutions. Another issue is the lack of infrastructure to share data between hospitals, clinics, institutions, and other healthcare establishments.
The black box problem — Learning by itself, AI often comes up with conclusions that humans have never thought of and sometimes can’t even comprehend. Complex approaches to machine learning make the machine’s automated decision-making processes inscrutable, which means that users receive results without understanding how the system arrived at them. Therefore, doctors can consider it dangerous to rely on decisions made by AI. Developers are still working on solving the black box problem.
Despite all the challenges, various medical solutions already use AI to successfully enhance organizational and treatment processes. Here are a few of them:
- Nuance assists in administrative workflow automation, providing real-time clinical documentation guidance and ensuring consistent recommendations.
- AI-assisted robotic surgery reduces inefficiencies of surgical attendants and improves outcomes. AI systems gather and process tons of data to help surgeons better understand which techniques result in better outcomes.
- Healthcare applications can gather data about heart rate, calories burned, kilometers walked, etc. An AI system can then analyze this information and suggest lifestyle improvements. This data can also be shared with doctors to provide them with more data on a patient’s habits.
Now that we’ve highlighted major benefits of AI and listed the main challenges of using deep learning in medicine, let’s move further. To create a network that can classify seven types of skin cancer, we have to prepare the environment, find a dataset, train our network, and check its performance.
To apply deep learning for skin cancer detection, we first have to prepare the environment for our network. Training a deep learning model requires significant resources: tens or even hundreds of gigabytes of disk space, lots of RAM, and a decent GPU. To meet all these requirements, we use Google Colaboratory (also known as Colab), which provides an instance of Ubuntu 18 with 300 GB of disk space and an accessible file system, 12 GB of RAM, and an Nvidia K80 GPU with 12 GB of memory that can be used for up to 12 hours continuously.
During our research, we found a way to obtain twice as much RAM as usual in Google Colaboratory. All you need to do is crash the current environment with an out-of-memory error (simply start loading your data until you run out of memory) and you’ll be offered increased RAM. Although this trick works for now, it’s likely to be disabled in the near future.
Since the Colab instance lasts up to 12 hours and training processes may take a lot of time, we need to optimize our use of the environment. There are three basic steps to optimize this task:
1) Load your data to Google Drive.
2) Mount your Google Drive to Google Colab.
Once you click the Mount Drive button, the following block of code will appear:
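The inserted cell looks roughly like the following. This is a sketch guarded with a try/except so it also runs outside of Colab (the `google.colab` module only exists inside a Colab runtime):

```python
try:
    # Available only inside a Colab runtime.
    from google.colab import drive
    # Prompts for authorization and mounts your Drive at /content/drive.
    drive.mount('/content/drive')
except ImportError:
    drive = None  # not running in Colab, so there is nothing to mount
```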
3) Copy all your data to the Colaboratory file system.
Note: Uploading your data to Colab from the local system will take a lot of time. However, you can accelerate the process of copying files to Colab from Google Drive with one command:
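In a notebook cell this is usually a single `!cp -r` command; below is a Python equivalent using `shutil`. The source and destination paths are hypothetical — adjust them to wherever your data actually lives on Drive:

```python
import shutil

# Hypothetical paths -- adjust to where your dataset folder lives on Drive.
SRC = '/content/drive/My Drive/HAM10000'
DST = '/content/HAM10000'

def copy_dataset(src=SRC, dst=DST):
    """Copy the dataset folder from the mounted Drive into the Colab filesystem."""
    return shutil.copytree(src, dst)
```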
Then import all the modules you’ll use for the task:
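The exact import list is an assumption reconstructed from the steps that follow (pandas for the metadata, scikit-learn for the split, Keras for the model):

```python
# Standard library
import os
import shutil

# Data handling
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Model building and training
from tensorflow.keras.applications import ResNet152
from tensorflow.keras.applications.resnet import preprocess_input
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
```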
Now that we’ve optimized the Colab environment, let’s move on to datasets.
Machine learning for cancer detection isn't possible without data. Unfortunately, there are very few ready-to-use datasets for training a neural network to classify skin lesions. However, we managed to find the HAM10000 dataset, which is a decent dataset that contains a large collection of multi-source dermatoscopic images of common pigmented skin lesions.
The HAM10000 dataset consists of 10,000 images of seven classes of skin cancer. As with any dataset, it may contain errors and duplicates. Therefore, we have to preprocess this data first.
After we’ve finished copying the folder with the HAM10000 dataset, we can find it in our Colab VM filesystem.
Now we can see that all images in the dataset are located in two folders, with their labels provided in .csv files. Let’s simplify things by creating the following directory structure:
To create this directory structure, we used this code:
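A minimal sketch of that directory setup — the seven class codes are the `dx` labels used in HAM10000, and the `data` base directory name is an assumption:

```python
import os

# The seven lesion classes in HAM10000 (dx codes).
CLASSES = ['akiec', 'bcc', 'bkl', 'df', 'mel', 'nv', 'vasc']

def make_dirs(base_dir='data'):
    """Create train/<class> and val/<class> folders for sorting the images."""
    for split in ('train', 'val'):
        for cls in CLASSES:
            os.makedirs(os.path.join(base_dir, split, cls), exist_ok=True)

make_dirs()
```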
Now we have train and val directories that contain images for each class of skin cancer for training and validation purposes, respectively.
Let’s take a look at the metadata. To read and work with metadata, we use the Pandas Python Data Analysis Library. We import the file with the dataset into Pandas and use the head command to display the first five records in the form of a table.
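To make the snippet self-contained, we first write a tiny stand-in for the metadata file (the column names match HAM10000_metadata.csv; the rows here are just examples):

```python
import pandas as pd

# Stand-in for HAM10000_metadata.csv so the snippet runs on its own.
csv_text = (
    'lesion_id,image_id,dx,dx_type,age,sex,localization\n'
    'HAM_0000118,ISIC_0027419,bkl,histo,80.0,male,scalp\n'
    'HAM_0000118,ISIC_0025030,bkl,histo,80.0,male,scalp\n'
    'HAM_0002730,ISIC_0026769,mel,histo,75.0,male,ear\n'
)
with open('metadata_sample.csv', 'w') as f:
    f.write(csv_text)

# Read the metadata into pandas and display the first records as a table.
df = pd.read_csv('metadata_sample.csv')
print(df.head())
```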
In the table above, the dx column corresponds to the class of the cancer. To train the neural network and make sure that it’s learning, we need to split our training data into training and validation subsets. The training portion of the data will only be used for training, while the validation set will be used to validate that the network is learning by seeing if it can accurately classify skin cancer types when processing previously unseen data.
The typical ratio for such a split is 80% of data for training and 20% for validation. To split our data into these sets, we use the train_test_split function from scikit-learn; its stratify parameter helps us keep the same distribution of image classes in both sets, so we’ll have an 80/20 proportion for all classes:
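A runnable sketch of the split — the small stand-in DataFrame here replaces the real metadata table, and the class proportions are invented for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in metadata table; in the real setup df comes from the metadata file
# and dx holds each image's lesion class.
df = pd.DataFrame({
    'image_id': [f'img_{i}' for i in range(100)],
    'dx': ['nv'] * 60 + ['mel'] * 25 + ['bkl'] * 15,
})

# 80/20 split; stratify=df['dx'] keeps the class proportions identical
# in both the training and validation parts.
df_train, df_val = train_test_split(
    df, test_size=0.2, stratify=df['dx'], random_state=42)

print(len(df_train), len(df_val))  # 80 20
```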
Using the df_train and df_val values, we can sort our images into the train and val folders we created earlier:
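One way to sketch that sorting step — the flat source folder with `<image_id>.jpg` files is an assumption about how the raw dataset is laid out:

```python
import os
import shutil

def sort_images(df, src_dir, dst_dir):
    """Copy each image into dst_dir/<class>/ according to its dx label.

    Assumes the raw images sit in one flat folder, named <image_id>.jpg.
    """
    for _, row in df.iterrows():
        cls_dir = os.path.join(dst_dir, row['dx'])
        os.makedirs(cls_dir, exist_ok=True)
        shutil.copy(os.path.join(src_dir, row['image_id'] + '.jpg'), cls_dir)
```

Called once with `df_train` and the train folder and once with `df_val` and the val folder, this reproduces the directory structure described above.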
Now that we’ve sorted two new datasets with images into these folders, let’s check how many images we have for each class in each folder:
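A small helper for that check — it simply counts the files in each class subfolder of a split directory:

```python
import os

def count_images(split_dir):
    """Return {class_name: number_of_images} for one split directory."""
    return {cls: len(os.listdir(os.path.join(split_dir, cls)))
            for cls in sorted(os.listdir(split_dir))}
```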
We can see that train_test_split works well. However, the dataset is unbalanced, as the number of images per class isn’t equal.
In order to classify every class with the same level of accuracy, we need to have an approximately equal number of images for all classes. To ensure this, we use data augmentation techniques. Augmentation is the process of slightly changing images in order to get more training data. For this purpose, we use ImageDataGenerator from the Keras framework, which provides a high- and low-level API for constructing neural networks:
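A sketch of such a generator — the specific flip, shift, and zoom parameters are assumptions, and the random image stands in for a real lesion photo so the snippet runs on its own:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings are assumptions: random flips, small shifts, and zoom.
datagen = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    rotation_range=20,
    fill_mode='nearest',
)

# Generate several slightly altered copies of one image (random stand-in here)
# to enlarge an underrepresented class.
image = np.random.rand(1, 224, 224, 3)
flow = datagen.flow(image, batch_size=1)
augmented = [next(flow)[0] for _ in range(5)]  # five augmented copies
```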
After augmentation, the images are distributed much more uniformly among all seven classes.
This particular generator will randomly adjust an image using flip, shift, and zoom techniques according to the parameters we set. In this way, we obtain a sufficient number of images for every class in order to start modeling and training.
Looking at a sample of the augmented data, we can see that we’ve acquired more images with only slight differences between them.
Now the data is ready to be used for deep learning to detect skin cancer, and we can proceed to modeling and training.
As a result of our previous work, we obtained 38,000 images for training. Although we can’t load all of them in memory at once (because of memory limitations), we still want to train the network on all of them. To overcome this issue, let’s use the ImageDataGenerator once again:
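A self-contained sketch of that generator setup. The stand-in directories and tiny blank images below exist only so the snippet runs on its own; in the real setup, train_path and valid_path point at the train and val folders prepared earlier:

```python
import os
from PIL import Image
from tensorflow.keras.applications.resnet import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in directories with a couple of tiny images per class so the
# snippet is runnable; replace with the real train/val folders.
train_path, valid_path = 'demo/train', 'demo/val'
for path in (train_path, valid_path):
    for cls in ('mel', 'nv'):
        os.makedirs(os.path.join(path, cls), exist_ok=True)
        for i in range(2):
            Image.new('RGB', (224, 224)).save(os.path.join(path, cls, f'{i}.jpg'))

# preprocess_input transforms images to the format ResNet expects;
# the generator loads batches from disk instead of holding all images in RAM.
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_batches = datagen.flow_from_directory(
    train_path, target_size=(224, 224), batch_size=2)
valid_batches = datagen.flow_from_directory(
    valid_path, target_size=(224, 224), batch_size=2)
```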
In this way, the ImageDataGenerator will apply the preprocess_input function to transform images to the format that ResNet152 uses as an input. Then the training data will be generated/loaded from the train_path/valid_path directories.
Sure, the process of loading/unloading images to/from memory brings additional overhead. However, we have no other choice, as our testing environment with 25GB of RAM can’t load all of the images at once.
To solve image classification issues fast and with minimum code, we apply transfer learning techniques. Transfer learning means using a model previously trained for a similar task as a starting point. For our task, we chose the ResNet152 architecture, which was pretrained on the ImageNet dataset. We replaced the top layers with two fully connected layers of our own. ImageNet is a database that consists of millions of images of 1000 classes and is used for developing and testing neural network architectures.
Using Keras, we can load pretrained ImageNet models with only one line of code:
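In sketch form it looks like this. The article's actual call uses `weights='imagenet'`; we pass `weights=None` in the usage line below only to avoid the roughly 230 MB weight download, and `pooling='avg'` is an assumption:

```python
from tensorflow.keras.applications import ResNet152

def build_base(weights=None):
    # include_top=False drops ImageNet's 1000-way classifier head;
    # pass weights='imagenet' to load the pretrained weights.
    return ResNet152(weights=weights, include_top=False,
                     input_shape=(224, 224, 3), pooling='avg')

base_model = build_base()  # use build_base('imagenet') in practice
```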
The parameters above specify that we want to use the ResNet model that’s pretrained on the ImageNet dataset. We set include_top=False to exclude the top layers (the classifier), since we won’t use them: our task is to classify seven types of skin cancer, not the 1000 classes contained in ImageNet. To classify these types of skin cancer, we add two fully connected (Dense) layers that work for our dataset:
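A sketch of the full model — the 512-unit hidden size is an assumption, and `weights=None` again only avoids the weight download (use `'imagenet'` in practice):

```python
from tensorflow.keras.applications import ResNet152
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# weights=None only to skip the download; use weights='imagenet' in practice.
base_model = ResNet152(weights=None, include_top=False,
                       input_shape=(224, 224, 3), pooling='avg')

# Two Dense layers on top of the pooled backbone features;
# the 512-unit hidden size is an assumption for illustration.
x = Dense(512, activation='relu')(base_model.output)
outputs = Dense(7, activation='softmax')(x)  # one unit per lesion class
model = Model(inputs=base_model.input, outputs=outputs)
```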
The code above is where all experiments happen for modifying the neural network. Improving the results and fighting overfitting (fine-tuning) are usually done with the help of various techniques, layers, and configurations thereof. The topic of fine-tuning is huge and deserves a separate article; therefore, we won’t cover it here.
Since the training process consumes too much time and hardware resources, we have to optimize our time. When solving transfer learning issues, the first question to answer is how many layers we need to train. When it comes to neural networks for image processing, it’s known that the bottom layers of the network learn basic features like colors. The next couple of layers learn lines and curves, the following layers learn shapes, and the last layers learn classes. This is why we don’t always need to retrain the network from scratch.
The more similar your problem is to the problem the pretrained network was originally designed to solve, the fewer layers you’ll have to train. We start with training newly added layers first. To do this, we set all layers of the base_model as non-trainable and recompile the model:
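The freeze-and-recompile step can be sketched as follows. A tiny stand-in backbone replaces ResNet152 here to keep the snippet light, and the compile settings are assumptions:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Dense, GlobalAveragePooling2D

# Tiny stand-in for the ResNet152 backbone, so the sketch runs quickly.
inp = Input(shape=(224, 224, 3))
x = Conv2D(8, 3)(inp)
x = GlobalAveragePooling2D()(x)
base_model = Model(inp, x)

# The two newly added Dense layers from the previous step.
out = Dense(7, activation='softmax')(Dense(512, activation='relu')(base_model.output))
model = Model(base_model.input, out)

# Freeze every backbone layer so only the new Dense layers are trained.
for layer in base_model.layers:
    layer.trainable = False

# Recompile so the trainable flags take effect (settings are assumptions).
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```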
In order to train the network in the most efficient way, we stop training when the network starts to overfit, since further training will not improve the results for our current configuration. To detect overfitting, Keras provides standard callbacks as well as interfaces for implementing your own. For our present task, we use the standard callbacks:
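The callback names match the ones described below; the monitored metrics and patience values are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Two early-stopping callbacks watching different metrics (patience values
# are assumptions), plus a checkpoint that keeps the best weights on disk.
es_callback1 = EarlyStopping(monitor='val_loss', patience=3, verbose=1)
es_callback2 = EarlyStopping(monitor='val_accuracy', patience=5, verbose=1)
checkpoint = ModelCheckpoint('best_model.keras', monitor='val_accuracy',
                             save_best_only=True, verbose=1)
```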
We’ve used three callbacks — es_callback1, es_callback2, and checkpoint. The first two stop training if the model overfits, while the third is the most important: it saves the best model, so we always have a file with the best weights. Now we can start training:
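A runnable sketch of the training call. The tiny model and synthetic data below are stand-ins so the snippet executes on its own; the real call passes the ResNet-based model and the train/validation generators, with many more epochs:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import Dense

# Stand-in model and synthetic data; in the real setup this is the
# ResNet-based network fed by the directory generators.
model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                    Dense(7, activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

x = np.random.rand(64, 8).astype('float32')
y = np.eye(7)[np.random.randint(0, 7, 64)].astype('float32')

callbacks = [EarlyStopping(monitor='val_loss', patience=3),
             ModelCheckpoint('best_model.keras', save_best_only=True)]

# Two epochs only so the sketch finishes quickly.
history = model.fit(x, y, validation_split=0.2, epochs=2,
                    batch_size=16, callbacks=callbacks, verbose=0)
```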
When training the top layers, we overfit pretty early — on the 4th epoch — and achieved an accuracy of 81%, which is not bad for a few layers.
Let's try to train more layers and see what we get. To choose how many layers to train, we studied the ResNet152 architecture and decided to unfreeze the whole 5th convolution block while keeping the layers before it frozen:
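A sketch of that step — `weights=None` only avoids the weight download, and the index 483 is the block boundary reported in the article:

```python
from tensorflow.keras.applications import ResNet152

# weights=None to skip the download; use weights='imagenet' in practice.
base_model = ResNet152(weights=None, include_top=False,
                       input_shape=(224, 224, 3))

print(len(base_model.layers))  # total layer count of this build

# Freeze everything before the 5th convolution block (index 483 per the
# article) and leave the last block trainable.
for layer in base_model.layers[:483]:
    layer.trainable = False
for layer in base_model.layers[483:]:
    layer.trainable = True
```

After changing the trainable flags, the model has to be recompiled before training again.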
With this command, we can see that ResNet152 has 514 layers and find that the last block starts from 483.
Actually, these are the first steps of fine-tuning the model in order to achieve better results. In this article, we don’t cover all the steps. But we show this one to demonstrate how much influence such steps may have:
On the 6th epoch, we achieved a 3% increase in accuracy simply by changing the number of trainable layers. We can also see that the checkpoint callback worked and saved the best models we obtained, so our work is not lost.
Accuracy can be improved even further in many ways: by adding more data, cleaning up data, changing the model architecture, configuring dropout layers, applying l1 and l2 regularization, changing the learning rate, changing the optimizer, and so on. In our next article, we will approach the problem of fine-tuning in order to achieve the best result we can without more data.
Applying deep learning for cancer diagnosis is only one of the numerous ways to use AI for solving medical issues. In this article, we’ve shown an approach to create a network for classifying seven types of skin cancer. Along the way, we overcame a lack of data by slightly changing our images using ImageDataGenerator.
The opportunity to train a network to classify skin cancer is not only exciting but also extremely promising. It shows that medicine can significantly evolve as deep learning technology is improved and optimized for medical purposes.
At Apriorit, we have a team of dedicated developers seasoned in AI, machine learning, and deep learning who are ready to help you improve your project or develop a new one from scratch.