My Independent AI Research: Plant Disease Classification

  • Writer: Arnav Bansal
  • Oct 15, 2023
  • 3 min read

Updated: Oct 16, 2023

This summer, I had the opportunity to work with an Oxford graduate student on a research project that used computer vision, a branch of artificial intelligence, to classify plant diseases from images. As someone interested in applying data science and analytics to environmental science and sustainability, I found this project a fun and valuable learning experience. Here is the link for the dataset we used: https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset. The dataset comes from PlantVillage, an online platform. It contains over 50,000 images of healthy and diseased plants, grouped into 38 classes of healthy/infected plants.


Before working on the model, we had to do a lot of preprocessing so that the images could be used in an AI model. The first step was splitting our data into train, test, and validation sets (80% train, 10% test, 10% validation). The next step was flattening the images, which are our x variable. This means that each image in the dataset is converted into a single vector, which is necessary for models such as linear regression that take one feature vector per sample. The last step was converting the one-hot encoded target variables y_train, y_valid, and y_test into their corresponding class labels, which were the types of healthy/infected plants.
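As a rough illustration of these three steps (not the project's actual code), here is a minimal NumPy sketch. The function name split_flatten_decode, the random seed, and the tiny synthetic 8x8 "images" are all assumptions made for the example; only the 80/10/10 split, the flattening, and the one-hot decoding come from the description above.

```python
import numpy as np

def split_flatten_decode(images, one_hot, seed=0):
    """Split 80/10/10, flatten each image, and decode one-hot rows to class indices."""
    n = len(images)
    order = np.random.default_rng(seed).permutation(n)  # shuffle before splitting
    n_train = n * 8 // 10
    n_test = n // 10
    parts = {
        "train": order[:n_train],
        "test": order[n_train:n_train + n_test],
        "valid": order[n_train + n_test:],
    }
    out = {}
    for name, idx in parts.items():
        x = images[idx].reshape(len(idx), -1)  # (n, H, W, C) -> (n, H*W*C)
        y = one_hot[idx].argmax(axis=1)        # one-hot row -> integer class label
        out[name] = (x, y)
    return out

# Synthetic stand-in data (an assumption): 100 tiny RGB images, 38 classes.
rng = np.random.default_rng(0)
images = rng.random((100, 8, 8, 3))
labels = rng.integers(0, 38, size=100)
one_hot = np.eye(38)[labels]

splits = split_flatten_decode(images, one_hot)
x_train, y_train = splits["train"]
x_test, y_test = splits["test"]
x_valid, y_valid = splits["valid"]
print(x_train.shape, x_test.shape, x_valid.shape)
```

Each flattened row has H*W*C values, so real images at full resolution produce very wide feature vectors.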


To get an idea of how good our own AI model is relative to what already exists, we first tested two pre-existing baseline models. For the first baseline model, we used sklearn’s LinearRegression model, and for the second, we used sklearn’s MLPClassifier model.
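A baseline comparison along these lines could look like the sketch below. It is an illustration on synthetic data, not the project's code. LinearRegression is a regressor, and the post does not say how its output was turned into a class prediction, so rounding to the nearest valid label is an assumption here. The MLP hidden layer size and iteration count are also arbitrary choices for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for flattened images (an assumption): 4 classes, 48 features.
rng = np.random.default_rng(0)
n_classes = 4
y = rng.integers(0, n_classes, size=400)
x = rng.normal(size=(400, 48)) + y[:, None]  # class-dependent mean plus noise
x_train, y_train = x[:320], y[:320]
x_valid, y_valid = x[320:], y[320:]

# Baseline 1: linear regression on the integer labels.
linreg = LinearRegression().fit(x_train, y_train)
# ASSUMPTION: round the continuous output to the nearest valid class index.
linreg_pred = np.clip(np.rint(linreg.predict(x_valid)), 0, n_classes - 1).astype(int)
linreg_acc = accuracy_score(y_valid, linreg_pred)

# Baseline 2: a small multi-layer perceptron classifier.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
mlp.fit(x_train, y_train)
mlp_acc = accuracy_score(y_valid, mlp.predict(x_valid))

print(f"LinearRegression accuracy: {linreg_acc:.2f}")
print(f"MLPClassifier accuracy: {mlp_acc:.2f}")
```

Scoring both baselines on the same validation split keeps the comparison with later models fair.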





It was clear that there was a lot of room for improvement - if someone wanted to actually use this model to know what disease their plants have, they would not trust a model with less than 70% accuracy!


For our own model, we compared different pre-trained models and different parameters to determine which combination gave us the best results. The table below shows the 4 models that gave us the highest accuracy. The first two tests compare ResNet50 and ResNet18 with the Adam optimizer, where ResNet50 clearly outperformed ResNet18. This may be because the dataset is very complex, so a deeper network with more layers reaches higher accuracy. After we determined that ResNet50 was better, we tried a different optimizer (SGD) but found the accuracy to be lower and the loss to be very high. We then went back to the Adam optimizer and ran it for 50 epochs instead of 30, which gave us a final accuracy of about 86%.



To make sure that we did not overfit the model to our data, we created graphs for accuracy and loss:


Loss



Accuracy


As we ran the model for more epochs, we noticed that the gap between training and validation accuracy/loss started to increase. We chose not to run the model for more than 50 epochs because the gap would keep growing to the point where the model was overfitting our data.
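One simple way to put a number on this check is to track the per-epoch gap between training and validation accuracy and flag the first epoch where it passes a chosen threshold. The sketch below assumes that approach. The function names, the 0.05 threshold, and the accuracy values are made up for illustration and are not results from this project.

```python
def generalization_gap(train_acc, valid_acc):
    """Per-epoch difference between training and validation accuracy."""
    return [t - v for t, v in zip(train_acc, valid_acc)]

def first_epoch_over(gaps, max_gap):
    """Return the 1-based epoch where the gap first exceeds max_gap, or None."""
    for epoch, gap in enumerate(gaps, start=1):
        if gap > max_gap:
            return epoch
    return None

# Made-up accuracy curves for five epochs (illustrative only).
train_acc = [0.50, 0.70, 0.80, 0.88, 0.93]
valid_acc = [0.48, 0.67, 0.76, 0.80, 0.81]

gaps = generalization_gap(train_acc, valid_acc)
stop = first_epoch_over(gaps, max_gap=0.05)
print(stop)  # 4
```

The same idea works for loss curves with the sign flipped, since validation loss rises above training loss as a model starts to overfit.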


Overall, I learned a lot about artificial intelligence through this project, and it was great to apply my skills in Python on a project like this. It fascinated me how data analysis and artificial intelligence can play a huge role in increasing sustainability in the near future, and I hope to continue working on research projects like these in college and beyond.


Next Steps


To expand upon this project, there are a few things I would love to do soon.

  • I intend to develop a user-friendly website that allows individuals to upload images of plants directly from their camera roll. This website will leverage my model to accurately identify diseases affecting the plants based on the input image.

  • Also, I want to continue working on the model itself, as I believe that there is still room for improvement. My goal is to achieve an accuracy rate exceeding 90% while avoiding overfitting issues. This entails exploring more advanced neural network architectures to ensure the highest level of precision in plant disease identification.
