This project focuses on fine-tuning the Mask2Former model for semantic segmentation on the FoodSeg103 dataset. The goal is to improve the model's ability to identify and segment a wide range of food items in images. The project also covers deploying the fine-tuned model and building a user-friendly Gradio GUI for interactive inference.
See the Gradio interface in action in the GIF below. 🍴✨
```shell
git clone https://github.com/NimaVahdat/FoodSeg_mask2former.git
cd FoodSeg_mask2former
pip install -r requirements.txt
```
Configure the training parameters in the `config.yaml` file:

- `batch_size`: Number of samples per batch.
- `learning_rate`: Initial learning rate for the optimizer.
- `step_size`: Epoch interval for learning rate adjustment.
- `gamma`: Factor for learning rate decay.
- `epochs`: Total number of training epochs.
- `save_path`: Directory to save model checkpoints.
- `load_checkpoint`: Path to a pre-trained checkpoint (or `None` to train from scratch).
- `log_dir`: Directory for TensorBoard logs.

To start the training process, execute:
```shell
python -m scripts.run_training
```
This command initializes training with the parameters specified in `config.yaml` and saves the trained model checkpoints to the configured `save_path`.
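Given the parameter names listed above, a minimal `config.yaml` might look like the following sketch (the values are illustrative placeholders, not the project's defaults):

```yaml
batch_size: 8            # samples per batch
learning_rate: 0.0001    # initial optimizer learning rate
step_size: 10            # adjust the learning rate every 10 epochs
gamma: 0.1               # decay factor applied at each step
epochs: 50               # total training epochs
save_path: ./checkpoints # where model checkpoints are written
load_checkpoint: None    # or a path to a pre-trained checkpoint
log_dir: ./runs          # TensorBoard log directory
```

Training curves written to `log_dir` can then be inspected with `tensorboard --logdir ./runs`.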
Deploy the model using Gradio to create an interactive web interface that allows users to upload images and view segmentation results in real time.
```shell
python -m gradio_app.app
```
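One building block such an interface typically needs is a way to turn the model's per-pixel class predictions into an image Gradio can display. The helper below is a hypothetical sketch (not taken from the repo's `gradio_app`) that colorizes a class-ID map with a fixed pseudo-random palette:

```python
import numpy as np

def colorize_segmentation(class_map: np.ndarray, num_classes: int = 104) -> np.ndarray:
    """Map a (H, W) array of predicted class IDs to an (H, W, 3) RGB overlay.

    A fixed RNG seed gives each class a stable, deterministic color across runs.
    num_classes=104 assumes FoodSeg103's 103 food classes plus background.
    """
    rng = np.random.default_rng(seed=0)
    palette = rng.integers(0, 256, size=(num_classes, 3), dtype=np.uint8)
    return palette[class_map]  # fancy indexing: one RGB triple per pixel

# Example: a tiny 2x2 "prediction" containing two classes
pred = np.array([[0, 1], [1, 0]])
overlay = colorize_segmentation(pred)  # shape (2, 2, 3), dtype uint8
```

Returning a `uint8` RGB array is convenient because it can be passed directly to a Gradio `Image` output component.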
For questions, feedback, or contributions, please open an issue or reach out to me.