Kaggle: Leaf Classification (with Jupyter)

This post walks through the Kaggle leaf classification competition in a Jupyter notebook: exploring the labels, building a custom PyTorch Dataset, fine-tuning a pretrained ResNet-34, and writing a submission file. If you spot a mistake or something I missed, corrections are welcome.

Competition page: https://www.kaggle.com/c/classify-leaves

# First, import the packages we need
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import os
import matplotlib.pyplot as plt
import torchvision.models as models
# This is for the progress bar.
from tqdm import tqdm
import seaborn as sns
# Take a look at the label file
labels_dataframe = pd.read_csv('../input/classify-leaves/train.csv')
labels_dataframe.head(5)


labels_dataframe.describe()


# Helper to annotate each bar in a horizontal bar plot with its value.
def barw(ax):
    for p in ax.patches:
        val = p.get_width()                  # bar length = class count
        x = p.get_x() + p.get_width()        # x position: end of the bar
        y = p.get_y() + p.get_height() / 2   # y position: middle of the bar
        ax.annotate(round(val, 2), (x, y))

# Plot the class distribution, most frequent classes first.
plt.figure(figsize=(15, 30))
ax0 = sns.countplot(y=labels_dataframe['label'], order=labels_dataframe['label'].value_counts().index)
barw(ax0)
plt.show()


# Collect the sorted list of unique labels
leaves_labels = sorted(list(set(labels_dataframe['label'])))
n_classes = len(leaves_labels)
print(n_classes)
leaves_labels[:10]


# Map each label string to a numeric index
class_to_num = dict(zip(leaves_labels, range(n_classes)))
class_to_num


# And the inverse mapping, used to decode predictions at the end
num_to_class = {v: k for k, v in class_to_num.items()}
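
As a quick sanity check (my addition, not in the original), the two mappings should be inverses:

# Hypothetical check: label -> number -> label is the identity.
assert all(num_to_class[class_to_num[lbl]] == lbl for lbl in leaves_labels)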
# Subclass PyTorch's Dataset to build our own dataset class
class LeavesData(Dataset):
    def __init__(self, csv_path, file_path, mode='train', valid_ratio=0.2, resize_height=256, resize_width=256):
        """
        Args:
            csv_path (string): path to the csv file
            file_path (string): directory containing the image files
            mode (string): 'train', 'valid' or 'test'
            valid_ratio (float): fraction of the data used for validation
        """
        
        # Target size after resizing (the raw images come in different sizes).
        # Note: stored here, but the transforms below resize to 224x224 directly.
        self.resize_height = resize_height
        self.resize_width = resize_width

        self.file_path = file_path
        self.mode = mode

        # Read the csv with pandas. header=None keeps the header row as data
        # row 0, which is why all indexing below starts from row 1.
        self.data_info = pd.read_csv(csv_path, header=None)
        # Number of samples, excluding the header row.
        self.data_len = len(self.data_info.index) - 1
        self.train_len = int(self.data_len * (1 - valid_ratio))
        
        if mode == 'train':
            # Column 0 holds the image file names; take the first train_len
            # data rows (rows 1 .. train_len, since row 0 is the header).
            self.train_image = np.asarray(self.data_info.iloc[1:self.train_len + 1, 0])
            # Column 1 holds the label.
            self.train_label = np.asarray(self.data_info.iloc[1:self.train_len + 1, 1])
            self.image_arr = self.train_image
            self.label_arr = self.train_label
        elif mode == 'valid':
            # The remaining rows form the validation split.
            self.valid_image = np.asarray(self.data_info.iloc[self.train_len + 1:, 0])
            self.valid_label = np.asarray(self.data_info.iloc[self.train_len + 1:, 1])
            self.image_arr = self.valid_image
            self.label_arr = self.valid_label
        elif mode == 'test':
            self.test_image = np.asarray(self.data_info.iloc[1:, 0])
            self.image_arr = self.test_image
            
        self.real_len = len(self.image_arr)

        print('Finished reading the {} set of Leaves Dataset ({} samples found)'
              .format(mode, self.real_len))

    def __getitem__(self, index):
        # Look up the file name for this index.
        single_image_name = self.image_arr[index]

        # Open the image; convert to RGB so Normalize always sees 3 channels.
        img_as_img = Image.open(self.file_path + single_image_name).convert('RGB')

        # To work with grayscale images instead, use the two lines below:
#         if img_as_img.mode != 'L':
#             img_as_img = img_as_img.convert('L')

        # Build the transform pipeline; normalization and any other ops go here.
        if self.mode == 'train':
            transform = transforms.Compose([
                transforms.Resize((224, 224)),
                transforms.RandomHorizontalFlip(p=0.5),   # random horizontal flip with probability 0.5
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet statistics
            ])
        else:
            # No augmentation for validation and test.
            transform = transforms.Compose([
                transforms.Resize((224, 224)),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
            ])
        img_as_img = transform(img_as_img)
        
        if self.mode == 'test':
            return img_as_img
        else:
            # Look up the string label and convert it to its numeric index.
            label = self.label_arr[index]
            number_label = class_to_num[label]

            return img_as_img, number_label  # image tensor and numeric label for this index

    def __len__(self):
        return self.real_len
train_path = '../input/classify-leaves/train.csv'
test_path = '../input/classify-leaves/test.csv'
# The csv already stores paths like 'images/xxx.jpg', so point at the parent directory
img_path = '../input/classify-leaves/'

train_dataset = LeavesData(train_path, img_path, mode='train')
val_dataset = LeavesData(train_path, img_path, mode='valid')
test_dataset = LeavesData(test_path, img_path, mode='test')
# The constructors above already report sample counts; also check the lengths.
print(len(train_dataset), len(val_dataset), len(test_dataset))
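
A quick shape check (not in the original) on a single training sample:

# Hypothetical check: each sample should be a (3, 224, 224) tensor plus an integer label.
img0, lbl0 = train_dataset[0]
print(img0.shape, lbl0, num_to_class[lbl0])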


# Define the data loaders
train_loader = torch.utils.data.DataLoader(
        dataset=train_dataset,
        batch_size=64, 
        shuffle=True,
        num_workers=4
    )

val_loader = torch.utils.data.DataLoader(
        dataset=val_dataset,
        batch_size=64,
        shuffle=False,   # no need to shuffle the validation set
        num_workers=4
    )
test_loader = torch.utils.data.DataLoader(
        dataset=test_dataset,
        batch_size=64, 
        shuffle=False,
        num_workers=4
    )
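
As another quick check (not in the original), we can pull one batch and display a few images after undoing the normalization — a minimal sketch:

# Hypothetical visualization: show the first four images of a training batch.
imgs, labels = next(iter(train_loader))
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, img, lbl in zip(axes, imgs, labels):
    img = img.permute(1, 2, 0).numpy() * std + mean   # undo Normalize, CHW -> HWC
    ax.imshow(np.clip(img, 0, 1))
    ax.set_title(num_to_class[lbl.item()])
    ax.axis('off')
plt.show()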
# Check whether we are running on CPU or GPU
def get_device():
    return 'cuda' if torch.cuda.is_available() else 'cpu'

device = get_device()
print(device)
# Whether to freeze the earlier (pretrained) layers of the model
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False
# ResNet-34 with the final fully connected layer replaced for our classes
def res_model(num_classes, feature_extract=False, use_pretrained=True):
    # pretrained= is deprecated in torchvision >= 0.13; use the weights API instead.
    weights = models.ResNet34_Weights.DEFAULT if use_pretrained else None
    model_ft = models.resnet34(weights=weights, progress=True)
    set_parameter_requires_grad(model_ft, feature_extract)
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Sequential(nn.Linear(num_ftrs, num_classes))

    return model_ft
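
To see what feature_extract does, here is a small check (my addition, not in the original) comparing parameter counts:

# Hypothetical check: with feature_extract=True, only the new fc head stays trainable.
frozen = res_model(n_classes, feature_extract=True)
trainable = sum(p.numel() for p in frozen.parameters() if p.requires_grad)
total = sum(p.numel() for p in frozen.parameters())
print(f'{trainable} of {total} parameters are trainable')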
# Hyperparameters
learning_rate = 1e-4
weight_decay = 1e-3
num_epoch = 20
model_path = './pre_res_model.ckpt'
# Initialize the model and put it on the specified device.
model = res_model(n_classes)  # n_classes == 176 for this competition
model = model.to(device)
# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# Initialize the optimizer; you may fine-tune hyperparameters such as the learning rate on your own.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

# The number of training epochs.
n_epochs = num_epoch
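
The optimizer comment above invites tuning; one optional addition (an assumption on my part, not in the original notebook) is a learning-rate schedule, sketched here with cosine annealing:

# Optional sketch, not used below: decay the learning rate over training.
# If enabled, call scheduler.step() once at the end of each training epoch.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=n_epochs)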

best_acc = 0.0
for epoch in range(n_epochs):
    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    model.train() 
    # These are used to record information in training.
    train_loss = []
    train_accs = []
    # Iterate the training set by batches.
    for batch in tqdm(train_loader):
        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)
        labels = labels.to(device)
        # Forward the data. (Make sure data and model are on the same device.)
        logits = model(imgs)
        # Calculate the cross-entropy loss.
        # We don't need to apply softmax before computing cross-entropy as it is done automatically.
        loss = criterion(logits, labels)
        
        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()
        # Compute the gradients for parameters.
        loss.backward()
        # Update the parameters with computed gradients.
        optimizer.step()
        
        # Compute the accuracy for the current batch.
        acc = (logits.argmax(dim=-1) == labels).float().mean()

        # Record the loss and accuracy as plain Python floats.
        train_loss.append(loss.item())
        train_accs.append(acc.item())
        
    # The average loss and accuracy of the training set is the average of the recorded values.
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")
    
    
    # ---------- Validation ----------
    # Make sure the model is in eval mode so that modules like dropout behave correctly at inference time.
    model.eval()
    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []
    
    # Iterate the validation set by batches.
    for batch in tqdm(val_loader):
        imgs, labels = batch
        imgs, labels = imgs.to(device), labels.to(device)
        # No gradients are needed in validation; torch.no_grad() speeds up the forward pass.
        with torch.no_grad():
            logits = model(imgs)

        # We can still compute the loss (just not the gradient).
        loss = criterion(logits, labels)

        # Compute the accuracy for the current batch.
        acc = (logits.argmax(dim=-1) == labels).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc.item())
        
    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")
    
    # If the model improves, save a checkpoint at this epoch.
    if valid_acc > best_acc:
        best_acc = valid_acc
        torch.save(model.state_dict(), model_path)
        print('saving model with acc {:.3f}'.format(best_acc))


## Predict on the test set
saveFileName = './submission.csv'

# Re-create the model and load the weights from the best checkpoint.
model = res_model(n_classes)
model = model.to(device)
model.load_state_dict(torch.load(model_path, map_location=device))

# Make sure the model is in eval mode.
# Modules like Dropout and BatchNorm behave differently in training mode.
model.eval()

# Initialize a list to store the predictions.
predictions = []
# Iterate the testing set by batches.
for batch in tqdm(test_loader):
    
    imgs = batch
    with torch.no_grad():
        logits = model(imgs.to(device))
    
    # Take the class with greatest logit as prediction and record it.
    predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())

# Convert the numeric predictions back to label strings.
preds = [num_to_class[i] for i in predictions]

test_data = pd.read_csv(test_path)
test_data['label'] = pd.Series(preds)
submission = pd.concat([test_data['image'], test_data['label']], axis=1)
submission.to_csv(saveFileName, index=False)
print("Done!!!!!!!!!!!!!!!!!!!!!!!!!!!")
