Shattering Distribution for Active Learning: SDAL


Shattering Distribution for Active Learning

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

"""
Python code of SDAL for the paper: Shattering Distribution for Active Learning.
The structure follows the original code.
"""
import xlwt
import xlrd
import numpy as np
import pandas as pd
from pathlib import Path
from collections import OrderedDict
from copy import deepcopy
from time import time
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import KMeans

class sdal():
    def __init__(self, X, y, labeled, budget, X_test, y_test):
        self.X = X
        self.y = y
        self.nSample, self.nDim = X.shape
        self.labels = sorted(np.unique(self.y))
        self.nClass = len(self.labels)
        self.X_test = X_test
        self.y_test = y_test
        self.budget = deepcopy(budget)
        self.budgetLeft = deepcopy(budget)
        self.labeled = list(deepcopy(labeled))
        self.unlabeled = self.initialization()
        self.K = rbf_kernel(X=self.X, gamma=0.1)
        self.lamb = 10e-4
        self.halving_ids = self.get_halving()


    def initialization(self):
        unlabeled = list(range(self.nSample))
        for idx in self.labeled:
            unlabeled.remove(idx)
        return unlabeled

    def get_halving(self):
        """Corresponding to the Halving function in the original code"""
        Halving_ids = []
        num_unlabeled = len(self.unlabeled)
        # Assumption: "halving" keeps half of the unlabeled pool, i.e. floor(n / 2).
        num_half = int(np.floor(num_unlabeled / 2))
        if num_half < self.budget:
            num_half = self.budget
        Tmp_unlabeled = deepcopy(self.unlabeled)
        Halving_left = deepcopy(num_half)
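        while Halving_left > 0 and Tmp_unlabeled:
            # Score each remaining candidate: kernel row norm over its regularized diagonal entry.
            score = OrderedDict()
            for idx in Tmp_unlabeled:
                score[idx] = np.linalg.norm(self.K[idx,:]) / (self.K[idx,idx] + self.lamb)

            tar_idx = max(score, key=score.get)
            Halving_ids.append(tar_idx)
            # Drop the chosen point so it cannot be selected a second time.
            Tmp_unlabeled.remove(tar_idx)
            # Rank-one deflation of the kernel with the selected row.
            self.K = self.K - np.outer(self.K[tar_idx],self.K[tar_idx]) / (self.K[tar_idx,tar_idx] + self.lamb)
            Halving_left -= 1
        return Halving_ids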


    def NumberDensity(self, data, Center, Radius):
        """Accumulate, over all points, a density term from the centers closer than Radius."""
        f = 0.
        for i in range(len(data)):
            Ball_dist = []
            dist = []
            for j in range(len(Center)):
                dist.append(np.linalg.norm(data[i, :] - Center[j, :]))
                if dist[j] < Radius:
                    Ball_dist.append(dist[j])
            f = f + sum(np.exp(np.array(Ball_dist) / 1.8) ** 2) / (len(Ball_dist) + 1)
        return f

    def select(self):
        """Corresponding to the SDAL function in the original codes"""
        if self.budget == len(self.halving_ids):
            for idx in self.halving_ids:
                self.labeled.append(idx)
                self.unlabeled.remove(idx)
                self.budgetLeft -= 1
        else:
            data = self.X[self.halving_ids]
            clf = KMeans(n_clusters=self.budget)
            clf.fit(data)
            Center = clf.cluster_centers_
            Radi = 0.25
            T = 0
            L = data.shape[0]
            f = self.NumberDensity(data, Center, Radi)
            while T < 50:
                for j in range(self.budget):
                    Ball = []
                    dist = []
                    for i in range(L):
                        dist.append(np.linalg.norm(data[i, :] - Center[j, :]))
                        if dist[i] < Radi:
                            Ball.append(data[i, :])
                    if len(Ball) == 0:
                        Center[j, :] = Center[j, :]
                    else:
                        Center[j, :] = np.mean(np.array(Ball), 0)
                F = self.NumberDensity(data, Center, Radi)

                cul = np.zeros((len(Center), len(Center)))
                flag = 0

                for j in range(len(Center)):
                    for i in range(len(Center)):
                        cul[i, j] = np.linalg.norm(Center[i, :] - Center[j, :])
                        if i != j and cul[i, j] < 2 * Radi:
                            flag = 1
                if F - f == 0 or flag:
                    break
                else:
                    f = F
                T += 1
                Radi = (1 + 0.1) * Radi

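            # Map each refined center to its nearest candidate among the halving ids.
            selected_ids = np.zeros(self.budget, dtype=int)
            for b in range(self.budget):
                min_dist = np.inf
                tmp_center = Center[b]
                for idx in self.halving_ids:
                    dist = np.linalg.norm(tmp_center - self.X[idx])
                    if dist <= min_dist:
                        min_dist = dist
                        selected_ids[b] = idx

            # Store selections as plain ints and keep unlabeled / budgetLeft consistent.
            for idx in selected_ids:
                idx = int(idx)
                self.labeled.append(idx)
                if idx in self.unlabeled:
                    self.unlabeled.remove(idx)
                self.budgetLeft -= 1
        return self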

if __name__ == '__main__':


    names_list = ["PowerPlant-5bin"]
    for name in names_list:
        print("########################{}".format(name))
        data_path = Path(r"D:\OCdata")
        partition_path = Path(r"E:\CCCCC_Result\DataPartitions")
        # kmeans_path = Path(r"E:\CCCCC_Result\KmeansResult")
        """--------------read the whole data--------------------"""
        read_data_path = data_path.joinpath(name + ".csv")
        data = np.array(pd.read_csv(read_data_path, header=None))
        X = np.asarray(data[:, :-1], np.float64)
        scaler = StandardScaler()
        X = scaler.fit_transform(X)
        y = data[:, -1]
        y -= y.min()
        nClass = len(np.unique(y))
        Budget = 10 * nClass

        """--------read the partitions--------"""
        read_partition_path = str(partition_path.joinpath(name + ".xls"))
        book_partition = xlrd.open_workbook(read_partition_path)

        """-----read the kmeans results according to the partition-----"""
        # read_kmeans_path = str(kmeans_path.joinpath(name + ".xls"))
        # book_kmeans = xlrd.open_workbook(read_kmeans_path)
        workbook = xlwt.Workbook()
        count = 0
        for SN in book_partition.sheet_names():
            S_Time = time()
            train_idx = []
            test_idx = []
            labeled = []
            table_partition = book_partition.sheet_by_name(SN)
            for idx in table_partition.col_values(0):
                if isinstance(idx,float):
                    train_idx.append(int(idx))
            for idx in table_partition.col_values(1):
                if isinstance(idx,float):
                    test_idx.append(int(idx))
            for idx in table_partition.col_values(2):
                if isinstance(idx,float):
                    labeled.append(int(idx))

            X_train = X[train_idx]
            y_train = y[train_idx].astype(np.int32)
            X_test = X[test_idx]
            y_test = y[test_idx]

            model = sdal(X=X_train, y=y_train, labeled=labeled, budget=Budget, X_test=X_test, y_test=y_test)
            model.select()
            # SheetNames = "{}".format(count)
            sheet = workbook.add_sheet(SN)
            for i, idx in enumerate(train_idx):
                sheet.write(i, 0,  int(idx))
            for i, idx in enumerate(test_idx):
                sheet.write(i, 1, int(idx))
            for i, idx in enumerate(labeled):
                sheet.write(i, 2, int(idx))
            for i, idx in enumerate(model.labeled):
                sheet.write(i, 3, int(idx))

            print("SN:",SN," Time:",time()-S_Time)
        save_path = Path(r"E:\CCCCC_Result\SelectedResult\SDAL")
        save_path = str(save_path.joinpath(name + ".xls"))
        workbook.save(save_path)
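
The script above depends on local Windows paths and spreadsheet partitions, so the greedy kernel step in get_halving is hard to try in isolation. The following is a minimal standalone sketch of that step on synthetic data, not the authors' reference implementation. The function name `greedy_halving`, the array `X_demo`, and the random data are hypothetical; `gamma=0.1` and `lamb=1e-3` (the value of `10e-4`) are copied from the class. It assumes each point may be picked at most once.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def greedy_halving(X, n_select, gamma=0.1, lamb=1e-3):
    """Greedily pick row indices of X with the scoring rule used in get_halving.

    score(i) = ||K[i, :]|| / (K[i, i] + lamb), followed by a rank-one
    deflation of K with the selected row.
    """
    K = rbf_kernel(X, gamma=gamma)      # local kernel matrix, safe to modify
    candidates = np.arange(X.shape[0])  # indices still available
    selected = []
    for _ in range(min(n_select, X.shape[0])):
        # Score all remaining candidates in one vectorized step.
        row_norms = np.linalg.norm(K[candidates], axis=1)
        diag = K[candidates, candidates]
        pos = int(np.argmax(row_norms / (diag + lamb)))
        idx = int(candidates[pos])
        selected.append(idx)
        # Remove the chosen index so it cannot be picked again.
        candidates = np.delete(candidates, pos)
        # Rank-one deflation with the selected kernel row.
        row = K[idx].copy()
        K -= np.outer(row, row) / (row[idx] + lamb)
    return selected


# Hypothetical demo data: 40 random points in 3 dimensions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
picked = greedy_halving(X_demo, n_select=20)
print(len(picked), len(set(picked)))
```

Scoring the candidates as arrays replaces the per-index dictionary loop, which matters because the loop in the class recomputes every row norm on each iteration.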


















