Linear Regression in Mojo with NDBuffer


Linear regression is the simplest machine learning algorithm. In this article I will use Mojo's NDBuffer to implement a simple linear regression algorithm from scratch. I will reuse the NDArray struct that was developed in the previous article. First, import the necessary libraries and bring in the NDArray definition:

# common imports
from String import String
from Bool import Bool
from List import VariadicList
from Buffer import NDBuffer
from List import DimList
from DType import DType
from Pointer import DTypePointer
from TargetInfo import dtype_sizeof, dtype_simd_width
from Index import StaticIntTuple
from Random import rand

alias nelts = dtype_simd_width[DType.f32]()

# Thin wrapper around NDBuffer that owns its backing memory and exposes
# element-wise reads/writes through StaticIntTuple indices.
struct NDArray[rank:Int, dims:DimList, dtype:DType]:
    var ndb:NDBuffer[rank, dims, dtype]

    fn __init__(inout self, size:Int):
        # allocate raw storage for `size` elements and wrap it in an NDBuffer
        let data = DTypePointer[dtype].alloc(size)
        self.ndb = NDBuffer[rank, dims, dtype](data)

    fn __getitem__(self, idxs:StaticIntTuple[rank]) -> SIMD[dtype,1]:
        # load a single scalar at the given multi-dimensional index
        return self.ndb.simd_load[1](idxs)

    fn __setitem__(self, idxs:StaticIntTuple[rank], val:SIMD[dtype,1]):
        # store a single scalar at the given multi-dimensional index
        self.ndb.simd_store[1](idxs, val)
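
As a quick smoke test (a minimal sketch; `demo` and `demo_rank` are throwaway names used only for illustration), an element can be written and read back through the __setitem__/__getitem__ defined above:

# hypothetical smoke test for NDArray (demo names are made up)
alias demo_rank = 2
var demo = NDArray[demo_rank, DimList(2, 2), DType.f32](4)
demo[StaticIntTuple[demo_rank](0, 1)] = 3.5
print(demo[StaticIntTuple[demo_rank](0, 1)])  # should print 3.5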

Let’s assume we want to learn this function:
$$y = W \cdot X$$
Where:

  • W is the parameter
  • X is the sample design matrix. Each row is a sample, and each sample is an n-dimensional vector $\boldsymbol{x} \in R^{n}$. If we have m samples then $X \in R^{m \times n}$.

Here we will deal with a very simple toy problem. We assume $n=3$ and $m=5$. Let’s define the i-th sample:

$$\boldsymbol{x}^{(i)} = \begin{bmatrix} x^{(i)}_1 \\ x^{(i)}_2 \\ x^{(i)}_3 \end{bmatrix} \in R^{3 \times 1}, \quad i \in \{ 1, 2, 3, 4, 5 \}$$

Notes:

  • i is the index of the sample;
  • the subscripts 1, 2, 3 index the feature dimensions;
  • m is the total number of samples; in this case m=5;
  • n is the dimension of the feature vector; in this case n=3. The full design matrix is written out below.
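
To make the notation concrete, stacking the five sample vectors as rows gives the design matrix:

$$X = \begin{bmatrix} x^{(1)}_1 & x^{(1)}_2 & x^{(1)}_3 \\ x^{(2)}_1 & x^{(2)}_2 & x^{(2)}_3 \\ \vdots & \vdots & \vdots \\ x^{(5)}_1 & x^{(5)}_2 & x^{(5)}_3 \end{bmatrix} \in R^{5 \times 3}$$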

Let’s generate the dataset:

alias X_rank = 2
alias r1 = 5
alias r2 = 3
var X_size = r1 * r2
var X = NDArray[X_rank, DimList(r1, r2), DType.f32](X_size)
# generate random numbers in [0, 1), scale them into roughly [1, 6), and store into X
var rvs = DTypePointer[DType.f32].alloc(X_size)
rand[DType.f32](rvs, X_size)
for d1 in range(r1):
    for d2 in range(r2):
        X[StaticIntTuple[X_rank](d1, d2)] = rvs.load(d1*r2+d2)*5.0 + 1.0
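
As an optional sanity check (a minimal sketch), the first sample row can be printed; each feature should fall roughly in [1, 6) because of the scaling above:

# print the first row of X for inspection
print(X[StaticIntTuple[X_rank](0, 0)],
      X[StaticIntTuple[X_rank](0, 1)],
      X[StaticIntTuple[X_rank](0, 2)])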

Let’s define the ground-truth parameter $\boldsymbol{w}$ used to generate the labels:
$$\boldsymbol{w} = \begin{bmatrix} 1.1 \\ 2.2 \\ 3.3 \end{bmatrix}$$

Let’s define the ground-truth hypothesis function (with bias $b = 1.8$):
$$y = \boldsymbol{w}^{T} \boldsymbol{x} + b = \begin{bmatrix} 1.1 & 2.2 & 3.3 \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} + b = 1.1 \cdot x_{1} + 2.2 \cdot x_{2} + 3.3 \cdot x_{3} + 1.8$$

Now let’s initialize the learnable parameters $\boldsymbol{w}$ and $b$; they start far from the ground truth and will be fitted by gradient descent:

alias w_rank = 2
alias w_r1 = 3
alias w_r2 = 1
var w = NDArray[w_rank, DimList(w_r1, w_r2), DType.f32](w_r1 * w_r2)
w[StaticIntTuple[w_rank](0,0)] = 0.01
w[StaticIntTuple[w_rank](1,0)] = 0.02
w[StaticIntTuple[w_rank](2,0)] = 0.03
var b = SIMD[DType.f32, 1](0.0)

Now we can generate the ground-truth labels $y$:

alias y_rank = 1
alias y_r1 = 5
var y = NDArray[y_rank, DimList(y_r1), DType.f32](y_r1)
for d1 in range(y_r1):
    y[StaticIntTuple[y_rank](d1)] = (1.1 * X[StaticIntTuple[X_rank](d1,0)] +
                2.2 * X[StaticIntTuple[X_rank](d1,1)] +
                3.3 * X[StaticIntTuple[X_rank](d1,2)] + 1.8)

Let’s define the function get_batch to extract a mini-batch from the training dataset:

alias batch_size = 2
alias batch_rank = 2
# batch_idx can only be 0 or 1; with batch_size=2 and 5 samples, the last row of X is ignored.
fn get_batch(inout batch_X:NDArray[batch_rank, DimList(batch_size, r2), DType.f32],
             inout batch_y:NDArray[y_rank, DimList(batch_size), DType.f32],
             X:NDArray[X_rank, DimList(r1, r2), DType.f32], 
             y:NDArray[y_rank, DimList(y_r1), DType.f32],
             batch_idx:Int):
    # copy batch_size consecutive rows of X (and their labels) into the batch buffers
    for b_idx in range(batch_size):
        batch_y[StaticIntTuple[y_rank](b_idx)] = y[
            StaticIntTuple[y_rank](batch_size*batch_idx + b_idx)]
        for c_idx in range(r2):
            batch_X[StaticIntTuple[batch_rank](b_idx, c_idx)] = X[
                StaticIntTuple[X_rank](batch_size*batch_idx + b_idx, c_idx)]
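
As a usage sketch (`demo_X` and `demo_y` are throwaway buffers named here only for illustration), the first mini-batch copies rows 0 and 1 of X and the second copies rows 2 and 3; row 4 is never used:

# hypothetical standalone check of get_batch
var demo_X = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size*r2)
var demo_y = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
get_batch(demo_X, demo_y, X, y, 0)   # copies rows 0 and 1 of X
get_batch(demo_X, demo_y, X, y, 1)   # copies rows 2 and 3 of X; row 4 is ignored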

Let’s discuss the math behind linear regression. For the i-th sample we will omit the (i) superscript for simplicity. The predicted label $\hat{y}$ is:
$$\hat{y} = w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b$$
Since we have the ground-truth label $y$, we define the loss function as:
$$\mathcal{l} = \frac{1}{2}(\hat{y}-y)^{2} = \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2}$$
Following the linear regression recipe, we initialize the parameter $\boldsymbol{w}$ to small values and $b$ to zero, compute $\hat{y}$ with the current parameters, and then compute the loss, which measures how good the parameters are. Our task is to find the parameter setting that minimizes the loss:
$$\arg\min_{\boldsymbol{w},b} \mathcal{l}$$

To minimize the loss we compute each parameter’s gradient of the loss and adjust the parameter against the gradient direction. This is the gradient descent algorithm. Let’s derive the gradient of the loss with respect to $w_{1}$:
$$\frac{\partial \mathcal{l}}{\partial w_{1}} = \frac{\partial \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big)}{\partial w_{1}} \\ = \frac{\partial \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big)}{\partial (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)} \cdot \frac{\partial (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)}{\partial w_{1}} \\ = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1}$$
We used the chain rule in the derivation above. We can obtain the gradients of the loss with respect to all parameters in the same way:
$$\frac{\partial \mathcal{l}}{\partial w_{1}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1} \\ \frac{\partial \mathcal{l}}{\partial w_{2}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{2} \\ \frac{\partial \mathcal{l}}{\partial w_{3}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{3} \\ \frac{\partial \mathcal{l}}{\partial b} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)$$

Assuming the learning rate is $\alpha$, we can update the parameters:
$$w_{1} := w_{1} - \alpha \cdot \frac{\partial \mathcal{l}}{\partial w_{1}} \\ w_{2} := w_{2} - \alpha \cdot \frac{\partial \mathcal{l}}{\partial w_{2}} \\ w_{3} := w_{3} - \alpha \cdot \frac{\partial \mathcal{l}}{\partial w_{3}} \\ b := b - \alpha \cdot \frac{\partial \mathcal{l}}{\partial b}$$
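
One step worth making explicit: the derivation above is per sample, while the training loop below works on a mini-batch of two samples, so each gradient is summed over the batch before the update is applied. Writing $B$ for the batch and $L_{B}$ for the summed loss over the batch (both just notation introduced here), and restoring the sample index:

$$\frac{\partial L_{B}}{\partial w_{k}} = \sum_{i \in B} (\hat{y}^{(i)} - y^{(i)}) \cdot x^{(i)}_{k}, \qquad \frac{\partial L_{B}}{\partial b} = \sum_{i \in B} (\hat{y}^{(i)} - y^{(i)})$$

This is exactly what the loop does: it accumulates the two samples’ contributions into each gradient and then takes one step.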

let epochs = 1000
let num_batches = r1 // batch_size    # 2 mini-batches of 2 samples each; the 5th sample is ignored
var y_hat = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
alias loss_rank = 1
var loss = NDArray[loss_rank, DimList(batch_size), DType.f32](batch_size)
var coff = 0.5
var Xi = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size*r2)
var yi = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
var lr = 0.001
for epoch in range(epochs):
    for bidx in range(num_batches):
        get_batch(Xi, yi, X, y, bidx)
        # forward pass: y_hat = Xi . w + b for the two samples in the batch
        y_hat[StaticIntTuple[y_rank](0)] = (
                    w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](0,0)] +
                    w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](0,1)] +
                    w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](0,2)] +
                    b)
        y_hat[StaticIntTuple[y_rank](1)] = (
                    w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](1,0)] +
                    w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](1,1)] +
                    w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](1,2)] +
                    b)
        # calculate the per-sample loss against the batch labels yi
        loss[StaticIntTuple[loss_rank](0)] = coff*(
                (yi[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)])*
                (yi[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)])
        )
        loss[StaticIntTuple[loss_rank](1)] = coff*(
                (yi[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)])*
                (yi[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)])
        )
        # gradients: sum of the per-sample gradients over the batch
        let g_w1 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,0)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,0)])
        w[StaticIntTuple[w_rank](0,0)] -= lr*g_w1
        let g_w2 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,1)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,1)])
        w[StaticIntTuple[w_rank](1,0)] -= lr*g_w2
        let g_w3 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,2)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,2)])
        w[StaticIntTuple[w_rank](2,0)] -= lr*g_w3
        let g_b = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)]) +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)]))
        b -= lr*g_b
        # total batch loss for logging
        let loss_val = loss[StaticIntTuple[loss_rank](0)] + loss[StaticIntTuple[loss_rank](1)]
        print("epoch_", epoch, ": idx=", bidx, " loss=", loss_val,
              "; w1=", w[StaticIntTuple[w_rank](0,0)],
              ", w2=", w[StaticIntTuple[w_rank](1,0)],
              ", w3=", w[StaticIntTuple[w_rank](2,0)], ", b=", b, ";")
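
If the loop behaves as the derivation suggests, the printed parameters should drift toward the ground-truth values 1.1, 2.2, 3.3 and 1.8 that generated the labels. A final summary print (a small optional sketch) makes this easy to check:

# report the learned parameters; compare against the ground truth (1.1, 2.2, 3.3, 1.8)
print("learned w1=", w[StaticIntTuple[w_rank](0,0)],
      " w2=", w[StaticIntTuple[w_rank](1,0)],
      " w3=", w[StaticIntTuple[w_rank](2,0)],
      " b=", b)

That concludes this walk-through of linear regression in Mojo with NDBuffer.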
