DPDK Series, Part 20: The Overall Memory Architecture


I. DPDK's overall memory hierarchy

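Earlier posts mentioned that DPDK builds its own abstraction over memory. Once DPDK decides to manage memory itself, it needs a complete scheme of its own; a single linked list will not do. Anyone who has studied operating systems and computer architecture knows that memory management is a complicated system. Even one level up, where you simply use the memory management the OS provides, how many developers control memory with real skill? And why is C/C++, which manipulates memory directly, being abandoned by more and more application-level code? The most typical reason is that many developers cannot get memory management right. When something goes wrong it is hard to know where to start the analysis, and fixing the problem can take longer than the development did.
To manage memory better, DPDK divides it into three layers (four if the physical layer is counted): Zone (a large block or region), Pool, and Buf (buffer). Each layer has its own data structures, split by function or logic, and its own management APIs. This is easy to picture. Suppose a family wants a lavish New Year feast: they buy one big slab of meat, cut it into portions by intended use (stir-frying, stewing, and so on), and at cooking time cut those further into small chunks, slices, or mince. The difference is that meat is simply eaten, while memory has to be reclaimed after use, and reclaimed efficiently, which is why memory management is the harder problem.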
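
The slab-to-slices analogy can be sketched in a few lines of plain C. This is a toy model, not DPDK code: the names toy_zone, toy_pool, and the toy_buf_* helpers are hypothetical, the zone is backed by malloc instead of hugepages, and there is no NUMA awareness or locking. It only shows how one large block is carved into a pool of equal-sized elements that are then borrowed and returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Layer 1: a "zone" is one big contiguous block, carved with a bump pointer. */
struct toy_zone {
	uint8_t *base; /* start of the block */
	size_t len;    /* total bytes */
	size_t used;   /* bytes already handed out */
};

/* Layer 2: a "pool" slices part of a zone into equal-sized elements. */
struct toy_pool {
	void **free_objs; /* stack of free element pointers, stored in the zone */
	unsigned n_free;  /* current stack depth */
	unsigned n_total; /* capacity */
};

/* Assumption: malloc stands in for the hugepage memory a real zone uses. */
static int toy_zone_init(struct toy_zone *z, size_t len)
{
	z->base = malloc(len);
	z->len = len;
	z->used = 0;
	return z->base ? 0 : -1;
}

/* Reserve len bytes from the zone, 8-byte aligned; NULL if it does not fit. */
static void *toy_zone_reserve(struct toy_zone *z, size_t len)
{
	size_t start = (z->used + 7u) & ~(size_t)7u;

	if (start > z->len || len > z->len - start)
		return NULL;
	z->used = start + len;
	return z->base + start;
}

/* Build a pool of n elements of elt_size bytes inside the zone. */
static int toy_pool_init(struct toy_pool *p, struct toy_zone *z,
			 unsigned n, size_t elt_size)
{
	uint8_t *elts;
	unsigned i;

	p->free_objs = toy_zone_reserve(z, n * sizeof(void *));
	if (p->free_objs == NULL)
		return -1;
	elts = toy_zone_reserve(z, n * elt_size);
	if (elts == NULL)
		return -1;
	for (i = 0; i < n; i++)
		p->free_objs[i] = elts + (size_t)i * elt_size;
	p->n_free = p->n_total = n;
	return 0;
}

/* Layer 3: a "buf" is one element taken from, and returned to, the pool. */
static void *toy_buf_get(struct toy_pool *p)
{
	return p->n_free ? p->free_objs[--p->n_free] : NULL;
}

static void toy_buf_put(struct toy_pool *p, void *buf)
{
	assert(p->n_free < p->n_total); /* more puts than gets is a bug */
	p->free_objs[p->n_free++] = buf;
}
```

Returning a buffer is just a push onto the free stack, which is the "reclaim efficiently" half of the analogy: nothing is handed back to the zone until the whole pool goes away.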

II. Related data structures and APIs

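Since DPDK manages memory in three layers, the data structures and interfaces are presented the same way:
1. mbuf
Note that this Buf is intended for network packets. It is not recommended for your own general-purpose data; define a custom memory pool for that instead. The data structure is defined as follows: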

//\lib\librte_mbuf
//\lib\librte_mbuf\rte_mbuf.h
//\lib\librte_mbuf\rte_mbuf_core.h
struct rte_mbuf {
	MARKER cacheline0;

	void *buf_addr;           /**< Virtual address of segment buffer. */
	/**
	 * Physical address of segment buffer.
	 * Force alignment to 8-bytes, so as to ensure we have the exact
	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
	 * working on vector drivers easier.
	 */
	RTE_STD_C11
	union {
		rte_iova_t buf_iova;
		rte_iova_t buf_physaddr; /**< deprecated */
	} __rte_aligned(sizeof(rte_iova_t));

	/* next 8 bytes are initialised on RX descriptor rearm */
	MARKER64 rearm_data;
	uint16_t data_off;

......
	struct rte_mbuf_ext_shared_info *shinfo;

	uint64_t dynfield1[2]; /**< Reserved for dynamic fields. */
} __rte_cache_aligned;

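The structure is large, so the middle part is omitted here. For the details, read the header and the other source files in the same directory. Its main interfaces are: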

void rte_pktmbuf_init(struct rte_mempool *mp, void *opaque_arg,
		      void *m, unsigned i);
void rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg);
struct rte_mempool *
rte_pktmbuf_pool_create(const char * name, unsigned n,
	unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
	int socket_id);
struct rte_mempool *
rte_pktmbuf_pool_create_by_ops(const char * name, unsigned int n,
	unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
	int socket_id, const char * ops_name);
static inline void
rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr,
	rte_iova_t buf_iova, uint16_t buf_len,
	struct rte_mbuf_ext_shared_info *shinfo);

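There are many more of these, since upper-layer applications handle packets in all sorts of ways: internal and external buffers, pool and buffer initialization, bulk operations, resets, and so on. All of them can be found in rte_mbuf.h and the source files in the same directory. They will be analyzed in detail later; there are too many to cover here.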
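
The buf_addr and data_off fields shown in the structure hint at how a packet buffer is laid out: the packet data does not start at the beginning of the buffer, so headers can later be added in front without copying. The sketch below models only that idea. struct toy_mbuf and its helpers are hypothetical; the field names are borrowed from rte_mbuf for readability, and the 128-byte headroom is an assumed default, not a value taken from the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HEADROOM 128 /* assumed default headroom, in bytes */

/* Toy packet buffer: only the fields needed to show the data window. */
struct toy_mbuf {
	uint8_t *buf_addr; /* start of the whole buffer */
	uint16_t buf_len;  /* size of the whole buffer */
	uint16_t data_off; /* where packet data starts inside the buffer */
	uint16_t data_len; /* bytes of packet data currently stored */
};

/* Attach a raw buffer and leave the headroom empty in front of the data. */
static void toy_mbuf_reset(struct toy_mbuf *m, uint8_t *buf, uint16_t len)
{
	m->buf_addr = buf;
	m->buf_len = len;
	m->data_off = len < TOY_HEADROOM ? len : TOY_HEADROOM;
	m->data_len = 0;
}

/* Pointer to the first byte of packet data. */
static uint8_t *toy_mbuf_data(const struct toy_mbuf *m)
{
	return m->buf_addr + m->data_off;
}

/* Grow the data at the tail; returns the start of the new area or NULL. */
static uint8_t *toy_mbuf_append(struct toy_mbuf *m, uint16_t len)
{
	uint16_t tailroom = (uint16_t)(m->buf_len - m->data_off - m->data_len);
	uint8_t *tail;

	if (len > tailroom)
		return NULL;
	tail = toy_mbuf_data(m) + m->data_len;
	m->data_len = (uint16_t)(m->data_len + len);
	return tail;
}

/* Grow the data at the head (for example to add a header) using headroom. */
static uint8_t *toy_mbuf_prepend(struct toy_mbuf *m, uint16_t len)
{
	if (len > m->data_off)
		return NULL;
	m->data_off = (uint16_t)(m->data_off - len);
	m->data_len = (uint16_t)(m->data_len + len);
	return toy_mbuf_data(m);
}
```

Appending a 100-byte payload and then prepending a 14-byte header only moves data_off and data_len; the payload bytes stay where they are.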

2. mempool
Next, the definition of the memory pool:

//\lib\librte_mempool\rte_mempool.h
//
struct rte_mempool {
	/*
	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
	 * compatibility requirements, it could be changed to
	 * RTE_MEMPOOL_NAMESIZE next time the ABI changes
	 */
	char name[RTE_MEMZONE_NAMESIZE]; /**< Name of mempool. */
	RTE_STD_C11
	union {
		void *pool_data;         /**< Ring or pool to store objects. */
		uint64_t pool_id;        /**< External mempool identifier. */
	};
	void *pool_config;               /**< optional args for ops alloc. */
	const struct rte_memzone *mz;    /**< Memzone where pool is alloc'd. */
	unsigned int flags;              /**< Flags of the mempool. */
	int socket_id;                   /**< Socket id passed at create. */
	uint32_t size;                   /**< Max size of the mempool. */
	uint32_t cache_size;
	/**< Size of per-lcore default local cache. */

	uint32_t elt_size;               /**< Size of an element. */
	uint32_t header_size;            /**< Size of header (before elt). */
	uint32_t trailer_size;           /**< Size of trailer (after elt). */

	unsigned private_data_size;      /**< Size of private data. */
	/**
	 * Index into rte_mempool_ops_table array of mempool ops
	 * structs, which contain callback function pointers.
	 * We're using an index here rather than pointers to the callbacks
	 * to facilitate any secondary processes that may want to use
	 * this mempool.
	 */
	int32_t ops_index;

	struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */

	uint32_t populated_size;         /**< Number of populated objects. */
	struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool */
	uint32_t nb_mem_chunks;          /**< Number of memory chunks */
	struct rte_mempool_memhdr_list mem_list; /**< List of memory chunks */

#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
	/** Per-lcore statistics. */
	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE];
#endif
}  __rte_cache_aligned;

struct rte_mempool_cache {
	uint32_t size;	      /**< Size of the cache */
	uint32_t flushthresh; /**< Threshold before we flush excess elements */
	uint32_t len;	      /**< Current cache count */
	/*
	 * Cache is allocated to this size to allow it to overflow in certain
	 * cases to avoid needless emptying of cache.
	 */
	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
} __rte_cache_aligned;
struct rte_mempool_objhdr {
	STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
	struct rte_mempool *mp;          /**< The mempool owning the object. */
	RTE_STD_C11
	union {
		rte_iova_t iova;         /**< IO address of the object. */
		phys_addr_t physaddr;    /**< deprecated - Physical address of the object. */
	};
#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
	uint64_t cookie;                 /**< Debug cookie. */
#endif
};
struct rte_mempool_memhdr {
	STAILQ_ENTRY(rte_mempool_memhdr) next; /**< Next in list. */
	struct rte_mempool *mp;  /**< The mempool owning the chunk */
	void *addr;              /**< Virtual address of the chunk */
	RTE_STD_C11
	union {
		rte_iova_t iova;       /**< IO address of the chunk */
		phys_addr_t phys_addr; /**< Physical address of the chunk */
	};
	size_t len;              /**< length of the chunk */
	rte_mempool_memchunk_free_cb_t *free_cb; /**< Free callback */
	void *opaque;            /**< Argument passed to the free callback */
};

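The key is to understand the role rte_mempool_cache plays in the Pool, and the roles of the two headers, rte_mempool_objhdr and rte_mempool_memhdr. The former tracks the objects in the pool; the latter tracks the memory chunks that back the pool.
The pool's main interfaces are: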

//\lib\librte_mempool\rte_mempool.h
struct rte_mempool *
rte_mempool_create(const char * name, unsigned n, unsigned elt_size,
		   unsigned cache_size, unsigned private_data_size,
		   rte_mempool_ctor_t * mp_init, void * mp_init_arg,
		   rte_mempool_obj_cb_t * obj_init, void * obj_init_arg,
		   int socket_id, unsigned flags);
void
rte_mempool_free(struct rte_mempool *mp);
......

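There are too many to list one by one; see the header named above. Later posts that walk through the source will explain these interfaces and their call flows in detail.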
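
The size, flushthresh, len, and oversized objs[] fields of rte_mempool_cache suggest how the per-lcore cache behaves: a core serves gets and puts from its private array and only touches the shared store in batches. The following is a minimal sketch of that pattern under stated assumptions. The toy_* names are hypothetical, the flush threshold of 1.5 times the cache size is an assumption, and the shared store here is a plain array with no synchronization.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 4                          /* target fill level */
#define TOY_FLUSH_THRESH (TOY_CACHE_SIZE * 3 / 2) /* flush at 6 objects */
#define TOY_POOL_MAX 64

/* Shared backing store; a real pool would need synchronization here. */
struct toy_shared {
	void *objs[TOY_POOL_MAX];
	unsigned len;
};

/* Private per-core cache: no locking needed because one core owns it. */
struct toy_cache {
	void *objs[TOY_CACHE_SIZE * 3]; /* oversized so it can overflow a bit */
	unsigned len;
};

/* Take one object, refilling the cache in a batch when it runs dry. */
static void *toy_get(struct toy_cache *c, struct toy_shared *s)
{
	if (c->len == 0) {
		while (c->len < TOY_CACHE_SIZE && s->len > 0)
			c->objs[c->len++] = s->objs[--s->len];
	}
	return c->len ? c->objs[--c->len] : NULL;
}

/* Return one object; past the threshold, flush the excess in a batch. */
static void toy_put(struct toy_cache *c, struct toy_shared *s, void *obj)
{
	c->objs[c->len++] = obj;
	if (c->len >= TOY_FLUSH_THRESH) {
		while (c->len > TOY_CACHE_SIZE) {
			assert(s->len < TOY_POOL_MAX);
			s->objs[s->len++] = c->objs[--c->len];
		}
	}
}
```

Because the cache is allowed to grow past its target size before flushing, a core that alternates between get and put near the boundary does not bounce objects to and from the shared store on every call.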

3. memzone
Memory allocation and management at this layer is comparatively simple. First, its data structure:

struct rte_memzone {

#define RTE_MEMZONE_NAMESIZE 32       /**< Maximum length of memory zone name.*/
	char name[RTE_MEMZONE_NAMESIZE];  /**< Name of the memory zone. */

	RTE_STD_C11
	union {
		phys_addr_t phys_addr;        /**< deprecated - Start physical address. */
		rte_iova_t iova;              /**< Start IO address. */
	};
	RTE_STD_C11
	union {
		void *addr;                   /**< Start virtual address. */
		uint64_t addr_64;             /**< Makes sure addr is always 64-bits */
	};
	size_t len;                       /**< Length of the memzone. */

	uint64_t hugepage_sz;             /**< The page size of underlying memory */

	int32_t socket_id;                /**< NUMA socket ID. */

	uint32_t flags;                   /**< Characteristics of this memzone. */
} __attribute__((__packed__));

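Above this code, the header carries the comment "A structure describing a memzone, which is a contiguous portion of physical memory identified by a name", so a memzone is simply a named, contiguous region of physical memory. Its fields say as much: a name, a start address (IO and virtual, each kept 64 bits wide), a length, the underlying page size, a NUMA socket ID, and flags.
Many other structures, such as rings and mempools, are reserved on top of memzones, so the zones can also be grouped logically by what they hold.
Some of its interface functions: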

//\lib\librte_eal\common\include
//\lib\librte_eal\common\eal_common_memzone.c
const struct rte_memzone *rte_memzone_reserve(const char *name,
					      size_t len, int socket_id, unsigned flags);
int rte_memzone_free(const struct rte_memzone *mz);
const struct rte_memzone *rte_memzone_lookup(const char *name);
void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *arg), void *arg);

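Not every interface is listed above; the rest can be found under those paths. There are not many, and the header comments document them thoroughly, so they are not repeated here. From the names alone you can work out what they do: reserve, reserve with alignment, look up, dump, free, and walk the list.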
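
To make the reserve, lookup, walk, and free quartet concrete, here is a toy registry that mirrors the shape of those four prototypes. It is only a sketch: the toy_memzone_* names are hypothetical, memory comes from malloc, socket IDs and flags are left out, and rejecting a duplicate name is an assumption about sensible behavior for a name-keyed table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NAMESIZE 32
#define TOY_MAX_ZONES 8

struct toy_memzone {
	char name[TOY_NAMESIZE]; /* empty string marks a free slot */
	void *addr;              /* start of the zone's memory */
	size_t len;              /* length in bytes */
};

static struct toy_memzone toy_zones[TOY_MAX_ZONES];

/* Find a zone by name; NULL if no zone has that name. */
static const struct toy_memzone *toy_memzone_lookup(const char *name)
{
	int i;

	if (name[0] == '\0')
		return NULL;
	for (i = 0; i < TOY_MAX_ZONES; i++)
		if (strcmp(toy_zones[i].name, name) == 0)
			return &toy_zones[i];
	return NULL;
}

/* Reserve a named zone; fails on a bad or duplicate name or a full table. */
static const struct toy_memzone *toy_memzone_reserve(const char *name,
						     size_t len)
{
	int i;

	if (name[0] == '\0' || strlen(name) >= TOY_NAMESIZE || len == 0)
		return NULL;
	if (toy_memzone_lookup(name) != NULL)
		return NULL;
	for (i = 0; i < TOY_MAX_ZONES; i++) {
		struct toy_memzone *mz = &toy_zones[i];

		if (mz->name[0] != '\0')
			continue;
		mz->addr = malloc(len);
		if (mz->addr == NULL)
			return NULL;
		mz->len = len;
		strcpy(mz->name, name); /* length checked above */
		return mz;
	}
	return NULL; /* table full */
}

/* Release a zone's memory and mark its slot as free again. */
static int toy_memzone_free(const struct toy_memzone *mz)
{
	struct toy_memzone *slot;

	if (mz == NULL || mz->name[0] == '\0')
		return -1;
	slot = &toy_zones[mz - toy_zones];
	free(slot->addr);
	memset(slot, 0, sizeof(*slot));
	return 0;
}

/* Call func once for every zone currently in use. */
static void toy_memzone_walk(void (*func)(const struct toy_memzone *, void *),
			     void *arg)
{
	int i;

	for (i = 0; i < TOY_MAX_ZONES; i++)
		if (toy_zones[i].name[0] != '\0')
			func(&toy_zones[i], arg);
}

/* Example walk callback: adds each zone's length to the size_t behind arg. */
static void toy_sum_len(const struct toy_memzone *mz, void *arg)
{
	*(size_t *)arg += mz->len;
}
```

Keying zones by name is what lets a second component, or in DPDK's case a secondary process, find memory that someone else reserved without being handed a pointer.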

III. Where each layer fits

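As analyzed above, mempool suits fixed-size allocations, Buf suits network packets, and Zone provides the low-level wrapper underneath, which is why it shows up inside many common data structures. For convenience, each layer also defines its own supporting data structures, which together make it quicker and easier to use. In the memory pool, for example, there are the object list and the memory chunk list, plus the header structures defined for the Pool, and with these it is easy to look things up in a Pool.
The design starts from the simplest low-level data handling and works upward step by step, wrapping memory for the sake of application logic and business processing. The end result is a set of outward-facing interfaces and data structures for application developers to use.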
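
One way the per-object header makes lookups easy: rte_mempool_objhdr stores a pointer to the owning mempool, and the header_size field is documented as sitting "before elt". If a header lives directly in front of each element, the owner of any object can be recovered by stepping back over the header. The sketch below illustrates that layout with hypothetical toy_mp_* names and a malloc-backed chunk; it is an illustration of the idea, not the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_mp;

/* Header stored immediately before every element. */
struct toy_objhdr {
	struct toy_mp *mp; /* pool that owns the element */
};

struct toy_mp {
	uint8_t *mem;    /* backing memory chunk */
	size_t elt_size; /* payload size of each element */
	unsigned n;      /* number of elements */
};

/* Header plus payload, rounded up so every header stays pointer-aligned. */
static size_t toy_mp_stride(const struct toy_mp *p)
{
	size_t raw = sizeof(struct toy_objhdr) + p->elt_size;
	size_t a = sizeof(void *);

	return (raw + a - 1) / a * a;
}

/* Allocate one chunk and write a header in front of every element. */
static int toy_mp_create(struct toy_mp *p, size_t elt_size, unsigned n)
{
	unsigned i;

	p->elt_size = elt_size;
	p->n = n;
	p->mem = malloc(n * toy_mp_stride(p));
	if (p->mem == NULL)
		return -1;
	for (i = 0; i < n; i++) {
		struct toy_objhdr *h =
			(struct toy_objhdr *)(p->mem + i * toy_mp_stride(p));
		h->mp = p;
	}
	return 0;
}

/* Address of the i-th element's payload (just past its header). */
static void *toy_mp_obj(const struct toy_mp *p, unsigned i)
{
	return p->mem + i * toy_mp_stride(p) + sizeof(struct toy_objhdr);
}

/* Step back over the header to find the pool that owns an element. */
static struct toy_mp *toy_mp_from_obj(const void *obj)
{
	const struct toy_objhdr *h = (const struct toy_objhdr *)
		((const uint8_t *)obj - sizeof(struct toy_objhdr));

	return h->mp;
}
```

With this layout, code that frees an object needs only the object pointer; the pool it belongs to does not have to be passed around separately.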

IV. Summary

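The more I learn, the more I find that computing and cooking arrive at the same essentials. Cooking is about heat control (really a state of balance), and programming has its own kind of heat control (balanced use of resources). For speed, both prepare resources in advance (chopping the vegetables and prepping the meat ahead of time), and both reach for better tools and better techniques. No wonder so many web novels turn chefs into masters of every kind; maybe computing deserves a "from chef to guru" series of its own. Sadly, computing grew up abroad, and foreign food is, frankly, no match for Chinese cooking, so that wish may be hard to fulfil.
Back on topic: the point of studying other people's experience is to absorb their design ideas and development philosophy, not to study for its own sake. There will be such a phase at the start, of course, but nobody should stay there forever; know what works, and also why it works. With the earlier memory-management posts and DPDK's memory management side by side, it is worth looking back at how well you can actually reason about and design memory management yourself. How deep that goes is one measure of what has been learned.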
