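Version 2.0 supports the common databases: MySQL, PostgreSQL, Oracle, SQLServer, and Hive. I have verified all of these and they all work. Spark was not supported, because 2.0 never shipped a Spark data source component. Reportedly 3.0 supports it, so today I will verify that. As for the others below, I have no experience with them at all (explore them yourself if you are interested):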
- ClickHouse
- Presto
- Redshift
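- DB2: also a common relational database, though I have not worked with it yet
🐬Spark data source
🐠Creation fails
🐟Check the logs
The log shows that the database name I entered was wrong, so 3.0 really does support the Spark data source plugin.
🐟Read the source code
- Data source directory structure: it looks like they are all supported now
- The spark plugin reuses classes from the hive data source plugin. That works, but it goes against the idea of a plugin: if the hive plugin were removed, the spark plugin would clearly be affected
The same is true in 3.1.0; I do not know whether this will be improved later.
🐟spark sql
Mention big data and Hadoop and Spark come to mind. I have not actually worked with hive/spark sql yet. Spark is well known, and when I tested the Spark data source on 2.0 the plugin did not support it, so I am quite curious about spark sql. Let's do a little research.
🐡Official site
spark sql official site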
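- Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar DataFrame API. It is usable from Java, Scala, Python, and R, and it connects to any data source in the same way.
- DataFrames and SQL provide a common way to access a variety of data sources, including Hive, Avro, Parquet, ORC, JSON, and JDBC. You can even join data across these sources. You can also run SQL or HiveQL queries on existing warehouses.
- Spark SQL supports the HiveQL syntax as well as Hive SerDes and UDFs, allowing you to access existing Hive warehouses. A server mode provides industry-standard JDBC and ODBC connectivity for business intelligence tools.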
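The server mode mentioned above is the Spark Thrift Server, which is compatible with HiveServer2, so clients normally reach it with a jdbc:hive2:// URL. That is one plausible reason the spark plugin leans on hive classes. Below is a minimal sketch of building such a URL. The class name, host, port, and database are illustrative assumptions, not DolphinScheduler code.

```java
// Hypothetical helper for illustration only; not part of DolphinScheduler.
final class SparkJdbcUrl {
    // The Spark Thrift Server is reached with the HiveServer2 URL scheme.
    private static final String PREFIX = "jdbc:hive2://";

    // Builds jdbc:hive2://host:port[/database]; the database part is optional.
    static String build(String host, int port, String database) {
        if (host == null || host.trim().isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        String db = (database == null) ? "" : database.trim();
        return PREFIX + host.trim() + ":" + port + (db.isEmpty() ? "" : "/" + db);
    }
}
```

If the database segment names a database that does not exist, the connection attempt is rejected, which matches the kind of error seen in the log earlier.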
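🐡Usage guide
Usage guide
🐟hive sql
🐡Official site
Official site. Judging from its main features, "hive sql" seems to be commonly shortened to just hive
🐡Usage guide
hive sql usage guide
🐬Using data sources
When you define a task node that involves database operations, it uses a data source that has already been defined
🐠How a node calls the database
- SqlTask
- The database client. Once JDBC shows up, the goal is essentially reached
🐵Miscellaneous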
HikariCP
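- GitHub repository
- What is it? A database connection pool: a high-performance JDBC connection pool component.
- Key feature? It is the fastest.
- It is Spring Boot's default database connection pool. Going back to the code in the figure above, a plain new HikariDataSource() is all it takes to obtain pooled connections
- JDBCDataSourceProvider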
    public static HikariDataSource createJdbcDataSource(BaseConnectionParam properties, DbType dbType) {
        logger.info("Creating HikariDataSource pool for maxActive:{}",
                PropertyUtils.getInt(Constants.SPRING_DATASOURCE_MAX_ACTIVE, 50));
        HikariDataSource dataSource = new HikariDataSource();

        //TODO Support multiple versions of data sources
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        loaderJdbcDriver(classLoader, properties, dbType);

        dataSource.setDriverClassName(properties.getDriverClassName());
        dataSource.setJdbcUrl(DataSourceUtils.getJdbcUrl(dbType, properties));
        dataSource.setUsername(properties.getUser());
        dataSource.setPassword(PasswordUtils.decodePassword(properties.getPassword()));

        dataSource.setMinimumIdle(PropertyUtils.getInt(Constants.SPRING_DATASOURCE_MIN_IDLE, 5));
        dataSource.setMaximumPoolSize(PropertyUtils.getInt(Constants.SPRING_DATASOURCE_MAX_ACTIVE, 50));
        dataSource.setConnectionTestQuery(properties.getValidationQuery());

        if (properties.getProps() != null) {
            properties.getProps().forEach(dataSource::addDataSourceProperty);
        }

        logger.info("Creating HikariDataSource pool success.");
        return dataSource;
    }
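The pool-size lines in createJdbcDataSource read an integer setting and fall back to a default (5 idle, 50 max) when nothing is configured. Below is a rough stand-alone sketch of that lookup pattern using java.util.Properties. The class name, the method, and any key you pass in are assumptions for illustration; this is not DolphinScheduler's PropertyUtils.

```java
import java.util.Properties;

// Hypothetical stand-in for an "int with default" lookup; illustration only.
final class IntSetting {
    // Returns the parsed value for key, or defaultValue when the key is
    // missing or its value is not a valid integer.
    static int getInt(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

Falling back silently on a malformed value is a design choice made for this sketch; a stricter variant could throw instead.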
- pom.xml
    <dependency>
        <groupId>com.zaxxer</groupId>
        <artifactId>HikariCP</artifactId>
        <version>4.0.3</version>
    </dependency>
- README.md, which documents how each parameter is used
Essentials
🔤
dataSourceClassName
This is the name of the DataSource class provided by the JDBC driver.
Consult the documentation for your specific JDBC driver to get this class name, or see the table below.
Note XA data sources are not supported.
XA requires a real transaction manager like bitronix.
Note that you do not need this property if you are using jdbcUrl for "old-school" DriverManager-based JDBC driver configuration.
Default: none
- or -
🔤
jdbcUrl
This property directs HikariCP to use "DriverManager-based" configuration.
We feel that DataSource-based configuration (above) is superior for a variety of reasons (see below), but for many deployments there is little significant difference.
When using this property with "old" drivers, you may also need to set the driverClassName property, but try it first without.
Note that if this property is used, you may still use DataSource properties to configure your driver and is in fact recommended over driver parameters specified in the URL itself.
Default: none
🔤
username
This property sets the default authentication username used when obtaining Connections from the underlying driver.
Note that for DataSources this works in a very deterministic fashion by calling DataSource.getConnection(*username*, password) on the underlying DataSource.
However, for Driver-based configurations, every driver is different.
In the case of Driver-based, HikariCP will use this username property to set a user property in the Properties passed to the driver's DriverManager.getConnection(jdbcUrl, props) call.
If this is not what you need, skip this method entirely and call addDataSourceProperty("username", ...), for example.
Default: none
🔤
password
This property sets the default authentication password used when obtaining Connections from the underlying driver.
Note that for DataSources this works in a very deterministic fashion by calling DataSource.getConnection(username, *password*) on the underlying DataSource.
However, for Driver-based configurations, every driver is different.
In the case of Driver-based, HikariCP will use this password property to set a password property in the Properties passed to the driver's DriverManager.getConnection(jdbcUrl, props) call.
If this is not what you need, skip this method entirely and call addDataSourceProperty("pass", ...), for example.
Default: none
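For the Driver-based path described in the username and password entries, the credentials end up as user and password entries in a java.util.Properties object that is handed to DriverManager.getConnection(jdbcUrl, props). Below is a minimal sketch of building such an object. The helper name is made up, and HikariCP's own internals are not shown.

```java
import java.util.Properties;

// Hypothetical helper; shows only the shape of the Properties object.
final class DriverProps {
    // Puts the credentials under the conventional JDBC keys "user" and "password".
    static Properties credentials(String username, String password) {
        Properties props = new Properties();
        if (username != null) {
            props.setProperty("user", username);
        }
        if (password != null) {
            props.setProperty("password", password);
        }
        return props;
    }
}
```

With a suitable JDBC driver on the classpath, the result could be passed as the second argument of DriverManager.getConnection(jdbcUrl, props).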
Frequently used
✅
autoCommit
This property controls the default auto-commit behavior of connections returned from the pool.
It is a boolean value.
Default: true
⏳
connectionTimeout
This property controls the maximum number of milliseconds that a client (that's you) will wait for a connection from the pool.
If this time is exceeded without a connection becoming available, a SQLException will be thrown.
Lowest acceptable connection timeout is 250 ms. Default: 30000 (30 seconds)
⏳
idleTimeout
This property controls the maximum amount of time that a connection is allowed to sit idle in the pool.
This setting only applies when minimumIdle is defined to be less than maximumPoolSize.
Idle connections will not be retired once the pool reaches minimumIdle connections.
Whether a connection is retired as idle or not is subject to a maximum variation of +30 seconds, and average variation of +15 seconds.
A connection will never be retired as idle before this timeout.
A value of 0 means that idle connections are never removed from the pool.
The minimum allowed value is 10000ms (10 seconds).
Default: 600000 (10 minutes)
⏳
keepaliveTime
This property controls how frequently HikariCP will attempt to keep a connection alive, in order to prevent it from being timed out by the database or network infrastructure.
This value must be less than the maxLifetime value.
A "keepalive" will only occur on an idle connection.
When the time arrives for a "keepalive" against a given connection, that connection will be removed from the pool, "pinged", and then returned to the pool.
The 'ping' is one of either: invocation of the JDBC4 isValid() method, or execution of the connectionTestQuery.
Typically, the duration out-of-the-pool should be measured in single digit milliseconds or even sub-millisecond, and therefore should have little or no noticeable performance impact.
The minimum allowed value is 30000ms (30 seconds), but a value in the range of minutes is most desirable.
Default: 0 (disabled)
⏳
maxLifetime
This property controls the maximum lifetime of a connection in the pool.
An in-use connection will never be retired, only when it is closed will it then be removed.
On a connection-by-connection basis, minor negative attenuation is applied to avoid mass-extinction in the pool.
We strongly recommend setting this value, and it should be several seconds shorter than any database or infrastructure imposed connection time limit.
A value of 0 indicates no maximum lifetime (infinite lifetime), subject of course to the idleTimeout setting.
The minimum allowed value is 30000ms (30 seconds).
Default: 1800000 (30 minutes)
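The timing limits listed so far (connectionTimeout, idleTimeout, keepaliveTime, maxLifetime) can be collected into one small pre-flight check. The sketch below only restates the README's documented limits in plain Java; the class and method names are hypothetical, and what HikariCP itself does with an out-of-range value is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; all values are in milliseconds.
final class PoolTimingCheck {
    static List<String> violations(long connectionTimeout, long idleTimeout,
                                   long keepaliveTime, long maxLifetime) {
        List<String> problems = new ArrayList<>();
        if (connectionTimeout < 250) {
            problems.add("connectionTimeout is below 250 ms");
        }
        // 0 means "never remove idle connections", so it is exempt.
        if (idleTimeout != 0 && idleTimeout < 10_000) {
            problems.add("idleTimeout is below 10000 ms");
        }
        // 0 means "infinite lifetime", so it is exempt.
        if (maxLifetime != 0 && maxLifetime < 30_000) {
            problems.add("maxLifetime is below 30000 ms");
        }
        // 0 means "keepalive disabled", so it is exempt.
        if (keepaliveTime != 0 && keepaliveTime < 30_000) {
            problems.add("keepaliveTime is below 30000 ms");
        }
        // Assumption: with maxLifetime = 0 (infinite) this comparison is skipped.
        if (keepaliveTime != 0 && maxLifetime != 0 && keepaliveTime >= maxLifetime) {
            problems.add("keepaliveTime must be less than maxLifetime");
        }
        return problems;
    }
}
```

Calling it with the documented defaults (30000, 600000, 0, 1800000) yields an empty list.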
🔤
connectionTestQuery
If your driver supports JDBC4 we strongly recommend not setting this property.
This is for "legacy" drivers that do not support the JDBC4 Connection.isValid() API.
This is the query that will be executed just before a connection is given to you from the pool to validate that the connection to the database is still alive.
Again, try running the pool without this property, HikariCP will log an error if your driver is not JDBC4 compliant to let you know.
Default: none
🔢
minimumIdle
This property controls the minimum number of idle connections that HikariCP tries to maintain in the pool.
If the idle connections dip below this value and total connections in the pool are less than maximumPoolSize, HikariCP will make a best effort to add additional connections quickly and efficiently.
However, for maximum performance and responsiveness to spike demands, we recommend not setting this value and instead allowing HikariCP to act as a fixed size connection pool.
Default: same as maximumPoolSize
🔢
maximumPoolSize
This property controls the maximum size that the pool is allowed to reach, including both idle and in-use connections.
Basically this value will determine the maximum number of actual connections to the database backend.
A reasonable value for this is best determined by your execution environment.
When the pool reaches this size, and no idle connections are available, calls to getConnection() will block for up to connectionTimeout milliseconds before timing out.
Please read about pool sizing.
Default: 10
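Two of the rules above interact: minimumIdle defaults to maximumPoolSize when unset, and idleTimeout only matters when minimumIdle is smaller than maximumPoolSize. The sketch below encodes just those two statements. The class and method names are hypothetical, and it does not model anything else about the pool.

```java
// Hypothetical helper that restates two README rules; illustration only.
final class PoolShape {
    // minimumIdle falls back to maximumPoolSize when it is not set (null).
    static int effectiveMinimumIdle(Integer minimumIdle, int maximumPoolSize) {
        return (minimumIdle == null) ? maximumPoolSize : minimumIdle;
    }

    // Idle connections can only be retired when idleTimeout is non-zero and
    // the pool is allowed to shrink below its maximum size.
    static boolean idleRetirementActive(Integer minimumIdle, int maximumPoolSize,
                                        long idleTimeoutMs) {
        return idleTimeoutMs != 0
                && effectiveMinimumIdle(minimumIdle, maximumPoolSize) < maximumPoolSize;
    }
}
```

With the values DolphinScheduler passes in createJdbcDataSource (minimum idle 5, maximum pool size 50) and the default idleTimeout, idle retirement is active; with minimumIdle left unset the pool behaves as a fixed-size pool.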
📈
metricRegistry
This property is only available via programmatic configuration or IoC container.
This property allows you to specify an instance of a Codahale/Dropwizard MetricRegistry to be used by the pool to record various metrics.
See the Metrics wiki page for details.
Default: none
📈
healthCheckRegistry
This property is only available via programmatic configuration or IoC container.
This property allows you to specify an instance of a Codahale/Dropwizard HealthCheckRegistry to be used by the pool to report current health information.
See the Health Checks wiki page for details.
Default: none
🔤
poolName
This property represents a user-defined name for the connection pool and appears mainly in logging and JMX management consoles to identify pools and pool configurations.
Default: auto-generated
Infrequently used
⏳
initializationFailTimeout
This property controls whether the pool will "fail fast" if the pool cannot be seeded with an initial connection successfully.
Any positive number is taken to be the number of milliseconds to attempt to acquire an initial connection;
the application thread will be blocked during this period.
If a connection cannot be acquired before this timeout occurs, an exception will be thrown.
This timeout is applied after the connectionTimeout period.
If the value is zero (0), HikariCP will attempt to obtain and validate a connection.
If a connection is obtained, but fails validation, an exception will be thrown and the pool not started.
However, if a connection cannot be obtained, the pool will start, but later efforts to obtain a connection may fail.
A value less than zero will bypass any initial connection attempt, and the pool will start immediately while trying to obtain connections in the background.
Consequently, later efforts to obtain a connection may fail.
Default: 1
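The three cases for initializationFailTimeout (positive, zero, negative) are easy to mix up. The sketch below is only a classifier that maps a value to the behavior the README describes; the enum and class names are made up for illustration.

```java
// Hypothetical classifier for the three documented cases; illustration only.
final class InitFailMode {
    enum Mode {
        BLOCK_THEN_FAIL,      // > 0: block up to that many ms, throw if no connection
        TRY_ONCE,             // = 0: try once; fail only if validation fails
        SKIP_INITIAL_ATTEMPT  // < 0: start immediately, connect in the background
    }

    static Mode of(long initializationFailTimeout) {
        if (initializationFailTimeout > 0) {
            return Mode.BLOCK_THEN_FAIL;
        }
        if (initializationFailTimeout == 0) {
            return Mode.TRY_ONCE;
        }
        return Mode.SKIP_INITIAL_ATTEMPT;
    }
}
```

The default value of 1 falls into the first case, which is why a misconfigured pool tends to fail at startup rather than on first use.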
❎
isolateInternalQueries
This property determines whether HikariCP isolates internal pool queries, such as the connection alive test, in their own transaction.
Since these are typically read-only queries, it is rarely necessary to encapsulate them in their own transaction.
This property only applies if autoCommit is disabled.
Default: false
❎
allowPoolSuspension
This property controls whether the pool can be suspended and resumed through JMX.
This is useful for certain failover automation scenarios.
When the pool is suspended, calls to getConnection() will not timeout and will be held until the pool is resumed.
Default: false
❎
readOnly
This property controls whether Connections obtained from the pool are in read-only mode by default.
Note some databases do not support the concept of read-only mode, while others provide query optimizations when the Connection is set to read-only.
Whether you need this property or not will depend largely on your application and database.
Default: false
❎
registerMbeans
This property controls whether or not JMX Management Beans ("MBeans") are registered or not.
Default: false
🔤
catalog
This property sets the default catalog for databases that support the concept of catalogs.
If this property is not specified, the default catalog defined by the JDBC driver is used.
Default: driver default
🔤
connectionInitSql
This property sets a SQL statement that will be executed after every new connection creation before adding it to the pool.
If this SQL is not valid or throws an exception, it will be treated as a connection failure and the standard retry logic will be followed.
Default: none
🔤
driverClassName
HikariCP will attempt to resolve a driver through the DriverManager based solely on the jdbcUrl, but for some older drivers the driverClassName must also be specified.
Omit this property unless you get an obvious error message indicating that the driver was not found.
Default: none
🔤
transactionIsolation
This property controls the default transaction isolation level of connections returned from the pool.
If this property is not specified, the default transaction isolation level defined by the JDBC driver is used.
Only use this property if you have specific isolation requirements that are common for all queries.
The value of this property is the constant name from the Connection class such as TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ, etc.
Default: driver default
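Because transactionIsolation takes the name of a constant on java.sql.Connection, a typo only surfaces when the pool is configured. The sketch below resolves such a name to its int level with reflection, as a way to check a configured string up front. The helper class is hypothetical and is not how HikariCP is known to do it.

```java
import java.sql.Connection;

// Hypothetical validator; resolves a constant name on java.sql.Connection.
final class IsolationNames {
    static int levelOf(String constantName) {
        if (constantName == null || !constantName.startsWith("TRANSACTION_")) {
            throw new IllegalArgumentException("Not an isolation constant: " + constantName);
        }
        try {
            // Interface constants are public static final, so a null target is fine.
            return Connection.class.getField(constantName).getInt(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Unknown isolation constant: " + constantName, e);
        }
    }
}
```

For example, levelOf("TRANSACTION_READ_COMMITTED") returns the same value as Connection.TRANSACTION_READ_COMMITTED.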
⏳
validationTimeout
This property controls the maximum amount of time that a connection will be tested for aliveness.
This value must be less than the connectionTimeout.
Lowest acceptable validation timeout is 250 ms.
Default: 5000
⏳
leakDetectionThreshold
This property controls the amount of time that a connection can be out of the pool before a message is logged indicating a possible connection leak.
A value of 0 means leak detection is disabled.
Lowest acceptable value for enabling leak detection is 2000 (2 seconds).
Default: 0
➡
dataSource
This property is only available via programmatic configuration or IoC container.
This property allows you to directly set the instance of the DataSource to be wrapped by the pool, rather than having HikariCP construct it via reflection.
This can be useful in some dependency injection frameworks.
When this property is specified, the dataSourceClassName property and all DataSource-specific properties will be ignored.
Default: none
🔤
schema
This property sets the default schema for databases that support the concept of schemas.
If this property is not specified, the default schema defined by the JDBC driver is used.
Default: driver default
➡
threadFactory
This property is only available via programmatic configuration or IoC container.
This property allows you to set the instance of the java.util.concurrent.ThreadFactory that will be used for creating all threads used by the pool.
It is needed in some restricted execution environments where threads can only be created through a ThreadFactory provided by the application container.
Default: none
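To make the threadFactory entry concrete, here is a minimal java.util.concurrent.ThreadFactory that produces named daemon threads. The class name and naming scheme are assumptions for illustration; a real container would supply its own factory.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: names threads "<prefix>-<n>" and marks them as daemons.
final class NamedDaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedDaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
        // Daemon threads do not keep the JVM alive on shutdown.
        thread.setDaemon(true);
        return thread;
    }
}
```

Since the property is only available programmatically, an instance like this would be handed to the pool configuration in code (for example through HikariConfig's thread factory setter) rather than through a properties file.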
➡
scheduledExecutor
This property is only available via programmatic configuration or IoC container.
This property allows you to set the instance of the java.util.concurrent.ScheduledExecutorService that will be used for various internally scheduled tasks.
If supplying HikariCP with a ScheduledThreadPoolExecutor instance, it is recommended that setRemoveOnCancelPolicy(true) is used.
Default: none
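The recommendation for scheduledExecutor comes down to one JDK call. The sketch below creates a ScheduledThreadPoolExecutor with the remove-on-cancel policy switched on; the class name and the single-thread pool size are assumptions, not values taken from HikariCP.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;

// Hypothetical factory for an executor configured as the README recommends.
final class HousekeepingExecutor {
    static ScheduledThreadPoolExecutor create() {
        // Pool size of 1 is an arbitrary choice for this sketch.
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Cancelled tasks are removed from the work queue immediately
        // instead of lingering until their scheduled time.
        executor.setRemoveOnCancelPolicy(true);
        return executor;
    }
}
```

The caller owns the executor and should shut it down once the pool that uses it has been closed.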
Druid vs HikariCP
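Reference
As you can see, Druid is more feature-complete, but HikariCP has the best performance. Druid's SQL injection protection is worth looking into, since a while ago our project added SQL and XSS injection filtering through an interceptor.
Druid SQL injection protection
When I have time, I could test and compare the SQL injection interceptor we added earlier against the protection that Druid provides through configuration.
<!-- Configure the filters for monitoring statistics and for SQL injection protection -->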
<property name="filters" value="stat,wall" />
Detailed explanation of the configuration parameters