Flume 1.7 in Practice: A Powerful Tool for Big Data Collection | AiTi修炼 | 重剑无锋,拈花微笑

Flume 1.7 in Practice: A Powerful Tool for Big Data Collection

1. The Flume 1.7 release and its new features

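  • Apache has released Flume 1.7. Besides a large number of optimizations and improvements to the Kafka integration, this version adds the Taildir Source, which tails the files matched by its configured file groups, and makes the spooling directory source work with sub-directories (FLUME-1899). In earlier versions, SpoolDirectorySource could monitor all files directly under the configured directory, but files in nested subdirectories were not picked up.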
    Release Notes – Flume – Version v1.7.0
    ** New Feature
    [FLUME-2498] – Implement Taildir Source
    ** Improvement
    [FLUME-1899] – Make SpoolDir work with Sub-Directories
    [FLUME-2526] – Build flume by jdk 7 in default
    [FLUME-2628] – Add an optional parameter to specify the expected input text encoding for the netcat source
    [FLUME-2704] – Configurable poll delay for spooling directory source
    [FLUME-2718] – HTTP Source to support generic Stream Handler
    [FLUME-2729] – Allow pollableSource backoff times to be configurable
    [FLUME-2755] – Kafka Source reading multiple topics
    [FLUME-2781] – A Kafka Channel defined as parseAsFlumeEvent=false cannot be correctly used by a Flume source
    [FLUME-2799] – Kafka Source – Message Offset and Partition add to headers
    [FLUME-2801] – Performance improvement on TailDir source
    [FLUME-2810] – Add static Schema URL to AvroEventSerializer configuration
    [FLUME-2820] – Support New Kafka APIs
    [FLUME-2852] – Kafka Source/Sink should optionally read/write Flume records
    [FLUME-2868] – Kafka Channel partition topic by key
    [FLUME-2872] – Kafka Sink should be able to select which header as the key
    [FLUME-2875] – Allow RollingFileSink to specify a file prefix and a file extension.
    [FLUME-2909] – Bump Rat version
    [FLUME-2910] – AsyncHBaseSink – Failure callbacks should log the exception that caused them
    [FLUME-2911] – Add includePattern option in SpoolDirectorySource configuration
    [FLUME-2918] – TaildirSource is underperforming with huge parent directories
    [FLUME-2937] – Integrate checkstyle for non-test classes
    [FLUME-2941] – Integrate checkstyle for test classes
    [FLUME-2954] – make raw data appearing in log messages explicit
    [FLUME-2955] – Add file path to the header in TaildirSource
    [FLUME-2959] – Fix issues with flume-checkstyle module
    [FLUME-2982] – Add localhost escape sequence to HDFS sink
    [FLUME-2999] – Kafka channel and sink should enable statically assigned partition per event via header
    [FLUME-2821] – Flume-Kafka Source with new Consumer
    [FLUME-2822] – Flume-Kafka-Sink with new Producer
    [FLUME-2823] – Flume-Kafka-Channel with new APIs
    ** Bug
    [FLUME-1668] – Hdfs Sink File Rollover
    [FLUME-2132] – Exception while syncing from Flume to HDFS
    [FLUME-2143] – Flume build occasionally fails with OutOfMemoryError on Windows.
    [FLUME-2215] – ResettableFileInputStream can’t support ucs-4 character
    [FLUME-2318] – SpoolingDirectory is unable to handle empty files
    [FLUME-2448] – Building flume from trunk failing with dependency error
    [FLUME-2484] – NullPointerException in Kafka Sink test
    [FLUME-2485] – Thrift Source tests fail on Oracle JDK 8
    [FLUME-2514] – Some TestFileChannelRestart tests are extremely slow
    [FLUME-2567] – Remove unneeded repository declarations in pom.xml
    [FLUME-2573] – flume-ng --conf parameter is not used when starting a flume agent
    [FLUME-2593] – ResettableFileInputStream returns negate values from read() method
    [FLUME-2619] – Spooldir source does not log channel exceptions
    [FLUME-2632] – High CPU on KafkaSink
    [FLUME-2652] – Documented transaction handling semantics incorrect
    [FLUME-2660] – Add documentation for EventValidator
    [FLUME-2672] – NPE in KafkaSourceCounter
    [FLUME-2712] – Optional channel errors slows down the Source to Main channel event rate
    [FLUME-2725] – HDFS Sink does not use configured timezone for rounding
    [FLUME-2732] – Make maximum tolerated failures before shutting down and recreating client in AsyncHbaseSink configurable
    [FLUME-2734] – Kafka Channel timeout property is overridden by default value
    [FLUME-2738] – Async HBase sink FD leak on client shutdown
    [FLUME-2746] – How to include this Flume Patch in Flume 1.5.2 ?
    [FLUME-2749] – Kerberos configuration error when using short names in multiple HDFS Sinks
    [FLUME-2751] – Upgrade Derby version to 10.11.1.1
    [FLUME-2753] – Error when specifying empty replace string in Search and Replace Interceptor
    [FLUME-2754] – Hive Sink skipping first transaction in each Batch of Hive Transactions
    [FLUME-2761] – Move Hive sink out of preview mode
    [FLUME-2763] – flume_env script should handle jvm parameters like -javaagent -agentpath -agentlib
    [FLUME-2773] – TailDirSource throws FileNotFound Exception if ~/.flume directory is not created already
    [FLUME-2797] – SyslogTcpSource uses Deprecated Class + Deprecate SyslogTcpSource
    [FLUME-2798] – Malformed Syslog messages can lead to OutOfMemoryException
    [FLUME-2804] – Hive sink – abort remaining transactions on shutdown
    [FLUME-2806] – flume-ng.ps1 Error running script to start an agent on Windows
    [FLUME-2835] – Hive Sink tests need to create table with transactional property set
    [FLUME-2841] – Upgrade commons-collections to 3.2.2
    [FLUME-2844] – ChannelCounter of SpillableMemoryChannel doesn’t register actually.
    [FLUME-2881] – Windows Launch Script fails in plugins dir code
    [FLUME-2886] – Optional Channels can cause OOMs
    [FLUME-2889] – Fixes to DateTime computations
    [FLUME-2891] – Revert FLUME-2712 and FLUME-2886
    [FLUME-2897] – AsyncHBase sink NPE when Channel.getTransaction() fails
    [FLUME-2901] – Document Kerberos setup for Kafka channel
    [FLUME-2908] – NetcatSource – SocketChannel not closed when session is broken
    [FLUME-2913] – Flume classpath too long
    [FLUME-2915] – The kafka channel using new APIs will be stuck when the sink is avro sink
    [FLUME-2920] – Kafka Channel Should Not Commit Offsets When Stopping
    [FLUME-2922] – HDFSSequenceFile Should Sync Writer
    [FLUME-2923] – Bump AsyncHBase version
    [FLUME-2936] – KafkaSource tests arbitrarily fail
    [FLUME-2939] – Upgrade recursive SpoolDir to use Java7 features
    [FLUME-2948] – Docs: Fixed parameters on Replicating Channel Selector documentation example
    [FLUME-2949] – Flume fails to build on Windows
    [FLUME-2950] – ReliableSpoolingFileEventReader.rollCurrentFile is broken
    [FLUME-2952] – SyslogAgent possible NPE on stop()
    [FLUME-2972] – Handle offset migration in the new Kafka Channel
    [FLUME-2974] – Some tests are broken in TestReliableSpoolingFileEventReader and TestSpoolingFileLineReader
    [FLUME-2983] – Handle offset migration in the new Kafka Source
    ** Documentation
    [FLUME-2575] – FLUME-2548 brings SSLv2Hello back for Avro Sink, but UG says it is one of the protocols to exclude
    [FLUME-2713] – Document Fault Tolerant Config parameters in FlumeUserGuide
    [FLUME-2737] – Documentation for Pollable Source config parameters introduced in FLUME-2729
    [FLUME-2783] – Update Website Team page with new Committer’s
    [FLUME-2890] – Typo in Twitter source warning
    [FLUME-2934] – Document new cachePatternMatching option for TaildirSource
    [FLUME-2963] – FlumeUserGuide – error in Kafka Source properties table
    [FLUME-2971] – Document secure Kafka Sink/Source/Channel setup
    [FLUME-2975] – Minor mistake in NetCat Source example in documentation
    [FLUME-2998] – Add missing configuration parameter to SequenceSource docs
    ** Task
    [FLUME-2935] – Bump java target version to 1.7
    ** Test
    [FLUME-3003] – testSourceCounter in TestSyslogUdpSource is flaky
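
As a rough illustration of two items from the list above, the Taildir Source (FLUME-2498) and sub-directory support in the spooling directory source (FLUME-1899), here is a minimal agent configuration sketch. The agent name a1, the component names (r1, r2, c1, k1, f1, f2) and every path are placeholders that do not come from the article. The property names are given as I understand them from the Flume 1.7 User Guide and should be checked against it before use.

```properties
# Hypothetical agent "a1": two sources feed one memory channel and a logger sink.
a1.sources = r1 r2
a1.channels = c1
a1.sinks = k1

# Taildir Source: tails the files matched by each file group and stores
# the read offsets in positionFile so that it can resume after a restart.
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
# f1 is a single file; f2 uses a regular expression for the file name part only.
a1.sources.r1.filegroups.f1 = /var/log/app1/app.log
a1.sources.r1.filegroups.f2 = /var/log/app2/.*log.*
# Add the absolute path of the tailed file to each event header (FLUME-2955).
a1.sources.r1.fileHeader = true

# Spooling directory source with sub-directory monitoring (FLUME-1899).
a1.sources.r2.type = spooldir
a1.sources.r2.channels = c1
a1.sources.r2.spoolDir = /var/spool/flume
a1.sources.r2.recursiveDirectorySearch = true

# In-memory channel; the sizes are arbitrary example values.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Logger sink, useful only for checking that events arrive.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

Assuming the fragment is saved as conf/example.conf (a hypothetical file name), the agent could be started with `bin/flume-ng agent --conf conf --conf-file conf/example.conf --name a1`. Comments are kept on their own lines because the properties format treats trailing text on a value line as part of the value.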

2. Flume 1.7 system requirements

  • Java Runtime Environment – Java 1.7 or later
  • Memory – Sufficient memory for configurations used by sources, channels or sinks
  • Disk Space – Sufficient disk space for configurations used by channels or sinks
  • Directory Permissions – Read/Write permissions for directories used by agent
  • For details, see the Apache Flume 1.7 User Guide

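Copyright rscala.com. If you repost this article, "Flume 1.7 in Practice: A Powerful Tool for Big Data Collection", please credit the source: http://rscala.com/index.php/358.html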
