Original English source: DissectingTheNutchCrawler
When reposting this article, please credit the source: http://blog.csdn.net/pwlazy
Factory classes: Overview
> Class net.nutch.parser.ParserFactory
> used by:
> - net.nutch.db.WebDBInjector
> - net.nutch.fetcher.Fetcher
> - net.nutch.parser.ParserChecker
>
> Class net.nutch.protocol.ProtocolFactory
> used by:
> - net.nutch.fetcher.Fetcher
> - net.nutch.parser.ParserChecker
>
> Class net.nutch.net.URLFilterFactory
> used by:
> - net.nutch.db.WebDBInjector
> - net.nutch.tools.UpdateDatabaseTool
>
> Class net.nutch.plugin.PluginRepository: used by (Parser/Protocol)Factory
Nutch's ParserFactory and ProtocolFactory classes are the key extension points for the crawler. URLFilterFactory additionally provides an extension point for other components, including WebDBInjector and UpdateDatabaseTool. These "Factory" classes can all be reconfigured by editing XML config files. So before we describe the mechanics of any of the Factory classes, we need to take a quick look at Nutch's configuration system.
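To make the idea of a config-driven factory concrete, here is a minimal Java sketch. It is not Nutch's actual code: the `UrlFilter` interface, the two filter classes, the property key `sketch.urlfilter.impl`, and the use of `java.util.Properties` XML are all assumptions invented for illustration. A real plugin system would normally load implementation classes dynamically; this sketch uses a fixed registry so that it stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

final class FactorySketch {

    /** Hypothetical pluggable component; NOT Nutch's real interface. */
    interface UrlFilter {
        /** Returns the URL if it is accepted, or null if it is rejected. */
        String filter(String url);
    }

    static final class AcceptAllFilter implements UrlFilter {
        public String filter(String url) { return url; }
    }

    static final class HttpOnlyFilter implements UrlFilter {
        public String filter(String url) {
            return url.startsWith("http://") ? url : null;
        }
    }

    /** Property key made up for this sketch. */
    static final String KEY = "sketch.urlfilter.impl";

    /** Fixed registry standing in for dynamic plugin discovery. */
    private static final Map<String, Supplier<UrlFilter>> REGISTRY = Map.of(
            "accept-all", AcceptAllFilter::new,
            "http-only", HttpOnlyFilter::new);

    /** Reads the implementation name from XML config and builds that filter. */
    static UrlFilter fromXml(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String name = props.getProperty(KEY, "accept-all");
        Supplier<UrlFilter> supplier = REGISTRY.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown filter: " + name);
        }
        return supplier.get();
    }

    /** Builds a tiny XML config document that selects the given implementation. */
    static String config(String implName) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties><entry key=\"" + KEY + "\">" + implName
                + "</entry></properties>";
    }

    public static void main(String[] args) {
        // Changing only the config text changes which implementation is used.
        UrlFilter filter = fromXml(config("http-only"));
        System.out.println(filter.filter("http://example.com/")); // accepted
        System.out.println(filter.filter("ftp://example.com/"));  // rejected (null)
    }
}
```

The point of the design is that callers ask the factory for a component and never name a concrete class, so swapping behavior is a matter of editing configuration rather than code.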