Build your personal search engine with Crawlzilla

Speaker(s): Jazz Wang

  • Language: English
  • Event type: Talk
  • Date: Monday 8 July 2013
  • Time: 16:20
  • Duration: 20 minutes
  • Location: H 2214

Theme: Internet & Cloud
Tracks: Everyday life, Cloud
Target audience: General public, Professionals


Do you want a personal search engine for education or research purposes? In this talk, we would like to share a tool called "Crawlzilla". Crawlzilla helps users build a search engine for specific websites, especially intranet websites that cannot be indexed by Google or Yahoo. Crawlzilla is mainly based on open source projects such as Nutch, Hadoop and Tomcat. Its key features include: (1) installation scripts for cluster deployment; (2) a text UI for cluster system management; (3) a web UI for managing crawler URLs and index pools; (4) Chinese lexical support; (5) support for multiple users and multiple index pools.


Jazz Yao-Tsung Wang is a co-developer on the DRBL/Clonezilla team at the Free Software Lab, NCHC, NARL, Taiwan.

The Free Software Lab at NCHC mainly develops open source software for education, including DRBL, Clonezilla, Partclone, Tux2Live, etc. The DRBL/Clonezilla team was one of the winners of the Trophées du Libre competition in the category "Public sector applications".

National Center for High-performance Computing (NCHC) is the only supercomputing center in Taiwan. It’s a non-profit organization founded by Taiwan’s National Science Council (NSC). Our mission is to assist researchers on a national level.

His speaking experience includes:

- [1] "Building a Cloud Computing Analysis System for Intrusion Detection System", Cloud Slam’09, 2009-04-22
- [2] "BoF: Clonezilla Hands-On Lab", LinuxWorld Conference & Expo 2008, 2008-08-06
- [3] "Massive Deployment of Kerrighed Virtual SMP Cluster using DRBL", Open Source Grid and Cluster Conference 2008, 2008-05-16
- [4] "ClassCloud: switch your PC Classroom into Cloud Testbed", RMLL 2010, 2010-07-07


Slides (PDF - 2.9 MB)