GATE, the General Architecture for Text Engineering, is one of the most widely used software platforms for language engineering, providing ways to extract structured information from unstructured textual data sources. It is developed by the GATE Team in the NLP group at the University of Sheffield. Our lead trainer has been a core developer of GATE for the past 10 years and is a certified GATE Guru who has been delivering training at the official GATE Training courses. We also have strong ties with the core GATE team, and training can be organised in collaboration with them.

We train participants on the most recent release of GATE using the latest training material published by the GATE Team on their website. It is a four-day training event that includes lectures, hands-on exercises and question-and-answer sessions. Depending on your requirements and experience with GATE, we can suggest and offer training for one of the three tracks offered by the GATE Team. Participants choosing one of these three tracks are also encouraged to take the online certification exams offered and managed by the GATE team in Sheffield.

We also organise on-demand developer sprints to help you solve YOUR problems using GATE. If you wish to bring your own tasks and are looking for our help, developer sprints offer the best opportunity to discuss and develop your solutions using GATE.

Need more information?

Three reasons why you should learn to develop with GATE

1. Wide Acceptance

GATE already benefits from more than 15 years of extensive development and contributions from researchers and developers all over the globe. It has been widely accepted not only by scientists, researchers and students but also by several commercial organisations (such as IBM, HP and the BBC).

2. Increased Productivity

GATE supports a variety of document formats, including text, HTML, XML, DOC and PDF, which allows users to concentrate on the actual task without worrying too much about handling different formats. It ships with a variety of language processing resources for different languages, such as tokenisers, sentence splitters, part-of-speech taggers, machine learning components and ontology support. Reusing these resources helps ensure that the techniques used for these tasks are widely accepted and that the results are consistent.

3. Easy Integration, Adaptation and Portability

Portable exports of GATE applications can be easily integrated into third-party components. Some of GATE's existing resources can be adapted to different requirements or languages, while its flexible structure makes it easy to write wrappers for third-party libraries and integrate them into the system.

Training Agenda

Track 1: GATE & Text Mining Introduction

  • Module 1: Introduction to GATE Developer: GATE's application development environment
  • Module 2: Information Extraction and ANNIE: Our open-source IE system
  • Module 3: Introduction to JAPE: GATE's powerful rule-writing engine
  • Module 4: GATE Teamware: Web-based Collaborative Annotation Environment

Track 2: Programming in GATE

  • Module 5: GATE Embedded API: Using GATE as a library within bigger applications
  • Module 6: Main GATE APIs: Programming in GATE
  • Module 7: Creating new Resource Types: Writing new GATE text mining components
  • Module 8: Advanced GATE Embedded: Using GATE within web apps and scripting

Track 3: Advanced GATE

  • Module 9: Semantic Annotation with GATE
  • Module 10: Advanced GATE Applications: complex applications, writing multilingual IE, business intelligence, evaluation
  • Module 11: Machine Learning with GATE
  • Module 12: Sentiment Analysis


There are no prerequisites for Track 1, though it is a good idea to run through some GATE Developer tutorials before attending, to make the most of the experience. For Track 2, you should be able to program in Java. Although Track 1 is not a prerequisite for Track 2, it can be a useful progression. Familiarity with the topics covered in Track 1 is a prerequisite for Track 3.