30 Links for English Language Data Geeks

A typical corpus linguist.. Although I personally prefer blue braces.

  1. The Moby Lexicon Project
  2. BNC Baby
  3. Full BNC
  4. Project Gutenberg (Download full database)
  5. CMU Pronouncing Dictionary
  6. GNU Collaborative International Dictionary of English
  7. The Internet Dictionary Project
  8. English Wiktionary Dump
  9. Simple English Wiktionary Dump
  10. JACET 8000
  11. Minimal pairs in English RP
  12. List of homographs
  13. Homophones in English RP
  14. Google’s Official List of Bad Words
  15. Yasumasa Someya’s Lemmas List
  16. MRC Psycholinguistic Database
  17. Million Song Dataset
  18. Penn Treebank P.O.S. Tags
  19. Princeton University’s WordNet
  20. The Sentence Corpus of Remedial English
  21. Summer Institute of Linguistics (SIL) Word List
  22. The Tanaka Corpus
  23. The General Service List
  24. The New General Service List
  25. The Academic Word List
  26. The New Academic Word List
  27. The TOEIC Word List
  28. The Business Service List
  29. Apache Open Office MyThes
  30. Global WordNet
Posted in Coding, TEFL

Generating over 2000 flashcards from a DIY corpus of TOEFL material

Download the CSV of all 2,313 terms (inc. Japanese definitions) or access the full list on Quizlet.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Step 1: Assemble a corpus of TOEFL past papers

For my corpus, I used material from both the older CBT (Computer Based Test) and the current iBT (Internet Based Test). I found most of the materials online for free. Some were already in plain text format, but most were PDFs and required Optical Character Recognition (OCR) to convert to plain text. I used ABBYY’s FineReader Pro for Mac, but there are plenty of other options out there too. Some files were in Microsoft Word format (.doc/.docx), and Mac OS X’s batch conversion utility came in handy for these. I included model answers, listening transcripts, reading passages and multiple choice questions (prompts, distractors and answers). I tried to exclude explanations, advice and instructions from the authors and/or publishers.

Ultimately, I ended up with a corpus just shy of a million words (959,124 to be precise). In general, bigger is better when it comes to corpus research. The TOEIC Service List (TSL) is based on a corpus of about 1.5 million words, so my TOEFL corpus is of a roughly comparable size.

Step 2: Count the number of occurrences of each word

I used some custom PHP code to process my corpus data (although Python is probably better suited to corpus analysis). I lemmatized each token where possible using Yasumasa Someya’s list of lemmas. I then cross-referenced each lemma occurrence with the NGSL, NAWL and TSL. Finally, I exported the results to a CSV, and ended up with 13,287 rows of data.
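
The counting step can be sketched in a few lines of Python. This is only an illustration, not the PHP code used for the project: the `LEMMAS` table and the `NGSL` set below are tiny hypothetical stand-ins for the full lemma list and word lists, and the tokenizer is a deliberately simple assumption.

```python
import re
from collections import Counter

# Hypothetical miniature lemma table (inflected form -> lemma).
# A real run would load a complete lemma list from a file instead.
LEMMAS = {"studies": "study", "studied": "study", "went": "go", "goes": "go"}

# Hypothetical excerpt of a reference word list to cross-reference against.
NGSL = {"go", "study"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def count_lemmas(text, lemmas=LEMMAS):
    """Count lemma frequencies; tokens without a known lemma count as themselves."""
    return Counter(lemmas.get(token, token) for token in tokenize(text))


def to_rows(counts, word_list=NGSL):
    """Return (lemma, count, on_word_list) rows, most frequent first."""
    return [(lemma, n, lemma in word_list) for lemma, n in counts.most_common()]


counts = count_lemmas("She studies hard. He studied and went home; she goes too.")
rows = to_rows(counts)
```

Each row could then be written out with Python’s `csv` module to produce one CSV line per lemma.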

Step 3: Curate the final list

For my final list I removed any words which also appear on the NGSL, any contractions (e.g. “don’t”, “I’m”, “that’s”), any numbers written in word form (e.g. “two”, “million”), any vocalizations (e.g. “uh”, “oh”), any ordinals (e.g. “first”, “second”, “third”), any proper nouns (e.g. “James”, “Elizabeth”, “America”, “San Francisco”, “New York”), and any words with fewer than 5 occurrences in the corpus. Next, I ran the list through a spell checker, and excluded any unrecognized words. I also excluded any non-lexical words, to leave a list consisting only of nouns, verbs, adjectives and adverbs.
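
Most of these exclusion rules are simple enough to express as a single filter. The sketch below is a hedged illustration only: the rows, the `STOPLIST` and the part-of-speech labels are hypothetical, and the proper noun and spell checker steps are left out because they depend on external resources.

```python
# Hypothetical rows: (lemma, corpus count, part of speech, on the NGSL?)
rows = [
    ("photosynthesis", 12, "noun", False),
    ("go", 950, "verb", True),        # on the NGSL, so dropped
    ("don't", 40, "verb", False),     # contraction, so dropped
    ("uh", 33, "other", False),       # vocalization, so dropped
    ("sediment", 4, "noun", False),   # fewer than 5 occurrences, so dropped
    ("erode", 7, "verb", False),
]

# Hypothetical stoplist covering number words, vocalizations and ordinals.
STOPLIST = {"two", "million", "uh", "oh", "first", "second", "third"}
LEXICAL_POS = {"noun", "verb", "adjective", "adverb"}
MIN_COUNT = 5


def keep(lemma, count, pos, on_ngsl):
    """Apply the exclusion rules; True means the word stays on the list."""
    is_contraction = "'" in lemma or "’" in lemma
    return (
        not on_ngsl
        and not is_contraction
        and lemma not in STOPLIST
        and count >= MIN_COUNT
        and pos in LEXICAL_POS
    )


final_list = [row[0] for row in rows if keep(*row)]
```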

Step 4: Generate flashcards

I now had a list of 2,313 terms, made up of 523 adjectives, 123 adverbs, 1,366 nouns, and 301 verbs. I used Text to Flash to generate Japanese definitions for each word, then uploaded the words to Quizlet, separated by part of speech and ordered alphabetically.

Posted in Coding, TEFL

Multilingual, part-of-speech categorized, difficulty sorted Quizlet flashcards for NGSL, NAWL and TSL

I’ve generated multilingual, part-of-speech categorized, difficulty sorted sets of flashcards for the latest New General Service List (NGSL), New Academic Word List (NAWL) and TOEIC Service List (TSL), and added them to Quizlet.

The sets are organized in classes according to the definition language. Each class contains sets of flashcards for the four lexical parts of speech (adverbs, verbs, adjectives and nouns). There are a maximum of 20 flashcards in each set, and the sets are ordered by difficulty (i.e. frequency), with Part 1 of each list containing the easiest (most common) words.
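
As a rough sketch of that ordering, the words for one part of speech can be sorted by frequency and then cut into sets of at most 20. The function name and the sample data here are hypothetical.

```python
def make_sets(words_with_counts, size=20):
    """Sort by descending frequency (ties alphabetically), then chunk into sets."""
    ordered = sorted(words_with_counts, key=lambda wc: (-wc[1], wc[0]))
    words = [word for word, _ in ordered]
    return [words[i:i + size] for i in range(0, len(words), size)]


# Hypothetical data: 45 words with strictly decreasing frequencies.
sample = [(f"word{i:02d}", 100 - i) for i in range(45)]
sets = make_sets(sample)
```

With 45 words this gives two full sets and a final set of five, with the most frequent word at the start of Part 1.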

As no information was given about part-of-speech in the word lists themselves, I tagged the words using Moby (icon.shef.ac.uk/Moby/mpos.html), selecting only the first-listed part-of-speech for words which can be used as multiple parts-of-speech. The word “register”, for example, is listed as a noun by Moby before it is listed as a verb, so only the noun definition of “register” was included in the flashcards.
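
The “first listed part-of-speech wins” rule might look something like the sketch below. It assumes each entry pairs a word with a string of one-letter codes in listed order; the `CODE_TO_POS` table and the sample entries are assumptions made for illustration and should be checked against the Moby documentation before use.

```python
# Assumed mapping from one-letter codes to the four lexical classes.
CODE_TO_POS = {
    "N": "noun",
    "V": "verb",
    "t": "verb",       # assumed: transitive verb
    "i": "verb",       # assumed: intransitive verb
    "A": "adjective",
    "v": "adverb",
}


def primary_pos(codes):
    """Return the first listed code that maps to a lexical class, else None."""
    for code in codes:
        if code in CODE_TO_POS:
            return CODE_TO_POS[code]
    return None


# Hypothetical entries: word -> codes in the order they are listed.
entries = {"register": "NVt", "quickly": "v", "the": "D"}
tags = {word: primary_pos(codes) for word, codes in entries.items()}
```

Here “register” comes out as a noun because its noun code is listed first, and “the” gets no tag because it has no lexical code.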

Links to the classes are as follows:

New Academic Word List (NAWL)

NAWL English-Arabic
NAWL English-Chinese
NAWL English-German
NAWL English-Greek
NAWL English-English
NAWL English-French
NAWL English-Italian
NAWL English-Japanese
NAWL English-Korean
NAWL English-Dutch
NAWL English-Portuguese
NAWL English-Russian
NAWL English-Spanish
NAWL English-Swedish
NAWL English-Thai
NAWL English-Turkish

New General Service List (NGSL)

NGSL English-Arabic
NGSL English-Chinese
NGSL English-German
NGSL English-Greek
NGSL English-English
NGSL English-French
NGSL English-Italian
NGSL English-Japanese
NGSL English-Korean
NGSL English-Dutch
NGSL English-Portuguese
NGSL English-Russian
NGSL English-Spanish
NGSL English-Swedish
NGSL English-Thai
NGSL English-Turkish

TOEIC Service List (TSL)

TOEIC English-Arabic
TOEIC English-Chinese
TOEIC English-German
TOEIC English-Greek
TOEIC English-English
TOEIC English-French
TOEIC English-Italian
TOEIC English-Japanese
TOEIC English-Korean
TOEIC English-Dutch
TOEIC English-Portuguese
TOEIC English-Russian
TOEIC English-Spanish
TOEIC English-Swedish
TOEIC English-Thai
TOEIC English-Turkish
Posted in Coding, TEFL

10 years in Japan

Today I mark 10 years living and working in Japan. To commemorate the occasion, here is one of my first blog posts from October 2006:

Some things about Japan that I’ve noticed:

  • The plugs don’t have switches, so if you want to turn something off, you have to physically unplug it
  • Semi-automatic doors: they lack motion sensors and only open when you press the button
  • Pelican crossings have no buttons to press
  • When it rains, everyone uses an umbrella
  • There are little racks in which to put your wet umbrella when entering shops
  • The Japanese are incredibly polite: one night some of us got lost, and when we asked for directions, we were escorted by a stranger for a good half-mile to the train station, which was the opposite direction to which he had been walking
  • The local gaijin pub, Mattari, serves fish and chips
  • The Japanese like queuing even more than the British. You might even expect to find them queuing on the platform for trains
  • There are lots of bikes
  • Pachinko parlors: buy yourself a tub full of ball bearings and pour them into an inverted pinball machine. Adopt an expression of post-lobotomy desolation. These places are completely insane.

For a more comprehensive run down of the past decade, check out my post on TEFL Journey.

Posted in TEFL

20 Tech Tips from Vocab@Tokyo 2016

  1. Tom Cobb’s venerable Lex Tutor now has a mobile interface
  2. Collins and Merriam-Webster both provide free online dictionaries
  3. The University of Texas at Austin provides a wide selection of free handouts (PDF) for teachers of English language writing
  4. Calibre is a comprehensive e-book manager and converter
  5. OmniPage and ABBYY FineReader are powerful OCR (Optical Character Recognition) applications
  6. The Lexical Research Foundation is “a not-for-profit organisation to promote excellence in lexical and vocabulary acquisition, description and pedagogy.”
  7. AntWordProfiler, Web VocabProfile, Range, and P_Lex (PDF) are tools for profiling lexical sophistication of a text, i.e. the proportion of advanced (rare) vocabulary…
  8. …while TextInspector can be used to measure lexical variation, i.e. the proportion of word types to tokens
  9. Michael Covington has developed a number of algorithms and tools for analyzing texts, including Moving Average Type-Token Ratio (MATTR)
  10. Paul Nation’s book, What You Need to Know to Learn a Foreign Language, is available as a free PDF download…
  11. …as are all his Vocabulary Size Tests (VST)…
  12. …which can also be taken online via Tom Cobb’s site
  13. Laurence Anthony’s WebSCoRE is “a free, parallel concordancer with a specially developed bilingual pedagogical corpus”
  14. Paul Meara’s Lognostics website “is designed to provide access to up to date research tools for people working in the field of Second Language Vocabulary Acquisition”
  15. Vocabulary Learning and Instruction (VLI) is an open access international journal for research relating to vocabulary acquisition, instruction, and assessment.
  16. Showbie is a great tool for keeping digital portfolios of students’ work
  17. Coh-Metrix is a system for computing computational cohesion and coherence metrics for written and spoken texts
  18. Lexile Analyzer can be used to compute the complexity of a text, including sentence length and word frequency
  19. Cambridge University Press’s English Vocabulary Profile (EVP) “offers reliable information about which words and phrases are known and used by learners at each level of the Common European Framework (CEF)”
  20. The CEFR-J website provides a series of “can-do” descriptors specifically for English language teaching contexts in Japan.
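
The type-token ratio from tip 8 and the moving-average variant from tip 9 can be sketched as follows. This is a minimal illustration rather than the code behind any of the tools above; the window size and the sample sentence are arbitrary choices.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct words divided by number of words."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: the mean TTR over every window of `window` tokens."""
    if len(tokens) <= window:
        return ttr(tokens)  # text shorter than one window: plain TTR
    ratios = [
        ttr(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


tokens = "the cat sat on the mat and the dog sat on the rug".split()
plain = ttr(tokens)                # 8 types / 13 tokens
smoothed = mattr(tokens, window=4)
```

Because every window has the same length, the moving average is much less sensitive to text length than the plain ratio.
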
Posted in TEFL

30 Tech Tips from JALT CALL 2016


  1. James Rogers gives pronunciation advice for Japanese learners of English
  2. Linode is a powerful and good value web host
  3. The Multiplayer Classroom (Lee Sheldon) was one of the first publications arguing for gamification of education
  4. Class Craft helps you to make learning an adventure
  5. Socrative allows you to administer assessments and surveys via mobile phones
  6. Kahoot provides gamified classroom activities
  7. QuizUp offers a competitive multi-player gaming experience
  8. Sendtodropbox is a great way of getting files from your students into your Dropbox account…
  9. …while QuickVoice (iOS) allows you to record and send audio files as email attachments up to 5MB in size…
  10. …and MailVU are specialists in sending video via email
  11. Moxtra is a mobile-first embeddable collaboration platform…
  12. …and VoiceThread allows students to submit audio as attachments to images
  13. Schoology is a modern Learning Management System
  14. Ginger offers a variety of apps for online translation and grammar checking…
  15. …while Grammarly claims to make you a better writer by finding and correcting 10 times more mistakes than your word processor
  16. WikiTude is the world’s leading augmented reality SDK
  17. Diigo allows you to annotate and save web pages as you browse them
  18. Tiki Toki is web based software for creating beautiful timelines
  19. iBuildApp allows you to easily make apps for iOS or Android
  20. Mobyx (iOS) provides high quality VOIP (Voice over IP) services
  21. KanjiTomo is a comprehensive OCR (Optical Character Recognition) application for Japanese characters…
  22. …while Yomiwa (iOS) provides a real-time offline camera translator for Japanese…
  23. …and Perfect Master Kanji (iOS) is a fully fledged kanji practice app for people learning Japanese as a foreign language…
  24. …and Nihongo Shark provides free daily lessons for learners of Japanese
  25. Discord provides all-in-one text and voice chat for gamers
  26. Continuous Partial Attention (Linda Stone) is “motivated by a desire to be a LIVE node on the network”
  27. Wiggle allows you to easily import sporting goods and accessories (a weird one, but a good one for those of us who struggle to find bicycles big enough in Japan!)
  28. Phonologics offers automated pronunciation testing
  29. Words and Monsters taps into the addictive game play of apps like Puzzle and Dragons and Candy Crush by offering uncertain and unexpected rewards
  30. Paul Howard Jones is the preeminent expert on the effect of games on the brain
Posted in TEFL

JALT CALL 2016 “Unconference”: App Exchange

We all like to leave presentations/conferences with practical tools that we can put straight to use in the classroom. Almost every teacher has a selection of bookmarks or apps they come back to time and time again. The most popular of these apps and sites are well-known, but some little gems can take a long time to spread via word-of-mouth.

In this 30-minute slot, attendees were asked to write the names of their favorite ELT apps and sites on the whiteboard. We amassed an impressive 80 apps and sites in total. Here is the list, in alphabetical order:

  1. a4esl
  2. Anki
  3. Apps 4 EFL
  4. Breaking News English Lessons
  5. Busuu
  6. Cambridge English Online
  7. CamScanner
  8. Can You Escape
  9. Doki-Doki Universe
  10. Dotsub
  11. Duolingo
  12. English Clip
  13. English Listening Lesson Library Online
  14. EnglishCentral
  15. EnglishClass101.com
  16. engVid
  17. ESLvideo.com
  18. Espresso English
  19. EuroNews
  20. Extensive Reading Central
  21. Flubaroo
  22. Formfuse
  23. Freerice.com
  24. Google Sheets
  25. Grammarly
  26. Health Matters
  27. Imiwa
  28. Inogolo
  29. iTalki
  30. Kahoot!
  31. Keybr
  32. LanguageCaster
  33. Listen and Write
  34. Many Things
  35. Memrise
  36. MeWe
  37. Movie Clips
  38. New Internationalist Easier English Wiki
  39. Newsela
  40. Nobelprize.org
  41. Odd News
  42. One Night Ultimate Werewolf
  43. Padlet
  44. Peanut Gallery
  45. Pearson VUE
  46. PhoTransEdit
  47. Phrasebot
  48. PhraseMix
  49. Pixabay
  50. POPjisyo
  51. Quizlet (Live)
  52. QuizUp
  53. Readlang
  54. Reuters
  55. RhinoSpike
  56. Samorost
  57. Ship or Sheep
  58. Simple English Wiktionary
  59. Simple English Videos
  60. Socrative
  61. Spaceteam ESL
  62. Spreeder
  63. Story Dice
  64. Storybird
  65. TED (Ed)
  66. Telltale Games
  67. The Mixxer
  68. Tiki-Toki
  69. TitanPad
  70. Trace Effects
  71. TypeIt
  72. VOA
  73. Vocabulary.com
  74. WeVideo
  75. Word Engine
  76. Words & Monsters
  77. Wordfast
  78. WordFlex
  79. Xreading
  80. ZType

Do you use any of these apps or sites in your teaching contexts? Do you have any other recommendations? Please write your comments in the box below!

Posted in TEFL

The rocky road to LMS web app integration

Part 1

I’m an amateur coder. For the past couple of years, I’ve been developing a site called Apps 4 EFL: half LMS, half Web-Based Language Learning platform. It all started when I wanted to automatically generate language learning activities directly from Wikipedia articles. I’d had a bit of experience coding as a teenager, but then went on to pursue other interests (chiefly becoming an EFL teacher). It was a steep learning curve to build on the little coding knowledge I had, but I think I did an OK job in the end (the site works, although I doubt it’s the most efficient or clean code by a long way). The main reason I was able to achieve my aims was the vast array of tutorials and example code available on the web these days, including:

…to name a few. I highly recommend these resources to anybody thinking of learning coding from scratch, or developing the skills they already have.

Part 1 TLDR: I developed some useful(?) web apps for teachers and learners of EFL.

Part 2

Originally, Apps 4 EFL had no way to track learners’ progress. It simply generated pedagogical activities learners could complete online. So I decided to implement some kind of tracking system, and this is where things got complicated.

Signing up for websites is one of the worst things about the internet. Period. They all seem to require a different set of information about you, none of which you really feel they need to know. Your birthday. Your email. Sometimes your telephone number and address. And the passwords. SO. MANY. PASSWORDS.

Now multiply that issue by 25, for the number of students you have in your class.

Now multiply again by 10 for the number of classes you teach a week.

Suddenly you have 250 accounts to register, and 250 students who risk forgetting the passwords they have created, not to mention the URLs of the site(s) themselves.

Part 2 TLDR: Teachers needed a way to track their users’ engagement with the apps, but didn’t want to register all their students for yet another website.

Part 3

This is where LMSs (Learning Management Systems) come into play. They take care of user registration so you don’t have to. LMSs such as Moodle even offer an array of different question types, some of which are amenable to EFL pedagogy. However, they are designed to accommodate a wide range of teaching and learning contexts, and therefore lack the specific tools we might want for our own unique disciplines.

But don’t think the LMS creators hadn’t thought of this issue – they had, and it wasn’t long before there were proposals for a variety of ways to link LMSs to external apps and share data between the LMS and the app.

Part 3 TLDR: LMSs can be used to manage user registration whilst facilitating access to subject specific tools.

Part 4

There are several solutions now available to (amateur) app developers to get their tools working in conjunction with existing LMSs:

  1. Passing parameters through the URL
  2. Creating an LMS plugin
  3. Creating an LTI compliant app

Passing parameters through the URL

The first of these methods is the easiest, but the most limited, as data can only go one way (from the LMS to the app). It can be achieved through Moodle, for example, by adding the “URL” resource to a course. Once added, a section called “URL Parameters” is available, which can be used to pass information about the LMS, the course, and the specific user through URL parameters to the target tool. The tool can then read these values from the query string, for example via PHP’s $_GET array.
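
On the receiving side, reading those parameters is a one-liner in most languages. The sketch below uses Python’s standard library rather than PHP; the URL and the parameter names (`userid`, `courseid`, `fullname`) are hypothetical and depend on what is configured in the LMS.

```python
from urllib.parse import urlparse, parse_qs


def read_launch_params(url):
    """Extract query-string parameters, keeping the first value for each name."""
    query = parse_qs(urlparse(url).query)
    return {name: values[0] for name, values in query.items()}


# Hypothetical launch URL as it might arrive from the LMS.
url = "https://example.com/app?userid=42&courseid=7&fullname=Hanako+Yamada"
params = read_launch_params(url)
```

Bear in mind that anyone can edit a URL by hand, so values received this way are unverified and should not be trusted for anything like grading.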

Creating an LMS plugin

The second method allows for much greater integration with the LMS, and allows data to flow both to and from the app, so scores obtained from the app can be saved directly to the LMS. However, the drawbacks are numerous. Only users of that particular LMS will be able to utilize your app, and you’ll have to develop separate versions for other LMSs (and yes – there are quite a few of them) as required. The second problem is that it’s much more difficult to develop LMS plugins than it is to develop standalone web-apps, not least because you have to understand the way the LMS itself is designed and written before you can even start developing your app. Tutorials, where available, may be out-of-date or incomplete.

Creating an LTI compliant app

Learning Tools Interoperability (or “LTI” for less of a mouthful) is a specification developed (and trademarked?!) by the IMS Global Learning Consortium. It basically provides a way for LMSs to interact with external apps, and most importantly for data to flow both ways, so user information can be provided to the app, and progress data can flow back to the LMS. It is by far the most promising of the three methods discussed here, and also the most complicated and difficult to implement, especially for amateur coders working alone.

Just take a look at the implementation guide. Go on, I dare you. It’s only 12,000 words long.

OK, well maybe we don’t need to read the whole thing to get it working. There must be some example code? Yes, there is. But it’s out of date (the PHP code pertains to version 1 of the specification, not the latest version 2), and the implementation instructions are enough to make you go cross-eyed. It is by no means facile to get an LTI compliant app working. Further complications are caused by the ever-so-slightly yet infuriatingly different specifications adopted by different LMSs (Canvas vs Moodle, for example). And this is coming from someone who has navigated his way through Apple’s needlessly complicated app provisioning process.

Part 4 TLDR: The current ways to integrate external apps with LMSs are either too limited or overly complex to set up

Part 5 – The final word

For educators dabbling in code (and this is something that’s going to increase as programming enters the curriculum) there needs to be a simple yet powerful way to integrate web apps with LMSs. Not every app should have to provide a complicated user management system in order to track progress – not when better solutions already exist, and when doing so only complicates teachers’ lives instead of making them easier and more productive. Something between URL parameters and LTI compliance would be fantastic, and hopefully a solution will present itself in the near future.

Posted in Coding, TEFL

10 Tech Tips from Moodle Moot Japan 2016

  1. Learn anything online with MOOCs (Massive Open Online Courses) from EdX.
  2. Take a look at Manaba, the LMS for educators in Japan.
  3. You can create a Question Bank with Moodle that can be used to easily generate quizzes.
  4. Moodle’s Aiken format can be used to easily create multiple choice quizzes…
  5. …while the GIFT format provides a wider range of question types.
  6. Check out PoodLL – a set of plug-ins for Moodle for recording audio and video
  7. You can set up a virtual international exchange with Soliya.
  8. The Open University has a number of freely available plugins for Moodle…
  9. …and you can easily install additional Moodle question types.
  10. Motivate and engage students through Moodle using the real-time response tool Zapette.
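
To give a feel for tip 4, an Aiken question is plain text: the question on one line, one lettered option per line, then an `ANSWER:` line. The helper below is a hypothetical sketch for generating that text, not part of Moodle, and the sample question is made up.

```python
import string


def to_aiken(question, options, correct_index):
    """Format one multiple choice question as Aiken-style plain text."""
    if not 0 <= correct_index < len(options):
        raise ValueError("correct_index must point at one of the options")
    letters = string.ascii_uppercase
    lines = [question]
    lines += [f"{letters[i]}. {option}" for i, option in enumerate(options)]
    lines.append(f"ANSWER: {letters[correct_index]}")
    return "\n".join(lines)


# Hypothetical sample question.
aiken_text = to_aiken(
    "Which word is a noun?",
    ["quickly", "register", "erode"],
    correct_index=1,
)
```

Several questions can be joined with blank lines between them and saved as a text file for import into a Question Bank.
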
Posted in TEFL

20 tech tips from ETJ Tokyo 2016

  1. Wunderlist is a great way to organize your tasks, and sync across a variety of devices
  2. Slack is a messaging app for teams
  3. Storybots makes your child the star of their own story
  4. Mindmeister is a free browser-based mind-mapping tool
  5. Zuknow is a language flashcard app for Android and iOS
  6. Picmonkey is a browser-based tool for creating “mind blowing” photos
  7. You can create animated videos and presentations with Powtoon
  8. …or try the simple do-it-yourself tools at GoAnimate
  9. Livestream allows you to watch and broadcast live events online
  10. With Sitepal you can create talking avatars for your website or blog
  11. Learn to code with Codecademy
  12. …and optimize your JavaScript with Google Closure
  13. Take an online course with Udemy
  14. …or learn for free at Khan Academy
  15. You can create and sell your own online courses with the LearnDash WordPress LMS plugin…
  16. …or create and deploy online courses with Udutu
  17. …or easily create your own stunning website with Wix
  18. Brush up on your artistic skills with Paint.NET
  19. Check out the fun and educational kids videos at Dream English
  20. Explore the resources for Japanese learners of English at ALC
Posted in TEFL