The Hans Rausing Endangered Languages Project

ONLINE RESOURCES FOR ENDANGERED LANGUAGES

Technology and techniques


Cartography 

Language Map Server
Östen Dahl and Ljuba Veselinova

Discusses issues relating to mapping languages using geographic information systems (GIS).

LL-MAP
LINGUIST List

LL-MAP is a project designed to integrate language information with data from the physical and social sciences by means of a Geographical Information System (GIS).

System of Exhibition and Analysis of Linguistic Data (SEAL)
Chitsuko and Yusuke Fukushima

The SEAL system, first developed and published by Chitsuko and Yusuke Fukushima in 1983, runs on a personal computer and can be used to process and analyze geolinguistic data and to produce linguistic maps.

World Language Mapping System
Global Mapping International

GIS language map data for the languages listed in Ethnologue, with an emphasis on Christian missionary applications.

Character encoding 

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Joel Spolsky

A simple introduction to Unicode for programmers.

Character Sets
i18nGurus.com

A collection of character set related links.

Dan's Web Tips: Characters and Fonts
Dan Tobias

A beginner's introduction to character set issues.

Getting Started: Unicode
Penn State University

This page gives an overview of encoding foreign-language text electronically and of the kinds of utilities and fonts needed to support individual languages.

The International Phonetic Alphabet in Unicode
John Wells, University College London

A simple guide to displaying and using Unicode IPA characters.

Script Encoding Initiative
Deborah Anderson (University of California, Berkeley)

The Script Encoding Initiative (SEI), established in the UC Berkeley Department of Linguistics in April 2002, is a project devoted to the preparation of formal proposals for the encoding of scripts and script elements not yet supported in Unicode (ISO/IEC 10646).

The secret life of Unicode
Suzanne Topping (BizWonk, Inc.)

Discusses some of the weak points of the Unicode standard.

A Simple Character Entity Chart
Adrian Roselli

Character entities for HTML.

TECkit
Jonathan Kew (SIL)

A text encoding conversion toolkit allowing the creation of custom mappings between arbitrary encodings which can be used to automate file conversions.
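
The idea of a custom mapping between encodings can be illustrated with a minimal Python sketch. This is not TECkit's mapping language or API; the table below describes an invented legacy 8-bit phonetic font, and the names LEGACY_TO_UNICODE and convert_legacy are hypothetical.

```python
# Hypothetical table for an invented legacy 8-bit phonetic font.
# The byte values are made up for illustration only.
LEGACY_TO_UNICODE = {
    0x80: "\u0259",  # LATIN SMALL LETTER SCHWA
    0x81: "\u014b",  # LATIN SMALL LETTER ENG
    0x82: "\u0283",  # LATIN SMALL LETTER ESH
}


def convert_legacy(data):
    """Convert legacy-encoded bytes to a Unicode string.

    Bytes below 0x80 are passed through as ASCII; higher bytes are looked
    up in the table, and an unmapped byte raises ValueError so that gaps
    in the mapping are noticed rather than silently dropped.
    """
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))
        elif byte in LEGACY_TO_UNICODE:
            chars.append(LEGACY_TO_UNICODE[byte])
        else:
            raise ValueError("no mapping for byte 0x%02X" % byte)
    return "".join(chars)


if __name__ == "__main__":
    print(convert_legacy(b"s\x80\x81"))
```

A real conversion would also need to handle multi-byte sequences and reordering, which is the kind of work a dedicated toolkit is meant for.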

Unicode Home Page
Unicode Consortium

Full information on Unicode, together with technical reports and proposals.

XML and Unicode
Robin Cover

An extensive collection of Unicode-related links.

Digital archival 

ANA Native Language Preservation: A Reference Guide for Establishing Archives and Repositories
Administration for Native Americans

Native Language Preservation: A Reference Guide for Establishing Archives and Repositories is a book (for sale) that explains why language repositories are vital to long-term language preservation efforts and offers advice on what to preserve and how to think about cataloging. It includes interviews with curators of large collections, descriptions of construction techniques that will assist in the preservation of irreplaceable treasures, policies for repositories, instructions on how to find materials that have already been saved in government and other collections, and information on how to develop a disaster plan.

Archivists' Toolkit: Appraisal and Accessioning
Archivists' Association of British Columbia

Some links for appraisal and accessioning of archival material.

Ask an Expert
LINGUIST List

Ask-An-Expert is a service provided by The LINGUIST List, as part of the E-MELD School of Best Practice. It is staffed by a panel of E-MELD advisors with technical expertise, who have volunteered to give their time to help fellow linguists follow recommended practices in digitizing language documentation.

Audio Preservation
Hannah Frost, American Library Association

A bibliography and collection of links on audio preservation, particularly for librarians.

Barren Lands Digital Collection
J. B. Tyrrell, University of Toronto

A digital collection consisting chiefly of material describing the two Barren Lands expeditions of 1893 and 1894 for the Geological Survey of Canada. It includes over 5,000 images from original field notebooks, correspondence, photographs, maps and published reports.

Basic Oral Language Documentation
Bird, Steven

Training university students and literacy teachers to collect and curate oral texts from indigenous languages, with a focus on Papua New Guinea.

Cedars Guide to Digital Preservation Strategies
Cedars Project (University of Leeds, Oxford, Cambridge)

A guide to technical approaches to digital preservation and archiving, aimed principally at librarians.

Cedars Project
Cedars Project (University of Leeds, Oxford, Cambridge)

Cedars began in April 1998 and ended in March 2002. Its broad objective was to explore digital preservation issues. These range through acquiring digital objects, their long-term retention, sufficient description, and eventual access.

Chiricahua and Mescalero Apache Texts
Harry Hoijer, University of Virginia

An electronic version of a volume of Apache texts, with some functionalities added.

Developing Linguistic Corpora: A Guide to Good Practice
ed. Martin Wynne, AHDS

This Guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will also find the guidelines here useful.

Digital Curation Centre
Digital Curation Centre

The Digital Curation Centre has been established to help solve the extensive challenges of digital preservation and to provide research, advice and support services to UK institutions.

Digital Endangered Languages and Musics Archives Network (DELAMAN)
DELAMAN

The Digital Endangered Languages and Musics Archives Network was established in 2003 as an international umbrella body for archives and other initiatives with the goal of documenting and archiving endangered languages and cultures worldwide. Their aim is to stimulate interaction about practical matters that result from the experiences of fieldworkers and archivists, and to act as an information clearinghouse. DELAMAN is intended as an open organisation where any initiative actively contributing to documentation and archiving of endangered languages and musics can participate.

Digital Libraries
William Arms

This online edition of Digital Libraries is an updated version of the book of the same name published by the M.I.T. Press in January 2000.

Digital Recordkeeping: Guidelines for Creating, Managing and Preserving Digital Records
National Archives of Australia

Guidelines for digital recordkeeping.

EMELD
Wayne State University, Eastern Michigan University, University of Arizona, Linguistic Data Consortium, Endangered Language Fund

Electronic Metastructure for Endangered Language Data, a project with the objective of aiding in the preservation of endangered languages data and documentation and in the development of the infrastructure necessary for effective collaboration among electronic archives.

Ethnomusicological Video for Instruction and Analysis
Indiana University, University of Michigan

The EVIA Digital Archive project is a joint effort of Indiana University and the University of Michigan to establish a digital archive of ethnomusicological video for use by scholars and instructors. It aims to preserve video recordings and make them easily accessible for teaching and research, providing an alternative to physical archives by creating a functioning digital repository and delivery system containing approximately 150 hours of digital video and accompanying metadata. Part of this metadata will include annotations and analysis of video content by the scholars who made the recordings.

Ethnomusicology Archive Report
UCLA Ethnomusicology Archive

The EAR is an informal discussion of ethnomusicology archiving at UCLA and in the world, issued four times a year.

General Guide to Audiovisual Preservation
PrestoSpace

If you have audiovisual media, it needs maintenance – or you will lose it. This guide shows how to maintain it.

International Council on Archives
International Council on Archives

The mission of ICA is to promote the preservation and use of archives around the world.

ISO 639-2
Library of Congress

The Library of Congress has been designated the ISO 639-2/RA for the purpose of processing requests for alpha-3 language codes comprising the International Standard, Codes for the representation of names of languages-- Part 2: alpha-3 code. Note that this standard will be superseded by ISO 639-3.

ISO/FDIS 639-3
SIL International

This is the home page for Part 3 of the ISO 639 family of standards, Codes for the representation of names of languages. ISO 639-3 (which is currently a Final Draft International Standard) attempts to provide as complete an enumeration of languages as possible, including living, extinct, ancient, and constructed languages, whether major or minor, written or unwritten. Largely based on the Ethnologue codes.

Language Archives Newsletter
ed. David Nathan, Romuald Skiba, Marcus Uneson

The Language Archives Newsletter provides news and informative articles about topics in endangered languages, especially archiving, fieldwork, language documentation, data and media management, computer tools, and developments in relevant technologies. LAN warmly welcomes submissions of news, reviews, and articles from anyone working in these areas.

Language engineering for the Semantic Web: a digital library for endangered languages
Shiyong Lu, et al., Wayne State University

This paper describes the effort undertaken at Wayne State University to preserve endangered languages using state-of-the-art information technologies. The authors discuss the issues involved in such an effort and present the architecture of a distributed digital library, which will contain endangered-language data in the form of text, image, video and audio files and will include advanced tools for intelligent cataloguing, indexing, searching and browsing of information on languages and language analysis. Various Semantic Web technologies such as XML, OLAC, and ontologies are used so that the digital library is developed as a useful linguistic resource on the Semantic Web.

Lesser Known Languages of India
CIIL, Mysore, and Uppsala University

The aim of this project is to collect, organize and disseminate information on some lesser-known Indian languages, many of which are threatened with extinction. The project will include linguistic documentation (i.e. texts and speech files) as well as documentation anchoring this linguistic material to social and cultural aspects of these communities. Not much has been put online here so far.

Linguistic Data Consortium
University of Pennsylvania

The Linguistic Data Consortium supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.

LOCKSS
Stanford University Libraries

LOCKSS (for "Lots of Copies Keep Stuff Safe") is open source software that provides librarians with a way to collect, store, preserve, and provide access to their own, local copy of authorized content they purchase, creating low-cost, persistent, accessible copies of e-journal content as it is published.

The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials
Humanities Advanced Technology and Information Institute (HATII), University of Glasgow, and National Initiative for a Networked Cultural Heritage (NINCH)

A guide to good practice in digital archival.

Online Computer Library Center
Online Computer Library Center

Resources and news for librarians.

Open Language Archives Community
Open Language Archives Community

OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.

PARADISEC
PARADISEC

PARADISEC (Pacific And Regional Archive for Digital Sources in Endangered Cultures) offers a facility for digital conservation and access for endangered materials from the Pacific region, defined broadly to include Oceania and East and Southeast Asia. Its links page gathers together a number of useful resources relevant to digital archiving.

Preservation: Audiovisual Carriers
Oxford University Library Services

A guide to the preservation of various audiovisual media.

RLG DigiNews
Cornell University Library

RLG DigiNews is a bimonthly electronic newsletter that focuses on digitization and digital preservation.

Securing Interpretability: The Case of Ega Language Documentation
Gibbon, Dafydd and Bow, Catherine and Bird, Steven and Hughes, Baden

"The prime consideration in designing sustainable language resources is to ensure that they remain interpretable for coming generations of users. In this paper we adopt a new perspective on resource creation - securing the interpretability of data, using a case study of Ega, an endangered African language for which a small amount of legacy data is available. Basic steps to securing interpretability are to transfer files to durable media, and where possible, to convert all legacy data into XML files with Unicode character encodings. In the absence of agreed "best practice" standards, we propose a methodology of better practice to assist in the transition process towards this goal. We discuss a number of issues involved in securing interpretability of the lexicon, character encodings, interlinear glossed text, annotated recordings and nomenclature in linguistic descriptions, and describe our solutions."

Sustainability of Digital Formats: Planning for Library of Congress Collections
Caroline R. Arms and Carl Fleischhauer

The Digital Formats Web site provides information about digital content formats. The analyses and resources presented here will increase and be updated over time. They fall under four headings: Introduction, Sustainability Factors, Content Categories, Format Descriptions.

TalkBank
Brian MacWhinney (Carnegie Mellon University)

The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the subfields studying communication. It will use these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary materials via networked computers.

Task Force to establish selection criteria of analogue and digital audio contents for transfer to data formats for preservation purposes
International Association of Sound and Audiovisual Archives

Examines issues and strategies regarding priorities in digital transfer. (PDF)

UCLA Phonetics Lab Archive
UCLA, Peter Ladefoged

For over half a century, the UCLA Phonetics Laboratory has collected recordings of hundreds of languages from around the world, providing source materials for phonetic and phonological research. Many of these are available on the site, from Bahasa Aceh to Yoruba.

UNESCO Archives Portal
UNESCO

The UNESCO Archives Portal gives access to websites of archival institutions around the world. It is also a gateway to resources related to records and archives management and to international co-operation in this area.

Using Text Encoding to Represent Linguistic Data
Gary Simons

A glossary containing key terms related to text encoding. Basic definitions are supplemented with pointers to further information resources.

Validation Manual for Written Language Resources
Oxford University

A manual on the evaluation of the markup of written language resources.

Fieldwork 

Anthropology/Linguistic Field Checklist
James A. Fox, Stanford University

Some suggestions on what to bring for fieldwork.

Batteries in Fact and Fiction
Hawaii Ham Radio Information Pages

More than you need to know about batteries.

Eva's Solar Page
Eva Lindholm

How to make a portable solar power recharger for fieldwork.

Linguistic Discovery (Dartmouth College)
Dartmouth College; ed. Lenore A. Grenoble, Lindsay J. Whaley

An online journal dedicated to the description and analysis of primary linguistic data.

Linguistics Fieldwork Preparation: A Guide for Field Linguists
University of Toronto

Linguistic Fieldwork Preparation: a guide for field linguists is meant to be a comprehensive web-resource for the benefit of the linguistic community at large, from those who teach courses in field methods, endangered languages, and language revitalization, to those who do or wish to conduct field research. It includes an extensive bibliography of pertinent readings, access to an array of technological tools, leads on funding bodies as well as course syllabi for field methods and language endangerment courses.

Some background information for travellers, field workers and visitors to New Ireland (Papua New Guinea)
Eva Lindstrom

Travel tips for fieldworkers in New Ireland (Papua New Guinea).

Fonts and keyboards 

Bisharat: A12N Gateway
Osborn, Don

African language encoding, fonts, keyboards: discussion fora and reference pages.

Alan Wood's Unicode Resources
Alan Wood

A variety of resources, including fonts and software, for Unicode.

Diacritics Project
Filip Blažek

Tips for typographers on how to design diacritics, with comments on usage.

Gallery of Unicode Fonts
David McCreedy and Mimi Weiss

This Gallery displays samples of available Unicode fonts by writing system (roughly Unicode ranges).

IPA: Fonts
International Phonetic Association

IPA font links, somewhat outdated.

Language Geek
Christopher Harvey

A site offering keyboards, fonts, and summaries of the orthographies of a number of North American languages.

Linguist's Software
Linguist's Software

Commercial font sets for linguists.

Microsoft Keyboard Layout Creator
Microsoft

Extends the international functionality of Windows 2000, Windows XP, Windows Server 2003, and Windows Vista systems by allowing users to create new keyboard layouts from scratch.

Microsoft Typography - Fonts and products
Microsoft

Information about Microsoft fonts, including their code ranges.

SIL Fonts for downloading
Victor Gaultney (SIL)

A number of downloadable Unicode fonts collectively covering the extended Latin, IPA, extended Arabic, Ethiopic, Burmese, Greek, Cyrillic, Hebrew, and Yi scripts.

Typography link pages
Luc Devroye

Links to typefaces for a wide variety of languages, font editors, and other information about typography.

Unicode Font Guide for Free/Libre Open Source Operating Systems
Ed Trager

This is a selective guide to Unicode-based fonts and script projects that contain Unicode CMAPs for mapping Unicode values to glyphs and can be downloaded and used legally for free.

Metadata 

Author-generated Dublin Core Metadata for Web Resources
Jane Greenberg, Maria Cristina Pattuelli, Bijan Parsia and W. Davenport Robertson

This paper reports on a study that examined the ability of resource authors to create acceptable metadata in an organizational setting. The results indicate that authors can create good quality metadata when working with the Dublin Core, and in some cases they may be able to create metadata that is of better quality than a metadata professional can produce.

Dublin Core Metadata Initiative
DCMI

The Dublin Core Metadata Initiative is an open organization engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models.

EAD: Encoded Archival Description
Library of Congress

The EAD Document Type Definition (DTD) is a standard for encoding archival finding aids using Extensible Markup Language (XML).

IMDI Metadata Tools
Max Planck Institute for Psycholinguistics

With the IMDI browser you can explore existing corpora from the MPI, DOBES, CGN and others. The IMDI editor is used to create IMDI metadata structures and descriptions for language resources like media files and annotations.

Metacrap
Cory Doctorow

A sceptical perspective on metadata in practice.

Metadata for your Digital Resource
Iain Wallace and Eileen Maitland, AHDS

This paper discusses different forms of documentation, from unstructured information to resource discovery and preservation metadata. The paper is intended to enable anyone embarking on a digitisation project to make informed choices about how to successfully document their digital resources.

The Protégé Ontology Editor
Stanford Medical Informatics

Protégé is a free, open source ontology editor and knowledge-base framework.

Photography 

Creating Digital Images: Digital Cameras
Technical Advisory Services for Images

This document looks at the underlying technologies that drive the digital cameras available today and shows how these technologies influence how the camera works and will hopefully enable you to make the correct choice of digital camera.

Got a Digital Camera for Christmas? Learn How to Use it Here
Digital Photography School

A series of tutorials intended to help new digital camera owners get the most out of their cameras.

Megapixel.net
Jupitermedia Corporation

A digital camera review web magazine.

Presentation format 

AMBULANT Open SMIL Player
Centrum voor Wiskunde en Informatica

An open-source media player with support for SMIL 2.1.

The Design of Online Lexicons
Sean Michael Burke

This work is an introduction to topics in the design of online lexicons.

Dictionary making in endangered speech communities
Mosel, Ulrike

This paper discusses a number of problems which are characteristic of lexicographic work in short-term language documentation projects and addresses the following issues: cooperation with the speech community, the selection of a dialect and the challenge to produce a useful piece of work meeting the scientific standards of lexicography in spite of limited resources of time, money and staff and the fact that the indigenous language is not well researched, the linguist does not have a thorough knowledge of the language and the indigenous assistants do not speak the lingua franca fluently.

Kirrkirr: software for the exploration of indigenous language dictionaries
Kevin Jansz, Christopher Manning, Nitin Indurkhya, and many others

Kirrkirr is a research project exploring the use of computer software for automatic transformation of lexical databases ("dictionaries"), aiming at providing innovative information visualization, particularly targeted at indigenous languages. It can generate networks of words automatically from dictionary data. Kirrkirr aims at a perceived gap in work being done elsewhere: while there is a lot of work on designing dictionary databases, and providing software for building and maintaining these databases, there is a dearth of work that exploits these databases to provide useful and fun tools for nontechnical end users.

Mātāpuna
Dave Moskovitz and the Māori Language Commission

The Mātāpuna Dictionary Writing System is a free, web-based, multi-user, multilingual dictionary writing system. The system assists with many aspects of lexicography, including team collaboration, routine error and consistency checking, corpus searching, publishing, and progress monitoring in addition to the traditional headword and entry management.

Recipe for a Successful Website
Nathan Shedroff

A simple introduction to web design.

Representing information about words digitally
Jane Simpson

The growth in the use of computers has transformed all aspects of dictionary-making, from collecting data about word meanings and uses, creating a set of dictionary entries, and displaying, using, preserving and distributing these entries and the data on which they are based. This paper discusses the transformations, and considers the ways in which dictionaries for minority languages are leading or lagging in the electronic-dictionary age. Illustrations are taken mostly from the uses of digital sound in modern multimedia dictionaries.

The semantics of markup
Gary Simons, William Lewis, and Scott Farrar

A method for mapping linguistic descriptions in plain XML into semantically rich RDF/OWL.

SMIL Authoring Tools
Sams Publishing

Tools for adding multimedia to Web pages.

Usability Engineering Page
Craig Marion

Usability engineering is a systematic approach to making software easier to use for the individuals who actually use it to get their work done. This page contains information on a variety of usability techniques and evaluation methods.

W3C Internationalisation Activity
World Wide Web Consortium

The W3C Internationalization Activity has the goal of proposing and coordinating any techniques, conventions, guidelines and activities within the W3C and together with other organizations that allow and make it easy to use W3C technology worldwide, with different languages, scripts, and cultures.

The Web Developer's Handbook
Vitaly Friedman

An extensive library of essential bookmarks for web-designers and web-developers.

What Native Communities Want from Web-Based Data
Doug Whalen

Although the communities that make up or represent the native speakers of a language constitute potential users of a language database, they have concerns that go beyond those of a typical user. These include making it possible to download material easily into non-web formats; being able to place restrictions on who can access certain texts; and sharing in any tangible benefits that arise from their language material.

Wunderkammer and wkimport
Project for Free Electronic Dictionaries

Wunderkammer is a Java ME MIDlet for storing and displaying multimedia electronic dictionaries on mobile phones. wkimport is an application for importing electronic dictionaries in a variety of formats into Wunderkammer.

Regular expressions 

Natural Language Toolkit (NLTK)
Various

NLTK — the Natural Language Toolkit — is a suite of open source Python modules, data and documentation for research and development in natural language processing.

Python Resources for Linguists New to Programming
Michael A. Covington

Recommends resources for linguists new to programming trying to get to grips with Python, popular for text processing, corpus statistics, and the like.

The Regex Coach
Edi Weitz

The Regex Coach is a graphical application for Windows which can be used to experiment with (Perl-compatible) regular expressions interactively.

Regular Expression HOWTO
A. M. Kuchling

This document is an introductory tutorial to using regular expressions in Python with the re module.
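
As a small taste of what the re module can do with language data, the following sketch finds fully reduplicated word forms using a backreference. The pattern and the function name find_reduplicated are illustrative assumptions, not taken from the HOWTO.

```python
import re

# A whole word consisting of a sequence of two or more word characters
# immediately followed by the same sequence again, e.g. "wikiwiki".
# \1 is a backreference to whatever the first group matched.
REDUPLICATED = re.compile(r"\b(\w{2,})\1\b")


def find_reduplicated(text):
    """Return every fully reduplicated word form found in text."""
    return [match.group(0) for match in REDUPLICATED.finditer(text)]


if __name__ == "__main__":
    print(find_reduplicated("orang orangorang wikiwiki banana"))
```

Note that \w does not match combining diacritics, so text that uses them would need a more careful pattern.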

Regular Expression Tutorial
Jan Goyvaerts

A tutorial in writing and processing regular expressions.

Sound recording 

Akustyk
Bartlomiej Plichta

A site offering, in addition to a vowel analysis program, discussions of recording equipment for linguists.

Audacity
Dominic Mazzoni et al.

Audacity is "a free, easy-to-use audio editor and recorder for Windows, Mac OS X, GNU/Linux, and other operating systems."

Audio Field Recording Equipment Guide: Vermont Folklife Center
Andy Kolovos, Vermont Folklife Center

This document is designed to offer guidance to researchers interested in obtaining audio recording equipment for conducting folklore, ethnomusicology, oral history and other ethnographic fieldwork projects.

Audio Media
Audio Media

A professional audio technology magazine.

The Broadcast Wave Format
R. Chalmers - European Broadcasting Union

An introduction to BWF audio format.

Building the pod (Understanding Adobe Audition)
Bruce Williams

A guided tour of how to use Adobe Audition.

dbPowerAmp
illustrate

A set of shareware audio software.

Digital Editing of Field Audio
Andy Kolovos, Vermont Folklife Center

A guide to digital editing of audio recorded in the field.

Digital Voice Player 2.1
Sony

This is free, dedicated software for use with the DVF, ICS, MSV and WAV file types used by the ICD-BP100, ICD-BP120, ICD-MS1, ICD-R100 IC Recorders.

e-ARENA - Musiclab's Newsletter
Musiclab

An Australian audio equipment newsletter.

Electronic Design Laboratory
Electronic Design Laboratory

Commercial minidisc software for transferring tracks to PC and recovering data from corrupted minidiscs.

Equipment for Audio Recording of Speech
University College London

This page provides advice in the selection of audio equipment for the recording of speech, targeted at linguists and phoneticians.

Handbook for Recording Aboriginal Languages Vol. 1
Philip Djwa

This handbook is intended to provide a basic overview of video and audio recording techniques as they relate to Aboriginal languages. It includes specific suggestions for achieving high quality sound and video at a reasonable price, as well as tips for ensuring that the resources can be maintained and used over time.

HI-MD Renderer Program
Marcnet

This program renders HI-MD minidisc files that have been uploaded via SonicStage into a .wav file.

How to Transfer Cassette Tape to Computer
WikiHow

A non-professional but handy set of tips for digitising cassette tapes.

Microphone Theory Links
Han-Kwang Nienhuys

A variety of links relating to microphones and recording techniques.

Microphones for the TRV900
John Beale

Advice on attaching microphones to video cameras.

Minidisc Frequently Asked Questions
Eric Woudenberg, minidisc.org

A rather extensive FAQ on minidiscs.

Nick Thieberger's home page
Nick Thieberger

Includes papers on audio concordances for linguists, notes for the computer-assisted language worker, and other useful resources.

Praat: doing phonetics by computer
Paul Boersma and David Weenink

Praat is a program for speech analysis and synthesis written by Paul Boersma and David Weenink at the Department of Phonetics of the University of Amsterdam (links on the Contents page).

Recording directly to laptop?
Transom

A discussion of how and whether to record audio directly to a laptop.

A review of the Marantz PMD 660
Jeff Towne

Examines and tests the Marantz PMD660, a solid-state recorder.

Roland US - Edirol
Edirol

Audio recording equipment.

SemArch - Semitisches Tonarchiv
Ruprecht-Karls-Universität Heidelberg

A set of field-recorded sound files for (mainly endangered) Semitic languages.

7-Series Recorders
Sound Devices

A commercial overview of a new recorder series.

The Sonic Spot: Sample Editors
The Sonic Spot

A list of software that can play, edit, fine-tune and often record audio files. Some can also send and receive samples from an external sampler.

Speech and Spoken Language Resources - Bibliography
Joaquim Llisterri, Universitat Autònoma de Barcelona

A bibliography on sound recording for linguists, particularly in the context of corpora.

Stereo-Types
Jeff Towne

Collecting stereo sound in the field seems to be one of the most perplexing topics for recordists. There are a myriad of options encompassing equipment, technique and mixing. Mic placement, pick-up patterns, phase relationships and many more issues come into play.

Transom
Transom

A showcase and workshop for new public radio, with a lot of useful information on technical and interviewing methods.

Transom Tools FAQ
Jeff Towne

Frequently asked questions about sound recording tools.

WavePad
NCH SwiftSound

This audio editing software is described as "a full featured professional sound editor for Windows."

What Microphone Do I Get?
Jeff Towne

There are lots of different kinds of microphone types: dynamic, condenser, ribbon, boundary, binaural, M-S and more. There are a myriad of pick-up patterns, different-sized diaphragms, variations in frequency response, sensitivity, self-noise, susceptibility to handling noise, wind or plosives. This article clarifies the possibilities.

Working with audio and video data on your PC
University College London

This page shows you how you can use free tools for sound and video capture and processing on your PC.

Transcription 

Audiamus
Nick Thieberger

A tool for building corpora of linked transcripts and digitised media.

Linguistic Annotation Wiki
Linguistic Annotation Wiki

This wiki describes tools and formats for creating and managing linguistic annotations. "Linguistic annotation" covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, "named entity" identification, co-reference annotation, and so on.

MPI-PL Tools
Max Planck Institute for Psycholinguistics

Software tools from the Max Planck Institute, including the audio/video annotator ELAN and the IMDI metadata suite.

Portability, Modularity and Seamless Speech-Corpus Indexing and Retrieval: A New Software for Documenting (not only) the Endangered Formosan Aboriginal Languages
Josef Szakos and Ulrike Glavitsch

SpeechIndexer has two versions, one for the preparation of data and one for the search and sharing of the database. The researcher correlates the transcribed morphemes with the highlighted data from the authentic audio recording and creates indices. He/she can then string-search the database according to morphemes, grammatical tags, etc., depending on the indices prepared.

Speech analysis and transcription software
Joaquim Llisterri, Universitat Autònoma de Barcelona

A set of links to speech analysis and transcription software.

Toolbox
SIL International

Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data.

Transana
Chris Fassnacht, David C. Woods

Transana is software for professional researchers who want to analyze digital video or audio data. Transana lets you analyze and manage your data in very sophisticated ways. Transcribe it, identify analytically interesting clips, assign keywords to clips, arrange and rearrange clips, create complex collections of interrelated clips, explore relationships between applied keywords, and share your analysis with colleagues. The result is a new way to focus on your data, and a new way to manage large collections of video and audio files and clips.

Transcriber
Mathieu Manta, Fabien Antoine, Sylvain Galliano, Claude Barras

Transcriber is a tool for assisting the manual annotation of speech signals. It provides a user-friendly graphical user interface for segmenting long duration speech recordings, transcribing them, and labeling speech turns, topic changes and acoustic conditions. It is more specifically designed for the annotation of broadcast news recordings, for creating corpora used in the development of automatic broadcast news transcription systems, but its features might be found useful in other areas of speech research.

Video recording 

TalkBank Video Equipment
TalkBank

A list of equipment needed to record and capture digital video.

XML 

Alchemist
Colin Sprague and Yu Hu

"The original purpose of Alchemist is to allow you to read in raw text files and create morphological gold-standards in XML format. Using Alchemist, you can identify morphemes, along with a number of important characteristics of the morphemes, such as whether they are roots or affixes, the degree of analyst certainty, and allomorphs of the morpheme."

Café con Leche XML News and Resources
Elliotte Rusty Harold

News and links for XML.

Choosing an XML Editor
Thijs van den Broek, Arts and Humanities Data Service

With the increasing popularity of XML, the number of XML editors is also increasing and it can be difficult to choose the editor that best suits a particular user or task. The aim of this Information Paper is to provide an introduction to different features XML editors can have and the extent to which these features are implemented in various editors. It also presents the result of an evaluation exercise where different user groups tried a number of the editors.

Free XML Tools
Lars Marius Garshol

An index of free XML tools, with much metadata about the tools to make them easier to locate.

A Manager's Introduction to the Adobe XML Metadata Framework
Adobe

A very gentle introduction to XML and Adobe's software for it.

Working with XML: The Java API for XML Processing (JAXP) Tutorial
Eric Armstrong

A tutorial in the use of XML in general and the Java XML API in particular.

XSLT transforms library
J. M. Vanel

This is a collection of XSLT transforms, models and reusable fragments under GPL, involving HTML tables, XML Schema, HTML GUI, MathML, SQL analogy, etc. This has been developed as part of the "Worldwide Botanical Knowledge Base" project.

Computational Resources for Linguistic Research
Bill Poser

This page lists computational tools for doing linguistics, emphasising free software that runs on Unix systems.