chamilo-lms/main/inc/lib/internationalization_database/language_detection/readme.txt

Library of statistical profiles for language recognition
--------------------------------------------------------

The sample texts for different languages have been taken from:
Perl module: Lingua::LanguageGuesser - http://gensen.dl.itc.u-tokyo.ac.jp/LanguageGuesser/LanguageGuesser_demo.html
Statistical Text Analysis - http://boxoffice.ch/pseudo/
Some random sample texts have been taken from Wikipedia - http://wikipedia.org/

All the sample texts should be UTF-8 encoded!
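
A quick way to check a sample text before adding it to the library is to try decoding it as UTF-8. The following is only a minimal sketch in Python; the function name is_utf8 is a hypothetical helper for illustration and is not part of this library:

```python
def is_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8.

    Hypothetical helper, not part of the library itself.
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A file that fails this check would need to be converted to UTF-8 before being used as a sample text.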

To understand how language recognition works, you need to read the following remarkable work:
W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994.
http://citeseer.ist.psu.edu/cache/papers/cs/810/http:zSzzSzwww.info.unicaen.frzSz~giguetzSzclassifzSzcavnar_trenkle_ngram.pdf/n-gram-based-text.pdf
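
The idea described in that paper can be sketched roughly as follows: build a ranked list of the most frequent character n-grams for each language, build the same kind of list for the unknown text, and choose the language whose list differs least in rank order (the "out-of-place" measure). The Python code below is a simplified illustration only. It is not the code used by this library, and all function names, the padding character, and the limits max_n and top_k are assumptions made for the example:

```python
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n), ranked."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # underscores mark the word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties are broken alphabetically to keep the order stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles (lower means more similar)."""
    rank_in_lang = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)  # charged for n-grams missing from lang_profile
    distance = 0
    for rank, gram in enumerate(doc_profile):
        if gram in rank_in_lang:
            distance += abs(rank - rank_in_lang[gram])
        else:
            distance += max_penalty
    return distance


def guess_language(text, profiles):
    """Return the key of the profile closest to the text; profiles maps name -> profile."""
    doc_profile = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))
```

In this sketch the language profiles would be built once, by calling ngram_profile on each sample text, and then passed to guess_language as a dictionary keyed by language name. Larger sample texts generally give more reliable profiles.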

License: GNU General Public License (GPL) as published by the Free Software Foundation (http://www.fsf.org/); either version 2 of the License, or (at your option) any later version.
Assembled by Ivan Tcholakov, <ivantcholakov@gmail.com>
November, 2009