About the CBETA XML Files

The new release of a set of XML files by the Chinese Buddhist Electronic Text Association (CBETA) is a very significant step for the whole project. For this release, the files have been thoroughly checked and updated to conform to the latest version of the TEI Guidelines (P5). The internal encoding is Unicode; characters that have no equivalent in Unicode are represented using the TEI "gaiji" module. We thus have, for the first time, a release where every single character is identified in a way that accords with international, open standards, which allows easy interchange and handling by all conformant XML tools. The set of files comes with documentation for the extensions made by the CBETA project (using the standard TEI ODD extension mechanism) and with equivalent schema definitions in the DTD language, the W3C Schema language, and RELAX NG as defined by the ISO.
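
As an illustration of how a tool might handle characters encoded with the gaiji module, here is a minimal Python sketch using only the standard library. It assumes the general TEI P5 pattern, in which a non-Unicode character is declared as a `<char>` inside `<charDecl>` and referenced from the text with `<g ref="#id"/>`. The sample fragment, the id `X0001`, the character name, and the mapping value are all invented for this example and are not taken from the CBETA files. The fragment is also not a complete, valid TEI document, and the actual CBETA markup may differ in detail.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: the id, name, and mapping below are invented for
# illustration and do not come from the CBETA release.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <charDecl>
        <char xml:id="X0001">
          <charName>HYPOTHETICAL NON-UNICODE CHARACTER</charName>
          <mapping type="description">[A*B]</mapping>
        </char>
      </charDecl>
    </encodingDesc>
  </teiHeader>
  <text><body><p>前<g ref="#X0001"/>後</p></body></text>
</TEI>"""


def load_gaiji(root):
    """Map each <char> xml:id to its first <mapping> text, else its <charName>."""
    table = {}
    for char in root.iter(f"{{{TEI_NS}}}char"):
        mapping = char.find(f"{{{TEI_NS}}}mapping")
        name = char.find(f"{{{TEI_NS}}}charName")
        label = mapping if mapping is not None else name
        text = label.text if label is not None else None
        table[char.get(XML_ID)] = text or "?"
    return table


def plain_text(elem, gaiji):
    """Flatten an element to a string, replacing each <g ref="#id"/> from the table."""
    if elem.tag == f"{{{TEI_NS}}}g":
        ref = (elem.get("ref") or "").lstrip("#")
        # Fall back to the element's own content, then to "?", if the id is unknown.
        return gaiji.get(ref, elem.text or "?")
    parts = [elem.text or ""]
    for child in elem:
        parts.append(plain_text(child, gaiji))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    table = load_gaiji(root)
    paragraph = root.find(f".//{{{TEI_NS}}}p")
    print(plain_text(paragraph, table))  # prints 前[A*B]後
```

Choosing which `<mapping>` to substitute (a description, a normalized Unicode form, or a private-use code point) is a design decision for each tool; the sketch simply takes the first mapping it finds.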

Altogether, the electronic texts now hold the equivalent of about 100 printed volumes. With this release, they become available to every interested researcher and open up completely new avenues for investigation. We hope this will not only provide a genuine service to the Chinese Studies and Buddhist Studies research communities, but will also serve as an example for the encoding of new texts and as a test bed for tools developed to do new and interesting things with these texts.