Critical Source Materials for Buddhist Textual Studies
The Resource for Kanjur and Tanjur Studies (rKTs) offers an expanded repository of machine-generated e-texts. These digital versions were produced with BDRC’s OCR (optical character recognition) application for Tibetan, released in March 2025.
Technical Implementation and Background
The repository has the following characteristics:
• Direct OCR processing of the source materials, with no subsequent human editorial intervention
• Structural organization that follows canonical divisions and traditional text hierarchies
• Unicode encoding, so that Tibetan script displays correctly across platforms and applications
• Downloadable formats that allow offline analysis and integration into research workflows
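Because the texts are encoded in Unicode, a downloaded file can be checked programmatically. The following Python sketch is an illustration only, not part of rKTs or BDRC tooling: it reports every non-whitespace character that falls outside the Unicode Tibetan block (U+0F00 to U+0FFF). The function name `non_tibetan_chars` is hypothetical, and a flagged character is not necessarily an error, since a file may legitimately contain Latin-script metadata.

```python
# Unicode Tibetan block: U+0F00 to U+0FFF (range end is exclusive).
TIBETAN_BLOCK = range(0x0F00, 0x1000)


def non_tibetan_chars(text: str) -> set[str]:
    """Hypothetical helper: return the set of non-whitespace characters
    in `text` whose code points lie outside the Tibetan block."""
    return {ch for ch in text if not ch.isspace() and ord(ch) not in TIBETAN_BLOCK}


if __name__ == "__main__":
    sample = "བཀའ་འགྱུར། abc"
    print(sorted(non_tibetan_chars(sample)))
```

Running the example prints the Latin letters in the sample, since every Tibetan character (including the tsheg and shad) lies inside the block.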
These e-texts were generated using models developed by Eric Werner in collaboration with
Élie Roux, Pentsok W Rtsang, and the Monlam AI team.
Quality Statement
This corpus is raw OCR output that has not been manually corrected or editorially reviewed. Researchers should therefore note the following:
• Character recognition accuracy varies with script type, image quality, and textual layout
• The OCR models used are still under development
• Results may be inconsistent in complex passages or where the source materials are degraded
• Occasional character substitutions, omissions, or misrecognitions may be present
Users are advised to verify critical passages against the original sources when using these materials for substantive philological or doctrinal analysis.
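One way to carry out such a check is to align the OCR output with a trusted transcription syllable by syllable. The Python sketch below is an assumption for illustration, not an rKTs tool: it normalizes both strings to NFC, splits them on the tsheg (U+0F0B), and uses the standard library's `difflib.SequenceMatcher` to list the syllables that differ. The helper names `syllables` and `compare_syllables` are hypothetical.

```python
import difflib
import unicodedata

TSHEG = "\u0F0B"  # Tibetan intersyllabic mark (tsheg)


def syllables(text: str) -> list[str]:
    """Normalize to NFC and split a Tibetan string into tsheg-delimited syllables."""
    text = unicodedata.normalize("NFC", text)
    return [s for s in text.split(TSHEG) if s]  # drop empty pieces from trailing tshegs


def compare_syllables(ocr: str, reference: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (operation, ocr_syllables, reference_syllables) for every
    non-matching stretch between the OCR text and the reference text."""
    a, b = syllables(ocr), syllables(reference)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            diffs.append((tag, a[i1:i2], b[j1:j2]))
    return diffs


if __name__ == "__main__":
    # The OCR line drops the vowel sign u in the second syllable.
    print(compare_syllables("བཀའ་འགྱར་", "བཀའ་འགྱུར་"))
```

This is a deliberate simplification: a shad (།) stays attached to the preceding syllable, and whitespace is not treated as a delimiter, so real comparisons may need more careful tokenization.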
Community Collaboration
The rKTs team welcomes scholarly contributions in the form of corrected or edited versions of these texts. All contributions will be properly credited to their contributors.
If you find errors in these e-texts or if you have good-quality e-texts to contribute, please
contact us.
The Journey of Digital Tibetan Canonical Collections