The MIDI interface was originally designed for electronic musical instruments, but we believe its note-based coding concept can be extended to describe general acoustic signals. We previously proposed applying MIDI technology to the coding of biomedical auscultation sounds, such as heart sounds, for medical-record retrieval and telemedicine. We then extended our encoding algorithm and improved its coding precision, using Generalized Harmonic Analysis, so that it could be applied to vocal sounds. Currently, we can separate the vocal sounds contained in popular songs and encode given musical sounds into a multi-channel MIDI format. The encoded data can be played back on a GM-standard MIDI tone generator to reproduce the complete musical signal, including the singing voice. In this paper, we present a novel automatic XML transcription method developed within our extended MIDI coding research project, "Structured Description of Acoustic Information," together with two coded examples.
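To make the note-based coding idea concrete, the following is a minimal sketch, not the encoder described in this paper. It assumes that an earlier analysis stage has already produced a list of sinusoidal components (onset, duration, frequency, amplitude); it maps each frequency to the nearest MIDI note number using the standard equal-temperament relation (A4 = 440 Hz = note 69) and writes the result as XML. The function names and the XML element and attribute names (`sound`, `note`, `onset`, `duration`, `pitch`, `velocity`) are hypothetical and chosen only for illustration.

```python
import math
import xml.etree.ElementTree as ET


def freq_to_midi(freq_hz):
    """Return the nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def notes_to_xml(components):
    """Serialize (onset_s, duration_s, freq_hz, amplitude) tuples as XML.

    Amplitude is assumed to lie in [0, 1] and is scaled to a MIDI
    velocity in 1..127. Element and attribute names are hypothetical.
    """
    root = ET.Element("sound")
    for onset, duration, freq, amp in components:
        velocity = max(1, min(127, round(amp * 127)))
        ET.SubElement(root, "note", {
            "onset": f"{onset:.3f}",
            "duration": f"{duration:.3f}",
            "pitch": str(freq_to_midi(freq)),
            "velocity": str(velocity),
        })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Two illustrative components: A4 followed by a tone near middle C.
    print(notes_to_xml([(0.0, 0.5, 440.0, 1.0), (0.5, 0.25, 261.63, 0.8)]))
```

Rounding to the nearest note discards the fine frequency detail that a higher-precision analysis would capture; a real encoder would need additional fields (for example pitch-bend values or channel assignments) to retain it.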