GigaAM ASR for ONNX
Project purpose:
Use the latest (v3) GigaAM ASR models without additional dependencies, with locally stored models running in the ONNX Runtime.
Project setup:
- Set up the original GigaAM project
# Clone original GigaAM repo:
git clone https://github.com/salute-developers/GigaAM
cd GigaAM
# Create temp venv:
python -m venv ./tmp_venv
source ./tmp_venv/bin/activate
# Install project:
pip install -e .
- Acquire the chosen models:
import gigaam
onnx_dir = '/target/onnx/model/paths'
model_version = 'v3_ctc' # Options: v3_* models
model = gigaam.load_model(model_version)
model.to_onnx(dir_path=onnx_dir)
- Fetch the SentencePieceProcessor tokenizer model
- From the local cache:
~/.cache/gigaam/{model_name}_tokenizer.model
- From the SberDevices server:
https://cdn.chatwm.opensmodel.sberdevices.ru/GigaAM/{model_name}_tokenizer.model
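Both locations above follow the same {model_name} naming pattern, so the fetch step can be automated. A minimal sketch using only the standard library (the function name and cache-first fallback order are assumptions, not part of the project's API):

```python
import urllib.request
from pathlib import Path

TOKENIZER_URL = 'https://cdn.chatwm.opensmodel.sberdevices.ru/GigaAM/{model_name}_tokenizer.model'

def fetch_tokenizer(model_name: str, target_dir: str) -> Path:
    """Copy the tokenizer from the local gigaam cache if present,
    otherwise download it from the SberDevices CDN."""
    target = Path(target_dir) / f'{model_name}_tokenizer.model'
    cached = Path.home() / '.cache' / 'gigaam' / f'{model_name}_tokenizer.model'
    if cached.exists():
        target.write_bytes(cached.read_bytes())
    else:
        urllib.request.urlretrieve(TOKENIZER_URL.format(model_name=model_name), target)
    return target
```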
- Then you may remove the original project:
cd ..
rm -r ./GigaAM
- Install this (gigaam-onnx) project
- Set up the ONNX Runtime and load the chosen models:
import onnxruntime as ort
from gigaam_onnx import GigaAMV3E2ERNNT, GigaAMV3RNNT, GigaAMV3E2ECTC, GigaAMV3CTC
import numpy as np
# Set up ONNX runtime
if 'CUDAExecutionProvider' in ort.get_available_providers():
    provider = 'CUDAExecutionProvider'
else:
    provider = 'CPUExecutionProvider'
opts = ort.SessionOptions()
opts.intra_op_num_threads = 16
opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
opts.log_severity_level = 3
e2e_rnnt_model = GigaAMV3E2ERNNT(
    '/path/to/onnx/files/v3_e2e_rnnt_decoder.onnx',
    '/path/to/onnx/files/v3_e2e_rnnt_encoder.onnx',
    '/path/to/onnx/files/v3_e2e_rnnt_joint.onnx',
    '/path/to/onnx/files/v3_e2e_rnnt_tokenizer.model',
    provider,
    opts
)
rnnt_model = GigaAMV3RNNT(
    '/path/to/onnx/files/v3_rnnt_decoder.onnx',
    '/path/to/onnx/files/v3_rnnt_encoder.onnx',
    '/path/to/onnx/files/v3_rnnt_joint.onnx',
    provider,
    opts
)
e2e_ctc_model = GigaAMV3E2ECTC(
    '/path/to/onnx/files/v3_e2e_ctc.onnx',
    '/path/to/onnx/files/v3_e2e_ctc_tokenizer.model',
    provider,
    opts
)
ctc_model = GigaAMV3CTC(
    '/path/to/onnx/files/v3_ctc.onnx',
    provider,
    opts
)
# Load a 16 kHz mono PCM WAV
wav_data = ...
audio_array = np.array(wav_data)
# Transcribe a single fragment with per-char timings
text, timings = ctc_model.transcribe(audio_array)
# Transcribe a batch with per-char timings
text, timings = e2e_ctc_model.transcribe_batch([audio_array])[0]
# Batch joined transcribing: joins fragments by the lengths provided and returns continuous text with per-char timings
text, timings = e2e_rnnt_model.transcribe_batch(
    [audio_array],  # audio chunks
    [1]  # lengths of chunks to combine
)[0]
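The wav_data placeholder above can be filled with the standard-library wave module. A sketch of loading a 16 kHz mono 16-bit PCM file (the function name and the float32 [-1, 1] scaling are assumptions about the expected input, not confirmed by the project):

```python
import wave
import numpy as np

def load_wav_mono_16k(path: str) -> np.ndarray:
    """Read a 16 kHz mono 16-bit PCM WAV into a float32 array in [-1, 1)."""
    with wave.open(path, 'rb') as wf:
        assert wf.getnchannels() == 1, 'expected mono audio'
        assert wf.getframerate() == 16000, 'expected 16 kHz sample rate'
        assert wf.getsampwidth() == 2, 'expected 16-bit PCM'
        frames = wf.readframes(wf.getnframes())
    pcm = np.frombuffer(frames, dtype=np.int16)
    return pcm.astype(np.float32) / 32768.0
```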
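The per-char timings returned above can be regrouped into word-level timings. A sketch, assuming timings is a sequence of per-character timestamps in seconds aligned one-to-one with the characters of text (the exact timing format is an assumption; check the returned value before relying on this):

```python
def word_starts(text, timings):
    """Group per-character timestamps into (word, start_time) pairs.
    Assumes len(timings) == len(text): one timestamp per character."""
    words, buf, start = [], [], None
    for ch, t in zip(text, timings):
        if ch.isspace():
            if buf:
                words.append((''.join(buf), start))
                buf, start = [], None
        else:
            if start is None:
                start = t
            buf.append(ch)
    if buf:
        words.append((''.join(buf), start))
    return words
```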