o �J�h��@sHddlZddlZddlmZddlmZmZddlmZddl Z ddl ZddlZddl mmZdd�ZdZdZd Zd ZeeZeee�ZedZeee�Zeee�Zefdeeejfd edejfdd�Zefdd�dedefdd�Zedd�dedej fdd��Z! d deeejej fdededeeeej"ffdd�Z#dS)!�N)� lru_cache)�Optional�Union)�writecCs||dksJ�||S)Nr�)�x�yrr�DC:\pinokio\api\whisper-webui.git\app\modules\diarize\audio_loader.py� exact_div sr i�>i��file�sr�returncCs:t|tj�r:|jtjkr|�tj�}|jdkrtj|dd�}tj ddd�}t |jt|d�tj ��|j}|��n|}zDzddd d d|dd dddddt|�dg}tj|ddd�j}Wntjys}ztd|j��|�d}~wwWt|tj�r�t�|�n t|tj�r�t�|�wwt�|tj ��tj�dS)a� Open an audio file or process a numpy array containing audio data as mono waveform, resampling as necessary. Parameters ---------- file: Union[str, np.ndarray] The audio file to open or a numpy array containing the audio data. sr: int The sample rate to resample the audio if necessary. Returns ------- A NumPy array containing the audio waveform, in float32 dtype. ��axisFz.wav)�delete�suffixi��ffmpegz-nostdinz-threads�0z-iz-f�s16lez-ac�1z-acodec� pcm_s16lez-ar�-T)�capture_output�checkzFailed to load audio: Ng�@)� isinstance�np�ndarray�dtype�float32�astype�ndim�mean�tempfile�NamedTemporaryFiler�name�SAMPLE_RATE�int16�close�str� subprocess�run�stdout�CalledProcessError�RuntimeError�stderr�decode�os�remove� frombuffer�flatten)rr� temp_file�temp_file_path�cmd�out�errr � load_audiosP �� r=��r�lengthrcCs�t�|�rC|j||kr|j|tj||jd�d�}|j||krAdg|j}d||j|f||<t�|dd�|ddd�D��}|S|j||krS|j t |�|d �}|j||krqdg|j}d||j|f||<t�||�}|S) zO Pad or trim the audio array to N_SAMPLES, as expected by the encoder. )�device)�dim�index)rrrcSsg|] }|D]}|�qqSrr)�.0�sizes�padrrr � <listcomp>eszpad_or_trim.<locals>.<listcomp>Nr>)�indicesr)�torch� is_tensor�shape�index_select�aranger@r$�FrE�take�ranger)�arrayr?r� pad_widthsrrr �pad_or_trimXs" � �rR)�maxsize�n_melscCsr|dvsJd|��t�tj�tj�t�dd��}t�|d|�� |�Wd�S1s2wYdS)a load the mel filterbank matrix for projecting STFT into a Mel spectrogram. Allows decoupling librosa dependency; saved using: np.savez_compressed( "mel_filters.npz", mel_80=librosa.filters.mel(sr=16000, n_fft=400, n_mels=80), ) )�P�zUnsupported n_mels: �assetszmel_filters.npz�mel_N) r�loadr4�path�join�dirname�__file__rH� from_numpy�to)r@rT�frrr �mel_filtersrs�$�ra�audio�paddingr@c Cs�t�|�st|t�rt|�}t�|�}|dur|�|�}|dkr(t�|d|f�}t� t ��|j�}tj|t t |dd�}|ddd�f��d}t|j|�}||}tj|dd ��} t�| | ��d �} | dd} | S)ap Compute the log-Mel spectrogram of Parameters ---------- audio: Union[str, np.ndarray, torch.Tensor], shape = (*) The path to audio or either a NumPy array or Tensor containing the audio waveform in 16 kHz n_mels: int The number of Mel-frequency filters, only 80 is supported padding: int Number of zero samples to pad to the right device: Optional[Union[str, torch.device]] If given, the audio tensor is moved to this device before STFT Returns ------- torch.Tensor, shape = (80, n_frames) A Tensor that contains the Mel spectrogram NrT)�window�return_complex.r>r g��|�=)�ming @g@)rHrIrr,r=r^r_rMrE�hann_window�N_FFTr@�stft� HOP_LENGTH�absra�clamp�log10�maximum�max) rbrTrcr@rdri� magnitudes�filters�mel_spec�log_specrrr �log_mel_spectrogram�s" rt)rN)$r4r-� functoolsr�typingrrZscipy.io.wavfilerr&�numpyrrHZtorch.nn.functional�nn� functionalrMr r)rhrj�CHUNK_LENGTH� N_SAMPLES�N_FRAMES�N_SAMPLES_PER_TOKEN�FRAMES_PER_SECOND�TOKENS_PER_SECONDr,r �intr=rR�Tensorrar@rtrrrr �<module>sD &:��