o �J�hoP� @s�dZddlZddlZddlZddlmZddlmZddlm Z m Z m Z m Z m Z ddlmZmZmZmZddlmZddlmZmZmZdd lmZdd lmZdd lmZmZe�e �Z!d e"fd d�Z#dd�Z$dd�Z%d e"de&fdd�Z'de(de e e ffdd�Z)de"de"fdd�Z*de"de"fdd�Z+de"de"fd d!�Z,de"de"fd"d#�Z-de"de"fd$d%�Z.d!e"d&e"de"fd'd(�Z/d!e"d)e"de"fd*d+�Z0d,ede"fd-d.�Z1d/e"de"fd0d1�Z2d2e"de e"e e"ffd3d2�Z3d/e"de fd4d5�Z4d/e"de fd6d7�Z5d/e"de"fd8d9�Z6d:e d;e d<e"ddfd=d>�Z7d:e d;e de"d?e"ddf d@dA�Z8dBe de e e fdCdD�Z9d e"de(fdEdF�Z:d e"de"fdGdH�Z;de efdIdJ�Z<dS)KzBThis module contains all non-cipher related data extraction logic.�N)� OrderedDict)�datetime)�Any�Dict�List�Optional�Tuple)�parse_qs�quote� urlencode�urlparse)�Cipher)�HTMLParseError�LiveStreamError�RegexMatchError�� regex_search)�YouTubeMetadata)�parse_for_object�parse_for_all_objects� watch_htmlcCs>zt�d|�}|rt�|�d��WSWdStyYdSw)z�Extract publish date and return it as a datetime object :param str watch_html: The html contents of the watch page. :rtype: datetime :returns: Publish date of the video as a datetime object with timezone. z\(?<=itemprop=\"datePublished\" content=\")\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}rN)�re�searchr� fromisoformat�group�AttributeError)r�result�r�OC:\pinokio\api\whisper-webui.git\app\env\lib\site-packages\pytubefix\extract.py� publish_dates�� �rcCs"dg}|D] }||vrdSqdS)z�Check if live stream recording is available. :param str watch_html: The html contents of the watch page. :rtype: bool :returns: Whether or not the content is private. z,This live stream recording is not available.FTr)rZunavailable_strings�stringrrr�recording_available&s ��r!cCs$gd�}|D] }||vrdSqdS)z�Check if content is private. :param str watch_html: The html contents of the watch page. :rtype: bool :returns: Whether or not the content is private. )zFThis is a private video. Please sign in to verify that you may see it.z"simpleText":"Private video"zThis video is private.TFr)rZprivate_stringsr rrr� is_private8s  �r"�returncCs*z td|dd�WdStyYdSw)z�Check if content is age restricted. 
:param str watch_html: The html contents of the watch page. :rtype: bool :returns: Whether or not the content is age restricted. zog:restrictions:ager�rFT)rr)rrrr�is_age_restrictedLs   ��r%�player_responsecCsh|�di�}d|vrd|dvrdSd|vr/d|vr#|d|dgfSd|vr/|d|dfSddgfS) a�Return the playability status and status explanation of a video. For example, a video may have a status of LOGIN_REQUIRED, and an explanation of "This is a private video. Please sign in to verify that you may see it." This explanation is what gets incorporated into the media player overlay. :param str player_response: Content of the player's response. :rtype: bool :returns: Playability status and reason of the video. �playabilityStatusZ videoDetailsZisLive)Z LIVE_STREAMzVideo is a live stream.�status�reason�messagesN)�get)r&Z status_dictrrr�playability_status\s   r,�jscC�td|dd�S)NzsignatureTimestamp:(\d*)�r$r)r-rrr�signature_timestampz�r0�response_contextcCr.)Nz>visitor_data[',\"\s]+value['\"]:\s?['\"]([a-zA-Z0-9_%-]+)['\"]r/r$r)r2rrr� visitor_data~r1r3�urlcCr.)arExtract the ``video_id`` from a YouTube url. This function supports the following patterns: - :samp:`https://youtube.com/watch?v={video_id}` - :samp:`https://youtube.com/embed/{video_id}` - :samp:`https://youtu.be/{video_id}` :param str url: A YouTube url containing a video id. :rtype: str :returns: YouTube video id. z(?:v=|\/)([0-9A-Za-z_-]{11}).*r/r$r)r4rrr�video_id�sr5cCstj�|�}t|j�ddS)aoExtract the ``playlist_id`` from a YouTube url. This function supports the following patterns: - :samp:`https://youtube.com/playlist?list={playlist_id}` - :samp:`https://youtube.com/watch?v={video_id}&list={playlist_id}` :param str url: A YouTube url containing a playlist id. :rtype: str :returns: YouTube playlist id. �listr)�urllib�parser r �query)r4�parsedrrr� playlist_id�s r;cCs�gd�}|D]5}t�|�}|�|�}|r;t�d|�|�d�}|�d�}|dkr2d|�d|��Sd|�|��Sqtddd ��) aExtract the ``channel_name`` or ``channel_id`` from a YouTube url. 
This function supports the following patterns: - :samp:`https://youtube.com/c/{channel_name}/*` - :samp:`https://youtube.com/channel/{channel_id}/* - :samp:`https://youtube.com/u/{channel_name}/*` - :samp:`https://youtube.com/user/{channel_id}/* - :samp:`https://youtube.com/@{channel_id}/* :param str url: A YouTube url containing a channel name. :rtype: str :returns: YouTube channel name. )z(?:\/(c)\/([%\d\w_\-]+)(\/.*)?)z%(?:\/(channel)\/([%\w\d_\-]+)(\/.*)?)z(?:\/(u)\/([%\d\w_\-]+)(\/.*)?)z"(?:\/(user)\/([%\w\d_\-]+)(\/.*)?)z (?:\/(\@)([%\d\w_\-\.]+)(\/.*)?)�"finished regex search, matched: %sr/��@�/� channel_name�patterns��caller�pattern�r�compiler�logger�debugrr)r4rArD�regex�function_matchZ uri_styleZuri_identifierrrrr@�s     .��r@� watch_urlcCs*td|fddt|�fddddg�}t|�S)aConstruct the video_info url. :param str video_id: A YouTube video identifier. :param str watch_url: A YouTube watch url. :rtype: str :returns: :samp:`https://youtube.com/get_video_info` with necessary GET parameters. r5)�ps�default�eurl)�hl�en_US��html5�1��cZTVHTML5�Zcverz 7.20201028)rr �_video_info_url)r5rK�paramsrrr�video_info_url�s  �� rY� embed_htmlcCs\z td|dd�}Wn tyd}Ynwd|��}td|fd|fd|fd d d g�}t|�S) a<Construct the video_info url. :param str video_id: A YouTube video identifier. :param str embed_html: The html contents of the embed page (for age restricted videos). :rtype: str :returns: :samp:`https://youtube.com/get_video_info` with necessary GET parameters. z"sts"\s*:\s*(\d+)r/r$�z!https://youtube.googleapis.com/v/r5rN�stsrQrTrV)rrrrW)r5rZr\rNrXrrr�video_info_url_age_restricted�s   � �� r]rXcCsdt|���S)Nz'https://www.youtube.com/get_video_info?)r )rXrrrrWr1rW�htmlc Cs>z t|�dd}Wnttfyt|�}Ynwd|��S)z�Get the base JavaScript url. Construct the base JavaScript url, which contains the decipher "transforms". :param str html: The html contents of the watch page. 
�assetsr-zhttps://youtube.com)�get_ytplayer_config�KeyErrorr�get_ytplayer_js)r^Zbase_jsrrr�js_url s   � rc�mime_type_codeccCsLd}t�|�}|�|�}|std|d��|��\}}|dd�|�d�D�fS)a�Parse the type data. Breaks up the data in the ``type`` key of the manifest, which contains the mime type and codecs serialized together, and splits them into separate elements. **Example**: mime_type_codec('audio/webm; codecs="opus"') -> ('audio/webm', ['opus']) :param str mime_type_codec: String containing mime type and codecs. :rtype: tuple :returns: The mime type and a list of codecs. z,(\w+\/\w+)\;\scodecs=\"([a-zA-Z-0-9.,\s]*)\"rdrBcSsg|]}|���qSr)�strip)�.0rUrrr� <listcomp>1�z#mime_type_codec.<locals>.<listcomp>�,)rrFrr�groups�split)rdrDrI�results� mime_type�codecsrrrrds    cCs`dg}|D]$}t�|�}|�|�}|r)t�d|�|�d�}t�d|�|Sqtddd��)z�Get the YouTube player base JavaScript path. :param str html The html contents of the watch page. :rtype: str :returns: Path to YouTube's base.js file. z'(/s/player/[\w\d]+/[\w\d_/.]+/base\.js)r<r/z player JS: rb�js_url_patternsrBrE)r^rorDrIrJZ yt_player_jsrrrrb4s �    ��rbc Cs�t�d�ddg}|D])}zt||�WSty4}zt�d|���t�|�WYd}~q d}~wwdg}|D]}zt||�WStyMYq:wtddd ��) a�Get the YouTube player configuration data from the watch html. Extract the ``ytplayer_config``, which is json data embedded within the watch html and serves as the primary source of obtaining the stream manifest data. :param str html: The html contents of the watch page. :rtype: str :returns: Substring of the html containing the encoded manifest data. zfinding initial function namezytplayer\.config\s*=\s*�ytInitialPlayerResponse\s*=\s*zPattern failed: Nz,yt\.setConfig\(.*['\"]PLAYER_CONFIG['\"]:\s*r`z#config_patterns, setconfig_patternsrB)rGrHrrr)r^Zconfig_patternsrD�eZsetconfig_patternsrrrr`Ns0 �  �� � ��r`c Cs^i}ddg}|D]}zt||�}|D]}|�|�qWqty$Yqw|r)|Stddd��)a;Get the entirety of the ytcfg object. This is built over multiple pieces, so we have to find all matches and combine the dicts together. 
:param str html: The html contents of the watch page. :rtype: str :returns: Substring of the html containing the encoded manifest data. z ytcfg\s=\sz ytcfg\.set\(� get_ytcfgZytcfg_pattenrsrB)r�updaterr)r^ZytcfgZytcfg_patternsrDZ found_objects�objrrrrr|s$ �  � ��rr�stream_manifest�vid_info�po_tokenc Cs�t�d�t|�D]R\}}z|d}Wnty+|�di��d�}|r)td��Ynwt|�}tt|�j�}dd�|� �D�}||d<|j �d |j �|j �d t |���}|||d<q d S) z�Apply the proof of origin token to the stream manifest :param dict stream_manifest: Details of the media streams available. :param str po_token: Proof of Origin Token. zApplying poTokenr4r'�liveStreamability�UNKNOWNcS�i|] \}}||d�qS�rr�rf�k�vrrr� <dictcomp>���z"apply_po_token.<locals>.<dictcomp>�pot�://�?N)rGrH� enumeraterar+rr r r9�items�scheme�netloc�pathr ) rurvrw�i�streamr4� live_stream� parsed_url� query_paramsrrr�apply_po_token�s*    ����"�r��url_jsc Csvt||d�}t�}t|�D]�\}}z|d}Wnty/|�di��d�} | r-td��Ynwt|�} tt|�j�} dd�| � �D�} d|vsTd |vrZd |vsTd |vrZt � d �n|j |d d �} t � d|d�| | d<d| � �vr�| d} t � d| ���| |vr�|�| �|| <nt � d�|| }|| d<t � d|���| j�d| j�| j�dt| ���}|||d<q dS)aApply the decrypted signature to the stream manifest. :param dict stream_manifest: Details of the media streams available. :param str js: The contents of the base.js asset file. :param str url_js: Full base.js url )r-rcr4r'rxrycSrzr{rr|rrrr�r�z#apply_signature.<locals>.<dictcomp>� signature�sz&sig=z&lsig=zsignature found, skip decipher)Zciphered_signaturez+finished descrambling signature for itag=%s�itag�sig�nzParameter n is: z%Parameter n found skipping decryptionzParameter n deciphered: r�r�N)r �dictr�rar+rr r r9r�rGrH� get_signature�keysZget_throttlingr�r�r�r )rurvr-r��cipherZ discovered_nr�r�r4r�r�r�r�Z initial_n�new_nrrr�apply_signature�sL    ���� �  "�r�� stream_datacCs�d|vrdSg}d|��vr|�|d�d|��vr"|�|d�|D]@}d|vrId|vrIt|d�}|dd|d<|dd|d<d|d <nd|vr[d|vr[|d |d<d |d <|�d �d k|d<q$t�d�|S)a-Apply various in-place transforms to YouTube's media stream data. 
Creates a ``list`` of dictionaries by string splitting on commas, then taking each list item, parsing it as a query string, converting it to a ``dict`` and unquoting the value. :param dict stream_data: Dictionary containing query string encoded values. **Example**: >>> d = {'foo': 'bar=1&var=test,em=5&t=url%20encoded'} >>> apply_descrambler(d, 'foo') >>> print(d) {'foo': [{'bar': '1', 'var': 'test'}, {'em': '5', 't': 'url encoded'}]} r4N�formatsZadaptiveFormatsZsignatureCipherrr�F�is_sabrZserverAbrStreamingUrlT�typeZFORMAT_STREAM_TYPE_OTF�is_otfzapplying descrambler)r��extendr r+rGrH)r�r��dataZ cipher_urlrrr�apply_descrambler s&      r�c C�@ddg}|D]}zt||�WStyYqwtddd��)z�Extract the ytInitialData json from the watch_html page. This mostly contains metadata necessary for rendering the page on-load, such as video information, copyright notices, etc. @param watch_html: Html of the watch page @return: z'window\[['\"]ytInitialData['\"]]\s*=\s*zytInitialData\s*=\s*� initial_dataZinitial_data_patternrB�rrr�rrArDrrrr�5s � � r�c Cr�)aExtract the ytInitialPlayerResponse json from the watch_html page. This mostly contains metadata necessary for rendering the page on-load, such as video information, copyright notices, etc. @param watch_html: Html of the watch page @return: z1window\[['\"]ytInitialPlayerResponse['\"]]\s*=\s*rp�initial_player_responseZinitial_player_response_patternrBr�r�rrrr�Ks � ��r�c Cstz|dddddddddd}Wnttfy'tg�YSwtd d �|�}d d �|D�}t|�S) u<Get the informational metadata for the video. e.g.: [ { 'Song': '강남스타일(Gangnam Style)', 'Artist': 'PSY', 'Album': 'PSY SIX RULES Pt.1', 'Licensed to YouTube by': 'YG Entertainment Inc. 
[...]' } ] :rtype: YouTubeMetadata �contentsZtwoColumnWatchNextResultsrlr/ZvideoSecondaryInfoRendererZmetadataRowContainerZmetadataRowContainerRenderer�rowscSs d|��vS)N�metadataRowRenderer)r�)�xrrr�<lambda>~s zmetadata.<locals>.<lambda>cSsg|]}|d�qS)r�r)rfr�rrrrg�rhzmetadata.<locals>.<listcomp>)ra� IndexErrorr�filter)r�Z metadata_rowsrrr�metadatads6 �������� ��r�)=�__doc__�logging� urllib.parser7r� collectionsrr�typingrrrrrr r r r Zpytubefix.cipherr Zpytubefix.exceptionsrrr�pytubefix.helpersrZpytubefix.metadatarZpytubefix.parserrr� getLogger�__name__rG�strrr!r"�boolr%r�r,r0r3r5r;r@rYr]rWrcrdrbr`rrr�r�r�r�r�r�rrrr�<module>sL      % ."#H,