39 Commits

Author SHA1 Message Date
James Taylor
4bedf55461 yt_data_extract: Fix time_published picking up 'Streaming' string
This was causing an exception in subscriptions when it tried
to estimate the unix timestamp for the upload time
2020-08-12 14:40:47 -07:00
James Taylor
8e12551471 Switch to mobile api endpoint to fix 'Unknown error' blockage
See https://github.com/iv-org/invidious/issues/1319#issuecomment-671732646
2020-08-11 21:09:59 -07:00
James Taylor
fa61874f97 extract_items: Handle case where continuation has multiple
[something]Continuation renderers, all of which are junk
except one. Check the items in each one until finding the one
that contains the items being sought.
The usage in extract_comments_info needed to be changed to
specify the items being sought. They were unspecified before,
which is strictly incorrect, since extract_items by default looks
for video/playlist/channel thumbnail items. It was relying on the
special case for continuations, which no longer works.
2020-08-11 19:59:25 -07:00
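The continuation handling above can be sketched in Python as follows. This is a minimal illustration, not the project's extract_items: the function name first_continuation_with_items is hypothetical, and it assumes each [something]Continuation renderer holds its items under a 'contents' list, with every item a single-key dict such as {'commentThreadRenderer': {...}}.

```python
from typing import Any, Dict, Iterable, List

Item = Dict[str, Any]


def first_continuation_with_items(continuation_contents: Dict[str, Any],
                                  item_types: Iterable[str]) -> List[Item]:
    """Return matching items from the first *Continuation renderer that has any.

    Hypothetical sketch: keys not ending in 'Continuation' are skipped, and
    junk renderers simply yield no matching items.
    """
    wanted = set(item_types)
    for name, renderer in continuation_contents.items():
        if not name.endswith('Continuation') or not isinstance(renderer, dict):
            continue
        # Keep only items whose renderer key is one of the sought types.
        items = [item for item in renderer.get('contents', [])
                 if isinstance(item, dict) and wanted.intersection(item)]
        if items:
            return items
    return []
```

Because the sought item types are passed in explicitly, a caller such as a comments extractor has to name them, which mirrors the change to extract_comments_info described in the commit.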
James Taylor
81ff5ab99c extract_channel_info: Improve error extraction
Use the extract_str function, since the error text is not always in
'simpleText'.
Make sure we don't output an empty error message if we don't
know what the error is.
channel.py: Instead of checking whether the error message is empty,
check whether it's None.
2020-08-11 19:47:37 -07:00
James Taylor
803c901445 Fix hls_manifest_url not included when there's no other formats
Since there were no formats, it was retrying with the
non-embedded playerResponse, which resulted in the
hls_manifest_url from the embedded player_response being
overwritten with None. So use conservative_update instead.
2020-06-28 18:18:04 -07:00
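A minimal sketch of what a conservative_update helper could look like, assuming it only fills a key that is missing or None. The commit names the function but not its body, so these semantics are an assumption.

```python
from typing import Any, Dict


def conservative_update(obj: Dict[str, Any], key: str, value: Any) -> None:
    """Set obj[key] only if it is missing or currently None (assumed semantics)."""
    if obj.get(key) is None:
        obj[key] = value


info = {'hls_manifest_url': 'https://example.invalid/master.m3u8'}
# A fallback pass that found no manifest no longer wipes out the earlier value.
conservative_update(info, 'hls_manifest_url', None)
print(info['hls_manifest_url'])  # https://example.invalid/master.m3u8
```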
James Taylor
aa3e5aa441 Add dialog for copying urls to external player for livestreams
Also for livestreams that have ended but whose other sources
aren't present or aren't ready yet.
2020-06-28 17:52:24 -07:00
James Taylor
6e14a8547d Handle case where embedded player response is missing
Change it to extract the non-format info from the regular playerResponse.
Extract formats from the embedded player response, but fall back to
the regular one if that doesn't work.
Sometimes there is no 'player' at top_level, and the URLs are in
the regular playerResponse.
2020-06-28 13:18:54 -07:00
James Taylor
0b5d6fe1ed Do not override previous playability error if unknown 2020-06-28 12:46:04 -07:00
James Taylor
b4450ec4bb Fix previously live videos labeled as live 2020-05-29 15:34:33 -07:00
James Taylor
bdac6a2302 Fix broken signature decryption
The base.js URL format changed, so the identifier at the end
was no longer unique, and the wrong cached decryption
function was being used.

Change the identifier to be the whole URL so
this can't happen again.
2020-05-27 12:15:41 -07:00
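A sketch of caching keyed by the whole base.js URL. The names _decrypt_cache and get_decrypt_function, the loader callback, and the example player paths are all hypothetical; the point is only that two URLs ending in the same segment no longer share a cache entry.

```python
from typing import Callable, Dict

Decryptor = Callable[[str], str]

# Hypothetical cache of decryption functions, keyed by the full base.js URL.
_decrypt_cache: Dict[str, Decryptor] = {}


def get_decrypt_function(base_js_url: str,
                         load: Callable[[str], Decryptor]) -> Decryptor:
    """Return the cached function for this exact URL, calling load() only once."""
    if base_js_url not in _decrypt_cache:
        _decrypt_cache[base_js_url] = load(base_js_url)
    return _decrypt_cache[base_js_url]


# Made-up player URLs that differ only in the middle of the path.
old_url = '/s/player/aaaa1111/player_ias.vflset/en_US/base.js'
new_url = '/s/player/bbbb2222/player_ias.vflset/en_US/base.js'
# Keyed only by the last path segment, these two players would collide:
print(old_url.rsplit('/', 1)[-1] == new_url.rsplit('/', 1)[-1])  # True
```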
James Taylor
85db7e46ed Fix urls sometimes not extracted due to youtube changes
The 'cipher' parameter, which contains the URL, is now sometimes
called 'signatureCipher' instead.
2020-05-27 11:56:30 -07:00
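A sketch of reading the URL from a format under either field name, assuming the value is a URL-encoded query string carrying 'url' plus the optional 's' and 'sp' parameters. The helper name extract_cipher_params and the example data are hypothetical.

```python
from typing import Any, Dict, Optional
from urllib.parse import parse_qs


def extract_cipher_params(fmt: Dict[str, Any]) -> Optional[Dict[str, Optional[str]]]:
    """Return url/s/sp for a format, accepting either cipher field name.

    Hypothetical helper: 's' is assumed to be the encrypted signature and
    'sp' the name of the query parameter it should be written to.
    """
    if 'url' in fmt:  # unencrypted format: nothing to decipher
        return {'url': fmt['url'], 's': None, 'sp': None}
    # Check the newer name first, then the older one.
    cipher = fmt.get('signatureCipher') or fmt.get('cipher')
    if not cipher:
        return None
    params = parse_qs(cipher)  # also percent-decodes each value

    def first(name: str) -> Optional[str]:
        values = params.get(name)
        return values[0] if values else None

    return {'url': first('url'), 's': first('s'), 'sp': first('sp')}
```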
James Taylor
f1f77c4d77 Fix error getting exit node ip if format urls are None 2020-05-27 11:14:52 -07:00
James Taylor
b2f482f1fb Fix comment count & disabled extraction sometimes not working
because of an A/B test.
2020-04-10 13:57:11 -07:00
James Taylor
1224dd88a3 Fix related video extraction sometimes failing
YouTube added some pointless variation in variable names.
2020-04-10 13:09:38 -07:00
James Taylor
3e09193eaf Fix exception due to missing 'playlist' key in extracted info
This happens when there's an error on the page and nothing else
visible on it; 'playlist' wasn't set to None in that
case.
2020-04-05 17:27:43 -07:00
James Taylor
4d9d8cec6f Fix error when there's a video format with mimetype class of 'text' 2020-04-04 22:53:49 -07:00
James Taylor
5554d5afff Add playlist sidebar for videos in playlist, including autoplay 2020-04-04 22:52:09 -07:00
James Taylor
8c2b81094e yt_data_extract: fix missing variables in info for unavailable videos
'ip_address' was not set when no formats were available.
'allowed_countries' was set to None rather than [] in extract_desktop_info, which turns out to be the function used in these cases.
2020-02-17 20:15:59 -08:00
James Taylor
9f090dbbf8 Watch page: add info box with allowed countries and tor exit node
Should help with debugging various content blocks
2020-02-01 16:16:49 -08:00
James Taylor
7c2736aa26 Check for 403 errors and fallback on Invidious
403 errors on the video URLs typically happen when a video contains copyrighted content or was originally livestreamed. However, they appear not to happen (or at least to happen less frequently) when the Tor exit node uses IPv6.
2020-02-01 15:09:37 -08:00
James Taylor
e364927f83 yt_data_extract: parse mimeType field for codecs
The youtube-dl formats table doesn't have all the necessary information.
2020-02-01 14:23:50 -08:00
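A rough sketch of parsing a mimeType value into its class, container, and codec list, assuming strings shaped like video/mp4; codecs="avc1.4d401e, mp4a.40.2". The function name parse_mime_type and its output keys are hypothetical. A value without a codecs parameter, such as one of class 'text', parses with an empty codec list rather than failing.

```python
import re
from typing import Any, Dict, Optional

# Assumed shape: 'video/mp4; codecs="avc1.4d401e, mp4a.40.2"'
_MIME_RE = re.compile(r'^(\w+)/([\w.+-]+)(?:;\s*codecs="([^"]*)")?$')


def parse_mime_type(mime: str) -> Optional[Dict[str, Any]]:
    """Hypothetical parser: split a mimeType into class, container, codecs."""
    match = _MIME_RE.match(mime.strip())
    if match is None:
        return None
    mime_class, container, codecs = match.groups()
    return {
        'class': mime_class,     # e.g. 'video', 'audio', or 'text'
        'container': container,  # e.g. 'mp4' or 'webm'
        'codecs': [c.strip() for c in codecs.split(',')] if codecs else [],
    }


print(parse_mime_type('video/mp4; codecs="avc1.4d401e, mp4a.40.2"'))
```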
James Taylor
b2a1f4ecfb Fix signature decryption.
The function body regex was capturing some unrelated new code before the actual function body. Example:

`function(a){a=a.split("");var b=[function(c,d){d=(d%c.length+c.length)%c.length;c.splice(-d).reverse().forEach(function(e){return c.unshift(e)}`

If you look closely, the closing bracket doesn't match the opening one. I have added `{` to the `[^\}]+` character class, making it `[^\{\}]+`, so the capture can't run past a nested opening bracket and only captures matching brackets. Additionally, I've added `return a\.join\(""\)` to the end for good measure.
2020-01-24 14:11:59 -08:00
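The bracket issue can be reproduced with Python's re module. Both patterns below are hypothetical reconstructions (the commit only describes the character-class change and the appended `return a\.join\(""\)`), and the base_js string is made-up sample input.

```python
import re

# Hypothetical "before": the capture stops at the first '}', but can cross a '{'.
OLD_BODY_RE = re.compile(r'function\(a\)\{(a=a\.split\(""\)[^\}]+)')
# Hypothetical "after": '{' is excluded too, and the body must end in the join.
NEW_BODY_RE = re.compile(
    r'function\(a\)\{(a=a\.split\(""\)[^\{\}]+return a\.join\(""\))\}')

# Made-up sample: an unrelated function precedes the real decryption body.
base_js = (
    'var Zx=function(a){a=a.split("");var b=[function(c,d){c.push(d)}]};'
    'var Xy=function(a){a=a.split("");Qr.ab(a,3);Qr.cd(a,45);return a.join("")};'
)

print(OLD_BODY_RE.search(base_js).group(1))
# a=a.split("");var b=[function(c,d){c.push(d)
print(NEW_BODY_RE.search(base_js).group(1))
# a=a.split("");Qr.ab(a,3);Qr.cd(a,45);return a.join("")
```

The old capture ends inside the nested function with an unmatched `{`, while the new pattern fails at that position and only matches the body that ends in `return a.join("")`.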
James Taylor
113c75801a Fix playlist id extraction for radio renderers 2019-12-31 18:06:31 -08:00
James Taylor
506dbb552a Extraction: Correctly extract view_count for vids with 0 views.
Also change superfluous use of multi_get to item.get nearby
2019-12-30 16:18:38 -08:00
James Taylor
0c6a37e9aa extract_items: allow extracting items that are normally dug into for more
By first checking whether it's in item_types, rather than first checking whether it can be dug into.
For example, this allows extracting things like sectionListRenderer.
2019-12-26 19:39:48 -08:00
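A sketch of a single-loop extractor that checks item_types before digging into containers, which is what lets a container such as sectionListRenderer itself be extracted. The function name find_items and the CONTAINER_CONTENT_KEYS table are hypothetical, and nodes are assumed to be single-key dicts.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical table: container renderer name -> key holding its children.
CONTAINER_CONTENT_KEYS = {
    'sectionListRenderer': 'contents',
    'itemSectionRenderer': 'contents',
}


def find_items(renderer: Dict[str, Any],
               item_types: Iterable[str]) -> List[Dict[str, Any]]:
    """Collect nodes whose renderer name is in item_types, in document order."""
    wanted = set(item_types)
    items = []
    stack = [renderer]
    while stack:  # one loop over an explicit stack instead of nested loops
        node = stack.pop()
        if not isinstance(node, dict) or not node:
            continue
        name, body = next(iter(node.items()))
        if name in wanted:
            # Checked first, so a container type can itself be the target.
            items.append(node)
        elif name in CONTAINER_CONTENT_KEYS:
            children = body.get(CONTAINER_CONTENT_KEYS[name], [])
            stack.extend(reversed(children))  # reversed so pops keep order
    return items


page = {'sectionListRenderer': {'contents': [
    {'itemSectionRenderer': {'contents': [
        {'videoRenderer': {'videoId': 'a'}},
        {'videoRenderer': {'videoId': 'b'}},
    ]}},
]}}
```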
James Taylor
8e8a1b70b6 yt_data_extract: Split up extract_items so renderer extraction works independently
extract_items_from_renderer will extract given just a renderer rather than a response
2019-12-26 19:02:13 -08:00
James Taylor
b027f66738 yt_data_extract.common: Simplify usage of get functions and remove dead code
Change usage of multi_deep_get to multi_get where possible
Remove checking of type from calls to get functions (YouTube is very unlikely to suddenly change a value's type without also renaming the variable, and the checks take up unnecessary space)
Remove all default=None arguments from get functions, since those are superfluous.
Remove list_types constant since it's no longer in use.
2019-12-26 18:49:04 -08:00
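Hypothetical reimplementations of the get helpers mentioned above, assuming multi_get returns the first non-None value among several keys and multi_deep_get does the same over several key paths. Since default=None is already the default in this sketch, passing it explicitly would be superfluous.

```python
from typing import Any, Sequence


def multi_get(obj: dict, *keys: Any, default: Any = None) -> Any:
    """Return obj[key] for the first key whose value is present and not None."""
    for key in keys:
        value = obj.get(key)
        if value is not None:
            return value
    return default


def deep_get(obj: Any, *path: Any, default: Any = None) -> Any:
    """Follow dict keys / list indices in path; return default on any miss."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


def multi_deep_get(obj: Any, *paths: Sequence[Any], default: Any = None) -> Any:
    """Return the first non-None deep_get result among several paths."""
    for path in paths:
        value = deep_get(obj, *path)
        if value is not None:
            return value
    return default
```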
James Taylor
c7edea0848 yt_data_extract: Simplify extract_items so it needs only 1 while loop 2019-12-26 18:38:18 -08:00
James Taylor
f706689a56 extract_item_info: Don't extract author, author_id, etc. for channel items
Philosophically, a channel doesn't create itself.
2019-12-24 13:11:21 -08:00
James Taylor
3200d66d88 Fix extract_approx_int not working for non-approx ints, make extract_int more robust
For example, "354 subscribers" wasn't being extracted correctly by extract_approx_int.
Make extract_approx_int and extract_int only extract integers that are whole words.
So, e.g., 342 will not be extracted from internetuser342.
2019-12-24 13:07:12 -08:00
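A sketch of whole-word integer extraction using regex word boundaries (`\b`). The pattern is an assumption for illustration, not the project's; it also accepts thousands separators such as 1,234.

```python
import re
from typing import Optional

# \b on both sides: the digits (with optional thousands commas) must form
# a whole word, so they can't be the tail of a word like "internetuser342".
_INT_WORD_RE = re.compile(r'\b\d+(?:,\d{3})*\b')


def extract_int(text: str) -> Optional[int]:
    """Hypothetical sketch: return the first whole-word integer, or None."""
    match = _INT_WORD_RE.search(text)
    if match is None:
        return None
    return int(match.group(0).replace(',', ''))


print(extract_int('354 subscribers'))   # 354
print(extract_int('internetuser342'))   # None
```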
James Taylor
9737ffcf82 Regression: Fix channel extraction 'items' key not present when there are no items.
Examples: Empty channels, no search results
2019-12-23 15:07:03 -08:00
James Taylor
777ed756dc Channel: Change search results to use next and previous page buttons
YouTube doesn't give the number of search results, so the previous behavior would give an error if an out-of-range page number was selected.
2019-12-23 14:39:59 -08:00
James Taylor
7a6bcb6128 Rewrite channel extraction with proper error handling and new extraction names. Extract subscriber_count correctly.
Don't just shove English strings into info['stats']. Actually give the stats semantic names.
2019-12-21 15:45:01 -08:00
James Taylor
3936310e7e Fix extract_approx_int. Fixes incorrect subscriber count on channels.
It wasn't working because decimals such as 15.1M weren't considered, so it was extracting "1M"
2019-12-21 15:44:03 -08:00
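A sketch of approximate-integer parsing that keeps the decimal part, assuming English-style K/M/B suffixes and optional thousands commas. The regex, the multiplier table, and this function body are assumptions for illustration.

```python
import re
from decimal import Decimal
from typing import Optional

_MULTIPLIERS = {'': 1, 'K': 10**3, 'M': 10**6, 'B': 10**9}
# Whole-word number with optional thousands commas, an optional decimal part
# captured together with the integer part, and an optional K/M/B suffix.
_APPROX_RE = re.compile(r'\b(\d+(?:,\d{3})*(?:\.\d+)?)([KMB]?)\b')


def extract_approx_int(text: str) -> Optional[int]:
    """Hypothetical sketch: '15.1M subscribers' -> 15100000."""
    match = _APPROX_RE.search(text)
    if match is None:
        return None
    number, suffix = match.groups()
    # Decimal keeps 15.1 * 10**6 exact instead of relying on float rounding.
    return int(Decimal(number.replace(',', '')) * _MULTIPLIERS[suffix])


print(extract_approx_int('15.1M subscribers'))  # 15100000
print(extract_approx_int('354 subscribers'))    # 354
```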
James Taylor
a9f67d4630 Fix regression: date extraction broken. Move constants to correct file in yt_data_extract 2019-12-20 18:48:40 -08:00
James Taylor
6b7a1212e3 Extraction: Move non-stateful signature decryption functionality into yt_data_extract 2019-12-19 21:28:21 -08:00
James Taylor
4a3529df95 Extraction: Move stuff around in files and put underscores in front of internal helper function names
Move get_captions_url in watch_extraction to bottom next to other exported, public functions
2019-12-19 20:12:37 -08:00
James Taylor
d1d908d5b1 Extraction: Move html post processing stuff from yt_data_extract to util 2019-12-19 19:48:53 -08:00
James Taylor
76376b29a0 Extraction: Split yt_data_extract.py into multiple files 2019-12-19 19:29:47 -08:00