34 Commits

Author SHA1 Message Date
f3469b1ff4
Revert "Usage hqdefault thumbnail in related videos"
This reverts commit a0c3ca0159136d17eefa129176ae1904110238b8.
2021-09-14 16:35:04 -05:00
a0c3ca0159
Usage hqdefault thumbnail in related videos 2021-09-14 15:58:13 -05:00
James Taylor
7c79f530a5
Support more audio and video qualities
Adds support for AV1-encoded videos, which includes any videos
above 1080p. These weren't getting included because they did
not have a quality entry in the format table at the top of
watch_extraction.py. So get the quality from the quality
labels of the format if it's not there.

Because YouTube often includes BOTH AV1 and H.264 (AVC) for each
quality, after these are included, there will be way too many
quality options and the code needs to choose which one to use.
The choice is somewhat hard: AV1 is encoded in fewer bytes than
H.264 and is patent-free; however, it has less hardware support,
so it might be more difficult to play. For instance, on my system,
AV1 does not work at 1080p, but H.264 does. Adds a setting for
which codec to prefer, set to H.264 by default.

Also adds support for the lower-quality mp4 audio format, which
now gets used at 144p to save network bandwidth. For similar
reasons, this was not getting included because it did not
have an audio_bitrate entry in the table. Prefer bitrate
instead for the quality.

Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-08-31 16:40:19 -05:00
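
A minimal sketch of the selection described in the commit above, not the project's actual code. The names `quality_from_label`, `pick_sources`, `prefer_av1`, and the `quality`/`vcodec` dictionary keys are assumptions made for illustration. It derives a missing quality from a label such as "2160p60" and keeps one video format per quality, preferring H.264 unless told otherwise.

```python
import re


def quality_from_label(label):
    """Return the vertical resolution from a label like '2160p60', or None."""
    match = re.match(r'(\d+)p', label or '')
    return int(match.group(1)) if match else None


def pick_sources(formats, prefer_av1=False):
    """Keep one format per quality, breaking ties by codec preference.

    Each format is a dict with hypothetical keys 'quality' (int) and
    'vcodec' (codec string such as 'avc1.640028' or 'av01.0.08M.08').
    """
    preferred = 'av01' if prefer_av1 else 'avc1'
    chosen = {}
    for fmt in formats:
        current = chosen.get(fmt['quality'])
        # Take the first format seen for a quality, then replace it only
        # if the new one uses the preferred codec and the old one does not.
        if current is None or (
            fmt['vcodec'].startswith(preferred)
            and not current['vcodec'].startswith(preferred)
        ):
            chosen[fmt['quality']] = fmt
    return [chosen[quality] for quality in sorted(chosen)]
```

With the default setting, a quality offered in both codecs resolves to H.264, while a quality offered only in AV1 (for example anything above 1080p) is still included.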
James Taylor
4e556efa3d
Fix comments extraction due to new response continuation key name
Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-08-23 18:40:52 -05:00
James Taylor
40fcee52c0
Fix description extraction in search results
Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-08-09 12:29:01 -05:00
James Taylor
2039972ab3
Fix (dis)like, music list extraction due to YouTube changes (again)
YouTube reverted the changes they made that prompted f9f5d5ba.

In case they change their minds again, this adds support for both
formats.

The liberal_update and conservative_update functions needed to be
modified to handle the cases of empty lists, so that
a successfully extracted 'music_list': [{'Author':...},...] will
not be overwritten by 'music_list': [] in the calls to
liberal_dict_update.

Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-08-09 12:13:52 -05:00
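
A hedged sketch of the empty-list handling described in the commit above. The function names come from the commit message, but the signatures and the exact notion of "empty" (here: None or an empty list) are assumptions; the real implementation may differ.

```python
def _is_empty(value):
    """Treat None and [] as 'nothing was extracted' (an assumption)."""
    return value is None or value == []


def liberal_update(obj, key, new_value):
    """Overwrite obj[key] whenever new_value carries actual data."""
    if not _is_empty(new_value):
        obj[key] = new_value


def conservative_update(obj, key, new_value):
    """Set obj[key] only if nothing useful is stored there yet."""
    if _is_empty(obj.get(key)):
        obj[key] = new_value


def liberal_dict_update(dict1, dict2):
    """Apply liberal_update for every key/value pair in dict2."""
    for key, value in dict2.items():
        liberal_update(dict1, key, value)
```

Under these assumptions, a previously extracted 'music_list' survives a later update that carries 'music_list': [], while other keys from the later update are still merged in.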
James Taylor
3dee7ea0d1
Switch to new comments api now that old one is being disabled
watch_comment api periodically gives the error "Top level
comments mweb servlet is turned down."

The continuation items for the new api are in a different
arrangement in the json, so changes were necessary to the
extract_items function.

Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-08-09 12:10:42 -05:00
James Taylor
54b39f1303
Fix missing likes, dislikes, & music list due to Youtube changes
Also moves some microformat extraction from
_extract_watch_info_mobile to extract_watch_info where it belongs.
_extract_watch_info_mobile is really only for stuff visible on the
page, and thus specialized for either mobile or desktop.

Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-07-28 23:47:41 -05:00
7fd2c3474f
Capitalize name app 2021-06-10 16:41:45 -05:00
James Taylor
f0cd170767
Fix videos added to playlist from channel page not having author
Information from additional_info was being overridden with None.

Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-05-17 22:02:03 -05:00
James Taylor
e549b5f67c
Channel: Allow going to next pages of playlists page
Uses previous and next buttons. It is now possible to view more
than just the first page of the playlists page.

Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-03-15 22:22:15 -05:00
James Taylor
2df4238924
Use new channel api endpoint now that browse_ajax is disabled
Fixes channel pages > 1

Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-03-03 10:40:02 -05:00
James Taylor
1cc0ffcb20
yt_data_ext: support richGrid&richItem sometimes used on search
Some searches have these renderers instead of the usual ones

Signed-off-by: Jesús <heckyel@hyperbola.info>
2021-02-13 17:29:05 -05:00
James Taylor
6b6a6653a0
Fix youtube mixes
They cannot be viewed on their own, so change url in items to
go to the video+playlist instead

Signed-off-by: Jesús <heckyel@hyperbola.info>
2020-12-18 23:39:25 -05:00
zrose584
a27b575380 remove trailing whitespaces 2020-10-21 10:35:01 +02:00
James Taylor
75e8930958 yt_data_extract: normalize thumbnail and author urls
for instance, urls that start with // become https://

adjustment required in comments.py because the url was left as a
relative url in yt_data_extract by mistake and was using URL_ORIGIN
prefix as fix.

see #31
2020-10-19 12:55:03 -07:00
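
A small sketch of the normalization described in the commit above, covering only the case the message names (URLs starting with //). The function name `normalize_url` is hypothetical.

```python
def normalize_url(url):
    """Turn a protocol-relative URL ('//host/path') into an https URL.

    Other values, including None, are returned unchanged.
    """
    if url is None:
        return None
    if url.startswith('//'):
        return 'https:' + url
    return url
```

Normalizing once at extraction time means callers such as comments.py do not need to add their own prefix afterwards.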
James Taylor
4bedf55461 yt_data_extract: Fix time_published picking up 'Streaming' string
This was causing an exception in subscriptions when it tried
to estimate the unix timestamp for the upload time
2020-08-12 14:40:47 -07:00
James Taylor
fa61874f97 extract_items: Handle case where continuation has multiple
[something]Continuation renderers, all of which are junk
except one. Check the items in each one until the one which
contains the items being sought is found.
The usage in extract_comments_info needed to be changed to
specify the items being sought. It was unspecified before which
is strictly incorrect since extract_items by default looks for
video/playlist/channel thumbnail items. It was relying on this
special case for continuations. But now that wouldn't work
anymore.
2020-08-11 19:59:25 -07:00
James Taylor
1224dd88a3 Fix related video extraction sometimes failing
Youtube added some pointless variation in variable names
2020-04-10 13:09:38 -07:00
James Taylor
5554d5afff Add playlist sidebar for videos in playlist, including autoplay 2020-04-04 22:52:09 -07:00
James Taylor
113c75801a Fix playlist id extraction for radio renderers 2019-12-31 18:06:31 -08:00
James Taylor
506dbb552a Extraction: Correctly extract view_count for vids with 0 views.
Also change superfluous use of multi_get to item.get nearby
2019-12-30 16:18:38 -08:00
James Taylor
0c6a37e9aa extract_items: allow extracting items that are normally dug into for more
By checking first if it's in item_types rather than checking if it can be dug into first.
For example: this allows extracting things like sectionListRenderer
2019-12-26 19:39:48 -08:00
James Taylor
8e8a1b70b6 yt_data_extract: Split up extract_items so renderer extraction works independently
extract_items_from_renderer will extract given just a renderer rather than a response
2019-12-26 19:02:13 -08:00
James Taylor
b027f66738 yt_data_extract.common: Simplify usage of get functions and remove dead code
Change usage of multi_deep_get to multi_get where possible
Remove checking of type from calls to get functions (because it's very unlikely YouTube suddenly changes the type without changing the name of the variable or anything, and it takes up unnecessary space)
Remove all default=None arguments from get functions, since those are superfluous.
Remove list_types constant since it's no longer in use.
2019-12-26 18:49:04 -08:00
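
For readers unfamiliar with the helpers named in the commit above, here is a rough sketch of what such get functions could look like. The names `multi_get` and `multi_deep_get` come from the commit message; the signatures, the `_follow_path` helper, and the behaviour shown are assumptions, not the project's actual code.

```python
_MISSING = object()  # sentinel so that stored None/0 values are still returned


def _follow_path(obj, path):
    """Hypothetical helper: walk nested dicts/lists along a key path."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return _MISSING
    return obj


def multi_get(obj, *keys, default=None):
    """Return the value for the first key that is present in obj."""
    for key in keys:
        value = _follow_path(obj, (key,))
        if value is not _MISSING:
            return value
    return default


def multi_deep_get(obj, *paths, default=None):
    """Return the value at the first nested path that exists in obj."""
    for path in paths:
        value = _follow_path(obj, path)
        if value is not _MISSING:
            return value
    return default
```

Because `default` already falls back to None in this sketch, passing default=None explicitly would add nothing, which matches the cleanup the commit describes.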
James Taylor
c7edea0848 yt_data_extract: Simplify extract_items so it needs only 1 while loop 2019-12-26 18:38:18 -08:00
James Taylor
f706689a56 extract_item_info: Don't extract author, author_id, etc. for channel items
Philosophically, a channel doesn't create itself.
2019-12-24 13:11:21 -08:00
James Taylor
3200d66d88 Fix extract_approx_int not working for non-approx ints, make extract_int more robust
For example, "354 subscribers" wasn't being extracted correctly by extract_approx_int.
Make extract_approx_int and extract_int only extract integers that are words.
So e.g. 342 will not be extracted from internetuser342
2019-12-24 13:07:12 -08:00
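
A sketch of the whole-word matching described in the commit above, using regular-expression word boundaries. The function name comes from the commit message; the signature and the comma handling are assumptions.

```python
import re


def extract_int(string, default=None):
    """Extract the first integer that stands alone as a word.

    Digits embedded in a larger word (e.g. 'internetuser342') are ignored.
    Thousands separators such as '1,234' are accepted (an assumption).
    """
    if not isinstance(string, str):
        return default
    # \b on both sides requires the number not to touch letters or digits.
    match = re.search(r'\b(\d[\d,]*)\b', string)
    if match is None:
        return default
    return int(match.group(1).replace(',', ''))
```

The leading \b fails in front of "342" in "internetuser342" because both neighbouring characters are word characters, so nothing is extracted from usernames.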
James Taylor
7a6bcb6128 Rewrite channel extraction with proper error handling and new extraction names. Extract subscriber_count correctly.
Don't just shove English strings into info['stats']. Actually give semantic names for the stats.
2019-12-21 15:45:01 -08:00
James Taylor
3936310e7e Fix extract_approx_int. Fixes incorrect subscriber count on channels.
It wasn't working because decimals such as 15.1M weren't considered, so it was extracting "1M"
2019-12-21 15:44:03 -08:00
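
A sketch of the decimal handling described in the commit above. The function name comes from the commit message; the suffix table, the signature, and the use of Decimal to avoid floating-point rounding are assumptions.

```python
import re
from decimal import Decimal

# Hypothetical suffix table; the real code may support other suffixes.
_MULTIPLIERS = {'k': 10**3, 'm': 10**6, 'b': 10**9}


def extract_approx_int(string):
    """Extract numbers such as '15.1M' or '354' as an int, else None."""
    if not isinstance(string, str):
        return None
    # The optional '.digits' part is what keeps '15.1M' from being
    # read as just '1M'.
    match = re.search(
        r'\b(\d+(?:\.\d+)?)([kmb])?\b',
        string.replace(',', ''),
        re.IGNORECASE,
    )
    if match is None:
        return None
    multiplier = _MULTIPLIERS[match.group(2).lower()] if match.group(2) else 1
    return int(Decimal(match.group(1)) * multiplier)
```

Plain counts without a suffix pass through the same pattern with a multiplier of 1.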
James Taylor
a9f67d4630 Fix regression: date extraction broken. Move constants to correct file in yt_data_extract 2019-12-20 18:48:40 -08:00
James Taylor
4a3529df95 Extraction: Move stuff around in files and put underscores in front of internal helper function names
Move get_captions_url in watch_extraction to bottom next to other exported, public functions
2019-12-19 20:12:37 -08:00
James Taylor
d1d908d5b1 Extraction: Move html post processing stuff from yt_data_extract to util 2019-12-19 19:48:53 -08:00
James Taylor
76376b29a0 Extraction: Split yt_data_extract.py into multiple files 2019-12-19 19:29:47 -08:00