JavaScript Regex: Extract URL and Dimension from a srcset Attribute String
The objective of this question is to extract the URL and dimension of each entry in a srcset
HTML attribute string. Specifically, the parameters are the following:
- URL starts with http or https
- URL may contain ,
- URL cannot contain spaces
- Dimension contains digits followed by x or w. Potentially doesn't even need to be followed by either of those though.
Because of this, the desired method for matching is to find the http/https and match until a space, then match digits immediately followed by a w or x, then a comma. A space following this would denote the end of the match.
This usually looks like https://url.com 650w or https://url.com 650 or https://url.com 650x. There is no strict standard here.
Here is my attempted regex. The problem is that it's not grouping correctly:
(https?:\/\/(?:.*(?:\s+\d+[wx])(?:,\s*)?)+)
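One likely reason for the grouping problem is that .* is greedy: on a single-line string it runs to the end and only backtracks as far as the last dimension, so every entry lands in one match. A minimal sketch of that behaviour, assuming a shortened, made-up input (the example.com URLs and the variable names are placeholders, not part of the real data):

```javascript
// The attempted pattern from above, with the global flag added so matchAll can be used.
const attempted = /(https?:\/\/(?:.*(?:\s+\d+[wx])(?:,\s*)?)+)/g;

// Shortened, made-up input with two entries (placeholder URLs).
const twoEntries = "https://example.com/a.jpg 640w, https://example.com/b.jpg 750w";

// Collect the full text of every match.
const attemptedMatches = Array.from(twoEntries.matchAll(attempted), (m) => m[0]);

console.log(attemptedMatches.length);            // 1
console.log(attemptedMatches[0] === twoEntries); // true: both entries are in a single match
```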
Sample string to parse:
http://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 640w, http://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 750w, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 828, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1080x, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1200w, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1920w, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 2048w, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 3840w,https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 3840w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=100&q=60 100w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=200&q=60 200w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=300&q=60 300w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=400&q=60 400w, 
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=500&q=60 500w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=600&q=60 600w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=700&q=60 700w
The outcome of this should be:
http://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 640w
http://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 750w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 828
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1080x
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1200w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1920w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 2048w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 3840w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 3840w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=100&q=60 100w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=200&q=60 200w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=300&q=60 300w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=400&q=60 400w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=500&q=60 500w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=600&q=60 600w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw%3D&auto=format&fit=crop&w=700&q=60 700w
To satisfy the 4 points in the question and get the expected outcome for the example string, you can use:
https?:\/\/\S* \d+[xw]?(?=,|$)
The pattern matches:
- https?:\/\/ Match the protocol for http and https
- \S* Match optional non-whitespace chars (can contain a comma) and then a space
- \d+[xw]? Match 1+ digits and optional x or w
- (?=,|$) Positive lookahead, assert either a , or the end of the string to the right
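If the URL and the dimension are needed as separate values, the same pattern can be wrapped in capture groups. The following is only a sketch under a few assumptions: the named groups url and size, the helper parseSrcset, and the shortened example.com input are all made up for illustration, and the runtime is assumed to support String.prototype.matchAll and named capture groups.

```javascript
// The pattern from above, with two named groups added (group names are illustrative).
const SRCSET_ENTRY = /(?<url>https?:\/\/\S*) (?<size>\d+[xw]?)(?=,|$)/g;

// Hypothetical helper: returns one { url, size } object per srcset entry.
function parseSrcset(srcset) {
  return Array.from(srcset.matchAll(SRCSET_ENTRY), (m) => ({
    url: m.groups.url,
    size: m.groups.size,
  }));
}

// Shortened, made-up srcset value in the same shape as the sample string:
// a comma inside a URL, a bare number, an x suffix, and a comma with no space after it.
const sample =
  "http://example.com/f_auto,w_600/a.jpg 640w, https://example.com/a.jpg 828, " +
  "https://example.com/a.jpg 1080x,https://example.com/b.jpg?w=700&q=60 700w";

const entries = parseSrcset(sample);
console.log(entries.length);  // 4
console.log(entries[0].url);  // http://example.com/f_auto,w_600/a.jpg
console.log(entries[0].size); // 640w
```

Note that without the m flag, $ only matches at the very end of the string, so every entry except the last one has to be followed directly by a comma.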