Find Links in Modern Page

Summary

The script takes a CSV file containing the URLs of SharePoint sites and analyses each site's pages for hyperlinks. Every hyperlink found on a page is written as a row to an output CSV file that is delimited by a pipe character.
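As a rough illustration of the expected input, the sketch below writes a small site list to a temporary file and reads it back with `Import-Csv -Header`, which is how the script loads its sites. The file name, variable names and contoso URLs are made up for the example. Because the column name is supplied through `-Header`, the file is assumed to contain no header row of its own.

```powershell
# Hypothetical input file: one site URL per line, no header row
# (the contoso URLs are placeholders, not real sites)
$sample_path = Join-Path ([System.IO.Path]::GetTempPath()) 'sites-sample.csv'
@(
    'https://contoso.sharepoint.com/sites/hr'
    'https://contoso.sharepoint.com/sites/finance'
) | Set-Content -Path $sample_path

# -Header names the single column, so every line in the file is treated as data
$sample_sites = Import-Csv -Path $sample_path -Header 'Url'

# List the URLs that would be processed
$sample_sites | ForEach-Object { $_.Url }
```

If the file did start with a `Url` header line, that line would be read as a site URL as well, so it is safer to leave it out.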

The script reads a list of SharePoint sites from a CSV file, connects to each site and retrieves the pages in the Site Pages library. For each page whose content contains an anchor tag, it captures the 'Title' field value (from the page metadata) together with every anchor tag carrying an href attribute, found in the page content using regex matching. It then appends these details, along with the site title and page URL, to a new CSV file whose name begins with the current date and time.

Note: The paragraph above was written with AI assistance.
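To make the regex step concrete, here is a small sketch that runs an anchor pattern modelled on the script's `$regex` against a made-up HTML fragment. The sample markup, URLs and variable names are illustrative assumptions; real `CanvasContent1` values are larger and may be structured differently.

```powershell
# Made-up page content containing two anchors (placeholder URLs)
$sample_html = '<p>See <a href="https://contoso.com/a">A</a> and ' +
               '<a class="x" href="https://contoso.com/b" target="_blank">B</a></p>'

# Anchor pattern modelled on the script's $regex: group 1 is the quote
# character, group 2 is the href value, [^>]* allows trailing attributes
$sample_regex = '<a\s+(?:[^>]*?\s+)?href=(["''])(.*?)\1[^>]*>'

# -AllMatches returns every anchor in the string, not just the first one
$found = ($sample_html | Select-String -Pattern $sample_regex -AllMatches).Matches

# Value is the whole opening tag; Groups[2] holds just the URL
$tags = $found | ForEach-Object { $_.Value }
$urls = $found | ForEach-Object { $_.Groups[2].Value }
$urls
```

The script writes the whole matched tag to the output file. Reading `Groups[2]` instead is one way to keep only the URL if that is all that is needed.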

  • PnP PowerShell

# The script runs against each site listed in this file
$csv_SiteList = "sites-test.csv"
$csv_siteheaders = 'Url'

# Timestamp used in the output file name (MM = month, HH = 24-hour clock, mm = minutes)
$date = Get-Date
$date = $date.ToString("yyyyMMddHHmmss")

# Output file name, prefixed with the timestamp
$file_name = $date + 'LinkMatches.csv'

# Path in which to create the output file
$creation_path = Get-Location

# The site pages list that this script will run against
$List = "SitePages"

# Headers for the output csv
$headers = "Site Title|Page Title|Page Url|Href Tag"

# new line character
$ofs = "`n"

# delimiter to use
$delim = '|'

# regex used to match the href tags that are embedded in the canvas page content
$regex = '<a\s+(?:[^>]*?\s+)?href=(["''])(.*?)\1[^>]*>'

# Load the site list; -Header supplies the column name, so the file should not contain a header row
$sites = Import-Csv -Path $csv_SiteList -Header $csv_siteheaders

# Header line for the output file
$csv_outputheader = $headers + $ofs

# Complete path of the output file
$csv_path = Join-Path $creation_path $file_name

# create output csv
New-Item -Path $creation_path -Name $file_name -ItemType File -Value $csv_outputheader

# iterate over each site from the csv
foreach($site in $sites)
{
    # make the connection, get some site information and create an object that contains all the site pages
    $connection = Connect-PnPOnline -Url $site.Url -Interactive
    $pnpsite = Get-PnPWeb -Connection $connection
    $site_title = $pnpsite.Title
    $pages = (Get-PnPListItem -List $List -Fields "CanvasContent1", "Title", "FileRef" -Connection $connection).FieldValues

    # iterate over each page in the site to collect the values used to build each row and
    # check whether the canvas content has any embedded href tags
    foreach($page in $pages)
    {
        $page_title = $page.Get_Item("Title")
        $fileref = $page.Get_Item("FileRef")
        $canvascontent = $page.Get_Item("CanvasContent1")
        # check if the canvas has content 
        if ($canvascontent.Length -gt 0) 
        {
            # array of the anchor tags that match the href regular expression
            $hrefmatches = ($canvascontent | select-string -pattern $regex -AllMatches).Matches.Value

            # iterate over each regular expression match and write it to the pipe-delimited output csv
            foreach($hrefmatch in $hrefmatches)
            {
                $row = $site_title + $delim + $page_title + $delim + $fileref + $delim + $hrefmatch
                Add-Content -Path $csv_path -Value $row
            }
        }
    }
    Disconnect-PnPOnline
}
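The output file is written with plain string concatenation rather than a CSV writer, so fields are not quoted or escaped. As a hedged sketch of one way to read it back, the example below splits each row into at most four parts, so that a pipe inside the href tag stays in the last column. The sample rows, property names and URLs are invented for illustration; a pipe inside a site or page title would still shift the columns.

```powershell
# Made-up output lines in the shape the script writes (placeholder values)
$sample_lines = @(
    'Site Title|Page Title|Page Url|Href Tag'
    'HR|Welcome|/sites/hr/SitePages/Welcome.aspx|<a href="https://contoso.com/a">'
    'HR|Policies|/sites/hr/SitePages/Policies.aspx|<a href="https://contoso.com/b?x=1|2">'
)

# Skip the header, then split each row on the pipe with a limit of 4 parts
$rows = $sample_lines | Select-Object -Skip 1 | ForEach-Object {
    $parts = $_ -split '\|', 4
    [pscustomobject]@{
        SiteTitle = $parts[0]
        PageTitle = $parts[1]
        PageUrl   = $parts[2]
        HrefTag   = $parts[3]
    }
}

# Show which tag was found on which page
$rows | Select-Object PageTitle, HrefTag
```

To process a real output file, `$sample_lines` could be replaced with the result of `Get-Content` on that file.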

Check out PnP PowerShell to learn more at: https://aka.ms/pnp/powershell

The way you log in to PnP PowerShell has changed. Please read "PnP Management Shell EntraID app is deleted : what should I do ?"

Source

This script was first created on PnP PowerShell and transferred over in Dec 2024. Details of the original author are missing. Report if inaccurate. https://github.com/pnp/powershell

Contributors

Author(s)
Paul Bullock

Disclaimer

THESE SAMPLES ARE PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR NON-INFRINGEMENT.
