Search in PDF
Using a custom Python script, I validated the speculation that the target web server hosts additional PDF files that follow the same naming convention. While this progress brings me closer to gathering further information about the target domain, manually inspecting each of the 82 additional PDF files would be exceedingly time-consuming. The following custom Python script resolves that issue.
import argparse
import os

import PyPDF2

# Create an argument parser
parser = argparse.ArgumentParser(description='Search for a keyword in PDF files.')
# Add keyword and directory arguments
parser.add_argument('-k', '--keyword', required=True, help='Keyword to search for')
parser.add_argument('-d', '--directory', required=True, help='Working directory')
# Parse the command-line arguments
args = parser.parse_args()
# Extract keyword and directory from the arguments
keyword = args.keyword
directory = args.directory

# Walk the directory tree and search every PDF file found
for foldername, subfolders, files in os.walk(directory):
    for file in files:
        if file.endswith(".pdf"):
            file_path = os.path.join(foldername, file)
            with open(file_path, "rb") as f:
                # Legacy PyPDF2 (< 3.0) reader API
                pdf = PyPDF2.PdfFileReader(f)
                for page_num in range(pdf.numPages):
                    page = pdf.getPage(page_num)
                    text = page.extractText()
                    lines = text.split("\n")
                    for i, line in enumerate(lines):
                        # Case-insensitive match; print one line of context on each side
                        if keyword.lower() in line.lower():
                            print(f"{file_path}:\n{lines[max(0, i-1)]}\n{line}\n{lines[min(len(lines)-1, i+1)]}\n")
This Python script searches for a user-provided keyword within text-based PDF files found in a specified directory and its subdirectories. It extracts and prints each line containing the keyword, along with the lines before and after it, to provide context. The script uses the PyPDF2 library to extract the text of each page.
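The context step (a matching line plus its neighbours) does not depend on PDF parsing, so it can be checked in isolation. The following is a minimal sketch and not part of the original script; the function name `find_with_context` and the `context` parameter are hypothetical. Unlike the script, it clamps the window by slicing, so a match on the first or last line of a page does not repeat that line.

```python
def find_with_context(lines, keyword, context=1):
    """Return (index, window) for every line that contains keyword.

    The match is case-insensitive. window holds the matching line plus up to
    `context` lines before and after it, clamped to the bounds of `lines`.
    """
    needle = keyword.lower()
    matches = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            matches.append((i, lines[start:end]))
    return matches


if __name__ == "__main__":
    # Sample lines taken from the search output in this writeup
    sample = [
        "Welcome to Intelligence Corp!",
        "please login using your username and the default password of:",
        "NewIntelligenceCorpUser9876",
        "After logging in please change your password as soon as possible.",
    ]
    for index, window in find_with_context(sample, "user"):
        print(f"match on line {index}:")
        print("\n".join(window))
        print()
```

With the sample lines, the keyword user matches twice: once in "username" and once in the password string itself, which is why a single PDF file can appear more than once in the results.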
┌──(kali㉿kali)-[~/…/htb/labs/intelligence/pdf]
└─$ python3 search_in_pdf.py -k user -d .
./2020-06-04-upload.pdf:
Welcome to Intelligence Corp!
please login using your username and the default password of:
NewIntelligenceCorpUser9876
./2020-06-04-upload.pdf:
please login using your username and the default password of:
NewIntelligenceCorpUser9876
After logging in please change your password as soon as possible.
I ran the Python script with the keyword user, and it returned a single PDF file: 2020-06-04-upload.pdf
The 2020-06-04-upload.pdf file contains the default password: NewIntelligenceCorpUser9876
I will double-check it by manually inspecting the file.
2020-06-04-upload.pdf
┌──(kali㉿kali)-[~/…/htb/labs/intelligence/pdf]
└─$ open 2020-06-04-upload.pdf
It does indeed contain the default credential for the domain: NewIntelligenceCorpUser9876
I will perform a password spray with the default password to see whether it belongs to any account.