This is kind of picking up where the last “Thoughts on Python” post left off. One of the things I’ve learned over the last few weeks playing with Python is some new lexicon: things like “projects” to mean programs, and linters, tools you can hook into your workflow to call out mistakes in the code so you can fix them.
Anyway, besides the quick and dirty proxy I listed last time, I have two other projects I’m working on.
One takes a list of domain names that currently don’t have web pages, or have parked pages, and checks to see if they have changed to active. There are several ways to do this, but the methods I found didn’t work for me. A couple of the domains are parked at GoDaddy, which seems to have several different pages hosting the parked page, so different data comes back each time the page is visited. That means the simple way of using cURL and a hash doesn’t look like it will work. I’m thinking Requests with BeautifulSoup and .find() instead.
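Here’s a rough sketch of the direction I mean, assuming the parked pages share some recognizable marker text in their HTML; the marker string, file name, and URLs are just placeholders:

# Rough sketch of the parked-page check. Assumes the parked pages share
# some recognizable marker text -- PARKED_MARKER is a placeholder that
# would come from actually inspecting the pages.
import requests
from bs4 import BeautifulSoup

PARKED_MARKER = "This domain is parked"  # placeholder marker text

def looks_active(domain):
    try:
        resp = requests.get(f"http://{domain}", timeout=10)
    except requests.RequestException:
        return False  # no page at all
    soup = BeautifulSoup(resp.text, "html.parser")
    # .find() returns None when the marker text isn't anywhere in the page
    marker = soup.find(string=lambda s: s and PARKED_MARKER in s)
    return marker is None

with open("domains.txt") as f:  # placeholder file of one domain per line
    for domain in (line.strip() for line in f if line.strip()):
        print(domain, "active" if looks_active(domain) else "parked/down")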
The second uses data pulled from a Shodan search and looks up context for me in an internal system at work. This is the one I learned the most from over the last week, mainly because it has changed several times. I’ve learned some web scraping tricks, mainly from Ryan Mitchell’s book Web Scraping with Python, second edition (Amazon affiliate link). I really want to work through the book from cover to cover, but for now it’s mainly a reference guide.
During this project, from WSwP2e, I learned how to use Sessions from Requests to capture authentication cookies and replay them during a session while scraping a website for data. I learned how to use BeautifulSoup’s .get_text() to print only the data I needed. Outside of the book, I learned how to drill down to the right part of the table. I also learned about the getpass module, which lets a user type in their password without revealing it on the screen or in the shell history file.
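Roughly, those pieces fit together something like this; the URLs, form field names, and table id are made up for the sketch, since the real internal system is different:

# Minimal sketch of the Session / getpass / get_text() pieces together.
# The URLs, form fields, and table id are placeholders.
import getpass
import requests
from bs4 import BeautifulSoup

username = input("Username: ")
password = getpass.getpass("Password: ")  # not echoed to the screen or saved in history

session = requests.Session()
# Logging in through the Session stores the auth cookies, and they get
# replayed automatically on every later request in the same session.
session.post("https://internal.example.com/login",
             data={"user": username, "pass": password})

resp = session.get("https://internal.example.com/report")
soup = BeautifulSoup(resp.text, "html.parser")

# Drill down to the table I need, then pull just the text out of each cell
table = soup.find("table", {"id": "results"})
if table:
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            print(cells)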
After I got that all figured out and written, with “with open()” and some testing on the table results to get past out-of-range problems, I found out there was an API option. So I can get the same data from a single URL in JSON. That will make getting the data easier, since it will come back as JSON keys (dictionary-like) instead of rows in a table.
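The API version should only take something like this; the URL and key names are placeholders, not the real API:

# Sketch of the API approach -- one request, JSON back, no table parsing.
# The URL and the key names are placeholders for the real internal API.
import requests

resp = requests.get("https://internal.example.com/api/context/192.0.2.1", timeout=10)
resp.raise_for_status()
data = resp.json()  # a dict, so values come out by key instead of by row and column

print(data["hostname"], data["owner"])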
So the code is a mess right now, written with the old scraping approach and the API calls mixed in. I’m waiting for the people who wrote the API to tell me whether I’m going to have to write a for loop or whether I can feed it the whole list I need information on.
A third project I want to work on deals with collecting IOCs. The other week at work I was going through some Emotet-related emails, and the SOC analysts asked for all the related domains my team could find. So I was going to URLhaus to look up the domains we had from the PowerShell script, then grabbing the hash and all the domains the hash was found on.
I got real tired of copy, go to terminal window, open file, paste, awk print and sort uniq, copy, paste to a notepad file. I set the terminal command line up so all I had to do was hit the up arrow. It would remove the old temp file, get the data, sort it, and then print it to the screen so I could copy and paste.
rm temp && vim temp && awk '{print "filename: " $2 "\n" "sha256: " $4}' temp | sort -u
Even that was a bit of a mess to use, because it needed human interaction, and there were a few times the data didn’t copy, so I ended up repeating some of the copies.
Without looking at the API for URLhaus, which I’ll get around to eventually, I want to write a script that, while running, will watch the clipboard, grab the data, manipulate it, sort it, and paste it to a file, or even just write it to the file directly. I’m still trying to flesh that one out, but it will be helpful beyond just the one site.
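As a first rough sketch, the third-party pyperclip module could do the clipboard watching; the field positions mirror the awk one-liner above, and the output file name is a placeholder:

# Rough sketch of the clipboard-watcher idea, using the third-party
# pyperclip module. It polls the clipboard, and when the contents change
# it pulls out the fields (same positions as the awk one-liner above)
# and rewrites the output file with the sorted, de-duplicated results.
import time
import pyperclip

seen = set()
last = ""

while True:
    clip = pyperclip.paste()
    if clip and clip != last:
        last = clip
        for line in clip.splitlines():
            fields = line.split()
            if len(fields) >= 4:
                seen.add(f"filename: {fields[1]}\nsha256: {fields[3]}")
        with open("iocs.txt", "w") as f:  # placeholder output file
            f.write("\n".join(sorted(seen)) + "\n")
    time.sleep(1)  # poll once a second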
* Update 2024-10-06: changed to an Amazon affiliate link; I earn a commission from qualifying purchases.