These are very loose terms. Pretty much every major website saves your IP address when you create an account (for abuse prevention and spam detection), and you can get approximate location info from an IP address. So the first condition would be true for all of those websites.
Next, any website/app that builds a recommendation system will save user interactions to build the “algorithm”. So every social media platform with an algorithm will fall into this category.
With enough bending of terminology, we might be able to argue that Lemmy also collects user data (although it would be really hard, because the algo here is based on upvotes and time posted, iirc). And the “large amount” part is just legal filler.
There are groups that give access to pirated AI. When I was a student, I used them to make projects. As for how they get access to it? They usually jailbreak websites that provide free trials and automate the account creation process. The higher-quality ones scam big companies for startup credits. Then there are also some leaked keys.
Anyway, that's what I would call “pirated AI” (not locally run AI).