Reposted by David Mimno
Poetry is weirdly prominent in LLM conversations. But what do models really "know" about poetry?
We tested how well LLMs can recognize 20+ poetic forms in English & probed major pretraining datasets to see which poems might be memorized.
New preprint: arxiv.org/abs/2406.18906
1 replies
17 reposts
39 likes
Yes! Hahaha yes!
0 replies
1 reposts
5 likes
I decided against adding 🙄🤷♂️, but you can imagine it
0 replies
0 reposts
2 likes
Llama3 examples here: mimno.infosci.cornell.edu/arxivcl/
"Be concise. Provide just a single short text title in doublequotes for the topic associated with the following words:"
0 replies
1 reposts
3 likes
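A rough sketch of how a labeling prompt like the one above might be wired up. The instruction string is quoted from the post; everything else is an assumption for illustration: the helper names `build_topic_prompt` and `extract_title` are hypothetical, and so is the choice to append the topic words as a comma-separated list. The call to the model itself is left out.

```python
# Instruction text quoted from the post; the rest is an illustrative assumption.
INSTRUCTION = (
    "Be concise. Provide just a single short text title in doublequotes "
    "for the topic associated with the following words:"
)


def build_topic_prompt(words):
    """Append the topic's top words to the instruction (assumed format)."""
    return INSTRUCTION + " " + ", ".join(words)


def extract_title(reply):
    """Return the text inside the first pair of double quotes, if any."""
    start = reply.find('"')
    end = reply.find('"', start + 1)
    if start == -1 or end == -1:
        # No quoted title found: fall back to the whole reply.
        return reply.strip()
    return reply[start + 1:end]


prompt = build_topic_prompt(["poem", "sonnet", "rhyme", "meter"])
title = extract_title('Sure: "Poetic Forms"')
```

Pulling the title out of the quotes helps when the model adds chatter around it despite being told to be concise.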
I'm looking for a good example of LLM fine-tuning + preference learning that's compelling to students but small enough to run as an in-class demo. Ideas?
1 replies
4 reposts
6 likes
That would be one of the prompts I'd like to see variants of! I'm about to do some experiments myself (realized I was missing a big chunk of arxiv cs.CL, rerunning models now)
1 replies
1 reposts
1 likes
In practice it seems to be BERTopic (i.e. hard clustering on doc embeddings + post-hoc keywords). I think it's mostly popular because the package is well written. I'm most excited about the Stanford CHI paper version that seems to let you have a dialog of "like that, but with more..."
2 replies
1 reposts
3 likes
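To make "hard clustering on doc embeddings + post-hoc keywords" concrete, here is a toy, standard-library-only sketch of that two-step recipe. It is not BERTopic's code or API: the normalized word-count "embeddings", the tiny k-means loop, the TF-IDF-style keyword score, and all function names are stand-in assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def embed(doc, vocab):
    """Stand-in embedding: unit-normalized word counts (an assumption)."""
    counts = Counter(doc.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_cluster(vectors, k, iters=10):
    """Tiny k-means: every document gets exactly one cluster label."""
    n = len(vectors)
    # Deterministic init from evenly spaced documents.
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_keywords(docs, labels, k, top_n=3):
    """Post-hoc keywords: pool each cluster's text, score terms TF-IDF style."""
    tf = [Counter() for _ in range(k)]
    for doc, lab in zip(docs, labels):
        tf[lab].update(doc.lower().split())
    # Number of clusters in which each term appears.
    df = Counter(w for counts in tf for w in counts)
    result = []
    for counts in tf:
        ranked = sorted(
            counts, key=lambda w: (-counts[w] * math.log(1 + k / df[w]), w)
        )
        result.append(ranked[:top_n])
    return result


docs = [
    "topic models cluster words",
    "topic models label words",
    "poetry forms sonnet rhyme",
    "poetry forms haiku rhyme",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [embed(d, vocab) for d in docs]
labels = hard_cluster(vectors, k=2)
keywords = cluster_keywords(docs, labels, k=2)
```

The point of the sketch is the ordering: the keywords never influence the clusters, they only describe them afterwards, which is the main difference from a topic model that learns word distributions and document assignments jointly.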
DM for a shared doc
0 replies
0 reposts
2 likes
I’m really impressed by llama3 in ollama for Humanities applications. Thinking of adding a prompt collection to aiforhumanists.com, would people contribute?
6 replies
8 reposts
25 likes
I appreciate that Google is trying to attribute suggested code from Colab to sources, but the threshold is set waaaay too low. Possible opportunities for malicious license trolls.
0 replies
0 reposts
3 likes
Laure and I found that there are a lot of terms that are author-specific beyond named entities. I wonder if there’s a similar boost from swapping synonyms there too? (Cf. “authorless topic models”)
1 replies
0 reposts
4 likes
After 4mins, pretty unimpressive. Seems like a good model but not for this.
0 replies
0 reposts
2 likes
Got to the point of spinning progress thingy! Gemini in colab was quite helpful in debugging an image format error
1 replies
0 reposts
2 likes
I want to try the new Florence models' OCR but I'm stuck yak-shaving on JBIG2 file formats
1 replies
0 reposts
4 likes
What did little tiny ants do before they invented sidewalks?
0 replies
2 reposts
4 likes
I know you’re working on a write up, but any info on how you’re prompting it would be great!
1 replies
0 reposts
6 likes
UMass administration building
0 replies
2 reposts
11 likes
Was debating whether to use the p word
1 replies
0 reposts
3 likes
It’s great that ACL ARR has a section for Comp Soc Sci and Cultural Analytics, but all the keywords seem to be CSS. What are the key NLP problems for CA? Narrative? Character? Dealing with complex documents? Language change?
1 replies
2 reposts
7 likes
If mosquitoes were like “hey can I have a minuscule amount of your blood so I can have my babies?” I’d be like “sure! Here you go” but they have to be dicks about it
0 replies
1 reposts
6 likes
Just saw Hit Man. Seemed totally plausible except he’s supposed to be a professor of Psychology AND Philosophy? Like wtf
2 replies
2 reposts
7 likes
This is chapter 80 of De rebus gestis Ricardi Primi; in 81 there’s a tour of other English cities and how bad they are, e.g. Bath is “ad portas inferi”
0 replies
1 reposts
2 likes
Reposted by David Mimno
A ton of folks at the CDH & incredible graduate students in Princeton English have worked for SO LONG on this project -- please read & share! modernismmodernity.org/forums/world...
Especially excited about @zoeleblanc.bsky.social and @suttonkoeser.bsky.social's article.
0 replies
18 reposts
29 likes
Galaxy brain: Big language models represent your docs in the context of a massive background collection. That may or may not be what you want.
0 replies
2 reposts
3 likes
Reposted by David Mimno
c19datacollective.com/data/
DATASET ALERT. First dataset accepted at 19thC Data Collective. Have some 19thC Data? Submit! Congrats @micahbateman.bsky.social ! (Please post on other place, I'm not allowed back there until my book is in)
0 replies
17 reposts
20 likes
So AI is going to destroy us by blindly optimizing objective functions despite devastating practical or human cost? Don't we already have this, and it's called private equity?
1 replies
0 reposts
8 likes
I can guarantee you that fucksmith did not see one penny of that $60m
0 replies
0 reposts
2 likes
Waikiki could be one of the world’s great biking cities but they somehow manage to treat everything but cars as an afterthought AND create a miserable experience for drivers
2 replies
1 reposts
3 likes
The font is Art Nouveau creativemarket.com/lizkohlerbro...
By Liz Kohler Brown www.lizkohlerbrown.com
1 replies
0 reposts
3 likes
Also, Master of the Senate
0 replies
0 reposts
2 likes
Tacitus, “Agricola”
0 replies
0 reposts
1 likes
👀
0 replies
0 reposts
3 likes
So meta
0 replies
0 reposts
0 likes
Reposted by David Mimno
PhD position Computational Approaches to Narrative in Argumentation
www.rug.nl/about-ug/wor...
#nlproc #nlp #computationalhumanities
@tedunderwood.me @andrewpiper.bsky.social @mellymeldubs.bsky.social @mariaa.bsky.social @dmimno.bsky.social @dbamman.bsky.social @lucy3.bsky.social
0 replies
8 reposts
7 likes
It’s been pretty clear for a while they use punctuation tokens this way
0 replies
0 reposts
1 likes
Yes, this photo was taken at Ithaca Beer Company
0 replies
0 reposts
2 likes
In the totality, Chimney Bluffs NY
0 replies
0 reposts
5 likes