I was messing around with some natural language processing the other day, trying to get a handle on two libraries, Humbert and Nakashima. Both are meant for breaking Japanese text into words and tagging each one with its part of speech. I figured I’d give them a shot and see which one worked better for my needs.
Getting Started
First, I had to get both libraries installed. I just used pip, the usual way to install Python packages, and it was pretty straightforward.
I started with Humbert. You create a Tokenizer object and then use it to tokenize a sentence, like this:
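from humbert import Tokenizer

tokenizer = Tokenizer()
text = "今日は良い天気ですね。"
# Assumption: the tokenizing method is called tokenize()
tokens = tokenizer.tokenize(text)
for token in tokens:
    # Each token prints as the word plus its part of speech
    print(token)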
This printed out each word in the sentence along with its part of speech. Pretty cool, right? It seemed to be doing a decent job.
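If you want to do something with word-plus-part-of-speech output beyond printing it, here’s a minimal sketch that tallies the tags. It assumes you’ve already turned the tokens into (surface, part_of_speech) pairs. I haven’t pinned down how to pull those fields out of a Humbert token, so count_pos is a made-up helper and the sample pairs are hand-written stand-ins, not real tokenizer output.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_pos(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each part-of-speech tag appears."""
    return Counter(pos for _surface, pos in pairs)


# Hand-written sample pairs for illustration only (not library output).
sample = [
    ("今日", "名詞"),
    ("は", "助詞"),
    ("良い", "形容詞"),
    ("天気", "名詞"),
    ("です", "助動詞"),
    ("ね", "助詞"),
    ("。", "記号"),
]

# Print tags from most to least frequent.
for pos, n in count_pos(sample).most_common():
    print(pos, n)
```

A Counter keeps this to one line of real logic, and most_common() gives you the tags sorted by frequency for free.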
Trying Out Nakashima
Next up, I tried Nakashima. It’s a bit different: you load up a tagger first, kind of like a resource, and then run your text through it.
import nakashima

# Assumption: the tagger class is called Tagger and exposes parse()
tagger = nakashima.Tagger()
text = "今日は良い天気ですね。"
result = tagger.parse(text)
print(result)
This one gave me a more detailed breakdown, including things like the base form of each word and some other linguistic details. It was a bit overwhelming, but it could be useful if you need all that detail.
My Takeaway
After playing around with both, I’m leaning towards Humbert for now. It’s simpler to use, and honestly, I don’t need all the extra information that Nakashima provides. Humbert is good enough for what I’m doing, which is just some basic text analysis. Plus, Humbert felt a bit faster, but that might just be me. Both tools have their uses, but for splitting up sentences and getting the basic parts of speech, Humbert is my pick. In my experience, Nakashima is like bringing a bazooka to a knife fight: just too much for basic tasks.
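Since “felt a bit faster” isn’t much of a measurement, here’s a rough sketch of how you could time the two with Python’s built-in timeit module. The time_tokenizers helper is a name I made up, and the two stand-in functions are placeholders so the snippet runs without either library installed. You’d swap in the real Humbert and Nakashima calls yourself.

```python
import timeit
from typing import Callable, Dict


def time_tokenizers(
    candidates: Dict[str, Callable[[str], object]],
    text: str,
    repeats: int = 5,
    number: int = 100,
) -> Dict[str, float]:
    """Return the best observed per-call time (in seconds) for each candidate."""
    results = {}
    for name, func in candidates.items():
        # Run `number` calls, `repeats` times over, and keep the fastest run:
        # the minimum is the least affected by background noise.
        runs = timeit.repeat(lambda: func(text), repeat=repeats, number=number)
        results[name] = min(runs) / number
    return results


if __name__ == "__main__":
    # Placeholder callables, NOT the real libraries. Replace them with
    # whatever call each library uses to process a string.
    stand_ins = {
        "split_chars": lambda s: list(s),
        "whole_string": lambda s: [s],
    }
    timings = time_tokenizers(stand_ins, "今日は良い天気ですね。")
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1e6:.2f} µs per call")
```

One short sentence won’t tell you much, so it’s worth feeding in a longer chunk of text, and creating the tokenizer or tagger outside the timed function so you’re only measuring the processing itself.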
Anyway, that’s my little experiment with Humbert and Nakashima. If you’re messing around with Japanese NLP, give them a try and see which one you like better. Peace out!