Posted: Dec 18, 2013 [ # 16 ]
Guru
Total posts: 1297
Joined: Nov 3, 2009
Perhaps the lowest-wage telemarketers are not fluent in the language of the country being telemarketed. However, they are trained to listen for keywords and emotions, and are further prompted by an artificial intelligence to press the buttons that are most likely correct… All this button pressing trains the A.I. to eventually replace the lowest-wage telemarketers.
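At its core, that kind of prompting is just keyword spotting mapped to suggested buttons. Here's a minimal hypothetical sketch of the lookup idea (all the keyword and button names are made up; a real system would sit on top of speech recognition and trained models):

```python
# Hypothetical sketch: suggest a response button based on keywords
# heard in the customer's speech. Keyword and button names here are
# invented for illustration only.
KEYWORD_BUTTONS = {
    "price": "BTN_QUOTE_PRICE",
    "expensive": "BTN_OFFER_DISCOUNT",
    "angry": "BTN_APOLOGIZE",
    "cancel": "BTN_RETENTION_SCRIPT",
}

def suggest_button(transcript: str, default: str = "BTN_CONTINUE_SCRIPT") -> str:
    """Return the button most likely correct for this utterance."""
    for word in transcript.lower().split():
        if word in KEYWORD_BUTTONS:
            return KEYWORD_BUTTONS[word]
    return default
```

Every button the operator then actually presses becomes a labelled training example for the A.I., which is exactly the feedback loop described above.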

Posted: Dec 18, 2013 [ # 17 ]
Guru
Total posts: 1009
Joined: Jun 13, 2013
Here’s one.
Not the ones I was thinking of, though. Their site was green, and they had an Einstein, a leprechaun, and an angry woman in big-head mode.

Posted: Dec 18, 2013 [ # 18 ]
Administrator
Total posts: 2048
Joined: Jun 25, 2010
I fear we are entering the realms of fiction now. Sorry 8man, but I just don’t see it. What does the guy do? Type the messages into a computer and wait for it to say which button to press? That would be impossible in the few seconds’ delay between each message.

Posted: Dec 18, 2013 [ # 19 ]
Senior member
Total posts: 328
Joined: Jul 11, 2009
Also, why bother having the guy there at all?
Steve, I think you are probably right. To me they sound like pre-recorded messages, and nothing like any TTS I have heard so far, and I have heard most of them.

Posted: Dec 18, 2013 [ # 20 ]
Senior member
Total posts: 328
Joined: Jul 11, 2009
That video you linked to above, Don, is about as good as most TTS gets at this time. You can often hear almost a musical note on vowel sounds. The same goes for Loquendo’s voices. It’s a little like an auto-tune effect.
Cereproc has some nice voices which are close to real, but without much emotional range as yet.
http://www.cereproc.com/en/node/229

Posted: Dec 18, 2013 [ # 21 ]
Guru
Total posts: 1297
Joined: Nov 3, 2009
Sorry? Accusing me of science fiction is a big compliment.
If I were designing this system, the telemarketers would monitor the A.I.’s suggestions on a touch screen. The A.I. would script everything like clockwork and track each telemarketer’s performance.
My friend Raj tells me that call centers in India have VoiceXML-based systems that can detect when customers are getting angry. Maybe this is a next generation of some sort?

Posted: Dec 18, 2013 [ # 22 ]
Administrator
Total posts: 2048
Joined: Jun 25, 2010
It wasn’t a chatbot. Straight from the company themselves:
http://newsfeed.time.com/2013/12/17/robot-telemarketer-samantha-west/

Posted: Dec 18, 2013 [ # 23 ]
Guru
Total posts: 1009
Joined: Jun 13, 2013
Reality, always managing to be more dull than our imagination.
I’d wager there is at least a built-in randomizer for the various “I am a real person” recordings, but I wouldn’t call that much in terms of technology.
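Such a randomizer really is trivial; a hypothetical sketch (the clip filenames are invented, and this says nothing about their actual system):

```python
import random

# Hypothetical: pick one of several canned "I am a real person"
# recordings at random, so repeated questions don't always get the
# identical clip. Filenames are made up for illustration.
REAL_PERSON_CLIPS = [
    "real_person_1.wav",
    "real_person_2.wav",
    "real_person_3.wav",
]

def pick_clip(rng=random) -> str:
    """Return the filename of a randomly chosen canned response."""
    return rng.choice(REAL_PERSON_CLIPS)
```

Hardly impressive technology, which is rather the point.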
I will now bask in the public’s responses that they ‘may’ have jumped to conclusions prematurely. It continues to interest me because we are still working towards this, towards computers that sound and behave naturally in conversation, and this was a nice prelude to what awaits. The same could happen if a chatbot ever passes a Turing test.
One of the comments on the follow-up of the article I posted calls it “future shock”. I think it is exactly that. So perhaps our robots should always sound recognisably robotic. And walk like robots, and look like robots, just so they don’t freak people out.
@Roger: The thick Scottish accent doesn’t exactly help make it sound less robotic, but thanks for sharing.

Posted: Dec 19, 2013 [ # 24 ]
Senior member
Total posts: 336
Joined: Jan 28, 2011
Don Patrick - Dec 14, 2013:
IBM is prepping Watson to handle customer service calls, after all.
My understanding was that Watson was best suited for use as an “expert”, for medical diagnostics for instance… so why not customer service too?
It is interesting indeed that the insurance call center using pre-recorded dialogue is disturbing folks. How much of a leap is it to go from Cleverbot to “CleverTalk”, where likely output (real voice clips, but selected algorithmically) is matched to the voice input of the mark? You might scoff and counter with, “Bah! Where would all those pre-recorded input/output pairs come from?!” Google Voice, among other denizens of GoogleNet, of course.
At least we can take some solace, in this the solstice of our solar cycle, for how close we are to the Singularity is a much more comforting thought than the alternative, which will be [cowering from Hunter/Killer drones] “when exactly did it happen?”

Posted: Dec 19, 2013 [ # 25 ]
Senior member
Total posts: 336
Joined: Jan 28, 2011
Don Patrick - Dec 18, 2013: So perhaps our robots should always sound recognisably robotic. And walk like robots, and look like robots, just so they don’t freak people out.
I would argue that the more “alive” something seems, the better chance it has to thrive in both the marketplace and the imagination.
It is actually too bad that voice synthesis (TTS specifically) is still so primitive. Even the best voices (arguably ATT, ANOVA, NEOSPEECH, and Vocaloid) are obviously synthetic due to inflective giveaways and “autotunation”, even without the added burden of real “Turing”-level text/verbal interaction at the chatbot level.

Posted: Dec 20, 2013 [ # 26 ]
Guru
Total posts: 1009
Joined: Jun 13, 2013
The problem is that between clunky and seamless human mimicry lies the Uncanny Valley. A nice quote from Wikipedia: “The more human an organism looks, the stronger the aversion to its defects”. So anything that sounds almost, but not quite, perfect might be hard to market from here on. On the other hand, I can’t say that I’d be freaked out in the least if my computer could dictate to me in a more human voice, but in that case I do know that it is coming from a mindless computer. In the telemarketing case it seemed like there was a mind behind it (as there was).
My own AI uses a primitive system of playing voice clips as well, word by word. It’s very inefficient to make and takes up hard-drive space, but since I can tweak the voice clips manually, I find that it sounds slightly more natural than on-the-fly TTS generation. (Though the real reason was that I couldn’t be bothered to spend time on trivial functions.)
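The word-by-word clip approach boils down to mapping each word of the output text to a recorded file. A hypothetical sketch (the clip directory and naming scheme are invented, and the actual audio-playback call is left out, since that depends on the platform):

```python
import os

# Hypothetical sketch of word-by-word clip lookup: each word maps to a
# pre-recorded file named after it. Words with no recording yield None,
# where a system could fall back to on-the-fly TTS instead.
CLIP_DIR = "voice_clips"  # invented directory name

def clips_for(sentence: str):
    """Yield the clip path for each word, or None if no clip exists."""
    for word in sentence.lower().split():
        path = os.path.join(CLIP_DIR, word + ".wav")
        yield path if os.path.exists(path) else None
```

The storage cost grows with vocabulary, which is why this stays practical only for a small, hand-tweaked set of words.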

Posted: Dec 20, 2013 [ # 27 ]
Senior member
Total posts: 336
Joined: Jan 28, 2011
Don Patrick - Dec 20, 2013: The problem is that between clunky and seamless human mimicry lies the Uncanny Valley…
Oft quoted, but it may be becoming obsolete these days, since robots are now everywhere.
Don Patrick - Dec 20, 2013: I can’t say that I’d be freaked out in the least if my computer could dictate to me with a more human voice, but in that case I do know that it is coming from a mindless computer.
I think many more folks would adopt TTS if the voices/engines sounded closer to “real”; most current ones get pretty annoying after a short time, with too many poor pronunciations and intonations.
Don Patrick - Dec 20, 2013: In the telemarketing case it seemed like there was a mind behind it (as there was).
Probably the future of telemarketing. Robocalling is very common, so why not take it to the next level (RoboCaller 2.0) by giving the robocaller the ability to interact with the callee? It could be via a “chatbot” and/or a human “navigator” keying various canned responses when needed.
Don Patrick - Dec 20, 2013: My own AI uses a primitive system of playing voice clips as well, word by word. It’s very inefficient to make and takes up hard drive space, but since I can tweak the voice clips manually, I find that it sounds slightly more natural than on-the-fly TTS generation. (Though the real reason was that I couldn’t be bothered to spend time on trivial functions)
Sounds very inefficient indeed, but I bet it sounds awesome!
One problem with that approach (using sound files), though, is how to generate visemes for animating an avatar. Especially cool is a human avatar (still photos of a real person’s face forming each of the various visemes) linked to TTS (which can generate viseme tags for each sound).

Posted: Dec 20, 2013 [ # 28 ]
Senior member
Total posts: 328
Joined: Jul 11, 2009
Carl, I use Windows SAPI and hook into its phoneme events to trigger my visemes. I was wondering if you knew of anything that would do the same on Android, for example, or for use in a web installation?
Here’s where I am at…
http://www.youtube.com/watch?v=S3_qO5P5j38
I’ve made various avatars over the past couple of years, some 2D, some 3D. This one I will develop further, as I coded all the morphing myself and now understand it better. She will do emotions too, but in that one she just talks. You can see my experiments in smiling, for example, here:
http://www.youtube.com/watch?v=RZbqQ1gmyIE
So I will take all that and run it all together, so she will blink, smile and raise her eyebrows when talking.
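Where no phoneme events are available (as on many web or Android stacks), one crude, platform-independent fallback is to derive visemes from the output text itself. A hypothetical sketch only (the viseme names are invented, and real systems map phonemes rather than letters):

```python
# Hypothetical, crude fallback when no TTS phoneme events exist:
# map letters to a small invented viseme set straight from the text.
# Proper systems map phonemes, not letters; this is a rough stand-in.
LETTER_VISEMES = {
    "a": "open", "e": "open", "i": "open",
    "o": "round", "u": "round", "w": "round",
    "m": "closed", "b": "closed", "p": "closed",
    "f": "teeth", "v": "teeth",
}

def visemes_for(text: str):
    """Return one viseme per letter, defaulting to 'rest'."""
    return [LETTER_VISEMES.get(ch, "rest")
            for ch in text.lower() if ch.isalpha()]
```

It looks wrong on close inspection, but timed against the audio it can be surprisingly passable for a simple avatar mouth.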

Posted: Dec 21, 2013 [ # 29 ]
Senior member
Total posts: 336
Joined: Jan 28, 2011
Roger Davie - Dec 20, 2013: Carl, I use Windows SAPI and hook into its phoneme events to trigger my visemes. I was wondering if you knew of anything that would do the same on Android, for example, or for use in a web installation?
I have used Windows SAPI (4 and 5), mostly with C#, to program physical robots to speak (using visemes to trigger jaw movements). I have also played with C# code that animates TTS with an avatar. The peak of this effort was a Kinect-enabled talking head (using a two-axis “turret” servo setup to track people). Unfortunately, the Kinect SDK was pretty buggy and packed with esoteric command structures, and getting TTS voices to work as expected was quite frustrating (but when it did work it WAS pretty cool). I have not messed with it in a while. Doing something similar on the web or Android should be pretty straightforward, but I have not (yet) delved into it.
http://www.youtube.com/watch?v=08YqQNgpJaw

Posted: Dec 22, 2013 [ # 30 ]
Guru
Total posts: 1009
Joined: Jun 13, 2013
Impressive work, Roger!
Carl B - Dec 20, 2013: One problem with that approach though (using sound files) is how to generate visemes for animating an avatar.
My AI just has a horizontal stripe for a mouth, but it lip-syncs with the output text string while the voice clips play. One could encode the text in the voice-clip filenames and use both.
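That filename trick could look something like this hypothetical sketch (the naming scheme, one word per file, is assumed, not anything any of us has actually built):

```python
import os

# Hypothetical: if each clip is named after the word it contains
# ("hello.wav"), the lip-sync text can be recovered from the playlist
# itself instead of being tracked separately.
def text_from_playlist(clip_paths):
    """Rebuild the spoken text from voice-clip filenames."""
    words = []
    for path in clip_paths:
        name = os.path.basename(path)       # strip directories
        word, _ext = os.path.splitext(name)  # strip ".wav"
        words.append(word)
    return " ".join(words)
```

The same recovered text could then drive both the mouth animation and a viseme lookup, so audio and visuals stay in sync by construction.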