Editor's note from Lei Feng Network: this article was written by Chen Xiaoliang, Ph.D. in engineering and founder of SoundAI. It is a Lei Feng Network exclusive; please contact us for authorization before reprinting.
A smart speaker requires polishing an entire chain of technology, and real craftsmanship in the user experience means that no detail can be ignored. We often heap praise on data and deep learning, but many researchers have grown lazy, assuming that anything in the world can be understood given enough data. In fact, it is the exploration of the physical world and philosophical reflection on humanity that drive social progress.
With the release of Google Home, Amazon Echo, the originator of the smart speaker, is once again in the spotlight; it has become the representative product of a new era of intelligent hardware. Amazon Echo has not disappointed: its sales keep climbing, its R&D team has grown to more than a thousand people, and a few days ago Amazon posted openings for another 400.
Generally, when a business model or product succeeds abroad, Chinese companies copy it successfully within a year or so and often catch up quickly. Amazon Echo is an exception. Since its release in 2014, only Google has launched a competing product abroad, Home, and that took two years; because Home has only just been released, its sales remain to be seen. And China? Over the same two years, Chinese companies have produced several imitations, but none has been as widely accepted as Amazon Echo.
Many domestic manufacturers cannot understand why, after so much effort, their products fail to win users' acceptance. The dismal sales of similar products in China have left many domestic Internet giants unable to decide whether to develop a smart speaker like the Amazon Echo. So what is going on?
| A smart speaker requires polishing the whole chain; craftsmanship in the experience means no detail can be overlooked
As the household entry point for an intelligent voice assistant, a smart speaker goes far beyond the definition of a speaker. Is it hardware? Software? A platform? It is hard to define, because a smart speaker involves the entire ecological chain of voice interaction. A company that does not treat it as a strategic R&D product will probably end up with a mediocre product. Many companies believe they have invested heavily, but their efforts look small next to Amazon's investment in Echo: Amazon not only put thousands of people into R&D but could also afford to advertise Echo during the Super Bowl. The competition resembles that among mobile phones. Domestic brands also consider their phones good, but once users hold the products in their hands, they naturally notice the difference. In fact, competition between companies is not about putting in 20% of the effort to get 80% of the result, but about putting in 80% of the effort to get the last 20%, and that 20% often decides a company's success or failure. Yet few companies are willing to invest that 80%, especially for products in emerging markets. If a company does not put in enough effort to refine the user experience, it is only reasonable that its product fails to win users' acceptance.
In fact, the birth of Amazon Echo was not easy either.
Amazon Echo was developed by Amazon's Lab126, founded in 2004 and responsible for Amazon's hardware product development. Lab126 has developed four main products; Echo came after the Kindle, the Fire Phone, and an AR project, and its team members came mainly from the AR project. When the project started at the end of 2010, presumably no one thought much of a speaker. Echo was not its original name: it was called Amazon Flash, a name it kept until just before it shipped in 2014. Echo was also lucky. When the Fire Phone failed and the AR project was halted, Echo benefited directly, and after a short internal reorganization its R&D strength grew greatly. Even so, the product remained controversial, so much so that when Echo first launched, Amazon did not dare sell it openly and instead tested the waters with invitation-only purchases.
Echo was in R&D for many years, and Amazon pursued the technology almost to an extreme, but this did not spare Echo an embarrassing debut in 2014: at the time it felt like a demo model, with many problems from algorithms to content. As its user base expanded and R&D kept growing, Echo improved greatly, and thanks to its open policy its content aggregation developed rapidly. This laid the foundation for Echo to repeatedly top sales rankings among products priced at $100 or more.
At heart, Amazon Echo is a speaker. Its sound quality is adequate, and in an era when HiFi speakers are in decline and Bluetooth speakers dominate, sound quality is not the top priority for most consumers; ease of use and an attractive appearance are the main reasons users buy wireless speakers. Echo's design is quite satisfactory, but above all Echo is a well-matched combination of acoustics and intelligence. Intelligence is only an extension of what Echo is, and the deliberate omission of a display highlights Amazon's confidence in and commitment to voice interaction. Domestic companies generally lack this strategic vision and focus too much on copying functional differences, which is not enough to replicate Echo's success, not to mention that domestic product design and planning are far less refined than Echo's.
Refinement of this kind really requires a commitment to the experience.
Take voice wake-up as an example. Chinese vendors like to boast that their wake-up performance beats Alexa's, but this is not just a matter of headline numbers. Domestic products commonly suffer from high, fluctuating false-alarm rates and occasional unexplained missed wake-ups, which are annoying; a speaker that suddenly starts talking to you, even only occasionally, is enough to make users fed up. As for industrial design, everyone has their own aesthetic standard, and it is hard to say exactly why Echo and Home look pleasing, but they at least look better than shoddy speaker designs.
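Wake-up quality is usually judged by two numbers rather than one: how often the device fails to wake when someone says the wake word (the miss rate) and how often it wakes when nobody did (false alarms, commonly reported per hour or per day). The Python sketch below computes both from a test log; the `WakeTrial` record format, the `wake_metrics` function, and the trial counts are hypothetical, chosen only to illustrate the two metrics, and do not describe any vendor's actual test protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakeTrial:
    """One test record; this format is a hypothetical one chosen for illustration."""
    wake_word_spoken: bool  # ground truth: did someone actually say the wake word?
    device_woke: bool       # did the speaker respond?


def wake_metrics(trials: list[WakeTrial], hours_of_audio: float) -> dict[str, float]:
    """Compute the miss rate and false alarms per hour for a batch of test trials."""
    if hours_of_audio <= 0:
        raise ValueError("hours_of_audio must be positive")
    spoken = [t for t in trials if t.wake_word_spoken]
    # A miss: the wake word was said, but the device stayed silent.
    misses = sum(1 for t in spoken if not t.device_woke)
    # A false alarm: the device woke up although nobody said the wake word.
    false_alarms = sum(1 for t in trials if t.device_woke and not t.wake_word_spoken)
    return {
        "miss_rate": misses / len(spoken) if spoken else 0.0,
        "false_alarms_per_hour": false_alarms / hours_of_audio,
    }


# Hypothetical run: 100 spoken wake words (5 missed) and 3 spurious wake-ups in 24 hours.
trials = (
    [WakeTrial(True, True) for _ in range(95)]
    + [WakeTrial(True, False) for _ in range(5)]
    + [WakeTrial(False, True) for _ in range(3)]
)
print(wake_metrics(trials, hours_of_audio=24.0))
```

In this made-up run the device has a 5% miss rate and 0.125 false alarms per hour, about three unprompted wake-ups a day. Quoting only one of the two numbers hides the trade-off between them: a looser detection threshold lowers misses but raises false alarms, so a single headline figure says little about how annoying the device is to live with.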
Polishing a product that spans an entire ecosystem requires not only attention to every relevant detail but also the pooling of resources. In voice interaction, for example, foreign giants keep acquiring related companies to build up their strengths, while domestic companies prefer to have their own small teams do everything. How can such scattered effort compete with the concentrated resources of foreign giants? Nor can voice interaction be solved by deep learning alone; it requires a deep understanding of, and long-term accumulation in, both acoustics and intelligence.
| Presence and immediacy are key factors in voice interaction, but they get little attention
Voice interaction is without doubt the next mainstream mode of interaction after the keyboard, the mouse, and the touch screen, but it always seems to fall just short of truly reaching millions of households.
There are many reasons. For example, vendors feel that voice interaction is not intelligent, and that is indeed the case. No company in the world can yet make voice interaction anything but clumsy. Intelligent speech is still at the level of keyword recognition and context analysis, and the academic community has no clear approach to so-called syntax and semantics. Progress requires long-term research breakthroughs that go beyond today's hot topics of machine learning and data; concepts and models must also account for the physical world, or at least for how infants learn language. Seen this way, today's voice interaction is still a very long way from truly intelligent AI.
Amazon Echo probably faced the same predicament. Critics have even complained that Echo's speech synthesis is not good enough, because people always want their words to receive a human-like response. It is not that Amazon ignored this issue; rather, Amazon focused on a different level: not how natural the synthesized speech sounds, but how responsive the spoken replies are. It is currently very hard for speech synthesis to sound as natural as a human, although that goal is not too far off. Last month, Google's release of WaveNet sent shock waves through the speech synthesis field with a new approach. Before it, speech synthesis had made no substantive progress for a long time, relying on just two methods: parametric and concatenative synthesis. How should Google's WaveNet speech synthesis and Microsoft's speech recognition "milestone" be evaluated? That deserves a comparative analysis of its own; in any case, these are not what users care about most at this stage.
It turns out that Amazon bet correctly: users care more about the sense of presence in the interaction. In terms of metrics, one of the most important parameters is the machine's response time. Echo's response time started at 5 seconds, then dropped to 1.5 seconds, and then to under 1 second. Note that this is the average response time, not the kind of best-case peak figure that domestic vendors tend to quote.
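The gap between a best-case figure and an average is easy to see on a single set of measurements. The Python sketch below takes a list of hypothetical response times (invented for illustration, not measurements of Echo or any domestic product) and reports the best case, the mean, and a nearest-rank 95th percentile as one rough gauge of stability; the function name `latency_summary` and the choice of the 95th percentile are assumptions, not an industry standard.

```python
import statistics


def latency_summary(samples_s: list[float]) -> dict[str, float]:
    """Summarize response times (seconds): best case, mean, and 95th percentile."""
    if not samples_s:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_s)
    n = len(ordered)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * n), computed with integers.
    rank = (95 * n + 99) // 100
    return {
        "best": ordered[0],
        "mean": statistics.fmean(ordered),
        "p95": ordered[rank - 1],
    }


# Hypothetical response times for ten voice requests (illustrative, not measured).
samples = [0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1, 1.2, 2.5, 3.4]
summary = latency_summary(samples)
print(f"best {summary['best']:.1f} s, mean {summary['mean']:.1f} s, p95 {summary['p95']:.1f} s")
```

With these made-up numbers a vendor could truthfully advertise a 0.6-second response, yet the mean is 1.3 seconds and two of the ten requests take longer than 2 seconds. The average, together with a high percentile, is closer to what users actually experience.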
The sense of presence in interaction is an interesting topic, sometimes described in terms of immersion. Consider it simply from the perspective of spoken dialogue. Speech is the main way humans interact, exchange information, and learn, but because speech is so immediate, it is poor at keeping records, and so writing gradually emerged. Because of this immediacy, spoken exchange usually happens face to face. Humans now have the telephone, but even with it, spoken communication remains instantaneous: the telephone extends the distance over which people can talk without changing the immediate nature of spoken exchange. In essence, the development of telecommunications and the Internet has still been reaping this dividend.
Of course, where telecommunications and Internet technology fall short of face-to-face exchange, they often undermine this immediacy. Technically, this is the distinction between simplex and duplex modes. In simplex mode, the two speakers' turns are mutually exclusive: their voices cannot overlap, and neither can interrupt the other. Human-machine voice interaction in products such as Siri and Echo clearly uses simplex mode. Simplex mode cannot provide the fun and feel of a face-to-face conversation and partly lacks the experience of a real dialogue. Duplex mode is starting to change this, but it still falls well short of human conversation, and these technical difficulties remain to be overcome.
Since the sense of presence in voice interaction cannot yet reach human level, the machine's response time naturally becomes the key metric: it should be measured as an average, and it must also be stable and reliable. This is crucial. Would you keep talking to a machine that takes a long time to come up with an answer? Probably not; it would drag you down, and even though it is not a person, it may feel like a matter of dignity. Clearly, artificial intelligence at its current stage cannot pursue human-level intelligence, because too many physical and philosophical problems remain unsolved. Rather than arguing over the so-called "Singularity" or whether machines will threaten humanity, perhaps we should first think about how to solve each key problem in the product.
| Cultural differences between East and West also constrain the development of domestic intelligent voice interaction
Understanding of smart speakers may differ considerably between China and abroad, but when people in the East try to use the core of a smart speaker, the intelligent voice assistant, they also run into a cultural barrier, which may stem from cultural differences between East and West. To understand this, first distinguish speech from language. Speech is the signal that carries language: it is produced by the human vocal organs and carries some meaning, whereas language is a product of human intelligence. Generally speaking, speech is innate; babies babble, and even their cries carry some meaning, whereas language must be learned and constantly evolves. Human-machine voice interaction is really an exchange of language, so even 100% accurate speech recognition would not by itself amount to understanding, because language always depends on personality, context, and emotion.
Language is a product of culture; without culture there would be no language at all. Language is part of social culture: its structures reflect not only a society's cultural patterns but also people's values. Languages differ widely from country to country, because peoples who live in different environments naturally develop different cultures and languages. Undoubtedly, because of the cultural differences between East and West, their modes of expression differ enormously, and this difference also determines how quickly Echo-like intelligent voice assistants can spread.
Eastern culture is implicit, unlike the directness of the West; we tend to express ourselves in roundabout ways. "Yes" is not said as "yes" but as "no", and "no" is not said as "no" but as "yes", something that drives young men in love in the East to despair every now and then. In fact, one of the most important principles of Eastern conduct is to "hold back half of what is on the tip of your tongue." This makes life hard for voice assistants. According to our experimental observations, when people in the East face an Echo-type smart speaker, they often need to think twice before speaking.
This lies beyond the reach of any technology: compared with Westerners, Asians face a greater psychological barrier when using intelligent voice products. When the machine is not smart enough, people in the East, unlike Westerners, feel more reserved and awkward facing such products. On top of that, the sense of presence and immediacy in current voice interaction is not good enough, which raises this barrier further. As a result, when people in the East use an Echo-type intelligent voice product, they rarely say more than ten different utterances in a row.
In fact, the differences between Eastern and Western ACG (anime, comics, and games) culture also reflect the different positions voice assistants occupy in East and West. Western works such as Transformers are typical depictions of human-machine voice interaction, and Iron Man and Star Wars feature even more natural voice interaction and robots. By contrast, Eastern works such as Saint Seiya and Final Fantasy place greater emphasis on interaction and expression between people. From this point of view, the East as a whole has had far less exposure to human-machine dialogue and artificial intelligence than the West, so mass acceptance of intelligent voice assistants is naturally lower in the East.
In China, smart speakers, or intelligent voice assistants, are still in the early stage of market education. The road ahead is thorny: even knowing there are pitfalls, and that many pioneers may fall along the way, the steps must still be taken, and the prospects will improve. The journey will probably demand many sacrifices, including training a wave of technical and marketing talent and building up a large base of hardcore users. Foreign companies are already one step ahead on this path, and domestic companies should not dream of overtaking them on the curves. There are not that many curves, and the competition is not stupid; the only way forward is to proceed step by step on solid ground.
The domestic Internet developed the same way: BAT (Baidu, Alibaba, and Tencent) were not the earliest pioneers in their fields, but, inspired by those pioneers, they kept developing and growing. How long this journey will take is hard to say; after all, it is not like predicting a football result. But I believe the process will be faster than it was for the Internet and the mobile Internet.