
Machine-learning system tackles speech and object recognition, all at once

    By robonews | News | 18 September, 2018

    MIT computer scientists have developed a system that learns to identify objects within an image, based on a spoken description of the image. Given an image and an audio caption, the model will highlight in real-time the relevant regions of the image being described.

    Unlike current speech-recognition technologies, the model doesn’t require manual transcriptions and annotations of the examples it’s trained on. Instead, it learns words directly from recorded speech clips and objects in raw images, and associates them with one another.

    The model can currently recognize only several hundred different words and object types. But the researchers hope that one day their combined speech-object recognition technique could save countless hours of manual labor and open new doors in speech and image recognition.

    Speech-recognition systems such as Siri, for instance, require transcriptions of many thousands of hours of speech recordings. Using these data, the systems learn to map speech signals to specific words. Such an approach becomes especially problematic when, say, new terms enter our lexicon, and the systems must be retrained.

    “We wanted to do speech recognition in a way that’s more natural, leveraging additional signals and information that humans have the benefit of using, but that machine learning algorithms don’t typically have access to. We got the idea of training a model in a manner similar to walking a child through the world and narrating what you’re seeing,” says David Harwath, a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Spoken Language Systems Group. Harwath co-authored a paper describing the model that was presented at the recent European Conference on Computer Vision.

    In the paper, the researchers demonstrate their model on an image of a young girl with blonde hair and blue eyes, wearing a blue dress, with a white lighthouse with a red roof in the background. The model learned to associate which pixels in the image corresponded with the words “girl,” “blonde hair,” “blue eyes,” “blue dress,” “white lighthouse,” and “red roof.” When an audio caption was narrated, the model then highlighted each of those objects in the image as they were described.

    One promising application is learning translations between different languages, without the need for a bilingual annotator. Of the estimated 7,000 languages spoken worldwide, only 100 or so have enough transcription data for speech recognition. Consider, however, a situation where two different-language speakers describe the same image. If the model learns speech signals from language A that correspond to objects in the image, and learns the signals in language B that correspond to those same objects, it could assume those two signals — and matching words — are translations of one another.

    “There’s potential there for a Babel Fish-type of mechanism,” Harwath says, referring to the fictitious living earpiece in the “Hitchhiker’s Guide to the Galaxy” novels that translates different languages to the wearer.

    The CSAIL co-authors are: graduate student Adria Recasens; visiting student Didac Suris; former researcher Galen Chuang; Antonio Torralba, a professor of electrical engineering and computer science who also heads the MIT-IBM Watson AI Lab; and Senior Research Scientist James Glass, who leads the Spoken Language Systems Group at CSAIL.

    Audio-visual associations

    This work expands on an earlier model developed by Harwath, Glass, and Torralba that correlates speech with groups of thematically related images. In the earlier research, they put images of scenes from a classification database on the crowdsourcing platform Mechanical Turk. They then had people describe the images as if they were narrating to a child, for about 10 seconds. They compiled more than 200,000 pairs of images and audio captions, in hundreds of different categories, such as beaches, shopping malls, city streets, and bedrooms.

    They then designed a model consisting of two separate convolutional neural networks (CNNs). One processes images, and one processes spectrograms, visual representations of audio signals as they vary over time. The top layer of the model combines the outputs of the two networks and maps the speech patterns to the image data.
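    For concreteness, here is a minimal sketch of what such a two-branch model could look like in PyTorch. The layer sizes, kernel shapes, and embedding dimension are illustrative assumptions, not the architecture published by the authors; the only property carried over from the description above is that one branch maps an image to a grid of embedding vectors while the other maps a spectrogram to a sequence of embedding vectors over time.

```python
# Illustrative sketch only -- not the authors' published architecture.
import torch
import torch.nn as nn

class ImageBranch(nn.Module):
    """Maps an RGB image to a grid of embedding vectors (one per spatial cell)."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, embed_dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, image):            # image: (B, 3, H, W)
        return self.features(image)      # (B, embed_dim, H', W') -- one vector per grid cell

class AudioBranch(nn.Module):
    """Maps a spectrogram (frequency x time) to a sequence of embedding vectors over time."""
    def __init__(self, n_mels=40, embed_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(n_mels, 5), stride=(1, 2)), nn.ReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=(1, 5), stride=(1, 2)),
        )

    def forward(self, spectrogram):      # spectrogram: (B, 1, n_mels, T)
        x = self.features(spectrogram)   # (B, embed_dim, 1, T')
        return x.squeeze(2)              # (B, embed_dim, T') -- one vector per time segment
```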

    The researchers would, for instance, feed the model caption A and image A, which is correct. Then, they would feed it a random caption B with image A, which is an incorrect pairing. After comparing thousands of wrong captions with image A, the model learns the speech signals corresponding with image A, and associates those signals with words in the captions. As described in a 2016 study, the model learned, for instance, to pick out the signal corresponding to the word “water,” and to retrieve images with bodies of water.
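    This pairing scheme is essentially a ranking objective: a matched image and caption should score higher than a mismatched pair. Below is a hedged sketch of one common way to implement that idea, assuming a single pooled embedding vector per image and per caption (for example, averages of the branch outputs above); the margin value and the use of other items in the batch as the "wrong" pairings are assumptions, not the authors' exact recipe.

```python
import torch
import torch.nn.functional as F

def ranking_loss(image_emb, audio_emb, margin=1.0):
    """
    image_emb, audio_emb: (B, D) pooled embeddings for B matched image/caption pairs.
    The other items in the batch play the role of the random incorrect pairings: the
    loss pushes each matched similarity above the mismatched ones by a margin.
    """
    sims = image_emb @ audio_emb.t()               # (B, B) similarities; diagonal = matched pairs
    pos = sims.diag().unsqueeze(1)                 # (B, 1) matched similarities
    cost_captions = F.relu(margin + sims - pos)      # wrong captions for each image (rows)
    cost_images = F.relu(margin + sims - pos.t())    # wrong images for each caption (columns)
    mask = 1.0 - torch.eye(sims.size(0), device=sims.device)  # ignore the matched diagonal
    return ((cost_captions + cost_images) * mask).sum() / sims.size(0)
```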

    “But it didn’t provide a way to say, ‘This is the exact point in time that somebody said a specific word that refers to that specific patch of pixels,’” Harwath says.

    Making a matchmap

    In the new paper, the researchers modified the model to associate specific words with specific patches of pixels. The researchers trained the model on the same database, but with a new total of 400,000 image-caption pairs. They held out 1,000 random pairs for testing.

    In training, the model is similarly given correct and incorrect images and captions. But this time, the image-analyzing CNN divides the image into a grid of cells consisting of patches of pixels. The audio-analyzing CNN divides the spectrogram into segments of, say, one second to capture a word or two.

    With the correct image and caption pair, the model matches the first cell of the grid to the first segment of audio, then matches that same cell with the second segment of audio, and so on, all the way through each grid cell and across all time segments. For each cell and audio segment, it provides a similarity score, depending on how closely the signal corresponds to the object.

    The challenge is that, during training, the model doesn’t have access to any true alignment information between the speech and the image. “The biggest contribution of the paper,” Harwath says, “is demonstrating that these cross-modal alignments can be inferred automatically by simply teaching the network which images and captions belong together and which pairs don’t.”

    The authors dub this automatically learned association between a spoken caption’s waveform and the image pixels a “matchmap.” After training on thousands of image-caption pairs, the network narrows down those alignments to specific words representing specific objects in that matchmap.
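    In code terms, a matchmap of this kind is simply the tensor of similarity scores between every image grid cell and every audio segment. A minimal sketch, reusing the hypothetical branch outputs from the earlier sketches:

```python
import torch

def matchmap(image_grid, audio_seq):
    """
    image_grid: (D, H, W) tensor, one D-dim embedding per image grid cell.
    audio_seq:  (D, T) tensor, one D-dim embedding per audio time segment.
    Returns an (H, W, T) tensor whose entry (h, w, t) scores how strongly
    image cell (h, w) responds to audio segment t.
    """
    D, H, W = image_grid.shape
    T = audio_seq.shape[1]
    cells = image_grid.reshape(D, H * W)   # flatten the spatial grid: (D, H*W)
    scores = cells.t() @ audio_seq         # all cell/segment dot products: (H*W, T)
    return scores.reshape(H, W, T)
```

    A single pair-level score for training can then be obtained by pooling this tensor — one natural choice is to take the maximum over image cells and average over time — so that a ranking objective like the sketch above still sees one number per image-caption pair.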

    “It’s kind of like the Big Bang, where matter was really dispersed, but then coalesced into planets and stars,” Harwath says. “Predictions start dispersed everywhere but, as you go through training, they converge into an alignment that represents meaningful semantic groundings between spoken words and visual objects.”

    “It is exciting to see that neural methods are now also able to associate image elements with audio segments, without requiring text as an intermediary,” says Florian Metze, an associate research professor at the Language Technologies Institute at Carnegie Mellon University. “This is not human-like learning; it’s based entirely on correlations, without any feedback, but it might help us understand how shared representations might be formed from audio and visual cues. … [M]achine [language] translation is an application, but it could also be used in documentation of endangered languages (if the data requirements can be brought down). One could also think about speech recognition for non-mainstream use cases, such as people with disabilities and children.”



    Source: news.mit.edu

