Human Computer Interaction and Future scope

Human Computer Interaction (HCI), as the name suggests, concerns humans and computers and the way the two interact with each other. In this post we review the major approaches to multimodal human computer interaction from a computer vision perspective. In particular, we focus on core vision techniques (body, gesture, gaze) and affective interaction (facial expression recognition and emotion in audio), which are needed for Multimodal Human Computer Interaction (MMHCI) research. Since MMHCI is a very dynamic and broad research area, we do not intend to present a complete report. The main contribution of this survey, therefore, is to consolidate some of the main issues and approaches, and to highlight some of the techniques and applications developed recently within the context of MMHCI. We also give an idea of how HCI may turn out in the future. From advanced HCI techniques to improved HCI devices, we describe how HCI is likely to change. The advanced techniques include gesture-based alternatives to the GUI, smart fabrics that replace fixed VDUs, and related devices that can make life simpler for a person dealing with a computer.


Human-computer interaction (HCI) is the study of interaction between people (users) and computers. It is often regarded as the intersection of computer science, behavioural sciences, design, and several other fields of study. Interaction between users and computers occurs at the user interface (or simply interface), which includes both software and hardware: for example, characters or objects displayed by software on a personal computer's monitor, input received from users via hardware peripherals such as keyboards and mice, and other user interactions with large-scale computerized systems such as aircraft and power plants. Because human-computer interaction studies a human and a machine in conjunction, it draws on supporting knowledge from both the machine and the human side. On the machine side, techniques in computer graphics, operating systems, programming languages, and development environments are relevant. On the human side, communication theory, graphic and industrial design disciplines, linguistics, social sciences, cognitive psychology, and human factors are relevant. Attention to human-machine interaction is important because poorly designed human-machine interfaces can lead to many unexpected problems.

A Multimodal Human Computer Interaction (MMHCI) system is simply one that responds to inputs in more than one modality or communication channel (e.g., speech, gesture, writing, and others). MMHCI lies at the crossroads of several research areas including computer vision, psychology, artificial intelligence, and many others. As computers become integrated into everyday objects (ubiquitous and pervasive computing), effective natural human-computer interaction becomes critical: in many applications, users need to be able to interact naturally with computers, the way face-to-face human-human interaction takes place. Human Computer Interaction techniques must keep pace with fast-growing technology.

Our report describes some of the uses and areas of interest in which human interaction with computers can be implemented, as one answer to the question: how will Human Computer Interaction look in the future? The aim of this report is to reflect upon the changes afoot and outline a new paradigm for understanding our relationship with technology. A more extensive set of lenses, tools, and methods is needed that puts human values center stage. And here, both positive and negative aspects need to be considered. On the positive side, people use technology to pursue healthier and more enjoyable lifestyles, expand their creative skills with digital tools, and instantly gain access to information whenever necessary.

II. MMHCI Implementation

Fig 1: MMHCI Implementation Model

As depicted in Fig 1, multimodal techniques can be used to construct a variety of interfaces. Of particular interest for our goals are perceptual and attentive interfaces. Perceptual interfaces are highly interactive, multimodal interfaces that enable rich, natural, and efficient interaction with computers. They seek to leverage sensing (input) and rendering (output) technologies in order to provide interactions not feasible with standard interfaces and common I/O devices such as the keyboard, the mouse, and the monitor. Attentive interfaces, on the other hand, are context-aware interfaces that rely on a person's attention as the primary input. The goal of these interfaces is to use the gathered information to estimate the best time and approach for communicating with the user. We communicate through speech and use body language (posture, gaze, and hand motions) to express emotion, mood, attitude, and attention. Some of the MMHCI techniques are:

A. Core vision techniques

B. Affective computer interaction

A. Core Vision Techniques

We classify vision techniques for MMHCI using a human-centered approach and divide them according to how humans may interact with the system: (1) large-scale body movements, (2) gestures, and (3) gaze.

1) Large-Scale Body Movements

Tracking of large-scale body movements (head, arms, torso, and legs) is necessary to interpret pose and motion in many MMHCI applications. There are three important issues in articulated motion analysis: representation (joint angles or motion of all the sub-parts), computational paradigm (deterministic or probabilistic), and computation reduction. Body posture analysis is important in many MMHCI applications. For example, a stereo and thermal infrared video system has been used to estimate driver posture for the deployment of smart air bags, and a learning-based method has been proposed for recovering articulated body pose without initialization and tracking. Pose and velocity vectors are used to recognize body parts and detect different activities, and temporal templates are also often used. Important issues for large-scale body tracking include whether the approach uses 2D or 3D, the desired accuracy, speed, occlusion, and other constraints. Some of the issues pertaining to gesture recognition, discussed next, also apply to body tracking.
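To make the joint-angle representation concrete, here is a minimal sketch in Python. It assumes that 2D keypoints for three connected body parts (for example shoulder, elbow, and wrist) have already been produced by some tracker. The function name `joint_angle` and the sample coordinates are hypothetical and are not taken from any of the systems discussed in this post.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b, formed by the points a-b-c.

    Each point is an (x, y) pair in image coordinates.
    """
    # Vectors from the joint to its two neighbouring keypoints.
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("coincident keypoints: the angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical keypoints for one arm (assumed values, for illustration only).
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

A full articulated model would store one such angle (or a 3D rotation) per joint. The sketch covers only the 2D case, which is one of the 2D versus 3D trade-offs mentioned above.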

2) Gesture Recognition

Psycholinguistic studies of human-to-human communication describe gestures as the critical link between our conceptualizing capacities and our linguistic abilities. Humans use a very wide variety of gestures, ranging from simple actions such as using the hand to point at objects to more complex actions that express feelings and allow communication with others. Gestures should therefore play an essential role in MMHCI. A major motivation for these research efforts is the potential of using hand gestures in various applications aiming at natural interaction between the human and the computer-controlled interface.

There are several important issues that should be considered when designing a gesture recognition system. The first phase of a recognition task is choosing a mathematical model that may consider both the spatial and the temporal characteristics of the hand and hand gestures. The approach used for modeling plays a crucial role in the nature and performance of gesture interpretation. Once the model has been chosen, an analysis stage is required for computing the model parameters from the features that are extracted from single or multiple input streams. These parameters represent some description of the hand pose or trajectory and depend on the modeling approach used. Most gesture-based HCI systems allow only symbolic commands based on hand posture or 3D pointing. This is due to the complexity associated with gesture analysis and the desire to build real-time interfaces.
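As a rough illustration of a symbolic command derived from hand posture, the sketch below matches a feature vector against a few stored posture templates and returns the nearest one. Everything in it is an assumption made for the example: the five-value "finger extension" features, the template names, and the rejection threshold `max_distance`. A real system would obtain its parameters from the analysis stage described above.

```python
import math

# Hypothetical posture templates: one extension value per finger
# (thumb, index, middle, ring, little); 0 = curled, 1 = extended.
TEMPLATES = {
    "fist": (0, 0, 0, 0, 0),
    "point": (0, 1, 0, 0, 0),
    "open_hand": (1, 1, 1, 1, 1),
}


def classify_posture(features, max_distance=0.8):
    """Return the label of the nearest template, or None if nothing is close.

    `features` is a 5-tuple of finger extension estimates in [0, 1].
    `max_distance` is an assumed rejection threshold (Euclidean distance).
    """
    best_label, best_dist = None, float("inf")
    for label, template in TEMPLATES.items():
        dist = math.dist(features, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Reject ambiguous postures instead of forcing a command.
    return best_label if best_dist <= max_distance else None


command = classify_posture((0.1, 0.9, 0.2, 0.0, 0.1))
ambiguous = classify_posture((0.5, 0.5, 0.5, 0.5, 0.5))
```

Nearest-template matching is cheap enough for real-time use, which is one reason symbolic posture commands are easier to support than full gesture analysis. It ignores the temporal characteristics of a gesture entirely.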

3) Gaze Detection

Gaze, defined as the direction in which the eyes are pointing in space, is a strong indicator of attention, and it has been studied extensively since as early as 1879 in psychology, and more recently in neuroscience and in computing applications. While early eye tracking research focused only on systems for in-lab experiments, many commercial and experimental systems are available today for a wide range of applications. Eye tracking systems can be grouped into wearable or non-wearable, and infrared-based or appearance-based. In infrared-based systems, a light shining on the subject whose gaze is to be tracked creates a red-eye effect: the difference in reflection between the cornea and the pupil is used to determine the direction of sight. In appearance-based systems, computer vision techniques are used to find the eyes in the image and then determine their orientation.

The main issues in developing gaze tracking systems are intrusiveness, speed, robustness, and accuracy. The type of hardware and algorithms necessary, however, depends highly on the level of analysis desired. Gaze analysis can be performed at three different levels: (a) highly detailed low-level micro-events, (b) low-level intentional events, and (c) coarse-level goal-based events. Micro-events include micro-saccades, jitter, nystagmus, and brief fixations, which are studied for their physiological and psychological relevance by vision scientists and psychologists. Low-level intentional events are the smallest coherent units of movement that the user is aware of during visual activity, and include sustained fixations and revisits. Although most of the work on HCI has focused on coarse-level goal-based events, it is easy to foresee the importance of analysis at lower levels, particularly to infer the user's cognitive state in affective interfaces. Within this context, an important issue that is often overlooked is how to interpret eye tracking data.
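One common first step in interpreting eye tracking data is to group raw gaze samples into fixations. The sketch below uses a dispersion-threshold approach (often called I-DT), which is a widely used technique but is not described in this post. The thresholds `max_dispersion` and `min_samples`, and the sample data, are assumed values chosen only for illustration.

```python
def dispersion(window):
    """Spread of a set of (x, y) gaze points: x-range plus y-range."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=25.0, min_samples=5):
    """Group gaze samples into fixations using a dispersion threshold.

    `samples` is a list of (x, y) points recorded at a fixed rate.
    Returns a list of (start, end_exclusive, centroid_x, centroid_y).
    """
    fixations = []
    start, n = 0, len(samples)
    while start + min_samples <= n:
        end = start + min_samples
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((start, end, cx, cy))
            start = end
        else:
            # No fixation starts here; slide the window by one sample.
            start += 1
    return fixations


# Assumed data: a fixation, a rapid jump (saccade), then a second fixation.
gaze = [(100, 100), (101, 99), (99, 101), (100, 102), (102, 100), (101, 101),
        (180, 140), (250, 175),
        (300, 200), (301, 199), (299, 201), (300, 202), (302, 200), (301, 201)]
fixations = detect_fixations(gaze)
```

Sustained fixations and revisits, the low-level intentional events mentioned above, can then be derived from the resulting list, for example by checking fixation length or by comparing centroids.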

B. Affective Human-computer Interaction

Affective states are intricately linked to other functions such as attention, perception, memory, decision-making, and learning. This suggests that it may be beneficial for computers to recognize the user's emotions and other related cognitive states and expressions. The techniques used in this context are:

1. Facial expression recognition

2. Emotions in audio

1) Facial Expression Recognition

Expressions are classified into a predetermined set of categories. Some methods follow a feature-based approach, where one tries to detect and track specific features such as the corners of the mouth, the eyebrows, etc. Other methods use a region-based approach in which facial motions are measured in certain regions of the face, such as the eye/eyebrow region and the mouth. In addition, we can distinguish two types of classification schemes: dynamic and static. Static classifiers (e.g., Bayesian networks) assign each frame in a video to one of the facial expression categories based on that frame alone. Dynamic classifiers use several video frames and perform classification by analyzing the temporal patterns of the regions analyzed or the features extracted. Dynamic classifiers are very sensitive to appearance changes in the facial expressions of different individuals, so they are better suited to person-dependent experiments. Static classifiers, on the other hand, are easier to train and in general need less training data, but when used on a continuous video sequence they can be unreliable, especially for frames that are not at the peak of an expression.
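The unreliability of per-frame labels on continuous video can be illustrated with a small sketch. It assumes that a static classifier has already produced one label per frame, and it applies a sliding majority vote over neighbouring frames. This is a simple smoothing step chosen for the example, not a dynamic classifier in the sense used above, and the labels and window size are made up.

```python
from collections import Counter


def smooth_labels(frame_labels, window=3):
    """Replace each frame label by the majority label in a centred window."""
    half = window // 2
    smoothed = []
    for i in range(len(frame_labels)):
        # The window is clipped at the start and end of the sequence.
        neighbourhood = frame_labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


# Assumed per-frame output of a static classifier: one frame in the middle
# of a smile is mislabelled as "surprise".
per_frame = ["neutral", "neutral", "happy", "happy", "surprise",
             "happy", "happy", "neutral", "neutral"]
smoothed = smooth_labels(per_frame)
```

The vote removes the isolated "surprise" frame at the cost of a short delay and some blurring at expression boundaries. A dynamic classifier models the temporal pattern directly instead of correcting labels after the fact.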

2) Emotions in Audio

Researchers mainly use different methods to analyze emotions. One approach is to classify emotions into discrete categories such as joy, fear, love, surprise, sadness, etc., using different modalities, such as audio, as inputs to emotion recognition models. The vocal aspect of a communicative message carries various kinds of information. If we disregard the manner in which a message is spoken and consider only the textual content, we are likely to miss important aspects of the utterance, and we might even completely misunderstand the meaning of the message. Recent studies seem to use Ekman's six basic emotions, although others in the past have used many more categories. The use of these basic categories is often not justified, since it is not clear whether there exist universal emotional characteristics in the voice for these six categories. The most surprising issue regarding the multimodal affect recognition problem is that, although recent advances in video and audio processing could make the multimodal analysis of human affective state tractable, only a few research efforts have tried to implement a multimodal affective analyzer.

[...] alarming rate. What can the HCI community do to intervene and help? How can it build on what it has achieved? In this part we map out some fundamental changes that we suggest need to occur within the field. Specifically, we suggest that HCI needs to extend its methods and approaches so as to focus more clearly on human values. This will require a more sensitive view of the role, function, and consequences of design, just as it will force HCI to be more inventive. HCI will need to form new partnerships with other disciplines, too, and for this to happen HCI practitioners will need to be sympathetic to the tools and techniques of other trades.

A. GUIs to Gestures

In the last few years, new input techniques have been developed that are richer and less prone to the many shortcomings of keyboard and mouse interaction. For example, there are tablet computers that use stylus-based interaction on a screen, and even paper-based systems that digitally capture markings made on specialized paper using a camera embedded in a pen. These developments support interaction through sketching and handwriting. Speech-recognition systems also support a different kind of natural interaction, allowing people to issue commands and dictate through voice. Meanwhile, multi-touch surfaces enable interaction with the hands and the fingertips on touch-sensitive surfaces, allowing us to manipulate objects digitally as if they were physical. From GUIs to multi-touch, speech to gesturing, the ways we interact with computers are diversifying as never before (see Fig 3). Two-handed and multi-fingered input provides a more natural and flexible means of interaction beyond the single point of contact offered by either the mouse or the stylus. The shift to multiple points of input also supports novel forms of interaction where people can share a single interface by gathering around it and interacting together (see the Reactable, Fig 2).


Fig 2: The Reactable: a multitouch interface for playing music. Performers can simultaneously interact with it by moving and rotating physical objects on its surface.

Fig 3: The Hot Hand device: a ring worn by electric guitar players that uses motion sensors and a wireless transmitter to create different kinds of sound effects through various hand gestures.

B. VDUs to Smart Fabrics

The fixed video display units (VDUs) of the 1980s are being superseded by a whole host of flexible display technologies and smart fabrics. Displays are being built in all sizes, from the tiny to the gigantic, and soon will become part of the fabric of our clothes and our buildings. Within a decade or so, these advances are likely to have revolutionized the form that computers will take.

Recent advances in Organic Light Emitting Diodes (OLEDs) (see Fig 4) and plastic electronics are enabling displays to be made much more cheaply, with higher resolution and lower power consumption, some without requiring a backlight to function. An OLED has an emissive electroluminescent layer made from a film of organic compounds, enabling a matrix of pixels to emit light of different colors. Plastic electronics also use organic materials to create very thin semi-conductive transistors that can be embedded in all sorts of materials, from paper to cloth, enabling, for example, the paper in books or newspapers to be digitised. Electronic components and devices, such as Micro-Electro-Mechanical Systems (MEMS), are also being made at an extremely small size, allowing for very small displays.

Fig 4: Animated Textiles developed by Studio at the Hexagram Institute, Montreal, Canada. These two jackets "synch up" when the wearers hold hands, and the message scrolls from the back of one person to the other.

C. Hard Disks to Digital Footprints

People are beginning to talk about their ever-growing digital footprints. Part of the reason for this is that the limits of digital storage are no longer a pressing issue. Storage is all around us, costing next to nothing, from ten-a-penny memory sticks and cards to vast digital Internet data banks that are freely available for individuals to store their photos, videos, emails, and documents (see Fig 5). The decreasing cost and increasing capacity of digital storage also go hand in hand with new and cheap methods for capturing, creating, and viewing digital media. The effect on our behaviour has been quite dramatic: people are taking thousands of pictures rather than hundreds each year. They no longer keep them in shoeboxes or stick them in albums but keep them as ever-growing digital collections, often online.

Fig 5: The Rovio robot, connected to the Internet. It roams around the home providing an audio and video link to keep an eye on family or pets when you are out.

D. Changing Lives

Within a decade or so, more people than ever will be using computing devices of one form or another, be they a retiree in Japan, a schoolchild in Italy, or a farmer in India (see Fig 6). At the same time, each generation will have its own set of demands. Technology will continue to have an important impact at all stages of life.


Fig 6: A boy using a digitally augmented probe tool that shows real-time measurements of light and moisture on an accompanying mobile device.

E. New Ways of Family Living

New technologies are proliferating that enable people to live their own busy social and working lives while still taking an active part in their family life. A number of computer applications have been developed to enable family members to keep an eye on one another, from the Family Locator feature on the Disney cell phone (which allows parents to display the location of a child's handset on a map) to devices that can be installed in cars to track their location and speed. In the next decade or two, we will witness many changes in family life brought about by technology, which will in turn spark new forms of digital tools. Such changes will of course have a larger impact on societal and ethical issues, one that is difficult to predict (see Fig 7).

Fig 7: Audiovox's Digital Message Center is designed to be attached to the refrigerator, letting families scribble digital notes and leave audio and video messages for each other.

IV. Conclusion

We have highlighted major vision approaches for multimodal human-computer interaction. We discussed techniques for large-scale body movement, gesture recognition, and gaze detection. We also discussed facial expression recognition, emotion analysis, and a variety of emerging applications.

Another important issue is the affective aspect of communication, which should be considered when designing an MMHCI system. Emotion modulates almost all modes of human communication: facial expression, gestures, posture, tone of voice, choice of words, respiration, skin temperature and clamminess, etc. Emotions can significantly change the message: often it is not what was said that is most important, but how it was said.

How we define and think about our relationships with computers is radically changing. How we use them and rely on them is also being transformed. At the same time, we are becoming hyper-connected, and our interactions are being increasingly etched into our digital landscapes. There is more scope than ever before to solve hard problems and allow new forms of creativity. We have begun to raise the issues and concerns that these transformations bring. Some will be within the remit of Human-Computer Interaction to address and others will not.
